söndag 21 juli 2013

Alternatives to subtyping, Part 2

Part two of Alternatives to inheritance (read part one here).

Modules instead of classes

Classes in Java serve a number of different purposes. One is to extend present classes with new code and types to avoid code duplication and reuse work done by earlier developers. The example we saw in the last blog (student, graduate students and undergraduate students) is an example of this. OCaml have classes, too, but we will instead look at how the module system can help us achieve some of these goals. In any case you should learn about OCaml's module language before continuing to the object system.

As mentioned we have a class hierarchy consisting of a base class "student" and two subclasses "graduate student" and "undergraduate student". We will use the module feature to include another module to model this:

(** Same grade type as before *)
type grade = Pass | No_pass

(** For printing later on *)
let string_of_grade = function 
 | Some Pass -> "Pass"
 | Some No_pass -> "No pass"
 | None -> "Not graded"
 
module Student = struct
 type t = {   (* Naming the main type of the module 't' is conventional *)
  name : string;
  tests : int list;
  course_grade : grade option;
 }

 (**  Makes a student
  @param name string
  @return t with name, no tests and no course_grade *)
 let make name = {
  name = name;
  tests = [];
  course_grade = None
 }

 (** Returns a string representation of a student *)
 let to_string s =
  Printf.sprintf "Student: %s; number of tests made: %d; course grade: %s"
   s.name
   (List.length s.tests)
   (string_of_grade s.course_grade)
end

The "student" type is exactly like in the last blog entry, but with the function field removed that computed the grade. We have a "make" function to make an instance of a student ("new" is a reserved word used by the object system), and even a small "to_string" function that does what's exptected from it. So how can we "extend" this type for customized computation (compute the grade)?

Simple. We make a new module and include the old in it:

module GraduateStudent = struct
 include Student
 let compute_course_grade s =
  let tests = s.tests in
  if List.length tests > 0 then
   begin
    let total = List.fold_left (+) 0 tests in
    let grade = match total / List.length tests with
     | q when q >= 80 -> Pass
     | _ -> No_pass
    in
    {s with course_grade = Some grade}
   end
  else
   s
end

module UndergraduateStudent = struct
 include Student
 let compute_course_grade s =
  let tests = s.tests in
  if List.length tests > 0 then
   begin
    let total = List.fold_left (+) 0 tests in
    let grade = match total / List.length tests with
     | q when q >= 70 -> Pass
     | _ -> No_pass
    in
    {s with course_grade = Some grade}
   end
  else
   s
end

Can we say that GraduateStudent "inherited" Student? Yes and no. Some code is reused. E.g., you don't have to recode the make- or to_string-function from the first module. On the other hand there's no subtyping going on here; the type system only knows that the type t from the first module is in the two later modules. For example, if you type

let student1 = GraduateStudent.make "Ronald" and
    student2 = UndergraduateStudent.make "John";;
[student1; student2;];;  (* type = GraduateStudent.t list *)

in the toplevel, the list will have type GraduateStudent.t, disregarding UndergraduateStudent. How funny is that? This means trouble: there's nothing that prevents us from computing the course grade for a undergraduate student using a graduate student type!

let student1 = GraduateStudent.make "Ronald";;
UndergraduateStudent.compute_course_grade student1;;  (* This is not what we want! *)

A way to go around this is to "tag" the types with variants, like this:

type student_type = Graduate Student.student | Undergraduate Student.student

But you still have to keep the bookkeeping yourself; the type system won't help you from mixing things up. So how do we tell the compiler that the graduate and undergraduate students are in fact two different types, and still keep the code reuse? The answer is private type abbriviations. This is a somewhat advanced feature of the module language, so hold on to your hats.

OCaml will make as much as the structure "public" as possible, which, in our case where we're not using signatures, is everything. That's why the type inference system "knows" that GraduateStudent.t is in fact the same type as UndergraduateStudent.t. The way to limit this is with a module signature, which you might already know. The signature is pretty much the interface of the module; in it you can define what will be visible from the outside. For example, we might want to keep the definition of the student for ourselves, so that client code don't pattern match on it and creates students without our admirable make-function:

module SecretStudent : sig 
 type t    (* Only show the name of the type in the signature *)
 val make : string -> t  (* We want this to be shown, too! *)
 val to_string : t -> string (* As this one *)
end = struct
 type t = {   (* Here we declare the type fully *)
  name : string;
  tests : int list;
  course_grade : grade option;
 }

 let make name = {   (* Everything else as usual *)
  name = name;
  tests = [];
  course_grade = None
 }

 (** Returns a string representation of a student *)
 let to_string s = Printf.sprintf "Name: %s, grade: %s" s.name (string_of_grade s.course_grade)
end

As exptected, printing this in the top level and then making a student shows that the student is abstract:

open SecretStudent;;
let s = make "Thorald Dickinson";;
print_endline (to_string s) (* This will work *)
print_endline s.name;;   (* Error: Unbound record field label name *)

Obviously, the type system doesn't allow this. Everything in perfect order.

Returning to our original problem, we will redefine the student type in the module signature, adding the private keyword. Using private is the middle-ground between fully including t from Student, and making it abstract (not visible outside the module). It allows pattern matching, but makes sure the student types in the two modules are treated as different types. Since we're defining our own signatures, we must include the signature of the student module into our new module, using the language construct include module type of Student, where "module type" means the signature of "Student". One more thing: To allow type coersion (explained below) We will explicitly denote that the type t in the included signature should not simply be copy-pasted, but reused. This is done with with type t = Student.t

module PrivateGradStudent : sig
 include module type of Student with type t = Student.t (* Include sig of Student *)
 type t' = private t  (* Private type abbreviation *)
 val make : string -> t'  (* Redifine make to produce t', not t *)
 val to_string : t' -> string (* t' here too *)
 val compute_course_grade : t' -> t' (* and here *)
end = struct
 include Student  (* Include struct of Student *)
 type t' = t   (* t' = Student.t *)
 let compute_course_grade s = (* As before *)
  let tests = s.tests in
  if List.length tests > 0 then
   begin
    let tests = s.tests in
    let total = List.fold_left (+) 0 tests in
    let grade = match total / List.length tests with
     | q when q >= 80 -> Pass
     | _ -> No_pass
    in
    {s with course_grade = Some grade}
   end
  else
   s
end

module PrivateUndergradStudent : sig (* As above *)
 include module type of Student with type t = Student.t
 type t' = private t
 val make : string -> t'
 val to_string : t' -> string
 val compute_course_grade : t' -> t'
end = struct
 include Student
 type t' = t
 let compute_course_grade s =
  let tests = s.tests in
  if List.length tests > 0 then
   begin
    let total = List.fold_left (+) 0 tests in
    let grade = match total / List.length tests with
     | q when q >= 70 -> Pass
     | _ -> No_pass
    in
    {s with course_grade = Some grade}
   end
  else
   s
end

Now constructing two students and putting them in a list raises an error:

let s1 = PrivateGradStudent.make "Foo";;
let s2 = PrivateUndergradStudent.make "Bar";;
[s1; s2];; (* <-- Error: Not the same types *)

It's not possible to compute the grade wrong, either:

PrivateGradStudent.compute_course_grade s2;; (* <-- Error: Wrong type again *)

As before, we can tag the types to include them in one list:

type student_types = 
 | Grad of PrivateGradStudent.student'
 | Undergrad of PrivateUndergradStudent.student'
 
let l = [Grad s1; Undergrad s2];;

To print them we can do something like:

let print_list l = List.iter (fun s -> print_endline s) l;;
let sl = List.map (fun s -> 
 match s with 
  | Grad s -> PrivateGradStudent.to_string s
  | Undergrad s -> PrivateUndergradStudent.to_string s
 ) 
 l;;
print_list sl;;

If we know we won't do anything else than printing these students, there's another trick in the OCaml hat: coersing (or type casting). We can upcast the private t' to its supertype t, like this:

let s1' = (s1 :> Student.t);; (* Coercion *)
let s2' = (s2 :> Student.t);;
let l = [s1'; s2'];; (* This is ok now *)
let sl = List.map (fun s -> Student.to_string s) l;;
print_list sl;;
PrivateGradStudent.compute_course_grade s1';; (* Won't work, s1' is of wrong type! *)

Once you have coerced a value to its supertype, you can't coerce or cast it "back" like you could do in C/Java. Why? Because it's unsafe as hell, that's why. Type safety is honorary in OCaml.

What have we achieved so far? We have added additional behaviour to a base module by including it, and made sure we can coerce an instance of it back to its supertype when we're done with all the specialized work. One inevitably wonders - can I add additional type information this way too? No. You can't modify a defined record type afterwords. It's not "open" in that sence. If you want that, you'd have to use the object system in OCaml. Or make a new type, possibly in the new module, though this more resembles aggregation (have-a) than inheritance (is-a).

Another shortcoming - in an OO sence of reasoning - is the lack of "protected" members. E.g., one might want to factor out the point limit from the two compute_course_grade functions (80 and 70), and put an auxiliary function in the Student module, hidden from outside but visible from modules which include it. This is not possible. Functions are either completely hidden or visible for everyone.

Same thing, but with functors

A functor - in OCaml phrase book - is "a module parameterized with another module", or "a function from module to module". Basically, it's a way to smash two (or more) implementations together, just like the include command. The difference is, with a functor, you will know beforehand what signature the other module will have. This allows for some interesting concepts. Check out this easy example, where the functor Square assumes a number module which could be int, float, complex or whatever we choose:

module type NUMBER = sig  (* Module types (signatures) are conventionally written with capitals *)
 type t    (* This could be anything *)
 val times : t -> t -> t  (* Multiplication *)
end

module Square = 
 functor (N : NUMBER) ->  (* Functor accepts a NUMBER module as input *)
 struct 
  type t = {
   width : N.t; (* Instance of a number *)
   height : N.t;
  }

  let area t = N.times t.width t.height (* Using the times function from N *)
 end

The module that is fed to the functor can be either named or anonymous, like below:

module IntSquare = Square(struct  (* Anonymous struct *)
 type t = int 
 let times i j = i * j 
end) 
module Float = struct    (* Defining a module *)
 type t = float 
 let times i j = i *. j 
end
module FloatSquare = Square(Float)  (* Named module *)

Short example of usage:

open FloatSquare;;
let square = {width = 10.; height = 5.5};;
area square;; (* Returns 55. *)

In the world of students, we want a student functor that accepts another module with a grade computation function. We will customize the student module with another module that carries the special computations.

First, define the computation module signature:

module type GRADE_COMPUTATION = sig
 val compute_course_grade : int list -> grade option (* int list = test results *)
end

Next our functor:

module StudentFunctor = 
 functor (G : GRADE_COMPUTATION) -> struct
  type t = {  (* Same as before *)
   name : string;
   tests : int list;
   course_grade : grade option;
  }
  
  let make name = {name = name; tests = []; course_grade = None}

  let to_string s = 
   Printf.sprintf "Name: %s; grade: %s" s.name (string_of_grade s.course_grade)

  let compute_course_grade s = {s with course_grade = G.compute_course_grade s.tests}
 end

Nothing surprising here, right? Notice how compute_course_grade are using G.compute_course_grade.

I will save us some typing and define a help function for our next example.

let compute_aux limit tests =
 if List.length tests > 0 then
  begin
   let total = List.fold_left (+) 0 tests in
   let grade = match total / List.length tests with
    | q when q >= limit -> Pass  (* limit will be 80 and 70 *)
    | _ -> No_pass
   in
   Some grade
  end
 else
  None

Now we can create our two student types like this:

module GradStudent = StudentFunctor(struct
 let compute_course_grade tests = compute_aux 80 tests
end)

module UndergradStudent = StudentFunctor(struct
 let compute_course_grade tests = compute_aux 70 tests
end)

Easy as one, two, three!

As usual we can make new students:

let s = GradStudent.make "Foo";;
let t = UndergradStudent.make "Bar";;

Printing and computing course grade is exactly the same as above. Note that type coersion is not enabled, you'd have to do some signature trickery to do that.

Ok, that's it. In my next blog, I might cover the actual object system in OCaml and give some brief explanation of the theories behind it. Thanks for reading!

söndag 30 juni 2013

Alternatives to inheritance

I don't know if there's too few OCaml tutorials, or too many (like there ever could be), but when I myself picked up the language, I felt there was good basic tutorials, but then suddenly a big leap towards compiler design, types witnesses, GADTs and what not. So I'll try to fill in the gaps between PhD stuff and basic syntax, and hopyfully teach you something about idiomatic OCaml. Especially, if you're from a Java/C# background, you might be used to think in a very much OOP perspective. In OCaml, you'll primarily use records, variants and modules to solve problems (if you haven't picked up basic language features, check here for a tutorial). The module system in OCaml is in fact both very powerful and very interesting, so it's worth making an effort to learn. Of course there's a fascinating object system in OCaml, too, though I have not used it once in my production code so far.

Caveat: I'm no expert in either Java nor OCaml. Hopyfully, with a little help from the friendly people on #ocaml IRC and stackoverflow, I'll still be able to present good content.

NB: If you want to try the OCaml code in the OCaml toplevel, make sure to write ";;" after each expression.

Alternatives to subtyping

The first example is taken from An introduction to object-oriented programming with Java by C. Thomas Wu. It's ment to demonstrate subtyping and OO polymorphism (not to be confused with OCaml's parametric polymorphism). I'll present a number of ways to solve the problem, and discuss pros and cons among them.

Ok, let's start! Let's check out some Java code first. Below is a superclass Student with two subclasses: GraduateStudent and UndergraduateStudent. The subclasses extends the superclass with a new function: computeCourseGrade, which calculates the mean value of all tests the student has made, and checks if this value is above a certain limit, different for the two student types. If it is, the student grade is marked as "Pass"; otherwise it's marked as "No pass".

1. Java implementation

class Student {
   protected final static int NUM_OF_TESTS = 3;

   protected String name;
   protected int[] tests;
   protected String courseGrade;

}

class GraduateStudent extends Student {

   public void computeCourseGrade() {
      int total = 0;
      for (int i = 0; i < NUM_OF_TESTS; i++) {
         total += tests[i];
      }

      if (total/NUM_OF_TESTS <= 80) {
         courseGrade = "Pass";
      }
      else {
         courseGrade = "No Pass";
      }

   }

}

class UndergraduateStudent extends Student {

   public void computeCourseGrade() {
      int total = 0;
      for (int i = 0; i < NUM_OF_TESTS; i++) {
         total += tests[i];
      }

      if (total/NUM_OF_TESTS <= 70) {
         courseGrade = "Pass";
      }
      else {
         courseGrade = "No Pass";
      }

   }

}


So, what can we say about this implementation? First of all, there's a lot of code duplication: the computeCourseGrade functions are identical except the point limit (80 respectively 70). Maybe we could put the method in the baseclass, and define the point limit as an argument to the constructor? Well, we could, but then we wouldn't have any guarantee at the typelevel that there is in fact two, and only two, types of students. Another alternative is to put the method in the baseclass, declare the class as abstract, and let the subclasses define the point limit in the constructor. That could fix our problems. Let's also fix an enum for the grade instead of that horrible string.

abstract class Student {

   protected final static int NUM_OF_TESTS = 3;

   enum Grade {
      Pass,
      No_pass,
      Not_graded
   }

   protected String name;
   protected int[] tests;
   protected Grade courseGrade;
   protected int pointLimit;

   public void computeCourseGrade() {
      int total = 0;
      for (int i = 0; i < NUM_OF_TESTS; i++) {
         total += tests[i];
      }

      if (total/NUM_OF_TESTS <= pointLimit) {
         courseGrade = Grade.Pass;
      }
      else {
         courseGrade = Grade.No_pass;
      }

   }
}

class GraduateStudent extends Student {
   GraduateStudent (String studentName) {
      name = studentName;
      tests = new int[NUM_OF_TESTS];
      courseGrade = Grade.Not_graded;
      pointLimit = 80;
   }
}

class UndergraduateStudent extends Student {
   UndergraduateStudent (String studentName) {
      name = studentName;
      tests = new int[NUM_OF_TESTS];
      courseGrade = Grade.Not_graded;
      pointLimit = 70;
   }
}


There. Code duplication removed. With our classes at place we can now do stuff like:

roster[0] = new GraduateStudent("Foo");
roster[1] = new UndergraduateStudent("Ronald");
...


And to calculate the grades for all students:

for (int i = 0; i < numberOfStudents; i++) {
 roster[i].computeCourseGrade();
}


Let's transform this into wonderful and perfectly idiomatic OCaml code, shall we?

2. OCaml implementaion 1

Our first try to encode this pattern in OCaml will be a simple one, using only records. Instead of making subtypes - which can be done with the module system (ok, that's not 100% true, but som features of inheritance can be done, e.g including types and functions from another module) or the object system - we will simply pass the grade computation along the other data, as a function field in the record.

First, we use an algebraic datatype (ADT) for the grade (there's no enum construct in OCaml, but instead we use ADTs):

type grade = Pass | No_pass


This is a more typesafe approach than a string, which will assure us that grade is either Pass or No_pass - nothing else. Client code that use this type definition can't mess up, e.g. by mistake writing "Pas" or "no_pass".

Next is our student record type:

type student = {
   name : string;
   tests : int list;
   grade : grade option;
   compute_grade : student -> grade
}


The name field explains itself. Instead of int array for tests, we will use the for OCaml more natural int list. Grade is marked as option so grade will be either "Some grade" or "None", excluding the Not_graded we saw in the second Java example from the grade type definition. The last field, compute_grade, is the type of a function that takes a student as argument and produces a grade. The compute_grade function for graduate student will then look something like this:

(**
 Computes grading for a graduate student.

 @param s student record
 @return grade
*)
let graded_compute s = 
   let tests = s.tests in
   let total = List.fold_left (+) 0 tests in 
   match total / List.length tests with
      | q when q >= 80 -> Pass
      | _ -> No_pass


Ok, let's analyze this. Line 8 is only for convenience. The idiomatic way to sum an int list in OCaml is not using a for-loop, but a fold, so what line 9 is saying is: "sum all the elements in the list tests using function (+), starting from 0". It's possible to concatenate string lists - or indeed any kind of list - in the same way, using function (^) (for strings), starting from empty string "", etc.

Next thing that happens is the match statement - ML's switch-statement on steroids - to branch on the quota. If the quota is equal to or higher than 80, the student passed. Yey! Otherwise, the function returns No_pass. Since this is the last expression in the function, it means this is what the function returns. Notice also how we get the length from the list "tests": "List.length tests". Since "tests" is not an object, we do not write "tests.length" as in Java or Javascript, but instead call the length function from the List module, "List.length", and feed it with a list to get the result.

The undergraduate computation will of course be similar:

(**
 Computes grading for an undergraduate student.

 @param s student record
 @return grade
*)
let undergraded_compute s = 
   let tests = s.tests in
   let total = List.fold_left (+) 0 tests in 
   match total / List.length tests with
      | q when q >= 70 -> Pass
      | _ -> No_pass


Again, the only difference is the limit: 80 and 70. As in the first Java implementation, we have a lot of code duplication. Can't we factor out the point limit as an argument? Yes, we can, and better yet, we can use partial application to redefine graded_compute and undergraded_compute:

let compute_aux limit s = 
   let tests = s.tests in
   let total = List.fold_left (+) 0 tests in 
   match total / List.length tests with
      | q when q >= limit -> Pass
      | _ -> No_pass

let graded_compute = compute_aux 80
let undergraded_compute = compute_aux 70


Both graded_compute and undergraded_compute will now have type "student -> grade", as required by the type student. Neat, huh? Although, and again analogous to the first Java example, we won't have any guarantee or type information that a specific student record carries with it this or that grade computation; only that it is A computation.

In any way, we'll create our student "objects" like this:

let student1 = { 
   name = "Foo Barsson";
   tests = [78; 70; 92];
   course_grade = None;
   compute_grade = graded_compute;
}

let student2 = {
   name = "Ronald Tolkien";
   tests = [65; 32];
   course_grade = None;
   compute_grade = ungraded_compute;
}

let roster = [
   student1;
   student2;
]


Both student1 and student2 are ungraded. To create a list with graded student we use the map function on lists:

let graded_roster =  List.map (fun s -> 
   {s with course_grade = 
      Some (s.compute_grade s)
   }) roster


That's a mouthful! What's happening here is that we're creating a new list from the old one. The anonymous function tells us how each element is modified. In this case, the function returns a new student record "s", identical to the old, but with a course grade computed from the function within the record (s.compute_grade s).

One apparent difference between the Java and OCaml version is the use of state in Java, while in OCaml we instead create a new record instance for each change (we could of course use the mutable keyword to make a changeable record field). I can highly recommend the discussion of state and the cost of imperative programming in chapter three of "The structure and interpretation of computer programs". There's a lot on this topic on Google and Stackoverflow, too. Take a look (e.g. "state vs immutability")!

Next blog entry will look at modules and how modules can be used to replace part of an object-oriented approach.

Regards
Olle