1.32k likes | 1.49k Views
CS51: Abstract Data Types & OCaml Modules. Greg Morrisett. The reality of development. We rarely know the right algorithms or the right data structures when we start a design project.
E N D
CS51: Abstract Data Types & OCamlModules Greg Morrisett
The reality of development • We rarely know the right algorithms or the right data structureswhen we start a design project. • For Moogle, what data structures and algorithms should you use to build the index? To build the query evaluator? • Reality is that we often have to go back and change our code, once we’ve built a prototype. • Often, we don’t even know what the user wants (requirements) until they see a prototype. • Often, we don’t know where the performance problems are until we can run the software on realistic test cases.
Engineering for change We know the software will change, do how can we write the code so that the changes will be easier?
Engineering for change • We know the software will change, do how can we write the code so that the changes will be easier? • One trick: use a high-level language for the prototype. • languages like Python, Perl, Ruby, Ocaml, and Haskell make it relatively easy to throw together a prototype. • rule of thumb: 1 line of these modern high-level languages is worth about 10 lines of C/C++. • many details, like memory management, are handled automatically. • you may get lucky: the performance might be good enough that you never have to drop down to a lower-level. • more importantly, design exploration is easier on small pieces of code.
Engineering for change • We know the software will change, do how can we write the code so that the changes will be easier? • Another trick: steal code. • don’t develop your own libraries if you don’t have to. • e.g., string matching library.
Engineering for change We know the software will change, do how can we write the code so that the changes will be easier? Yet another trick: use data and algorithm abstraction.
Engineering for change • We know the software will change, do how can we write the code so that the changes will be easier? • Yet another trick: use data and algorithm abstraction. • Don’t code in terms of built-in concrete representations. • Design and code with high-level abstractions in mind that fit the problem domain. • Implement the abstractions using a well-defined interface. • This way, you can swap in different implementations for the abstractions. • You can also parallelize the development process.
Example For Moogle, one goal is to build a scalable dictionary (a.k.a. index) that maps words to the set of URLs for the pages on which the words appear. We want the index so that we can efficiently satisfy queries (e.g., all links to pages that contain “Greg” and “Victor”.) Wrong way to think about this: • Aha! A list of pairs of a word and a list of URLs. • We can look up “Greg” and “Victor” in the list to get back a list of URLs.
Moogle example type query = Word of string | And of query * query | Or of query * query let rec eval (q: query) (h: (string * url list) list) : url list = match qwith | Word x -> let (_,urls) = List.find (fun (w,_) -> w = x) in urls | And (q1,q2) -> merge_lists (eval q1 h) (eval q2 h) | Or (q1,q2) -> (eval q1 h) @ (eval q2 h)
Moogle example merge expects to be passed sorted lists. type query = Word of string | And of query * query | Or of query * query let rec eval (q: query) (h: (string * url list) list) : url list = match q with | Word x -> let (_,urls) = List.find (fun (w,_) -> w = x) in urls | And (q1,q2) -> merge_lists (eval q1 h) (eval q2 h) | Or (q1,q2) -> (eval q1 h) @ (eval q2 h)
Moogle example merge expects to be passed sorted lists. Oops! type query = Word of string | And of query * query | Or of query * query let rec eval (q: query) (h: (string * url list) list) : url list = match q with | Word x -> let (_,urls) = List.find (fun (w,_) -> w = x) in urls | And (q1,q2) -> merge_lists (eval q1 h) (eval q2 h) | Or (q1,q2) -> (eval q1 h) @ (eval q2 h)
Moogleexample I find out there’s a better hash-table implementation type query = Word of string | And of query * query | Or of query * query let rec eval (q: query)(h: (string, urllist) hashtable) : url list = match qwith | Word x -> let bucket= Array.get(hash_string x) h in let (_,urls) = List.find x bucket in urls | And (q1,q2) -> merge_lists(eval q1 h) (eval q2 h) | Or (q1,q2) -> (eval q1 h) @ (eval q2 h)
A better way type query = Word of string | And of query * query | Or of query * query let rec eval (q: query) (d: (string, url_set) dictionary) : url_set = match qwith | Word x -> Dict.lookupdx | And (q1,q2) -> Set.intersect (eval q1 d) (eval q2 d) | Or (q1,q2) -> Set.union (eval q1 d) (eval q2 d)
A better way The problem domain talked about an abstract type of dictionaries and sets of URLs. type query = Word of string | And of query * query | Or of query * query let rec eval (q: query) (d: (string, url_set) dictionary) : url_set = match qwith | Word x -> Dict.lookupdx | And (q1,q2) -> Set.intersect (eval q1 d) (eval q2 d) | Or (q1,q2) -> Set.union (eval q1 d) (eval q2 d)
A better way Once we’ve written the client, we know what operations we need on these abstract types. type query = Word of string | And of query * query | Or of query * query let rec eval (q: query) (d: (string, url_set) dictionary) : url_set = match q with | Word x -> Dict.lookup d x | And (q1,q2) -> Set.intersect (eval q1 d) (eval q2 d) | Or (q1,q2) -> Set.union (eval q1 d) (eval q2 d)
A better way So we can define an interface, and send our partner off to implement the abstract types dictionary and set. type query = Word of string | And of query * query | Or of query * query let rec eval (q: query) (d: (string, url_set) dictionary) : url_set = match q with | Word x -> Dict.lookup d x | And (q1,q2) -> Set.intersect (eval q1 d) (eval q2 d) | Or (q1,q2) -> Set.union (eval q1 d) (eval q2 d)
A better way Later on, when we find out linked lists aren’t so good for sets, we can replace them with balanced trees. type query = Word of string | And of query * query | Or of query * query let rec eval (q: query) (d: (string, url_set) dictionary) : url_set = match q with | Word x -> Dict.lookup d x | And (q1,q2) -> Set.intersect (eval q1 d) (eval q2 d) | Or (q1,q2) -> Set.union (eval q1 d) (eval q2 d)
Building abstract types in OCaml • We can use the module system of OCamlto build new abstract data types. • signature: an interface. • specifies the abstract type(s) without specifying their implementation • specifies the set of operations on the abstract types • structure: an implementation. • a collection of type and value definitions • notion of an implementation matching or satisfying an interface • gives rise to a notion of sub-typing • functor: a parameterized module • really, a function from modules to modules • allows us to factor out and re-use modules
Example signature module type INT_STACK = sig type stack valempty : unit -> stack valpush : int -> stack -> stack valis_empty : stack -> bool exceptionEmptyStack valpop : stack -> stack valtop : stack -> int end
Example signature empty and push are abstract constructors: functions that build our abstract type. module type INT_STACK = sig type stack valempty : unit -> stack valpush : int -> stack -> stack valis_empty : stack -> bool exceptionEmptyStack valpop : stack -> stack valtop : stack -> int end
Example signature is_empty is an observer – useful for determining properties of the ADT. module type INT_STACK = sig type stack valempty : unit -> stack valpush : int -> stack -> stack valis_empty : stack -> bool exceptionEmptyStack valpop : stack -> stack valtop : stack -> int end
Example signature pop is sometimes called a “mutator” (though it doesn’t really change the input) module type INT_STACK = sig type stack valempty : unit -> stack valpush : int -> stack -> stack valis_empty : stack -> bool exceptionEmptyStack valpop : stack -> stack valtop : stack -> int end
Example signature top is also an observer, in this functional setting since it doesn’t change the stack. module type INT_STACK = sig type stack valempty : unit -> stack valpush : int -> stack -> stack valis_empty : stack -> bool exceptionEmptyStack valpop : stack -> stack valtop : stack -> int end
A better signature module type INT_STACK = sig type stack (* create an empty stack *) valempty : unit -> stack (* push an element on the top of the stack *) valpush : int -> stack -> stack (* returns true iff the stack is empty *) valis_empty : stack -> bool (* raised on pop or top of empty stack *) exceptionEmptyStack (* pops top element and returns the rest *) valpop : stack -> stack (* returns the top element of the stack *) valtop : stack -> int end
Example structure module ListIntStack : INT_STACK = struct type stack = int list exceptionEmptyStack let empty () : stack = [] let push (i:int) (s:stack) = i::s let is_empty (s:stack) = match s with | [] -> true | _::_ -> false let pop (s:stack) = match s with | [] -> raiseEmptyStack | _::t -> t let top (s:stack) = match s with | [] -> raiseEmptyStack | h::_ -> h end
Example structure Inside the module, we know the concrete type used to implement the abstract type. module ListIntStack : INT_STACK = struct type stack = int list exceptionEmptyStack let empty () : stack = [] let push (i:int) (s:stack) = i::s let is_empty (s:stack) = match s with | [] -> true | _::_ -> false let pop (s:stack) = match s with | [] -> raiseEmptyStack | _::t -> t let top (s:stack) = match s with | [] -> raiseEmptyStack | h::_ -> h end
Example structure But by giving the module the INT_STACK interface, which does not reveal how stacks are being represented, we prevent code outside the module from knowing stacks are lists. module ListIntStack : INT_STACK = struct type stack = int list exceptionEmptyStack let empty () : stack = [] let push (i:int) (s:stack) = i::s let is_empty (s:stack) = match s with | [] -> true | _::_ -> false let pop (s:stack) = match s with | [] -> raiseEmptyStack | _::t -> t let top (s:stack) = match s with | [] -> raiseEmptyStack | h::_ -> h end
An example client Notice that the client is not allowed to know that the stack is a list. module ListIntStack : INT_STACK = struct … end # let s= ListIntStack.empty ();; # let s1 = ListIntStack.push 3 s;; # let s2 = ListIntStack.push 4 s1;; # ListIntStack.tops2 ;; • : int= 4 # open ListIntStack ;; # top (pop s2) ;; • : int= 3 # top (pop (pop s2)) ;; Exception: EmptyStack # List.revs2 ;; Error: This expression has type stack but an expression was expected of type ‘a list.
Example structure Note that when you are debugging, you probably want to comment out the signature ascription so that you can access the contents of the module. module ListIntStack(* : INT_STACK *) = struct type stack = int list exceptionEmptyStack let empty () : stack = [] let push (i:int) (s:stack) = i::s let is_empty (s:stack) = match s with | [] -> true | _::_ -> false let pop (s:stack) = match s with | [] -> raiseEmptyStack | _::t -> t let top (s:stack) = match s with | [] -> raiseEmptyStack | h::_ -> h end
The client without the signature If we don’t seal the module with a signature, the client can know that stacks are lists. module ListIntStack(* : INT_STACK *) = struct … end # let s= ListIntStack.empty ();; # let s1 = ListIntStack.push 3 s;; # let s2 = ListIntStack.push 4 s1;; # ListIntStack.top s2 ;; • : int = 4 # open ListIntStack ;; # top (pop s2) ;; • : int = 3 # top (pop (pop s2)) ;; Exception: EmptyStack # List.rev s2 ;; - : int list = [3; 4]
Example structure When you put the signature on here, you are restricting client access to the information in the signature (which does not reveal that stack = int list.) So clients can only use the stack operations on a stack value (not list operations.) module ListIntStack: INT_STACK = struct type stack = int list exception EmptyStack let empty () : stack = [] let push (i:int) (s:stack) = i::s let is_empty (s:stack) = match swith | [] -> true | _::_ -> false let pop (s:stack) = match swith | [] -> raise EmptyStack | _::t -> t let top (s:stack) = match swith | [] -> raise EmptyStack | h::_ -> h end
A more familiar example moduletype UNSIGNED_BIGNUM = sigtypeubignum valfromInt : int -> ubignum valtoInt : ubignum -> int val plus : ubignum -> ubignum -> ubignum val minus : ubignum -> ubignum -> ubignum val times : ubignum -> ubignum -> ubignum … end
A simple (but slow) implementation module BIGNUM_UNARY : UNSIGNED_BIGNUM = structtypeubignum = Zero | Succofubignum let rec utoInt (u:ubignum) : int = match u with | Zero -> 0 | Succ u’ -> 1 + (toInt u’) let rec plus (u1:ubignum) (u2:ubignum) : ubignum = match u1 with | Zero -> u2 | Succ u1’ -> uplus u1’ (Succ u2) … end
The Abstraction Barrier • It is tempting to break through the abstraction barrier. • I know a bignum is either Zero or Succ • I know a stack is a list of elements • Why don’t we want to reveal this to clients?
A faster implementation module My_UBignum_1000 : UNSIGNED_BIGNUM = struct let base = 1000 typeubignum = int list lettoInt(b:ubignum):int = … let plus(b1:ubignum)(b2:ubignum):ubignum = … letminus(b1:ubignum)(b2:ubignum):ubignum= … let times(b1:ubignum)(b2:ubignum):ubignum= … … end
The Abstraction Barrier • It is tempting to break through the abstraction barrier. • I know a bignum is either Zero or Succ • I know a stack is a list of elements • Why don’t we want to reveal this to clients? • Two things: • might want to swap in a more sophisticated version • e.g., replace unary bignums with our more efficient list version • e.g., replace list-based stacks with growable arrays • e.g., replace association list with a balanced tree • A change would break clients • clients might break our internal representation invariants • e.g., might pass a list of coefficients with leading zeros • e.g., might pass in a zero bignum with the neg flag set • e.g., might pass in coefficients that are larger than the base • Forces us to code defensively
The Abstraction Barrier Rule of thumb: try to use the language mechanisms (e.g., modules, interfaces, etc.) to enforce the abstraction barrier. • reveal as little information about how something is implemented as you can. • provides maximum flexibility for change moving forward. • pays off down the line However, like all design rules, we must be able to recognize when the barrier is causing more trouble than it’s worth and abandon it. • may want to reveal more information for debugging purposes • e.g., may forget to put a function in the interface to print out bignums. • (usually means you have a poor interface…)
Another example: Queues or FIFOs module type QUEUE = sig type ‘a queue valempty : unit -> ‘a queue valenqueue : ‘a -> ‘a queue -> ‘a queue valis_empty : ‘a queue -> bool exception EmptyQueue valdequeue : ‘a queue -> ‘a queue valfront : ‘a queue -> ‘a end
Another example: Queues or FIFOs These queues are re-usable for different element types. module type QUEUE = sig type ‘a queue valempty : unit -> ‘a queue valenqueue : ‘a -> ‘a queue -> ‘a queue valis_empty : ‘a queue -> bool exception EmptyQueue valdequeue : ‘a queue -> ‘a queue valfront : ‘a queue -> ‘a end
Another example: Queues or FIFOs So we parameterize the queue type by a type variable representing the element type (‘a). module type QUEUE = sig type ‘a queue valempty : unit -> ‘a queue valenqueue : ‘a -> ‘a queue -> ‘a queue valis_empty : ‘a queue -> bool exception EmptyQueue valdequeue : ‘a queue -> ‘a queue valfront : ‘a queue -> ‘a end
One implementation module AppendListQueue : QUEUE = struct type ‘a queue = ‘a list let empty () = [] let enqueue (x:’a) (q:’a queue) : ‘a queue = q @ [x] let is_empty (q:’a queue) = match qwith | [] -> true | _::_ -> false exception EmptyQueue let deq (q:’a queue) : (‘a * ‘a queue) = match qwith | [] -> raise EmptyQueue | h::t -> (h,t) let dequeue (q:’a queue) : ‘a queue = snd (deq q) let front (q:’a queue) : ‘a = fst (deq q) end
One implementation Notice deq is a helper function that doesn’t show up in the signature. module AppendListQueue : QUEUE = struct type ‘a queue = ‘a list let empty () = [] let enqueue (x:’a) (q:’a queue) : ‘a queue = q @ [x] let is_empty (q:’a queue) = match q with | [] -> true | _::_ -> false exception EmptyQueue let deq (q:’a queue) : (‘a * ‘a queue) = match q with | [] -> raise EmptyQueue | h::t -> (h,t) let dequeue (q:’a queue) : ‘a queue = snd (deq q) let front (q:’a queue) : ‘a = fst (deq q) end
One implementation That means it is not accessible outside the module. module AppendListQueue : QUEUE = struct type ‘a queue = ‘a list let empty () = [] let enqueue (x:’a) (q:’a queue) : ‘a queue = q @ [x] let is_empty (q:’a queue) = match q with | [] -> true | _::_ -> false exception EmptyQueue let deq (q:’a queue) : (‘a * ‘a queue) = match q with | [] -> raise EmptyQueue | h::t -> (h,t) let dequeue (q:’a queue) : ‘a queue = snd (deq q) let front (q:’a queue) : ‘a = fst (deq q) end
One implementation Notice also that deq runs in constant time. module AppendListQueue : QUEUE = struct type ‘a queue = ‘a list let empty () = [] let enqueue (x:’a) (q:’a queue) : ‘a queue = q @ [x] let is_empty (q:’a queue) = match q with | [] -> true | _::_ -> false exception EmptyQueue let deq (q:’a queue) : (‘a * ‘a queue) = match q with | [] -> raise EmptyQueue | h::t -> (h,t) let dequeue (q:’a queue) : ‘a queue = snd (deq q) let front (q:’a queue) : ‘a = fst (deq q) end
One implementation Whereas append takes time proportional to the length of the queue module AppendListQueue : QUEUE = struct type ‘a queue = ‘a list let empty () = [] let enqueue (x:’a) (q:’a queue) : ‘a queue = q @ [x] let is_empty (q:’a queue) = match q with | [] -> true | _::_ -> false exception EmptyQueue let deq (q:’a queue) : (‘a * ‘a queue) = match q with | [] -> raise EmptyQueue | h::t -> (h,t) let dequeue (q:’a queue) : ‘a queue = snd (deq q) let front (q:’a queue) : ‘a = fst (deq q) end
An alternate implementation module DoubleListQueue : QUEUE = struct type ‘a queue = {front: ’a list; rear: ’a list} let empty () = {front = []; rear = []} let enqueue x q = {front = q.front; rear= x::q.rear} let is_emptyq = match q.front, q.rearwith | [], [] -> true | _, _ -> false exception EmptyQueue let deq (q:’a queue) : ‘a * ‘a queue = match q.frontwith | h::t -> (h, {front=t; rear=q.rear}) | [] -> match List.revq.rearwith | h::t -> (h, {front=t; rear=[]}) | [] -> raise EmptyQueue let dequeue (q:’a queue) : ‘a queue = snd(deqq) let front (q:’a queue) : ‘a = fst(deqq) end
An alternate implementation The worst-case running time for deq is proportional to the length of the rear list. module DoubleListQueue : QUEUE = struct type ‘a queue = {front:’a list; rear:’a list} let empty() = {front=[]; rear=[]} let enqueuexq = {front=q.front; rear=x::q.rear} let is_emptyq = match q.front, q.rearwith | [], [] -> true | _, _ -> false exception EmptyQueue let deq (q: ’a queue) : ‘a * ‘a queue = match q.frontwith | h::t -> (h, {front=t; rear=q.rear}) | [] -> match List.revq.rearwith | h::t -> (h, {front=t; rear=[]}) | [] -> raise EmptyQueue let dequeue (q: ’a queue) : ‘a queue = snd (deq q) let front (q: ’a queue) : ‘a = fst (deqq1) end
An alternate implementation But if we do a series of enqueues and dequeues, then each element will be flipped once from the rear to the front. module DoubleListQueue : QUEUE = struct type ‘a queue = {front:’a list; rear:’a list} let empty() = {front=[]; rear=[]} let enqueuexq = {front=q.front; rear=x::q.rear} let is_emptyq = match q.front, q.rearwith | [], [] -> true | _, _ -> false exception EmptyQueue let deq (q:’a queue) : ‘a * ‘a queue = match q.frontwith | h::t -> (h, {front=t; rear=q.rear}) | [] -> match List.revq.rearwith | h::t -> (h, {front=t; rear=[]}) | [] -> raise EmptyQueue let dequeue (q:’a queue) : ‘a queue = snd(deqq) let front (q:’a queue) : ‘a = fst(deqq) end
An alternate implementation So the amortized cost of processing each element is constant time. module DoubleListQueue : QUEUE = struct type ‘a queue = {front:’a list; rear:’a list} let empty() = {front=[]; rear=[]} let enqueuexq = {front=q.front; rear=x::q.rear} let is_emptyq = match q.front, q.rearwith | [], [] -> true | _, _ -> false exception EmptyQueue let deq (q:’a queue) : ‘a * ‘a queue = match q.frontwith | h::t -> (h, {front=t; rear=q.rear}) | [] -> match List.revq.rearwith | h::t -> (h, {front=t; rear=[]}) | [] -> raise EmptyQueue let dequeue (q:’a queue) : ‘a queue = snd(deqq) let front (q:’a queue) : ‘a = fst(deqq) end
In pictures let q0 = empty ();; {front=[];rear=[]} let q1 = enqueue 3 q0;; {front=[];rear=[3]} let q2 = enqueue 4 q1 ;; {front=[];rear=[4;3]} let q3 = enqueue 5 q2 ;; {front=[];rear=[5;4;3]} let q4 = dequeue q3 ;; {front=[4;5];rear=[]} let q5 = dequeue q4 ;; {front=[5];rear=[]} let q6 = enqueue 6 q5 ;; {front=[5];rear=[6]} let q7 = enqueue 7 q6 ;; {front=[5];rear=[7;6]}