Scrap your boilerplate: generic programming in Haskell
Ralf Lämmel, Vrije University
Simon Peyton Jones, Microsoft Research
The problem: boilerplate code
[Figure: an example Company tree with departments “Research”, “Production”, “Devt” and “Manuf”, their managers, and employees such as “Bill” (£15k) and “Fred” (£10k).]
Task: find all people in the tree and increase their salary by 10%.
The problem: boilerplate code
```haskell
data Company  = C [Dept]
data Dept     = D Name Manager [SubUnit]
data SubUnit  = PU Employee | DU Dept
data Employee = E Person Salary
data Person   = P Name Address
data Salary   = S Float
type Manager  = Employee
type Name     = String
type Address  = String

incSal :: Float -> Company -> Company
```
The problem: boilerplate code
```haskell
-- incSal 1.1 raises every salary by 10%
incSal :: Float -> Company -> Company
incSal k (C ds) = C (map (incD k) ds)

incD :: Float -> Dept -> Dept
incD k (D n m us) = D n (incE k m) (map (incU k) us)

incU :: Float -> SubUnit -> SubUnit
incU k (PU e) = PU (incE k e)   -- rewrap in the PU constructor
incU k (DU d) = DU (incD k d)   -- rewrap in the DU constructor

incE :: Float -> Employee -> Employee
incE k (E p s) = E p (incS k s)

incS :: Float -> Salary -> Salary
incS k (S f) = S (k*f)
```
Boilerplate is bad
• Boilerplate is tedious to write
• Boilerplate is fragile: it must be changed whenever the data type changes (“schema evolution”)
• Boilerplate obscures the key bits of code
Getting rid of boilerplate
• Use an untyped language, with a fixed collection of data types
• Convert to a universal type and write (untyped) traversals over that
• Use “reflection” to query types and traverse child nodes
Getting rid of boilerplate
• Generic (aka polytypic) programming: define a function by induction over the structure of the type of its argument
• PhD required. Elegant only for “totally generic” functions (read, show, equality)
```
generic inc<t> :: Float -> t -> t
inc<1>   k Unit    = Unit
inc<a+b> k (Inl x) = Inl (inc<a> k x)
inc<a+b> k (Inr y) = Inr (inc<b> k y)
inc<a*b> k (x, y)  = (inc<a> k x, inc<b> k y)
```
Our solution
Generic programming for the rest of us:
• Typed language
• Works for arbitrary data types: parameterised, mutually recursive, nested...
• No encoding to/from some other type
• Very modest language support
• Elegant application of Haskell's type classes
Our solution
```haskell
incSal :: Float -> Company -> Company
incSal k = everywhere (mkT (incS k))

incS :: Float -> Salary -> Salary
incS k (S f) = S (k*f)
```
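A minimal runnable sketch of this program, assuming only GHC's base library: `Data.Data` supplies `gmapT` and derived `Data` instances, while `mkT` and `everywhere` are defined locally along the lines of the later slides (the syb package ships its own versions). The sample company `genCom`, with its names, towns and salaries, is hypothetical.

```haskell
{-# LANGUAGE DeriveDataTypeable, RankNTypes #-}
import Data.Data (Data, gmapT)
import Data.Maybe (fromMaybe)
import Data.Typeable (Typeable, cast)

-- The slides' company types; GHC derives the Data (and Typeable) instances.
data Company  = C [Dept]                 deriving (Show, Eq, Data)
data Dept     = D Name Manager [SubUnit] deriving (Show, Eq, Data)
data SubUnit  = PU Employee | DU Dept    deriving (Show, Eq, Data)
data Employee = E Person Salary          deriving (Show, Eq, Data)
data Person   = P Name Address           deriving (Show, Eq, Data)
data Salary   = S Float                  deriving (Show, Eq, Data)
type Manager  = Employee
type Name     = String
type Address  = String

-- Ingredient 1 (local sketch): act like f on f's own type, identity elsewhere.
mkT :: (Typeable a, Typeable b) => (a -> a) -> b -> b
mkT f = fromMaybe id (cast f)

-- Ingredient 2 (local sketch): apply f to every node, children first.
everywhere :: Data a => (forall b. Data b => b -> b) -> a -> a
everywhere f x = f (gmapT (everywhere f) x)

-- The only application-specific code.
incS :: Float -> Salary -> Salary
incS k (S s) = S (k * s)

incSal :: Float -> Company -> Company
incSal k = everywhere (mkT (incS k))

-- Hypothetical sample data.
genCom :: Company
genCom = C [D "Research" (E (P "Bill" "London") (S 15000))
              [ PU (E (P "Fred" "Cambridge") (S 10000))
              , DU (D "Devt" (E (P "Ann" "Oxford") (S 12000)) []) ]]
```

Loaded into GHCi, `incSal 1.1 genCom` raises every salary by 10%, including the one nested inside the sub-department, without any of the hand-written `incD`/`incU`/`incE` functions.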
Two ingredients
```haskell
incSal :: Float -> Company -> Company
incSal k = everywhere (mkT (incS k))

incS :: Float -> Salary -> Salary
incS k (S f) = S (k*f)
```
1. mkT (incS k): build the function to apply to every node, from incS.
2. everywhere: apply that function to every node in the tree.
Type classes
```haskell
member :: a -> [a] -> Bool
member x []     = False
member x (y:ys) | x==y      = True
                | otherwise = member x ys
```
No! member is not truly polymorphic: it does not work for any type a, only for those on which equality is defined.
Type classes
```haskell
member :: Eq a => a -> [a] -> Bool
member x []     = False
member x (y:ys) | x==y      = True
                | otherwise = member x ys
```
The class constraint "Eq a" says that member only works on types that belong to class Eq.
Type classes
```haskell
class Eq a where
  (==) :: a -> a -> Bool

instance Eq Int where
  (==) i1 i2 = eqInt i1 i2

instance (Eq a) => Eq [a] where
  (==) []     []     = True
  (==) (x:xs) (y:ys) = (x == y) && (xs == ys)
  (==) xs     ys     = False

member :: Eq a => a -> [a] -> Bool
member x []     = False
member x (y:ys) | x==y      = True
                | otherwise = member x ys
```
Implementing type classes
```haskell
data Eq a = MkEq (a->a->Bool)
eq (MkEq e) = e

dEqInt :: Eq Int
dEqInt = MkEq eqInt

dEqList :: Eq a -> Eq [a]
dEqList (MkEq e) = MkEq el
  where el []     []     = True
        el (x:xs) (y:ys) = x `e` y && xs `el` ys
        el xs     ys     = False

member :: Eq a -> a -> [a] -> Bool
member d x []     = False
member d x (y:ys) | eq d x y  = True
                  | otherwise = member d x ys
```
• A class is witnessed by a “dictionary” of methods
• Instance declarations create dictionaries
• Overloaded functions take extra dictionary parameter(s)
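The translation above can be tried directly. In this sketch the dictionary type is renamed `EqD` (a hypothetical name) so that it does not clash with the Prelude's `Eq` class, and `dEqInt` borrows the Prelude's `(==)` in place of the primitive `eqInt`.

```haskell
-- A dictionary for "equality on a": just the (==) method.
data EqD a = MkEq (a -> a -> Bool)

-- Extract the method from a dictionary.
eq :: EqD a -> a -> a -> Bool
eq (MkEq e) = e

-- "instance Eq Int": a dictionary value.
dEqInt :: EqD Int
dEqInt = MkEq (==)   -- stands in for the primitive eqInt

-- "instance Eq a => Eq [a]": a function from dictionaries to dictionaries.
dEqList :: EqD a -> EqD [a]
dEqList (MkEq e) = MkEq el
  where
    el []     []     = True
    el (x:xs) (y:ys) = x `e` y && xs `el` ys
    el _      _      = False

-- The overloaded function takes its dictionary as an explicit argument.
member :: EqD a -> a -> [a] -> Bool
member _ _ []     = False
member d x (y:ys)
  | eq d x y  = True
  | otherwise = member d x ys
```

The source-level call `member [1,2] [[3],[1,2]]` corresponds to `member (dEqList dEqInt) [1,2] [[3],[1,2]]`: the dictionary for `[Int]` is built by applying the list-instance function to the `Int` dictionary.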
Ingredient 1: type extension
(mkT f) is a function that
• behaves just like f on arguments whose type matches f's argument type,
• behaves like the identity function on all other arguments.
So applying (mkT (incS k)) to all nodes in the tree will do what we want.
Type safe cast
```haskell
cast :: (Typeable a, Typeable b) => a -> Maybe b
```
```
ghci> (cast 'a') :: Maybe Char
Just 'a'
ghci> (cast 'a') :: Maybe Bool
Nothing
ghci> (cast True) :: Maybe Bool
Just True
```
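Type extension
```haskell
import Data.Typeable (Typeable, cast)   -- GHC's library version of cast

mkT :: (Typeable a, Typeable b) => (a->a) -> (b->b)
mkT f = case cast f of
          Just g  -> g    -- b is the same type as a
          Nothing -> id   -- b is some other type
```
```
ghci> (mkT not) True
False
ghci> (mkT not) 'a'
'a'
```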
Implementing cast
```haskell
data TypeRep
instance Eq TypeRep
mkRep :: String -> [TypeRep] -> TypeRep

class Typeable a where
  typeOf :: a -> TypeRep   -- a might be Int, perhaps; guaranteed not to evaluate its argument

instance Typeable Int where
  typeOf i = mkRep "Int" []
```
Implementing cast
```haskell
class Typeable a where
  typeOf :: a -> TypeRep

instance (Typeable a, Typeable b) => Typeable (a,b) where
  typeOf p = mkRep "(,)" [ta,tb]
    where ta = typeOf (fst p)
          tb = typeOf (snd p)
```
Implementing cast
```haskell
cast :: (Typeable a, Typeable b) => a -> Maybe b
cast x = r
  where
    -- (get r) is never evaluated: typeOf only needs its type
    r = if typeOf x == typeOf (get r)
          then Just (unsafeCoerce x)
          else Nothing

    get :: Maybe a -> a
    get x = undefined
```
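The self-referential trick can be checked against GHC's real `typeOf`. This sketch assumes `Data.Typeable.typeOf` and `Unsafe.Coerce.unsafeCoerce` from base, and renames the function `myCast` (a hypothetical name) so that it does not shadow the library's `cast`.

```haskell
import Data.Typeable (Typeable, typeOf)
import Unsafe.Coerce (unsafeCoerce)

-- The slide's cast: compare the runtime type representations of the
-- argument type a and the requested result type b.
myCast :: (Typeable a, Typeable b) => a -> Maybe b
myCast x = r
  where
    -- typeOf ignores the value of its argument, so the undefined
    -- (get r) is only used for its type b, and r is never forced here.
    r = if typeOf x == typeOf (get r)
          then Just (unsafeCoerce x)   -- safe: the TypeReps agree
          else Nothing

    get :: Maybe c -> c
    get _ = undefined
```

`myCast 'a' :: Maybe Char` gives `Just 'a'` and `myCast 'a' :: Maybe Bool` gives `Nothing`. The recursion through `r` terminates only because `typeOf` never inspects its argument.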
Implementing cast
• In GHC:
  • Typeable instances are generated automatically by the compiler for any data type
  • The definition of cast is in a library
• Then cast is sound: because every Typeable instance is compiler-generated, none can report a wrong TypeRep and fool unsafeCoerce
• Bottom line: cast is best thought of as a language extension, but it is an easy one to implement. All the hard work is done by type classes.
Ingredient 2: traversal
• Step 1: implement a one-layer traversal
• Step 2: extend the one-layer traversal to a recursive traversal of the entire tree
One-layer traversal
```haskell
class Typeable a => Data a where
  gmapT :: (forall b. Data b => b -> b) -> a -> a

instance Data Int where
  gmapT f x = x

instance (Data a, Data b) => Data (a,b) where
  gmapT f (x,y) = (f x, f y)
```
(gmapT f x) applies f to the IMMEDIATE CHILDREN of x.
One-layer traversal
```haskell
class Typeable a => Data a where
  gmapT :: (forall b. Data b => b -> b) -> a -> a

instance (Data a) => Data [a] where
  gmapT f []     = []
  gmapT f (x:xs) = f x : f xs   -- !!! f is applied to the tail xs, not mapped over it
```
gmapT's argument is a polymorphic function, so gmapT has a rank-2 type.
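To see what "immediate children" means in practice, here is a small sketch using the `gmapT` provided by base's `Data.Data`, together with a local `mkT`. The name `incChildren` is hypothetical.

```haskell
import Data.Data (Data, gmapT)
import Data.Maybe (fromMaybe)
import Data.Typeable (Typeable, cast)

-- mkT as on the earlier slide, written with fromMaybe.
mkT :: (Typeable a, Typeable b) => (a -> a) -> b -> b
mkT f = fromMaybe id (cast f)

-- One layer of traversal: add 1 to each immediate child that is an Int.
-- Deeper Ints are untouched, because gmapT does not recurse.
incChildren :: Data a => a -> a
incChildren = gmapT (mkT ((+ 1) :: Int -> Int))
```

`incChildren [1,2,3 :: Int]` yields `[2,2,3]`: the children of `1 : [2,3]` are the head `1` and the tail `[2,3]`, and only the head has type `Int`. `incChildren (1 :: Int, 'x')` yields `(2,'x')`, and a bare `Int`, having no children, comes back unchanged.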
Step 2: Now traversals are easy!
```haskell
everywhere :: Data a => (forall b. Data b => b -> b) -> a -> a
everywhere f x = f (gmapT (everywhere f) x)
```
Many different traversals!
```haskell
everywhere, everywhere' :: Data a => (forall b. Data b => b -> b) -> a -> a
everywhere  f x = f (gmapT (everywhere f) x)    -- Bottom up
everywhere' f x = gmapT (everywhere' f) (f x)   -- Top down
```
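The two orders can give different answers. This sketch assumes base's `Data.Data` plus local definitions of `mkT`, `everywhere` and `everywhere'`, and uses a hypothetical tree type `T` with a rewrite `split` that replaces a positive leaf by two smaller leaves.

```haskell
{-# LANGUAGE DeriveDataTypeable, RankNTypes #-}
import Data.Data (Data, gmapT)
import Data.Maybe (fromMaybe)
import Data.Typeable (Typeable, cast)

-- A small hypothetical tree type.
data T = L Int | N T T deriving (Show, Eq, Data)

mkT :: (Typeable a, Typeable b) => (a -> a) -> b -> b
mkT f = fromMaybe id (cast f)

-- Bottom up: rewrite the children first, then the node.
everywhere :: Data a => (forall b. Data b => b -> b) -> a -> a
everywhere f x = f (gmapT (everywhere f) x)

-- Top down: rewrite the node first, then descend into its (new) children.
everywhere' :: Data a => (forall b. Data b => b -> b) -> a -> a
everywhere' f x = gmapT (everywhere' f) (f x)

-- Replace a positive leaf by two leaves holding a smaller number.
split :: T -> T
split (L n) | n > 0 = N (L (n - 1)) (L (n - 1))
split t = t
```

Bottom up, `everywhere (mkT split) (L 2)` gives `N (L 1) (L 1)`, because the new leaves are created after their position has already been visited. Top down, `everywhere' (mkT split) (L 2)` visits the new leaves too and gives `N (N (L 0) (L 0)) (N (L 0) (L 0))`. A top-down rewrite that kept producing matching nodes would not terminate; here `split` stops at 0.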
More perspicuous types
```haskell
everywhere :: Data a => (forall b. Data b => b -> b) -> a -> a
-- equivalently:
everywhere :: (forall b. Data b => b -> b) -> (forall a. Data a => a -> a)
-- and with a type synonym:
type GenericT = forall a. Data a => a -> a
everywhere :: GenericT -> GenericT
```
Aha!
What is "really going on"?
```haskell
inc :: Data t => Float -> t -> t
```
• The magic of type classes passes an extra argument to inc that contains:
  • the function gmapT
  • the function typeOf
• A call of (mkT incS), made at every node in the tree, compares the TypeRep returned by the passed-in typeOf with a fixed TypeRep for Salary; this is precisely a dynamic type check.
Summary so far
• The solution consists of:
  • a little user-written code
  • mechanically generated instances of Typeable and Data for each data type
  • a library of combinators (cast, mkT, everywhere, etc.)
• Language support:
  • cast
  • rank-2 types
• Efficiency is so-so (roughly a factor of 2-3 slower than hand-written boilerplate, with no tuning effort)
Summary so far
• Robust to data type evolution
• Works easily for weird data types:
```haskell
data Rose a = MkR a [Rose a]

instance (Data a) => Data (Rose a) where
  gmapT f (MkR x rs) = MkR (f x) (f rs)

data Flip a b = Nil | Cons a (Flip b a)
-- Etc...
```
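With GHC's `deriving Data`, neither of these types needs a hand-written instance. The sketch below assumes base's `Data.Data` and local `mkT`/`everywhere` definitions; `incInt` is a hypothetical helper.

```haskell
{-# LANGUAGE DeriveDataTypeable, RankNTypes #-}
import Data.Data (Data, gmapT)
import Data.Maybe (fromMaybe)
import Data.Typeable (Typeable, cast)

-- The slide's "weird" types; the Data instances are derived.
data Rose a   = MkR a [Rose a]          deriving (Show, Eq, Data)
data Flip a b = Nil | Cons a (Flip b a) deriving (Show, Eq, Data)

mkT :: (Typeable a, Typeable b) => (a -> a) -> b -> b
mkT f = fromMaybe id (cast f)

everywhere :: Data a => (forall b. Data b => b -> b) -> a -> a
everywhere f x = f (gmapT (everywhere f) x)

incInt :: Int -> Int
incInt = (+ 1)
```

`everywhere (mkT incInt)` turns `MkR 1 [MkR 2 [], MkR 3 []]` into `MkR 2 [MkR 3 [], MkR 4 []]`, and the `Flip Int Bool` value `Cons 1 (Cons True (Cons 2 Nil))` into `Cons 2 (Cons True (Cons 3 Nil))`, even though `Flip` swaps its type arguments at every level.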
Generalisations
• With this same language support, we can do much more:
  • generic queries
  • generic monadic operations
  • generic folds
  • generic zips (e.g. equality)
Generic queries
• Add up the salaries of all the employees in the tree
```haskell
salaryBill :: Company -> Float
salaryBill = everything (+) (0 `mkQ` billS)

billS :: Salary -> Float
billS (S f) = f
```
1. 0 `mkQ` billS: build the function to apply to every node, from billS.
2. everything (+): apply that function to every node in the tree, and combine the results with (+).
Type extension again
```haskell
import Data.Char (ord)                  -- ord :: Char -> Int
import Data.Typeable (Typeable, cast)

mkQ :: (Typeable a, Typeable b) => d -> (b->d) -> a -> d
(d `mkQ` q) a = case cast a of
                  Just b  -> q b
                  Nothing -> d
```
Apply q if its type fits, otherwise return d.
```
ghci> (22 `mkQ` ord) 'a'
97
ghci> (22 `mkQ` ord) True
22
```
Traversal again
```haskell
class Typeable a => Data a where
  gmapT :: (forall b. Data b => b -> b) -> a -> a
  gmapQ :: forall r. (forall b. Data b => b -> r) -> a -> [r]
```
gmapQ applies a function to all children of this node, and collects the results in a list.
Traversal again
```haskell
class Typeable a => Data a where
  gmapT :: (forall b. Data b => b -> b) -> a -> a
  gmapQ :: forall r. (forall b. Data b => b -> r) -> a -> [r]

instance Data Int where
  gmapQ f x = []

instance (Data a, Data b) => Data (a,b) where
  gmapQ f (x,y) = [f x, f y]   -- one result per child
```
The query traversal
```haskell
everything :: Data a => (r->r->r) -> (forall b. Data b => b -> r) -> a -> r
everything k f x = foldl k (f x) (gmapQ (everything k f) x)
```
Note that the choice between foldr and foldl is made in the traversal, not in gmapQ.
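Putting `mkQ`, `gmapQ` and `everything` together gives a runnable `salaryBill`. This is a sketch over cut-down, hypothetical versions of the company types (a list of employees rather than a full `Company`), using base's `Data.Data` for `gmapQ`.

```haskell
{-# LANGUAGE DeriveDataTypeable, RankNTypes #-}
import Data.Data (Data, gmapQ)
import Data.Typeable (Typeable, cast)

-- Hypothetical, cut-down company types.
data Salary   = S Float         deriving (Show, Data)
data Employee = E String Salary deriving (Show, Data)

-- Apply q where the type fits, otherwise return the default d.
mkQ :: (Typeable a, Typeable b) => r -> (b -> r) -> a -> r
(d `mkQ` q) a = maybe d q (cast a)

-- Query this node, query every child recursively, combine with k.
everything :: Data a => (r -> r -> r) -> (forall b. Data b => b -> r) -> a -> r
everything k f x = foldl k (f x) (gmapQ (everything k f) x)

billS :: Salary -> Float
billS (S f) = f

salaryBill :: [Employee] -> Float
salaryBill = everything (+) (0 `mkQ` billS)
```

`salaryBill [E "Bill" (S 15000), E "Fred" (S 10000)]` gives `25000.0`. Nodes of every other type, including each character of the names, contribute the default `0`.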
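Looking for one result
• By making the result type (Maybe r), we can find the first (or last) satisfying value; laziness ensures the rest of the tree is not searched once a result is found
```haskell
findDept :: String -> Company -> Maybe Dept
findDept s = everything orElse (Nothing `mkQ` findD s)

findD :: String -> Dept -> Maybe Dept
findD s d@(D s' _ _) = if s==s' then Just d else Nothing

orElse :: Maybe a -> Maybe a -> Maybe a   -- left-biased: keep the first Just
orElse x y = maybe y Just x
```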
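A runnable version of this search, sketched with base's `Data.Data` and local `mkQ`, `everything` and `orElse`. It uses a cut-down, hypothetical `Dept` holding only a name and sub-departments, so `findD` matches two fields rather than three.

```haskell
{-# LANGUAGE DeriveDataTypeable, RankNTypes #-}
import Data.Data (Data, gmapQ)
import Data.Typeable (Typeable, cast)

-- Hypothetical, cut-down department type.
data Dept = D String [Dept] deriving (Show, Eq, Data)

mkQ :: (Typeable a, Typeable b) => r -> (b -> r) -> a -> r
(d `mkQ` q) a = maybe d q (cast a)

everything :: Data a => (r -> r -> r) -> (forall b. Data b => b -> r) -> a -> r
everything k f x = foldl k (f x) (gmapQ (everything k f) x)

-- Left-biased choice: the second argument is only evaluated on Nothing.
orElse :: Maybe a -> Maybe a -> Maybe a
orElse (Just x) _ = Just x
orElse Nothing  y = y

findD :: String -> Dept -> Maybe Dept
findD s d@(D s' _) = if s == s' then Just d else Nothing

findDept :: Data a => String -> a -> Maybe Dept
findDept s = everything orElse (Nothing `mkQ` findD s)
```

For `D "Research" [D "Devt" [], D "Manuf" []]`, `findDept "Devt"` returns `Just (D "Devt" [])` and `findDept "Sales"` returns `Nothing`. Because `orElse` ignores its second argument once it holds a `Just`, the subtrees after the first match are never evaluated.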
Monadic transforms
```haskell
class Typeable a => Data a where
  gmapT :: (forall b. Data b => b -> b) -> a -> a
  gmapQ :: forall r. (forall b. Data b => b -> r) -> a -> [r]
  gmapM :: Monad m => (forall b. Data b => b -> m b) -> a -> m a
```
• Uh oh! Where do we stop?
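A monadic traversal follows the same pattern as `everywhere`. This sketch assumes base's `gmapM` and defines `mkM` and `everywhereM` locally (syb has functions with these names, which may differ in detail); `checkS` is an invented validation rule that rejects negative salaries, and the types are hypothetical cut-down versions.

```haskell
{-# LANGUAGE DeriveDataTypeable, RankNTypes #-}
import Data.Data (Data, gmapM)
import Data.Typeable (Typeable, cast)

data Salary   = S Float         deriving (Show, Eq, Data)
data Employee = E String Salary deriving (Show, Eq, Data)

-- Monadic type extension: use f where the types match, otherwise return.
mkM :: (Monad m, Typeable m, Typeable a, Typeable b) => (a -> m a) -> b -> m b
mkM f = case cast f of
  Just g  -> g
  Nothing -> return

-- Bottom-up monadic traversal built on the one-layer gmapM.
everywhereM :: (Monad m, Data a) => (forall b. Data b => b -> m b) -> a -> m a
everywhereM f x = do
  x' <- gmapM (everywhereM f) x
  f x'

-- Hypothetical rule: a negative salary makes the whole structure invalid.
checkS :: Salary -> Maybe Salary
checkS s@(S v)
  | v >= 0    = Just s
  | otherwise = Nothing

validate :: Data a => a -> Maybe a
validate = everywhereM (mkM checkS)
```

In the `Maybe` monad a single failing node makes the whole traversal fail: `validate [E "Fred" (S (-1))]` is `Nothing`, while a list with only non-negative salaries comes back unchanged inside `Just`.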
Where do we stop?
• Happily, we can generalise all three gmaps into one:
```haskell
data Employee = E Person Salary

instance Data Employee where
  gfoldl k z (E p s) = (z E `k` p) `k` s
```
• We can define gmapT, gmapQ and gmapM in terms of (suitably parameterised) gfoldl
• The type of gfoldl hurts the brain (but the definitions are all easy)
Where do we stop?
```haskell
class Typeable a => Data a where
  gfoldl :: (forall d b. Data d => c (d -> b) -> d -> c b)
         -> (forall g. g -> c g)
         -> a -> c a
```
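To make "suitably parameterised" concrete: choosing `c` as `Identity` makes `gfoldl` rebuild the node (giving `gmapT`), and choosing `c` as `Const [r]` discards the node and keeps a list of results (giving `gmapQ`). This is a sketch against base's `gfoldl`; the primed names and `childTypes` are hypothetical, and base defines its own methods in a similar but not necessarily identical way.

```haskell
{-# LANGUAGE RankNTypes #-}
import Data.Data (Data, gfoldl)
import Data.Functor.Const (Const (..))
import Data.Functor.Identity (Identity (..))
import Data.Typeable (TypeRep, typeOf)

-- gmapT: rebuild the node, applying f to each child as it is supplied.
gmapT' :: Data a => (forall d. Data d => d -> d) -> a -> a
gmapT' f x = runIdentity (gfoldl (\(Identity c) y -> Identity (c (f y))) Identity x)

-- gmapQ: never rebuild anything, just accumulate f's results in order.
gmapQ' :: Data a => (forall d. Data d => d -> r) -> a -> [r]
gmapQ' f x = getConst (gfoldl (\(Const rs) y -> Const (rs ++ [f y])) (\_ -> Const []) x)

-- Example query: the types of a node's immediate children.
childTypes :: Data a => a -> [TypeRep]
childTypes = gmapQ' typeOf
```

`childTypes (1 :: Int, True)` gives the `TypeRep`s of `Int` and `Bool`. `gmapM` works the same way, with `c` instantiated to the monad itself.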
But we still can't do show!
• We want show :: Data a => a -> String
```haskell
show :: Data a => a -> String
show t = ??? ++ concat (gmapQ show t)   -- show the children and concatenate the results
```
• But how do we show the constructor?
Add more to class Data
```haskell
class Data a where
  toConstr :: a -> Constr

data Constr   -- abstract
conString :: Constr -> String
conFixity :: Constr -> Fixity
```
• Very like typeOf :: Typeable a => a -> TypeRep, except only for data types, not functions
So here is show
```haskell
show :: Data a => a -> String
show t = conString (toConstr t) ++ concat (gmapQ show t)
```
• Simple refinements deal with parentheses, infix constructors, etc.
• toConstr on a primitive type (like Int) yields a Constr whose conString displays the value
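A runnable approximation of this `show`, assuming base's `Data.Data` names: there `showConstr` plays the role of `conString`. The name `gshow` is hypothetical here, and the only refinements added are parentheses around each node and spaces between children.

```haskell
import Data.Data (Data, gmapQ, showConstr, toConstr)

-- Constructor name first, then each child, all wrapped in parentheses.
-- Infix constructors and strings are not special-cased.
gshow :: Data a => a -> String
gshow t = "(" ++ showConstr (toConstr t) ++ concatMap (' ' :) (gmapQ gshow t) ++ ")"
```

`gshow (Just (1 :: Int))` gives `"(Just (1))"`, where the `1` comes from the `Constr` of a primitive `Int`. Strings are not special-cased, so a `String` is shown as a chain of `(:)` cells, which is exactly the kind of refinement the slide mentions.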
Further generic functions
• read :: Data a => String -> a
• toBin :: Data a => a -> [Bit]
  fromBin :: Data a => [Bit] -> a
• testGen :: Data a => RandomGen -> a
```haskell
class Data a where
  toConstr   :: a -> Constr
  fromConstr :: Constr -> a
  dataTypeOf :: a -> DataType

data DataType   -- abstract
stringCon    :: DataType -> String -> Maybe Constr
indexCon     :: DataType -> Int -> Constr
dataTypeCons :: DataType -> [Constr]
```
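In today's base library these operations exist under slightly different names: `readConstr` corresponds to `stringCon`, `indexConstr` to `indexCon`, and `dataTypeConstrs` to `dataTypeCons`. The sketch below uses them, with a hypothetical `Colour` type and hypothetical helper names, to enumerate and read constructors that have no fields.

```haskell
{-# LANGUAGE DeriveDataTypeable #-}
import Data.Data (Data, dataTypeConstrs, dataTypeOf, fromConstr, readConstr)

-- A hypothetical enumeration type.
data Colour = Red | Green | Blue deriving (Show, Eq, Data)

-- Every constructor of the witness's type, built with fromConstr.
-- The derived dataTypeOf ignores its argument, so undefined is a fine witness.
allValues :: Data a => a -> [a]
allValues witness = map fromConstr (dataTypeConstrs (dataTypeOf witness))

-- A tiny generic "read" for constructors without fields.
readNullary :: Data a => a -> String -> Maybe a
readNullary witness s = fmap fromConstr (readConstr (dataTypeOf witness) s)
```

`allValues (undefined :: Colour)` gives `[Red, Green, Blue]` and `readNullary (undefined :: Colour) "Green"` gives `Just Green`. `fromConstr` fails at runtime for constructors with fields; `fromConstrB` in the same module fills the fields from a generic builder instead.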
Conclusions
• “Simple”, elegant
• Modest language extensions:
  • rank-2 types
  • auto-generation of Typeable and Data instances
• Fully implemented in GHC
• Shortcomings:
  • stop conditions
  • types are a bit uninformative
Paper: http://research.microsoft.com/~simonpj