Learn about Type Inference Algorithm for programming, unification problem, examples, and regular expressions explained in OCamllex. Dive into BNF derivations and grammars to enhance programming skills.
Programming Languages and Compilers (CS 421) Munawar Hafiz 2219 SC, UIUC http://www.cs.illinois.edu/class/cs421/ Based in part on slides by Mattox Beckman, as updated by Vikram Adve and Gul Agha
Type Inference - Example • Eliminate :
[f : ; x : ] |- f :     [f : ; x : ] |- x :
[f : ; x : ] |- (f x) :
[x : ] |- (fun f -> f x) :
[ ] |- (fun x -> fun f -> f x) :
• (); ( ); ( );
Type Inference Algorithm Let has_type(Γ, e, τ) = S • Γ is a typing environment • e is an expression • τ is a (generalized) type • S is a set of equations between generalized types • Idea: S is the set of constraints on type variables necessary for Γ |- e : τ • Let Unif(S) be a substitution of generalized types for type variables solving S • Solution: Unif(S)(Γ) |- e : Unif(S)(τ)
Type Inference Algorithm has_type(Γ, exp, τ) = • Case exp of • Var v --> return {τ ≡ Γ(v)} • Const c --> return {τ ≡ σ} where Γ |- c : σ by the constant rules • fun x -> e --> • Let α, β be fresh variables • Let S = has_type([x : α] + Γ, e, β) • Return {τ ≡ α → β} ∪ S
Type Inference Algorithm (cont) • Case exp of • App (e1 e2) --> • Let α be a fresh variable • Let S1 = has_type(Γ, e1, α → τ) • Let S2 = has_type(Γ, e2, α) • Return S1 ∪ S2
Type Inference Algorithm (cont) • Case exp of • If e1 then e2 else e3 --> • Let S1 = has_type(Γ, e1, bool) • Let S2 = has_type(Γ, e2, τ) • Let S3 = has_type(Γ, e3, τ) • Return S1 ∪ S2 ∪ S3
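The case analysis above can be sketched in Python as a function that returns the equations as a list of pairs. This is a minimal sketch, not the course's implementation: the tuple encodings of expressions and types, the helper names `fresh`, `fun` and `const_type`, and the constant rules (booleans and integers only) are assumptions made for illustration.

```python
import itertools

_counter = itertools.count()

def fresh():
    """Return a new type variable such as ('var', 't0') (hypothetical encoding)."""
    return ("var", f"t{next(_counter)}")

def fun(a, b):
    """The function type a -> b."""
    return ("fun", a, b)

BOOL = ("con", "bool")
INT = ("con", "int")

def const_type(c):
    # Assumed constant rules: Python bools are bool, Python ints are int.
    if isinstance(c, bool):      # check bool first: bool is a subclass of int
        return BOOL
    if isinstance(c, int):
        return INT
    raise TypeError(f"no constant rule for {c!r}")

def has_type(env, exp, tau):
    """Return the equations (pairs of types) needed for env |- exp : tau."""
    tag = exp[0]
    if tag == "Var":
        return [(tau, env[exp[1]])]
    if tag == "Const":
        return [(tau, const_type(exp[1]))]
    if tag == "Fun":
        _, x, body = exp
        alpha, beta = fresh(), fresh()
        s = has_type({**env, x: alpha}, body, beta)
        return [(tau, fun(alpha, beta))] + s
    if tag == "App":
        _, e1, e2 = exp
        alpha = fresh()
        return has_type(env, e1, fun(alpha, tau)) + has_type(env, e2, alpha)
    if tag == "If":
        _, e1, e2, e3 = exp
        return (has_type(env, e1, BOOL)
                + has_type(env, e2, tau)
                + has_type(env, e3, tau))
    raise ValueError(f"unknown expression form {tag!r}")

# fun x -> fun f -> f x, checked against an unknown goal type
exp = ("Fun", "x", ("Fun", "f", ("App", ("Var", "f"), ("Var", "x"))))
for lhs, rhs in has_type({}, exp, ("var", "goal")):
    print(lhs, "≡", rhs)
```

Solving the printed equations is a separate step, handled by unification.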
Unification Problem Given a set of pairs of terms (“equations”) {(s1, t1), (s2, t2), …, (sn, tn)} (the unification problem), does there exist a substitution σ (the unification solution) of terms for variables such that σ(si) = σ(ti) for all i = 1, …, n?
Unification Algorithm • Let S = {(s1, t1), (s2, t2), …, (sn, tn)} be a unification problem. • Case S = { }: Unif(S) = identity function (i.e., no substitution) • Case S = {(s, t)} ∪ S’: four main steps
Unification Algorithm • Delete: if s = t (they are the same term) then Unif(S) = Unif(S’) • Decompose: if s = f(q1, …, qm) and t = f(r1, …, rm) (same f, same m!), then Unif(S) = Unif({(q1, r1), …, (qm, rm)} ∪ S’) • Orient: if t = x is a variable, and s is not a variable, Unif(S) = Unif({(x, s)} ∪ S’)
Unification Algorithm • Eliminate: if s = x is a variable, and x does not occur in t (the occurs check), then • Let φ = {x ↦ t} • Let ψ = Unif(φ(S’)) • Unif(S) = {x ↦ ψ(t)} o ψ • Note: {x ↦ a} o {y ↦ b} = {y ↦ ({x ↦ a}(b))} o {x ↦ a} if y not in a
Example S = {(f(x), f(g(y,z))), (g(y,f(y)), x)} • Solved by {x ↦ g(y,f(y))} o {z ↦ f(y)} • First equation: f(x) and f(g(y,z)) both become f(g(y,f(y))) • Second equation: g(y,f(y)) and x both become g(y,f(y))
Example of Failure • S = {(f(x,g(y)), f(h(y),x))} • Decompose • S -> {(x,h(y)), (g(y),x)} • Orient • S -> {(x,h(y)), (x,g(y))} • Eliminate (substitute for x) • S -> {(h(y), g(y))} with {x ↦ h(y)} • No rule applies: Decompose fails because h and g are different function symbols
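The four rules (delete, decompose, orient, eliminate) can be sketched in Python and run on the two examples above. This is a minimal sketch under assumed encodings: a variable is a plain string, a compound term is a tuple whose first element is the function symbol, and a substitution is a dict. The names `unify`, `subst` and `occurs` are illustrative.

```python
def is_var(t):
    return isinstance(t, str)

def subst(sigma, t):
    """Apply substitution sigma (dict: variable -> term) to term t."""
    if is_var(t):
        return sigma.get(t, t)
    return (t[0],) + tuple(subst(sigma, a) for a in t[1:])

def occurs(x, t):
    """The occurs check: does variable x appear anywhere in term t?"""
    if is_var(t):
        return x == t
    return any(occurs(x, a) for a in t[1:])

def unify(eqs):
    """Return a solving substitution as a dict, or None if there is none."""
    if not eqs:
        return {}                                  # identity substitution
    (s, t), rest = eqs[0], list(eqs[1:])
    if s == t:                                     # Delete
        return unify(rest)
    if not is_var(s) and not is_var(t):            # Decompose
        if s[0] != t[0] or len(s) != len(t):
            return None                            # different f or different m
        return unify(list(zip(s[1:], t[1:])) + rest)
    if not is_var(s):                              # Orient (t must be a variable)
        return unify([(t, s)] + rest)
    if occurs(s, t):                               # Eliminate needs the occurs check
        return None
    phi = {s: t}
    psi = unify([(subst(phi, a), subst(phi, b)) for a, b in rest])
    if psi is None:
        return None
    result = dict(psi)
    result[s] = subst(psi, t)                      # {x -> psi(t)} o psi
    return result

f_x = ("f", "x")
g_y_fy = ("g", "y", ("f", "y"))
print(unify([(f_x, ("f", ("g", "y", "z"))), (g_y_fy, "x")]))
print(unify([(("f", "x", ("g", "y")), ("f", ("h", "y"), "x"))]))
```

The sketch always works on the first equation in the list, so it may apply the rules in a different order than a hand derivation, but it reaches the same outcome on these inputs.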
Example Regular Expressions • (0 ∨ 1)*1 • The set of all strings of 0’s and 1’s ending in 1, {1, 01, 11, …} • a*b(a*) • The set of all strings of a’s and b’s with exactly one b • ((01) ∨ (10))* • You tell me • Regular expressions (equivalently, regular grammars) are important for lexing: breaking strings into recognized words
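One way to experiment with the three expressions on this slide is Python's `re` module, which writes choice as `|`. The sketch below is only an illustration: the constant names and the helper `accepts` are made up here, and `fullmatch` is used so that the whole string must belong to the language.

```python
import re

ENDS_IN_ONE = re.compile(r"(0|1)*1")     # strings of 0's and 1's ending in 1
ONE_B = re.compile(r"a*ba*")             # strings of a's and b's with exactly one b
PAIRS = re.compile(r"((01)|(10))*")      # the third expression from the slide

def accepts(regex, s):
    """True when the whole string s is in the language of regex."""
    return regex.fullmatch(s) is not None

for s in ["1", "0101", "10", ""]:
    print(repr(s), accepts(ENDS_IN_ONE, s))
for s in ["b", "aabaa", "abab"]:
    print(repr(s), accepts(ONE_B, s))
```

Trying a few strings against `PAIRS` is a quick way to check a guess for the third language.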
Example FSA [state diagram: a start state, states marked as final, and transitions labeled 0 and 1]
Ocamllex Regular Expression • Single quoted characters for letters: ‘a’ • _ (underscore): matches any character • eof: special “end_of_file” marker • Concatenation same as usual • “string”: concatenation of sequence of characters • e1 | e2: choice - what was e1 ∨ e2
Ocamllex Regular Expression • [c1 - c2]: choice of any character between first and second inclusive, as determined by character codes • [^c1 - c2]: choice of any character NOT in set • e*: same as before • e+: same as e e* • e?: option - was e ∨ ε
Ocamllex Regular Expression • e1 # e2: the characters in e1 but not in e2; e1 and e2 must describe just sets of characters • ident: abbreviation for earlier reg exp in let ident = regexp • e1 as id: binds the result of e1 to id to be used in the associated action
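To get a feel for how these operators combine into a lexer, here is a small tokenizer sketch. It uses Python's `re` module, not ocamllex, so the correspondence is only approximate: named groups stand in for `e1 as id`, and the token classes (`num`, `id`, `op`, `ws`) are hypothetical choices for this example.

```python
import re

TOKEN = re.compile(r"""
    (?P<num>[0-9]+)             # e+ over a character range
  | (?P<id>[a-z][a-z0-9_]*)     # a range, then e* over a larger set
  | (?P<op>[+*()])              # choice among single characters
  | (?P<ws>[ \t\n]+)            # whitespace, matched but not reported
""", re.VERBOSE)

def tokenize(text):
    """Split text into (kind, lexeme) pairs, skipping whitespace."""
    pos, out = 0, []
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        if m.lastgroup != "ws":
            out.append((m.lastgroup, m.group()))
        pos = m.end()
    return out

print(tokenize("x1 + 42*(y)"))
```

Unlike ocamllex, which picks the longest match among rules, Python tries the alternatives left to right, so the order of the groups matters in this sketch.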
Sample Grammar • Language: Parenthesized sums of 0’s and 1’s • <Sum> ::= 0 • <Sum> ::= 1 • <Sum> ::= <Sum> + <Sum> • <Sum> ::= (<Sum>)
BNF Derivations • Pick a rule and substitute: • <Sum> ::= <Sum> + <Sum> • <Sum> => <Sum> + <Sum>
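The "pick a rule and substitute" step can be sketched as a function that always rewrites the leftmost nonterminal. This is an illustrative sketch: the dictionary encoding of the sample grammar and the name `derive` are assumptions, and rule choices are given as indices into the rule list.

```python
# The sample grammar; rule indices are 0: 0, 1: 1, 2: <Sum> + <Sum>, 3: ( <Sum> )
GRAMMAR = {
    "<Sum>": [["0"], ["1"], ["<Sum>", "+", "<Sum>"], ["(", "<Sum>", ")"]],
}

def derive(start, choices):
    """Apply each rule choice to the leftmost nonterminal.

    Returns every sentential form of the derivation, starting with `start`.
    Raises StopIteration if a choice is given when no nonterminal is left.
    """
    form = [start]
    steps = [" ".join(form)]
    for k in choices:
        i = next(j for j, sym in enumerate(form) if sym in GRAMMAR)
        form = form[:i] + GRAMMAR[form[i]][k] + form[i + 1:]
        steps.append(" ".join(form))
    return steps

for step in derive("<Sum>", [2, 0, 1]):
    print("=>", step)
```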
Example cont. • 1 * 1 + 0: [<exp> [<factor> [<bin> 1] * [<exp> [<factor> [<bin> 1]] + [<factor> [<bin> 0]]]]] • Fringe of tree is the string generated by the grammar
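Example: Ambiguous Grammar • 0 + 1 + 0 has two parse trees: • [<Sum> [<Sum> [<Sum> 0] + [<Sum> 1]] + [<Sum> 0]], which groups as (0 + 1) + 0 • [<Sum> [<Sum> 0] + [<Sum> [<Sum> 1] + [<Sum> 0]]], which groups as 0 + (1 + 0)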
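Ambiguity can be checked mechanically by counting parse trees. The sketch below hard-codes the four `<Sum>` rules and counts the trees for a token string. The function name `count_parses` and the single-character tokenization are assumptions for illustration.

```python
from functools import lru_cache

def count_parses(text):
    """Count parse trees for text under
    <Sum> ::= 0 | 1 | <Sum> + <Sum> | (<Sum>)."""
    tokens = tuple(c for c in text if not c.isspace())

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of parse trees deriving tokens[i:j] from <Sum>.
        n = 0
        if j - i == 1 and tokens[i] in ("0", "1"):
            n += 1
        if j - i >= 3 and tokens[i] == "(" and tokens[j - 1] == ")":
            n += count(i + 1, j - 1)
        for k in range(i + 1, j - 1):          # every '+' that could be the root
            if tokens[k] == "+":
                n += count(i, k) * count(k + 1, j)
        return n

    return count(0, len(tokens))

print(count_parses("0 + 1 + 0"))   # more than one tree means the grammar is ambiguous
```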
Two Major Sources of Ambiguity • Lack of determination of operator precedence • Lack of determination of operator associativity • Not the only sources of ambiguity
How to Enforce Associativity • Have at most one recursive call per production • When two or more recursive calls would be natural, leave the right-most one for right associativity, the left-most one for left associativity
Example • <Sum> ::= 0 | 1 | <Sum> + <Sum> | (<Sum>) • Becomes • <Sum> ::= <Num> | <Num> + <Sum> • <Num> ::= 0 | 1 | (<Sum>)
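Because the rewritten grammar keeps only the right-most recursive call, it maps directly onto a recursive-descent parser, and the resulting trees group to the right. A minimal sketch, assuming single-character tokens and nested tuples as the tree representation (the function names are illustrative):

```python
def parse_sum(tokens, pos=0):
    """<Sum> ::= <Num> | <Num> + <Sum>"""
    left, pos = parse_num(tokens, pos)
    if pos < len(tokens) and tokens[pos] == "+":
        right, pos = parse_sum(tokens, pos + 1)    # the one recursive call, on the right
        return ("+", left, right), pos
    return left, pos

def parse_num(tokens, pos):
    """<Num> ::= 0 | 1 | (<Sum>)"""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    tok = tokens[pos]
    if tok in ("0", "1"):
        return tok, pos + 1
    if tok == "(":
        inner, pos = parse_sum(tokens, pos + 1)
        if pos >= len(tokens) or tokens[pos] != ")":
            raise SyntaxError("expected ')'")
        return inner, pos + 1
    raise SyntaxError(f"unexpected token {tok!r}")

def parse_right_assoc(text):
    tokens = [c for c in text if not c.isspace()]
    tree, pos = parse_sum(tokens)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree

print(parse_right_assoc("0 + 1 + 0"))
print(parse_right_assoc("(0 + 1) + 0"))
```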
Operator Precedence • Operators of highest precedence evaluated first (bind more tightly). • Precedence for infix binary operators given in following table • Needs to be reflected in grammar
Precedence in Grammar • Higher precedence translates to longer derivation chain • Example: <exp> ::= <id> | <exp> + <exp> | <exp> * <exp> • Becomes • <exp> ::= <mult_exp> | <exp> + <mult_exp> • <mult_exp> ::= <id> | <mult_exp> * <id>
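The layered grammar can be turned into a small parser to see that `*` ends up deeper in the tree than `+`. This is a sketch with two assumptions that are not on the slide: identifiers are single letters, and each left-recursive rule is read as a loop (`<mult_exp>` followed by zero or more `+ <mult_exp>`), which builds the same left-leaning trees without recursing on the left.

```python
def parse_exp(tokens, pos=0):
    """<exp> ::= <mult_exp> | <exp> + <mult_exp>"""
    tree, pos = parse_mult(tokens, pos)
    while pos < len(tokens) and tokens[pos] == "+":
        right, pos = parse_mult(tokens, pos + 1)
        tree = ("+", tree, right)                  # group to the left
    return tree, pos

def parse_mult(tokens, pos):
    """<mult_exp> ::= <id> | <mult_exp> * <id>"""
    tree, pos = parse_id(tokens, pos)
    while pos < len(tokens) and tokens[pos] == "*":
        right, pos = parse_id(tokens, pos + 1)
        tree = ("*", tree, right)
    return tree, pos

def parse_id(tokens, pos):
    # Assumption: an <id> is a single letter.
    if pos < len(tokens) and tokens[pos].isalpha():
        return tokens[pos], pos + 1
    raise SyntaxError(f"expected an identifier at position {pos}")

def parse_with_precedence(text):
    tokens = [c for c in text if not c.isspace()]
    tree, pos = parse_exp(tokens)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree

print(parse_with_precedence("a + b * c"))
print(parse_with_precedence("a * b + c"))
```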
Problems for Recursive-Descent Parsing • Left Recursion: A ::= Aw translates to a subroutine that loops forever • Indirect Left Recursion: A ::= Bw together with B ::= Av causes the same problem
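Both kinds of left recursion can be detected before writing the parser by asking which nonterminals can reach themselves in leftmost position. The sketch below assumes a grammar given as a dict from nonterminal to a list of right-hand sides and, to stay short, assumes there are no empty productions. The name `left_recursive` is illustrative.

```python
def left_recursive(grammar):
    """Return the nonterminals A with A =>+ A w.

    Assumes no right-hand side is empty, so only the first symbol
    of each rule can start a derivation.
    """
    # reach[A] = nonterminals that can appear leftmost after one or more steps
    reach = {a: {rhs[0] for rhs in rules if rhs and rhs[0] in grammar}
             for a, rules in grammar.items()}
    changed = True
    while changed:
        changed = False
        for a in grammar:
            new = set().union(*(reach[b] for b in reach[a])) - reach[a]
            if new:
                reach[a] |= new
                changed = True
    return {a for a in grammar if a in reach[a]}

direct = {"A": [["A", "w"]]}
indirect = {"A": [["B", "w"]], "B": [["A", "v"]]}
print(left_recursive(direct))
print(left_recursive(indirect))
```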
Problems for Recursive-Descent Parsing • Parser must always be able to choose the next action based only on the very next token • Pairwise Disjointedness Test: Can we always determine which rule (in the non-extended BNF) to choose based on just the first token?
Pairwise Disjointedness Test • For each rule A ::= y, calculate FIRST(y) = {a | y =>* aw} ∪ {ε | y =>* ε} • For each pair of rules A ::= y and A ::= z, require FIRST(y) ∩ FIRST(z) = { }
Factoring Grammar • Test too strong: Can’t handle <expr> ::= <term> [ ( + | - ) <expr> ] • Answer: Add new non-terminal and replace above rules by • <expr> ::= <term><e> • <e> ::= + <term><e> • <e> ::= - <term><e> • <e> ::= ε • You are delaying the decision point
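The FIRST computation and the pairwise disjointedness test can be sketched as a fixed-point iteration, then run on the expression grammar before and after factoring. Assumptions in this sketch: a grammar is a dict from nonterminal to a list of right-hand sides, an empty list stands for ε, and `<term>` is given the placeholder rule `term ::= id` because the slide does not define it.

```python
EPS = "ε"

def first_of_seq(seq, first):
    """FIRST of a sequence of symbols, given FIRST sets for the nonterminals."""
    out = set()
    for sym in seq:
        if sym not in first:              # a terminal starts the string
            out.add(sym)
            return out
        out |= first[sym] - {EPS}
        if EPS not in first[sym]:
            return out
    out.add(EPS)                          # every symbol can vanish (or seq is empty)
    return out

def first_sets(grammar):
    """Iterate until no FIRST set grows any further."""
    first = {a: set() for a in grammar}
    changed = True
    while changed:
        changed = False
        for a, rules in grammar.items():
            for rhs in rules:
                new = first_of_seq(rhs, first) - first[a]
                if new:
                    first[a] |= new
                    changed = True
    return first

def pairwise_disjoint(grammar):
    """True when, for every nonterminal, the FIRST sets of its rules do not overlap."""
    first = first_sets(grammar)
    for rules in grammar.values():
        sets = [first_of_seq(rhs, first) for rhs in rules]
        for i in range(len(sets)):
            for j in range(i + 1, len(sets)):
                if sets[i] & sets[j]:
                    return False
    return True

unfactored = {
    "expr": [["term"], ["term", "+", "expr"], ["term", "-", "expr"]],
    "term": [["id"]],                     # placeholder rule, not from the slides
}
factored = {
    "expr": [["term", "e"]],
    "e": [["+", "term", "e"], ["-", "term", "e"], []],
    "term": [["id"]],                     # placeholder rule, not from the slides
}
print(pairwise_disjoint(unfactored), pairwise_disjoint(factored))
```

The unfactored rules all start with `<term>`, which is why their FIRST sets collide. After factoring, the choice is made only once the `+`, `-`, or end of the expression is visible.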
Example • Both <A> and <B> have problems: • <S> ::= <A> a <B> b • <A> ::= <A> b | b • <B> ::= a <B> | a • Transform grammar to: • <S> ::= <A> a <B> b • <A> ::= b<A1> • <A1> ::= b<A1> | ε • <B> ::= a<B1> • <B1> ::= a<B1> | ε
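The rewrite applied to `<A>` is the standard removal of immediate left recursion, and it can be sketched as a small function. The sketch covers only that half of the example (the `<B>` rules are fixed by factoring instead). It assumes right-hand sides are lists of symbols with an empty list for ε, and that appending `1` to the nonterminal's name gives an unused name.

```python
def remove_left_recursion(nt, rules):
    """Rewrite  A ::= A a1 | ... | b1 | ...
    as          A ::= b1 A1 | ...
                A1 ::= a1 A1 | ... | ε
    Returns a dict with the new rules for A and, if needed, A1.
    """
    rec = [rhs[1:] for rhs in rules if rhs and rhs[0] == nt]      # the a_i tails
    base = [rhs for rhs in rules if not rhs or rhs[0] != nt]      # the b_i
    if not rec:
        return {nt: rules}                 # nothing to do
    new = nt + "1"                         # assumed to be a fresh name
    return {
        nt: [b + [new] for b in base],
        new: [r + [new] for r in rec] + [[]],   # [] stands for ε
    }

print(remove_left_recursion("A", [["A", "b"], ["b"]]))
```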