I get an eerie, empty feeling around you…
Next week in CS 60: NullPointerExceptions?
Implementing a small, special-purpose programming language: Unilang
• Homework 7: classes you'll implement (Quantity, Tokenizer, Parser, Evaluator)
• Recursive Descent: a decent foundation for our language
the adventure continues!
Quantity: Unicalc, but in Java
class Quantity {
    private double m;    // the multiplier
    private OpenList N;  // the numerator
    private OpenList D;  // the denominator
    private double u;    // the uncertainty

    public Quantity(double m, OpenList N, OpenList D, double u) {
        this.m = m;
        this.N = N;
        this.D = D;
        this.u = u;
    }

    public double getm() { return this.m; }
    public OpenList getN() { return this.N; }
    public OpenList getD() { return this.D; }
    public double getu() { return this.u; }

    public String toString() {
        return "{" + this.m + "," + this.N + "," + this.D + "," + this.u + "}";
    }
}
"constructor": a method for creating a new object
"getters": methods for obtaining data members without making the data itself public
Java's quantity object: { 42.0, ["m"], ["s", "s"] }
Brace yourself for this…
Scheme's quantity list: ( 42.0 ("m") ("s" "s") )
Quantity: Unicalc, but in Java ("porting code")
class Quantity {
    // data members m, N, D, u
    // constructor: Quantity( m, N, D, u )
    // getters: getm(), getN(), getD(), getu()
    public static Quantity simplify(Quantity Q) …
    public static Quantity multiply(Quantity Q1, Quantity Q2) …
    public static Quantity divide(Quantity Q1, Quantity Q2) …
    public static Quantity norm_unit(String unit, OpenList DB) …
    public static Quantity norm(Quantity Q, OpenList DB) …
    public static Quantity add(Quantity Q1, Quantity Q2, OpenList DB) …
    public static Quantity negate(Quantity Q) …
}
try this last one…
A unicalc language: an example read-eval-print loop (REPL)
% java Evaluator
def yard 3 foot
{ 3.0, [foot], [] }
def furlong 0.125 mile
{ 0.125, [mile], [] }
def mile 5280 foot
{ 5280.0, [foot], [] }
1 foot + 2 yard
{ 7.0, [foot], [] }
def hour 3600 second
{ 3600.0, [second], [] }
def day 24 hour
{ 24.0, [hour], [] }
# 1 mile / day
???
A compiled, working version is available to try out… Use control-d to quit.
What are the pieces? Decompose this…
Behold! minimath Dr. Racket Dr. Prolog Dr. Java … and Dr. Evil!
Minimath! (example)
% java Evaluator
5 + 4
9.0
5*5
25.0
5 + 9 * 3
32.0
(5 + 9)*3
42.0
x = 7
7.0
x * 6
42.0
These last two features aren't so mini!
Input type? Output type? Challenges?
Implementing a language
String input: "(5 + 9) * 3"
→ Tokenizer → String[] tokens: ( 5 + 9 ) * 3
→ Parser → OpenList parseTree: * at the root, with children + (over 5 and 9) and 3
→ Evaluator → Quantity result: 42.0
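The last stage of this pipeline can be sketched as a recursive walk over the parse tree. This is only an illustration under stated assumptions: it uses nested `java.util.List` values in place of the course's `OpenList`, returns a plain `double` in place of a `Quantity`, and the class name `TreeEvalSketch` is made up for this example.

```java
import java.util.List;

class TreeEvalSketch {
    // Evaluates a parse tree shaped like [op, left, right] or [number].
    // Assumption: lists stand in for OpenList, doubles for Quantity.
    static double eval(List<?> tree) {
        if (tree.size() == 1) {
            // A leaf: a one-element list holding a number.
            return ((Number) tree.get(0)).doubleValue();
        }
        String op = (String) tree.get(0);
        double left = eval((List<?>) tree.get(1));
        double right = eval((List<?>) tree.get(2));
        switch (op) {
            case "+": return left + right;
            case "*": return left * right;
            default: throw new IllegalArgumentException("unknown operator: " + op);
        }
    }

    public static void main(String[] args) {
        // The tree for (5 + 9) * 3: [*, [+, [5], [9]], [3]]
        List<Object> tree = List.<Object>of("*",
                List.<Object>of("+", List.of(5.0), List.of(9.0)),
                List.of(3.0));
        System.out.println(eval(tree)); // prints 42.0
    }
}
```

The tree already encodes the grouping, so the evaluator needs no knowledge of parentheses or precedence.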
Background: ++
In Java, C, C++, and other languages, there are shortcuts for adding or subtracting 1:
preincrement:
    int x = 41;
    int y = ++x;   // x is now 42, y is now 42
postincrement:
    int x = 41;
    int y = x++;   // x is now 42, y is now __
Tokenizing? (or, Java is just crazy…)
Name(s) _______________
public static void main(String[] args) {
    int x = 32;
    int y = 9;
    int z;
    z = x+++y;
    System.out.println("x is " + x);
    System.out.println("y is " + y);
    System.out.println("z is " + z);
}
This is legal! What will be printed?
Tokenizing?
public static void main(String[] args) {
    int x = 32;
    int y = 9;
    int z;
    z = x++++y;
    System.out.println("x is " + x);
    System.out.println("y is " + y);
    System.out.println("z is " + z);
}
How about this?
Tokenizing?
public static void main(String[] args) {
    int x = 32;
    int y = 9;
    int z;
    z = x++-+y;
    System.out.println("x is " + x);
    System.out.println("y is " + y);
    System.out.println("z is " + z);
}
This one?
Tokenizing?
public static void main(String[] args) {
    int x = 32;
    int y = 9;
    int z;
    z = x++-+-+-+-+-+-+-+-+-+-+-++y;
    System.out.println("x is " + x);
    System.out.println("y is " + y);
    System.out.println("z is " + z);
}
Should this be OK ?!?!?!?
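Two of these puzzles can be checked with a small sketch. Java's lexer takes the longest possible token at each step, so `x+++y` is read as `x ++ + y` and `x++-+y` as `x ++ - + y`. The class and method names below are hypothetical, chosen only for this example.

```java
class MaximalMunchDemo {
    // z = x+++y is tokenized as x, ++, +, y, that is (x++) + y.
    static int[] plusPlusPlus() {
        int x = 32;
        int y = 9;
        int z = x+++y;
        return new int[] { x, y, z };
    }

    // z = x++-+y is tokenized as x, ++, -, +, y, that is (x++) - (+y).
    static int[] plusPlusMinusPlus() {
        int x = 32;
        int y = 9;
        int z = x++-+y;
        return new int[] { x, y, z };
    }

    public static void main(String[] args) {
        int[] a = plusPlusPlus();
        System.out.println("x is " + a[0] + ", y is " + a[1] + ", z is " + a[2]);
        // prints: x is 33, y is 9, z is 41
        int[] b = plusPlusMinusPlus();
        System.out.println("x is " + b[0] + ", y is " + b[1] + ", z is " + b[2]);
        // prints: x is 33, y is 9, z is 23
    }
}
```

By the same rule, `x++++y` is read as `x ++ ++ y`, which the compiler rejects because `x++` is a value, not a variable that can be incremented again.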
Tokenizing ~ string power
Never underestimate the power of Strings!
class Tokenizer {
    private Scanner inputStreamScanner;  // gets input

    public String[] nextTokens() {
        try {
            String line = inputStreamScanner.nextLine();
            line = line.replaceAll( "\\+", " + " );
            line = line.replaceAll( "\\*", " * " );
            String[] tokens = line.trim().split("\\s+");  // splits into tokens based on whitespace
            return tokens;
        } catch (Exception e) {  // Java's error-handling mechanism: if the end of the file is reached
            return null;
        }
    }
}
Java's API (String class) application programming interface
Java's regular expressions
String s = "regex";
s.split( "g" )   yields { "re", "ex" }
s.split( "e" )
s.replaceAll( "eg", "ol" )
Tons of special stuff:
+ means one or more
* means zero or more
| means or
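The calls on this slide can be tried directly. The sketch below uses only `String.split`, `String.replaceAll`, and `Arrays.toString` from the standard library; the class name is hypothetical, and the last two examples are added here to illustrate `+` and `|`.

```java
import java.util.Arrays;

class RegexStringDemo {
    public static void main(String[] args) {
        String s = "regex";
        System.out.println(Arrays.toString(s.split("g")));  // prints [re, ex]
        System.out.println(Arrays.toString(s.split("e")));  // prints [r, g, x]
        System.out.println(s.replaceAll("eg", "ol"));       // prints rolex

        // + means "one or more": a run of spaces acts as a single separator.
        System.out.println(Arrays.toString("a   b c".split(" +")));    // prints [a, b, c]

        // | means "or": split on either a plus sign or a star.
        // Both are special characters, so each is escaped with \\.
        System.out.println(Arrays.toString("1+2*3".split("\\+|\\*"))); // prints [1, 2, 3]
    }
}
```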
What's wrong here? And what's a fix?
class Tokenizer {
    private Scanner inputStreamScanner;  // gets input

    public String[] nextTokens() {
        try {
            String line = inputStreamScanner.nextLine();
            line = line.replaceAll( "\\+", " + " );
            line = line.replaceAll( "\\*", " * " );
            String[] tokens = line.trim().split("\\s+");  // splits into tokens based on whitespace
            return tokens;
        } catch (Exception e) {  // Java's error-handling mechanism: if the end of the file is reached
            return null;
        }
    }
}
Minimath's tokenizer
class Tokenizer {
    private Scanner inputStreamScanner;  // gets input

    public String[] nextTokens() {
        try {
            String line = inputStreamScanner.nextLine();
            line = line.replaceAll( "\\+", " + " );  // What are these doing?
            line = line.replaceAll( "\\*", " * " );
            String[] tokens = line.trim().split("\\s+");  // splits into tokens based on whitespace
            return tokens;
        } catch (Exception e) {  // if the end of the file is reached
            return null;
        }
    }
}
Strings can never be replaced! (replaceAll returns a new String, which is why the result is assigned back to line.)
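The tokenizer above pads only `+` and `*` with spaces, while the minimath session earlier also uses parentheses and `=`. One possible extension, offered as a sketch rather than as the course's solution, pads every symbol in a single `replaceAll` call. The symbol set `[()+*=]` and the names `TokenizeSketch` and `tokenize` are assumptions made for this example.

```java
import java.util.Arrays;

class TokenizeSketch {
    // Pads each operator or parenthesis with spaces, then splits on whitespace.
    static String[] tokenize(String line) {
        // Assumption: these five symbols are the only single-character tokens.
        // $1 refers to whatever character the group ( ... ) matched.
        String padded = line.replaceAll("([()+*=])", " $1 ");
        return padded.trim().split("\\s+");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(tokenize("(5 + 9)*3")));
        // prints [(, 5, +, 9, ), *, 3]
    }
}
```

Inside a character class such as `[()+*=]`, the characters `(`, `)`, `+`, and `*` lose their special meaning, so no backslashes are needed there.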
Java's vs. Python's tokenizing
Java:
    while ( true ) { print( "CS 60 4ever!" ); }
    while ( true ) { print ( " CS 60 4ever! " ) ; }
Python:
    while True: print "CS 60 4ever!"
    while True : \n \t print " CS 60 4ever! "
And to think that I'm called spacey!
Tokens don't grow on trees… but trees grow from them!
String: "z = x+++y"
tokens: < z, =, x, ++, +, y >
parse tree: parse trees capture the structure of the tokens and define their meaning
Implementing a language Stringinput Tokenizer String[]tokens From list of tokens to a structured parse tree Parser OpenListparseTree Evaluator Quantityresult
Parsing is understanding a sentence... The chef gave her cat food. ambiguity helps illuminate this!
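Parsing is understanding a sentence...
The chef gave her cat food.
[Two parse trees: in one, "her" attaches to "cat" (the chef gave food to her cat); in the other, "cat" attaches to "food" (the chef gave cat food to her).]
It's our grammar that provides these structures…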
Grammars
Grammar: a set of rules that determines the legal expressions in a language.
Example: a grammar of single-digit sums
    S → V + S | V        (start here)
    V → 0 | 1 | … | 9
What do these two production rules mean?
Grammars
Grammar:
    S → V + S | V        "sum" (the starting rule or "production")
    V → 0 | 1 | … | 9    "value"
S = "I want to be a sum." V = "I want to be a value."
S and V are nonterminal (auxiliary) symbols; each nonterminal symbol represents a "syntactic category".
The | is an "or" in the grammar, not in the language.
Terminal symbols are the things that appear in the expression to be parsed: alphabet = { +, 0, 1, …, 9 }
Grammars + Parsing
A parser takes in tokens and outputs a parse tree, according to some grammar!
Grammar:
    S → V + S | V        "sum" (start here)
    V → 0 | 1 | … | 9    "value"
S = "I want to be a sum." V = "I want to be a value."
Input string: "4 + 5"
Token list: <"4","+","5"> (tokens are leaves)
Grammars + Parsing
A parser takes in tokens and outputs a parse tree, according to some grammar!
Grammar:
    S → V + S | V        "sum"
    V → 0 | 1 | … | 9    "value"
S = "I want to be a sum." V = "I want to be a value."
Input string: "4 + 5"
Token list: <"4","+","5"> (tokens are leaves)
Parse tree: [+, [4], [5]]
[Tree diagram: S at the root expands to V + S; the V is 4 and the nested S is a V that is 5.]
A parse tree is the structure of an expression that determines its meaning!
A grammar defines all legal sentences
    S → V + S | V - S | V    (start here)
    V → 0 | 1 | … | 9
The grammar decides how things get parsed and thus what they mean!
What is the parse tree for 8 - 4 + 5 (tokens are leaves), and how is it written as an OpenList?
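To see how the grammar fixes the meaning, here is a minimal sketch that evaluates tokens by following the rules S → V + S | V - S | V literally. It assumes the right-recursive reading of the rules, in which everything after the first operator forms a nested S; the class and method names are hypothetical.

```java
class RightRecursiveSum {
    // S -> V + S | V - S | V, evaluated directly on a token array.
    // pos is the index of the token where this S begins.
    static int evalS(String[] tokens, int pos) {
        int value = Integer.parseInt(tokens[pos]);   // the leading V
        if (pos + 1 >= tokens.length) {
            return value;                            // S -> V
        }
        int rest = evalS(tokens, pos + 2);           // the nested S after the operator
        return tokens[pos + 1].equals("+") ? value + rest : value - rest;
    }

    public static void main(String[] args) {
        String[] tokens = { "8", "-", "4", "+", "5" };
        System.out.println(evalS(tokens, 0)); // prints -1
    }
}
```

Under these rules the expression groups as 8 - (4 + 5), which gives -1 rather than the 9 that ordinary left-to-right arithmetic would produce.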
Try it! Keep this page…
Grammar (S is the starting symbol and the root of the parse tree):
    S → P + S | P - S | P    "sum"
    P → V * P | V / P | V    "product"
    V → 0 | 1 | … | 9        "value"
1. Above, create the parse tree for the expression to be parsed: 7 * 3 + 9 / 2. What does the parse tree evaluate to?
2. How could you change the grammar so that addition and subtraction had higher precedence than multiplication and division? And why might that be good?
3. What rule could you add that would enable the use of parentheses ( and ) for grouping terms?
Recursive Descent
The miniMath grammar:
    S → P + S | P    "sum"
    P → N * P | N    "product"
    N → any Double   "number"
Our parser's methods: S(), P(), N(). What's output?
It's all in the file Parser.java (We'll look @ this Monday)
Recursive-descent parsing:
• grammar nonterminals each have their own methods
• the parser object keeps track of a token list (an array) throughout the creation of the tree
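The two bullet points can be sketched as follows. This is not the contents of Parser.java, only an illustration under stated assumptions: nested `java.util.List` values stand in for `OpenList`, a `pos` field tracks the next unread token, and error handling for malformed input is omitted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class MiniMathParserSketch {
    private final String[] tokens;
    private int pos = 0; // index of the next unread token

    MiniMathParserSketch(String[] tokens) {
        this.tokens = tokens;
    }

    // S -> P + S | P
    List<Object> S() {
        List<Object> left = P();
        if (pos < tokens.length && tokens[pos].equals("+")) {
            pos++; // consume "+"
            List<Object> right = S();
            return Arrays.<Object>asList("+", left, right);
        }
        return left;
    }

    // P -> N * P | N
    List<Object> P() {
        List<Object> left = N();
        if (pos < tokens.length && tokens[pos].equals("*")) {
            pos++; // consume "*"
            List<Object> right = P();
            return Arrays.<Object>asList("*", left, right);
        }
        return left;
    }

    // N -> any Double, wrapped in a one-element list as a leaf
    List<Object> N() {
        List<Object> leaf = new ArrayList<>();
        leaf.add(Double.parseDouble(tokens[pos++]));
        return leaf;
    }

    // Convenience entry point: parse a whole token array starting from S.
    static List<Object> parse(String[] tokens) {
        return new MiniMathParserSketch(tokens).S();
    }

    public static void main(String[] args) {
        System.out.println(parse(new String[] { "5", "+", "9", "*", "3" }));
        // prints [+, [5.0], [*, [9.0], [3.0]]]
    }
}
```

Because S() calls P() before it looks for a `+`, every product is fully built before it becomes an operand of a sum, which is how the grammar gives `*` higher precedence.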
Tokenizing + Parsing everywhere: tokens → expression
code: z = x+++y;
text: inumbrage atonementionlyingravenspecksoftheartsseekindimagesinamementogodigatherscarletnotestendangerslaysherelapses
images
speech
parting thoughts… Re-parsable poetry?!
inumbrage
atonementionlyingravenspecksoftheartsseekindimagesinamementogodigatherscarletnotestendangerslaysherelapses
- Mike Maguire
mondegreens
American author Sylvia Wright (1954): "When I was a child, my mother used to read aloud to me… One of my favorite poems began, as I remember:
    Ye Highlands and ye Lowlands,
    Oh, where hae ye been?
    They have slain the Earl of Murray,
    And Lady Mondegreen."
The actual line is "And laid him on the green."
Wright gives other examples of her own mondegreens:
    Surely, Good Mrs. Murphy shall follow me all the days of my life ("Surely goodness and mercy…" from the 23rd Psalm)
• There's a bathroom on the right…
• 'Scuse me while I kiss this guy…
• It's hard to wreck a nice beach! Actually, it's not that difficult…
I get an eerie, empty feeling around you…
Next week in CS 60: NullPointerExceptions?
Implementing a small, special-purpose programming language: Unilang
• Homework 7: classes you'll implement (Quantity, Tokenizer, Parser, Evaluator)
• Recursive Descent: a decent foundation for our language
Java fun:
public static void fun(OpenList L) {
    L = L.cons("Hooray!");
}
… in main …
OpenList L = OpenList.emptyList;
fun(L);
System.out.println("L is " + L);
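The "Java fun" puzzle turns on the fact that Java passes object references by value: assigning to the parameter `L` inside `fun` rebinds only the local name. The sketch below reproduces the effect with a tiny stand-in list. `ConsList` is hypothetical and assumes, as with Scheme's cons, that `cons` returns a new list and leaves the original unchanged; the course's `OpenList` may differ in its details.

```java
class ParameterDemo {
    // Hypothetical stand-in for OpenList: an immutable cons list.
    static final class ConsList {
        static final ConsList EMPTY = new ConsList(null, null);
        final Object first;
        final ConsList rest;

        private ConsList(Object first, ConsList rest) {
            this.first = first;
            this.rest = rest;
        }

        // Returns a new, longer list; the receiver is left unchanged.
        ConsList cons(Object item) {
            return new ConsList(item, this);
        }

        boolean isEmpty() {
            return this == EMPTY;
        }

        @Override
        public String toString() {
            StringBuilder sb = new StringBuilder("[");
            for (ConsList p = this; !p.isEmpty(); p = p.rest) {
                sb.append(p.first);
                if (!p.rest.isEmpty()) sb.append(", ");
            }
            return sb.append("]").toString();
        }
    }

    // Rebinds only the local parameter; the caller's variable is untouched.
    static void fun(ConsList L) {
        L = L.cons("Hooray!");
    }

    // One way to get the new list back out: return it.
    static ConsList funReturning(ConsList L) {
        return L.cons("Hooray!");
    }

    public static void main(String[] args) {
        ConsList L = ConsList.EMPTY;
        fun(L);
        System.out.println("L is " + L); // prints L is []
        L = funReturning(L);
        System.out.println("L is " + L); // prints L is [Hooray!]
    }
}
```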
Try it! (answers)
Grammar (S is the starting symbol):
    S → P + S | P - S | P    "sum"
    P → V * P | V / P | V    "product"
    V → 0 | 1 | … | 9        "value"
1. Above, create the parse tree for the expression to be parsed: 7 * 3 + 9 / 2. What would the parse tree evaluate to?
   Parse tree as an OpenList: [ +, [*, [7], [3]], [/, [9], [2]] ]
   21 + 4 == 25
2. How could you change the grammar so that addition and subtraction had higher precedence than multiplication and division? And why might that be good?
   Closer to the starting symbol implies lower precedence.
3. What rule could you add that would enable the use of parentheses ( and ) for grouping terms?
   Change the product's Vs to Gs and then add G → ( S ) | V
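The parenthesis rule from answer 3 can be sketched in code. To stay short, this example evaluates as it parses instead of building a tree, and it handles only `+` and `*`, leaving out `-` and `/`. The class and method names are hypothetical, and the input is assumed to be already tokenized.

```java
class ParenParserSketch {
    private final String[] tokens;
    private int pos = 0; // index of the next unread token

    ParenParserSketch(String[] tokens) {
        this.tokens = tokens;
    }

    // S -> P + S | P
    double S() {
        double left = P();
        if (pos < tokens.length && tokens[pos].equals("+")) {
            pos++; // consume "+"
            return left + S();
        }
        return left;
    }

    // P -> G * P | G
    double P() {
        double left = G();
        if (pos < tokens.length && tokens[pos].equals("*")) {
            pos++; // consume "*"
            return left * P();
        }
        return left;
    }

    // G -> ( S ) | V
    double G() {
        if (tokens[pos].equals("(")) {
            pos++; // consume "("
            double inner = S(); // start over from the top of the grammar
            if (pos >= tokens.length || !tokens[pos].equals(")")) {
                throw new IllegalArgumentException("expected )");
            }
            pos++; // consume ")"
            return inner;
        }
        return Double.parseDouble(tokens[pos++]); // V
    }

    // Convenience entry point: evaluate a whole token sequence.
    static double eval(String... tokens) {
        return new ParenParserSketch(tokens).S();
    }

    public static void main(String[] args) {
        System.out.println(eval("(", "5", "+", "9", ")", "*", "3")); // prints 42.0
        System.out.println(eval("5", "+", "9", "*", "3"));           // prints 32.0
    }
}
```

The rule G → ( S ) sits furthest from the starting symbol, so a parenthesized group binds more tightly than any operator around it.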
Wrappers
Each primitive data type has a class wrapper. (I think you mean classy, right?)
double d1 = 60.0;                // regular stuff
Double D1 = new Double(d1);      // "wrapping"
Double D2 = new Double("42.0");  // parsing!
double d2 = D2;
Object ob = d2;
if (ob instanceof Double) …      // true
primitive type → class type:
    double → Double
    int → Integer
    boolean → Boolean
    char → Character
    float → Float
    byte → Byte
    long → Long
    short → Short
An object of type Double will wrap a primitive double for storage in lists or other containers as a Java Object.
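A runnable version of the wrapper idea might look like the sketch below. It uses `Double.valueOf` in place of `new Double(...)`, since the wrapper constructors have been deprecated since Java 9. The class name, the `unwrap` helper, and the use of `java.util.List` as the container are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;

class WrapperDemo {
    // Recovers a primitive double from an Object, checking the type first.
    static double unwrap(Object ob) {
        if (ob instanceof Double) {
            return (Double) ob; // cast, then automatic unboxing
        }
        throw new IllegalArgumentException("not a Double: " + ob);
    }

    public static void main(String[] args) {
        double d1 = 60.0;                     // primitive
        Double D1 = Double.valueOf(d1);       // explicit wrapping
        Double D2 = Double.valueOf("42.0");   // parsing a String
        double d2 = D2;                       // automatic unboxing
        Object ob = d2;                       // automatic boxing into a Double
        System.out.println(ob instanceof Double); // prints true

        // A container of Objects can hold wrapped doubles next to Strings.
        List<Object> items = new ArrayList<>();
        items.add(D1);
        items.add("foot");
        System.out.println(unwrap(items.get(0)) + d2); // prints 102.0
    }
}
```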
The question! What is Java thinking?
association list, i.e., our database:
DB ~ [ [ "mile", [ 5280 ["foot"] [ ] ] ], [ "foot", [ 12 ["inch"] [ ] ] ] ]
public static OpenList assoc(Object o, OpenList DB) {
    if (DB.isEmpty()) return OpenList.emptyList;
    OpenList firstOfDB = DB.first;
    String firstOfFirst = firstOfDB.first;
    if (firstOfFirst.equals(o)) return firstOfDB;
    return OpenList.assoc(o,DB.rest);
}
Why will Java complain here!?
Hint ~ what does the compiler think an OpenList contains?
The answer… Casting for answers!
public static OpenList assoc(Object o, OpenList DB) {
    if (DB.isEmpty()) return OpenList.emptyList;
    OpenList firstOfDB = (OpenList)(DB.first);
    String firstOfFirst = (String)(firstOfDB.first);
    if (firstOfFirst.equals(o)) return firstOfDB;
    return OpenList.assoc(o,DB.rest);
}
Why won't Java complain here!?
Casts tell the compiler that you know better!
What if you don't know better ?!
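As for "What if you don't know better?": a cast that turns out to be wrong compiles fine but throws a `ClassCastException` when it runs. The sketch below shows one defensive alternative, checking with `instanceof` before casting. It uses `java.util.List` in place of `OpenList`, and all class and method names are hypothetical.

```java
import java.util.List;

class AssocSketch {
    // Looks up key in an association list of [name, definition] entries.
    // Returns the matching entry, or an empty list if there is none.
    static List<?> assoc(Object key, List<?> db) {
        for (Object entry : db) {
            // Check the type before casting instead of trusting the data.
            if (entry instanceof List) {
                List<?> pair = (List<?>) entry;
                if (!pair.isEmpty() && key.equals(pair.get(0))) {
                    return pair;
                }
            }
        }
        return List.of();
    }

    // Demonstrates a cast that compiles but fails at run time.
    static boolean badCastFails() {
        Object notAString = 42;
        try {
            String s = (String) notAString; // compiles; throws when executed
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<Object> db = List.<Object>of(
                List.<Object>of("mile", List.<Object>of(5280.0, List.of("foot"), List.of())),
                List.<Object>of("foot", List.<Object>of(12.0, List.of("inch"), List.of())));
        System.out.println(assoc("foot", db)); // prints [foot, [12.0, [inch], []]]
        System.out.println(assoc("yard", db)); // prints []
        System.out.println(badCastFails());    // prints true
    }
}
```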
Recursive Descent
    S → P + S | P    "sum"
    P → N * P | N    "product"
    N → any Double   "number"
Recursive descent parsing:
• each nonterminal has its own function
• the parser needs to maintain a list (or array) of tokens that still need to be handled
String: "12+15 * 2"
Token List: < 12, +, 15, *, 2 >
Parse Tree: [+, [12], [*, [15], [2]]]
public OpenList S()
public OpenList P()
public OpenList N()
A grammar for the Unicalc language
Each rule is followed by the parse trees it produces (on the original slide, blue marks an OpenList and purple a Quantity).
    S → def V E | # E | E              parse trees: ["def" V E]  ["#" E]  E
    E → P + E | P - E | P              parse trees: ["+" P E]  ["-" P E]  P
    P → K * P | K / P | K              parse trees: ["*" K P]  ["/" K P]  K
    K → U ^ I | U                      parse trees: ["^" U I]  U
    U → ( E ) | ( - E ) | Q            parse trees: E  ["-" E]  Q
    Q → A V* | V+                      parse trees: [ { D [V*] D } ]  [ {1.0 [V+] 0.0} ]
    A → D ~ D | D                      parse trees: [D D]  [D]
    I: a numeric value (int)
    D: a numeric value (double)
    V: a variable name (string)