News blurb o’ the day • Allied armed forces in Iraq are using machine translation plus AIM to communicate • Many possible MT techniques; some are based on Bayesian statistical techniques • Ex: see “le chat noir” <-> “the black cat”; estimate Pr[“black cat”|“chat noir”] • When you see “chat” next, estimate the maximum-probability word to associate with it • Much more difficult than your spam filters -- need to handle entire phrases, words out of order, idiom, etc.
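As a rough illustration of the estimate described on this slide, the sketch below counts observed phrase pairings and uses relative frequency as a stand-in for Pr[target|source]. This is a deliberate simplification, not a full Bayesian translation model, and the class name `PhraseTable` and its methods are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical phrase table: counts how often a source phrase was seen
// paired with each target phrase in some aligned training data.
class PhraseTable {
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();

    // Record one observed pairing, e.g. ("chat noir", "black cat").
    void observe(String source, String target) {
        counts.computeIfAbsent(source, k -> new HashMap<>())
              .merge(target, 1, Integer::sum);
    }

    // Relative-frequency estimate of Pr[target | source]; 0.0 if source is unseen.
    double probability(String target, String source) {
        Map<String, Integer> row = counts.get(source);
        if (row == null) {
            return 0.0;
        }
        int total = 0;
        for (int c : row.values()) {
            total += c;
        }
        return row.getOrDefault(target, 0) / (double) total;
    }

    // Highest-count target phrase for a source phrase, or null if unseen.
    String best(String source) {
        Map<String, Integer> row = counts.get(source);
        if (row == null) {
            return null;
        }
        String best = null;
        int bestCount = -1;
        for (Map.Entry<String, Integer> e : row.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }
}
```

Picking the highest-count entry is the "max probability word" step; handling reordering and idiom, as the slide notes, needs much more than a lookup table.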
Recursive Descent Parsing Or: Before you can understand this sentence, first, you must understand this sentence...
Recursive Descent Parsing • A translation between streams of tokens and complex structures like trees (or tree-like data structs) • One step beyond lexing • Requires more sophisticated structures
Lexical analysis, revisited • Rules equivalent to regular expressions • Can only represent sequences, indefinite repetition (i.e., “*” or “+” operators), and finite cases (“[]” and “|” operators) • Can be recognized in linear time • Equivalent to a finite state machine
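To make the lexing side concrete, here is a minimal sketch of a regular-expression tokenizer built on `java.util.regex`. The token classes (words, integers, single-character punctuation) are assumptions chosen to resemble the puzzle files discussed later, and `SimpleLexer` is a hypothetical name, not the course's `Lexer` class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SimpleLexer {
    // Optional leading whitespace, then one token:
    // a word, a non-negative integer, or a single punctuation character.
    private static final Pattern TOKEN =
        Pattern.compile("\\s*([a-zA-Z]+|[0-9]+|[\\[\\](){}=,])");

    // Scans the input left to right, emitting one token per match.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            m.region(pos, input.length());
            if (!m.lookingAt()) {
                // Trailing whitespace is fine; anything else is an error.
                if (input.substring(pos).trim().isEmpty()) {
                    break;
                }
                throw new IllegalArgumentException(
                    "Unexpected character near offset " + pos);
            }
            tokens.add(m.group(1));
            pos = m.end();
        }
        return tokens;
    }
}
```

For example, `SimpleLexer.tokenize("StartState = [1, 20]")` yields the tokens `StartState`, `=`, `[`, `1`, `,`, `20`, `]`. The lexer never needs to remember how many brackets are open; that is the parser's job.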
R.D. Parsing and CFGs • Rules can be recursive • Technically, based on “context free grammars” • Needs a full stack machine, not just a state machine • Stack can be unboundedly deep • Needs more than a finite number of states to run
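The need for a stack can be seen in a small sketch that checks whether brackets nest correctly, something a plain finite state machine cannot do for arbitrary depth. The class `BracketChecker` is a hypothetical illustration and uses an explicit stack; a recursive descent parser gets the same effect from the call stack.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class BracketChecker {
    // Returns true if every opening bracket is closed by the matching
    // bracket in the right order. Non-bracket characters are ignored.
    static boolean balanced(String s) {
        Deque<Character> stack = new ArrayDeque<>();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '[': stack.push(']'); break;
                case '(': stack.push(')'); break;
                case '{': stack.push('}'); break;
                case ']':
                case ')':
                case '}':
                    // The closer must match the most recent unclosed opener.
                    if (stack.isEmpty() || stack.pop() != c) {
                        return false;
                    }
                    break;
                default:
                    break;
            }
        }
        // Anything left on the stack was never closed.
        return stack.isEmpty();
    }
}
```

The stack grows by one entry per level of nesting, so its depth is bounded only by the input, which is why a fixed number of states is not enough.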
CFGs and BNF • Write our rules in “Backus-Naur Form” (BNF) • Rules made up of two elements: • Terminals: actual tokens that could be found in the data -- “dog”, “127”, “{”, [a-zA-Z]+ • Non-terminals: names of rules • Rules must be of form: • LHS := term_1 op_1 term_2 op_2 ... term_N op_N • LHS is a non-terminal • term_i is a terminal or non-terminal • op_i is one of the operators we’ve met before -- +, *, |, ()
BNF from P2 FILE := ( CONTROL | PUZZLEDEF )* CONTROL := ( OUTFILE | LOGFILE | ERRFILE | RESULTS | STATS | SEARCH-CTRL | "Run" | "Reset" )
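One way to read these rules as code: the `*` becomes a loop and each `|` becomes a branch on the next token. The sketch below is a simplification under stated assumptions: it handles only the `"Run"` and `"Reset"` alternatives of CONTROL, works on a pre-split token list rather than a real lexer, and `FileParserSketch` is a hypothetical name. The other alternatives (OUTFILE, PUZZLEDEF, and so on) would each be a further branch.

```java
import java.util.ArrayList;
import java.util.List;

class FileParserSketch {
    // FILE := ( CONTROL | PUZZLEDEF )*
    // Returns the control commands found, in order.
    static List<String> parseFile(List<String> tokens) {
        List<String> commands = new ArrayList<>();
        int pos = 0;
        // The "*" operator: keep going while tokens remain.
        while (pos < tokens.size()) {
            String t = tokens.get(pos);
            // The "|" operator: choose a branch by looking at the next token.
            if (t.equals("Run") || t.equals("Reset")) {
                commands.add(t);
                pos++;
            } else {
                // Assumption: the remaining alternatives are not handled here.
                throw new IllegalStateException(
                    "Unexpected token " + t + " at position " + pos);
            }
        }
        return commands;
    }
}
```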
Recursion...
N2KPUZZLE := "NToTheKPuzzle" "(" HNAME ")" "=" "{"
             "StartState" "=" NKPUZSTATE
             "GoalState" "=" NKPUZSTATE "}"
NKPUZSTATE := "[" ( NUMLIST | NKPUZSTATE ( "," NKPUZSTATE )* ) "]"
NUMLIST := NON-NEG-INTEGER ( "," NON-NEG-INTEGER )*
HNAME := [a-zA-Z]+
POS-INTEGER := [1-9][0-9]*
NON-NEG-INTEGER := [0-9]+
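The recursive NKPUZSTATE rule maps almost directly onto a method that calls itself. The following is a minimal sketch, assuming the input has already been split into string tokens; the class `NkPuzStateParser` and the nested-list result type are illustrative choices, not the project's actual `NkPuzStateRep`.

```java
import java.util.ArrayList;
import java.util.List;

class NkPuzStateParser {
    private final List<String> tokens;
    private int pos = 0;

    NkPuzStateParser(List<String> tokens) {
        this.tokens = tokens;
    }

    // NKPUZSTATE := "[" ( NUMLIST | NKPUZSTATE ( "," NKPUZSTATE )* ) "]"
    // Returns a list of Integers (NUMLIST case) or of nested lists.
    List<Object> parseState() {
        expect("[");
        List<Object> items = new ArrayList<>();
        if (peek().equals("[")) {
            // Nested case: the rule refers to itself, so the method recurses.
            items.add(parseState());
            while (peek().equals(",")) {
                pos++;
                items.add(parseState());
            }
        } else {
            // NUMLIST := NON-NEG-INTEGER ( "," NON-NEG-INTEGER )*
            items.add(parseNumber());
            while (peek().equals(",")) {
                pos++;
                items.add(parseNumber());
            }
        }
        expect("]");
        return items;
    }

    private Integer parseNumber() {
        String t = next();
        if (!t.matches("[0-9]+")) {
            throw new IllegalStateException("Expected integer, found " + t);
        }
        return Integer.valueOf(t);
    }

    private String peek() {
        return pos < tokens.size() ? tokens.get(pos) : "<EOF>";
    }

    private String next() {
        String t = peek();
        pos++;
        return t;
    }

    private void expect(String s) {
        String t = next();
        if (!t.equals(s)) {
            throw new IllegalStateException("Expected " + s + ", found " + t);
        }
    }
}
```

Given the tokens for `[[1,2],[3,0]]`, `parseState()` returns a two-element list whose elements are the lists `[1, 2]` and `[3, 0]`. Each nested `[` adds one frame to the Java call stack, which plays the role of the parser's stack.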
Turning it into code
public PuzState parseNKPuzzle(Lexer l) {
  // first token must be the literal "NToTheKPuzzle"
  Token t=l.next();
  if (!t.tokStr().equals("NToTheKPuzzle")) {
    throw new ParseException("Unexpected token " + t.tokStr() +
                             " found when expecting N^k-1 puzzle state");
  }
  // next comes "("
  t=l.next();
  if (!t.tokStr().equals("(")) {
    //...
  }
  // then the heuristic name (HNAME)
  t=l.next();
  if (t.getType()!=TT_HNAME) {
    // ...
  }
  String heuristic=t.tokStr();
Turning it into code
  // parse ")", "=", "{", "StartState",
  // "=". Now ready for NKPUZSTATE
  NkPuzStateRep sRep=parseNKPuzState(l);
  // now parse "GoalState", "="
  NkPuzStateRep gRep=parseNKPuzState(l);
  // parse "}" and you know you're done with
  // NKPUZ
  // now construct the actual puzzle object
  if (heuristic.equals("Manhattan")) {
    NkPuz p=new NkManhattanPuz(sRep,gRep);
    return p;
  }
  if (heuristic.equals("TileCount")) {
    NkPuz p=new NkTileCountPuz(sRep,gRep);
    return p;
  }
  // assumption: any other heuristic name is an error
  throw new ParseException("Unknown heuristic " + heuristic);
}