CSA3050: NLP Algorithms
Parsing Algorithms 2
• Problems with DFTD Parser
• Earley Parsing Algorithm
CSA3050: Parsing Algorithms 2
Problems with DFTD (Depth-First Top-Down) Parser
• Left Recursion
• Ambiguity
• Inefficiency
Left Recursion
• A grammar is left recursive if it contains at least one non-terminal A for which A ⇒* α A β and α ⇒* ε.
• Intuitive idea: a derivation of that category includes itself along its leftmost branch.
• Examples (the last two rules together give indirect left recursion):
  NP → NP PP
  NP → NP and NP
  NP → DetP Nominal
  DetP → NP 's
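The definition above can be checked mechanically. The following is a minimal sketch, assuming that no rule has an empty right-hand side, so only the first symbol of each rule matters. The function name and the dictionary encoding of the grammar are illustrative choices, not part of the original slides.

```python
def left_recursive_nonterminals(grammar):
    """Return the non-terminals A that can derive a string beginning with A.

    Assumption: no rule has an empty right-hand side, so the leftmost
    branch of a derivation only ever follows the first symbol of a rule.
    """
    # left_corners[A] = symbols that can appear first in a string derived from A
    left_corners = {a: {rhs[0] for rhs in alts} for a, alts in grammar.items()}
    changed = True
    while changed:  # transitive closure of the "can start with" relation
        changed = False
        for corners in left_corners.values():
            for b in list(corners):
                new = left_corners.get(b, set()) - corners
                if new:
                    corners |= new
                    changed = True
    return {a for a, corners in left_corners.items() if a in corners}


# The four example rules from the slide.
grammar = {
    "NP": [("NP", "PP"), ("NP", "and", "NP"), ("DetP", "Nominal")],
    "DetP": [("NP", "'s")],
}
print(sorted(left_recursive_nonterminals(grammar)))
```

NP is reported because of its direct rules, and DetP because DetP → NP 's leads back to DetP through NP → DetP Nominal.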
Infinite Search
Dealing with Left Recursion
• Reformulate the grammar: rewrite
  A → A α | β
  as
  A → β A'
  A' → α A' | ε
• Disadvantage: different (and probably unnatural) parse trees.
• Use a different parse algorithm.
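The grammar reformulation can be sketched in a few lines. This is an illustrative sketch only: it handles immediate left recursion (rules of the form A → A α), assumes α is never empty, and uses a hypothetical function name and a tuple encoding of right-hand sides.

```python
EPSILON = ()  # the empty right-hand side, written ε in the rules above


def remove_immediate_left_recursion(lhs, alternatives):
    """Rewrite A → A α | β as A → β A', A' → α A' | ε.

    `alternatives` is a list of right-hand sides, each a tuple of symbols.
    Returns a dict mapping each non-terminal to its new alternatives.
    """
    alphas = [rhs[1:] for rhs in alternatives if rhs and rhs[0] == lhs]
    betas = [rhs for rhs in alternatives if not rhs or rhs[0] != lhs]
    if not alphas:  # nothing to do: no alternative starts with lhs
        return {lhs: list(alternatives)}
    new = lhs + "'"
    return {
        lhs: [beta + (new,) for beta in betas],
        new: [alpha + (new,) for alpha in alphas] + [EPSILON],
    }


rules = remove_immediate_left_recursion(
    "NP", [("NP", "PP"), ("NP", "and", "NP"), ("DetP", "Nominal")]
)
for lhs, alts in rules.items():
    for rhs in alts:
        print(lhs, "→", " ".join(rhs) or "ε")
```

The rewritten rules are right-branching through NP', which illustrates the disadvantage noted above: the resulting trees no longer mirror the original NP structure. Indirect left recursion (for example through DetP) is not touched by this sketch.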
Ambiguity
• Coordination ambiguity: different scope of conjunction: Black cats and dogs like to play
• Attachment ambiguity: a constituent can be added to the parse tree in different places: I shot an elephant in my pyjamas
  VP → VP PP
  NP → NP PP
Catalan Numbers
• The nth Catalan number counts the ways of dissecting a polygon with n+2 sides into triangles by drawing nonintersecting diagonals.
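To get a feel for how quickly these numbers grow, the standard closed form C_n = (2n choose n) / (n + 1) can be evaluated directly. The same numbers count the binary bracketings of a sequence, which is why they tend to come up when counting attachment and coordination readings. The function name below is an illustrative choice.

```python
from math import comb


def catalan(n: int) -> int:
    """nth Catalan number via the closed form C(2n, n) / (n + 1)."""
    return comb(2 * n, n) // (n + 1)


# A pentagon (n + 2 = 5 sides, so n = 3) can be triangulated in catalan(3) ways.
print([catalan(n) for n in range(8)])  # [1, 1, 2, 5, 14, 42, 132, 429]
```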
Handling Disambiguation
• Statistical disambiguation
• Semantic knowledge
Repeated Parsing of Subtrees
Earley Algorithm
• Dynamic programming: the solution involves filling in a table of solutions to subproblems.
• Parallel top-down search.
• Worst-case complexity is O(N^3) in the length N of the sentence.
• The table, called a chart, contains N+1 entries:
  ● book ● that ● flight ●
  0      1      2        3
The Chart
• Each table entry contains a list of states.
• Each state represents all partial parses that have been reached so far at that point in the sentence.
• States are represented using dotted rules containing information about:
  • Rule/subtree: which rule has been used
  • Progress: the dot indicates how much of the rule's RHS has been recognised
  • Position: the text segment to which this parse applies
Examples of Dotted Rules
• Initial S rule: S → ● VP, [0,0]
• Partially recognised NP: NP → Det ● Nominal, [1,2]
• Fully recognised VP: VP → V NP ●, [0,3]
• These states can also be represented graphically.
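One possible way to hold a dotted rule in code is a small record with the three pieces of information listed earlier: the rule, the dot position, and the span. The class and field names below are assumptions made for illustration, and the completed NP state at the end is a made-up example rather than one taken from the slides.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class State:
    lhs: str               # rule/subtree: left-hand side of the rule
    rhs: Tuple[str, ...]   # rule/subtree: right-hand side symbols
    dot: int               # progress: number of RHS symbols recognised so far
    start: int             # position: where the covered text segment begins
    end: int               # position: where the covered text segment ends

    def is_complete(self) -> bool:
        return self.dot == len(self.rhs)

    def next_symbol(self) -> Optional[str]:
        return None if self.is_complete() else self.rhs[self.dot]

    def __str__(self) -> str:
        symbols = list(self.rhs)
        symbols.insert(self.dot, "●")
        return f"{self.lhs} → {' '.join(symbols)}, [{self.start},{self.end}]"


initial_s = State("S", ("VP",), 0, 0, 0)
partial_np = State("NP", ("Det", "Nominal"), 1, 1, 2)
complete_np = State("NP", ("Det", "Nominal"), 2, 1, 3)  # hypothetical completed state

for state in (initial_s, partial_np, complete_np):
    print(state, "| complete:", state.is_complete(), "| next:", state.next_symbol())
```

Making the record frozen keeps states hashable and comparable, which is convenient when a chart entry must avoid holding the same state twice.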
The Chart
Earley Algorithm
• Main algorithm: proceeds through each text position, applying one of the three operators below to each state.
• Predictor: creates "initial states" (i.e. states whose RHS is completely unparsed).
• Scanner: checks the current input when the next category to be recognised is pre-terminal.
• Completer: when a state is "complete" (nothing after the dot), advances all states to the left that are looking for the associated category.
Earley Algorithm – Main Function
Earley Algorithm – Sub Functions
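The main function and the three operators can be sketched as a small recogniser. This is a minimal sketch under stated assumptions, not the pseudocode from the slides: the toy grammar, the lexicon, the dummy start symbol γ and all function names are illustrative, there are no empty rules, and the scanner here advances the dot over the pre-terminal directly instead of adding a separate lexical state.

```python
from collections import namedtuple

# Dotted rule: lhs → rhs with the dot before rhs[dot], covering [start, end].
State = namedtuple("State", "lhs rhs dot start end")

# Toy grammar and lexicon (assumptions for illustration only).
GRAMMAR = {
    "S": [("VP",)],
    "VP": [("V", "NP")],
    "NP": [("Det", "Nominal")],
    "Nominal": [("Noun",)],
}
LEXICON = {"book": {"V", "Noun"}, "that": {"Det"}, "flight": {"Noun"}}
PARTS_OF_SPEECH = {"V", "Det", "Noun"}


def next_symbol(state):
    return state.rhs[state.dot] if state.dot < len(state.rhs) else None


def enqueue(state, entry):
    if state not in entry:  # never store the same state twice
        entry.append(state)


def predictor(state, chart):
    # Create "initial states" for every rule expanding the expected category.
    category = next_symbol(state)
    for rhs in GRAMMAR.get(category, ()):
        enqueue(State(category, rhs, 0, state.end, state.end), chart[state.end])


def scanner(state, chart, words):
    # If the next word can have the expected part of speech, move the dot over it.
    j = state.end
    if j < len(words) and next_symbol(state) in LEXICON.get(words[j], ()):
        enqueue(state._replace(dot=state.dot + 1, end=j + 1), chart[j + 1])


def completer(state, chart):
    # Advance every earlier state that was waiting for this completed category.
    for waiting in list(chart[state.start]):
        if next_symbol(waiting) == state.lhs:
            enqueue(waiting._replace(dot=waiting.dot + 1, end=state.end),
                    chart[state.end])


def earley_recognise(words):
    chart = [[] for _ in range(len(words) + 1)]  # N+1 entries
    enqueue(State("γ", ("S",), 0, 0, 0), chart[0])
    for i in range(len(words) + 1):
        for state in chart[i]:  # the entry may grow while it is processed
            symbol = next_symbol(state)
            if symbol is None:
                completer(state, chart)
            elif symbol in PARTS_OF_SPEECH:
                scanner(state, chart, words)
            else:
                predictor(state, chart)
    return State("γ", ("S",), 1, 0, len(words)) in chart[len(words)]


print(earley_recognise("book that flight".split()))  # True
print(earley_recognise("book that".split()))         # False
```

Because `enqueue` refuses duplicates, each subtree is recorded once per span, which is what avoids the repeated parsing of subtrees seen with the depth-first top-down parser.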
Retrieving Trees
• To turn the recogniser into a parser, the representation of each state must also include information about the completed states that generated its constituents.
Chart[3]
[Figure: states in Chart[3]; an arrow marks the extra field.]
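The extra field can be pictured as a list of pointers from each state to the completed states that filled its right-hand side. The sketch below is an assumption-laden illustration: the field name `children`, the helper `tree`, and the hand-built states for "book that flight" are all hypothetical. In a full parser the completer would record the completed state in this field each time it advances a dot.

```python
from typing import NamedTuple, Tuple


class State(NamedTuple):
    lhs: str
    rhs: Tuple[str, ...]
    dot: int
    start: int
    end: int
    children: tuple = ()  # extra field: completed states (or words) behind the RHS


def tree(state):
    """Follow the extra field to build a nested (label, child, ...) tuple."""
    return (state.lhs,) + tuple(
        tree(child) if isinstance(child, State) else child
        for child in state.children
    )


# Completed states for "book that flight", built by hand for illustration.
v = State("V", ("book",), 1, 0, 1, ("book",))
det = State("Det", ("that",), 1, 1, 2, ("that",))
noun = State("Noun", ("flight",), 1, 2, 3, ("flight",))
nominal = State("Nominal", ("Noun",), 1, 2, 3, (noun,))
np = State("NP", ("Det", "Nominal"), 2, 1, 3, (det, nominal))
vp = State("VP", ("V", "NP"), 2, 0, 3, (v, np))
s = State("S", ("VP",), 1, 0, 3, (vp,))

print(tree(s))
```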