LING 408/508: Computational Techniques for Linguists Lecture 21 10/10/2012
Outline • Parsing • Parsing arithmetic exprs. in prefix notation • Parsing arithmetic exprs. in postfix notation • Short assignment #13 • Long assignment #6
Previously: string representations of arithmetic expressions • Infix: (5 / 7) + ((2 * 4) - 1) • Parentheses have been inserted for disambiguation; they are not represented in the original tree • Prefix: + / 5 7 - * 2 4 1 • Postfix: 5 7 / 2 4 * 1 - + • Different string representations of the same tree • [Figure: expression tree with root +, whose left child / has leaves 5 and 7, and whose right child - has left child * (leaves 2 and 4) and right leaf 1]
Parsing arithmetic expressions • Given a string representation of an arithmetic expression, construct a binary tree • Example: • Input: '5 7 /' • Output: ('/', (5,None,None), (7,None,None)) • Parsing algorithms for different notations • Prefix: recursion • Postfix: iteration; shift-reduce parsing with a stack • Infix: recursion; recursive-descent parsing
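To make the tuple representation concrete, here is a minimal sketch that goes in the opposite direction from parsing: it takes a tree and writes it out in each notation. The names TREE, is_leaf, to_prefix, to_postfix and to_infix are made up for this illustration and are not from the lecture. Note that to_infix parenthesizes every operator node, so it also wraps the whole expression in an outer pair of parentheses.

```python
# A node is a tuple (value, left, right); a leaf has None for both children.
# TREE is the tree for (5 / 7) + ((2 * 4) - 1).
TREE = ('+',
        ('/', (5, None, None), (7, None, None)),
        ('-',
         ('*', (2, None, None), (4, None, None)),
         (1, None, None)))

def is_leaf(node):
    return node[1] is None and node[2] is None

def to_prefix(node):
    # operator first, then left operand, then right operand
    value, left, right = node
    if is_leaf(node):
        return str(value)
    return ' '.join([value, to_prefix(left), to_prefix(right)])

def to_postfix(node):
    # left operand, then right operand, then operator
    value, left, right = node
    if is_leaf(node):
        return str(value)
    return ' '.join([to_postfix(left), to_postfix(right), value])

def to_infix(node):
    # operator between the operands, fully parenthesized
    value, left, right = node
    if is_leaf(node):
        return str(value)
    return '(' + to_infix(left) + ' ' + value + ' ' + to_infix(right) + ')'

print(to_prefix(TREE))    # + / 5 7 - * 2 4 1
print(to_postfix(TREE))   # 5 7 / 2 4 * 1 - +
print(to_infix(TREE))     # ((5 / 7) + ((2 * 4) - 1))
```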
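Parsing algorithms operate upon a tokenized input string
    # input: a space-separated string
    #        representing an expression
    # output: a list of strings
    #
    # EXAMPLE:
    # input: '+ 3 * 2 5'
    # output: ['+', '3', '*', '2', '5']
    def tokenize(s):
        # split() with no argument splits on runs of whitespace,
        # so leading, trailing, or repeated spaces do not produce empty tokens
        return s.split()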
Parsing natural language syntax • Given a sentence, return all parse trees for that sentence.
Prefix notation • Operator occurs before left and right operands: / 5 7 • Operands may be recursively constructed expressions: + / 5 7 - * 2 4 1
Parsing prefix notation • Recursive case: • Read operator op • Recursively parse left operand lnode • Recursively parse right operand rnode • Construct node: (op, lnode, rnode) • Base case: • Read an integer (need to convert string to integer) • Construct node: (value, None, None)
Attempt #1 (doesn’t work)
    def parse_prefix(s):    # s is a list of strings
        operators = {'+', '-', '*', '/'}
        if s[0] not in operators:               # base case: integer
            return (int(s[0]), None, None)      # leaf node
        else:                                   # recursive case
            op = s[0]
            lnode = parse_prefix(s)
            rnode = parse_prefix(s)
            return (op, lnode, rnode)           # parent node
Variable position for reading operator • Two lines of code refer to s[0]:
    if s[0] not in operators:
    op = s[0]
• But an operator can be in many positions in the string • Solution: specify the starting index for parsing an operand from the input string: parse_prefix(s, idx)
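A short demonstration of how Attempt #1 fails, as a sketch: the function is copied from the slide under the hypothetical name parse_prefix_attempt1, and recurses_forever is a helper invented here. Because every recursive call receives the same list, s[0] is still the operator, so the recursion never reaches the base case and Python eventually raises RecursionError.

```python
def parse_prefix_attempt1(s):
    # Attempt #1 from the slide, renamed for this demonstration.
    operators = {'+', '-', '*', '/'}
    if s[0] not in operators:
        return (int(s[0]), None, None)
    else:
        op = s[0]
        lnode = parse_prefix_attempt1(s)   # same list: s[0] is still the operator
        rnode = parse_prefix_attempt1(s)
        return (op, lnode, rnode)

def recurses_forever(tokens):
    # Returns True if parsing hits Python's recursion limit.
    try:
        parse_prefix_attempt1(tokens)
    except RecursionError:
        return True
    return False

print(parse_prefix_attempt1(['3']))        # (3, None, None)
print(recurses_forever(['+', '3', '4']))   # True
```

Only an input consisting of a single integer is parsed correctly, since that is the one case that never recurses.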
Attempt #2 (doesn’t work): specify starting index
    def parse_prefix(s, idx=0):
        operators = {'+', '-', '*', '/'}
        if s[idx] not in operators:
            return (int(s[idx]), None, None)
        else:
            op = s[idx]
            lnode = parse_prefix(s, idx+1)
            rnode = parse_prefix(s, idx+1)
            return (op, lnode, rnode)
    T = parse_prefix(s)    # s is a list of strings
Doesn’t work: right operand index • Input: [operator][left operand][right operand] • Calling function:
    op = s[idx]
    lnode = parse_prefix(s, idx+1)
    rnode = parse_prefix(s, idx+1)
    return (op, lnode, rnode)
• Left operand begins immediately after operator • Index idx+1 in input string • Where does right operand begin? • Second argument should be greater than idx+1 • Need to know how large the left operand is
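To see the symptom concretely, the sketch below runs Attempt #2 (copied under the hypothetical name parse_prefix_attempt2) on the tokens of '+ 3 * 2 5'. Both operands are parsed starting at idx+1, so the left operand is simply parsed twice and the rest of the input is never read.

```python
def parse_prefix_attempt2(s, idx=0):
    # Attempt #2 from the slide, renamed for this demonstration.
    operators = {'+', '-', '*', '/'}
    if s[idx] not in operators:
        return (int(s[idx]), None, None)
    else:
        op = s[idx]
        lnode = parse_prefix_attempt2(s, idx + 1)
        rnode = parse_prefix_attempt2(s, idx + 1)   # bug: same start as the left operand
        return (op, lnode, rnode)

tokens = ['+', '3', '*', '2', '5']
print(parse_prefix_attempt2(tokens))
# ('+', (3, None, None), (3, None, None))
# The intended tree is ('+', (3, None, None), ('*', (2, None, None), (5, None, None))).
```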
Right operand index: use size of left subtree • Instead of just returning the node corresponding to a subtree for an operand:
    lnode = parse_prefix(s, idx+1)
• also return the size of the subtree (the number of tokens it covers):
    (lnode, lsz) = parse_prefix(s, idx+1)
• Now the calling function will know where to begin parsing the right operand in the input string: [operator] is at idx, [left operand] begins at idx+1, [right operand] begins at idx+1 + size(left)
Solution: also return size of subtree
    def parse_prefix(s, idx=0):
        operators = {'+', '-', '*', '/'}
        if s[idx] not in operators:     # base case: integer
            leaf = (int(s[idx]), None, None)
            return (leaf, 1)            # size of subtree
        else:
            op = s[idx]
            (lnode, lsz) = parse_prefix(s, idx+1)
            (rnode, rsz) = parse_prefix(s, idx+1 + lsz)
            parent = (op, lnode, rnode)
            return (parent, 1 + lsz + rsz)
    T = parse_prefix(s)[0]    # s is a list of strings; [0] selects the tree, [1] is its size
A complete program, with tokenization
    def tokenize(s):
        return s.split()                # split on runs of whitespace

    def parse_prefix(s, idx=0):
        operators = {'+', '-', '*', '/'}
        if s[idx] not in operators:     # base case: integer
            leaf = (int(s[idx]), None, None)
            return (leaf, 1)            # size of subtree
        else:
            op = s[idx]
            (lnode, lsz) = parse_prefix(s, idx+1)
            (rnode, rsz) = parse_prefix(s, idx+1 + lsz)
            parent = (op, lnode, rnode)
            return (parent, 1 + lsz + rsz)

    s = '+ 3 * 2 5'
    T = parse_prefix(tokenize(s))[0]
Example: sequence of function calls for input '+ 3 * 2 5' (s is its token list; indentation shows nesting of calls)
    parse_prefix(s, 0)              # parses + 3 * 2 5
        parse_prefix(s, 1)          # parses 3
        parse_prefix(s, 2)          # parses * 2 5
            parse_prefix(s, 3)      # parses 2
            parse_prefix(s, 4)      # parses 5
Example: function calls and return values (nodes and size) for input '+ 3 * 2 5' (s is its token list)
    parse_prefix(s, 0)                  # parses + 3 * 2 5
        parse_prefix(s, 1)              # parses 3
        returns ((3,None,None),1)
        parse_prefix(s, 2)              # parses * 2 5
            parse_prefix(s, 3)          # parses 2
            returns ((2,None,None),1)
            parse_prefix(s, 4)          # parses 5
            returns ((5,None,None),1)
        returns (('*',(2,None,None),(5,None,None)),3)
    returns (('+',(3,None,None),('*',(2,None,None),(5,None,None))),5)
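The call sequence above can be reproduced mechanically. The following is a sketch only: parse_prefix_traced is a hypothetical variant of the size-returning parser that takes two extra arguments (a log list and a depth counter, both invented here) and records each call and return value, indented by recursion depth.

```python
def parse_prefix_traced(s, log, idx=0, depth=0):
    # Same algorithm as the size-returning parser, plus logging.
    operators = {'+', '-', '*', '/'}
    indent = '    ' * depth
    log.append(indent + 'parse_prefix(s, %d)' % idx)
    if s[idx] not in operators:
        result = ((int(s[idx]), None, None), 1)
    else:
        op = s[idx]
        lnode, lsz = parse_prefix_traced(s, log, idx + 1, depth + 1)
        # the right operand starts after the operator and the whole left operand
        rnode, rsz = parse_prefix_traced(s, log, idx + 1 + lsz, depth + 1)
        result = ((op, lnode, rnode), 1 + lsz + rsz)
    log.append(indent + 'returns %r' % (result,))
    return result

tokens = '+ 3 * 2 5'.split()
log = []
tree, size = parse_prefix_traced(tokens, log)
print('\n'.join(log))
```

The returned size for the whole input equals the number of tokens (5 here), which gives a cheap check that the entire input was consumed.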
Parsing postfix • Postfix: 5 7 / 2 4 * 1 - + • Operator occurs after left and right operands • Requires shift-reduce parsing • Shift: create node according to position in input, advance one position • Reduce: when we see an operator in the input, construct a parent node for the two previous operands
Example • Input: 3 5 + Tokenize as ['3', '5', '+'] • Sequence of steps: 1. idx = 0 Shift: read 3, construct (3, None, None) 2. idx = 1 Shift: read 5, construct (5, None, None) 3. idx = 2 Reduce: read +, construct parent node: ('+', (3,None,None), (5,None,None))
Example • Input: 3 5 + 6 8 / * • Sequence of steps (some omitted): • Reduce: read +, construct parent node: ('+', (3, None, None), (5, None, None)) • Reduce: read /, construct parent node: ('/', (6, None, None), (8, None, None)) • Reduce: read *, construct parent node: ('*', ('+',(3,None,None),(5,None,None)), ('/',(6,None,None),(8,None,None)))
Reduce operation applies to the 2 most recent items • 3 5 + Apply + to a leaf and a leaf • 3 5 + 6 8 / * Apply * to a parent and a parent • 5 7 / 2 4 * 1 - + Apply - to a parent ('*', (2,None,None), (4,None,None)) and a leaf (1,None,None)
Can accumulate arbitrarily many operands before reducing • 1 2 3 4 5 + + + + = 1+(2+(3+(4+5))) in infix • Need a data structure to hold these operands • Keep track of their order • Operator applies to two most recent operands
Stack data structure • Stores a sequence of items • Example: stack of 1, 2, 3 (from top to bottom: 3, 2, 1) • Example: empty stack
Stack operations • Push: put an item on the top of the stack • Pop: take an item off the top of the stack • Example (stacks listed from top to bottom): start with 3, 2, 1 • Push 4: stack is 4, 3, 2, 1 • Pop: removes 4, stack is 3, 2, 1 again
Analogy to stack of plates: only (put on / take off of) the top http://www.gettyimages.com/detail/200131588-001/The-Image-Bank http://blog.timesunion.com/advocate/files/2008/09/stack_of_plates.jpg
The stack in shift-reduce parsing • Shift: • Push an item on top of the stack • Reduce: • Pop an item from the top (right operand) • Pop another item from the top (left operand) • Perform computation with operator • Push new item onto the stack
The stack in shift-reduce parsing • End result: after reading the entire input string, the result of the computation is the single item on the stack • (Assume that the input string is well-formed) • A stack is the memory used by a pushdown automaton (an abstract machine that recognizes context-free languages)
Example: parse this postfix expression: 5 7 / 2 4 * 1 - + • Initially: empty stack
Postfix: 5 7 / 2 4 * 1 - + • Read 5 • Push 5 • Stack (top first): 5 • In Python, this is [(5,None,None)]
Postfix: 5 7 / 2 4 * 1 - + • Read 7 • Push 7 • Stack (top first): 7, 5 • In Python, this is [(5,None,None), (7,None,None)]
Postfix: 5 7 / 2 4 * 1 - + • Read / • Pop 7 • Pop 5 • Construct node • Push node • Stack (top first): (/, 5, 7) • In Python, this is [('/', (5,None,None), (7,None,None))]
Postfix: 5 7 / 2 4 * 1 - + • Read 2 • Push 2 • Stack (top first): 2, (/, 5, 7) • In Python, this is [('/', (5,None,None), (7,None,None)), (2,None,None)]
Postfix: 5 7 / 2 4 * 1 - + • Read 4 • Push 4 • Stack (top first): 4, 2, (/, 5, 7)
Postfix: 5 7 / 2 4 * 1 - + • Read * • Pop 4 • Pop 2 • Construct node • Push node • Stack (top first): (*, 2, 4), (/, 5, 7)
Postfix: 5 7 / 2 4 * 1 - + • Read 1 • Push 1 • Stack (top first): 1, (*, 2, 4), (/, 5, 7)
Postfix: 5 7 / 2 4 * 1 - + • Read - • Pop 1 • Pop (*, 2, 4) • Construct node • Push node • Stack (top first): (-, (*, 2, 4), 1), (/, 5, 7)
Postfix: 5 7 / 2 4 * 1 - + • Read + • Pop (-, (*, 2, 4), 1) • Pop (/, 5, 7) • Construct node • Push node • Stack (top first): (+, (/, 5, 7), (-, (*, 2, 4), 1))
Postfix: 5 7 / 2 4 * 1 - + • End of input string • Stop • Result on top of stack (+, ( /, 5, 7 ), (-, (*, 2, 4), 1)) In Python, this is [('+', ('/', (5,None,None), (7,None,None)), ('-', ('*', (2,None,None), (4,None,None)), (1,None,None)))]
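The walkthrough above can be replayed in code. This is a sketch under the same well-formedness assumption; postfix_stack_trace is a name invented here. It performs the shift and reduce steps and records a copy of the stack after each token, with the top of the stack at the end of each list.

```python
def postfix_stack_trace(tokens):
    # Returns a list of (token, stack contents after processing that token).
    operators = {'+', '-', '*', '/'}
    stack = []
    snapshots = []
    for tok in tokens:
        if tok not in operators:
            stack.append((int(tok), None, None))    # shift: push a leaf
        else:
            rnode = stack.pop()                     # reduce: right operand is on top
            lnode = stack.pop()
            stack.append((tok, lnode, rnode))
        snapshots.append((tok, list(stack)))        # copy, since stack keeps changing
    return snapshots

for tok, stack in postfix_stack_trace('5 7 / 2 4 * 1 - +'.split()):
    print('read', tok, '->', stack)
```

For this input the stack sizes after each token are 1, 2, 1, 2, 3, 2, 3, 2, 1, matching the slides.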
Implementing a stack in Python • Want: • Sequence of items • Push: add to end of sequence • Pop: remove from end of sequence • Use a list • Push is list.append • Pop is list.pop
    >>> help(list.pop)
    Help on method_descriptor:
    pop(...)
        L.pop([index]) -> item -- remove and return item at index (default last).
        Raises IndexError if list is empty or index is out of range.
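A brief illustration of a list used as a stack, with the top of the stack at the end of the list. The variable names are arbitrary.

```python
stack = [1, 2, 3]      # 3 is on top (the end of the list)
stack.append(4)        # push 4
print(stack)           # [1, 2, 3, 4]

top = stack.pop()      # pop removes and returns the last item
print(top, stack)      # 4 [1, 2, 3]

empty = []
try:
    empty.pop()        # popping an empty list raises IndexError
except IndexError:
    print('cannot pop from an empty stack')
```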
Code for parsing postfix
    def parse_postfix(s):       # s is a list of strings
        stack = []
        operators = {'+', '-', '*', '/'}
        for x in s:
            if x not in operators:
                leaf = (int(x), None, None)
                stack.append(leaf)      # push on top of stack
            else:
                rnode = stack.pop()     # right operand was pushed last
                lnode = stack.pop()
                parent = (x, lnode, rnode)
                stack.append(parent)
        return stack[0]                 # single node on stack
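The parser above assumes well-formed input. As a possible extension (not part of the lecture), the sketch below adds two checks: an operator must find two operands on the stack, and exactly one tree must remain at the end. The name parse_postfix_checked and the error messages are invented for this example.

```python
def parse_postfix_checked(tokens):
    operators = {'+', '-', '*', '/'}
    stack = []
    for tok in tokens:
        if tok in operators:
            if len(stack) < 2:
                raise ValueError('operator %r needs two operands' % tok)
            rnode = stack.pop()
            lnode = stack.pop()
            stack.append((tok, lnode, rnode))
        else:
            # int() itself raises ValueError for a token that is not an integer
            stack.append((int(tok), None, None))
    if len(stack) != 1:
        raise ValueError('expected 1 tree on the stack, found %d' % len(stack))
    return stack[0]

print(parse_postfix_checked('3 5 +'.split()))
# ('+', (3, None, None), (5, None, None))

for bad in (['3', '+'], ['3', '5'], []):
    try:
        parse_postfix_checked(bad)
    except ValueError as err:
        print(bad, '->', err)
```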
Can use stack to evaluate an expression directly, without constructing a tree first • Instead of: read 5, push 5; read 7, push 7; read /, pop 7, pop 5, construct node, push (/, 5, 7) • Do: read 5, push 5; read 7, push 7; read /, pop 7, pop 5, compute 5 / 7, push the result 0 (integer division)
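A minimal sketch of direct evaluation, assuming well-formed input. The slide's result of 0 for 5 / 7 implies integer division, so this sketch maps / to floor division (operator.floordiv); that choice, and the name eval_postfix, are assumptions made here and not taken from the lecture.

```python
import operator

def eval_postfix(tokens):
    # Map each operator symbol to a two-argument function.
    # '/' uses floor division so that 5 / 7 gives 0, as on the slide.
    ops = {'+': operator.add, '-': operator.sub,
           '*': operator.mul, '/': operator.floordiv}
    stack = []
    for tok in tokens:
        if tok in ops:
            right = stack.pop()     # right operand is on top
            left = stack.pop()
            stack.append(ops[tok](left, right))   # push a number, not a node
        else:
            stack.append(int(tok))
    return stack[0]

print(eval_postfix('5 7 /'.split()))               # 0
print(eval_postfix('5 7 / 2 4 * 1 - +'.split()))   # 7
```

The only difference from the tree-building parser is what gets pushed: a computed number in place of a tuple node.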
Later: parsing infix • Infix: (5 / 7) + ((2 * 4) - 1) • Prefix: + / 5 7 - * 2 4 1 • Postfix: 5 7 / 2 4 * 1 - + • [Figure: the expression tree shared by all three notations]
Due 10/12 • Convert this prefix expression to infix: - * / 8 3 * + 7 4 2 9 • Draw the sequence of stack operations for parsing this postfix expression: 1 2 3 + 4 5 6 * / - +