LING 388 Language and Computers Lecture 5 9/16/03 Sandiway FONG
Administrivia • Homework 1 due today • ling_388@yahoo.com • Grading • Homework 1 • 8 pts (Q1: 2, Q2: 3, Q3: 3) • In general • 2 homeworks every month • 6-8 homeworks (75% of the grade) • Final paper/extended homework • 25% of the grade
Administrivia • Poll: • How many would want an extra (optional) computer lab session? • Send email to Charles if interested • Lecture Notes Explained • Designed to cover all the material explained in class • Not sufficient by themselves – come to class • Ask for help if you miss a class and need material explained
Administrivia • Next Tuesday • Computer Lab Class • SBS 224 • Homework 2 • TA Office Hours • 11 am – 1 pm Mondays • Haury 317
Review • Last Lecture: Introduction • Concepts: • alphabet, string, language • regular expressions (+, ^n, *, |) • Finite state automata (FSA) • Quintuple (S, S_initial, S_final, Alphabet, σ) • Two possible encodings in Prolog based on • 1: one predicate per state • 2: predicate encoding the transition function σ
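The quintuple above can be sketched as plain data plus a loop. This is only an illustration in Python (the lecture's encodings are in Prolog), and the machine, the state names s0 and s1, and the function name accepts are assumptions made for the example. It follows the spirit of encoding 2: the transition function is stored as a table.

```python
# Hypothetical FSA for a+ written as the quintuple
# (states, initial state, final states, alphabet, transition function).
STATES = {"s0", "s1"}
INITIAL = "s0"
FINAL = {"s1"}
ALPHABET = {"a"}
# Transition function as a table: (state, symbol) -> next state.
TRANSITIONS = {("s0", "a"): "s1", ("s1", "a"): "s1"}


def accepts(string):
    """Return True if the FSA is in a final state after reading string."""
    state = INITIAL
    for symbol in string:
        if (state, symbol) not in TRANSITIONS:
            return False  # no transition defined for this input: reject
        state = TRANSITIONS[(state, symbol)]
    return state in FINAL
```

For example, accepts("aaa") holds, while accepts("") and accepts("ab") do not.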
Equivalence • FSA <-> Regular Expressions <-> Regular Grammars
Regular Expression <-> FSA • Cases (by example) • a • a* • a+ • a|b • Note: • Optionality a? (= a|λ, where λ is the empty string) • [Figure: FSA state diagrams for the cases, arcs labeled a and b]
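The cases can be tried out with Python's standard re module. This is a sketch for experimenting with the notation only: re is a backtracking matcher, not an FSA, and the helper name matches and the sample strings are assumptions made for the example.

```python
import re

# Pattern -> sample strings that the pattern should accept in full.
CASES = {
    "a": ["a"],
    "a*": ["", "a", "aaa"],
    "a+": ["a", "aaa"],
    "a|b": ["a", "b"],
    "a?": ["", "a"],  # optionality: a or the empty string
}


def matches(pattern, string):
    """Return True if the whole string matches the pattern."""
    return re.fullmatch(pattern, string) is not None


for pattern, examples in CASES.items():
    for s in examples:
        assert matches(pattern, s)
```

Note the difference between a* and a+: only a* accepts the empty string.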
NDFSA to FSA • Non-deterministic FSA (NDFSA) • Allows a choice of next state on a given input character • Is this a significant change? • No! Not more powerful than a regular, i.e. deterministic, FSA • [Figure: state diagram with states x, y, z and arcs labeled a]
NDFSA to FSA • How to simulate an NDFSA using an FSA • Keep track of states in parallel • Create “composite state” • Example: from x on input a, the composite state {y,z} • [Figure: state diagram with states x, y, z and arcs labeled a]
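Tracking states in parallel can be sketched as follows. The NDFSA here is hypothetical: it reuses the state names x, y, z, but the extra loop on z and the choice of final state are assumptions made for the example. Each composite state is a Python frozenset, and determinize builds the deterministic transition table over those composite states.

```python
# Hypothetical NDFSA: from x, the symbol a leads to either y or z.
NTRANSITIONS = {("x", "a"): {"y", "z"}, ("z", "a"): {"z"}}
INITIAL = "x"
FINAL = {"z"}


def step(states, symbol):
    """Advance every state in the composite state in parallel."""
    nxt = set()
    for s in states:
        nxt |= NTRANSITIONS.get((s, symbol), set())
    return frozenset(nxt)


def accepts(string):
    """Accept if any state in the final composite state is a final state."""
    current = frozenset({INITIAL})
    for symbol in string:
        current = step(current, symbol)
    return bool(current & FINAL)


def determinize(alphabet):
    """Build a deterministic table whose states are composite states."""
    start = frozenset({INITIAL})
    table, todo, seen = {}, [start], {start}
    while todo:
        composite = todo.pop()
        for symbol in alphabet:
            nxt = step(composite, symbol)
            if not nxt:
                continue  # no move on this symbol
            table[(composite, symbol)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return table
```

Because there are only finitely many subsets of a finite state set, the deterministic table is always finite.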
Equivalence (Revised) • FSA <-> Regular Expressions <-> Regular Grammars <-> NDFSA
The Power of FSA • Many languages cannot be simulated by FSA • …including human language • Example: • constructions involving embedding • FSA are (by definition) fixed • fixed number of states • Can view states as storage, i.e. memory • Lack of flexibility in terms of what can be memorized means limited expressive power
The Power of FSA • An illustration of the limits of FSA • Consider • L = { ab, aabb, aaabbb, …} • “same number of b’s as a’s…” • = {a^n b^n | n >= 1}
The Power of FSA • We can build FSA for… • ab • aabb • aaabbb • [Figure: three separate FSA, one per string, with arcs labeled a and b and the end states marked]
The Power of FSA • We can merge FSA for… • ab • aabb • aaabbb • [Figure: a single merged FSA for the three strings, with arcs labeled a and b]
The Power of FSA • However, we cannot directly encode the infinite set L = {a^n b^n | n >= 1} = { ab, aabb, aaabbb, …} • Question: Why? • Answer: • Direct encoding would require an infinite number of states • Because L has an infinite number of members • And we’re using Finite State Automata
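The state-count argument can be made concrete with a sketch that builds a merged FSA for a^n b^n with n limited to some bound k. The construction, the state naming ("a", i) and ("b", j), and the function names are assumptions made for the example, not the lecture's own encoding. The point to notice is that the number of states grows with k, so no finite k covers all of L.

```python
def build_bounded_fsa(k):
    """Transition table for {a^n b^n | 1 <= n <= k}.

    State ("a", i): i a's read so far. State ("b", j): j b's still owed.
    """
    transitions = {}
    for i in range(k):
        transitions[(("a", i), "a")] = ("a", i + 1)
    for i in range(1, k + 1):
        transitions[(("a", i), "b")] = ("b", i - 1)
    for j in range(1, k):
        transitions[(("b", j), "b")] = ("b", j - 1)
    return transitions


def accepts(transitions, string):
    """Run the FSA from ("a", 0); the only end state is ("b", 0)."""
    state = ("a", 0)
    for symbol in string:
        state = transitions.get((state, symbol))
        if state is None:
            return False
    return state == ("b", 0)


def count_states(transitions):
    """Count the distinct states mentioned in the table."""
    states = {("a", 0)}
    for (source, _), target in transitions.items():
        states.add(source)
        states.add(target)
    return len(states)
```

With k = 3 the machine accepts ab, aabb and aaabbb but rejects aaaabbbb; raising k to 4 fixes that string at the cost of more states.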
The Power of FSA • Note: FSA does allow infinity to be coded • …in the form of loops • Example: (ab)+ • Empty character = ε • [Figure: FSA for (ab)+ with arcs labeled a, b and ε]
The Power of FSA • However, the loop construct does not help us in this case • Question: Why not? • Answer: • Because a loop allows any number of iterations, with no way to count them • And L = {a^n b^n | n >= 1} requires a controlled number n of a’s and b’s
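A short sketch of why loops overshoot. The pattern a+b+ stands in for an FSA with one loop on a and one loop on b (an assumption made for the example, using Python's re module for convenience), and is_anbn is a hypothetical checker that counts, which is exactly what a fixed set of states cannot do for unbounded n.

```python
import re


def loop_fsa_accepts(string):
    """Two free loops: one or more a's followed by one or more b's."""
    return re.fullmatch(r"a+b+", string) is not None


def is_anbn(string):
    """Check membership in {a^n b^n | n >= 1} by counting.

    The integer n is memory whose size is not fixed in advance.
    """
    n = len(string) // 2
    return n >= 1 and string == "a" * n + "b" * n
```

The loop version accepts aab, which is not in L; the counting version rejects it.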
Grammars • A grammar G consists of • A set of non-terminal symbols (V_N) • A set of terminal symbols (V_T) • A starting non-terminal symbol (S) • A set of production rules of the form: • X -> Y_1…Y_n, n >= 0 • To be read as: “X rewrites to symbols Y_1 through Y_n” • X ∈ V_N, Y_i ∈ (V_N ∪ V_T), 1 <= i <= n
Grammars • Other common terminology: • L(G) is the language of grammar G • G generates L(G) • Beginning with the start symbol S… • L(G) is the set of all strings containing only terminal symbols, i.e. symbols from VT, that can be generated by (repeated) application of the production rules • A sentential form is any string containing terminal and non-terminal symbols generated by G
Grammars • Example: • Suppose we have a grammar G with production rules: • S -> NP VP • NP -> Det N • Det -> the • Det -> a • N -> man • N -> sandwich • VP -> V • VP -> V NP • V -> ate • Starting non-terminal symbol: S • V_N = {S, NP, Det, N, VP, V} (set of non-terminals) • Symbols that can appear on the left side of a production rule • V_T = {the, a, man, sandwich, ate} (set of terminals) • Symbols that do not appear on the left side of any production rule • Note: • Non-terminals may appear on both sides of production rules
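The two set definitions can be checked mechanically. In this sketch the grammar G is written as a list of (left side, right side) pairs; the names RULES, VN and VT are assumptions made for the example.

```python
# Grammar G: each rule is (left-hand side, list of right-hand side symbols).
RULES = [
    ("S", ["NP", "VP"]),
    ("NP", ["Det", "N"]),
    ("Det", ["the"]),
    ("Det", ["a"]),
    ("N", ["man"]),
    ("N", ["sandwich"]),
    ("VP", ["V"]),
    ("VP", ["V", "NP"]),
    ("V", ["ate"]),
]

# Non-terminals: symbols that appear on the left side of some rule.
VN = {lhs for lhs, _ in RULES}

# Terminals: right-side symbols that never appear on a left side.
VT = {symbol for _, rhs in RULES for symbol in rhs} - VN
```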
Grammars • Production rule application • Example: • G generates the man ate a sandwich • Sentential forms: • S • NP VP • Det N VP • the N VP • the man VP • the man V NP • the man ate NP • the man ate Det N • the man ate a N • the man ate a sandwich • Note: • In this derivation, the leftmost non-terminal (marked in color on the original slide) is expanded next
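The derivation can be replayed step by step. In this sketch the grammar is a dictionary from each non-terminal to its alternatives, and each entry in choices picks which alternative to apply to the leftmost non-terminal; the names RULES, leftmost_derivation and choices are assumptions made for the example.

```python
# Grammar G: non-terminal -> list of alternative right-hand sides.
RULES = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "Det": [["the"], ["a"]],
    "N": [["man"], ["sandwich"]],
    "VP": [["V"], ["V", "NP"]],
    "V": [["ate"]],
}


def leftmost_derivation(choices, start="S"):
    """Return the sentential forms produced by rewriting the leftmost
    non-terminal, using choices[i] to select the alternative at step i."""
    form = [start]
    forms = [" ".join(form)]
    for choice in choices:
        # Position of the leftmost non-terminal in the current form.
        idx = next(i for i, sym in enumerate(form) if sym in RULES)
        form = form[:idx] + RULES[form[idx]][choice] + form[idx + 1:]
        forms.append(" ".join(form))
    return forms


# Alternatives chosen at each step for "the man ate a sandwich".
FORMS = leftmost_derivation([0, 0, 0, 0, 1, 0, 0, 1, 1])
```

FORMS then holds the same ten sentential forms as the slide, from S to the man ate a sandwich.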
Grammars • There are other equally valid ways of deriving the man ate a sandwich • Example: • Sentential forms • S • NP VP • NP V NP • NP ate NP • NP ate Det N • Det N ate Det N • Det N ate a N • the N ate a N • the N ate a sandwich • the man ate a sandwich
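Since the order of expansion does not change which terminal strings can be reached, L(G) for this small grammar can be enumerated by always expanding the leftmost non-terminal and trying every alternative. This is a sketch under the assumption that the grammar generates a finite language, as G does; with a recursive rule the loop would not terminate. The names RULES and language are assumptions made for the example.

```python
from collections import deque

# Grammar G: non-terminal -> list of alternative right-hand sides.
RULES = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "Det": [["the"], ["a"]],
    "N": [["man"], ["sandwich"]],
    "VP": [["V"], ["V", "NP"]],
    "V": [["ate"]],
}


def language(start="S"):
    """Collect every terminal string derivable from start."""
    results = set()
    seen = {(start,)}
    queue = deque([(start,)])
    while queue:
        form = queue.popleft()
        # Leftmost non-terminal, or None if the form is all terminals.
        idx = next((i for i, sym in enumerate(form) if sym in RULES), None)
        if idx is None:
            results.add(" ".join(form))
            continue
        for rhs in RULES[form[idx]]:
            new_form = form[:idx] + tuple(rhs) + form[idx + 1:]
            if new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return results
```

Note that G also generates odd sentences such as a sandwich ate the man; the grammar only constrains word order, not meaning.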
Grammars • Many different possible strategies for rewriting production rules: • Left to right, right to left, middle out • Top-down, bottom-up • Left corner, head-driven • For Generation: • In terms of correctness of derivation, • doesn’t really matter which strategies we pick (or mix together) as long as we can connect the start symbol with the terminal string we want • For Parsing: • i.e. the opposite process of starting with the terminal string and deriving the start symbol • We may want to restrict strategies to operate… • Left to right – words are heard in left to right order
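One of the strategy combinations, top-down and left to right, can be sketched as a small recognizer for G. It starts from S, expands non-terminals, and matches terminals against the words in the order they are heard. The names RULES, parse and recognizes are assumptions made for the example, and the sketch relies on G having no left-recursive rules; with a rule such as NP -> NP PP it would recurse forever.

```python
# Grammar G: non-terminal -> list of alternative right-hand sides.
RULES = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "Det": [["the"], ["a"]],
    "N": [["man"], ["sandwich"]],
    "VP": [["V"], ["V", "NP"]],
    "V": [["ate"]],
}


def parse(symbols, words):
    """Yield the words left over after symbols consume a prefix of words."""
    if not symbols:
        yield words
        return
    first, rest = symbols[0], symbols[1:]
    if first in RULES:
        # Non-terminal: try each alternative in turn (top-down).
        for rhs in RULES[first]:
            yield from parse(tuple(rhs) + rest, words)
    elif words and words[0] == first:
        # Terminal: must match the next word (left to right).
        yield from parse(rest, words[1:])


def recognizes(sentence):
    """True if S can be rewritten to exactly the words of the sentence."""
    words = tuple(sentence.split())
    return any(remaining == () for remaining in parse(("S",), words))
```

For example, recognizes("the man ate a sandwich") holds, while recognizes("man the ate") does not.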
Grammars • Other Considerations • Computational efficiency • How many rewrites does it take to generate or parse a sentence using a particular strategy? • How much memory is needed? Is it different for different strategies? • Looking ahead… • Generally speaking, bottom-up parsers are faster than top-down parsers • We will see later in the course…