
LING 388 Language and Computers


Presentation Transcript


  1. LING 388 Language and Computers Lecture 5 9/16/03 Sandiway FONG

  2. Administrivia • Homework 1 due today • ling_388@yahoo.com • Grading • Homework 1 • 8 pts (Q1: 2, Q2: 3, Q3: 3) • In general • 2 homeworks every month • 6-8 homeworks (75% of the grade) • Final paper/extended homework • 25% of the grade

  3. Administrivia • Poll: • How many would want an extra (optional) computer lab session? • Send email to Charles if interested • Lecture Notes Explained • Designed to cover all the material explained in class • Not sufficient by themselves – come to class • Ask for help if you miss a class and need material explained

  4. Administrivia • Next Tuesday • Computer Lab Class • SBS 224 • Homework 2 • TA Office Hours • 11 am – 1 pm Mondays • Haury 317

  5. Review • Last Lecture: Introduction • Concepts: • alphabet, string, language • regular expressions (+, n, *, |) • Finite state automata (FSA) • Quintuple (S, S_initial, S_final, Alphabet, s) • Two possible encodings in Prolog based on • 1: one predicate per state • 2: a predicate encoding the transition function s
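
For reference, here is a minimal sketch of the two encodings, using a tiny FSA that accepts a+ (one or more a's). The predicate names (s0/1, s1/1, start/1, final/1, delta/3, accept/1) are illustrative rather than the ones used in class, and each code sketch in these notes is meant to be consulted on its own.

    % Encoding 1: one predicate per state; the argument is the remaining input.
    s0([a|Rest]) :- s1(Rest).        % from s0, reading 'a' moves to s1
    s1([]).                          % s1 is a final state: accept when input is used up
    s1([a|Rest]) :- s1(Rest).        % loop on 'a' in s1

    % Encoding 2: the transition function as a table of delta/3 facts.
    start(s0).
    final(s1).
    delta(s0, a, s1).
    delta(s1, a, s1).

    % Generic driver for the table encoding.
    accept(String)      :- start(S), accept(S, String).
    accept(S, [])       :- final(S).
    accept(S, [C|Rest]) :- delta(S, C, S2), accept(S2, Rest).

Either version accepts the same strings: ?- s0([a,a,a]). and ?- accept([a,a,a]). both succeed, while ?- accept([]). fails.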

  6. [Roadmap diagram: equivalence of FSA and Regular Expressions; Regular Grammars]

  7. Regular Expression <-> FSA • Cases (by example) • a • a* • a+ • a|b • Note: • Optionality a? (= a|λ) • [FSA diagrams for each case]
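
As a standalone illustrative sketch in the same hypothetical delta/3 table encoding, alternation is just two edges out of the start state:

    % a|b : either symbol takes the start state to the final state.
    start(q0).
    final(q1).
    delta(q0, a, q1).
    delta(q0, b, q1).

    % a? (= a|λ) works similarly: a one-edge machine for 'a' whose
    % start state is also declared final, so the empty string is accepted too.

    % Same generic driver as in the earlier sketch.
    accept(String)      :- start(S), accept(S, String).
    accept(S, [])       :- final(S).
    accept(S, [C|Rest]) :- delta(S, C, S2), accept(S2, Rest).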

  8. NDFSA to FSA • Non-deterministic FSA • Allows choice of next state on a given input character • Is this a significant change? • No! Not more powerful than a regular, i.e. deterministic, FSA • [diagram: from state x, input a can lead to y or to z]

  9. NDFSA to FSA • How to simulate an NDFSA using an FSA • Keep track of states in parallel • Create “composite state” • Example: from x on input a, composite state v = {y, z}
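
A minimal sketch of the composite-state idea, assuming the tiny NDFSA from the slide (from x, input a can lead to y or to z) and illustrative predicate names: the deterministic simulation simply carries the set of states the NDFSA could currently be in.

    :- use_module(library(lists)).   % for member/2 (autoloaded in SWI-Prolog)

    % Non-deterministic transitions from the slide.
    ndelta(x, a, y).
    ndelta(x, a, z).
    nstart(x).
    nfinal(y).

    % Deterministic simulation: the "state" is a set (list) of NDFSA states,
    % e.g. the composite state v = [y, z].
    nd_accept(String) :-
        nstart(S0),
        nd_accept([S0], String).

    nd_accept(States, []) :-
        member(S, States), nfinal(S).      % accept if any member state is final
    nd_accept(States, [C|Rest]) :-
        setof(S2, S^(member(S, States), ndelta(S, C, S2)), States2),
        nd_accept(States2, Rest).

?- nd_accept([a]). succeeds: after reading a, the composite state is [y, z], and y is final.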

  10. [Roadmap diagram, revised: equivalence of FSA, Regular Expressions, and NDFSA; Regular Grammars]

  11. The Power of FSA • Many languages cannot be simulated by FSA • …including human language • Example: • constructions involving embedding • FSA are (by definition) fixed • fixed number of states • Can view states as storage, i.e. memory • Lack of flexibility in terms of what can be memorized means limited expressive power

  12. The Power of FSA • An illustration of the limits of FSA • Consider • L = { ab, aabb, aaabbb, …} • “a’s followed by the same number of b’s…” • = {a^n b^n | n >= 1}

  13. The Power of FSA • We can build an FSA for… • ab • aabb • aaabbb • [diagrams: one FSA per string; marked circle = end state]

  14. The Power of FSA • We can merge the FSA for… • ab • aabb • aaabbb • [diagram: the three machines merged into one]
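
One way to write the merged machine down in the table encoding, as a standalone sketch (the state names are made up; the diagram on the slide may group things differently). Note how each additional string costs additional states:

    % Merged FSA accepting exactly { ab, aabb, aaabbb }.
    start(q0).
    final(f).
    delta(q0, a, q1).
    delta(q1, b, f).                                            % ab
    delta(q1, a, q2).
    delta(q2, b, q3).   delta(q3, b, f).                        % aabb
    delta(q2, a, q4).
    delta(q4, b, q5).   delta(q5, b, q6).   delta(q6, b, f).    % aaabbb

    % Same generic driver as before.
    accept(String)      :- start(S), accept(S, String).
    accept(S, [])       :- final(S).
    accept(S, [C|Rest]) :- delta(S, C, S2), accept(S2, Rest).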

  15. The Power of FSA • However, we cannot directly encode the infinite set L = {a^n b^n | n >= 1} = { ab, aabb, aaabbb, …} • Question: Why? • Answer: • Direct encoding would require an infinite number of states • Because L has an infinite number of members • And we’re using Finite State Automata

  16. The Power of FSA • Note: FSA does allow infinity to be coded • …in the form of loops • Example: (ab)+ • Empty character = ε • [diagram: a and b transitions with an ε-edge looping back to the start state]
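
A standalone sketch of the (ab)+ machine, with the ε-edge treated as a transition that consumes no input (predicate names as in the earlier sketches; the atom eps stands in for ε):

    % (ab)+ : after accepting 'ab', an ε-edge loops back to the start.
    start(q0).
    final(q2).
    delta(q0, a, q1).
    delta(q1, b, q2).
    delta(q2, eps, q0).                 % the ε-edge

    accept(String)      :- start(S), accept(S, String).
    accept(S, [])       :- final(S).
    accept(S, [C|Rest]) :- delta(S, C, S2), accept(S2, Rest).
    accept(S, String)   :- delta(S, eps, S2), accept(S2, String).   % ε-move: consume nothing

?- accept([a,b,a,b]). succeeds, and so does any number of ab repetitions.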

  17. The Power of FSA • However, the loop construct does not help us in this case • Question: Why not? • Answer: • Because loops freely allow any number of iterations • And L = {a^n b^n | n >= 1} requires a matched number n of a’s and b’s

  18. Grammars • A grammar G consists of • A set of non-terminal symbols (VN) • A set of terminal symbols (VT) • A starting non-terminal symbol (S) • A set of production rules of the form: • X -> Y1 … Yn, n >= 0 • To be read as: “X rewrites to symbols Y1 through Yn” • X ∈ VN, Yi ∈ (VN ∪ VT), 1 <= i <= n

  19. Grammars • Other common terminology: • L(G) is the language of grammar G • G generates L(G) • Beginning with the start symbol S… • L(G) is the set of all strings containing only terminal symbols, i.e. symbols from VT, that can be generated by (repeated) application of the production rules • A sentential form is any string of terminal and/or non-terminal symbols that G can derive from S

  20. Grammars • Example: • Suppose we have a grammar G with production rules: • S -> NP VP • NP -> Det N • Det -> the Det -> a • N -> man N -> sandwich • VP -> V VP -> V NP • V -> ate • Starting non-terminal symbol: S • VN = {S, NP, Det, N, VP, V} (set of non-terminals) • Symbols that can appear on the left side of a production rule • VT = {the, a, man, sandwich, ate} (set of terminals) • Symbols that do not appear on the left side of any production rule • Note: • Non-terminals may appear on both sides of production rules
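
The production rules above carry over almost line for line into Prolog's Definite Clause Grammar (DCG) notation; the sketch below is one way to write them, not necessarily the encoding used in class (non-terminals are lower-cased because capitalized names are Prolog variables).

    % Grammar G as a DCG.
    s   --> np, vp.
    np  --> det, n.
    det --> [the].
    det --> [a].
    n   --> [man].
    n   --> [sandwich].
    vp  --> v.
    vp  --> v, np.
    v   --> [ate].

With this loaded, ?- phrase(s, [the, man, ate, a, sandwich]). succeeds, i.e. the string is in L(G).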

  21. Grammars • Production rule application • Example: • G generates the man ate a sandwich • Sentential forms: • S • NP VP • Det N VP • the N VP • the man VP • the man V NP • the man ate NP • the man ate Det N • the man ate a N • the man ate a sandwich • Note: • In this derivation, the non-terminal expanded at each step (here always the leftmost one) was marked in color on the slide

  22. Grammars • There are other equally valid ways of deriving the man ate a sandwich • Example: • Sentential forms • S • NP VP • NP V NP • NP ate NP • NP ate Det N • Det N ate Det N • Det N ate a N • the N ate a N • the N ate a sandwich • the man ate a sandwich

  23. Grammars • Many different possible strategies for applying production rules: • Left to right, right to left, middle out • Top-down, bottom-up • Left corner, head-driven • For Generation: • In terms of correctness of the derivation, • it doesn’t really matter which strategy we pick (or mix together) as long as we can connect the start symbol with the terminal string we want • For Parsing: • i.e. the opposite process of starting with the terminal string and deriving the start symbol • We may want to restrict strategies to operate… • Left to right – words are heard in left-to-right order
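
As a concrete tie-in to the DCG sketch above: Prolog executes DCG rules top-down and left to right, so a phrase/2 query follows a leftmost derivation like the one on slide 21, and the same grammar can be run for generation as well as parsing. Assuming the DCG sketch has been consulted:

    % Parsing: start symbol + terminal string (top-down, left to right).
    ?- phrase(s, [the, man, ate, a, sandwich]).
    % true.

    % Generation: leave the word list unbound and enumerate strings in L(G).
    ?- phrase(s, Words).
    % Words = [the, man, ate] ;
    % Words = [the, man, ate, the, man] ;
    % ...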

  24. Grammars • Other Considerations • Computational efficiency • How many rewrites does it take to generate or parse a sentence using a particular strategy? • How much memory is needed? Is it different for different strategies? • Looking ahead… • Generally speaking, bottom-up parsers are faster than top-down parsers • We will see later in the course…
