280 likes | 406 Views
Parsing with Context-Free Grammars for ASR. Julia Hirschberg CS 4706 Slides with contributions from Owen Rambow , Kathy McKeown , Dan Jurafsky and James Martin. What is Syntax?. Structure of language How words are arranged together and related to one another
E N D
Parsing withContext-Free Grammars for ASR Julia Hirschberg CS 4706 Slides with contributions from Owen Rambow, Kathy McKeown, Dan Jurafsky and James Martin
What is Syntax? • Structure of language • How words are arranged together and related to one another • Goal of syntactic analysis: relate surface form (what someone says or writes) to underlying structure, to support semantic analysis (what the utterance or text means) • Syntactic representation: typically a tree structure
Structure in Strings • A set of words, or, a lexicon: the a small nice big very boy girl sees likes • Some `good’ (grammatical) sentences: • the boy likes a girl • the small girl likes the big girl • a very small nice boy sees a very nice boy • Some bad (ungrammatical) sentences: • *the boy the girl • *small boy likes nice girl • Can we find a way of distinguishing between the two kinds of sequences? • Can we identify similarities among grammatical subsequences?
One Version of Constituent Structure • Lexicon: the a small nice big very boy girl sees likes • Grammatical sentences: • (the) boy (likes a girl) • (the small) girl (likes the big girl) • (a very small nice) boy (sees a very nice boy) • Ungrammatical sentences: • *(the) boy (the girl) • *(small) boy (likes the nice girl)
Another Constituency Hypothesis • Lexicon: the a small nice big very boy girl sees likes • Grammatical sentences: • (the boy) likes (a girl) • (the small girl) likes (the big girl) • (a very small nice boy) sees (a very nice boy) • Ungrammatical sentences: • *(the boy)(the girl) • *(small boy) likes (the nice girl) • Better: fewer types of constituents (blue and red are of same type)
Even More Structures • Lexicon: the a small nice big very boy girl sees likes • Grammatical sentences: • ((the) boy) likes ((a) girl) • ((the) (small) girl) likes ((the) (big) girl) • ((a) ((very) small) (nice) boy) sees ((a) ((very) nice) girl) • Ungrammatical sentences: • *((the) boy)((the) girl) • *((small) boy) likes ((the) (nice) girl)
From Substrings to Trees likes boy girl the a (((the) boy) likes ((a) girl))
How do we Label the Nodes? • ( ((the) boy) likes ((a) girl) ) • Choose constituents so each one has one non-bracketed word: the head • Group words by distribution of constituents they head (POS) • Noun (N), verb (V), adjective (Adj), adverb (Adv), determiner (Det) • Category of constituent: XP, where X is POS • NP, S, AdjP, AdvP, DetP
Types of Nodes S nonterminal symbols = constituents NP likes NP boy girl DetP DetP the a terminal symbols = words Phrase-structure tree (((the/Det) boy/N) likes/V ((a/Det) girl/N))
Determining Part-of-Speech A blue seat/a child seat: noun or adjective? • Syntax: • a blue seat a child seat • a very blue seat *a very child seat • this seat is blue *this seat is child • Morphology: • bluer *childer • blue and child are not the same POS • blueis Adj, child is Noun
Determining Part-of-Speech • Preposition or particle? • A he threw out the garbage • B he threw the garbage out the door • A he threw the garbage out • B *he threw the garbage the door out • The two out are not same POS • A is particle, B is Preposition
Constituency • Some Noun phrases (NPs) • A red dog on a blue tree • A blue dog on a red tree • Some big dogs and some little dogs • A dog • I • Big dogs, little dogs, red dogs, blue dogs, yellow dogs, green dogs, black dogs, and white dogs • How do we know these form a constituent?
NP Constituency • NPs can all appear before a verb: • Some big dogs and some little dogs are going around in cars… • Big dogs, little dogs, red dogs, blue dogs, yellow dogs, green dogs, black dogs, and white dogs are all at a dog party! • I do not • But individual words can’t always appear before verbs: • *little are going… • *blue are… • *and are • Must be able to state generalizations like: • Noun phrases occur before verbs
PP Constituency • Preposing and postposing: • Under a tree is a yellow dog. • A yellow dog is under a tree. • But not: • *Under, is a yellow dog a tree. • *Under a is a yellow dog tree. • Prepositional phrases notable for ambiguity in attachment • I saw a man on a hill with a telescope.
Context-Free Grammars • Defined in formal language theory • Terminals: e.g. cat • Non-terminal symbols: e.g. NP, VP • Start symbol: e.g. S • Rewriting rules: e.g. S NP VP • Start with start symbol, rewrite using rules, done when only terminals left
A Fragment of English S NP VP VP V PP NP DetP N N cat | mat V is PP Prep NP Prep on DetP the Input: the cat is on the mat
Derivations in a CFG S S NP VP VP V PP NP DetP N N cat | mat V is PP Prep NP Prep on DetP the S
Derivations in a CFG NP VP S NP VP VP V PP NP DetP N N cat | mat V is PP Prep NP Prep on DetP the S NP VP
Derivations in a CFG DetP N VP S NP VP VP V PP NP DetP N N cat | mat V is PP Prep NP Prep on DetP the S NP VP DetP N
Derivations in a CFG the cat VP S NP VP VP V PP NP DetP N N cat | mat V is PP Prep NP Prep on DetP the S NP VP DetP N the cat
Derivations in a CFG the cat V PP S NP VP VP V PP NP DetP N N cat | mat V is PP Prep NP Prep on DetP the S NP VP DetP N V PP the cat
Derivations in a CFG the cat is Prep NP S NP VP VP V PP NP DetP N N cat | mat V is PP Prep NP Prep on DetP the S NP VP DetP N V PP the cat is Prep NP
Derivations in a CFG the cat is on Det N S NP VP VP V PP NP DetP N N cat | mat V is PP Prep NP Prep on DetP the S NP VP DetP N V PP the cat is Prep NP on DetP N
Derivations in a CFG the cat is on the mat S NP VP VP V PP NP DetP N N cat | mat V is PP Prep NP Prep on DetP the S NP VP DetP N V PP the cat is Prep NP on DetP N the mat
S NP VP S VP VP V PP VP V NP VP V NP DetP NP NP N NP NP N PP Prep NP N cat | mat | food | bowl | Mary V is | likes | sits Prep on | in | under DetP the | a A More Complicated Fragment of English Mary likes the cat bowl. The cat ate the tasty food. Hello. Nice talking to you.
Pocket Sphinx Grammar Format • Variables go in angle brackets, e.g. <city> • Terminals must appear in your pronunciation dictionary (case sensitive) • X Y is concatenation -- e.g., I WANT • (X | Y) means X or Y -- e.g., (WANT|NEED) • Square brackets mean optional -- e.g., [ON] FRIDAY • * means that the expansion may be spoken zero or more times-- e.g. <digit>* • + means one or more times-- e.g. <digit>+
Example • <city> = BOSTON | NEWYORK | WASHINGTON | BALTIMORE; • <time> = MORNING | EVENING; • <day> = FRIDAY | MONDAY; • public <query> = (((WHAT TRAINS LEAVE) | (WHAT TIME CAN I TRAVEL) | (IS THERE A TRAIN)) (FROM|TO) <city> [(FROM|TO) <city>] ON <day> [<time>]); Hello. No. I want to go on Tuesday. When does the train leave?
Next Class • Language modeling for large vocabulary applications: Ngrams