GRAMMAR & PARSING (Syntactic Analysis)
NLP - WEEK 4
SYNTACTIC STRUCTURE
• To compute the syntactic structure of a sentence, we must consider TWO things:
  • GRAMMAR = a formal specification of the structures allowable in a language
  • PARSING technique = the method of analysing a sentence to determine its structure according to the grammar
TREE Representation
• The most common way to represent how a sentence is broken into its major subparts, and how those subparts are broken up in turn, is a TREE.
• Eg: Fatin ate the papaya. In LIST notation:
    (S (NP (NAME Fatin))
       (VP (V ate)
           (NP (ART the) (N papaya))))
* Show the correspondence with the tree structure (Fig 3.1 pg 42, Allen)
Tree Representation: Terminology
• Trees = a special form of GRAPH
• Structures consisting of:
  • NODES (eg. labelled S, NP)
  • LINKS (connecting lines/arrows)
  • ROOT (the node at the top), which dominates all other nodes
  • LEAVES (the nodes at the bottom)
• A LINK points from a PARENT node to a CHILD node
• Every CHILD node has a UNIQUE PARENT
• A PARENT node may point to MANY CHILD nodes
• An ANCESTOR of a node N is N's parent, or the parent of its parent, and so on
• A node is DOMINATED by its ancestor nodes
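The terminology above can be made concrete with a small tree structure. This is a minimal sketch, not taken from Allen: the class name `Node` and its methods `add`, `ancestors`, `dominates` and `leaves` are hypothetical names chosen for illustration. It builds the tree for "Fatin ate the papaya" and walks the parent links.

```python
class Node:
    """A tree node with one parent link and any number of child links."""

    def __init__(self, label, parent=None):
        self.label = label
        self.parent = parent      # every child has a unique parent
        self.children = []        # a parent may point to many children

    def add(self, label):
        """Create a child linked from this node and return it."""
        child = Node(label, parent=self)
        self.children.append(child)
        return child

    def ancestors(self):
        """The parent, the parent's parent, and so on up to the root."""
        result, node = [], self.parent
        while node is not None:
            result.append(node)
            node = node.parent
        return result

    def dominates(self, other):
        """True if this node is an ancestor of the other node."""
        return any(a is self for a in other.ancestors())

    def leaves(self):
        """Labels of the leaf nodes below this node, left to right."""
        if not self.children:
            return [self.label]
        return [leaf for c in self.children for leaf in c.leaves()]


# Tree for "Fatin ate the papaya" (same structure as the list notation).
s = Node("S")                     # ROOT
np1 = s.add("NP")
np1.add("NAME").add("Fatin")
vp = s.add("VP")
vp.add("V").add("ate")
np2 = vp.add("NP")
np2.add("ART").add("the")
papaya = np2.add("N").add("papaya")

print(s.leaves())
print([a.label for a in papaya.ancestors()])
print(s.dominates(papaya), np1.dominates(papaya))
```

The root S dominates every other node, while the first NP does not dominate "papaya" because it is not on the path from that leaf up to the root.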
CONSTRUCT a TREE Structure
• To construct a tree structure for a sentence, one MUST know which structures are legal for English.
• A set of REWRITE rules:
  • describes which tree structures are allowable
  • says that a certain symbol may be expanded in the tree into a sequence of other symbols
• Example rules (Grammar 3.2, Allen pg 42)
• Grammars consisting entirely of rules with a single symbol on the LHS (called the MOTHER) = Context Free Grammars (CFGs).
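As a rough illustration of how rewrite rules decide which trees are allowable, the sketch below checks a tree against a small rule set. The rules and lexicon here are an assumption: they are inferred from the "Fatin ate the papaya" example rather than copied from Grammar 3.2, and the names `RULES`, `LEXICON` and `is_licensed` are hypothetical.

```python
# Assumed rewrite rules (LHS, RHS), inferred from the example sentence.
# Every LHS is a single symbol, so this is a context-free grammar.
RULES = {
    ("S", ("NP", "VP")),
    ("VP", ("V", "NP")),
    ("NP", ("NAME",)),
    ("NP", ("ART", "N")),
}

# Assumed lexicon: word -> set of lexical categories.
LEXICON = {"Fatin": {"NAME"}, "ate": {"V"}, "the": {"ART"}, "papaya": {"N"}}

# Trees in list notation, written as nested tuples: (label, child, child, ...).
TREE = ("S",
        ("NP", ("NAME", "Fatin")),
        ("VP", ("V", "ate"),
               ("NP", ("ART", "the"), ("N", "papaya"))))


def is_licensed(tree):
    """True if every node in the tree is an expansion allowed by the rules."""
    label, *children = tree
    # A lexical node has a single word as its child.
    if len(children) == 1 and isinstance(children[0], str):
        return label in LEXICON.get(children[0], set())
    rhs = tuple(child[0] for child in children)
    return (label, rhs) in RULES and all(is_licensed(c) for c in children)


# The same subtrees in the wrong order: no rule rewrites S as VP NP.
BAD_TREE = ("S", TREE[2], TREE[1])

print(is_licensed(TREE), is_licensed(BAD_TREE))
```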
CFGs
• A very important class of grammars:
  • The formalism is powerful enough to describe most of the structure in natural languages
  • Yet it is restricted enough that efficient parsers can be built to analyze sentences
Terminology cont.
• Symbols that cannot be further decomposed in a grammar = TERMINAL symbols (namely the words)
• The other symbols, such as S, VP, NP = NON-TERMINAL symbols
• The grammatical symbols, such as N and V, that describe word categories = LEXICAL symbols
• Some words will be listed under multiple categories. Eg: the word "can" would be listed under both V and N.
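The three kinds of symbol can be read straight off a grammar and its lexicon. The sketch below assumes the same illustrative rules as the "Fatin ate the papaya" example plus an entry for "can"; the variable names are hypothetical, and separating the phrasal symbols (S, NP, VP) from the lexical ones is a presentational choice made here, not a distinction the slide draws.

```python
# Assumed rewrite rules (LHS, RHS) and lexicon, for illustration only.
RULES = [
    ("S", ["NP", "VP"]),
    ("VP", ["V", "NP"]),
    ("NP", ["NAME"]),
    ("NP", ["ART", "N"]),
]
LEXICON = {
    "Fatin": {"NAME"},
    "ate": {"V"},
    "the": {"ART"},
    "papaya": {"N"},
    "can": {"V", "N"},   # listed under more than one category
}

# TERMINAL symbols: the words themselves.
terminals = set(LEXICON)

# LEXICAL symbols: the word categories that appear in the lexicon.
lexical = set().union(*LEXICON.values())

# Remaining grammar symbols (S, NP, VP): every rule symbol that is not lexical.
rule_symbols = {lhs for lhs, _ in RULES} | {s for _, rhs in RULES for s in rhs}
phrasal = rule_symbols - lexical

# Words listed under multiple categories.
ambiguous = {w: sorted(cats) for w, cats in LEXICON.items() if len(cats) > 1}

print(sorted(terminals))
print(sorted(lexical))
print(sorted(phrasal))
print(ambiguous)
```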
Grammars and Parsing
• Grammars have a special symbol called the START symbol (= S)
• A grammar is said to DERIVE a sentence if there is a sequence of rules that allows you to rewrite the start symbol into the sentence.
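A derivation can be traced step by step. The sketch below is illustrative only: it assumes the rules implied by the "Fatin ate the papaya" example, treats lexical entries such as NAME -> Fatin as ordinary rewrite rules, and always rewrites the leftmost matching symbol. The names `apply_rule`, `derive` and `STEPS` are hypothetical.

```python
def apply_rule(form, lhs, rhs):
    """Rewrite the leftmost occurrence of lhs in the symbol list with rhs."""
    i = form.index(lhs)
    return form[:i] + list(rhs) + form[i + 1:]


def derive(start, steps):
    """Apply each (lhs, rhs) rule in order, recording every intermediate form."""
    form = [start]
    history = [form]
    for lhs, rhs in steps:
        form = apply_rule(form, lhs, rhs)
        history.append(form)
    return history


# One sequence of rules that rewrites S into the sentence (assumed grammar).
STEPS = [
    ("S", ["NP", "VP"]),
    ("NP", ["NAME"]),
    ("NAME", ["Fatin"]),
    ("VP", ["V", "NP"]),
    ("V", ["ate"]),
    ("NP", ["ART", "N"]),
    ("ART", ["the"]),
    ("N", ["papaya"]),
]

history = derive("S", STEPS)
for form in history:
    print(" ".join(form))
```

Because such a sequence of rules exists, the grammar derives the sentence; the last line printed is the sentence itself.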
DERIVATIONS
• Two important processes are based on derivations:
  • Sentence Generation – uses derivations to construct legal sentences
  • Parsing – identifies the structure of sentences given a grammar
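Sentence generation can be sketched as enumerating every derivation from the start symbol. This is a minimal example under assumptions: the grammar is the small illustrative one implied by "Fatin ate the papaya", with lexical entries folded into the rules, and it must be non-recursive or the enumeration would never finish. The names `RULES` and `generate` are hypothetical.

```python
from itertools import product

# Assumed grammar: symbol -> list of alternative right-hand sides.
RULES = {
    "S": [["NP", "VP"]],
    "NP": [["NAME"], ["ART", "N"]],
    "VP": [["V", "NP"]],
    "NAME": [["Fatin"]],
    "V": [["ate"]],
    "ART": [["the"]],
    "N": [["papaya"]],
}


def generate(symbol):
    """Yield every word sequence derivable from symbol (non-recursive grammars only)."""
    if symbol not in RULES:          # a terminal symbol, i.e. a word
        yield [symbol]
        return
    for rhs in RULES[symbol]:
        # Combine every expansion of each RHS symbol with every other.
        for parts in product(*(list(generate(s)) for s in rhs)):
            yield [word for part in parts for word in part]


sentences = [" ".join(words) for words in generate("S")]
for sentence in sentences:
    print(sentence)
```

The grammar only constrains syntax, so it also generates sentences such as "the papaya ate Fatin", which are structurally legal even though they are odd in meaning.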
SEARCHING TECHNIQUES
• Two basic methods of searching:
  • A Top-down strategy: starts with the S symbol and searches through different ways to rewrite the symbols until the input sentence is generated, or until all possibilities have been explored.
  • A Bottom-up strategy: starts with the words in the sentence and uses the rewrite rules backward to reduce the sequence of symbols until it consists solely of S. That is, a sequence of symbols matching the RHS of a rule is replaced by the symbol on that rule's LHS.
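The two strategies can be compared with a pair of small recognisers. Both are sketches under assumptions rather than the algorithms as given in Allen: the grammar is the illustrative one implied by "Fatin ate the papaya", both functions use plain backtracking search, and they only answer yes or no instead of building a tree. The names `top_down` and `bottom_up` are hypothetical, and the top-down version would loop forever on a left-recursive grammar.

```python
# Assumed rewrite rules (LHS, RHS) and lexicon, for illustration only.
RULES = [
    ("S", ["NP", "VP"]),
    ("NP", ["NAME"]),
    ("NP", ["ART", "N"]),
    ("VP", ["V", "NP"]),
]
LEXICON = {"Fatin": {"NAME"}, "ate": {"V"}, "the": {"ART"}, "papaya": {"N"}}


def top_down(symbols, words):
    """Start from the symbols and rewrite them until they generate the words."""
    if not symbols:
        return not words             # success only if all words are used up
    first, rest = symbols[0], symbols[1:]
    # A lexical symbol that matches the next word consumes that word.
    if words and first in LEXICON.get(words[0], set()):
        if top_down(rest, words[1:]):
            return True
    # Otherwise try every rule that rewrites the first symbol.
    return any(top_down(rhs + rest, words)
               for lhs, rhs in RULES if lhs == first)


def bottom_up(symbols, seen=None):
    """Start from the words and use rules backward until only S remains."""
    seen = set() if seen is None else seen
    symbols = tuple(symbols)
    if symbols == ("S",):
        return True
    if symbols in seen:              # already explored this sequence
        return False
    seen.add(symbols)
    for i in range(len(symbols)):
        # Backward lexical step: replace a word by one of its categories.
        for cat in LEXICON.get(symbols[i], ()):
            if bottom_up(symbols[:i] + (cat,) + symbols[i + 1:], seen):
                return True
        # Backward rule step: replace a matching RHS by the rule's LHS.
        for lhs, rhs in RULES:
            if list(symbols[i:i + len(rhs)]) == rhs:
                reduced = symbols[:i] + (lhs,) + symbols[i + len(rhs):]
                if bottom_up(reduced, seen):
                    return True
    return False


good = "Fatin ate the papaya".split()
bad = "ate Fatin the papaya".split()
print(top_down(["S"], good), bottom_up(good))
print(top_down(["S"], bad), bottom_up(bad))
```

Both recognisers accept the first word order and reject the second; they differ only in the direction of the search, from S down to the words or from the words up to S.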