Syntax and Processing it: Definite Clause Grammars in Prolog (optional material)
John Barnden
School of Computer Science, University of Birmingham
Natural Language Processing 1, 2014/15 Semester 2
DCGs: Introduction
• A way of writing syntactic recognizers and parsers directly in Prolog.
• We write Prolog rules of a special type, which look very much like context-free (CF) grammar productions.
• Recognition or parsing happens through the normal Prolog computation process.
• Different structures can be recognized or created for the same sentence through Prolog's normal alternative-answer (backtracking) process, i.e., syntactic ambiguity is handled naturally.
• In the parsing case, syntax trees are produced.
• Grammatical constraints such as agreement are also easy to include.
• The rules can be translated into ordinary Prolog, but with many extra parameters that are tedious to write and that obscure the main information.
• The Prolog system performs this translation automatically when it loads the rules, turning them into normal Prolog clauses.
• Caution: DCGs provide only top-down, depth-first parsing, because of Prolog's approach to using rules.
  • Other strategies may be better. More on this later.
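To see the translation the last bullets describe, you can ask the system to show it. The sketch below assumes SWI-Prolog, where expand_term/2 applies the built-in DCG translation and portray_clause/1 prints a clause; the three-rule grammar and the helper show_translation/1 are hypothetical illustrations.

```prolog
% A tiny hypothetical DCG.
s  --> np, vp.
np --> [the, dog].
vp --> [barks].

% Print the ordinary Prolog clause produced for a DCG rule
% (assumes SWI-Prolog, whose expand_term/2 performs the DCG translation).
show_translation(Rule) :-
    expand_term(Rule, Clause),
    portray_clause(Clause).
```

Running ?- show_translation((s --> np, vp)). prints a clause in which np and vp each receive two extra list arguments, chained through a shared intermediate variable. Because the loaded rule for s has become an ordinary predicate with those two extra arguments, it can also be called directly as s([the, dog, barks], []).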
DCGs, contd: Recognition
• See the link on the Slides page to a toy recognizer in DCG that you can examine and play with.
• Example DCG rules for recognizing non-terminal categories:
  s --> np, vp.
  np --> noun, pp.
  np --> det, adj, noun, pp.
• Example DCG rules for recognizing terminal categories:
  det --> [a].  det --> [an].  det --> [the].
  noun --> [cat].  noun --> [dog].  noun --> [dogs].
  verb --> [dogs].
• (There is another, more economical method.)
• The program can be run in two ways (given the full toy grammar, which needs further rules such as np --> det, noun and lexical entries for sits, on and mat):
  s([a, dog, sits, on, a, mat], []).    np([a, dog], []).
  phrase(s, [a, dog, sits, on, a, mat]).    phrase(np, [a, dog]).
• The second argument of s, np, etc. catches any extra words left over:
  np([a, dog, sits, on, a], X). gives X = [sits, on, a].
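A minimal runnable version of the slide's recognizer follows. The rules shown on the slide cannot by themselves accept "a dog" or the example sentence, so this sketch adds hypothetical rules (np --> det, noun, vp --> verb, pp and pp --> prep, np) and hypothetical words (big, mat, sits, on); the toy recognizer on the Slides page may be organized differently.

```prolog
% Toy DCG recognizer built around the rules on this slide.
% Rules and words marked "hypothetical" are assumptions added so the
% example queries can succeed.

s --> np, vp.

np --> det, noun.              % hypothetical extra NP rule
np --> noun, pp.
np --> det, adj, noun, pp.

vp --> verb, pp.               % hypothetical
pp --> prep, np.               % hypothetical

det --> [a].
det --> [an].
det --> [the].

adj --> [big].                 % hypothetical

noun --> [cat].
noun --> [dog].
noun --> [dogs].
noun --> [mat].                % hypothetical

verb --> [dogs].
verb --> [sits].               % hypothetical

prep --> [on].                 % hypothetical
```

With this grammar, np([a, dog, sits, on, a], X) returns X = [sits, on, a]: np --> det, noun consumes "a dog", and the other two NP rules fail on this input.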
Advantage of DCGs over ordinary Prolog
• Consider the abstract grammar rules S → NP VP and NP → Det Noun.
• Here is how they could be implemented in ordinary Prolog (for recognition only, though syntax-tree construction and grammatical-category checking [see later] can be added):
  s(WordList, Residue) :-
      np(WordList, Residue_to_pass_on), vp(Residue_to_pass_on, Residue).
  np(WordList, Residue) :-
      det(WordList, Residue_to_pass_on), noun(Residue_to_pass_on, Residue).
  det([the | Residue], Residue).
  noun([dog | Residue], Residue).
• Given further clauses (for vp and for words such as a, sits, on and mat), it can be called as in: s([a, dog, sits, on, a, mat], []).
• Exercise: see the ordinary-Prolog version of the recognizer linked from the Slides page.
• Compared with the DCG form, every syntactic-category predicate has the extra WordList and Residue arguments. This is tedious and error-prone.
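The slide's ordinary-Prolog clauses call vp/2 and cover only the words "the" and "dog", so they cannot run as they stand. The sketch below completes them with a hypothetical VP → Verb clause and a few hypothetical words, so the threading of the word list through Residue_to_pass_on can be tried directly.

```prolog
% Hand-written recognizer for S -> NP VP, NP -> Det Noun.
% Each predicate consumes words from the front of WordList and
% returns the unconsumed remainder in Residue.
s(WordList, Residue) :-
    np(WordList, Residue_to_pass_on),
    vp(Residue_to_pass_on, Residue).

np(WordList, Residue) :-
    det(WordList, Residue_to_pass_on),
    noun(Residue_to_pass_on, Residue).

vp(WordList, Residue) :-           % hypothetical VP -> Verb
    verb(WordList, Residue).

det([the | Residue], Residue).
det([a | Residue], Residue).       % hypothetical
noun([dog | Residue], Residue).
noun([cat | Residue], Residue).    % hypothetical
verb([sits | Residue], Residue).   % hypothetical
```

The DCG rules s --> np, vp. and np --> det, noun. express the same grammar without any of the WordList and Residue bookkeeping.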
DCGs: Additions
• You can embed ordinary Prolog within grammar rules.
• You can use disjunction and cuts.
• You can add arguments to the category symbols (np, det, etc.) so as to:
  • build syntax trees, i.e. do parsing, not just recognition;
  • include “grammatical categories” (used to enforce constraints such as agreement);
  • build semantic structures.
• We will see some of this in the following slides.
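The first two additions can be sketched in a few lines. All rules and words below, including the word_count//1 non-terminal, are hypothetical illustrations: a disjunction makes an adjective optional, braces run ordinary Prolog on the word just read, and a cut commits to a clause once a word has been consumed.

```prolog
% Disjunction in a rule body: the adjective is optional.
np --> det, ( adj ; [] ), noun.

% Ordinary Prolog inside braces: a membership test on the word just read.
det --> [W], { member(W, [a, an, the]) }.

adj --> [big].
noun --> [dog].
noun --> [mat].

% A cut in a rule body, plus arithmetic in braces: count the words,
% committing to the first clause as soon as one word has been read.
word_count(N) --> [_], !, word_count(N0), { N is N0 + 1 }.
word_count(0) --> [].
```

For example, phrase(np, [the, big, dog]) and phrase(np, [a, mat]) both succeed, and phrase(word_count(N), [a, dog, sits]) gives N = 3.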
DCGs: Parsing
• Add a parameter to each category symbol, delivering a node of the syntax tree:
  vp(vp_node(Verb_node, PP_node)) --> verb(Verb_node), pp(PP_node).
  verb(verb_node(sits)) --> [sits].
• The program can again be run in two ways:
  s(ST, [a, dog, sits, on, a, mat], []).
  phrase(s(ST), [a, dog, sits, on, a, mat]).
• See the links on the Slides page to toy parsers in DCG that you can examine and play with.
  • So far: the “basic” parser1.
• An initial exercise: add new words and new NP rules.
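A minimal self-contained parser in this style is sketched below. The vp and verb rules are the slide's; the node names s_node, np_node and n_node follow later slides; det_node, pp_node, prep_node and the words other than "sits" are hypothetical.

```prolog
% Each rule's extra argument returns the syntax-tree node it builds.
s(s_node(NP_node, VP_node)) --> np(NP_node), vp(VP_node).
np(np_node(Det_node, N_node)) --> det(Det_node), noun(N_node).
vp(vp_node(Verb_node, PP_node)) --> verb(Verb_node), pp(PP_node).
pp(pp_node(Prep_node, NP_node)) --> prep(Prep_node), np(NP_node).

% Lexical rules build leaf nodes.
det(det_node(a)) --> [a].
noun(n_node(dog)) --> [dog].
noun(n_node(mat)) --> [mat].
verb(verb_node(sits)) --> [sits].
prep(prep_node(on)) --> [on].
```

Both calls on the slide then bind ST to s_node(np_node(det_node(a), n_node(dog)), vp_node(verb_node(sits), pp_node(prep_node(on), np_node(det_node(a), n_node(mat))))).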
DCGs: Syntactic Ambiguity
• Suppose we add two extra rules:
  vp(vp_node(Verb_node, PP_node1, PP_node2)) -->
      verb(Verb_node), pp(PP_node1), pp(PP_node2).
  np(np_node(Det_node, N_node, PP_node)) -->
      det(Det_node), noun(N_node), pp(PP_node).
• Then we get two different structures for:
  A dog sits on the mat with the flowers.
• Exercise:
  • Work out by hand what structures you should get, both as drawn syntax trees and as Prolog forms.
  • Try it out using the relevant parser on the Slides page.
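To check a hand-worked answer to the exercise, the sketch below combines the two extra rules with a hypothetical minimal grammar and lexicon (s_node, det_node, pp_node, prep_node, n_node and the helper all_parses/2 are illustrative names, not taken from the toy parsers) and collects every parse tree that backtracking produces.

```prolog
s(s_node(NP_node, VP_node)) --> np(NP_node), vp(VP_node).

np(np_node(Det_node, N_node)) --> det(Det_node), noun(N_node).
np(np_node(Det_node, N_node, PP_node)) -->          % extra rule from the slide
    det(Det_node), noun(N_node), pp(PP_node).

vp(vp_node(Verb_node, PP_node)) --> verb(Verb_node), pp(PP_node).
vp(vp_node(Verb_node, PP_node1, PP_node2)) -->      % extra rule from the slide
    verb(Verb_node), pp(PP_node1), pp(PP_node2).

pp(pp_node(Prep_node, NP_node)) --> prep(Prep_node), np(NP_node).

det(det_node(a)) --> [a].
det(det_node(the)) --> [the].
noun(n_node(dog)) --> [dog].
noun(n_node(mat)) --> [mat].
noun(n_node(flowers)) --> [flowers].
verb(verb_node(sits)) --> [sits].
prep(prep_node(on)) --> [on].
prep(prep_node(with)) --> [with].

% Collect every parse tree for a word list (hypothetical helper).
all_parses(Words, Trees) :-
    findall(Tree, phrase(s(Tree), Words), Trees).
```

Calling all_parses([a, dog, sits, on, the, mat, with, the, flowers], Ts) should give exactly two trees in Ts; printing them lets you compare them with the trees you drew.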
Terminals: A Better Implementation
• verb(verb_node(Word)) --> [Word], {verb_pred(Word)}.
• The part in braces is ordinary Prolog.
• Individual verbs are included as follows:
  verb_pred(sit).  verb_pred(sits).  verb_pred(hates).
• This means less writing per individual verb, and it concentrates the node-building in one place.
• It may look less efficient, because of the extra step.
• BUT in modern Prologs it speeds up execution:
  • by making the call of the DCG terminal-category symbol (verb in the top line above) deterministic, since verb now has a single clause;
  • by making the calls of the lexical predicates (verb_pred, etc.) deterministic, since first-argument indexing selects the matching fact directly.
• Exercise: amend one of the toy parsers to use the above method.
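Put together, the slide's rule and lexicon form a small runnable fragment. Only the words "sit", "sits" and "hates" come from the slide; a noun lexicon would follow the same pattern with its own node-building rule.

```prolog
% One node-building rule covers every verb; the lexicon is a fact table.
verb(verb_node(Word)) --> [Word], { verb_pred(Word) }.

verb_pred(sit).
verb_pred(sits).
verb_pred(hates).
```

For example, phrase(verb(T), [hates]) gives T = verb_node(hates), while phrase(verb(T), [dog]) fails because dog has no verb_pred fact.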
Grammatical Categories
• A grammatical category (GC) is a dimension along which (some) lexical or syntactic constituents can vary in limited, systematic ways, such as (in English):
  • Number: singular or plural; lexically, nouns, verbs, determiners, numerals.
  • Person: first, second and third; lexically, only for verbs, nouns and some pronouns.
  • Tense: present, past (various forms), future; lexically, only for verbs.
  • Gender: M, F, N [neither/neuter]; lexically, only some pronouns and some nouns.
• Syntactic constituents can sometimes inherit grammatical-category values from their components, e.g. (without showing all possible GC values):
  • the big dog: 3rd person M/F/N singular NP // the big dogs: 3rd person M/F/N plural NP
  • we in the carpet trade: 1st person M/F plural NP // you silly idiot: 2nd person M/F singular NP
  • eloped with the gym teacher: past-tense VP // will go: future-tense VP
  • the woman with the long hair: female NP // the radio with the red knobs: neuter NP
• A lexical or syntactic constituent can be ambiguous as to a GC value:
  • e.g. sheep: singular/plural; manage: 1st/2nd person singular, or any person plural.
Grammatical Categories in DCGs, contd
• Or, using the better lexicon representation:
  noun(n_node(Word), gcs(numb(Numb), person(third)))
      --> [Word], {noun_pred(Word, Numb)}.
  noun_pred(dog, singular).
  noun_pred(dogs, plural).
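The same representation handles GC ambiguity by simply listing one fact per value; on backtracking, Prolog returns each value in turn. The two sheep facts below are hypothetical additions illustrating the earlier "sheep: singular/plural" example.

```prolog
% Nouns are always third person; their number comes from the lexicon.
noun(n_node(Word), gcs(numb(Numb), person(third))) -->
    [Word], { noun_pred(Word, Numb) }.

noun_pred(dog, singular).
noun_pred(dogs, plural).
noun_pred(sheep, singular).   % hypothetical: "sheep" is ambiguous for number
noun_pred(sheep, plural).
```

Here phrase(noun(N, G), [dogs]) gives G = gcs(numb(plural), person(third)), and collecting all answers for [sheep] yields both singular and plural.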
Grammatical Categories in DCGs, contd
• Enforcing agreement in an NP syntax rule:
  np(np_node(Det_node, N_node), gcs(Number_gc, Person_gc))
      --> det(Det_node, gcs(Number_gc, Person_gc)),
          noun(N_node, gcs(Number_gc, Person_gc)).
• OR, more simply, if we don’t need to enforce a particular shape on gcs(...):
  np(np_node(Det_node, N_node), GCs)
      --> det(Det_node, GCs), noun(N_node, GCs).
• Enforcing subject-NP / VP agreement (NB: this doesn’t handle the case GC):
  s(s_node(NP_node, VP_node), GCs)
      --> np(NP_node, GCs), vp(VP_node, GCs).
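The two agreement rules can be tried with a small hypothetical grammar. In this sketch det_node, det_pred/2, verb_pred/3, the intransitive VP rule and all words except dog and dogs are assumptions; determiners leave person open, and the shared GCs variable does the agreement checking by unification.

```prolog
s(s_node(NP_node, VP_node), GCs) --> np(NP_node, GCs), vp(VP_node, GCs).

np(np_node(Det_node, N_node), GCs) --> det(Det_node, GCs), noun(N_node, GCs).

vp(vp_node(Verb_node), GCs) --> verb(Verb_node, GCs).   % hypothetical intransitive VP

det(det_node(Word), gcs(numb(Numb), person(_))) -->
    [Word], { det_pred(Word, Numb) }.
noun(n_node(Word), gcs(numb(Numb), person(third))) -->
    [Word], { noun_pred(Word, Numb) }.
verb(verb_node(Word), gcs(numb(Numb), person(Person))) -->
    [Word], { verb_pred(Word, Numb, Person) }.

det_pred(a, singular).
det_pred(the, singular).
det_pred(the, plural).

noun_pred(dog, singular).
noun_pred(dogs, plural).

verb_pred(barks, singular, third).
verb_pred(bark, plural, _).     % any person in the plural
```

With this grammar "the dog barks" and "the dogs bark" are accepted, while "the dogs barks" (subject/verb mismatch) and "a dogs bark" (determiner/noun mismatch) are rejected.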
Grammatical Categories in DCGs, contd
• Not enforcing agreement within part of a VP rule:
  vp(vp_node(Verb_node, PP_node), GCs)
      --> verb(Verb_node, GCs), pp(PP_node).
• OR, if you needed PP to return some GCs that didn’t matter:
  vp(vp_node(Verb_node, PP_node), GCs)
      --> verb(Verb_node, GCs), pp(PP_node, _).
• Exercise: understand and play around with the GC version of the parser linked from the Slides page.
• The program can again be run in two ways:
  s(ST, GCs, [a, dog, sits, on, a, mat], []).
  phrase(s(ST, GCs), [a, dog, sits, on, a, mat]).
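The sketch below uses the second VP rule, whose PP returns GCs that are ignored, and shows the GCs that come back from a query. It assumes, hypothetically, that a PP's GCs are those of its NP; det_node, pp_node, prep_node, det_pred/2, verb_pred/3 and the words sit, sits, on, mat and sheep are also assumptions. Agreement with the verb resolves the number ambiguity of "sheep".

```prolog
s(s_node(NP_node, VP_node), GCs) --> np(NP_node, GCs), vp(VP_node, GCs).
np(np_node(Det_node, N_node), GCs) --> det(Det_node, GCs), noun(N_node, GCs).
% The PP's GCs are returned but ignored (anonymous variable).
vp(vp_node(Verb_node, PP_node), GCs) --> verb(Verb_node, GCs), pp(PP_node, _).
pp(pp_node(Prep_node, NP_node), GCs) --> prep(Prep_node), np(NP_node, GCs).

det(det_node(Word), gcs(numb(Numb), person(_))) -->
    [Word], { det_pred(Word, Numb) }.
noun(n_node(Word), gcs(numb(Numb), person(third))) -->
    [Word], { noun_pred(Word, Numb) }.
verb(verb_node(Word), gcs(numb(Numb), person(Person))) -->
    [Word], { verb_pred(Word, Numb, Person) }.
prep(prep_node(on)) --> [on].

det_pred(a, singular).
det_pred(the, singular).
det_pred(the, plural).

noun_pred(dog, singular).
noun_pred(mat, singular).
noun_pred(sheep, singular).
noun_pred(sheep, plural).

verb_pred(sits, singular, third).
verb_pred(sit, plural, _).      % any person in the plural
```

Here phrase(s(ST, GCs), [the, sheep, sits, on, a, mat]) gives GCs = gcs(numb(singular), person(third)), whereas the same query with "sit" gives numb(plural); "a sheep sit on a mat" is rejected.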