CSA2050 Introduction to Computational Linguistics
Lecture 8: Definite Clause Grammars
CSA2050: DCG I

Rationale
[Diagram with the labels: CFG + Sentence, Logic, Prolog Program, Sentence Structure]

Logic Rules and Grammar Rules
• Basic question: what is the connection between logic rules and grammar rules?
• ∀x ∀y male(x) & parent(x,y) → father(x,y)
• S → NP VP
• They are both concerned with the definition of predicates.

Logic Rules and Grammar Rules
• Logic: arbitrary n-ary predicates, e.g. raining; clever(x); father(x,y); between(x,y,z)
• Grammar rules: predicates over text segments, e.g. np(x); vp(y); s(z)

Text Segments
• A text segment is a sequence of consecutive words.
• A text segment can be identified by two pointers, if we assign names to the spaces between words.
    0 the 1 cat 2 sat 3 on 4 the 5 mat 6
• (0,6) is the whole sentence
• (0,2) is the first noun phrase

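As a rough illustration of the two-pointer idea, the sketch below recovers the words of a segment from a word list and a pair of positions. The predicate name segment/4 is invented here for illustration and is not part of the lecture's program; it assumes a Prolog that provides the usual list predicates length/2 and append/3.

```prolog
:- use_module(library(lists)).

% segment(+Words, +P1, +P2, -Seg): Seg is the list of words lying
% between positions P1 and P2 (hypothetical helper, not from the lecture).
segment(Words, P1, P2, Seg) :-
    length(Prefix, P1),            % the P1 words before the segment
    append(Prefix, Rest, Words),
    Len is P2 - P1,
    length(Seg, Len),              % the segment holds P2 - P1 words
    append(Seg, _, Rest).

% ?- segment([the,cat,sat,on,the,mat], 0, 2, Seg).
% Seg = [the, cat].
```

Positions name the gaps between words, so a segment from P1 to P2 always contains P2 - P1 words.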
From Grammar Rules to Logic
• The general statement made by the CF rule
    S → NP, VP
• can be summarised using predicates over segments with the following logic statement
    NP(p1,p) & VP(p,p2) => S(p1,p2)

From Grammar Rules to Logic
    0 the 1 cat 2 sat 3 on 4 the 5 mat 6
[Diagram: segments of the numbered sentence labelled NP, VP and S]

From Logic to Prolog
• Each logic statement of the form
    NP(p1,p) & VP(p,p2) => S(p1,p2)
• corresponds to the "definite clause"
    s(P1,P2) :- np(P1,P), vp(P,P2).

Converting a Grammar
    S  → NP, VP      s(P1,P2)  :- np(P1,P), vp(P,P2).
    NP → N           np(P1,P2) :- n(P1,P2).
    NP → Det N       np(P1,P2) :- det(P1,P), n(P,P2).
    VP → V NP        vp(P1,P2) :- v(P1,P), np(P,P2).

Lexical Categories and Rules
• Lexical categories are those which are not defined in the grammar itself (e.g. N and V in our grammar)
• Instead, they are defined by the words that they rewrite to
    V → run, sleep, talk etc.
• Lexical categories always derive exactly one input token.

Lexical Rules
• A rule defining lexical category C must express the following information:
    there is a C between positions p1 and p2 if some word of syntactic category C spans those positions
• There are many different ways to translate such a rule into a Prolog clause.
• Each way needs to make reference to how the input sentence is represented.

Defining Lexical Categories
• Each category is defined in terms of the words it can rewrite to
    d(P1,P2) :- input(P1,P2,[the]).
    n(P1,P2) :- input(P1,P2,[cat]).
    n(P1,P2) :- input(P1,P2,['John']).
    v(P1,P2) :- input(P1,P2,[ate]).
• How is the input sentence represented?

Representing the Input
• Define the predicate input(P1,P2,L) such that P1 and P2 are positions and L is a list containing the words spanning those positions
• Checkpoint: show how to represent the input sentence "John ate the cat"

John ate the cat
    input(0,1,['John']).
    input(1,2,[ate]).
    input(2,3,[the]).
    input(3,4,[cat]).
• Checkpoints
• Why is John in quotes?
• Why use a list of one element rather than an atom?
• Is this the only way to do it?

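As one possible alternative way of writing the lexical rules, offered only as a sketch: the words can be kept in a separate table that each category consults. The predicate lex/2 is an assumption introduced for this example; input/3 is the representation shown above.

```prolog
% Input sentence "John ate the cat" as position-indexed facts.
input(0, 1, ['John']).
input(1, 2, [ate]).
input(2, 3, [the]).
input(3, 4, [cat]).

% lex(Word, Category): hypothetical lexicon table (not from the lecture).
lex(the,    d).
lex(cat,    n).
lex('John', n).
lex(ate,    v).

% One clause per category: the category spans P1-P2 if a single word
% of that category is found between those positions.
d(P1, P2) :- input(P1, P2, [W]), lex(W, d).
n(P1, P2) :- input(P1, P2, [W]), lex(W, n).
v(P1, P2) :- input(P1, P2, [W]), lex(W, v).

% ?- n(3, 4).
% true.
```

With this layout, adding a word means adding one lex/2 fact rather than a new clause for the category.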
Complete Program
1. Grammar
    s(P1,P2) :- np(P1,P), vp(P,P2).
    np(P1,P2) :- n(P1,P2).
    np(P1,P2) :- d(P1,P), n(P,P2).
    vp(P1,P2) :- v(P1,P2).
    vp(P1,P2) :- v(P1,P), np(P,P2).
2. Lexicon
    d(P1,P2) :- input(P1,P2,[the]).
    n(P1,P2) :- input(P1,P2,[cat]).
    n(P1,P2) :- input(P1,P2,['John']).
    v(P1,P2) :- input(P1,P2,[ate]).
3. Input
    input(0,1,['John']).
    input(1,2,[ate]).
    input(2,3,[the]).
    input(3,4,[cat]).
4. Query
    ?- s(0,4).

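Because positions are ordinary Prolog arguments, they can also be left unbound to ask which segments belong to a category. The sketch below repeats the program so that it runs on its own and collects every noun phrase in the input. The helper all_nps/1 is made up for this example; findall/3 is a standard built-in.

```prolog
% Grammar
s(P1,P2)  :- np(P1,P), vp(P,P2).
np(P1,P2) :- n(P1,P2).
np(P1,P2) :- d(P1,P), n(P,P2).
vp(P1,P2) :- v(P1,P2).
vp(P1,P2) :- v(P1,P), np(P,P2).

% Lexicon
d(P1,P2) :- input(P1,P2,[the]).
n(P1,P2) :- input(P1,P2,[cat]).
n(P1,P2) :- input(P1,P2,['John']).
v(P1,P2) :- input(P1,P2,[ate]).

% Input
input(0,1,['John']).
input(1,2,[ate]).
input(2,3,[the]).
input(3,4,[cat]).

% all_nps(-Spans): every P1-P2 pair recognised as a noun phrase
% (hypothetical helper, not from the lecture).
all_nps(Spans) :- findall(P1-P2, np(P1,P2), Spans).

% ?- s(0,4).
% true.
% ?- all_nps(Spans).
% Spans = [3-4, 0-1, 2-4].
```

The order of the spans follows the clause order: the two n clauses are tried first (cat, then 'John'), and the d, n rule last.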
Trace of query ?- vp(1,4)
    1 1 Call: vp(1,4) ?
    2 2 Call: v(1,4) ?
    3 3 Call: input(1,4,[ate]) ?
    3 3 Fail: input(1,4,[ate]) ?
    2 2 Fail: v(1,4) ?
    2 2 Call: v(1,_349) ?
    3 3 Call: input(1,_349,[ate]) ?
    3 3 Exit: input(1,2,[ate]) ?
    2 2 Exit: v(1,2) ?
    4 2 Call: np(2,4) ?
    5 3 Call: n(2,4) ?
    6 4 Call: input(2,4,[cat]) ?
    6 4 Fail: input(2,4,[cat]) ?
    6 4 Call: input(2,4,[John]) ?
    6 4 Fail: input(2,4,[John]) ?
    5 3 Fail: n(2,4) ?
    5 3 Call: d(2,_1338) ?
    6 4 Call: input(2,_1338,[the]) ?
    6 4 Exit: input(2,3,[the]) ?
    5 3 Exit: d(2,3) ?
    7 3 Call: n(3,4) ?
    8 4 Call: input(3,4,[cat]) ?
    8 4 Exit: input(3,4,[cat]) ?
    7 3 Exit: n(3,4) ?
    4 2 Exit: np(2,4) ?
    1 1 Exit: vp(1,4) ?

Representing the Sentence Using Difference Lists
We can represent the input as a pair of pointers
• The first pointer points to the entire list
• The second pointer points to a suffix of the list.
• The represented list is the difference between the two lists. For example, the pair ['John',ate,the,cat] and [the,cat] represents the segment ['John',ate].
    input(['John',ate,the,cat],['John',ate,the,cat]).
    input(['John',ate,the,cat],[ate,the,cat]).
    input(['John',ate,the,cat],[the,cat]).
    input(['John',ate,the,cat],[]).
    input([X|Y],Y,X).

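To make the difference-list idea concrete, here is a minimal sketch of the earlier grammar with each position replaced by the list of words still to be read. It uses the three-argument input/3 from the last line above. The remaining clauses are an assumed adaptation of the position-based program, not code from the lecture.

```prolog
% input(L1, L2, Word): Word is the single word making up the
% difference between list L1 and its suffix L2.
input([X|Y], Y, X).

% Lexicon: each category consumes one word from the front of the list.
d(L1, L2) :- input(L1, L2, the).
n(L1, L2) :- input(L1, L2, cat).
n(L1, L2) :- input(L1, L2, 'John').
v(L1, L2) :- input(L1, L2, ate).

% Grammar: the same clauses as before, with lists in place of numbers.
s(L1, L2)  :- np(L1, L), vp(L, L2).
np(L1, L2) :- n(L1, L2).
np(L1, L2) :- d(L1, L), n(L, L2).
vp(L1, L2) :- v(L1, L2).
vp(L1, L2) :- v(L1, L), np(L, L2).

% ?- s(['John',ate,the,cat], []).
% true.
% ?- np(['John',ate,the,cat], Rest).
% Rest = [ate, the, cat].
```

No separate input facts are needed in this version: the sentence is passed in the query, and the empty list as second argument says that nothing may be left over.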
DCG Notation
• The conversion of CF rules into Prolog is so simple that it can be done automatically.
• Clauses in DCG notation:
    s --> np, vp.
    np --> d, n.
    n --> [cat].
• are automatically translated when read in to
    s(P1,P2) :- np(P1,P), vp(P,P2).
    np(P1,P2) :- d(P1,P), n(P,P2).
    n([cat|L],L).

DCG Notation
• Every DCG rule takes the form
    nonterminal --> expansion
  where expansion is any of
• A nonterminal symbol: np
• A list of terminal symbols: [each,other]
• A null constituent: []
• A plain Prolog goal enclosed in braces: {write('Found')}
• A series of any of these expansions joined by commas.

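The kinds of expansion listed above can be seen together in a small sketch. The nonterminal optdet and the helper noun/1 are hypothetical names chosen for this example; phrase/2 is the standard built-in for calling a DCG nonterminal on a word list.

```prolog
% A series of nonterminals joined by commas.
np --> optdet, n.
% A list of terminal symbols.
np --> [each, other].

optdet --> [the].
% A null constituent: the determiner may be absent.
optdet --> [].

% A plain Prolog goal in braces: accept any single word W for which
% noun(W) holds. The goal itself consumes no input.
n --> [W], { noun(W) }.

% noun/1 is a hypothetical helper, not part of the lecture's grammar.
noun(cat).
noun('John').

% ?- phrase(np, [the, cat]).
% true.
% ?- phrase(np, [the]).
% false.
```

Here the braces keep the word list out of the grammar rule, which is one common use of embedded goals.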
Complete DCG
1. Grammar
    s --> np, vp.
    np --> n.
    np --> d, n.
    vp --> v.
    vp --> v, np.
2. Lexicon
    d --> [the].
    n --> [cat].
    n --> ['John'].
    v --> ['ate'].
3. Input (given as a list in the query)
4. Query
    ?- s(['John', ate, the, cat], []).

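Since the translated clauses are ordinary Prolog, the same grammar can also be asked to produce the word lists it accepts. The sketch below does this with the standard built-ins phrase/2 and findall/3; all_sentences/1 is a made-up helper name. The enumeration terminates only because this grammar has no recursive rules.

```prolog
% Grammar
s --> np, vp.
np --> n.
np --> d, n.
vp --> v.
vp --> v, np.

% Lexicon
d --> [the].
n --> [cat].
n --> ['John'].
v --> [ate].

% all_sentences(-Ss): every word list accepted by s
% (hypothetical helper, not from the lecture).
all_sentences(Ss) :- findall(S, phrase(s, S), Ss).

% ?- phrase(s, ['John', ate, the, cat]).
% true.
% ?- all_sentences(Ss), length(Ss, N).
% N = 20.
```

The count follows from the rules: there are 4 noun phrases and 5 verb phrases, giving 4 × 5 = 20 sentences, including odd ones such as "the John ate".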
Checkpoints
• What is your system's translation of
    s --> np, vp.
    n --> [cat].
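One way to inspect the translation, assuming a system that provides expand_term/2 and portray_clause/1 (SWI-Prolog does; other systems may differ), is sketched below. The helper show_translation/1 is a name invented for this example.

```prolog
% show_translation(+Rule): print the clause this system produces
% for a DCG rule (hypothetical helper, not from the lecture).
show_translation(Rule) :-
    expand_term(Rule, Clause),
    portray_clause(Clause).

% ?- show_translation((s --> np, vp)).
% ?- show_translation((n --> [cat])).
```

The printed clauses depend on the Prolog system: variable names will differ, and some systems translate terminals through an auxiliary predicate rather than writing the list directly in the clause head.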