270 likes | 448 Views
Computer Simulation of a Serial Parsing Model. Doan-Nguyen Hai haidoan@cs.concordia.ca CLaC (Computational Linguistics at Concordia) Department of Computer Science Concordia University (Montréal, Canada). Introduction.
E N D
Computer Simulation of a Serial Parsing Model Doan-Nguyen Hai haidoan@cs.concordia.ca CLaC (Computational Linguistics at Concordia) Department of Computer Science Concordia University (Montréal, Canada)
Introduction • Question: How about a computer simulation for serial syntactic parsing, which achieves a medium coverage and can deal with some classes of long and difficult sentences? • We implemented such a computer simulation: • Based on the garden-path theory (Frazier, 1978; Frazier & Clifton, 1996) • Parsing process is basically incremental and input-driven (bottom-up), but some top-down mechanisms are also included
Introduction (2) • Focus on reanalysis • Not based on context-free grammars. Using context extensively. Linguistic knowledge is represented procedurally in Prolog clauses • Considering characteristics of computer to obtain high performance whenever psycholinguistic knowledge is not available • Leaving space for non-structural information in future extension (a human-simulating sentence interpretation program)
Introduction (3) • Limitations: not yet considered: • Lexical ambiguities: words of multiple categories • Conjunctive structures, sentences with punctuations
State-of-the-Art • Computational implementations: • Marcus 1980: deterministic serial • McRoy & Hirst 1990: serial model, no reanalysis • Lombardo 1998: top-down parsing + selective reanalysis • Stevenson 1994: connectionism • Crocker & Brants 2000: parallel, probabilistic
The Algorithm Basic loop: incremental CurrentTree := NIL; whilenot end of sentence do InNode := next input word; attach InNode to CurrentTree; end How to do attachment ? Where to attach InNode into CurrentTree ? In principle: examine all information from the tree and the input word. Attachment may be costly!
First refinement: Attachment with Waiting Nodes • A node is waiting for some constituent(s) • obligatory: I look for … • expected: I am reading … • Incoming element = expected constituent ? Quick checks! • I read … the (book) / the (teacher’s book) / good (books) • First refinement: attach(InNode) :- waiting_node(WaitingNode), attach_wait(WaitingNode, InNode); ... . % more procedures
More refinements • Two cases: • InNode can be attached to CurrentTree: specific attachment • InNode cannot be attached to CurrentTree: reanalysis
Specific Attachment • attach_spec(InNode) examines specifically the tree and the input element for an attachment • I read it … loudly • I went to the theatre of the town... to see (her) • Resolve attachment ambiguities and improve efficiency with psycholinguistic preferences. Classical examples: • Late Closure: He said that John came … yesterday. • Minimal Attachment: The boy … found ... • PP-attachment: V preferred to N: I founda bird … in (the garden)
Reanalysis • Reanalysis • I knew the cat in the garden … was (smart) • The cat found in the garden … was (smart) • I told him the story … was (interesting) • Is the girl sitting there … intelligent? • Recent focus in psycholinguistics • Frazier & Rayner 1982: Selective Reanalysis. • Fodor & Inoue 1994: Diagnosis model: repairing (not reparsing!) after diagnosis. • Fodor & Inoue 1998: Attach Anyway + Adjust • Others: Lewis 1998 (NL-Soar), Sturt & Crocker 1996 (Tree Lowering)
Reanalysis (2) • Reanalysis: symptoms + guesses + verifications + repair • The cat found in the garden … was (smart) • symptoms: 'was' has no subject but needs one • guesses: the potential subject may be now the object of a verb, or the subject of a clause • verifications • repair the current tree
Reanalysis instead of easy attachment • verbs with multiple subcategorizations • I will bring the children … some food • v - 1 object --> v - 2 objects (Not a relative!) • I tell the children … that … • I tell the children … she … • v - object --> v - object - clause (Not relative) • I want him … to come • v - object --> v - (CP (TP him to (VP come))) (Not a purposive adjunct) • Although there is no breakdown and attachment is possible, reanalysis should be invoked instead!
Reanalysis instead of easy attachment (2) • Hypothesis: When a verb is satisfied by the incoming element, the parser still remembers its other subcategorizations by registering the verb as a may_wait node. • attach_may_wait(Elt) considers the possibility of reanalysis before doing specific attachment • I want him • verb - obj: (VP want (DP him)) • may_wait(want, other subcats of 'want') • … to come • attach_may_wait reanalyzes the current tree: (VP want (CP (TP (DP him) to (VP come)))
Reanalysis instead of easy attachment (3) • empty categories • The picturesi you have ei … seen • eiis put right after have (Minimal Chain Principle (De Vincenzi 1991)) • seen can begin a modifier, cf. The pictures you took seen from this perspective are very surrealist • Other examples • The booki I want ei … to read • Whati do you want ei … to read? • The folderi which I keep ei … my files in • Whati do you see ei … the birds with?
Reanalysis instead of easy attachment (4) • Hypothesis: (Similar to multiple subcategorization verbs) An object trace node is registered in may_wait state. During parsing, when the top may_wait node is a trace node, attach_may_wait will check whether InNode needs to be attached to the verb, and if yes, move the trace node somewhere else. • The picturesi you have ei ... • ei is registered as may_wait • When seen arrives, attach_may_wait reanalyzes and gives: • The picturesi you haveseen • The parser then looks for a new position for ei • The picturesi you haveseen ei
Reanalysis instead of easy attachment (5) • If the parser cannot find a new position for ei, it hangs it in the memory for future location • The folderi which I keep ei … my files in is blue • The folderi which I keepei … • The folderi which I keepmy files -- ei is hung in memory • The folderi which I keepmy files in ei-- ei is located
The algorithm attach(N) :- waiting_node(W), attach_wait(W, N); may_wait_node(W), attach_may_wait(W, N); % reanalysis attach_spec(N); reanalyze(N). % reanalysis CurrentTree := NIL; whilenot end of input do N := next word; attach N to CurrentTree; end
DPi DP adjunct waiting node the boy the boy CP CP waiting node who ei TP you Structure postulation the boy … you the boy … who Conform to Frazier 1998: parsing is not purely bottom-up, but also has some top-down features
Forward parsing • Examples • We need... disk… storage... management... software. • I saw the … teacher’s son’s dog. Reanalysis repeated??? Seems unlikely! • I go… to… school / see her. • They are… selected… people / by John. Reanalysis seems not occur in either case • Hypothesis: With normal pronouncing rhythm, reanalysis does not occur here but forward parsing
Forward parsing (2) • Forward parsing: the parser delays attachment and continues to work with the input stream to gather enough information for a good attachment • We need... disk… storage... management... software. When disk arrives, the parser does not attach it immediately, but continues parsing forward to get the head of the DP • I go… to… school / see her. When to arrives, the parser continues parsing forward to know whether it will get a PP or an infinitive VP
Implementation • Programmed in Prolog (Windows) • Coverage: many basic and 'advanced' structures of English. See test examples. • Test: • Time complexity: • Theoretical ? • Practical: seems good (linear ?)
Test Results (on 300 sentences) Time (operations)
Some test sentences • The folder I kept my letters in is blue. • Is the girl working in the house your daughter? • Is the girl working in the house your daughter bought recently? • What exactly do you want to see in this old house? • The big white horse raced quickly past the old barn fell behind the house. • The girl who I hear they say he thinks John loves is their daughter. • To drink with friends that you have not met for a long time is very interesting. • Do you notice the old tall woman who sits looking at your daughter's little dog has spent a long time here? • Are the three little boys in the computer group chosen recently by the university where your friend is studying computer science intelligent boys?
The folder I kept my letters in is blue. 'node(4,d:0:[sg,def],0-1,the) node(2,n:[sg],1-2,folder) - 'node(7,e:o,?,e(d,node(4,d:0:[sg,def],0-1,the))) node(5,c:r,?,[]) 'node(3,d:p:[sg,p(1)],2-3,i) node(6,t:[],?,[]) node(8,v:[o,g]:[cj,past0,r(keep)],3-4,kept) node(13,d:ps:[pl],4-5,my) node(11,n:[pl],5-6,letters) -node(12,p:[],6-7,in) node(15,e:d,?,e(d,node(7,e:o,?, e(d,node(4,d:0:[sg,def],0-1,the))))) node(16,t:[cj,sg,p(3),pres,sg,def],?,[]) node(14,v:be:[cj,sg,p(3),pres],7-8,is) node(17,a,8-9,blue)
Discussions • Future work: • Lexical ambiguities: words of multiple categories • Conjunctive structures, sentences with punctuations • Incorporating non-structural (semantic, real world) information: sentence processing. • Incorporating other techniques: statistics. • Comparing the parser with traditional cfg-based and statistics-based parsers. • Programming the parser on the computer may help us understand more on the human process.
Conclusions • A serial parsing prototype has been implemented with some good results • Particularities: • Reanalysis instead of easy attachment • Forward parsing • Still a lot of things to do
References Crocker, M. & Brants, T., 2000. Wide-Coverage Probabilistic Sentence Processing. Journal of Psycholinguistic Research, Vol. 29, No. 6, 2000. De Vincenzi M., 1991. Syntactic parsing strategies in Italian (Dordrecht: Kluwer Academic, 1991). Fodor, J.D. & Ferreira, F., 1998 (eds). Reanalysis in Sentence Processing. Kluwer Academic. Fodor, J.D. & Inoue, A., 1994. The Diagnosis and Cure of Garden Paths. Journal of Psycholinguistic Research, Vol. 23, No. 5, 1994. Fodor, J.D. & Inoue, A., 1998. Attach Anyway. In Fodor & Ferreira (1998) 101-141. Frazier, L., 1978. On Comprehending Sentences: Syntactic Parsing Strategies. Doctoral dissertation, University of Connecticut. Frazier, L., 1998. Getting There (Slowly). Journal of Psycholinguistic Research, Vol. 27, No. 2, 1998. Frazier, L. & Clifton, C., 1996. Construal. Cambridge, MA: MIT Press. Lombardo, V., 1998. A Computational Model of Recovery. In Fodor & Ferreira (1998) 287-325. Marcus, M., 1980. A Theory of Syntactic Recognition for Natural Language. Cambridge, MA: MIT Press. McRoy, S. & Hirst, G., 1990. Race-based Parsing and Syntactic Disambiguation. Cognitive Science, 14. Stevenson S., 1998. Parsing as Incremental Restructuring. In Fodor & Ferreira (1998) 327-363.