Prepositional Phrase Attachment & Generation of Semantic Relation Ashish Almeida (03M05601) Guide: Pushpak Bhattacharyya
Problem Definition • Semantics Extraction • English to UNL: • UNL: Language independent knowledge representation • Some important problems • Prepositional phrase (PP) attachment • Semantic head detection • PRO resolution • Generation of semantic relations
UNL: Semantics Representation • He read the book on physics • Universal Networking Language – UNL • Knowledge representation through graph • Concepts and relationships among them • Universal word (UW) - unique concept • Relation - connects two UWs • Graph for the example: agent (read, He); object (read, book); modifier (book, physics)
Example: PP Attachment • He read the book on physics • Incorrect: the PP ‘on physics’ attaches to the verb ‘read’ • Correct: the PP ‘on physics’ attaches to the noun ‘book’
Overview • Problem definition • Previous work • PP Attachment • Semantic Head Detection • PRO resolution in infinitival-to • Automatic Dictionary Enrichment • Rules and implementation • Results & Conclusion • References
Previous Work • English to UNL analysis • P. Bhattacharyya: UNL analysis process • PP attachment • Ratnaparkhi: probabilistic approach • Brill: rule based approach • Semantic relations • P. Pantel: detection of different roles of preposition
The Sentence Frame [V-N-P-N] • [ V-NP1-P-NP2 ] • Attachment problem (V or NP1) • NP: simple noun phrase without any embedded clause or prepositional phrase • Sufficient context information • Comparison with others’ work • Example: • He [is reading]V [this book]NP1 [for]P [his exam]NP2. • Solution to PP attachment - based on argument structure theory.
Argument Structure (AS) of Verb • Example: He forwarded the mail to John. • Forward (X, Y) • Forward (the mail, John) • The verb takes to-PP as a complement • The verb also determines the choice of preposition, i.e., to • Important clue: the noun after ‘to’ attaches to verb ‘forward’
Argument Structure: Nouns • Example: We received [[an invitation] to the wedding]. • noun attachment • invitation (wedding) • Noun ‘invitation’ demands to-PP as an argument • Receive (invitation (wedding) )
Augmenting the Dictionary Entries • [forward] “forward(icl>do)” (V, VOA, #_TO_AR2) • [forward]: English word • “forward(icl>do)”: UW • (V, VOA, #_TO_AR2): attribute list • V: verb • VOA: action verb • #_TO_AR2: 2nd argument is a to-prepositional phrase • The dictionary encodes, through the attribute #_TO_AR2, the knowledge that the verb ‘forward’ takes a to-PP as its second argument.
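A minimal sketch of reading an entry in the [head-word] “UW” (attribute list) shape shown on this slide. The `Entry` type, the `parse_entry` function and the regular expression are hypothetical names introduced for illustration; the actual dictionary file format of the system may differ.

```python
import re
from typing import NamedTuple, Tuple


class Entry(NamedTuple):
    headword: str
    uw: str
    attributes: Tuple[str, ...]


# Assumed line shape: [headword] "UW" (ATTR1, ATTR2, ...)
# Both straight and curly double quotes are accepted around the UW.
ENTRY_RE = re.compile(
    r'\[(?P<head>[^\]]+)\]\s*["“](?P<uw>[^"”]+)["”]\s*\((?P<attrs>[^)]*)\)'
)


def parse_entry(line: str) -> Entry:
    """Split one dictionary line into head word, UW and attribute list."""
    m = ENTRY_RE.search(line)
    if m is None:
        raise ValueError(f"not a dictionary entry: {line!r}")
    attrs = tuple(a.strip() for a in m.group("attrs").split(",") if a.strip())
    return Entry(m.group("head").strip(), m.group("uw").strip(), attrs)


entry = parse_entry('[forward] "forward(icl>do)" (V, VOA, #_TO_AR2)')
takes_to_pp = "#_TO_AR2" in entry.attributes
```

Keeping the attributes as plain strings means later steps can test for an argument-structure feature with a simple membership check.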
PP Attachment • In [V-N1-P-N2] frame, • N2 can attach to V or N1 • It depends on argument taking property of both V and N1 • 2 cases: V may or may not demand P-N2 • 2 cases: N1 may or may not demand P-N2 • While attaching N2 to V or N1, Priority is given • First to argument-hood • Second to neighbor-hood ... of V and N1
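The priority rule above (argument-hood first, neighbour-hood second) can be sketched as follows. The function name `attach_pp` is hypothetical. The attribute names follow the #_TO_AR2 (verb) and #_TO_AR1 (noun) pattern used elsewhere in these slides. The tie-break is an assumption: when argument-hood does not decide (both or neither demand the PP), the sketch falls back on N1 because it is adjacent to the PP; the slides do not spell this out.

```python
def attach_pp(verb_attrs, noun_attrs, prep, tie_break="N1"):
    """Decide whether P-N2 attaches to V or N1 in a [V-N1-P-N2] frame.

    verb_attrs / noun_attrs: sets of dictionary attributes of V and N1.
    prep: the preposition, e.g. "to" or "of".
    tie_break: ASSUMPTION, used when argument-hood does not decide.
    """
    p = prep.upper()
    verb_demands = f"#_{p}_AR2" in verb_attrs   # V takes P-PP as 2nd argument
    noun_demands = f"#_{p}_AR1" in noun_attrs   # N1 takes P-PP as argument

    # First priority: argument-hood, when exactly one candidate demands the PP.
    if verb_demands != noun_demands:
        return "V" if verb_demands else "N1"

    # Second priority: neighbour-hood (assumed to favour the adjacent N1).
    return tie_break


# He forwarded the mail to John.
forward_case = attach_pp({"V", "VOA", "#_TO_AR2"}, {"N"}, "to")
# We received an invitation to the wedding.
invitation_case = attach_pp({"V"}, {"N", "#_TO_AR1"}, "to")
```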
PP Attachment Table • Four cases: for example for the frame [V-N1-of-N2]
Automatic Dictionary Enrichment • Oxford Dictionary (OALD): argument structure • WordNet: argument structure • Penn Treebank corpus: PRO controlled-ness property of verbs
Using Oxford Dictionary • A typical entry in OALD • E.g. noun addition: add•ition noun …… 2 [C] ~ (to sth) a thing that is added to sth else: the latest addition to our range of cars; an addition to the family (= another child); (NAmE) to build a new addition onto a house; last minute additions to the government’s package of proposals • Second sense: “addition to <something>” indicates that the word ‘addition’ takes a to-PP as an argument • Added the feature #_TO_AR1 to the attribute list of the noun ‘addition’.
Semantic Relations • The semantic relation between a verb and its argument is an idiosyncratic property of the verb • Semantic relations of arguments are stored in the lexicon as features • Using Beth Levin’s verb categories • Verbs in the same class behave similarly, syntactically and semantically • Example: • Give type verbs: give, lend, pay, sell, refund • Give - #_TO_AR2, #_TO_AR2_GOL
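A rough sketch of how the “~ (to sth)” pattern in an OALD-style sense could be turned into an argument-structure feature. The regular expression and the function name `pp_argument_attributes` are assumptions made for illustration; real OALD entries use more pattern variants than this handles.

```python
import re

# Assumed notation: "~ (PREP sth)" marks a PP argument of the head word.
PATTERN_RE = re.compile(r"~\s*\((\w+)\s+sth\)")


def pp_argument_attributes(sense_text, slot="AR1"):
    """Return attributes such as '#_TO_AR1' for each '~ (prep sth)' pattern."""
    return [f"#_{prep.upper()}_{slot}" for prep in PATTERN_RE.findall(sense_text)]


sense = "2 [C] ~ (to sth) a thing that is added to sth else"
noun_attrs = pp_argument_attributes(sense)
```

For a verb entry the same function could be called with `slot="AR2"`, matching the #_TO_AR2 attribute used for ‘forward’.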
Semantic Head Detection • In case of NP involving [N1-of-N2], • Syntactically, N1 is head • University of Mumbai • Bunch of sticks • Semantically, N1 or N2 can be head • Bunch of sticks • Sticks is semantic head • qua (sticks, bunch)
Example: Semantic Head • Saw the book of physics: V relates to N1 (book) • Drank a cup of milk: V relates to N2 (milk)
Partitives • Dictionary enrichment • Identified and classified such quantity words • Numbers- one-third, dozen • Container- can, cup, bag • Collection- bundle, group • Measure- inch, gram • Indefinite amount - drop, dose • #PARTITIVE attribute is given to such words
Solution: Semantic Head detection • Given the sentence frame [N1 of N2], if N1 has the attribute #PARTITIVE then N2 becomes the semantic head • A quantity (qua) relation is generated. • For example • Cup of tea • qua (tea, cup)
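The rule above can be sketched in a few lines. The function name `semantic_head` and the tuple it returns are hypothetical; only the #PARTITIVE attribute and the qua relation come from the slides. What relation is produced in the non-partitive case is not stated here, so the sketch returns no relation for it.

```python
def semantic_head(n1, n1_attrs, n2):
    """Pick the semantic head of an [N1 of N2] phrase.

    Returns (head, relation); relation is None when N1 is not a partitive.
    """
    if "#PARTITIVE" in n1_attrs:
        # N1 is a quantity word, so N2 is the head and N1 quantifies it.
        return n2, f"qua({n2}, {n1})"
    # Otherwise the syntactic head N1 is kept as the semantic head.
    return n1, None


cup_of_tea = semantic_head("cup", {"N", "#PARTITIVE"}, "tea")
book_of_physics = semantic_head("book", {"N"}, "physics")
```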
What is PRO? • PRO: • pronominal, anaphoric • He wants [to go]IP . • He_i wants [PRO_i to go]. • Subject of ‘go’ is same as subject of ‘want’, i.e. ‘he’ • PRO is co-indexed with the subject ‘he’
PRO: Idiosyncratic • PRO: • Subject controlled • He_i promised me [PRO_i to come for the party]. • Object controlled • He ordered us_k [PRO_k to finish the work]. • Promise – subject controlled • Order – object controlled • Added as an attribute of the verb
PRO Resolution • If • the verb has the “sub/obj-controlled-PRO” property • and has a to-infinitival clause • Then • copy the subject/object of the main clause to the position of PRO and give it the same UW-id (unique identifier).
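A minimal sketch of the If/Then rule above. The `Node` class and `resolve_pro` function are hypothetical. The attribute name `subject-controlled-pro` appears in these slides; `object-controlled-pro` is assumed by analogy.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Node:
    word: str
    uw_id: str  # unique identifier of the UW instance


def resolve_pro(verb_attrs, subject: Node, obj: Optional[Node],
                has_to_infinitive: bool) -> Optional[Node]:
    """Return the node to place at the PRO position, or None if unresolved."""
    if not has_to_infinitive:
        return None
    if "subject-controlled-pro" in verb_attrs:
        return replace(subject)          # copy keeps the same UW-id
    if "object-controlled-pro" in verb_attrs and obj is not None:
        return replace(obj)              # attribute name is an assumption
    return None                          # e.g. arbitrary PRO is not handled


he = Node("he", "01")
us = Node("us", "02")
# He promised me [PRO to come for the party].
pro_promise = resolve_pro({"V", "subject-controlled-pro"}, he, Node("me", "03"), True)
# He ordered us [PRO to finish the work].
pro_order = resolve_pro({"V", "object-controlled-pro"}, he, us, True)
```

Because the copy carries the same `uw_id`, the subject of the embedded verb and the controller refer to one concept in the resulting graph.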
PRO Realization in UNL • They promised Mary [to give a party]
Dictionary Enrichment: PRO • Penn Treebank corpus • Annotated with co-indexed PRO information • ((S (NP-SBJ-1 investors) (VP continue (S (NP-SBJ *-1) (VP to (VP pour (NP cash) (PP-DIR into (NP money funds)))))) .)) • NP-SBJ-1 is also the subject of the to-clause (*-1) • Thus the verb ‘continue’ gets the attribute ‘subject-controlled-pro’ • English WordNet provides frames for verbs which indicate that the verb takes a to-infinitive as an argument • E.g.: They ____ him to write the letter.
UNL system • English sentence → Enconvertor → UNL expression • The Enconvertor uses the dictionary and the rule base for English
Enconvertor: Analysis • Enconvertor • Rule based • Similar to Turing machine • Two analysis heads (windows) • Many condition heads (windows) • Move over a sentence • Usually, word by word
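A rough sketch of how the co-indexing above could be read off automatically. It works on the simplified bracketing shown on this slide (no part-of-speech tags), uses only regular expressions, and only detects the subject-controlled case; the function name `subject_controlled_verbs` is hypothetical.

```python
import re


def subject_controlled_verbs(tree: str):
    """Find verbs whose embedded empty subject (*-n) is co-indexed with an overt NP-SBJ-n."""
    # Indices of overt subjects, e.g. "(NP-SBJ-1 investors)" gives "1".
    overt = set(re.findall(r"\(NP-SBJ-(\d+)\s", tree))
    verbs = []
    # A VP headed by a verb immediately followed by a clause with an empty subject.
    pattern = r"\(VP\s+(\w+)\s+\(S\s+\(NP-SBJ\s+\*-(\d+)\)"
    for verb, index in re.findall(pattern, tree):
        if index in overt:
            verbs.append(verb)
    return verbs


tree = ("((S (NP-SBJ-1 investors) (VP continue (S (NP-SBJ *-1) "
        "(VP to (VP pour (NP cash) (PP-DIR into (NP money funds)))))) .))")
controlled = subject_controlled_verbs(tree)
```

Each verb found this way would receive the ‘subject-controlled-pro’ attribute in the dictionary. Full Penn Treebank trees carry POS tags around each word, so the pattern would need adjusting before use on the real corpus.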
Rules: Shift • Shift (can move left or right) • Right shift over a sentence by a word • For instance, R{V,^#_FOR_AR2:::}{N:::}(PRE,#FOR)P60; • Move to the right (R) over the sentence if: • the left analysis window {V,^#_FOR_AR2:::} is on a verb which does not expect a for-PP as second argument (^ indicates negation) • and the right analysis window {N:::} is on a noun • and the next condition window (PRE,#FOR) matches the preposition ‘for’ • The rule has absolute priority 60 (255 is the highest).
Rules: Reduce • Reduce (delete a node and/or relate it to another node) • Delete a node and create a relation • For instance, <{V,#_FOR_AR2,#_FOR_AR2_rsn:::}{N,FORRES,PRERES::rsn:}P25; • Delete the word under the right analysis window while creating a reason (rsn) relation with the verb on its left if: • the left analysis window {V,#_FOR_AR2,#_FOR_AR2_rsn:::} is on a verb which expects a for-PP as second argument (#_FOR_AR2) • and the right analysis window {N,FORRES,PRERES::rsn:} is on a noun, which also specifies the rsn relation to be created • The rule has absolute priority 25 (255 is the highest).
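The way window conditions and priorities interact in the two example rules can be sketched as below. This is an illustrative model only, not the Enconvertor's actual rule engine: the `Rule` class, `matches` and `pick_rule` are hypothetical, and the sketch assumes that among all matching rules the one with the highest priority is chosen.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    kind: str                            # "R" = right shift, "<" = reduce (delete right node)
    left: frozenset                      # attributes required under the left analysis window
    right: frozenset                     # attributes required under the right analysis window
    priority: int                        # 0..255, higher wins
    left_not: frozenset = frozenset()    # attributes negated with ^
    lookahead: frozenset = frozenset()   # next condition window
    relation: str = ""                   # relation created by a reduce rule


def matches(rule, left_attrs, right_attrs, next_attrs):
    return (rule.left <= left_attrs
            and not (rule.left_not & left_attrs)
            and rule.right <= right_attrs
            and rule.lookahead <= next_attrs)


def pick_rule(rules, left_attrs, right_attrs, next_attrs=frozenset()):
    """Return the matching rule with the highest priority, or None."""
    candidates = [r for r in rules if matches(r, left_attrs, right_attrs, next_attrs)]
    return max(candidates, key=lambda r: r.priority, default=None)


SHIFT = Rule("R", frozenset({"V"}), frozenset({"N"}), 60,
             left_not=frozenset({"#_FOR_AR2"}),
             lookahead=frozenset({"PRE", "#FOR"}))
REDUCE = Rule("<", frozenset({"V", "#_FOR_AR2", "#_FOR_AR2_rsn"}),
              frozenset({"N", "FORRES", "PRERES"}), 25, relation="rsn")

# Verb that does not expect a for-PP, noun to its right, 'for' coming next: shift.
chosen_shift = pick_rule([SHIFT, REDUCE], {"V"}, {"N"}, {"PRE", "#FOR"})
# Verb that expects a for-PP with rsn, noun already marked FORRES/PRERES: reduce.
chosen_reduce = pick_rule([SHIFT, REDUCE],
                          {"V", "#_FOR_AR2", "#_FOR_AR2_rsn"},
                          {"N", "FORRES", "PRERES"})
```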
Limitations • Prerequisite: • word sense disambiguation • Dictionary contains all words of the sentence • Multiword or named entity detection is based on dictionary lookup • Arbitrary PRO is not handled
Results • Semantic Head Detection • Temporal analysis
Error analysis • Inadequate rules • Missing rules that handle common phenomena lead to wrong UNL • Errors in attributes assigned to entries in the dictionary • Spelling errors, missing attributes etc. • Idiomatic constructs
Conclusion • Future work • It can be applied to other prepositions • Special cases like ‘of’ and ‘to’ could be investigated • Clause attachment can similarly be handled • Key idea • Enrichment of dictionary automatically/ semi-automatically • It involves adding syntactic and semantic level attributes
Resources • A. S. Hornby. 2006. Oxford Advanced Learner’s Dictionary of Current English. Oxford University Press, Oxford. • Chris Greaves. 2006. Web Concordancer, http://www.edict.com.hk • George Miller. 2003. WordNet 2.0. http://wordnet.princeton.edu/ • M. Marcus, G. Kim and M. Marcinkiewicz. 1994. The Penn Treebank: annotating predicate-argument structure. ARPA.
References • UNDL Foundation. 2003. The Universal Networking Language (UNL) specifications version 3.2. http://www.unlc.undl.org • Jignashu Parikh, Jagadish Khot, Shachi Dave and Pushpak Bhattacharyya. 2004. Predicate Preserving Parsing. European Union Working Conference on Sharing Capability in Localization and Human Language Technologies (SCALLA04), Kathmandu, Nepal. • Jane Grimshaw. 1990. Argument Structure. The MIT Press, Cambridge, Mass. • E. Brill and P. Resnik. 1994. A Rule based approach to Prepositional Phrase Attachment disambiguation. Proc. of the fifteenth International conference on computational linguistics. Kyoto. • Adwait Ratnaparkhi. 1998. Statistical Models for Unsupervised Prepositional Phrase Attachment. Proceedings of COLING-ACL. http://www.cis.upenn.edu/~adwait/statnlp.html
Contribution • R. K. Mohanty, A. Almeida, Srinivas S. and P. Bhattacharyya. 2004. The complexity of OF. ICON, Hyderabad, India. • A. Almeida and P. Bhattacharyya. 2007. Semantics of ‘to’. ICCTA 2007, Kolkata, India. • R. K. Mohanty, A. Almeida and P. Bhattacharyya. 2005. Prepositional Phrase Attachment and Interlingua. CICLing-2005 Workshop, Mexico.
Questions - Prof. S. Kaushik • The lexicon carries a lot of information, which will make the development of lexicons a very difficult task. Subsequently this will make processing slow and inefficient. Comment on this. • The entries in the lexicon have the following structure: • [Head-word] “Universal Word” (attribute list) • In our work, we have been adding more attributes to this attribute list. This does not complicate the dictionary. In MT systems it is common practice to have many attributes for each word in the lexicon. Adding more attributes to the words has no effect on the number of entries in the dictionary. However, if the dictionary size increases, dictionary access can be made faster with the help of database storage and a proper indexing scheme. • Also, we have tried to address the issue of creating/enriching the lexicon automatically from an annotated corpus and the Oxford dictionary, to simplify dictionary creation.
Are the existing lexicons and rules scalable? • The existing lexicon and rules are scalable. • We can add more entries to the lexicon. It uses indexing, so there will be little difference in speed, since the access time is O(log n). • Rules can also be extended. However, for a given language (say English) the rules will be finite in number, so there will not be any sizable increase in the number of rules.
Can your approach be extended for other languages? • This work is done specifically for English. It relies heavily on argument structure information and word properties. • But the linguistic theory can also be applied to similar problems in other languages. The algorithm developed for attachment can be tried out on languages whose structure is similar to English.
How significant is the UNL base for the work reported here? If the translation framework was something else, how much would that affect the work done? • UNL is a well known interlingua. Some other interlinguas are LCS (Lexical Conceptual Structure) by Dorr and Conceptual Structures. These interlinguas lack computational support, since their representation is complex compared to UNL. There is also a universal language called Esperanto, but it lacks preciseness and hence is difficult to represent in the computer. • Any framework will have two parts: enconversion and deconversion. The difficulty of analysis depends on how deeply that framework encodes the knowledge. Besides, this work is based on argument structure theory and the semantic properties of words. Hence any framework can be used for this.
What was the methodology adopted for the analysis reported in chapters 4-7? • Our approach is based on linguistic theory and principles. The process involves corpus lookup, extraction of different syntactic patterns from the corpus, and their analysis. We relied mainly on concordance search on the Brown corpus and the BNC corpus. Initially, we focused on the analysis of sentences with only of-PPs. For testing we used sentences from the BNC corpus and the WSJ data-set used by Ratnaparkhi. • For the study of partitives, we manually looked for partitives in the corpus, in addition to using a thesaurus and WordNet ontologies. • For dictionary enrichment, we referred to various available resources. We explored them to extract the desired features for the dictionary.
How do you know if the categories identified for this analysis are exhaustive? Are there alternative ways to categorise? Is there a basis for categorisation? • For verbs, we used Beth Levin’s work on verb classification and WordNet. WordNet ontologies are used for noun categories. • In the case of prepositions, we tried to categorize prepositions according to their roles, i.e., temporal, spatial, manner etc. But except for temporal, we were not able to do much work in this direction. We found that unless we analyse each preposition individually, it would be difficult to categorize them. So we chose to do a complete analysis of individual prepositions. This led us to select very common prepositions such as ‘of’ and ‘to’.