


  1. An Italian-English dependency parser and its [possible] application to Hindi
     Leonardo Lesmo (lesmo@di.unito.it)
     Natural Language Processing Group (Dip. Informatica – Univ. Torino) (http://www.di.unito.it/gull)
     CGMIL 2008 - Hyderabad - India

  2. OUTLINE
     • The Turin University Parser
     • Performances
     • The Turin University Treebank (TUT)
     • Mapping between TUT and AnnCorra
     • Current activities and the future

  3. THE PARSER
     [Diagram: parser architecture]
     Processing modules shown: Morphology, Lexical access, POS tagging, Chunking, Analysis of Conjunctions, Segmentation, Verbal Attachment, Post-processing
     Knowledge sources shown: Dictionary, Tagging rules, Chunking rules, Verbal subcategories, Verbal frames

  4. AN EXAMPLE
     Input: When the man that you mentioned sent me that beautiful message, I fell in love with him
     Chunking: When [the man] that you mentioned sent me [that beautiful message], I fell [in love][with him]
     Segmentation: {{When [the man]{that you mentioned} sent me [that beautiful message]}, I fell [in love][with him]}
     Caseframing

  5. THE FINAL RESULT
     to fall
       when [verb+fin-rmod-time]
         to send [conj-arg]
           the [verb-subj]
             man [det+def-arg]
               to mention [verb-rmod+relcl]
                 that [verb-obj]
                 you [verb-subj]
           me [verb-indobj]
           that [verb-obj]
             message [det+def-arg]
               beautiful [adjc+qualif-rmod]
       I [verb-subj]
       in [verb-indcomp*locut]
         love [prep-arg]
       with [rmod]
         him [prep-arg]

  6. THE ACTUAL FORMAT
     1 When (WHEN CONJ SUBORD TIME) [7;VERB+FIN-RMOD-TIME]
     2 the (THE ART DEF ALLVAL ALLVAL) [7;VERB-SUBJ]
     3 man (MAN NOUN COMMON M SING) [2;DET+DEF-ARG]
     4 that (THAT PRON RELAT ALLVAL ALLVAL LSUBJ+OBL) [6;VERB-OBJ]
     5 you (YOU PRON PERS ALLVAL ALLVAL 2 LSUBJ+LOBJ+LIOBJ+OBL) [6;VERB-SUBJ]
     6 mentioned (MENTION VERB MAIN IND PAST ALLVAL ALLVAL) [3;VERB-RMOD-RELCL]
     7 sent (SEND VERB MAIN IND PAST ALLVAL ALLVAL) [1;CONJ-ARG]
     8 me (I PRON PERS ALLVAL SING 1 LOBJ+LIOBJ+OBL) [7;VERB-INDCOMPL-THEME]
     9 that (THAT ADJ DEMONS ALLVAL SING) [7;VERB-OBJ]
     10 beautiful (BEAUTIFUL ADJ QUALIF ALLVAL ALLVAL) [11;ADJC+QUALIF-RMOD]
     11 message (MESSAGE NOUN COMMON N SING) [9;DET+DEF-ARG]
     12 , (#\, PUNCT) [14;SEPARATOR]
     13 I (I PRON PERS ALLVAL SING 1 LSUBJ) [14;VERB-SUBJ]
     14 fell (FALL VERB MAIN IND PAST ALLVAL ALLVAL) [0;TOP-VERB]
     15 in (IN PREP MONO) [14;PREP-RMOD]
     16 love (LOVE NOUN COMMON N SING) [15;PREP-ARG]
     17 with (WITH PREP MONO) [14;PREP-RMOD]
     18 him (HE PRON PERS M SING 3 LOBJ+LIOBJ+OBL) [17;PREP-ARG]
     19 . (#\. PUNCT) [14;END]
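Each token in the format above carries an index, a word form, a parenthesised feature list (lemma first, then category and subtype), and a bracketed head index with its arc label. A minimal sketch of a reader for such lines might look as follows. The names `Token`, `parse_line`, `parse` and `root` are illustrative and not part of the parser; the sketch assumes one token per line and plain integer indices, as on the slide.

```python
import re
from typing import List, NamedTuple


class Token(NamedTuple):
    index: int            # position of the word in the sentence (1-based)
    form: str             # surface form
    features: List[str]   # lemma, category, subtype and morphological values
    head: int             # index of the governing word, 0 for the root
    label: str            # arc label linking the word to its head


# index, form, "(features)", "[head;LABEL]" -- layout assumed from the slide
LINE_RE = re.compile(r"^(\d+)\s+(\S+)\s+\((.*)\)\s+\[(\d+);([^\]]+)\]$")


def parse_line(line: str) -> Token:
    """Parse one token line; raise ValueError if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised line: {line!r}")
    index, form, features, head, label = match.groups()
    return Token(int(index), form, features.split(), int(head), label)


def parse(text: str) -> List[Token]:
    """Parse a block of token lines, skipping blank lines."""
    return [parse_line(line) for line in text.splitlines() if line.strip()]


def root(tokens: List[Token]) -> Token:
    """Return the token whose head index is 0."""
    return next(token for token in tokens if token.head == 0)


# A fragment of the example sentence, copied from the slide.
SAMPLE = r"""
12 , (#\, PUNCT) [14;SEPARATOR]
13 I (I PRON PERS ALLVAL SING 1 LSUBJ) [14;VERB-SUBJ]
14 fell (FALL VERB MAIN IND PAST ALLVAL ALLVAL) [0;TOP-VERB]
15 in (IN PREP MONO) [14;PREP-RMOD]
16 love (LOVE NOUN COMMON N SING) [15;PREP-ARG]
"""

if __name__ == "__main__":
    for token in parse(SAMPLE):
        print(token.index, token.form, "->", token.head, token.label)
```

Keeping the feature list as an uninterpreted list of strings avoids guessing how many morphological slots each category has.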

  7. Results: Evalita 2007
     LAS: Labeled Attachment Score
     UAS: Correct Attachment Score
     LAS2: Correct Label Score
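The three scores can be illustrated with a short sketch. It assumes that every token counts equally and that each score is the fraction of tokens with a correct head and label (LAS), a correct head (UAS), or a correct label (LAS2). Whether the Evalita evaluation excluded punctuation or traces is not stated on the slide, so that detail is left out. The function name and the "predicted" analysis are hypothetical.

```python
from typing import Dict, List, Tuple

Arc = Tuple[int, str]  # (head index, arc label) for one token


def attachment_scores(gold: List[Arc], predicted: List[Arc]) -> Dict[str, float]:
    """Compare two analyses of the same tokens and return LAS, UAS and LAS2."""
    if not gold or len(gold) != len(predicted):
        raise ValueError("gold and predicted must be non-empty and equally long")
    total = len(gold)
    pairs = list(zip(gold, predicted))
    both = sum(1 for g, p in pairs if g == p)          # head and label correct
    heads = sum(1 for g, p in pairs if g[0] == p[0])   # head correct
    labels = sum(1 for g, p in pairs if g[1] == p[1])  # label correct
    return {"LAS": both / total, "UAS": heads / total, "LAS2": labels / total}


# Gold arcs for "I fell in love" (tokens 13-16 of the example sentence).
gold = [(14, "VERB-SUBJ"), (0, "TOP-VERB"), (14, "PREP-RMOD"), (15, "PREP-ARG")]
# A made-up parser output with one wrong head and one wrong label.
predicted = [(14, "VERB-SUBJ"), (0, "TOP-VERB"), (16, "PREP-RMOD"), (15, "PREP-RMOD")]

if __name__ == "__main__":
    print(attachment_scores(gold, predicted))
```

In this made-up case two of four tokens are fully correct, while three have the right head and three the right label, so LAS is lower than both UAS and LAS2.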

  8. Comparison with CoNLL
     CoNLL: international contest for dependency parsers (multilingual)

  9. The Turin University Treebank (TUT)
     • Current size:
       Italian: 2200 sentences, 62445 tokens (4635 traces; 6704 punctuation)
       English: 150 sentences, 4250 tokens (253 traces; 513 punctuation)
       English not yet online (under test)

  10. Parts of Speech (and “subtypes”)
      1. ADJ (adjectives)
         - DEITT (deictic) next
         - DEMONS (demonstrative) such, this, that
         - EXCLAM (exclamative)
         - INDEF (indefinite) numerous, certain, few
         - INTERR (interrogative) what, which
         - ORDIN (ordinal) first, twentieth, last
         - ORDINSUFF (ordinal suffixes) nd, rd, th, st
         - POSS (possessive) my, your, their
         - QUALIF (qualificative) nice, big, English
      2. ADV (adverbs)
         - ADFIRM (adfirmative)
         - ADVERS (adversative) although, though
         - COMPAR (comparative) less, more
         - CONCESS (concessive) also
         - DOUBT (doubt) perhaps
         - EXPLIC (explicative) that_is
         - INTERJ (interjections) at_any_rate
         - INTERR (interrogative) how, where, when, why
         - LIMIT (limit) just, only
         - LOC (locative) there, within, below, here
         - MANNER (manner) aloud, alright, well
         - NEG (negation) not
         - QUANT (quantification) little, rather, too
         - REASON (motivation) in_fact
         - STRENG (strengthening) even, moreover
         - SUPERL (superlative) most
         - TIME (time) sometime, afterward, already

  11. Parts of Speech (and “subtypes”) 2
      3. ART (articles)
         - DEF (definite) the
         - INDEF (indefinite) a, another
         - GENITIVE (genitive) 's
      4. CONJ (conjunctions)
         - COORD (coordinative) and, but, or, neither, nor
         - SUBORD (subordinative) since, that, to, unless
         - COMPAR (comparative) than
      5. DATE (dates) 08/06/2008
      6. INTERJ (interjections) alas
      7. MARKER (markers)
      8. NOUN (nouns)
         - COMMON house, boy, chair
         - PROPER Mary, Italia, Italy, England
      9. NUM (numbers) zero, twenty, 127, 3.14
      10. PHRAS (phrasals) yes, no
      11. PREDET (predeterminers) all, both
      12. PREP (prepositions)
         - MONO of, to, from, in
         - POLI during, above, under, in front of

  12. Parts of Speech (and “subtypes”) 3
      13. PRON (pronouns)
         - DEMONS (demonstrative) this, that
         - EXCLAM (exclamative) what
         - INDEF (indefinite) everything, nobody, something
         - INTERR (interrogative) what, who
         - LOC (locative) Italian: ne, ci, vi
         - ORDIN (ordinals) first, second, fiftieth
         - PERS (personal) I, you, we, her
         - POSS (possessive) mine, yours
         - REFL-IMPERS (reflexive-impersonal) ci, vi, si, se
         - RELAT (relative) that, who, which, where
      14. PUNCT (punctuation)
      15. SPECIAL (special)
      16. VERB (verbs)
         - MAIN (all standard verbs) go, eat, give, be (in “to be intelligent”)
         - AUX (auxiliaries) be (in “to be kissed”)
         - MOD (modals) must, can, will

  13. The labelling scheme
      [Diagram: hierarchy of arc labels]
      Top
        Dependent
          Nofunction
          Function
            Arg
              adjc-arg
              advb-arg
              noun-arg
              conj-arg
              verb-arg
                verb-subj
                verb-obj
                verb-indobj
                verb-predcompl
                verb-indcompl
            Modifier
              Apposition
              Rmod

  14. The NOFUNCTION labels
      [Diagram: hierarchy of the labels under Nofunction]
      Nofunction
        Verb-expletive
        Aux
          Aux+progressive
          Aux+tense
          Aux+passive
        Emptycompl
        Visitor
        Separator
        Contin
          Contin+locut
          Contin+denom
          Contin+prep
        Interjection
        Coordinator
          Coord
          Coord2nd
          Coordantec

  15. Some examples
      Auxiliaries
        Aux+progressive: I am looking for …
        Aux+tense: … the debate has – to quite some extent - suffered from …
        Aux+passive: … whose historical experience is not marked by …
      Continuations
        Contin+locut: … convinced of the feasibility … in order to reinforce …
        Contin+prep: … grown out of the millenniums …
        Contin+denom: Samuel Alexander asserted …

  16. Visitors (and traces)
      The question of what we might consider to be an adequate …
      [Figure: dependency tree of the sentence above, containing a visitor arc and several trace nodes. Arc labels shown: det+def-arg, prep-rmod, prep-arg, verb-rmod+relcl, verb+modal-indcompl, verb-subj, verb-obj, verb-predcompl+obj, visitor]

  17. Coordination … and “word” traces
      base (coord+base, coord2nd+base): … is tautologous and without ontologic commitment …
      compar (coordantec+compar, coord+compar, coord2nd+compar): … were more like mythical heroes than like the omnipotent God …
      correlat (coordantec+correlat, coord+correlat, coord2nd+correlat): … neither John nor his friends …
      compar (coord+base, coord2nd+base): … Samuel asserted that mentality emerged … and then t_asserted t_Samuel that …

  18. The AnnCorra scheme
      • It is chunk-based (some elementary subtrees are left unanalysed)
      • Some POS are merged (e.g. Demonstratives include both Adj and Pron)
      • It involves 28 relations (arc labels) and 25 different POS (table below)
      • There are some non-dependency labels (e.g. ccof for coordination)

  19. Mapping category labels

  20. Mapping arc labels
      The argument (karaka) labels
      • k1 (karta): the primary (or “most independent”) participant in the action (similar to agent) → VERB-SUBJ
      • k2 (karma): the secondary participant (often, the patient) → VERB-OBJ
      • k3 (karana): the instrument → VERB-INDCOMPL-MEANSMANNER
      • k4 (sampradana): the recipient or beneficiary of an action → VERB-INDOBJ
      • k5 (apadana): the stationary element in a separation → ????
      • k7 (adhikarana): the locus (spatial, temporal or abstract) of karta or karma. It is tagged as k7p, k7t or k7 depending on the type of location → VERB-INDCOMPL-LOC
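The correspondences listed above can be written down as a small lookup table. This is only a sketch: the function name `map_karaka` is hypothetical, `None` stands for the mapping left open on the slide (k5), and folding k7p and k7t into k7 before the lookup is an assumption based on the slide giving a single target label for all three.

```python
from typing import Dict, Optional

# AnnCorra karaka label -> TUT arc label, as listed on the slide.
# None marks the correspondence the slide leaves open ("????").
KARAKA_TO_TUT: Dict[str, Optional[str]] = {
    "k1": "VERB-SUBJ",
    "k2": "VERB-OBJ",
    "k3": "VERB-INDCOMPL-MEANSMANNER",
    "k4": "VERB-INDOBJ",
    "k5": None,
    "k7": "VERB-INDCOMPL-LOC",
}

# Assumption: the place and time variants share the mapping given for k7.
K7_VARIANTS = {"k7p", "k7t"}


def map_karaka(label: str) -> Optional[str]:
    """Return the TUT label for a karaka label, or None if none is defined yet.

    Raises KeyError for labels that are not in the table at all.
    """
    base = "k7" if label in K7_VARIANTS else label
    return KARAKA_TO_TUT[base]


if __name__ == "__main__":
    for karaka in ("k1", "k2", "k5", "k7t"):
        print(karaka, "->", map_karaka(karaka))
```

Returning `None` for k5 rather than raising keeps "known label, no mapping yet" distinct from "unknown label".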

  21. Mapping the structure
      TUT dependency structure:
        must
          I [verb-subj]
          read [verb+modal-indcompl]
            t [verb-subj]
            the [verb-obj]
              book [det+def-arg]
      Chunk-based structure of AnnCorra:
        (must read)
          I [k1]
          (the book) [k2]

  22. Current activities and the future
      A word about semantics: DTS
      [Figure: the dependency tree of a sentence about two students and three difficult theorems (arc labels verb-subj, verb-obj, det+quantif-arg, det+indef-arg, adjc+qualif-rmod) shown next to its DTS graph, which has the predicates study', student', theorem' and difficult' over the variables x and y, together with restr(x), restr(y), quant(x) and quant(y)]

  23. Disambiguation: Semdep arcs
      [Figure: three DTS graphs of the same sentence, each with a CTX node and the predicates study', student', theorem' and difficult' over the variables x and y, one graph per reading]
      • $2x\,[\,\mathit{student}'(x) \land 3y\,[\,\mathit{theorem}'(y) \land \mathit{study}'(x,y)\,]\,]$
      • $3y\,[\,\mathit{theorem}'(y) \land 2x\,[\,\mathit{student}'(x) \land \mathit{study}'(x,y)\,]\,]$
      • ??? Branching Quantification (Independent Set)
      Any more readings?

  24. Current activities and the future
      • Practical semantic interpretation based on ontological knowledge for DB access
      • Automatic analysis of legal texts for extracting information about rule amendments (date, modified text, new text)
      • Extension of the treebank with semantic annotation (in cooperation with Johan Bos)
      • Development of a graphical interface with an online server (Java implementation and socket-based connection with a Lisp server)

  25. The future (last but not least)
      In cooperation with IIIT Hyderabad:
      • Morphological analysis of Hindi (mid-way)
      • Development and testing of a Hindi parser and of mapping rules from Hindi to English and vice versa

  26. More on Parsing 1
      Function: [Diagram: the word sequence w1 w2 … wi-1 wi wi+1 wi+2 … wn with HEAD = wi and a “?” over each of the other words]
      Structure: (head-category head-subcategory (dependent-position (dependent-category (dependent-constraints))) ARC-LABEL)

  27. More on Parsing 2
      Examples:
      (ART DEF (before (PREDET (agree))) PDETMOD)
        PDETMOD: tutti (cat=PREDET, gender=m, number=pl) i (cat=ART, subcat=DEF, gender=m, number=pl): “all the”
      (NOUN COMMON (chunk-follows (ADJ (agree) (subcat qualif))) ADJCMOD-QUALIF)
        ADJCMOD-QUALIF: giardino (cat=NOUN, gender=m, number=sing) molto (cat=ADV) bello (cat=ADJ, subcat=QUALIF, gender=m, number=sing): “garden very nice”
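To make the rule format concrete, here is a minimal sketch of how the first rule, (ART DEF (before (PREDET (agree))) PDETMOD), could be checked against a pair of words. It is not the parser's Lisp implementation. The classes `Word` and `Rule`, the function `apply_rule`, the reading of `before` as "immediately to the left of the head", and the reading of `agree` as "same gender and number" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Word:
    form: str
    cat: str
    subcat: Optional[str] = None
    gender: Optional[str] = None
    number: Optional[str] = None


@dataclass(frozen=True)
class Rule:
    head_cat: str
    head_subcat: str
    position: str                # only "before" is handled in this sketch
    dep_cat: str
    constraints: Tuple[str, ...]
    label: str


def agree(a: Word, b: Word) -> bool:
    """Assumed meaning of (agree): same gender and same number."""
    return a.gender == b.gender and a.number == b.number


def apply_rule(rule: Rule, words: List[Word], head_idx: int) -> Optional[Tuple[int, int, str]]:
    """Return (dependent index, head index, label) if the rule fires, else None."""
    head = words[head_idx]
    if (head.cat, head.subcat) != (rule.head_cat, rule.head_subcat):
        return None
    if rule.position != "before":
        raise NotImplementedError("this sketch only covers the 'before' position")
    dep_idx = head_idx - 1       # assumption: the dependent is adjacent to the head
    if dep_idx < 0 or words[dep_idx].cat != rule.dep_cat:
        return None
    if "agree" in rule.constraints and not agree(head, words[dep_idx]):
        return None
    return (dep_idx, head_idx, rule.label)


PDETMOD = Rule("ART", "DEF", "before", "PREDET", ("agree",), "PDETMOD")

tutti = Word("tutti", "PREDET", gender="m", number="pl")
i = Word("i", "ART", subcat="DEF", gender="m", number="pl")

if __name__ == "__main__":
    print(apply_rule(PDETMOD, [tutti, i], head_idx=1))
```

The second rule on the slide uses `chunk-follows`, which depends on chunk boundaries and is therefore left out of this sketch.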

  28. More on Parsing 3
      Verb subcategorization classes:
      [Diagram: hierarchy of subcategorization classes rooted at “verbs”, with dictionary entries attached to some of the classes]
      Subcategorization classes shown: verbs, nosubj-verbs, ssubj-inf-verbs, subj-verbs, obj-verbs, indobj-verbs, empty-modal, modal, basic-trans, trans, trans-indobj
      Dictionary entries shown: bisognare (need), camminare (walk), dovere (must), potere (can)

  29. More on Parsing 4
      Transformations: basic class (e.g. trans) → transformed classes (e.g. trans, trans+passivization, trans+infinitivization, trans+prodrop, trans+passivization+infinitivization, …)
      Example transformation:
      (infinitivization replacing (subj-verbs) (is-inf-form tr-verb v-casefr) (cancel-case s-subj))
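The example transformation can be read as "if the verb is in an infinitive form, derive a new class in which the s-subj case is cancelled". A rough sketch of that reading follows. The `VerbClass` representation as a name plus a set of case names, the function signature, and the case name `s-obj` are hypothetical; only `s-subj` and the class names `trans` and `trans+infinitivization` appear on the slide.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class VerbClass:
    name: str
    cases: FrozenSet[str]   # surface cases licensed by the class


def infinitivization(verb_class: VerbClass, is_inf_form: bool) -> Optional[VerbClass]:
    """Derive the infinitive variant of a class by cancelling its s-subj case.

    Returns None when the transformation does not apply: the verb is not an
    infinitive, or the class has no subject case to cancel (a stand-in for
    the slide's restriction to subj-verbs).
    """
    if not is_inf_form or "s-subj" not in verb_class.cases:
        return None
    return VerbClass(
        name=verb_class.name + "+infinitivization",
        cases=verb_class.cases - {"s-subj"},
    )


# "s-obj" is a made-up case name used only to give the class a second case.
trans = VerbClass("trans", frozenset({"s-subj", "s-obj"}))

if __name__ == "__main__":
    print(infinitivization(trans, is_inf_form=True))
```

Because the result is itself a `VerbClass`, further transformations could be chained in the same style to obtain combined classes such as trans+passivization+infinitivization.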
