
Gerardo Sierra, César Aguilar & Rodrigo Alarcón (gsierram, caguilar, ralarconm)@iingen.unam.mx

Verbal Predications for Definition Extraction from Specialised Corpora. Gerardo Sierra, César Aguilar & Rodrigo Alarcón (gsierram, caguilar, ralarconm)@iingen.unam.mx. 4th International Conference Practical Applications in Language Corpora, Lodz, Poland, 4 – 6 April 2003.





Presentation Transcript


  1. Verbal Predications for Definition Extraction from Specialised Corpora Gerardo Sierra, César Aguilar & Rodrigo Alarcón (gsierram, caguilar, ralarconm)@iingen.unam.mx 4th International Conference Practical Applications in Language Corpora Lodz, Poland, 4 – 6 April 2003

  2. Outline • Introduction • Background • Recurrent patterns in definitional contexts • Verbal paradigm evaluation • Conclusions

  3. Introduction • Terminographical work: to identify terms and definitions in specialised texts • Main goal: to develop a conceptual information extraction system

  4. Conceptual information extraction system • To identify recurrent patterns in textual fragments where a term is defined • To expand the paradigm of the recurrent patterns

  5. Conceptual information extraction system • To evaluate those paradigms • To develop a computational linguistic technique to retrieve definitional contexts

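  6. Definitional Context • Textual fragment of a specialised text that contains the necessary information to define a term. • Example: En este estudio, la “erosión del bordo” se entiende como el desgaste de las superficies expuestas a la acción directa del agua y viento llegando a producir un adelgazamiento de tal magnitud que propicie o permita el paso del agua (por pérdida del bordo libre) o el desplome de una parte del bordo (por debilitamiento local) (In this study, “levee erosion” is understood as the wearing away of the surfaces exposed to the direct action of water and wind, to the point of producing a thinning of such magnitude that it causes or allows the passage of water (through loss of freeboard) or the collapse of part of the levee (through local weakening)) • Slide labels: Term (“erosión del bordo”), Definition (the text following “se entiende como”), Definitional Context (the whole fragment)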

  7. Definitional Context • Textual fragment of a specialised text that contains the necessary information to define a term. • Example: En este estudio, la “erosión del bordo” se entiende como el desgaste de las superficies expuestas a la acción directa del agua y viento llegando a producir un adelgazamiento de tal magnitud que propicie o permita el paso del agua (por pérdida del bordo libre) o el desplome de una parte del bordo (por debilitamiento local) • Characteristic elements labelled on the slide: “En este estudio” (In this study) • Quotation marks around the term • “se entiende como” (it is understood as)

  8. Background • Jennifer Pearson (1998) Terms in Context • Ingrid Meyer (1998) Knowledge-rich contexts • Carlos Rodríguez (1999) Operaciones Metalingüísticas Explícitas (Explicit Metalinguistic Operations)

  9. Corpus-based analysis • Engineering texts: • Logistics • Transport • Expert systems • Bioclimatic structures • Artificial intelligence

  10. Definitional contexts’ elements • Minimal elements: • Term (T) • Definition (D)

  11. Definitional contexts’ elements • Characteristic elements: • Typographical mark (tm) • Pragmatic predication (PP) • Verbal predication (VP)

  12. Pattern classification • Typographical • Syntactic • Mixed

  13. Typographical patterns • Text formatting features that emphasise the term, the definition, or both • Exclude a verbal predication • Verbal predications are replaced by punctuation marks

  14. Typographical patterns

  15. Syntactic patterns • They do not include typographical features • Pragmatic and verbal predications • (PP/VP) + T/D + (PP/VP) + D/T + (PP)

  16. Pragmatic predications • Information about the usage or treatment of the term. • Clues for understanding a term in the context where it appears.

  17. Pragmatic predications • Adverbial phrases: • generalmente (generally) • Prepositional phrases: • en términos generales (in general terms) • Simple words: • concepto (concept)

  18. Verbal predications • Verbs that connect a term with its definition • Commonly called metalinguistic verbs • definir (to define) • denominar (to denominate)

  19. Verbal predications • Simple forms • verb + grammatical particle • Compound forms • pronoun se + verb + grammatical particle

  20. Verbal predications • Simple forms • también llamado (also called) • consiste de (consists of) • Compound forms • se define como (it is defined as) • se denomina como (it is denominated as)
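The simple and compound forms listed above lend themselves to a plain string search. The following is a minimal sketch in Python, assuming a tiny paradigm made up only of the four examples on the slide; the names `SIMPLE_FORMS`, `COMPOUND_FORMS`, `build_pattern` and `find_verbal_predications` are hypothetical and do not come from the presentation.

```python
import re

# Hypothetical mini-paradigm: only the four examples shown on the slide.
SIMPLE_FORMS = ["también llamado", "consiste de"]
COMPOUND_FORMS = ["se define como", "se denomina como"]


def build_pattern(forms):
    """Compile one case-insensitive regex that matches any of the given forms."""
    alternatives = "|".join(re.escape(form) for form in forms)
    return re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)


PATTERN = build_pattern(SIMPLE_FORMS + COMPOUND_FORMS)


def find_verbal_predications(sentence):
    """Return the verbal predications found in a sentence, lower-cased."""
    return [match.group(0).lower() for match in PATTERN.finditer(sentence)]
```

A sentence containing "se define como" yields that form, while a sentence with none of the listed forms yields an empty list. A fixed list like this only finds the exact forms it contains, which is why expanding the paradigm matters for retrieval.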

  21. Syntactic patterns

  22. Mixed patterns • Typographical marks • Syntactic elements • Pragmatic predications • Verbal predications

  23. Mixed patterns

  24. Verbal paradigm evaluation • To expand the verbal paradigm obtained • What grammatical particles could appear with each verb?

  25. Verbal predication examples

  26. CREA • Corpus de Referencia del Español Actual (Reference corpus of today’s Spanish) • www.rae.es • Boolean operators • (* / AND / AND NOT / OR / Dist #) • Restrictive criteria • (Theme / Media / Geographical)

  27. Results • DCI (Definitional Context Index) = definitional contexts / textual fragments retrieved
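The DCI ratio above can be computed directly from two counts. This is a minimal sketch, assuming the counts are obtained by manually reviewing the fragments a query returns; the function name and the zero-result convention are assumptions, not part of the presentation.

```python
def definitional_context_index(num_definitional_contexts, num_fragments_retrieved):
    """DCI = definitional contexts / textual fragments retrieved.

    Returning 0.0 when nothing was retrieved is an assumed convention;
    the slide does not say how that case is handled.
    """
    if num_fragments_retrieved == 0:
        return 0.0
    return num_definitional_contexts / num_fragments_retrieved
```

For instance, with purely illustrative counts (not figures from the study), 12 definitional contexts among 48 retrieved fragments give a DCI of 0.25.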

  28. Verbal predication evaluation • Automatic search of the expanded verbal paradigm • Precision & Recall

  29. Automatic search • All possible verb structures • Verb tenses (present, past, future and present perfect indicative) • Grammatical persons (1st and 3rd, singular and plural) • Without pragmatic predications
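One way to enumerate the verb forms described above is to generate them from ending tables. The sketch below is an assumption about how this could be done, not the authors' tool: it covers only regular verbs in -ar, -er and -ir, so stem-changing verbs such as "entender" would need a separate lookup, and the names `conjugate` and `search_strings` are hypothetical.

```python
# Endings for regular Spanish verbs, indicative mood.
# Person order: 1st singular, 3rd singular, 1st plural, 3rd plural.
PRESENT = {
    "ar": ["o", "a", "amos", "an"],
    "er": ["o", "e", "emos", "en"],
    "ir": ["o", "e", "imos", "en"],
}
PRETERITE = {
    "ar": ["é", "ó", "amos", "aron"],
    "er": ["í", "ió", "imos", "ieron"],
    "ir": ["í", "ió", "imos", "ieron"],
}
FUTURE = ["é", "á", "emos", "án"]      # attached to the full infinitive
HABER = ["he", "ha", "hemos", "han"]   # auxiliary for the present perfect
PARTICIPLE = {"ar": "ado", "er": "ido", "ir": "ido"}


def conjugate(infinitive):
    """Return the distinct 1st/3rd person forms of a regular verb."""
    stem, ending = infinitive[:-2], infinitive[-2:]
    forms = [stem + e for e in PRESENT[ending]]
    forms += [stem + e for e in PRETERITE[ending]]
    forms += [infinitive + e for e in FUTURE]
    forms += [f"{aux} {stem}{PARTICIPLE[ending]}" for aux in HABER]
    # Some forms coincide (e.g. "definimos" is both present and past),
    # so drop duplicates while keeping the original order.
    return list(dict.fromkeys(forms))


def search_strings(infinitive, particle):
    """Combine every generated verb form with a grammatical particle."""
    return [f"{form} {particle}" for form in conjugate(infinitive)]
```

Calling `search_strings("definir", "como")` produces strings such as "define como" and "han definido como", each of which can then be used as a query. Compound forms with the pronoun "se" could be built by prefixing the third person forms.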

  30. Precision & Recall • Precision = definitional contexts automatically retrieved / textual fragments automatically retrieved • Recall = definitional contexts automatically retrieved / definitional contexts in the corpus
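The two measures above can be sketched with sets of fragment identifiers. This assumes every textual fragment has an identifier and that the definitional contexts in the corpus have been marked by hand; both assumptions, and the function name, are illustrative rather than taken from the presentation.

```python
def precision_recall(retrieved_fragments, definitional_contexts):
    """Compute precision and recall from two sets of fragment identifiers.

    retrieved_fragments: fragments returned by the automatic search.
    definitional_contexts: fragments known to be definitional contexts.
    """
    # Definitional contexts that the automatic search actually found.
    hits = retrieved_fragments & definitional_contexts
    precision = len(hits) / len(retrieved_fragments) if retrieved_fragments else 0.0
    recall = len(hits) / len(definitional_contexts) if definitional_contexts else 0.0
    return precision, recall
```

With illustrative data, retrieving fragments {1, 2, 3, 4} when the definitional contexts are {2, 4, 5} gives a precision of 0.5 (2 of 4 retrieved fragments are definitional) and a recall of 2/3 (2 of 3 definitional contexts were found).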

  31. Precision & Recall • It is defined as • It is based on • It is denominated as • To visualise • It is considered as

  32. Improving precision & recall • Recall • To expand the verbal paradigm (grammatical particles) • Precision • To consider other characteristic elements (typographical marks, pragmatic predications)

  33. Corpus tagging • Some tags to consider: • Fonts • Size • Colour • Capital and small capital letters, etc. • Head elements (titles, subtitles, etc.) • Word spacing: “los d a ñ o s se definen como…” (damages are defined as…) • Bullets, footnotes, quotes, superscripts, subscripts… • No need to tag punctuation (it can simply be recognised)
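The word-spacing example above ("d a ñ o s") could be detected with a simple heuristic. The sketch below assumes that a run of three or more single letters separated by single spaces is an emphasised word. That threshold is a guess meant to avoid confusing the pattern with one-letter Spanish words such as "a", "y" or "o", and the function name is hypothetical.

```python
import re

# A run of at least three single letters, each separated by one space,
# not attached to a longer word on either side.
LETTER = r"[^\W\d_]"  # any Unicode letter (no digits, no underscore)
SPACED_WORD = re.compile(rf"(?<!\w)(?:{LETTER} ){{2,}}{LETTER}(?!\w)")


def collapse_letter_spacing(text):
    """Join letter-spaced words and report which words were emphasised."""
    emphasised = []

    def join_letters(match):
        word = match.group(0).replace(" ", "")
        emphasised.append(word)
        return word

    return SPACED_WORD.sub(join_letters, text), emphasised
```

Applied to the slide's example, the function returns the normalised sentence "los daños se definen como" together with the list ["daños"], which could then be recorded as a typographical mark on the term.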

  34. Corpus tagging • POS Tags • Necessary to determine the internal structure of the phrases (noun, verbal and adverbial) which constitute • Terms, • Definitions, • Verbal and pragmatic predications

  35. Corpus tagging • POS Tags • Some attributes are not relevant • Gender and number (Noun Phrases) • Verbal tense inflexion (present, future, past, imperfect, subjunctives… etc.) • Relevant attributes • Grammatical person (Verbal Phrases): • Conceptual information typically introduced by 3rd person • Whether or not a verb is auxiliary

  36. Corpus tagging • Parsing Tags • Necessary to determine syntactic relations among: • All kinds of phrases involved within (and without) • Terms • Definitions • Verbal and pragmatic predications

  37. Corpus tagging • POS Tags • Syntactic phrases • Terms may consist of both NP + PP (noun phrase + prepositional phrase; Cabré 2001) • Definitions are composed of at least one well-formed sentence (one or more syntactic phrases) • Pragmatic predications (related to style) • Prepositional Phrase: “en términos generales” (in general terms) • Noun Phrase: “la característica principal” (the main characteristic) • Adverbial Phrase: “tradicionalmente” (traditionally) • Overlapping: prepositional phrases with adverbial function

  38. Corpus tagging • POS Tags • Verbal predications • Metalinguistic verbs: “se define como” (it is defined as) • Non-metalinguistic verbs: “se visualiza como” (it is visualised as) • Other structures • Verbal phrases consisting of verb + noun, where the verb has been semantically eroded • “tiene la finalidad” (it has the aim)

  39. Conclusions • Definitional contexts extraction system • Linguistic analysis of definitional contexts • Recurrent patterns

  40. Conclusions • Expand and evaluate the verbal paradigm • Corpus-based study
