
Tibor Laczkó, György Rákosi & Ágoston Tóth, Department of English Linguistics

Husse-9 Conference, Pécs, 22-24 January, 2009. HunGram vs. EngGram in ParGram: On the Comparison of Hungarian and English in an International Computational Linguistic Project. Tibor Laczkó, György Rákosi & Ágoston Tóth, Department of English Linguistics, University of Debrecen


Presentation Transcript


  1. Husse-9 Conference, Pécs, 22-24 January, 2009 • HunGram vs. EngGram in ParGram: On the Comparison of Hungarian and English in an International Computational Linguistic Project • Tibor Laczkó, György Rákosi & Ágoston Tóth, Department of English Linguistics, University of Debrecen • {laczkot, rakosigy, tagoston}@delfin.unideb.hu • Sponsored by OTKA research grant K 72983

  2. Overview • Lexical-Functional Grammar (LFG) • The ParGram Project at PARC • The HunGram Project in Debrecen • A short demonstration: possible ParGram treatments of certain elliptical noun phrases in English and Hungarian

  3. 1/1 Stanford and LFG • LFG as a linguistic theory was developed in the late 1970s. One of the principal aims was to create a framework suitable for massive computational applications, and there has been a lively co-operation between theory and computational linguistic practice ever since. • The two co-founders: • Joan Bresnan (Stanford University, SU): mainly linguistic aspects • Ronald Kaplan (Palo Alto Research Center, PARC, and SU; now at Powerset, Inc.): mainly computational aspects • General information on LFG is available at: http://www.essex.ac.uk/linguistics/LFG/

  4. 1/2 Design Principles of LFG • Lexicalism • Modularism • Parallel architecture • Generating and parsing structures are equally important • Rule system that is directly renderable in a mathematical formalism

  5. 1/3 Central Modules of LFG • constituent structure (language-specific): word order • functional structure (universal): grammatical relations • lexicon (powerful) • phonology • semantics

  6. 1/4 Adpositional phrases in LFG • c-structures: [PP [Pr near/in] [NP [Det the] [N box]]]; [PP [NP [Det a] [N doboz]] [Po mellett]]; [NP [Det a] [N doboz-ban]] • f-structure (the same for all three): [PRED ‘NEAR/IN <(OBJ)>’, OBJ [PRED ‘BOX’, DEF +, PERS 3, NUM sg]] • lexical sources of the PRED values: near/in, Pr ‘NEAR/IN <(OBJ)>’; mellett, Po ‘NEAR <(OBJ)>’; -ban, Nsuff ‘IN <(OBJ)>’; box, N ‘BOX’
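The point of the slide above, that different c-structures can map to one f-structure, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the attribute names and PRED values are taken from the slide, while the dictionary layout and the function name `adpositional_fstructure` are invented here and are not part of XLE or ParGram.

```python
# Hypothetical illustration: an f-structure modelled as a nested dict.
# Attribute names (PRED, OBJ, DEF, PERS, NUM) come from the slide;
# the table layout and function name are assumptions of this sketch.

# Lexical entries from the slide: form -> (category, PRED value)
ADPOSITIONS = {
    "near": ("Pr", "NEAR<(OBJ)>"),       # English preposition
    "in": ("Pr", "IN<(OBJ)>"),           # English preposition
    "mellett": ("Po", "NEAR<(OBJ)>"),    # Hungarian postposition
    "-ban": ("Nsuff", "IN<(OBJ)>"),      # Hungarian case suffix
}


def adpositional_fstructure(adposition):
    """Return the f-structure of 'near/in the box' or its Hungarian counterpart."""
    _category, pred = ADPOSITIONS[adposition]
    return {
        "PRED": pred,
        "OBJ": {"PRED": "BOX", "DEF": "+", "PERS": 3, "NUM": "sg"},
    }


# Different categories (c-structure), identical f-structures.
print(adpositional_fstructure("near") == adpositional_fstructure("mellett"))
print(adpositional_fstructure("in") == adpositional_fstructure("-ban"))
```

The category label (Pr, Po, Nsuff) is deliberately left out of the returned structure: in this sketch it only matters for c-structure, which is where the three expressions differ.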

  7. 2/1 ParGram at PARC • The Parallel Grammar (ParGram) project: launched and organized by PARC • LFG-based computational program • Capitalizes on LFG’s flexible general linguistic and computationally implementable architecture • Parser and generator • Goal: to analyze more and more languages on a maximally uniform platform, in the spirit of Universal Grammar

  8. 2/2 ParGram at PARC • A truly international project: English, German, French, Norwegian, Japanese, Chinese, Urdu (India), Malagasy (Madagascar), Arabic, Vietnamese, Spanish, Welsh, Indonesian, Turkish, Georgian, & Hungarian • Further information: http://www2.parc.com/isl/groups/nltt/default.html

  9. 2/3 XLE parser • a deep, grammar-based parsing system for implementing lexical-functional grammars; constructed as part of the ParGram project • output: c-structures and f-structures • supports tokenization and morphological analysis through finite-state transducers (with alternative analyses) • can select the most probable analysis from the potentially large candidate set using stochastic disambiguation (if implemented) • has a generator mode • implemented in C; runs on Solaris, Linux, and Mac OS X • bottom line: a facility for writing syntactic rules and lexical entries, and for testing and editing them

  10. toy-eng.lfg • lexical entries: the D * (^ DEF)=+. girl N * (^ PRED)='GIRL'. walk V * (^ PRED)='WALK<(^ SUBJ)>'; N * (^ PRED)='WALK'. • f-structure: attribute-value matrices that encode predicate-argument relations and other grammatical information (e.g. number, tense, case) • c-structure: context-free phrase-structure tree encoding constituency and linear order
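As a rough illustration of what the toy lexical entries above contribute, the following Python sketch stores them in a dictionary and merges the attribute-value pairs of a determiner and a noun into one f-structure. It is not XLE code: the names `TOY_LEXICON`, `lookup` and `unify` are hypothetical, the unification is restricted to flat structures, and the idea that D and N both annotate the same f-structure is an assumption about a rule the slide does not show.

```python
# Hypothetical Python mirror of the three entries in toy-eng.lfg.
# Each word maps to a list of (category, attribute-value pairs).
TOY_LEXICON = {
    "the": [("D", {"DEF": "+"})],
    "girl": [("N", {"PRED": "GIRL"})],
    "walk": [
        ("V", {"PRED": "WALK<(^ SUBJ)>"}),  # verb reading
        ("N", {"PRED": "WALK"}),            # noun reading
    ],
}


def lookup(word, category=None):
    """Return all entries for a word, optionally filtered by category."""
    entries = TOY_LEXICON.get(word, [])
    if category is None:
        return entries
    return [entry for entry in entries if entry[0] == category]


def unify(f, g):
    """Merge two flat attribute-value dicts; return None if values clash."""
    merged = dict(f)
    for attribute, value in g.items():
        if attribute in merged and merged[attribute] != value:
            return None
        merged[attribute] = value
    return merged


# Assumed rule: D and N annotate the same f-structure, so their
# features are unified into a single attribute-value matrix.
determiner = lookup("the", "D")[0][1]
noun = lookup("girl", "N")[0][1]
print(unify(determiner, noun))
print(len(lookup("walk")), "entries for 'walk'")
```

The two entries returned for "walk" correspond to the disjunction marked by ";" in the XLE entry: the parser has to keep both readings until the syntactic context decides between them.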

  11. 2/4 Challenging natural language phenomena • Lexical ambiguity: homonymy: bank, fluke; ár (‘price’, ‘flood’, ‘awl’), légy (‘fly’, ‘be!’), ír (‘write’, ‘Irish’); polysemy: bulb, line; körte (‘pear’, ‘light bulb’), toll (‘feather’, ‘pen’) • Structural ambiguity: I saw the girl with the telescope. Részegen láttam Jánost. (‘I saw János drunk.’) Egész nap a hajókat néztük a Dunán. (‘All day we watched the ships on the Danube.’) • Word formation (compounding, derivation, minor processes): horror, horrid, horrify; terror, (*terrid), terrify; candor, candid, (*candify); student film society committee scandal video… • Anaphoric references: a) We gave the bananas to the monkeys because they were hungry. b) We gave the bananas to the monkeys because they were ripe. • Ellipsis
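The structural ambiguity of "I saw the girl with the telescope" can be made concrete by counting parse trees. The sketch below uses a small hand-written context-free grammar and a CKY-style chart; the grammar, the lexicon and the function name `count_parses` are assumptions made for this illustration and are not taken from any ParGram grammar.

```python
from collections import defaultdict

# Hypothetical toy grammar in binary-branching form: (parent, left, right).
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # PP attaches to the verb phrase
    ("NP", "NP", "PP"),   # PP attaches to the noun phrase
    ("NP", "Det", "N"),
    ("PP", "P", "NP"),
]

LEXICON = {
    "I": ["NP"],
    "saw": ["V"],
    "the": ["Det"],
    "girl": ["N"],
    "telescope": ["N"],
    "with": ["P"],
}


def count_parses(words, root="S"):
    """Count the distinct trees the toy grammar assigns to the word list."""
    n = len(words)
    chart = defaultdict(int)  # (start, end, category) -> number of trees
    for i, word in enumerate(words):
        for category in LEXICON.get(word, []):
            chart[(i, i + 1, category)] += 1
    for width in range(2, n + 1):
        for start in range(n - width + 1):
            end = start + width
            for mid in range(start + 1, end):
                for parent, left, right in BINARY_RULES:
                    chart[(start, end, parent)] += (
                        chart[(start, mid, left)] * chart[(mid, end, right)]
                    )
    return chart[(0, n, root)]


print(count_parses("I saw the girl with the telescope".split()))
print(count_parses("I saw the girl".split()))
```

With this grammar the first sentence gets two trees, one for each attachment site of the prepositional phrase, while the shorter sentence gets one.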

  12. 2/5 Direct challenges • Non-toy lexicon of the Hungarian language, empirical techniques • Tokenization, morphological analysis: walks → walk +Verb +Pres +3sg; walk +Noun +Pl • Named entity recognition. Types: person, role, location, organization, brand, title, etc. Examples: This is the website of [the University of Debrecen org]. [The University of Debrecen loc] is not far from us. • Parsing performance: trade-off between accuracy, usability and speed
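The morphological analysis of "walks" shown above can be imitated with a simple table lookup. Real ParGram grammars rely on finite-state transducers for this step; the sketch below is only a stand-in, and the tables `STEM_CATEGORIES` and `S_SUFFIX_TAGS` as well as the function `analyze` are hypothetical. Only the two tag strings for "walks" come from the slide.

```python
# Hypothetical stand-in for a finite-state morphological analyzer.
# It only knows the English suffix -s and two stems.
STEM_CATEGORIES = {
    "walk": ["Verb", "Noun"],
    "girl": ["Noun"],
}

# Tag sequences for the suffix -s, in the notation used on the slide.
S_SUFFIX_TAGS = {
    "Verb": "+Verb +Pres +3sg",
    "Noun": "+Noun +Pl",
}


def analyze(token):
    """Return every stem-plus-tags analysis of a token ending in -s."""
    analyses = []
    if token.endswith("s"):
        stem = token[:-1]
        for category in STEM_CATEGORIES.get(stem, []):
            analyses.append(f"{stem} {S_SUFFIX_TAGS[category]}")
    return analyses


for analysis in analyze("walks"):
    print(analysis)
```

Returning a list rather than a single string reflects the point made about XLE's morphology interface: alternative analyses are passed on to the syntax, which is left to choose between them.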

  13. 3/1 HunGram • Tibor Laczkó, 2005/2006: Fulbright research grant to Stanford University • a ParGram invitation to PARC → research at two host institutions • two goals at PARC: • familiarity with the formalism (XLE) • starting the implementation in XLE of the results of the research on the morpho-syntax of Hungarian noun phrases (in an LFG framework)

  14. 3/2 HunGram • LFG Research Group (LFGRG) at the Department of English Linguistics, UD • Tibor Laczkó • György Rákosi • Ágoston Tóth • 2 PhD students • XLE software licence from PARC

  15. 3/3 HunGram • OTKA (Hungarian Scientific Research Fund) research grant for 2008-2012 (K 72983) • objectives: (1) developing a comprehensive LFG grammar of the Hungarian language (morphology, syntax, lexicon, semantic issues); (2) implementing it in HunGram/ParGram; (3) launching an English vs. Hungarian comparative research project on the ParGram platform; (4) incorporating the results in various course materials at the English Linguistics Department • (1 & 2) → 3 → 4

  16. 4/1 Demo Elliptical noun phrases

  17. 4/2 “az öt nagy zöldet” (‘the five large green ones’, accusative) [Figure panels: c-structure; c-structure + morphology; f-structure]

  18. 4/3 “the five large green ones”

  19. 4/4 “a három ügyes fiú öt nagy zöldjét” (‘the three clever boys’ five large green ones’, accusative)

  20. 4/5 “the three boys’ five large green ones”

  21. 4/6 Elliptical noun phrases • Differences: c-structure; morphology (e.g. +case vs. –case); English: ‘pro’ is realized by an overt element, in the lexicon, in c-structure and in f-structure; Hungarian: ‘pro’ is covert, introduced by a functional annotation in c-structure, and present in f-structure; EngGram vs. HunGram with respect to the number of features • Similarities: f-structure, except for typological differences (case etc. features); as HunGram becomes more developed, it will share more and more features with EngGram (ParGram) • Proposal (from the previous talk): a more lexical solution, with ‘pro’ introduced by (case-marked) adjectival lexical items • Plan: testing its implementability in HunGram (and EngGram?)

  22. Hungram1.lfg
      FIRST HUNGARIAN CONFIG (1.0)
        ROOTCAT ROOT.
        FILES common.templates.lfg hun-lex.lfg hun-templates.lfg hun-morphconfig.lfg hun-rules.lfg.
        LEXENTRIES (FIRST HUNGARIAN).
        CHARACTERENCODING iso-8859-2.
        MORPHOLOGY (STANDARD HUNGARIAN).
        RULES (FIRST HUNGARIAN).
        TEMPLATES (STANDARD COMMON) (FIRST HUNGARIAN).
        GOVERNABLERELATIONS SUBJ OBJ POSS OBL OBL-? COMP XCOMP PREDLINK.
        SEMANTICFUNCTIONS ADJUNCT TOPIC FOCUS.
      ----
