Husse-9 Conference, Pécs, 22–24 January 2009
HunGram vs. EngGram in ParGram: On the Comparison of Hungarian and English in an International Computational Linguistic Project
Tibor Laczkó, György Rákosi & Ágoston Tóth
Department of English Linguistics, University of Debrecen
{laczkot, rakosigy, tagoston}@delfin.unideb.hu
Sponsored by OTKA research grant K 72983
Overview
• Lexical-Functional Grammar (LFG)
• The ParGram Project at PARC
• The HunGram Project in Debrecen
• A short demonstration: possible ParGram treatments of certain elliptical noun phrases in English and Hungarian
1/1 Stanford and LFG
• LFG as a linguistic theory was developed in the late 1970s. One of its principal aims was to create a framework suitable for large-scale computational applications, and theory and computational linguistic practice have cooperated closely ever since.
• The two co-founders:
  • Joan Bresnan (Stanford University, SU): mainly linguistic aspects
  • Ronald Kaplan (Palo Alto Research Center, PARC, and SU; now at Powerset, Inc.): mainly computational aspects
• General information on LFG is available at: http://www.essex.ac.uk/linguistics/LFG/
1/2 Design Principles of LFG
• Lexicalism
• Modularity
• Parallel architecture
• Generation and parsing are equally important
• A rule system that is directly renderable in a mathematical formalism
1/3 Central Modules of LFG
• phonology
• constituent structure: language-specific; encodes word order
• lexicon: powerful
• functional structure: universal; encodes grammatical relations
• semantics
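1/4 Adpositional phrases in LFG
c-structures:
• English preposition: [PP [Pr near/in] [NP [Det the] [N box]]]
• Hungarian postposition: [PP [NP [Det a] [N doboz]] [Po mellett]]
• Hungarian case suffix: [NP [Det a] [N doboz-ban]]
Shared f-structure (PRED contributed by near/in/mellett/-ban):
  PRED 'NEAR/IN <(OBJ)>'
  OBJ  [ PRED 'BOX', DEF +, PERS 3, NUM sg ]
Lexical entries:
• near/in, Pr 'NEAR/IN <(OBJ)>'
• mellett, Po 'NEAR <(OBJ)>'
• -ban, Nsuff 'IN <(OBJ)>'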
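The point of slide 1/4 is that three different c-structures (preposition, postposition, case suffix) project the same kind of f-structure. The sketch below illustrates this under clearly stated assumptions: f-structures are modelled as Python dicts (not how XLE represents them), and the function names `box_object` and `adpositional_fstructure` are hypothetical. Only the lexical entries and features come from the slide.

```python
# Toy model of slide 1/4: different c-structures, one shared f-structure shape.
# Encoding f-structures as Python dicts is an illustrative assumption only.

# Lexical entries from the slide: form -> (c-structure category, semantic form).
LEXICON = {
    "near": ("Pr", "NEAR<(OBJ)>"),
    "in": ("Pr", "IN<(OBJ)>"),
    "mellett": ("Po", "NEAR<(OBJ)>"),
    "-ban": ("Nsuff", "IN<(OBJ)>"),
}


def box_object():
    """OBJ f-structure for 'the box' / 'a doboz' (features as on the slide)."""
    return {"PRED": "BOX", "DEF": "+", "PERS": 3, "NUM": "sg"}


def adpositional_fstructure(head):
    """F-structure projected by a preposition, postposition, or case suffix."""
    _category, pred = LEXICON[head]  # the category matters only in c-structure
    return {"PRED": pred, "OBJ": box_object()}


english = adpositional_fstructure("near")          # [PP [Pr near] [NP the box]]
postposition = adpositional_fstructure("mellett")  # [PP [NP a doboz] [Po mellett]]
suffix = adpositional_fstructure("-ban")           # [NP a doboz-ban]

# Word order and category differ across the c-structures, but the
# f-structures coincide whenever the semantic forms do.
print(english == postposition)                   # True
print(suffix == adpositional_fstructure("in"))   # True
```

The category is deliberately discarded when building the f-structure: in this toy model it only matters for the (unmodelled) c-structure, which mirrors the LFG division of labour between the two levels.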
2/1 ParGram at PARC
• The Parallel Grammar (ParGram) project: launched and organized by PARC
• An LFG-based computational program
• Capitalizes on LFG's flexible architecture, which is both linguistically general and computationally implementable
• Parser and generator
• Goal: to analyze more and more languages on a maximally uniform platform, in the spirit of Universal Grammar
2/2 ParGram at PARC
• A truly international project: English, German, French, Norwegian, Japanese, Chinese, Urdu (India), Malagasy (Madagascar), Arabic, Vietnamese, Spanish, Welsh, Indonesian, Turkish, Georgian, and Hungarian
• Further information: http://www2.parc.com/isl/groups/nltt/default.html
2/3 XLE parser
• A deep, grammar-based parsing system for implementing lexical-functional grammars, constructed as part of the ParGram project
• Output: c-structures and f-structures
• Supports tokenization and morphological analysis through finite-state transducers (with alternative analyses)
• Can select the most probable analysis from a potentially large candidate set using stochastic disambiguation (if implemented)
• Has a generator mode
• Implemented in C; runs on Solaris, Linux, and Mac OS X
• Bottom line: a facility for writing syntactic rules and lexical entries, and for testing and editing them
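Stochastic disambiguation means scoring every candidate analysis and keeping the most probable one. One common way to do this is a log-linear model, P(parse) = exp(w·f(parse)) / Z; the sketch below assumes that model purely for illustration and does not reproduce XLE's own component. The feature names (`pp_attached_to_verb`, `node_count`, etc.), their weights, and the feature counts are all hypothetical.

```python
import math

# Hypothetical feature weights; a real disambiguator would learn them from data.
WEIGHTS = {"pp_attached_to_verb": 0.8, "pp_attached_to_noun": 0.3, "node_count": -0.05}


def score(features):
    """Linear score w·f for one candidate analysis."""
    return sum(WEIGHTS.get(name, 0.0) * value for name, value in features.items())


def rank(candidates):
    """Return (probability, label) pairs, most probable first (log-linear model)."""
    scores = {label: score(feats) for label, feats in candidates.items()}
    z = sum(math.exp(s) for s in scores.values())  # normalizing constant Z
    return sorted(((math.exp(s) / z, label) for label, s in scores.items()), reverse=True)


# Two candidate parses of "I saw the girl with the telescope." (hypothetical counts)
candidates = {
    "instrument: saw [...] with the telescope": {"pp_attached_to_verb": 1, "node_count": 12},
    "modifier: the girl with the telescope": {"pp_attached_to_noun": 1, "node_count": 13},
}

ranking = rank(candidates)
best_probability, best_label = ranking[0]
print(best_label, round(best_probability, 2))
```

Because the probabilities are normalized over the candidate set, the parser can also return the full ranking rather than only the top analysis, which is useful when a grammar writer wants to inspect near-ties.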
toy-eng.lfg
the D * (^ DEF)=+.
girl N * (^ PRED)='GIRL'.
walk V * (^ PRED)='WALK<(^ SUBJ)>';
     N * (^ PRED)='WALK'.
f-structure: attribute-value matrices that encode predicate-argument relations and other grammatical information (e.g. number, tense, case)
c-structure: a context-free phrase-structure tree encoding constituency and linear order
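The equations in toy-eng.lfg describe pieces of f-structure that are combined by unification. The following is a minimal Python sketch of that idea, assuming dict-encoded f-structures and two hypothetical rules (S with an NP annotated (^ SUBJ)=! and a verb annotated ^=!, and NP with D and N both annotated ^=!); these rules and all function names are illustrative, not part of the slide. PRED values are treated as plain strings here, a simplification: in LFG proper, semantic forms are unique and never unify with each other.

```python
# Toy re-creation of the toy-eng.lfg entries: (word, category) -> f-structure
# described by the entry's equations. Dict encoding is an assumption.
LEXICON = {
    ("the", "D"): {"DEF": "+"},
    ("girl", "N"): {"PRED": "GIRL"},
    ("walk", "V"): {"PRED": "WALK<(^ SUBJ)>"},
    ("walk", "N"): {"PRED": "WALK"},  # lexical ambiguity: walk is also a noun
}


class UnificationError(Exception):
    """Raised when two f-structures assign different values to one attribute."""


def unify(f, g):
    """Merge two f-structures; fail if an attribute gets two different atomic values."""
    result = dict(f)
    for attr, value in g.items():
        if attr not in result:
            result[attr] = value
        elif isinstance(result[attr], dict) and isinstance(value, dict):
            result[attr] = unify(result[attr], value)
        elif result[attr] != value:
            raise UnificationError(f"{attr}: {result[attr]} vs {value}")
    return result


def parse_np(det, noun):
    # Hypothetical rule NP --> D: ^=!  N: ^=!  (daughters share the NP's f-structure)
    return unify(LEXICON[(det, "D")], LEXICON[(noun, "N")])


def parse_s(det, noun, verb):
    # Hypothetical rule S --> NP: (^ SUBJ)=!  V: ^=!
    return unify({"SUBJ": parse_np(det, noun)}, LEXICON[(verb, "V")])


fs = parse_s("the", "girl", "walk")
print(fs)  # {'SUBJ': {'DEF': '+', 'PRED': 'GIRL'}, 'PRED': 'WALK<(^ SUBJ)>'}
```

A clash such as unify({"PRED": "GIRL"}, {"PRED": "WALK"}) raises UnificationError, which is the toy counterpart of an f-structure that violates consistency.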
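2/4 Challenging natural language phenomena
• Lexical ambiguity
  • Homonymy: bank, fluke; ár ('price' / 'flood'), légy ('fly' / 'be!'), ír ('writes' / 'Irish' / 'salve')
  • Polysemy: bulb, line; körte ('pear' / 'light bulb'), toll ('feather' / 'pen')
• Structural ambiguity
  • I saw the girl with the telescope.
  • Részegen láttam Jánost. ('I saw János drunk.': who was drunk?)
  • Egész nap a hajókat néztük a Dunán. ('We watched the ships on the Danube all day.': were we or the ships on the Danube?)
• Word formation (compounding, derivation, minor processes)
  • horror, horrid, horrify; terror, (*terrid), terrify; candor, candid, (*candify)
  • student film society committee scandal video…
• Anaphoric reference
  a) We gave the bananas to the monkeys because they were hungry. (they = the monkeys)
  b) We gave the bananas to the monkeys because they were ripe. (they = the bananas)
• Ellipsis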
2/5 Direct challenges
• A non-toy lexicon of Hungarian; empirical techniques
• Tokenization and morphological analysis
  walks → walk +Verb +Pres +3sg
        → walk +Noun +Pl
• Named entity recognition
  Types: person, role, location, organization, brand, title, etc.
  This is the website of [the University of Debrecen org]. [The University of Debrecen loc] is not far from us.
• Parsing performance: a trade-off between accuracy, usability, and speed
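The walks example shows that morphological analysis must return every analysis and leave the choice to the syntax. XLE does this with finite-state transducers; the sketch below swaps in a plain dictionary-and-suffix lookup to show only the input/output shape. The mini-lexicon `STEMS`, the suffix table `SUFFIXES`, and the regex tokenizer are hypothetical and cover only the slide's example.

```python
import re

# Hypothetical mini-lexicon: stem -> categories it can belong to.
STEMS = {"walk": ["Verb", "Noun"], "girl": ["Noun"]}

# Hypothetical suffix rules: surface suffix -> {category: tag string}.
SUFFIXES = {"s": {"Verb": "+Verb +Pres +3sg", "Noun": "+Noun +Pl"}}


def tokenize(text):
    """Split text into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


def analyze(token):
    """Return every analysis of the token; ambiguity is kept, not resolved."""
    analyses = []
    for suffix, tags_by_category in SUFFIXES.items():
        if token.endswith(suffix):
            stem = token[: -len(suffix)]
            for category in STEMS.get(stem, []):
                if category in tags_by_category:
                    analyses.append(f"{stem} {tags_by_category[category]}")
    return analyses


walks_analyses = analyze("walks")
print(tokenize("The girl walks."))  # ['The', 'girl', 'walks', '.']
print(walks_analyses)               # ['walk +Verb +Pres +3sg', 'walk +Noun +Pl']
```

Returning a list rather than a single string is the key design choice: it hands the ambiguity on to the parser, as the XLE morphology interface does with alternative analyses.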
3/1 HunGram
• Tibor Laczkó, 2005/2006: Fulbright research grant to Stanford University
• A ParGram invitation to PARC: research at two host institutions
• Two goals at PARC:
  • becoming familiar with the formalism (XLE)
  • starting the XLE implementation of the results of research on the morpho-syntax of Hungarian noun phrases (in an LFG framework)
3/2 HunGram
• LFG Research Group (LFGRG) at the Department of English Linguistics, UD
  • Tibor Laczkó
  • György Rákosi
  • Ágoston Tóth
  • 2 PhD students
• XLE software licence from PARC
3/3 HunGram
• OTKA research grant for 2008–2012 (K 72983) (Hungarian Scientific Research Fund)
• Objectives:
  • developing a comprehensive LFG grammar of Hungarian (morphology, syntax, lexicon, semantic issues)
  • implementing it in HunGram/ParGram
  • launching an English vs. Hungarian comparative research project on the ParGram platform
  • incorporating the results into various course materials at the Department of English Linguistics
4/1 Demo: Elliptical noun phrases
4/2 “az öt nagy zöldet” ('the five big green ones', accusative)
[Figure: c-structure; c-structure + morphology; f-structure]
4/6 Elliptical noun phrases
• Differences
  • c-structure
  • morphology (e.g. +case vs. –case)
  • English: 'pro' is realized by an overt element, present in the lexicon, in c-structure, and in f-structure
  • Hungarian: 'pro' is covert; it is introduced by a functional annotation in c-structure and is present in f-structure
  • EngGram vs. HunGram with respect to the number of features
• Similarities
  • f-structure, except for typological differences (case and other features)
  • as HunGram develops further, it shares more and more features with EngGram (ParGram)
• Proposal
  • (previous talk) a more lexical solution: 'pro' introduced by (case-marked) adjectival lexical items
• Plan
  • testing its implementability in HunGram (and EngGram?)
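The contrast on slide 4/6 (overt 'pro' in English, covert 'pro' introduced by an annotation in Hungarian, yet near-identical f-structures) can be sketched as follows. This is a toy illustration under several assumptions: the English counterpart is taken to be "the five big green ones", the attribute names SPEC and ADJUNCT and all function names are hypothetical, and PRED-like values are written as English glosses in both languages so that the f-structures can be compared directly.

```python
# Hypothetical lexicons; values are English glosses in both languages.
ENGLISH_LEX = {
    "the": {"DEF": "+"},
    "five": {"SPEC": 5},
    "big": {"ADJUNCT": "big"},
    "green": {"ADJUNCT": "green"},
    "ones": {"PRED": "pro"},  # overt 'pro': in the lexicon, c-structure, f-structure
}
HUNGARIAN_LEX = {
    "az": {"DEF": "+"},
    "öt": {"SPEC": 5},
    "nagy": {"ADJUNCT": "big"},
    "zöldet": {"ADJUNCT": "green", "CASE": "acc"},  # case-marked adjective
}


def build(entries):
    """Collect feature contributions into one NP f-structure (ADJUNCT is a list)."""
    fs = {"ADJUNCT": []}
    for entry in entries:
        for attr, value in entry.items():
            if attr == "ADJUNCT":
                fs["ADJUNCT"].append(value)
            elif fs.get(attr, value) != value:
                raise ValueError(f"feature clash on {attr}")
            else:
                fs[attr] = value
    return fs


def english_np(words):
    return build([ENGLISH_LEX[w] for w in words])


def hungarian_np(words):
    entries = [HUNGARIAN_LEX[w] for w in words]
    if not any("PRED" in entry for entry in entries):
        # Covert 'pro': contributed by a hypothetical c-structure annotation
        # such as (^ PRED)='pro' on the noun-less NP, not by any word.
        entries.append({"PRED": "pro"})
    return build(entries)


def without(fs, *attrs):
    """Copy of an f-structure with the given attributes removed."""
    return {k: v for k, v in fs.items() if k not in attrs}


eng = english_np("the five big green ones".split())
hun = hungarian_np("az öt nagy zöldet".split())
print(without(hun, "CASE") == eng)  # True: they differ only in the CASE feature
```

The lexical alternative mentioned under "Proposal" would move the `{"PRED": "pro"}` contribution from `hungarian_np` into the entry for the case-marked adjective; whether that is workable in the real grammar is exactly what the plan proposes to test.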
Hungram1.lfg
FIRST HUNGARIAN CONFIG (1.0)
  ROOTCAT ROOT.
  FILES common.templates.lfg hun-lex.lfg hun-templates.lfg hun-morphconfig.lfg hun-rules.lfg.
  LEXENTRIES (FIRST HUNGARIAN).
  CHARACTERENCODING iso-8859-2.
  MORPHOLOGY (STANDARD HUNGARIAN).
  RULES (FIRST HUNGARIAN).
  TEMPLATES (STANDARD COMMON) (FIRST HUNGARIAN).
  GOVERNABLERELATIONS SUBJ OBJ POSS OBL OBL-? COMP XCOMP PREDLINK.
  SEMANTICFUNCTIONS ADJUNCT TOPIC FOCUS.
----