Prague Arabic Dependency Treebank: Development in Data and Tools. Jan Hajič, Otakar Smrž, Petr Zemánek, Jan Šnaidauf, Emanuel Beška. Faculty of Mathematics and Physics; Faculty of Arts and Philosophy; Charles University in Prague. Project Release – PADT 1.0.
Prague Arabic Dependency Treebank: Development in Data and Tools
Jan Hajič, Otakar Smrž, Petr Zemánek, Jan Šnaidauf, Emanuel Beška
Faculty of Mathematics and Physics, Faculty of Arts and Philosophy
Charles University in Prague
Project Release – PADT 1.0
• December 2004, Linguistic Data Consortium
• 140 000 Morpho, 111 000 Syntax
Prague Arabic Dependency Treebank: Development in Data and Tools
Open-Source Tools
• TrEd Tree Editor
  • Multi-purpose annotation environment
  • Suite of programming utilities
• Netgraph Search Engine
  • Server/Client system architecture
  • Easy-to-learn query language
• Encode::Arabic Perl Module
  • Extension for processing of Arabic script
  • ArabTeX, Buckwalter, Unicode, …
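To make the transliteration bullet concrete, here is a minimal sketch of a Buckwalter-to-Unicode mapping in Python. It is not the Encode::Arabic API: the function names are hypothetical, and the table covers only a subset of the Buckwalter scheme (basic letters and the common vowel marks), so characters outside the table are passed through unchanged.

```python
# Illustrative only: hypothetical helper names, NOT the Encode::Arabic interface.
# Partial Buckwalter table: basic letters plus the common diacritics.
BUCKWALTER_TO_UNICODE = {
    "'": "\u0621", "A": "\u0627", "b": "\u0628", "p": "\u0629",
    "t": "\u062A", "v": "\u062B", "j": "\u062C", "H": "\u062D",
    "x": "\u062E", "d": "\u062F", "*": "\u0630", "r": "\u0631",
    "z": "\u0632", "s": "\u0633", "$": "\u0634", "S": "\u0635",
    "D": "\u0636", "T": "\u0637", "Z": "\u0638", "E": "\u0639",
    "g": "\u063A", "f": "\u0641", "q": "\u0642", "k": "\u0643",
    "l": "\u0644", "m": "\u0645", "n": "\u0646", "h": "\u0647",
    "w": "\u0648", "Y": "\u0649", "y": "\u064A",
    # Diacritics: tanwin, short vowels, shadda, sukun.
    "F": "\u064B", "N": "\u064C", "K": "\u064D",
    "a": "\u064E", "u": "\u064F", "i": "\u0650",
    "~": "\u0651", "o": "\u0652",
}
UNICODE_TO_BUCKWALTER = {v: k for k, v in BUCKWALTER_TO_UNICODE.items()}


def buckwalter_to_arabic(text: str) -> str:
    """Map each Buckwalter character to Arabic script; unknown characters pass through."""
    return "".join(BUCKWALTER_TO_UNICODE.get(ch, ch) for ch in text)


def arabic_to_buckwalter(text: str) -> str:
    """Inverse mapping, again leaving unknown characters untouched."""
    return "".join(UNICODE_TO_BUCKWALTER.get(ch, ch) for ch in text)


if __name__ == "__main__":
    word = "kitAb"  # 'book' in Buckwalter notation
    print(buckwalter_to_arabic(word))
    print(arabic_to_buckwalter(buckwalter_to_arabic(word)))
```

Because the table is one-to-one, the two functions round-trip any string that uses only mapped characters. The glosses later in these slides use a looser notation (for example capital I for a long vowel), which this table does not attempt to cover.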
PADT Functional Views
• Functional Generative Description
  • Theory of linguistic meaning and its expression
  • Prague Dependency Treebank for Czech
• Independence of representation levels
  • Tectogrammatical – linguistic meaning
  • Analytical – surface dependency syntax
  • Morphological – categories and lexical units
• Abstraction of the relations across levels
  • Strict distinction between form and function
  • Different units of description on each level
Functional Morphology
• Provides the syntactic levels with an abstract description language, not just the letters of tokens
• Revives multiple senses of categories
• Completeness of generation
• Strict modeling of grammatical control
• MorphoTrees – ‘human tagging’
• Successful prototype feature-based tagger
Syntactic Levels of Description
• Analytical level (AL)
  • Pragmatically motivated, close to surface syntax
  • Every single token resulting from the morphological level forms one node
  • Tree-like dependency structure for every sentence
• Tectogrammatical level
  • Linguistic (literal) meaning, deep relations, topic-focus articulation (TFA)
  • Initial structures transformed from AL
  • Nodes for autosemantic words only
  • Decisive role of valency frames
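One rough way to picture how an initial tectogrammatical structure could be derived from an analytical tree is to drop the nodes of function words and re-attach their dependents. The sketch below assumes that labels starting with "Aux" (such as AuxP, AuxC, AuxM) mark non-autosemantic words; that rule, the dictionary layout, and the function name are all illustrative assumptions, not the PADT transformation procedure. The attachment of the three example nodes is likewise assumed.

```python
# Illustrative sketch: the "Aux*" filter and the data layout are assumptions,
# not the actual PADT analytical-to-tectogrammatical conversion.
def drop_auxiliary_nodes(nodes):
    """Remove nodes whose function starts with 'Aux' and re-attach their
    dependents to the nearest remaining ancestor."""
    by_id = {n["id"]: n for n in nodes}

    def is_aux(node):
        return node["function"].startswith("Aux")

    def nearest_kept_ancestor(parent_id):
        # Walk upwards until we reach a non-auxiliary node or the root (None).
        while parent_id is not None and is_aux(by_id[parent_id]):
            parent_id = by_id[parent_id]["parent"]
        return parent_id

    return [
        {**n, "parent": nearest_kept_ancestor(n["parent"])}
        for n in nodes
        if not is_aux(n)
    ]


if __name__ == "__main__":
    # Assumed attachment: the preposition EalA governs the noun zumalA'i.
    analytical = [
        {"id": 1, "form": "iqtirAHu", "function": "Sb", "parent": None},
        {"id": 2, "form": "EalA", "function": "AuxP", "parent": 1},
        {"id": 3, "form": "zumalA'i", "function": "Obj", "parent": 2},
    ]
    for node in drop_auxiliary_nodes(analytical):
        print(node)
```

A real conversion would also have to keep the information carried by the removed word (for instance which preposition it was) and consult valency frames, which this sketch ignores.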
Logic of Analytical Trees
• Concepts of dependency and valency
• Reduction: the sentence must remain grammatically correct if leaves (terminal nodes) are chopped off
• Trees: clause components → clauses → sentences → paragraphs, etc. Subtrees of clauses are exchangeable for non-clauses
• Nodes: words, tokenized parts of words, punctuation marks – marked by functions
• Edges: syntactic relations – governing node → dependent node/subtree
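The node, edge, and reduction ideas above can be sketched as a small tree structure. The class and function names are hypothetical, and the shape of the example tree (built from forms and function labels that appear later in these slides) is an assumption made for illustration. The code only enumerates the reduced trees; whether each reduced sentence is still grammatical is a linguistic judgment it cannot make.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One analytical node: a token form plus its syntactic function label."""
    form: str
    function: str
    children: List["Node"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children


def leaves(node: Node) -> List[Node]:
    """Collect terminal nodes in left-to-right order."""
    if node.is_leaf():
        return [node]
    found: List[Node] = []
    for child in node.children:
        found.extend(leaves(child))
    return found


def chop_leaves(node: Node) -> Node:
    """Return a copy of the tree with every current leaf removed (the root is kept)."""
    kept = [chop_leaves(child) for child in node.children if not child.is_leaf()]
    return Node(node.form, node.function, kept)


def forms(node: Node) -> List[str]:
    """Pre-order list of token forms, handy for inspecting a reduced tree."""
    result = [node.form]
    for child in node.children:
        result.extend(forms(child))
    return result


if __name__ == "__main__":
    # Assumed structure: 'his proposal lasted two hours'.
    tree = Node("dAma", "Pred", [
        Node("iqtirAHu", "Sb", [Node("-hu", "Atr")]),
        Node("sAEatayni", "Adv"),
    ])
    print(forms(tree))
    print(forms(chop_leaves(tree)))
    print(forms(chop_leaves(chop_leaves(tree))))
```

Each call to `chop_leaves` removes one layer of terminal nodes, so repeated calls walk the sentence down to its predicate alone.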
Some Syntax Issues of Arabic
• Non-verbal predication of several types
• Subordinate non-verbal clauses / modification
• Verb-like behavior of many nominal forms
• Mostly VSO in verbal sentences, but…
  • vice-versa in non-verbal clauses
  • different, depending on context boundness
• Compound verbs, fixed composite prepositions
• Grammatical co-reference, accusative of inner object, complex referencing, etc.
Problem I: Predication
• Head node of tree: PREDICATE
  • Why? Steady role in sentence, cannot be omitted
• Verbal predicate: I-go to school
• Non-verbal predicate
  • Nominal: The-house a-big (= the house is big)
  • Existential: There a-city (= there is a city)
  • Prepositional
    • Possessive: For him a-house (= he has a house)
    • Adverbial: The-mosque in the-city (= … is …)
  • Conjunctional: The-problem that (= … is that)
Predication Types in Trees
[Figure: analytical trees for six constructions. Panels: Verbal; Nominal; Prepositional (possessive); Existential; Prepositional (adverbial, locative); Verb-like behavior (object of noun?).]
Node labels, given as form [function] gloss:
• dAma [Pred] lasted
• kabIrun [Pnom] a-big [nom.]
• iqtirAHu [Sb] proposal
• sAEatayni [Adv] two-hours [acc.]
• al-baytu [Sb] the-house [nom.]
• -hu [Atr] his
• al-EamalIyata [Obj] the-operation [acc.]
• EalA [AuxP] on
• vam~ata [PredE] there-is
• zumalA’i [Obj] colleagues
• la- [PredP] for
• madInatun [Sb] a-city [nom.]
• -hi [Atr] his
• fI [PredP] in
• -hu [Obj] him
• baytun [Sb] a-house [nom.]
• al-jAmiEu [Sb] the-mosque [nom.]
• al-madInati [Adv] the-city [gen.]
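Since the head node of each tree is the predicate, the type of predication can be read off the root's function label. The sketch below assumes a simple tuple encoding of a tree and assumes how the example nodes attach to each other; only the labels Pred, Pnom, PredE, and PredP come from the slides, while the function name and data layout are hypothetical.

```python
# Labels come from the slides; the mapping to descriptive names and the
# tuple encoding (form, function, parent_index) are illustrative assumptions.
PREDICATION_TYPES = {
    "Pred": "verbal",
    "Pnom": "nominal",
    "PredE": "existential",
    "PredP": "prepositional",
}


def predication_type(tree):
    """Return the predication type named by the root's function label.

    `tree` is a list of (form, function, parent_index) tuples; the root is
    the single entry whose parent_index is None.
    """
    roots = [entry for entry in tree if entry[2] is None]
    if len(roots) != 1:
        raise ValueError("expected exactly one root node")
    return PREDICATION_TYPES.get(roots[0][1], "unknown")


if __name__ == "__main__":
    # Assumed attachments: every other node depends directly on the predicate.
    nominal = [("kabIrun", "Pnom", None), ("al-baytu", "Sb", 0)]
    existential = [("vam~ata", "PredE", None), ("madInatun", "Sb", 0)]
    locative = [
        ("fI", "PredP", None),
        ("al-jAmiEu", "Sb", 0),
        ("al-madInati", "Adv", 0),
    ]
    for example in (nominal, existential, locative):
        print(example[0][0], "->", predication_type(example))
```

Possessive and adverbial predication share the PredP label, so telling them apart would need more than the root label, for example the identity of the preposition.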
Problem II: Clauses & Co-reference
• Recursiveness: a subordinate clause is contained as a subtree in place of a simple element
  • Head node of the clause gets the same function
  • Problem: non-verbal structures – clauses or not?
  • Compound verbs (mA zAla etc.) treated equally
• Grammatical co-reference: a personal pronoun formally required by another element
  • Pronoun must be marked to be treated as such
  • Target of reference is unambiguously identifiable
  • Often in subordinate clauses, mostly attributive. Ex.: He-wrote a-book number its-pages hundred
Clauses & Co-reference in Trees
[Figure: analytical trees with subordinate clauses and referencing pronouns. Captions: Compound verb, formed as main verb and its complement; Attributive clause, prepositional predicate (adverbial); Objective clause, verbal predicate; Attributive clause, nominal predicate; Referencing pronoun, as attribute in clause; Referencing pronoun, as adverbial in clause.]
Node labels, given as form [function] gloss:
• zAlat [Pred] she-stopped
• kataba [Pred] he-wrote
• kitAban [Obj] a-book
• mA [AuxM] not
• tuHis~u [Atv] she-feels
• al-rajulu [Sb] the-man [nom.]
• fI [Atr_PredP] in
• zaynabu [Sb] Zaynab
• mi’atu [Sb] hundred [nom.]
• anna [AuxC] that
• -hi [Adv_Ref] it
• tuEjibu [Obj_Pred] they-impress
• SafHatin [Atr] pages [gen.]
• jumalan [Sb] sentences [acc.]
• wADiHun [Atr_Pnom] clear [nom.]
• naHwu [Sb] grammar [nom.]
• -hA [Obj] her
• -hA [Atr_Ref] their
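The compound labels in these trees (Atr_PredP, Obj_Pred, Atr_Pnom, Adv_Ref, Atr_Ref) appear to combine the function a node fills in its parent clause with either the predicate type of the clause it heads or a co-reference mark. The sketch below splits a label along those lines. That reading of the labels, the `Label` type, and the function name are assumptions for illustration, not a documented PADT format.

```python
from typing import NamedTuple, Optional

# Predicate labels as they appear on the predication slides.
PREDICATE_LABELS = {"Pred", "Pnom", "PredE", "PredP"}


class Label(NamedTuple):
    function: str                    # role within the governing clause
    clause_predicate: Optional[str]  # predicate type if the node heads a clause
    is_reference: bool               # True for a grammatically co-referring pronoun


def parse_label(label: str) -> Label:
    """Split a compound label such as 'Atr_PredP' or 'Adv_Ref' (assumed scheme)."""
    base, _, suffix = label.partition("_")
    if not suffix:
        return Label(base, None, False)
    if suffix == "Ref":
        return Label(base, None, True)
    if suffix in PREDICATE_LABELS:
        return Label(base, suffix, False)
    raise ValueError(f"unrecognized label suffix: {suffix!r}")


if __name__ == "__main__":
    for raw in ("Obj", "Atr_PredP", "Obj_Pred", "Adv_Ref"):
        print(raw, "->", parse_label(raw))
```

Under this reading, fI [Atr_PredP] heads an attributive clause with a prepositional predicate, and -hi [Adv_Ref] is a referencing pronoun in adverbial position.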
Future Prospects
• Implementation of Functional Morphology
• Tectogrammatical annotation
• Lexicons of valency frames
• Re-training the feature-based tagger on MorphoTrees
• Machine learning on the treebank data for various purposes
Thank you
Questions welcome!
http://ckl.mff.cuni.cz/padt/