Arabic Syntactic Trees: from Constituency to Dependency
Zdeněk Žabokrtský, Otakar Smrž
Center for Computational Linguistics, Faculty of Mathematics and Physics, Charles University in Prague
Motivation & Background
• Linguistic Data Consortium Arabic Treebank
  • Constituent-syntax bracketing, ~100k words published
  • Modification from English to Arabic
• Prague Arabic Dependency Treebank
  • Dependency approach to syntax, ~50k words in progress
  • Pre-step to tectogrammatical description
• Motivation: co-operation and resource exchange
• Our goal: transform the data from one annotation scheme to the other
Constituency X Dependency
Constituency (Linguistic Data Consortium, University of Pennsylvania)
• Non-terminal nodes + Text tokens
• Constituent labeling on non-terminals
• Slots and traces
Dependency (CCL & IFAL & ICL, Charles University in Prague)
• Sentence root node + Text tokens
• Analytical function for every tree node
• Government and roles
Model Arabic Phrase I
• Trace of the antecedent subject
• Compound function of the head of the clause: outer and inner perspectives
• Free word-order compliant
Outline of the Transformation
1. Build temporary dependency tree
  • Contraction of the input phrase-structure tree
  • Uniquely determined by the head selection function
  • Implementation: simple recursive procedure
2. Create analytical tree topology
  • Post-processing (corrections) of the temporary dependency tree, e.g., substituting traces with their coindexed fillers
  • Re-arrangement of special complex constructs
3. Assign analytical functions
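Step 1 can be illustrated with a minimal sketch. It is not the authors' implementation: the `Phrase` and `DepNode` classes and the `contract` function are hypothetical names introduced here, and the head selection function is simply passed in as a parameter. The sketch only shows how a phrase-structure tree contracts into a dependency tree once a head child is chosen for every constituent.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Phrase:
    """Hypothetical phrase-structure node: terminals carry a word."""
    label: str                       # phrase label, or POS tag on terminals
    word: Optional[str] = None       # set only on terminals
    children: List["Phrase"] = field(default_factory=list)


@dataclass
class DepNode:
    """Hypothetical dependency node: one node per text token."""
    word: Optional[str]
    tag: str
    dependents: List["DepNode"] = field(default_factory=list)


def contract(phrase: Phrase,
             select_head: Callable[[List[Phrase]], Phrase]) -> DepNode:
    """Contract a constituent into the dependency subtree rooted in its head."""
    if not phrase.children:          # terminal: becomes a dependency node
        return DepNode(phrase.word, phrase.label)
    head_child = select_head(phrase.children)
    head = contract(head_child, select_head)
    for child in phrase.children:    # all other children hang below the head
        if child is not head_child:
            head.dependents.append(contract(child, select_head))
    return head
```

With a head selection function that always picks the rightmost child, `(S (NP boy) (VP runs))` contracts to a tree rooted in "runs" with "boy" as its only dependent. Because the result depends only on which child is selected at each constituent, the temporary tree is uniquely determined by that function.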
Head Selection Function
• For each constituent, select the head constituent among its children
• Based on (ordered) handcrafted rules
• Examples:
  • If there is a node with tag=PREP among the children, then it is the head
  • If there is a node with phrase_label=VP among the children, then it is the head
  • ... etc ...
• If nothing was selected by the rules, then the rightmost child is selected
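The ordered rules with a rightmost-child fallback might be sketched as follows. This is an illustration only: the dict-based child representation, the names `HEAD_RULES` and `select_head`, the relative order of the two example rules, and the choice of the leftmost match when several children satisfy a rule are all assumptions, since the slide gives only the two examples.

```python
from typing import Callable, List, Sequence

# Assumption: each child is a dict with optional "tag" / "phrase_label" keys.
Rule = Callable[[dict], bool]

# Ordered rule list; only these two rules appear on the slide, and their
# relative order here is assumed.
HEAD_RULES: List[Rule] = [
    lambda child: child.get("tag") == "PREP",
    lambda child: child.get("phrase_label") == "VP",
]


def select_head(children: Sequence[dict],
                rules: Sequence[Rule] = HEAD_RULES) -> dict:
    """Return the head child: first rule that matches wins, else rightmost."""
    if not children:
        raise ValueError("a constituent must have at least one child")
    for rule in rules:               # rules are tried in priority order
        for child in children:       # assumption: leftmost match is taken
            if rule(child):
                return child
    return children[-1]              # fallback: rightmost child
```

For example, among the children `[{"tag": "NOUN"}, {"tag": "ADJ"}]` no rule fires, so the rightmost child is returned.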
Analytical Function Assignment
• Based on (ordered) handcrafted rules and lexical lists
• Completes the process, does not override previous assignments
• Examples:
  • phrase_label=NP-SBJ → afun=Sb
  • lemma=wa- → afun=Coord
  • pos_tag=CONJ → afun=AuxC
  • ... etc ...
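A minimal sketch of such an assignment pass is given below. The names `AFUN_RULES` and `assign_afuns` and the dict-based node representation are hypothetical, the rule order simply follows the order of the examples, and lexical lists are left out. The one property taken directly from the slide is that nodes which already carry an analytical function are not touched.

```python
from typing import Callable, List, Sequence, Tuple

# Assumption: a node is a dict with optional "phrase_label", "lemma",
# "pos_tag" and "afun" keys.
AfunRule = Tuple[Callable[[dict], bool], str]

# Ordered (condition, afun) pairs; only these three examples are on the slide.
AFUN_RULES: List[AfunRule] = [
    (lambda n: n.get("phrase_label") == "NP-SBJ", "Sb"),
    (lambda n: n.get("lemma") == "wa-", "Coord"),
    (lambda n: n.get("pos_tag") == "CONJ", "AuxC"),
]


def assign_afuns(nodes: Sequence[dict],
                 rules: Sequence[AfunRule] = AFUN_RULES) -> Sequence[dict]:
    """Fill in missing analytical functions without overriding existing ones."""
    for node in nodes:
        if node.get("afun") is not None:
            continue                 # assigned in an earlier step: keep it
        for matches, afun in rules:  # first matching rule wins
            if matches(node):
                node["afun"] = afun
                break
    return nodes
```

Under this rule order, a node with lemma `wa-` and tag `CONJ` receives `Coord` rather than `AuxC`, which shows why the ordering of the rules matters.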
Model Arabic Phrase II
• Sister-like co-ordination
• Conjunction of co-ordination
• Status constructus
Model Arabic Phrase III
• Non-expressed subject (?)
• Complex modality constructs
• Principal discrepancies between descriptions, both in topology and labeling
Model Arabic Sentence
• Wa lam yakun mina ’s-sahli `alay hi muwāğahatu kāmīrāti ’t-tilfizyūni wa `adasāti ’l-muşawwirīna wa huwa yaş`adu ’l-bāşa.
• It was not easy for him to face the television cameras and the lenses of photographers as he was getting on the bus.
Constituency Annotation
Dependency Annotation
Evaluation & Conclusion
• Implementation still in progress, fine-tuning needed
• 10,000 words manually annotated in both styles
• ~60% of dependencies correctly attached
• 2nd Prague Penn Arabic Treebanking Workshop, May 2003 in Prague
• Transfer from dependency to constituency?
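The slide does not define exactly how the share of correct dependencies was computed. One common way to measure it, shown here purely as an assumption, is to compare the governor assigned to each token by the transformation with the governor in the manual dependency annotation. The function name `attachment_accuracy` and the toy data in the usage note are made up for illustration.

```python
from typing import Sequence


def attachment_accuracy(gold_heads: Sequence[int],
                        predicted_heads: Sequence[int]) -> float:
    """Share of tokens whose predicted governor equals the gold governor.

    Each sequence holds, for token i, the index of its governor
    (0 is used here for the sentence root; this encoding is an assumption).
    """
    if len(gold_heads) != len(predicted_heads):
        raise ValueError("both annotations must cover the same tokens")
    if not gold_heads:
        raise ValueError("cannot score an empty annotation")
    correct = sum(g == p for g, p in zip(gold_heads, predicted_heads))
    return correct / len(gold_heads)
```

For instance, with toy gold governors `[0, 1, 1, 3, 3]` and predicted governors `[0, 1, 2, 3, 1]`, three of the five tokens agree, so the function returns 0.6.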
Related Work
• New tool for assignment of analytical functions
  • Based on machine learning (C5-trained decision trees)
  • Error rate 17% (supposing the topology of the tree is correct)
• First experiments with Arabic dependency parser
• Incorporated into the process of annotation of Prague Arabic Dependency Treebank