The PDT Morphology and Surface Syntax. Jan Hajič, Institute of Formal and Applied Linguistics, School of Computer Science, Faculty of Mathematics and Physics, Charles University, Prague, Czech Republic. Companions Semantic Representation and Dialog Interfacing Workshop - Morphology and Surface Syntax
Morphology (m-layer) • Prerequisites for the manual annotation process: • Tokenized data • Annotation guidelines • Annotation tool • Manual decision making support • Offline (or online) morphological analyzer • Quality checking tool • Process description • Results (manually annotated data) to be used for... • tagger training, linguistic research, basis for further annotation, ...
Morphological Attributes
Ex.: nejnezajímavějším “(to) the most uninteresting”
• Tag: 13 categories (15 positions, two of them reserved)
• Example: AAFP3----3N----
   1  A  Adjective
   2  A  Regular
   3  F  Feminine
   4  P  Plural
   5  3  Dative
   6  -  no poss. Gender
   7  -  no poss. Number
   8  -  no person
   9  -  no tense
  10  3  superlative
  11  N  negated
  12  -  no voice
  13  -  reserve1
  14  -  reserve2
  15  -  base var.
• Lemma: POS-unique identifier
  Books/verb -> book-1, went -> go, to/prep. -> to-1
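A minimal Python sketch of how such a positional tag could be unpacked. The position names are this sketch's own shorthand for the categories listed on the slide, not an official API.

```python
# Shorthand names for the 15 tag positions (13 categories plus two reserved slots).
# The names are assumptions of this sketch, chosen to mirror the slide.
POSITIONS = [
    "POS", "SubPOS", "Gender", "Number", "Case",
    "PossGender", "PossNumber", "Person", "Tense", "Grade",
    "Negation", "Voice", "Reserve1", "Reserve2", "Variant",
]


def decode_tag(tag):
    """Map every position name to the character found at that position."""
    if len(tag) != len(POSITIONS):
        raise ValueError(f"expected {len(POSITIONS)} characters, got {len(tag)}")
    return dict(zip(POSITIONS, tag))


def set_values(tag):
    """Keep only the positions that carry a value ('-' means not applicable)."""
    return {name: value for name, value in decode_tag(tag).items() if value != "-"}


decoded = set_values("AAFP3----3N----")
print(decoded)
```

For the example tag, only seven positions carry a value: the part of speech and its subtype, gender, number, case, degree of comparison, and negation.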
Morphological Tagset • 13 categories, 4452 plausible tags (combinations):
Morphological Analysis
• Formally: MA: A+ → Pow(L × T)
  • MA(f) = { [l, t] }
  • f ∈ A+ (the token)
  • l ∈ L (lemma)
  • t ∈ T (tag)
• tokens taken in isolation
  • no attempt to solve e.g. auxiliaries vs. full verbs
• Ex.: MA(“má”), lit. “has” or “my”:
  { [mít, VB-S---3P-AA---]   (mít: lit. “to have”),
    [můj, PSFS1-S1------1]   (můj: lit. “my”),
    [můj, PSFS5-S1------1],
    [můj, PSNP1-S1------1],
    [můj, PSNP4-S1------1],
    [můj, PSNP5-S1------1] }
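The formal definition can be read as a plain lookup function from a token to a set of (lemma, tag) pairs. The following Python sketch illustrates this with a toy lexicon that contains only the entry from the slide; the name `LEXICON` and the function `ma` are hypothetical.

```python
# Toy lexicon with the single entry shown on the slide.
# A real analyzer would consult a large morphological dictionary instead.
LEXICON = {
    "má": {
        ("mít", "VB-S---3P-AA---"),
        ("můj", "PSFS1-S1------1"),
        ("můj", "PSFS5-S1------1"),
        ("můj", "PSNP1-S1------1"),
        ("můj", "PSNP4-S1------1"),
        ("můj", "PSNP5-S1------1"),
    },
}


def ma(form):
    """Return the set of (lemma, tag) pairs for a token taken in isolation."""
    return set(LEXICON.get(form, set()))


analyses = ma("má")
lemmas = sorted({lemma for lemma, _ in analyses})
print(len(analyses), lemmas)
```

Because the token is analyzed without context, the function returns all six readings; choosing among them is left to the disambiguation step.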
Morphological Analysis: Implementation
• Dictionary-based
• covers 800k words (lemmas), ~20 mil. forms (w/tag)
• C code implementation
• standard (regular) derivations on-the-fly; ex.:
  • spojit “join”: spojený “joined”, spojeně “joinedly”, spojenost “joinedliness”, spojitelný “joinable”, spojitelně “joinably”, spojitelnost “joinability”
• irregular forms listed in dictionary (w/tags)
• no phonological processing (concatenation only)
• grammatical prefixes only: negation, superlative
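To illustrate "grammatical prefixes only" with pure concatenation, here is a hedged Python sketch that strips the superlative prefix nej- and the negation prefix ne- before a dictionary lookup and then adjusts the tag. It is not the C implementation mentioned on the slide; the toy dictionary `FORMS`, the helper names, and the rule that nej- attaches only to comparative forms are assumptions of this sketch.

```python
# Hypothetical toy dictionary: form -> list of (lemma, tag). Not the real PDT dictionary.
FORMS = {
    "zajímavějším": [("zajímavý", "AAFP3----2A----")],
}

GRADE, NEGATION = 9, 10  # 0-based indices of the degree and negation positions


def set_position(tag, index, value):
    """Return a copy of the tag with one position replaced."""
    return tag[:index] + value + tag[index + 1:]


def analyze(form):
    """Try the bare form and every combination of the prefixes nej- and ne-."""
    results = []
    for superlative in (False, True):
        if superlative and not form.startswith("nej"):
            continue
        rest = form[3:] if superlative else form
        for negated in (False, True):
            if negated and not rest.startswith("ne"):
                continue
            stem = rest[2:] if negated else rest
            for lemma, tag in FORMS.get(stem, []):
                if superlative:
                    if tag[GRADE] != "2":  # assumption: nej- needs a comparative
                        continue
                    tag = set_position(tag, GRADE, "3")
                if negated:
                    if tag[NEGATION] != "A":  # only affirmative forms can be negated
                        continue
                    tag = set_position(tag, NEGATION, "N")
                results.append((lemma, tag))
    return results


print(analyze("nejnezajímavějším"))
```

With this toy dictionary, the form nejnezajímavějším is analyzed as nej- + ne- + zajímavějším and receives the tag AAFP3----3N----.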
The Morphological Annotation Tool (LAW)
The Process of Morphological Annotation
• From tokenized to annotated text:
  tokenized text (auto, w-layer)
  → (Auto) morphological analysis [uses: morphological dictionary]
  → text w/morph. interpretations
  → Manual morphological disambiguation (DA) [uses: annotation guidelines]
  → text w/select. interpretation
  → Manual adjudication
  → annotated text (m-layer)
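The hand-over from disambiguation to adjudication can be sketched in Python as follows. The sketch assumes that two annotators each select one interpretation per token and that an adjudicator reviews only the tokens where they disagree; the slide does not spell out this detail, and the function name is hypothetical.

```python
def needs_adjudication(candidates, pick_a, pick_b):
    """Return indices of tokens whose two selections differ or are not valid candidates."""
    flagged = []
    for index, (options, a, b) in enumerate(zip(candidates, pick_a, pick_b)):
        if a != b or a not in options:
            flagged.append(index)
    return flagged


VERB = ("mít", "VB-S---3P-AA---")
PRONOUN = ("můj", "PSFS1-S1------1")

# Two occurrences of the same ambiguous token, each with two candidate readings.
candidates = [{VERB, PRONOUN}, {VERB, PRONOUN}]
annotator_a = [VERB, PRONOUN]
annotator_b = [VERB, VERB]

flagged = needs_adjudication(candidates, annotator_a, annotator_b)
print(flagged)
```

Here only the second token is flagged, because the annotators agree on the first one.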
PDT – Syntactic Annotation • Surface syntax annotation • Dependency surface syntax • Comparable to Penn Treebank annotation • Convertible: dependency ↔ parse trees • Deep syntactic/semantic annotation • Dependency trees • Different topology • High level of generalization and formalization • Many node attributes
Analytical Syntax (a-layer) • Dependency + Analytical Function • Each edge links a governor to its dependent • Ex.: The influence of the Mexican crisis on Central and Eastern Europe has apparently been underestimated.
Analytical Syntax: Functions • Main (for [main] semantic lexemes): • Pred, Sb, Obj, Adv, Atr, Atv(V), AuxV, Pnom • “Double” dependency: AtrAdv, AtrObj, AtrAtr • Special (function words, punctuation,...): • Reflexives, particles: AuxT, AuxR, AuxO, AuxZ, AuxY • Prepositions/Conjunctions: AuxP, AuxC • Punctuation, Graphics: AuxX, AuxS, AuxG, AuxK • Structural • Ellipsis: ExD, Coordination etc.: Coord, Apos
Example • All came from Cray Research.
Surface Syntax Example • Complete sentence: Sb, Pred, Obj • Resistance needs courage.
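A small Python sketch of how this example could be stored as a dependency tree with analytical functions. The `Node` class and its field names are hypothetical and do not reflect the PDT file format; sentence-final punctuation is left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class Node:
    position: int  # 1-based word position in the sentence
    form: str      # word form
    head: int      # position of the governor; 0 marks the technical root
    afun: str      # analytical function


SENTENCE = [
    Node(1, "Resistance", 2, "Sb"),
    Node(2, "needs", 0, "Pred"),
    Node(3, "courage", 2, "Obj"),
]


def dependents(nodes, governor):
    """Return the nodes whose governor sits at the given position."""
    return [node for node in nodes if node.head == governor]


def render(nodes, governor=0, depth=0):
    """Return one indented line per node, governors before their dependents."""
    lines = []
    for node in dependents(nodes, governor):
        lines.append("  " * depth + f"{node.form} [{node.afun}]")
        lines.extend(render(nodes, node.position, depth + 1))
    return lines


print("\n".join(render(SENTENCE)))
```

The predicate is the only node attached to the technical root, and both the subject and the object depend on it.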
Surface Syntax Example • Analytical verb form: • he would be allowed to be enrolled
Surface Syntax Example • Predicate with copula (state) • you were fired
Surface Syntax Example • Passive construction (action) • (The) book has been translated [by Mr. X]
Surface Syntax Example • Complement • she left crying
Surface Syntax Example • Object • he gave Mary a book
Surface Syntax Example • Object used for infinitive of analytical verb forms • he wants to learn
Surface Syntax Example • Relative clause (embedded) • the woman, who had a French accent, was very pretty
Surface Syntax Example • Coordination • ... (to) magic, mysticism(,) etc.
Surface Syntax Example • Apposition • cheap, i.e. under five dollars
Surface Syntax Example • Incomplete phrases • Peter works well, but Paul badly
Surface Syntax Example • Variants (equality) • he bought shoes for his son
XML Annotation Layers (English) • Strictly top-down links • w+m+a can be easily “knitted” • API for cross-layer access (programming) • PML Schema / Relax NG • [With slight modification, can be used for spoken data (audio as layer “-1”)]
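The idea of "knitting" layers through top-down links can be sketched in Python with the standard library. The XML below is a deliberately simplified, hypothetical format invented for this sketch; it is not the actual PML schema, and the element and attribute names are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified layer files. Each higher layer points down to the layer below.
W_LAYER = """<wdata>
  <w id="w1">Resistance</w>
  <w id="w2">needs</w>
  <w id="w3">courage</w>
</wdata>"""

M_LAYER = """<mdata>
  <m id="m1" w="w1" lemma="resistance"/>
  <m id="m2" w="w2" lemma="need"/>
  <m id="m3" w="w3" lemma="courage"/>
</mdata>"""

A_LAYER = """<adata>
  <a id="a1" m="m1" afun="Sb"/>
  <a id="a2" m="m2" afun="Pred"/>
  <a id="a3" m="m3" afun="Obj"/>
</adata>"""


def index_by_id(xml_text):
    """Parse one layer and index its elements by their id attribute."""
    return {element.get("id"): element for element in ET.fromstring(xml_text)}


def knit(w_xml, m_xml, a_xml):
    """Follow the a -> m -> w links and merge the three layers per node."""
    words = index_by_id(w_xml)
    morphs = index_by_id(m_xml)
    knitted = []
    for a_node in ET.fromstring(a_xml):
        m_node = morphs[a_node.get("m")]
        w_node = words[m_node.get("w")]
        knitted.append({
            "form": w_node.text,
            "lemma": m_node.get("lemma"),
            "afun": a_node.get("afun"),
        })
    return knitted


merged = knit(W_LAYER, M_LAYER, A_LAYER)
print(merged)
```

Because links only point downward, the lower layers can be indexed once and then joined in a single pass over the a-layer.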