Cross-lingual Projection of Semantics • Sebastian Pado • IGK Colloquium, Dec 16th 2004
Overview • Background: Role Semantics • Semantic Projection • Current and Future Work
Framework: Role semantics • Predicate-argument structure, theta roles: who did what to whom • Example: Peter (Agent) gives Mary (Recipient) a book (Theme) • NB: No treatment of discourse relations, modality, negation, etc.
Flavours of role semantics • Top-down approach: common, intuitively defined role set for all verbs • give: is Mary Recipient or Goal or Patient? • resemble: Subj vs. Obj • Bottom-up approach: Frame Semantics • Frames: conceptual representation of a situation, e.g. Statement, Giving, Transaction • Each frame is introduced by a target, e.g. say, give, buy • Roles are frame-specific
Frame Semantics • An Example Frame: Giving • Targets: give, hand out, receive • Roles: Donor, Recipient, Theme • The Berkeley FrameNet Project • English Frame Lexicon • ~200 frames, ~2,500 words (V/N/Adj) • Typically 3-6 roles per frame • Corpus of ~60,000 annotated instances
What does role semantics buy us? • Surface-independent representation • Solves the paraphrase problem: "Peter gives the book to Mary" / "Mary receives the book from Peter" • Flexible basis for QA, inference, etc. • Aljoscha Burchardt’s PhD • Common cross-lingual semantic representation
Semantic Role Assignment • Task: Automatic tagging of roles on free text • Important for NLP applications • Linking (syntax-semantics interface) • Statistical modelling (as classification) • Frame = semantically coherent targets • Targets show linking idiosyncrasies • give: Sub - Donor, Dobj - Theme, To-PP/Iobj - Recipient • get: Sub - Recipient, Dobj - Theme, From-PP - Donor • Needs lots of training data
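The linking idiosyncrasies above can be sketched as a lookup table. This is only an illustration: the table format, the function name `assign_roles`, and the grammatical-function labels are assumptions made for this sketch, not the statistical classifier the slides refer to.

```python
# Hypothetical linking table: target -> {grammatical function: role}.
# Entries mirror the give/get patterns on the slide; nothing here comes
# from a real tagger.
LINKING = {
    "give": {"Sub": "Donor", "Dobj": "Theme", "To-PP": "Recipient", "Iobj": "Recipient"},
    "get": {"Sub": "Recipient", "Dobj": "Theme", "From-PP": "Donor"},
}


def assign_roles(target, arguments):
    """Map (grammatical function, phrase) pairs to roles using the target's linking."""
    table = LINKING[target]
    return {table[func]: phrase for func, phrase in arguments if func in table}


give_roles = assign_roles(
    "give", [("Sub", "Peter"), ("Dobj", "the book"), ("To-PP", "to Mary")]
)
get_roles = assign_roles(
    "get", [("Sub", "Mary"), ("Dobj", "the book"), ("From-PP", "from Peter")]
)
print(give_roles)
print(get_roles)
```

The same grammatical function (Sub) maps to Donor for give but to Recipient for get, which is why a fixed table does not generalise and a trained model needs many annotated instances per target.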
Moving to another language… • SALSA: Manual creation and use of a German corpus with semantic annotation • Basis: TIGER newspaper corpus, 1.5 million words • English frames (mostly) work for German • Frame concept language-independent • But: Annotation slow and error-prone • Total effort: > 10 person years • Can we use the English data for German?
Overview • Background: Role Semantics • Semantic Projection • Current and Future Work
Central idea: Semantic Projection • Find a large, parallel bilingual corpus • English/German part of EUROPARL (25 million words) • Assign semantic roles on English side • Train automatic tagger on English data • Project semantics over to German • Step 1: Find semantic equivalences via word alignment • Step 2: Project frame • Step 3: Project roles • Result: Large German annotated corpus
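A minimal sketch of Steps 1-3 on a single sentence pair, assuming the word alignment is already given as a dictionary from English to German token indices. The function `project`, the data layout, and the role names Theme and Goal are illustrative assumptions, not the method described in the talk.

```python
def project(de_tokens, alignment, frame, target_idx, roles):
    """Project a frame and its roles from English onto German tokens.

    alignment: dict English token index -> German token index (hypothetical format)
    roles: dict role name -> list of English token indices
    """
    # Step 2: the frame is projected only if the target word is aligned.
    if target_idx not in alignment:
        return None
    # Step 3: each role is carried over to the aligned German tokens.
    de_roles = {}
    for role, indices in roles.items():
        projected = sorted({alignment[i] for i in indices if i in alignment})
        if projected:
            de_roles[role] = [de_tokens[j] for j in projected]
    return {"frame": frame, "target": de_tokens[alignment[target_idx]], "roles": de_roles}


de = ["Peter", "kommt", "nach", "Hause"]
# Step 1 (assumed done): hand-written alignment for "Peter comes home".
alignment = {0: 0, 1: 1, 2: 3}
result = project(de, alignment, "Arriving", 1, {"Theme": [0], "Goal": [2]})
print(result)
```

In this toy run the unaligned word "nach" is left out of the projected Goal, which illustrates why word-level projection alone can yield incomplete role spans.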
Projection: Example • English: Peter comes home (frame: Arriving) • German: Peter kommt nach Hause (frame: Arriving) • Three assumptions to make this work
Assumption 1: Semantic representation is parallel • Example: Peter comes home (Arriving) / Peter kommt nach Hause (Arriving)
Semantic (im-)parallelism • Frame definition based on realisable roles • German and English typologically similar • Mostly, same frames evoked • Aspect is problematic • Proper differences: "We finish by 12 o’clock" (Activity_finish) vs. "Wir sind um 12 Uhr fertig" (Activity_done_state) • Same aspect, lexicalised differently: "I finish by saying" vs. "Abschliessend sage ich"
Assumption 2: There is always parallel lexical material that is semantically equivalent • Example: Peter comes home (Arriving) / Peter kommt nach Hause (Arriving)
(Im)parallelism of lexical material • We only need semantic parallelism, only for targets and roles • Don’t care about discourse, modality, etc. • Don’t care about exact wording • Insights from translation science • Translation = Recreation of text based on content and target language norms • Frame structures ~ propositional content • Specific register • Specific domain (no cultural differences)
Assumption 3: Word alignment provides semantic equivalence • Example: Peter comes home (Arriving) / Peter kommt nach Hause (Arriving)
Word Alignment as Semantic Equivalence • Current word alignment models use co-occurrence to determine alignment • But co-occurrence != semantic equivalence • decide: entscheiden / Entscheidung treffen • insist: bestehen darauf • Problems: phrasal verbs, idioms, support verbs (Funktionsverbgefüge), noise proper
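To make the gap between co-occurrence and semantic equivalence concrete, here is a small sketch that scores word pairs with the Dice coefficient. Dice is a simple co-occurrence measure chosen for illustration, not the alignment model used in the talk, and the three sentence pairs are invented toy data.

```python
from collections import Counter


def dice_scores(bitext):
    """Dice coefficient for every (English word, German word) pair in a bitext."""
    en_count, de_count, pair_count = Counter(), Counter(), Counter()
    for en_sent, de_sent in bitext:
        en_words, de_words = set(en_sent.split()), set(de_sent.split())
        en_count.update(en_words)
        de_count.update(de_words)
        pair_count.update((e, d) for e in en_words for d in de_words)
    return {
        (e, d): 2 * n / (en_count[e] + de_count[d])
        for (e, d), n in pair_count.items()
    }


# Invented toy bitext: "decide" is translated once by a full verb and
# twice by the support verb construction "eine Entscheidung treffen".
bitext = [
    ("we decide", "wir entscheiden"),
    ("we decide", "wir treffen eine Entscheidung"),
    ("they decide", "sie treffen eine Entscheidung"),
]
scores = dice_scores(bitext)
print(scores[("decide", "treffen")], scores[("decide", "entscheiden")])
```

On this toy data the support verb "treffen" receives a higher score for "decide" than the true equivalent "entscheiden", although "treffen" on its own does not evoke a deciding frame.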
Overview • Background: Role Semantics • Semantic Projection • Current and Future Work
Current Work (1) • Empirical assessment of assumptions • Manual annotation of parallel corpus sample • Independent annotation of German / English • Evaluation of semantic parallelism • Evaluation of lexical parallelism • Evaluation of automatic word alignment
Current Work (2) • Token-wise word alignment too noisy • decide - treffen: Deciding? • Instead: Find reliable type equivalences • Statistics over complete corpus, filtering • Removal of German collocations • Result: German frame lexicon • Target x can evoke frames a,b,c • Project frame only if licensed by German lexicon
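A sketch of how type-level statistics could yield a German frame lexicon that licenses projection. The thresholds `min_count` and `min_ratio`, the function names, the frame label Meet_with, and the observation counts are all hypothetical choices for this illustration; the actual filtering is not specified in the slides.

```python
from collections import Counter, defaultdict


def build_frame_lexicon(observations, min_count=2, min_ratio=0.3):
    """Aggregate (German lemma, frame) token observations into a filtered lexicon."""
    counts = defaultdict(Counter)
    for lemma, frame in observations:
        counts[lemma][frame] += 1
    lexicon = {}
    for lemma, frames in counts.items():
        total = sum(frames.values())
        # Keep a frame only if it is seen often enough, absolutely and relatively.
        kept = {f for f, n in frames.items() if n >= min_count and n / total >= min_ratio}
        if kept:
            lexicon[lemma] = kept
    return lexicon


def licensed(lexicon, lemma, frame):
    """Project a frame onto a German target only if the lexicon allows it."""
    return frame in lexicon.get(lemma, set())


# Invented counts standing in for statistics over a complete corpus.
observations = (
    [("entscheiden", "Deciding")] * 5
    + [("treffen", "Deciding")] * 1
    + [("treffen", "Meet_with")] * 4
)
lexicon = build_frame_lexicon(observations)
print(lexicon)
print(licensed(lexicon, "treffen", "Deciding"))
```

With these toy counts the noisy token-level pairing of "treffen" with Deciding is filtered out, so that frame would not be projected onto "treffen".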
Current Work (3) • Projection of roles: Find equivalences between constituents • Define pairwise similarities • Efficiently identify best match • Graph matching • Probabilistic model • Choice points: • Definition of similarities • Bijective correspondence, yes or no? • Implementation
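One possible reading of the constituent matching step, sketched under strong assumptions: similarity is the share of a constituent's English tokens aligned into a German constituent, and the best bijective correspondence is found by exhaustive search. The constituent names, both functions, and the similarity definition are hypothetical; a real system would use an efficient assignment algorithm instead of enumerating permutations.

```python
from itertools import permutations


def overlap(en_const, de_const, alignment):
    """Fraction of the English constituent's tokens aligned into the German one."""
    hits = sum(1 for i in en_const if alignment.get(i) in de_const)
    return hits / len(en_const)


def best_bijection(en_consts, de_consts, alignment):
    """Exhaustive search over one-to-one mappings (toy inputs only).

    Returns (None, -1.0) if there are fewer German than English constituents.
    """
    names_en, names_de = list(en_consts), list(de_consts)
    best, best_score = None, -1.0
    for perm in permutations(names_de, len(names_en)):
        score = sum(
            overlap(en_consts[e], de_consts[d], alignment)
            for e, d in zip(names_en, perm)
        )
        if score > best_score:
            best, best_score = dict(zip(names_en, perm)), score
    return best, best_score


# Token indices: "Peter comes home" / "Peter kommt nach Hause".
alignment = {0: 0, 1: 1, 2: 3}
en_consts = {"NP_Peter": {0}, "ADV_home": {2}}
de_consts = {"NP_Peter": {0}, "PP_nach_Hause": {2, 3}}
mapping, score = best_bijection(en_consts, de_consts, alignment)
print(mapping, score)
```

Matching whole constituents lets the role on "home" cover the full phrase "nach Hause", even though "nach" itself has no alignment link.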
Future Work • Thorough Evaluation • Filtering • Projection will be noisy • Training a German semantic tagger • Evaluation wrt coverage, accuracy • Combination with manually annotated data (SALSA) • Using another language • English/French part of EUROPARL
Conclusion • Automatic creation of semantically annotated data for a new language • Projection of annotation from a known language using a word-aligned parallel corpus • Theory in place • Potential problems: • Semantics may diverge • Lexical material may diverge • Word alignment noisy • Empirical evaluation underway