Ulrich Heid, IMS-CL, Universität Stuttgart
Comments on Emanuele Pianta: Exploiting Parallel Texts to leverage the manual annotation bottleneck: the MultiSemCor case
The methodology: transfer of annotations
• It takes care of around 75% of the annotation work
• It produces:
  • an annotated target-language (TL) corpus (PoS, lemma, sense)
  • an annotated parallel corpus
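The transfer step can be pictured with a minimal sketch. It assumes a word alignment is already available and that the TL text has been PoS-tagged and lemmatized independently. The `Token` class, the `transfer_senses` function, the tag names and the synset identifiers are all hypothetical; this is not MultiSemCor's actual implementation. Senses are copied only across 1:1 links whose PoS agrees, and everything else is left for manual annotation.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Token:
    form: str
    pos: str
    lemma: str
    sense: Optional[str] = None  # synset identifier (hypothetical), None if unannotated


def transfer_senses(src: List[Token], tgt: List[Token],
                    links: List[Tuple[int, int]]) -> int:
    """Copy sense tags across 1:1, PoS-preserving alignment links.

    Returns the number of target tokens that received a sense."""
    # How many links touch each source / target position
    src_degree = Counter(i for i, _ in links)
    tgt_degree = Counter(j for _, j in links)
    transferred = 0
    for i, j in links:
        if src_degree[i] != 1 or tgt_degree[j] != 1:
            continue  # 1:n or n:1 link: leave for manual annotation
        s, t = src[i], tgt[j]
        if s.sense is None or s.pos != t.pos:
            continue  # nothing to copy, or PoS changed in translation
        t.sense = s.sense
        transferred += 1
    return transferred


# Toy example: "the red car" / "la macchina rossa"
src = [Token("the", "DET", "the"),
       Token("red", "ADJ", "red", "red.a.01"),
       Token("car", "NOUN", "car", "car.n.01")]
tgt = [Token("la", "DET", "il"),
       Token("macchina", "NOUN", "macchina"),
       Token("rossa", "ADJ", "rosso")]
links = [(0, 0), (1, 2), (2, 1)]
n = transfer_senses(src, tgt, links)
print(n, [t.sense for t in tgt])  # 2 [None, 'car.n.01', 'red.a.01']
```

The share of tokens left with `sense=None` after this step is what remains for the human annotator.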
Transfer of annotations: required infrastructure
• "Controlled" translation: sentence by sentence, PoS-preserving where possible
• Multiword recognition
• Parallel WordNets: Princeton WordNet and a target-language WordNet
Problems could arise:
• with "free" translations (cf. translation memories)
• with more "deviant" WordNets, e.g. GermaNet
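Multiword recognition matters because a TL multiword can correspond to a single SL word; grouping it first turns a 1:n link into a 1:1 link. A minimal sketch of greedy longest-match grouping follows, assuming a hypothetical lexicon of multiword expressions. The function `group_multiwords` and the two-entry lexicon are illustrative only, not taken from the talk.

```python
from typing import List, Set, Tuple


def group_multiwords(tokens: List[str], mwe_lexicon: Set[Tuple[str, ...]],
                     max_len: int = 4) -> List[str]:
    """Greedy longest-match grouping of tokens into multiword units."""
    units = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate first, down to length 2
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            candidate = tuple(t.lower() for t in tokens[i:i + n])
            if candidate in mwe_lexicon:
                units.append("_".join(tokens[i:i + n]))
                i += n
                break
        else:
            units.append(tokens[i])  # no multiword starts here
            i += 1
    return units


# Hypothetical lexicon entries
lexicon = {("con", "successo"), ("per", "esempio")}
units = group_multiwords(["Ha", "lavorato", "con", "successo"], lexicon)
print(units)  # ['Ha', 'lavorato', 'con_successo']
```

Greedy longest match is only one possible strategy; it can mis-group when two candidate expressions overlap.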
Analysing the transfer result
Systematic cases of non-alignment:
• lack of "cross-linguistic synonymy"
• translation not 1:1
• not PoS-preserving: coexist (verb) → coesistenza (noun)
• 1:2: successfully → con successo
• Do we get the same problems as those discussed as "divergences"/"mismatches" in machine translation (MT)?
• Would marking chunks in the source language (SL) and TL help?
• Would a morphology system help?
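The non-alignment cases above can be detected mechanically from link degrees and PoS tags. The sketch below assumes word alignments given as index pairs and a tagset shared by both languages; `classify_links` and its labels are hypothetical names, and the two toy inputs simply reuse the word pairs from the slide.

```python
from collections import Counter
from typing import Dict, List, Tuple


def classify_links(links: List[Tuple[int, int]], src_pos: List[str],
                   tgt_pos: List[str]) -> Dict[Tuple[int, int], str]:
    """Label each alignment link with its mismatch type, or '1:1'."""
    src_degree = Counter(i for i, _ in links)
    tgt_degree = Counter(j for _, j in links)
    labels = {}
    for i, j in links:
        if src_degree[i] > 1 and tgt_degree[j] > 1:
            labels[(i, j)] = "n:m"
        elif src_degree[i] > 1:
            labels[(i, j)] = "1:n"   # one source word, several target words
        elif tgt_degree[j] > 1:
            labels[(i, j)] = "n:1"   # several source words, one target word
        elif src_pos[i] != tgt_pos[j]:
            labels[(i, j)] = "pos-change"
        else:
            labels[(i, j)] = "1:1"
    return labels


# coexist (VERB) -> coesistenza (NOUN): one link, PoS differs
a = classify_links([(0, 0)], ["VERB"], ["NOUN"])
# successfully (ADV) -> con successo (ADP NOUN): one word, two links
b = classify_links([(0, 0), (0, 1)], ["ADV"], ["ADP", "NOUN"])
print(a)  # {(0, 0): 'pos-change'}
print(b)  # {(0, 0): '1:n', (0, 1): '1:n'}
```

Counting such labels over a corpus would give a first, rough picture of how far these cases overlap with MT divergences; that comparison is the open question on the slide.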
Towards relaxing the conditions on the infrastructure
• Goal: get the system to work under suboptimal conditions
• Would integrating morphological relations across PoS be useful? (Yes for alignment, no for WordNet synset transfer)
• Could the system be made "aware" of transfer problems (and signal them to the user)?
• Test with, e.g., the Acquis Communautaire?
• Test with Germanic languages?
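One possible reading of "yes for alignment, no for WordNet synset transfer", combined with signalling problems to the user, is sketched below. It assumes a hypothetical derivational table (`DERIVATIONS`) that links an Italian noun to its verb base; the functions `decide` and `review` are illustrative names, not part of any existing system.

```python
from typing import Dict, List, Tuple

# Hypothetical derivational table: target lemma -> (related lemma, its PoS)
DERIVATIONS: Dict[str, Tuple[str, str]] = {
    "coesistenza": ("coesistere", "VERB"),
}


def decide(src_pos: str, tgt_lemma: str, tgt_pos: str) -> str:
    """Decide what to do with one aligned word pair."""
    if src_pos == tgt_pos:
        return "transfer"      # keep link and copy the sense
    related = DERIVATIONS.get(tgt_lemma)
    if related is not None and related[1] == src_pos:
        return "align-only"    # keep link, but do not copy the synset
    return "flag"              # no known relation: report to the annotator


def review(links: List[Tuple[str, str, str, str]]) -> List[str]:
    """Return messages for links that need the annotator's attention."""
    messages = []
    for src_lemma, src_pos, tgt_lemma, tgt_pos in links:
        decision = decide(src_pos, tgt_lemma, tgt_pos)
        if decision != "transfer":
            messages.append(
                f"{decision}: {src_lemma}/{src_pos} -> {tgt_lemma}/{tgt_pos}")
    return messages


msgs = review([("coexist", "VERB", "coesistenza", "NOUN"),
               ("car", "NOUN", "macchina", "NOUN"),
               ("successfully", "ADV", "successo", "NOUN")])
for m in msgs:
    print(m)
```

The derivational link keeps "coexist/coesistenza" aligned without pretending that a verb synset applies to a noun; pairs with no known relation are surfaced to the user instead of being silently dropped.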