Comments on Guillaume Pitel: “Using bilingual LSA for FrameNet annotation of French text from generic resources”. Gerd Fliedner, Computational Linguistics, Saarland University.
Comments/Thoughts • Useful approach, as it can potentially speed up and support annotation and thus the creation of new FrameNets. • Uses only a few resources, therefore extendable to other language pairs (in principle). • First experiments ‘only’.
Multilingual FrameNets • Having FrameNet for as many languages as possible would be nice. • There are numerous monolingual and cross-lingual applications. • BUT: Building ‘a FrameNet’ is knowledge- and labour-intensive work and thus expensive; funding may be a problem.
Bootstrapping Multilingual FNs • (Re-)use as much knowledge from existing FrameNets as possible. • Ease the task of annotators by making useful suggestions. • Use automatic methods for knowledge acquisition. [Figure labels: LSA; Swamp of Language]
More than one strand of hair may be needed… • By the way: Change_hair_configuration is not yet in FN.
FR.FrameNet • In FR.FrameNet, several methods have been explored that could reduce time and costs of building new FrameNets. • Tasks explored: • Lexical Unit (Frame Evoking Element) transfer • Identify Frame Elements • Disambiguating LU-Frame Assignment
Lexical Unit Transfer • Can be seen as the task of finding and disambiguating translation pairs (links to Machine Translation, lexicography). • Extract disambiguated translations from an existing ‘cluster-based’ dictionary. • Some manual annotation required, but a relatively fast and simple way of acquiring a solid core lexicon.
Manual Filtering • Is frame information currently used for disambiguation? • How is the manual annotation done? Sounds like rules of thumb. Guidelines? • How is it evaluated?
Resources needed • Lexical unit transfer • English FrameNet • Large-coverage bilingual dictionary (source → target language, optimally sense-disambiguated) • Corpus in target language • (Some) manual annotation (Read: OK, may be a problem for ‘small’ languages, may be a problem for small projects)
Lexical Unit Transfer: Other Possibilities • Using ‘human readable’ resources • Use existing dictionaries • Problem: Disambiguation • Using machine readable resources • Use EuroWordNet or similar • Problem again: Disambiguation • Use parallel corpora • Padó & Lapata, AAAI-05
Identify Frame Elements • Core idea: The same semantic restrictions/preferences should apply to Frame Elements in source and target language. • How can these semantic preferences be learned? • First step: Learn cross-lingual semantic similarity • Second step: Identify Frame Elements in one language and transfer.
Bilingual Infomap/Latent Semantic Analysis (LSA) • Originally used for crosslingual information retrieval. • Use bilingual, parallel ‘core’ corpus. • Parallel documents/paragraphs/… are put together and count as one text. • Build vector space. • Monolingual and cross-lingual similarities will ‘fall out’.
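The merging step above can be illustrated with a minimal sketch. It assumes a toy set of aligned paragraphs (invented here, not from the talk) and stops at the raw term-by-document count matrix: the dimensionality reduction (SVD) that LSA applies on top of this matrix is deliberately left out to keep the example dependency-free. The function names are hypothetical.

```python
from collections import Counter
from math import sqrt

# Toy aligned paragraphs, invented for illustration only.
parallel = [
    ("the minister said the law is good", "le ministre a dit que la loi est bonne"),
    ("the minister signed the law", "le ministre a signe la loi"),
    ("the dog chased the cat", "le chien a poursuivi le chat"),
    ("the cat sleeps", "le chat dort"),
]

def merged_documents(pairs):
    """Concatenate each aligned pair into one pseudo-document.

    Words are prefixed with a language tag so that identical strings
    in the two languages stay distinct terms.
    """
    docs = []
    for en, fr in pairs:
        tokens = ["en:" + w for w in en.split()] + ["fr:" + w for w in fr.split()]
        docs.append(Counter(tokens))
    return docs

def term_vectors(docs):
    """Map each term to its row of counts across the merged documents."""
    vocab = sorted({term for doc in docs for term in doc})
    return {term: [doc[term] for doc in docs] for term in vocab}

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

vectors = term_vectors(merged_documents(parallel))
print(cosine(vectors["en:law"], vectors["fr:loi"]))   # cross-lingual, same topic
print(cosine(vectors["en:law"], vectors["fr:chat"]))  # cross-lingual, different topic
print(cosine(vectors["en:cat"], vectors["en:dog"]))   # monolingual
```

Because English and French words of an aligned pair share the same pseudo-document, both monolingual and cross-lingual similarities come from the same matrix, which is the sense in which they ‘fall out’.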
Identify and transfer Frame Elements • Use Berkeley FrameNet corpus as training corpus (English): Frame Elements (content words+POS) from annotated examples are used as starting point. • Use semantic space (generated by LSA) to find good (hopefully semantically related) translation candidates for words making up Frame Element. • To identify French Frame Element: Find ‘closest’ vector. • Several good examples, some less good ones.
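As a rough sketch of the ‘closest vector’ step, the following ranks French candidates by cosine similarity to an English Frame Element filler. The three-dimensional vectors are made-up stand-ins for a bilingual LSA space, and `closest_french` is a hypothetical helper, not a function from the talk or from Infomap.

```python
from math import sqrt

# Hypothetical vectors standing in for a bilingual semantic space.
space = {
    "en:minister": [0.9, 0.1, 0.0],
    "fr:ministre": [0.8, 0.2, 0.1],
    "fr:gouvernement": [0.7, 0.3, 0.0],
    "fr:chat": [0.0, 0.1, 0.9],
}

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def closest_french(word, space, k=2):
    """Return the k French terms most similar to `word`, best first."""
    query = space[word]
    scored = [
        (cosine(query, vec), term)
        for term, vec in space.items()
        if term.startswith("fr:")
    ]
    return [term for _, term in sorted(scored, reverse=True)[:k]]

print(closest_french("en:minister", space))
```

Returning the top k candidates rather than a single best match fits the annotation-support setting, where the output is a suggestion list for a human annotator.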
Add Clustering • Inspection of data shows: Frame Elements may have semantically different fillers. • Thus, clustering of LSA vectors seems promising. • Identifying French Frame Elements: Instead of finding the closest vector, check whether the word vector belongs to one of the clusters. • Problems: Identifying the optimal number of clusters, sparse data, …
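A minimal sketch of the cluster-membership check might look as follows. It assumes plain k-means with cosine similarity, naive deterministic seeding, and a similarity threshold of 0.9; the talk does not say which clustering algorithm or membership criterion was used, so all of these are assumptions, as are the two-dimensional toy vectors.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def mean(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def kmeans(vectors, k, iterations=10):
    """Naive k-means using cosine similarity; seeds are the first k vectors."""
    centroids = [list(v) for v in vectors[:k]]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in vectors:
            best = max(range(k), key=lambda i: cosine(v, centroids[i]))
            groups[best].append(v)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [mean(g) if g else centroids[i] for i, g in enumerate(groups)]
    return centroids

def belongs_to_some_cluster(vector, centroids, threshold=0.9):
    """Assumed criterion: similar enough to at least one centroid."""
    return any(cosine(vector, c) >= threshold for c in centroids)

# Hypothetical filler vectors of one Frame Element, with two distinct senses.
fillers = [[0.9, 0.1], [0.1, 0.9], [0.8, 0.2], [0.2, 0.8]]
centroids = kmeans(fillers, k=2)
print(belongs_to_some_cluster([0.7, 0.3], centroids))  # near the first sense
print(belongs_to_some_cluster([0.5, 0.5], centroids))  # between the two senses
```

The number of clusters `k` is fixed by hand here, which is exactly the open problem the slide mentions.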
Resources Needed • Frame Element identification/transfer • English FrameNet • Parallel corpus source/target language • Additional corpora in both languages • Corpus in target language • (Tagger in source/target language) • (Not so little) manual annotation (Read: OK, may be a problem for ‘small’ languages, may be a problem for small projects)
Use information from WordNet? • For French: • Use (Euro)WordNet alternatively/in addition: • Use EuroWordNet links (translations) • Use WordNet to expand ‘queries’ • Use similarity measures such as Jiang & Conrath 97. • For other languages that do not have WordNet: ???
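For reference, the Jiang & Conrath measure is usually stated as a distance over the information content of two concepts and of their lowest common subsumer in the WordNet hierarchy. The symbols below ($c_1$, $c_2$, $\mathrm{lcs}$, $P$) are notation chosen here, with $P(c)$ the corpus probability of encountering concept $c$ or any of its hyponyms:

```latex
\mathrm{IC}(c) = -\log P(c)
\qquad
\mathrm{dist}_{\mathrm{JC}}(c_1, c_2)
  = \mathrm{IC}(c_1) + \mathrm{IC}(c_2) - 2\,\mathrm{IC}\bigl(\mathrm{lcs}(c_1, c_2)\bigr)
```

A smaller distance means more closely related concepts; the measure needs both a hierarchy and corpus frequency counts, which is why it does not carry over to languages without a WordNet.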
Syntax • Certain Frame Elements are semantically totally heterogeneous, but syntactically (relatively) easy to identify • For example: Statement.Message (engl.: say that X, fr.: dire que X) • Problem: Semantic transfer can be learned using LSA, syntactic transfer (that≈que) cannot. • Could (partially) parsed parallel corpora be used to learn syntactic transfer? Can ‘syntactic’ and ‘semantic’ Frame Element identification be combined? Alternatively: Can ‘syntactic’ Frame Elements be recognised and left to annotators altogether?
Frame Element Preferences • Knowing more about Frame Elements (explicitly) would be very helpful. • Automatic Frame/Frame Element assignment. • Manual annotation/guidelines. • Transfer to other languages. • Encoding preferences as links within FrameNet • Encoding preferences as links with external resources (WordNet? SUMO/MILO?), cf. work by Aljoscha Burchardt • Cf. yesterday’s talk by Michael Ellsworth
Conclusions • (Some) more research required. • Optimising the annotation process is probably very important, e.g.: • Use several cycles (start with ‘more certain’ cases, re-train with the additional data, …) • Integrate different strategies, e.g. ‘syntax’ and ‘semantics’. • Which decisions can be made automatically? Can suggestions be made? How good are they? Optimise for recall or for precision?
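The recall vs. precision question can be made concrete with a small sketch. It assumes each automatic suggestion comes with a confidence score and that a gold judgement is available for evaluation; the scores and judgements below are invented, and `precision_recall` is a hypothetical helper.

```python
def precision_recall(suggestions, threshold):
    """Precision and recall when only suggestions scoring >= threshold are accepted.

    `suggestions` is a list of (score, is_correct) pairs.
    Convention assumed here: precision is 1.0 when nothing is accepted.
    """
    accepted = [correct for score, correct in suggestions if score >= threshold]
    total_correct = sum(correct for _, correct in suggestions)
    true_positives = sum(accepted)
    precision = true_positives / len(accepted) if accepted else 1.0
    recall = true_positives / total_correct if total_correct else 0.0
    return precision, recall

# Invented (score, is_correct) pairs for illustration only.
suggestions = [
    (0.95, True), (0.90, True), (0.80, False),
    (0.70, True), (0.60, False), (0.40, True),
]

for threshold in (0.85, 0.5):
    print(threshold, precision_recall(suggestions, threshold))
```

A high threshold corresponds to starting with the ‘more certain’ cases in an early cycle; lowering it in later cycles accepts more suggestions at the cost of more errors for the annotators to correct.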