Keyword extraction for metadata annotation of Learning Objects
Lothar Lemnitzer, Paola Monachesi
RANLP, Borovets 2007
Outline • A quantitative view on the corpora • The keyword extractor • Evaluation of the KWE
Creation of a learning objects archive • Collection of the learning material; IST domains for the LOs: 1. Use of computers in education (with sub-domains) 2. Calimera documents (parallel corpus developed in the Calimera FP5 project, http://www.calimera.org/) • Result: a multilingual, partially parallel, partially comparable, domain-specific corpus
Corpus statistics – full corpus • Measuring lengths of corpora (# of documents, tokens) • Measuring token / type ratio • Measuring type / lemma ratio
Corpus statistics – full corpus • Bulgarian, German and Polish corpora have a very low number of tokens per type (which probably indicates data sparseness problems) • English has by far the highest ratio • Czech, Dutch, Portuguese and Romanian are in between • The type / lemma ratio reflects the richness of the inflectional paradigms
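A minimal sketch of how these two ratios can be computed, assuming each corpus is already tokenised and lemmatised; the function name and input format are illustrative, not from the paper:

```python
def corpus_ratios(tokens, lemmas):
    """Token/type and type/lemma ratios for one corpus.

    tokens: list of word tokens (already tokenised)
    lemmas: list of lemmas, aligned with the tokens
    """
    n_tokens = len(tokens)
    n_types = len(set(t.lower() for t in tokens))
    n_lemmas = len(set(l.lower() for l in lemmas))
    return {
        "tokens_per_type": n_tokens / n_types,   # low values hint at sparseness
        "types_per_lemma": n_types / n_lemmas,   # high values hint at rich inflection
    }

# toy usage
tokens = ["Learning", "objects", "are", "learning", "resources"]
lemmas = ["learning", "object", "be", "learning", "resource"]
print(corpus_ratios(tokens, lemmas))
```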
Reflection • The corpora are heterogeneous with respect to the type / token ratio • Does the data sparseness of some corpora, compared to others, influence the information extraction process? • If yes, how can we counter this effect? • How does the quality of the linguistic annotation influence the extraction task?
Corpus statistics – annotated subcorpus • Measuring lengths of annotated documents • Measuring distribution of manually marked keywords over documents • Measuring the share of keyphrases
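These statistics are straightforward to derive from the annotated subcorpus; the sketch below assumes the keywords are stored as strings per document, with multi-word entries counting as keyphrases (names and data layout are illustrative):

```python
def annotation_stats(doc_keywords):
    """doc_keywords: one list of keyword strings per annotated document."""
    counts = [len(kws) for kws in doc_keywords]
    n_total = sum(counts)
    n_phrases = sum(1 for kws in doc_keywords for kw in kws if " " in kw)
    return {
        "keywords_per_doc": n_total / len(doc_keywords),
        "keyphrase_share": n_phrases / n_total,  # share of multi-word keywords
    }
```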
Reflection • Did the human annotators mark keywords or domain terms? • Was the task adequately contextualised? • What do the varying shares of keyphrases tell us?
Keyword extraction • Good keywords have a typical, non-random distribution within and across documents • Keywords tend to appear more often at certain places in texts (headings etc.) • Keywords are often highlighted / emphasised by authors • Keywords express / represent the topic(s) of a text
Modelling Keywordiness • Linguistic filtering of KW candidates, based on part of speech and morphology • Distributional measures are used to identify unevenly distributed words • TFIDF • (Adjusted) RIDF • Knowledge of text structure used to identify salient regions (e.g., headings) • Layout features of texts used to identify emphasised words and weight them higher • Finding chains of semantically related words
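A sketch of the two distributional measures named above, TFIDF and plain RIDF; the adjusted RIDF variant is not specified on the slide, so it is omitted here. Residual IDF is the observed IDF minus the IDF a Poisson model would predict from a word's collection frequency, so words that clump in a few documents (keyword-like behaviour) score above zero. Function names and the data layout are assumptions:

```python
import math
from collections import Counter

def build_counts(docs):
    """docs: list of token lists. Returns document and collection frequencies."""
    doc_freq, coll_freq = Counter(), Counter()
    for doc in docs:
        coll_freq.update(doc)
        doc_freq.update(set(doc))
    return doc_freq, coll_freq

def tfidf(term, doc, doc_freq, n_docs):
    """Classic TF*IDF of a term within one document."""
    return doc.count(term) * math.log2(n_docs / doc_freq[term])

def ridf(term, doc_freq, coll_freq, n_docs):
    """Residual IDF: observed IDF minus the IDF predicted by a Poisson model.

    Unevenly distributed words score above zero; evenly spread words
    score near zero.
    """
    observed = -math.log2(doc_freq[term] / n_docs)
    lam = coll_freq[term] / n_docs             # mean occurrences per document
    expected = -math.log2(1 - math.exp(-lam))
    return observed - expected
```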
Challenges • Treating multi-word keywords (= keyphrases) • Assigning a combined weight which takes all the aforementioned factors into account • Multilinguality: finding good settings for all languages, balancing language-dependent and language-independent features
Treatment of keyphrases • Keyphrases have to be restricted with respect to length (max 3 words) and frequency (min 2 occurrences) • Keyphrase patterns must be restricted with respect to linguistic categories (style of learning is acceptable; of learning styles is not)
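A minimal sketch of such keyphrase filtering, assuming POS-tagged input; the exact category constraints of the KWE are not given, so the edge-tag rule below (phrases may not start or end with a preposition, determiner or conjunction) is one plausible reading of the style of learning / of learning styles example:

```python
from collections import Counter

MAX_LEN = 3    # keyphrases restricted to at most 3 words
MIN_FREQ = 2   # and to at least 2 occurrences

# assumed constraint: phrases may not start or end with a function word
BAD_EDGE_TAGS = {"ADP", "DET", "CONJ"}

def keyphrase_candidates(tagged_tokens):
    """tagged_tokens: list of (word, pos) pairs for one document."""
    counts = Counter()
    for n in range(2, MAX_LEN + 1):
        for i in range(len(tagged_tokens) - n + 1):
            gram = tagged_tokens[i:i + n]
            if gram[0][1] in BAD_EDGE_TAGS or gram[-1][1] in BAD_EDGE_TAGS:
                continue
            counts[" ".join(w for w, _ in gram)] += 1
    return [p for p, c in counts.items() if c >= MIN_FREQ]
```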
KWE Evaluation 1 • Human annotators marked n keywords in document d • The first n choices of the KWE for document d are extracted • Measure the overlap between the two sets • Also measure partial matches
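One way to score this overlap; the weight given to partial matches is an assumption, since the slides only say that partial matches are measured as well:

```python
def overlap_score(gold, system, partial_weight=0.5):
    """Score the first n system keywords against n human keywords.

    Exact matches count 1.0; a partial match (one keyword contained
    in the other) counts partial_weight, an assumed value.
    """
    n = len(gold)
    score = 0.0
    for s in system[:n]:
        if s in gold:
            score += 1.0
        elif any(s in g or g in s for g in gold):
            score += partial_weight
    return score / n
```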
KWE Evaluation – Overlap Settings • All three distributional statistics (TFIDF, RIDF, adjusted RIDF) have been tested • Maximal keyphrase length set to 3
Reflection • Is it correct to use the human annotation as "gold standard"? • Is it correct to give a weight to partial matches?
KWE Evaluation - IAA • Participants read a text (Calimera "Multimedia") • Participants assign keywords to that text (ideally not more than 15) • The KWE produces keywords for the same text • IAA is measured over the human annotators • IAA is measured between the KWE and the human annotators
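A sketch of this comparison; the agreement coefficient is not named on the slide, so Jaccard overlap of keyword sets, averaged over annotator pairs, is used here as one plausible choice:

```python
from itertools import combinations

def pairwise_agreement(annotations):
    """Mean pairwise Jaccard agreement over keyword sets.

    annotations: dict mapping annotator -> set of keywords for one text.
    Jaccard is an assumed choice; the slides do not name a coefficient.
    """
    pairs = list(combinations(annotations.values(), 2))
    scores = [len(a & b) / len(a | b) for a, b in pairs]
    return sum(scores) / len(scores)

# the KWE can be scored the same way: treat its output as one more
# "annotator" and average its agreement with each human annotator
```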
KWE Evaluation – Judging adequacy • Participants read a text (Calimera "Multimedia") • Participants see 20 keywords generated by the KWE and rate them • Scale 1 – 4 (excellent – not acceptable) • 5 = not sure
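Averaging such ratings requires a policy for the "not sure" answers, which the next slide flags as an open question; the sketch below simply drops them, which is only one option:

```python
def mean_adequacy(ratings):
    """Mean rating on the 1-4 scale; 5 means "not sure".

    Dropping the "not sure" answers is an assumed policy, not the
    authors'; how to treat them is left open in the talk.
    """
    valid = [r for r in ratings if r != 5]
    return sum(valid) / len(valid) if valid else None

print(mean_adequacy([1, 2, 2, 5, 4, 5, 1]))  # -> 2.0
```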
Reflection • How should we treat the "not sure" decisions (quite substantial for a few judges)? • What do the added keywords tell us? Where are they in the ordered list of recommendations?
Conclusions • Evaluation of a KWE in a multilingual environment and with diverse corpora is more difficult than expected • We now have the facilities for controlled development and improvement of the KWE • Quantitative evaluation has to be accompanied by validation of the tool