
Metadata generation and glossary creation in eLearning





  1. Metadata generation and glossary creation in eLearning Lothar Lemnitzer Review meeting, Zürich, 25 January 2008

  2. Outline • Demonstration of the functionalities • Where we stand • Evaluation of tools • Consequences for the development of the tools in the final phase

  3. Demo We simulate a tutor who adds a learning object and generates and edits additional data

  4. Where we stand (1) Achievements reached in the first year of the project: • Annotated corpora of learning objects • Stand-alone prototype of keyword extractor (KWE) • Stand-alone prototype of glossary candidate detector (GCD)

  5. Where we stand (2) Achievements reached in the second year of the project: • Quantitative evaluation of the corpora and tools • Validation of the tools in user-centered usage scenarios for all languages • Further development of tools in response to the results of the evaluation

  6. Evaluation - rationale Quantitative evaluation is needed to • Inform the further development of the tools (formative) • Find the optimal setting / parameters for each language (summative)

  7. Evaluation (1) Evaluation is applied to: • the corpora of learning objects • the keyword extractor • the glossary candidate detector In the following, I will focus on the tool evaluation

  8. Evaluation (2) Evaluation of the tools comprises • measuring recall and precision against the manual annotation • measuring agreement on each task between different annotators • measuring acceptance of keywords / definitions (rated on a scale)

  9. KWE Evaluation – step 1 • One human annotator marked n keywords in document d • The first n choices of the KWE for document d are extracted • Measure the overlap between both sets • Partial matches are also counted
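
  A minimal sketch of how this step-1 comparison could be computed; the exact-match and partial-match criteria below (case-insensitive equality, respectively sharing at least one word with a human keyword) are assumptions for illustration, not the project's actual scoring script.

    # Compare the top-n keywords proposed by the extractor with the n
    # keywords a human annotator marked in the same document.
    def evaluate_keywords(human, system):
        human = [k.lower() for k in human]
        system = [k.lower() for k in system[:len(human)]]  # first n choices only

        exact = sum(1 for k in system if k in human)
        partial = sum(
            1 for k in system
            if k not in human and any(set(k.split()) & set(h.split()) for h in human)
        )
        n = len(human)
        return {
            "exact_overlap": exact / n,
            "partial_overlap": (exact + partial) / n,
        }

    print(evaluate_keywords(
        ["metadata", "glossary creation", "eLearning"],
        ["elearning", "glossary", "keyword extraction"],
    ))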

  10. KWE Evaluation – step 2 • Measure Inter-Annotator Agreement (IAA) • Participants read a text (Calimera "Multimedia") • Participants assign keywords to that text (ideally not more than 15) • The KWE produces keywords for the same text

  11. KWE Evaluation – step 2 • Agreement is measured between human annotators • Agreement is measured between the KWE and the human annotators We have tested two measures / approaches • kappa according to Bruce / Wiebe • AC1, an alternative agreement coefficient suggested by Debra Haley at the OU, based on Gwet
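
  For reference, a small sketch of two agreement coefficients for two annotators who each decide, per candidate term, whether it is a keyword (1) or not (0). Cohen's kappa is used here as a stand-in for the Bruce / Wiebe variant mentioned above, and AC1 follows Gwet's two-rater formulation; the toy annotations are invented.

    def kappa_and_ac1(a, b):
        n = len(a)
        p_o = sum(1 for x, y in zip(a, b) if x == y) / n   # observed agreement
        p1 = sum(a) / n                                    # rater A "yes" rate
        p2 = sum(b) / n                                    # rater B "yes" rate

        p_e_kappa = p1 * p2 + (1 - p1) * (1 - p2)          # chance agreement (kappa)
        kappa = (p_o - p_e_kappa) / (1 - p_e_kappa)

        pi = (p1 + p2) / 2
        p_e_ac1 = 2 * pi * (1 - pi)                        # chance agreement (AC1)
        ac1 = (p_o - p_e_ac1) / (1 - p_e_ac1)
        return kappa, ac1

    a = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]
    b = [1, 0, 0, 0, 1, 0, 1, 0, 1, 0]
    print(kappa_and_ac1(a, b))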

  12. KWE Evaluation – step 3 • Humans judge the adequacy of keywords • Participants read a text (Calimera "Multimedia") • Participants see 20 keywords generated by the KWE and rate them • Scale 1 – 4 (excellent – not acceptable) • 5 = not sure
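
  One way the ratings could be summarized, assuming that the "5 = not sure" answers are excluded and that ratings of 1 or 2 count as acceptable; both choices are assumptions for illustration.

    from statistics import mean

    def summarize_ratings(ratings):
        valid = [r for r in ratings if r in (1, 2, 3, 4)]
        acceptable = sum(1 for r in valid if r <= 2)       # 1 or 2 counted as acceptable
        return {
            "mean_rating": mean(valid),
            "acceptance_rate": acceptable / len(valid),
            "not_sure": ratings.count(5),
        }

    print(summarize_ratings([1, 2, 4, 1, 5, 3, 2, 4, 1, 5]))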

  13. GCD Evaluation – step 1 • A human annotator marked definitions in document d • The GCD extracts defining contexts from the same document d • Measure the overlap between both sets • Overlap is measured on the sentence level; partial overlap counts
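
  A sketch of the sentence-level comparison, assuming that both the human annotation and the GCD output are represented as sets of sentence indices and that sharing at least one sentence counts as a (partial) match.

    def gcd_overlap(human_defs, system_defs):
        # human_defs / system_defs: lists of sets of sentence indices
        hits = sum(
            1 for sys in system_defs
            if any(sys & gold for gold in human_defs)
        )
        recalled = sum(
            1 for gold in human_defs
            if any(gold & sys for sys in system_defs)
        )
        precision = hits / len(system_defs) if system_defs else 0.0
        recall = recalled / len(human_defs) if human_defs else 0.0
        return precision, recall

    human = [{3, 4}, {10}, {21, 22}]
    system = [{4}, {10, 11}, {15}]
    print(gcd_overlap(human, system))   # (0.666..., 0.666...)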

  14. GCD Evaluation – step 2 • Measure Inter-Annotator Agreement • Experiments run for Polish and Dutch • Prevalence-adjusted version of kappa used as a measure • Polish: 0.42; Dutch: 0.44 • IAA rather low for this task
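
  Assuming the prevalence-adjusted measure meant here is the usual PABAK correction for binary labels, it reduces to 2 * p_o - 1, with p_o the observed agreement:

    def pabak(a, b):
        p_o = sum(1 for x, y in zip(a, b) if x == y) / len(a)
        return 2 * p_o - 1

    # Roughly 71% raw agreement on sentence labels gives a PABAK of about
    # 0.43, the order of magnitude reported above for Polish and Dutch.
    print(pabak([1, 0, 0, 1, 0, 0, 1], [1, 0, 1, 0, 0, 0, 1]))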

  15. GCD Evaluation – step 3 • Judging quality of extracted definitions • Participants read text • Participants get definitions extracted by GCD for that text and rate quality • Scale 1 – 4 (excellent – not acceptable) • 5 = not sure

  16. GCD Evaluation – step 3 Further findings • Relatively high variance (many ratings of '1' and '4') • Disagreement between users about the quality of individual definitions

  17. Individual user feedback – KWE • The quality of the generated keywords remains an issue • Variance in the responses from different language groups • We suspect a correlation between the users' language and their satisfaction • The performance of the KWE relies on language-specific settings, which we have to investigate further

  18. Individual user feedback – GCD • Not all the suggested definitions are real definitions. • Terms are ok, but definitions cited are often not what would be expected. • Some terms proposed in the glossary did not make any sense. • The ability to see the context where a definition has been found is useful.

  19. Consequences - KWE • Use non-distributional information to rank keywords (layout, chains) • Present first 10 keywords to user, more keywords on demand • For keyphrases, present most frequent attested form • Users can add their own keywords
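
  A sketch of how such non-distributional evidence could be folded into the ranking; the features (heading occurrence, lexical-chain length) and weights are illustrative assumptions, not the KWE's actual scoring.

    def rank_keywords(candidates, top_n=10):
        # candidates: dicts with 'term', 'tfidf', 'in_heading', 'chain_length'
        def score(c):
            return (c["tfidf"]
                    + 0.5 * (1 if c["in_heading"] else 0)
                    + 0.1 * c["chain_length"])
        ranked = sorted(candidates, key=score, reverse=True)
        return [c["term"] for c in ranked[:top_n]]   # show first 10, more on demand

    candidates = [
        {"term": "metadata", "tfidf": 0.31, "in_heading": True, "chain_length": 4},
        {"term": "glossary", "tfidf": 0.28, "in_heading": False, "chain_length": 6},
        {"term": "server", "tfidf": 0.35, "in_heading": False, "chain_length": 1},
    ]
    print(rank_keywords(candidates))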

  20. Consequences - GCD • Split definitions into types and tackle the most important types • Use machine learning alongside local grammars • Look into the part of the grammars which extracts the defined term • Users can add their own definitions
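
  A minimal sketch of the "machine learning alongside local grammars" idea, using a simple sentence classifier that could complement or filter the grammar output (scikit-learn assumed); the toy data, features and model choice are placeholders, not the project's actual setup.

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    sentences = [
        "Metadata is data that describes other data.",
        "A glossary is an alphabetical list of terms with definitions.",
        "Click the download button to continue.",
        "The meeting takes place in January.",
    ]
    labels = [1, 1, 0, 0]   # 1 = defining context, 0 = not

    clf = make_pipeline(CountVectorizer(ngram_range=(1, 2)), MultinomialNB())
    clf.fit(sentences, labels)

    print(clf.predict(["An ontology is a formal specification of a conceptualization."]))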

  21. Plans for final phase • KWE, work with lexical chains • GCD, extend ML experiments • Finalize documentation of the tools

  22. Validation User scenarios with NLP tools embedded: • Content provider adds keywords and a glossary for a new learning object • Student uses keywords and definitions extracted from a learning object to prepare a presentation of the content of that learning object

  23. Validation • Students use keywords and definitions extracted from a learning object to prepare a quiz / exam about the content of that learning object

  24. Validation We want to get feedback about • The users' general attitude towards the tools • The users' satisfaction with the results obtained by the tools in the particular situation of use (scenario)

  25. User feedback • Participants appreciate the option to add their own data • Participants found it easy to use the functions

  26. Plans for the next phase Improve precision of the extraction results: • KWE – implement the lexical chainer • GCD – use machine learning in combination with local grammars or as a substitute for these grammars • Finalize documentation of the tools

  27. Corpus statistics – full corpus • Measuring lengths of the corpora (# of documents, tokens) • Measuring the token / type ratio • Measuring the type / lemma ratio

  28. Corpus statistics – full corpus • The Bulgarian, German and Polish corpora have a very low number of tokens per type (probably problems with sparseness) • English has by far the highest ratio • Czech, Dutch, Portuguese and Romanian are in between • The type / lemma ratio reflects the richness of the inflectional paradigms
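
  A small sketch of how these ratios could be computed from a tokenized, lemmatized corpus; the input format (a list of token / lemma pairs) is an assumption for illustration.

    def corpus_ratios(tagged_tokens):
        tokens = [t.lower() for t, _ in tagged_tokens]
        lemmas = {l.lower() for _, l in tagged_tokens}
        types = set(tokens)
        return {
            "n_tokens": len(tokens),
            "tokens_per_type": len(tokens) / len(types),
            "types_per_lemma": len(types) / len(lemmas),
        }

    sample = [("Learning", "learning"), ("objects", "object"),
              ("object", "object"), ("learning", "learning")]
    print(corpus_ratios(sample))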

  29. To do • Please check / verify these numbers • Report, for the M24 deliverable, about improvements / re-analysis of the corpora (I am aware of such activities for Bulgarian, German, and English)

  30. Corpus statistics – annotated subcorpus • Measuring lengths of the annotated documents • Measuring the distribution of manually marked keywords over documents • Measuring the share of keyphrases
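
  A sketch of the subcorpus statistics, assuming the manual annotation is available as a mapping from documents to their marked keywords; a keyphrase is taken here to be any keyword of more than one word.

    from statistics import mean

    def keyword_stats(annotations):
        counts = [len(kws) for kws in annotations.values()]
        all_kws = [k for kws in annotations.values() for k in kws]
        phrases = sum(1 for k in all_kws if len(k.split()) > 1)
        return {
            "keywords_per_doc": mean(counts),
            "keyphrase_share": phrases / len(all_kws),
        }

    print(keyword_stats({
        "doc1": ["metadata", "keyword extraction", "glossary"],
        "doc2": ["learning object", "eLearning"],
    }))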

  31. Keyphrases
