
Dan Cristea, Corina Forăscu, Dan Tufiş, Ionuţ Pistol, Diana Trandabăţ, Adrian Iftene

LT4eL - WP1: Setting the scene. WP leader: UAIC, Univ. Al. I. Cuza of Iasi, Faculty of Computer Science. Contact: dcristea@info.uaic.ro. Utrecht Review Meeting, February 1, 2007.


Presentation Transcript


  1. LT4eL - WP1: Setting the scene • WP leader: UAIC, Univ. Al. I. Cuza of Iasi, Faculty of Computer Science • Dan Cristea, Corina Forăscu, Dan Tufiş, Ionuţ Pistol, Diana Trandabăţ, Adrian Iftene • Contact: dcristea@info.uaic.ro • Utrecht Review Meeting, February 1, 2007

  2. Objectives • inventory and classification of existing tools necessary for the development of the relevant functionalities (i.e. keyword extractor, glossary candidate detector); • collection and normalization of the learning material related to the use of the computer in education (Humanities, Social Sciences); • investigation of IPR issues; • adoption of relevant standards for linguistic annotation of learning objects; • dissemination of the results through a Web portal

  3. Partners in WP1 • Utrecht University (UU), The Netherlands • University of Hamburg (UHH), Germany • University of Lisbon (FFCUL), Portugal • Charles University Prague (CUP), Czech Republic • Institute for Parallel Processing, Bulgarian Academy of Sciences (IPP-BAS), Bulgaria • University of Tübingen (UTU), Germany • Institute of Computer Science, Polish Academy of Sciences (ICS-PAS), Poland • Zürich University of Applied Sciences Winterthur (ZHW), Switzerland • University of Malta (UOM), Malta

  4. [Architecture diagram. Components shown: LMS; User Profile; Ling. Processor (Lemmatizer, POS, Partial Parser); Ontology; Crosslingual Retrieval; Lexicons for BG, CZ, DT, EN, GE, MT, PL, PT, RO; Convertor 1; Convertor 2; Documents (SCORM, HTML, Pseudo-Struct., Basic XML, Ling. Annot. XML); Metadata (Keywords); Glossary; User documents (PDF, DOC, HTML, SCORM, XML); Repository]

  5. The Portal • A working space: • Repository for resources, tools, deliverables • Exchange of information among participants • Statistics • Hosted by UAIC: • January 2007: 1.15 Gb (without realTimeStat, searchForm, upload/updateForm) • Address: http://consilr.info.uaic.ro/uploads_lt4el • Username: guestLt4eL • Password: elearning • Demo version on CD

  6. O1. Collection of language resources and tools (1) • Inventory and classification of existing tools (http://consilr.info.uaic.ro/uploads_lt4el/tools/all.php?) relevant to: • the integration of language technology resources in eLearning (WP2) • the integration of semantic knowledge (WP3)

  7. O1. Collection of language resources and tools (2) • Inventory and classification of existing language resources • corpora and frequency lists: http://consilr.info.uaic.ro/uploads_lt4el/menu/all.php • lexica: http://www.let.uu.nl/lt4el/wiki/index.php/Lexica_Joint_Table

  8. O2. Collection of LOs: the portal • Uploads, updates & real-time statistics at http://consilr.info.uaic.ro/uploads_lt4el/ • Criteria (→ attributes): • Subdomains relevant for beginners in IST & e-learning → Domain • Multilingualism → Language • Medium-sized documents → Number of words • Clear IPR status → IPR • Uniformity in topics → keywords selected initially
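The criteria-to-attributes mapping on this slide can be pictured as a small metadata record with a check per criterion. The following is only a sketch: the field names, the word-count thresholds and the "clear" IPR value are assumptions made for illustration and are not taken from the portal.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LearningObject:
    """Hypothetical record holding the attributes named on the slide."""
    title: str
    domain: str              # e.g. "1.1.2" from the domain list
    language: str            # e.g. "RO", "EN"
    number_of_words: int
    ipr: str                 # assumed values: "clear", "restricted", "unknown"
    keywords: List[str] = field(default_factory=list)


def check_criteria(lo, min_words=1000, max_words=50000):
    """Return a list of unmet criteria (thresholds are illustrative assumptions)."""
    problems = []
    if not lo.domain:
        problems.append("Domain missing")
    if not lo.language:
        problems.append("Language missing")
    if not (min_words <= lo.number_of_words <= max_words):
        problems.append("Number of words outside the assumed medium-size range")
    if lo.ipr.strip().lower() != "clear":
        problems.append("IPR status not clear")
    if not lo.keywords:
        problems.append("No initial keywords")
    return problems


good = LearningObject("Intro to XML", "1.1.2", "EN", 4200, "clear", ["XML"])
bad = LearningObject("Short note", "1.2", "RO", 200, "unknown", ["e-learning"])
print(check_criteria(good))
print(check_criteria(bad))
```

A list of problems rather than a boolean makes it easier to report why an upload would be rejected.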

  9. Collection of LOs: domains
     1. Use of computers in education, with sub-domains:
        1.1 Teaching academic skills, with sub-domains:
            1.1.1 Academic skills
            1.1.2 Relevant computer skills for the above tasks (MS Word, Excel, PowerPoint, LaTeX, Web pages, XML)
            1.1.3 Basic skills (use of computer for beginners) (chats, e-mail, Internet)
        1.2 e-Learning, e-Marketing
        1.3 The I*Teach document (Leonardo project, http://i-teach.fmi.uni-sofia.bg/)
        1.4 Impact of use of computers in society
        1.5 Studies about use of computers in schools / high schools
        1.6 Impact of e-Learning on education
     2. Calimera documents (parallel corpus developed in the Calimera FP5 project, http://www.calimera.org/)

  10. Collection of LOs: domains coverage

  11. The hierarchy of LOs’ formats

  12. Collection of LOs: annotation layers • Initial documents: doc, pdf, html, txt → Base-XML • Linguistic annotation: tokens, POS, lemma, chunks → WP2 XML format (LT4ELAna.dtd) • Keywords, definitions and ontology links annotations

  13. Level 1 conversions [diagram; formats shown: doc, pdf, latex, other, html, plain text, Base-XML; highlighted step: doc → html]

  14. Level 1 conversions doc → html (UTF-8) • MS Office: Save As html • OpenOffice Writer SXC/ODT: Save As html

  15. Level 1 conversions [diagram; formats shown: doc, pdf, latex, other, html, plain text, Base-XML; highlighted step: pdf → html]

  16. Level 1 conversions: pdf → html (UTF-8) 1. Adobe on-line conversion tool 2. pdfbox (Windows) 3. pdftohtml (Linux) 4. OpenOffice 5. Adobe Acrobat Professional

  17. Level 1 conversions [diagram; formats shown: doc, pdf, latex, other, html, plain text, Base-XML; highlighted step: Base-XML convertor]

  18. Level 1 conversions: html → Base-XML • The UAIC Java converter • keeps all potentially useful tags (a fixed set) • produces a log of all the removed tags/data • The CUP html2xml.pl converter • tags kept according to a DTD
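The behaviour described for the converters (keep a fixed set of tags, log everything removed) can be sketched as follows. This is not the UAIC Java converter or the CUP html2xml.pl script: the kept-tag set, the `<doc>` root element and all names are assumptions for illustration, and well-formed output is only expected when the input HTML is properly nested.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape

# Assumed fixed set of structural tags worth keeping (hypothetical).
KEPT_TAGS = {"h1", "h2", "h3", "p", "ul", "ol", "li", "table", "tr", "td"}
# Elements whose text content is dropped together with the tag.
DROP_CONTENT = {"script", "style"}


class BaseXmlSketch(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []        # fragments of the output document
        self.removed = []    # log of removed tags, in document order
        self.skip_depth = 0  # > 0 while inside script/style

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
            self.removed.append(tag)
        elif tag in KEPT_TAGS:
            self.out.append(f"<{tag}>")   # attributes are dropped in this sketch
        else:
            self.removed.append(tag)      # tag removed, its text is kept

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in KEPT_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.out.append(escape(data))


def html_to_base_xml(html):
    """Return (xml_text, removed_tags) for an HTML string."""
    parser = BaseXmlSketch()
    parser.feed(html)
    parser.close()
    return "<doc>" + "".join(parser.out) + "</doc>", parser.removed


sample = "<html><body><p>Hello <b>world</b> &amp; co</p><script>x=1</script></body></html>"
xml_text, removed = html_to_base_xml(sample)
print(xml_text)
print(removed)
```

Returning the removal log alongside the output mirrors the slide's point that nothing is discarded silently.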

  19. Collection of LOs: second level [diagram; layers shown: morpho, tok, pos, lemma, NP; language-specific tools; tok-pos-lemma; output: WP2 XML format]

  20. Collection of LOs: second level [diagram; layers shown: morpho, tok, pos, lemma, NP; tok-pos-lemma; scripts; output: WP2 XML format]

  21. Collection of LOs: KW extractor [diagram; Level 2: WP2 XML format; KW extractor; Level 3: Man KD XML, Auto KD XML]

  22. Collection of LOs: KW extractor [diagram; Level 2: WP2 XML format; Level 3: Man KD XML, Auto KD XML; KW extractor evaluation]
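The slides do not say how the KW extractor is evaluated. One common way to compare automatic annotations against manual ones is precision, recall and F1 over the two keyword sets; the sketch below assumes that measure and assumes the keywords have already been read out of the Man KD XML and Auto KD XML files. The function name and the sample keywords are hypothetical.

```python
def evaluate_keywords(manual, automatic):
    """Compare automatic keywords against manual ones (case-insensitive).

    Returns (precision, recall, f1). This measure is an assumption,
    not the project's documented evaluation procedure.
    """
    manual = {k.strip().lower() for k in manual}
    automatic = {k.strip().lower() for k in automatic}
    true_positives = len(manual & automatic)
    precision = true_positives / len(automatic) if automatic else 0.0
    recall = true_positives / len(manual) if manual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


manual_kws = ["e-learning", "XML", "metadata", "ontology"]
auto_kws = ["xml", "Metadata", "portal"]
precision, recall, f1 = evaluate_keywords(manual_kws, auto_kws)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

The same comparison would apply to definitions (adef against dmxml) if each definition is reduced to a comparable key such as a sentence identifier.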

  23. Collection of LOs: third level [diagram; def extractor; Man KD XML (incl. km.xml, dm.xml); Auto KD XML (incl. akw, adef)] • akw: automatically annotated keywords • adef: automatically annotated definitions • kmxml: manually annotated keywords • dmxml: manually annotated definitions

  24. Collection of LOs: third level [diagram; def extractor; Man KD XML (incl. km.xml, dm.xml); Auto KD XML (incl. akw, adef); def extractor evaluation] • akw: automatically annotated keywords • adef: automatically annotated definitions • kmxml: manually annotated keywords • dmxml: manually annotated definitions

  25. Open issues • Convertors • Tables, figures, page look… • IPRs • Clarify the IPR status • authors & EU + national legislation • Define IPR categories for LOs: • usage (free, restricted, for research...)

  26. WP1 over time [timeline; markers shown: Beginning of project, December 05, February 06, May 06, D1.1, Evaluation, Official end of WP1, Now] • Initial collection on Portal • Structure & functionalities to the portal • BaseXML convertors • new LOs • Levels 2&3 additions • new tools • grammars • guides, docs • ontology, TermLex

  27. Proposal: the hierarchy seen as a processing environment [diagram; nodes shown: doc, pdf, latex, html, other, txt, sxml, tok, morpho, pos, lemma, NP, tpl, wp2xml, akw, adef, axml; grouped into Level 1, Level 2, Level 3]
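Seeing the hierarchy as a processing environment amounts to treating formats as nodes and converters as edges, so that the steps needed to reach a target format can be planned automatically. The sketch below uses only the conversions named on the earlier slides (doc → html, pdf → html, html → Base-XML, Base-XML → WP2 XML, WP2 XML → Auto KD XML); the node identifiers and the `plan` function are hypothetical, and the real diagram contains more nodes than are modelled here.

```python
from collections import deque

# Conversion edges taken from the Level 1-3 slides; identifiers are illustrative.
CONVERSIONS = {
    "doc": ["html"],
    "pdf": ["html"],
    "html": ["base-xml"],          # Level 1
    "base-xml": ["wp2-xml"],       # Level 2: tok, pos, lemma, chunks
    "wp2-xml": ["auto-kd-xml"],    # Level 3: keywords and definitions
}


def plan(source, target):
    """Breadth-first search for the shortest chain of formats, or None."""
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in CONVERSIONS.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


print(plan("pdf", "auto-kd-xml"))
print(plan("base-xml", "doc"))
```

Adding a new converter then only means adding an edge; existing plans stay valid.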

  28. Conclusions • LOs, resources and tools collected • Initially: portal seen as a repository • Now: portal potentially integrated with the LMS as a processing environment
