LT4eL - WP1: Setting the scene
WP leader: UAIC, Univ. Al. I. Cuza of Iasi, Faculty of Computer Science
Dan Cristea, Corina Forăscu, Dan Tufiş, Ionuţ Pistol, Diana Trandabăţ, Adrian Iftene
Contact: dcristea@info.uaic.ro
Utrecht Review Meeting, February 1, 2007
Objectives
• inventory and classification of existing tools needed to develop the relevant functionalities (i.e. keyword extractor, glossary candidate detector);
• collection and normalization of learning material on the use of computers in education (Humanities, Social Sciences);
• investigation of IPR issues;
• adoption of relevant standards for the linguistic annotation of learning objects;
• dissemination of the results through a Web portal.
Partners in WP1
• Utrecht University (UU), The Netherlands
• University of Hamburg (UHH), Germany
• University of Lisbon (FFCUL), Portugal
• Charles University Prague (CUP), Czech Republic
• Institute for Parallel Processing, Bulgarian Academy of Sciences (IPP-BAS), Bulgaria
• University of Tübingen (UTU), Germany
• Institute of Computer Science, Polish Academy of Sciences (ICS-PAS), Poland
• Zürich University of Applied Sciences Winterthur (ZHW), Switzerland
• University of Malta (UOM), Malta
[Architecture diagram. Labels shown: User; Documents (PDF, DOC, HTML, SCORM, XML); CONVERTOR 1; CONVERTOR 2; Basic XML; Pseudo-Struct.; Ling. Annot XML; Metadata (Keywords); Glossary; Documents SCORM; Documents HTML; REPOSITORY; LING. PROCESSOR (Lemmatizer, POS, Partial Parser) and Lexicon, each per language (BG, CZ, DT, EN, GE, MT, PL, PT, RO); Ontology; CROSSLINGUAL RETRIEVAL; LMS; User Profile.]
The Portal
• A working space:
  • Repository for resources, tools, deliverables
  • Exchange of information among participants
  • Statistics
• Hosted by UAIC:
  • January 2007: 1.15 GB (without realTimeStat, searchForm, upload/updateForm)
  • Address: http://consilr.info.uaic.ro/uploads_lt4el
  • Username: guestLt4eL
  • Password: elearning
Demo version on CD
O1. Collection of language resources and tools (1)
• Inventory and classification of existing tools (http://consilr.info.uaic.ro/uploads_lt4el/tools/all.php?) relevant to:
  • the integration of language technology resources in eLearning (WP2)
  • the integration of semantic knowledge (WP3)
O1. Collection of language resources and tools (2)
• Inventory and classification of existing language resources:
  • corpora and frequency lists: http://consilr.info.uaic.ro/uploads_lt4el/menu/all.php
  • lexica: http://www.let.uu.nl/lt4el/wiki/index.php/Lexica_Joint_Table
O2. Collection of LOs: the portal
Uploads, updates & real-time statistics at http://consilr.info.uaic.ro/uploads_lt4el/
Criteria (→ attributes):
• Subdomains relevant for beginners in IST & e-learning → Domain
• Multilingualism → Language
• Medium-sized documents → Number of words
• Clear IPR status → IPR
• Uniformity in topics → keywords selected initially
Collection of LOs: domains
1. Use of computers in education, with sub-domains:
  1.1 Teaching academic skills, with sub-domains:
    1.1.1 Academic skills
    1.1.2 Relevant computer skills for the above tasks (MS Word, Excel, PowerPoint, LaTeX, Web pages, XML)
    1.1.3 Basic skills (use of computers for beginners) (chats, e-mail, Internet)
  1.2 e-Learning, e-Marketing
  1.3 The I*Teach document (Leonardo project, http://i-teach.fmi.uni-sofia.bg/)
  1.4 Impact of the use of computers in society
  1.5 Studies about the use of computers in schools / high schools
  1.6 Impact of e-Learning on education
2. Calimera documents (parallel corpus developed in the Calimera FP5 project, http://www.calimera.org/)
Collection of LOs: annotation layers
• Initial documents (doc, pdf, html, txt) → Base-XML
• Linguistic annotation (tokens, POS, lemma, chunks) → WP2 XML format (LT4ELAna.dtd)
• Keyword, definition and ontology-link annotations
Level 1 conversions
[Diagram labels: source formats (doc, pdf, latex, other); html; plain text; Base-XML. Highlighted step: doc → html.]
Level 1 conversions: doc → html (UTF-8)
• MS Office: Save As html
• OpenOffice Writer (SXW/ODT): Save As html
Level 1 conversions
[Diagram labels: source formats (doc, pdf, latex, other); html; plain text; Base-XML. Highlighted step: pdf → html.]
Level 1 conversions: pdf → html (UTF-8)
1. Adobe on-line conversion tool
2. pdfbox (Windows)
3. pdftohtml (Linux)
4. OpenOffice
5. Adobe Acrobat Professional
Level 1 conversions
[Diagram labels: source formats (doc, pdf, latex, other); html; plain text; Base-XML convertor; Base-XML. Highlighted step: the Base-XML convertor.]
Level 1 conversions: html → Base-XML
• The UAIC Java converter
  • keeps a fixed set of tags that may be useful later
  • produces a log of all the removed tags/data
• The CUP html2xml.pl converter
  • keeps tags according to a DTD
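The tag-filtering idea behind the html → Base-XML step can be sketched in a few lines of Python. This is only an illustration, not the UAIC Java converter or the CUP html2xml.pl script: the whitelist `KEPT_TAGS`, the `<doc>` wrapper element and the function name `html_to_base_xml` are assumptions made for the example, and attributes are simply dropped. The output is well-formed only if the kept tags are properly nested in the input.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape

# Hypothetical whitelist: the tag set kept by the real converters
# is not listed in the slides.
KEPT_TAGS = {"h1", "h2", "h3", "p", "ul", "ol", "li", "table", "tr", "td", "b", "i"}


class BaseXmlSketch(HTMLParser):
    """Keeps whitelisted tags, drops the rest, and logs what was dropped."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []        # pieces of the output document
        self.removed = []    # log of removed tag names, in document order
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1
            self.removed.append(tag)
        elif tag in KEPT_TAGS:
            self.out.append(f"<{tag}>")  # attributes are dropped in this sketch
        else:
            self.removed.append(tag)

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in KEPT_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text inside removed tags is kept; script/style content is not.
        if self.skip_depth == 0:
            self.out.append(escape(data))


def html_to_base_xml(html):
    """Return (xml_string, removed_tag_log) for an HTML string."""
    parser = BaseXmlSketch()
    parser.feed(html)
    parser.close()
    return "<doc>" + "".join(parser.out) + "</doc>", parser.removed


xml, removed = html_to_base_xml(
    '<div><p>Hi <font color="red">there</font> &amp; all</p><script>x=1</script></div>'
)
print(xml)
print(removed)
```

Keeping the removal log separate from the output mirrors the slide's point that nothing is discarded silently: a reviewer can inspect the log to decide whether the whitelist needs to be extended.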
Collection of LOs: second level
[Diagram labels: annotation layers (tok, morpho, pos, lemma, NP); language-specific tools; tok-pos-lemma; WP2 XML format.]
Collection of LOs: second level
[Diagram labels: annotation layers (tok, morpho, pos, lemma, NP); tok-pos-lemma; scripts; WP2 XML format.]
Collection of LOs: KW extractor
[Diagram labels: Level 2 (WP2 XML format); KW extractor; Level 3 (Man KD XML, Auto KD XML).]
Collection of LOs: KW extractor
[Diagram labels: Level 2 (WP2 XML format); Level 3 (Man KD XML, Auto KD XML); KW extractor evaluation.]
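The slides do not say how the KW extractor is evaluated. One common way to compare automatically extracted keywords with manually annotated ones is set-based precision, recall and F1, sketched below in Python. The function name `evaluate_keywords`, the case-insensitive exact matching and the sample keyword lists are all assumptions made for the illustration, not the project's procedure or data.

```python
def evaluate_keywords(manual, automatic):
    """Compare automatic keywords against manual ones.

    Returns (precision, recall, f1). Matching is exact after trimming
    and lowercasing, which is an assumption of this sketch.
    """
    gold = {k.strip().lower() for k in manual}
    pred = {k.strip().lower() for k in automatic}
    hits = len(gold & pred)
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


# Made-up keyword lists, for illustration only.
manual_kws = ["e-learning", "ontology", "metadata", "XML"]
auto_kws = ["xml", "ontology", "portal"]
precision, recall, f1 = evaluate_keywords(manual_kws, auto_kws)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```

Exact matching is the strictest choice; a real evaluation might also credit partial matches or lemma-level matches, which would need the Level 2 annotation.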
Collection of LOs: third level
[Diagram labels: def extractor; Man KD XML (incl. km.xml, dm.xml); Auto KD XML (incl. akw, adef).]
Legend:
• akw: automatically annotated kws
• adef: automatically annotated defs
• kmxml: manually annotated kws
• dmxml: manually annotated defs
Collection of LOs: third level
[Diagram labels: def extractor; Man KD XML (incl. km.xml, dm.xml); Auto KD XML (incl. akw, adef); def extractor evaluation.]
Legend:
• akw: automatically annotated kws
• adef: automatically annotated defs
• kmxml: manually annotated kws
• dmxml: manually annotated defs
Open issues
• Convertors
  • Tables, figures, page look…
• IPRs
  • Clarify the IPR status
    • authors & EU + national legislation
  • Define IPR categories for LOs:
    • usage (free, restricted, for research...)
WP1 over time
[Timeline diagram. Milestone labels: Beginning of project; D1.1; Official end of WP1; Evaluation; Now. Dates shown: December 05, February 06, May 06.]
Activities listed:
• Initial collection on Portal
• Structure & functionalities added to the portal
• Base-XML convertors
• new LOs
• Levels 2 & 3 additions
• new tools
• grammars
• guides, docs
• ontology, TermLex
Proposal: the hierarchy seen as a processing environment
[Diagram: formats and annotation layers arranged across Level 1, Level 2 and Level 3. Node labels: txt, doc, pdf, latex, html, other, axml, sxml, tok, morpho, pos, lemma, NP, tpl, wp2xml, akw, adef.]
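Reading the hierarchy as a processing environment suggests treating formats as nodes and converters as edges, so that the chain of steps needed for a given document can be looked up. The Python sketch below illustrates this under stated assumptions: the node names and the edges in `CONVERTERS` are simplified from the conversion steps named on the earlier slides (doc → html, pdf → html, html → Base-XML, Base-XML → WP2 XML, WP2 XML → akw/adef) and are not the exact graph of the diagram; `conversion_path` is a hypothetical helper.

```python
from collections import deque

# Assumed, simplified converter graph: format -> formats reachable in one step.
CONVERTERS = {
    "doc": ["html"],
    "pdf": ["html"],
    "html": ["basexml"],       # Level 1
    "basexml": ["wp2xml"],     # Level 2: linguistic annotation
    "wp2xml": ["akw", "adef"], # Level 3: automatic keywords / definitions
}


def conversion_path(source, target):
    """Breadth-first search for the shortest chain of conversions.

    Returns the list of formats from source to target, or None if the
    target cannot be reached.
    """
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in CONVERTERS.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


print(conversion_path("pdf", "akw"))
```

With such a table, adding a new tool amounts to adding an edge, and the portal (or the LMS) could decide which tools to run for an uploaded document from its format alone.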
Conclusions
• LOs, resources and tools collected
• Initially: portal seen as a repository
• Now: portal potentially integrated with the LMS as a processing environment