Exploring the keyword space in large learning resource aggregations: the case of GLOBE

Miguel‐Angel Sicilia, Salvador Sánchez‐Alonso, Elena Garcia‐Barriocanal, Julià Minguillón, Enayat Rajabi. LACRO'13 workshop, April 2013, Leuven, Belgium.

Presentation Transcript


  1. Exploring the keyword space in large learning resource aggregations: the case of GLOBE. Miguel‐Angel Sicilia, Salvador Sánchez‐Alonso, Elena Garcia‐Barriocanal, Julià Minguillón, Enayat Rajabi. LACRO'13 workshop, April 2013, Leuven, Belgium

  2. Agenda • Introduction • GLOBE materials • Keywords and classifications • Interlinking to Linked Open Data • Discussion and conclusion

  3. Introduction • A huge number of e-learning resources are available online, either for free or by subscription • Several initiatives aim at federating e-learning systems to unlock the educational content hidden in their repositories (e.g. GLOBE) • The use of the IEEE LOM standard together with OAI‐PMH has facilitated the deployment of such collections

  4. Background • How well different metadata elements describe and categorize the resource space • IEEE LOM proposes around 50 different elements, including keyword and classification: • keywords are intended to describe topics in any existing language • classification refers to classifying the LOs (learning objects) • Some experimental studies exist on the actual use of IEEE LOM (e.g. Friesen (2004), Ochoa et al. (2011))
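To make the two LOM elements concrete, the sketch below pulls keywords and classifications out of a single LOM record using only Python's standard library. It assumes records use the IEEE LOM XML binding namespace (`http://ltsc.ieee.org/xsd/LOM`) with keywords under `general/keyword/string` and taxon entries under `classification/taxonPath/taxon/entry/string`; the sample record and the function name are hypothetical, and this is not presented as the authors' extraction code.

```python
import xml.etree.ElementTree as ET

# Assumed namespace of the IEEE LOM XML binding used by the harvested records.
LOM_NS = {"lom": "http://ltsc.ieee.org/xsd/LOM"}

def extract_keywords_and_classifications(lom_xml: str):
    """Return (keywords, classifications) for one LOM record (hypothetical helper).

    keywords: list of (language, text) pairs from general/keyword.
    classifications: list of (purpose, [taxon entry strings]) pairs.
    """
    root = ET.fromstring(lom_xml)
    keywords = [
        (s.get("language"), (s.text or "").strip())  # language is None if absent
        for s in root.findall("lom:general/lom:keyword/lom:string", LOM_NS)
    ]
    classifications = []
    for c in root.findall("lom:classification", LOM_NS):
        purpose = c.findtext("lom:purpose/lom:value", default="", namespaces=LOM_NS).strip()
        entries = [
            (e.text or "").strip()
            for e in c.findall("lom:taxonPath/lom:taxon/lom:entry/lom:string", LOM_NS)
        ]
        classifications.append((purpose, entries))
    return keywords, classifications

# Illustrative record: one human keyword, one machine-translated keyword, one classification.
SAMPLE = """<lom xmlns="http://ltsc.ieee.org/xsd/LOM">
  <general>
    <keyword><string language="en">algebra</string></keyword>
    <keyword><string language="x-mt-es">álgebra</string></keyword>
  </general>
  <classification>
    <purpose><source>LOMv1.0</source><value>discipline</value></purpose>
    <taxonPath>
      <source><string language="en">example taxonomy</string></source>
      <taxon><id>1</id><entry><string language="en">Mathematics</string></entry></taxon>
    </taxonPath>
  </classification>
</lom>"""

kw, cl = extract_keywords_and_classifications(SAMPLE)
print(cl)  # [('discipline', ['Mathematics'])]
```

Keeping the language code next to each keyword is what later allows machine-translated keywords to be separated from human-created ones.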

  5. Background – IEEE LOM standard

  6. GLOBE Materials • GLOBE (Global Learning Objects Brokered Exchange) enables the sharing and reuse of learning objects across several learning object repositories • We harvested GLOBE through OAI-PMH and obtained around 770,000 metadata records • The most frequent language is English (as also pointed out by Ochoa (2011)), while a large number of resources have no declared language
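A harvest like the one described above typically issues an OAI-PMH `ListRecords` request and follows `resumptionToken`s until the list is exhausted. The following is a minimal sketch of that loop, assuming a standard OAI-PMH 2.0 endpoint; the base URL and the `oai_lom` metadata prefix in the usage line are hypothetical, since the slides name neither GLOBE's endpoint nor its prefix.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from typing import Callable, Iterator

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

def http_get(url: str) -> bytes:
    with urllib.request.urlopen(url) as resp:
        return resp.read()

def harvest(base_url: str, metadata_prefix: str,
            fetch: Callable[[str], bytes] = http_get) -> Iterator[ET.Element]:
    """Yield every <record> of an OAI-PMH ListRecords harvest,
    following resumption tokens until the list is complete."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(fetch(base_url + "?" + urllib.parse.urlencode(params)))
        yield from root.findall("oai:ListRecords/oai:record", OAI_NS)
        token = root.find("oai:ListRecords/oai:resumptionToken", OAI_NS)
        # A missing or empty resumptionToken marks the last page (or an error response).
        if token is None or not (token.text or "").strip():
            return
        # Follow-up requests carry only the verb and the exclusive resumptionToken.
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}

# Hypothetical endpoint and prefix, for usage only:
# n = sum(1 for _ in harvest("https://oai.example.org/oai", "oai_lom"))
```

The `fetch` parameter is a design choice for testability: the paging logic can be exercised without network access.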

  7. Language of resource in GLOBE

  8. GLOBE materials: Keyword • There are around 5.5 million keywords in the sample (~7 keywords per resource) • A large number of keywords were generated via machine translation (marked by language codes starting with “x-mt-”) • Around 3.2 million keywords seem to have been created by humans (~4 keywords per resource) • Frequencies remain high even for relatively large numbers of keywords per resource (beyond 15), which might be attributed to automated extraction
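The per-resource figures follow from simple division: 5.5 million / ~770,000 ≈ 7.1 and 3.2 million / ~770,000 ≈ 4.2. The sketch below shows one way such statistics could be computed, treating a keyword as machine-translated when its language code starts with `x-mt-` (the prefix named on the slide). The input shape, the function name, and the choice to count keywords with no language as human-created are assumptions for illustration.

```python
from collections import Counter

# Language codes starting with this prefix mark machine-translated keywords
# (prefix taken from the slide; everything else here is hypothetical).
MT_PREFIX = "x-mt-"

def keyword_stats(records):
    """records: one list per resource, each keyword a (language, text) pair."""
    total = human = 0
    histogram = Counter()  # keywords per resource -> number of resources
    for keywords in records:
        # A missing language attribute (None) is counted as human-created here.
        human += sum(1 for lang, _ in keywords
                     if not (lang or "").startswith(MT_PREFIX))
        total += len(keywords)
        histogram[len(keywords)] += 1
    n = len(records)
    return {
        "total": total,
        "machine_translated": total - human,
        "human": human,
        "avg_total": total / n if n else 0.0,
        "avg_human": human / n if n else 0.0,
        "histogram": histogram,
    }

records = [
    [("en", "algebra"), ("x-mt-es", "álgebra"), ("x-mt-fr", "algèbre")],
    [("en", "geometry")],
    [],  # resource without keywords
]
stats = keyword_stats(records)
print(stats["total"], stats["machine_translated"], stats["human"])  # 4 2 2
```

The `histogram` entry is the kind of distribution in which an unusually long tail (resources with more than 15 keywords) would become visible.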

  9. GLOBE materials: Keyword

  10. GLOBE materials: Classification • A total of ~700k classifications distributed across ~500k resources were found, with ~1 million taxon entries • About 92% of all the resources have at most two classifications, and only 187 resources have more than 10 • Only 43 different classification purposes were found, with “discipline” accounting for about 60% and “Technical design” for around 18%; the latter comes from a vocabulary specific to the MACE project. Another 11% of the purposes were blank • Keywords and classifications of the same resources were matched against each other (~270k coincidences)
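The two computations on this slide, the share of each classification purpose and the keyword/classification coincidences within a resource, can be sketched as follows. The slides do not say exactly how a coincidence was counted; this sketch assumes it means a distinct string that appears both as a keyword and as a taxon entry of the same resource, compared by exact string equality. Function names and inputs are hypothetical.

```python
from collections import Counter

def purpose_shares(purposes):
    """purposes: one purpose value per classification instance ('' if blank).
    Returns the share of all classifications taken by each purpose."""
    counts = Counter(p.strip() for p in purposes)
    n = sum(counts.values())
    return {p: c / n for p, c in counts.items()}

def coincidences(resources):
    """resources: (keywords, taxon_entries) pairs for the same resource.
    Counts distinct strings appearing both as a keyword and as a taxon entry
    of that resource, using exact (case-sensitive) string equality."""
    return sum(len(set(kws) & set(entries)) for kws, entries in resources)

print(purpose_shares(["discipline", "discipline", "Technical design", ""]))
# {'discipline': 0.5, 'Technical design': 0.25, '': 0.25}
print(coincidences([(["algebra", "Mathematics"], ["Mathematics"]),
                    (["geometry"], ["Geometry"])]))  # 1
```

Note that "geometry" and "Geometry" do not coincide under exact matching, which is the same limitation the conclusions point out for the DBpedia comparison.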

  11. GLOBE materials: Classification

  12. Interlinking to other resources: DBpedia

  13. Interlinking to other resources: DBpedia • In Linked Open Data, RDF links published on the Web express relationships between resources • DBpedia is the central dataset, and most interlinking tools provide automated ways to interlink with it • Keywords and classifications can be approached from the perspective of external data sources • Keywords and classifications could be linked to a large knowledge base such as DBpedia (less than 30%)
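A simple exact-match interlinking step could look like the sketch below: each distinct keyword or taxon entry that equals a known DBpedia label is mapped to a resource URI under `http://dbpedia.org/resource/`. The label set is hypothetical (in practice it would come from a DBpedia dump or endpoint), and the URI construction is a simplification that only replaces spaces with underscores; it does not reproduce DBpedia's full title normalization.

```python
DBPEDIA_RESOURCE = "http://dbpedia.org/resource/"

def to_dbpedia_uri(label: str) -> str:
    # Simplified Wikipedia-style title: spaces become underscores. Real DBpedia
    # IRIs follow further title normalization that is not reproduced here.
    return DBPEDIA_RESOURCE + label.replace(" ", "_")

def interlink(terms, dbpedia_labels):
    """Link each distinct term that equals a known DBpedia label (exact match).
    Returns the term -> URI links and the share of distinct terms linked."""
    unique = set(terms)
    links = {t: to_dbpedia_uri(t) for t in unique if t in dbpedia_labels}
    share = len(links) / len(unique) if unique else 0.0
    return links, share

# Hypothetical label set, for illustration only.
labels = {"Algebra", "Linear algebra"}
links, share = interlink(["Algebra", "Linear algebra", "álgebra", "geometry"], labels)
print(share)  # 0.5
```

The `share` value corresponds to the kind of coverage figure reported on the slide, although the actual GLOBE percentages come from the authors' own analysis, not from this sketch.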

  14. Discussion and conclusion • English dominates the distribution of languages in GLOBE, with only a few other languages represented • A considerable number of keywords were generated via machine translation • English again dominates the linguistic space of classifications • Classifications result in a more concise representation, as the contrast between more than 3 million keywords (excluding machine translation) and about 1 million classification entries makes evident

  15. Discussion and conclusion • The amount of coincidence with DBpedia entries is limited, and there is no significant difference between keywords and classifications, so both appear to have a similar potential for interlinking • It is important to highlight that the coincidences analysed were based on exact string matching, without any consideration of polysemy or lexical variants • GLOBE has to be considered a highly heterogeneous repository in several respects, as described by Ochoa (2011), including the way metadata is created in the repositories, ranging from automatic creation to quality-controlled internal mechanisms
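To illustrate why exact string matching may undercount coincidences, the sketch below compares it with a lightly normalized match (Unicode NFC, case folding, whitespace collapsing). This normalization is an assumption added for illustration and is not the method used in the study; it still ignores polysemy and morphological variants such as singular versus plural forms.

```python
import unicodedata

def normalize(term: str) -> str:
    # NFC normalization, case folding and whitespace collapsing
    return " ".join(unicodedata.normalize("NFC", term).casefold().split())

def match_counts(terms, labels):
    """Return (exact, normalized) match counts of terms against labels."""
    exact = sum(1 for t in terms if t in labels)
    normalized_labels = {normalize(label) for label in labels}
    normalized = sum(1 for t in terms if normalize(t) in normalized_labels)
    return exact, normalized

terms = ["Algebra", "algebra", "Linear  Algebra", "polynomials"]
print(match_counts(terms, {"Algebra", "Linear algebra"}))  # (1, 3)
```

Even this small normalization triples the matches in the toy example, which suggests why the reported coincidence figures should be read as lower bounds of what more lenient matching could achieve.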

  16. Thank you