Exploring the keyword space in large learning resource aggregations: the case of GLOBE. Miguel-Angel Sicilia, Salvador Sánchez-Alonso, Elena Garcia-Barriocanal, Julià Minguillón, Enayat Rajabi. LACRO'13 workshop, April 2013, Leuven, Belgium.
Agenda • Introduction • GLOBE materials • Keywords and classifications • Interlinking to Linked Open Data • Discussion and conclusion
Introduction • A huge number of e-learning resources are available online, for free or by subscription • Several initiatives aim at federating e-learning systems to unlock the educational content hidden in their repositories (e.g. GLOBE) • The combined use of the IEEE LOM metadata standard and the OAI-PMH harvesting protocol has facilitated the deployment of such collections
Background • How well do different metadata elements describe and categorize the resource space? • IEEE LOM proposes around 50 different elements, including keyword and classification: • keywords are intended to describe the topics of a resource in any language • classification places the LOs (learning objects) within a classification system • Some experimental studies exist on the actual use of IEEE LOM (e.g. Friesen (2004), Ochoa et al. (2011))
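To make the two LOM elements concrete, the following minimal Python sketch pulls keywords and classifications out of a single LOM record. It assumes the IEEE LOM XML binding and its namespace `http://ltsc.ieee.org/xsd/LOM`; the function names `lom_keywords` and `lom_classifications` are hypothetical helpers, not part of any tool used in the study.

```python
import xml.etree.ElementTree as ET

# Namespace of the IEEE LOM XML binding (assumed; adjust to the records at hand).
LOM = {"lom": "http://ltsc.ieee.org/xsd/LOM"}


def lom_keywords(lom_root):
    """Return (language, text) pairs from the general/keyword langstrings."""
    return [
        (s.get("language"), (s.text or "").strip())
        for s in lom_root.findall("lom:general/lom:keyword/lom:string", LOM)
    ]


def lom_classifications(lom_root):
    """Return one (purpose, [taxon entry texts]) pair per classification element."""
    result = []
    for c in lom_root.findall("lom:classification", LOM):
        purpose = c.findtext("lom:purpose/lom:value", default="", namespaces=LOM).strip()
        entries = [
            (s.text or "").strip()
            for s in c.findall("lom:taxonPath/lom:taxon/lom:entry/lom:string", LOM)
        ]
        result.append((purpose, entries))
    return result
```

Keywords are free langstrings, whereas a classification couples a purpose (e.g. "discipline") with taxon entries drawn from a taxonomy, which is why the two are counted separately in the analysis.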
GLOBE materials • GLOBE (Global Learning Objects Brokered Exchange) enables sharing and reuse of learning objects across several learning object repositories • We harvested GLOBE through OAI-PMH and obtained around 770,000 metadata records • The most frequent language is English (as also pointed out by Ochoa et al. (2011)), while a large number of resources have no language declared
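A harvest like the one described above follows the OAI-PMH `ListRecords` verb page by page until no resumption token is returned. The following is a minimal Python sketch assuming a standard OAI-PMH 2.0 endpoint; the base URL and the metadata prefix `oai_lom` are hypothetical placeholders, not the configuration actually used for GLOBE.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# OAI-PMH 2.0 XML namespace, as defined by the protocol.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def parse_list_records(xml_text):
    """Return (records, resumption_token) for one ListRecords response page.

    A missing or empty resumptionToken means the list is complete (token None).
    """
    root = ET.fromstring(xml_text)
    records = root.findall(".//oai:ListRecords/oai:record", OAI_NS)
    token_el = root.find(".//oai:ListRecords/oai:resumptionToken", OAI_NS)
    token = None
    if token_el is not None and token_el.text and token_el.text.strip():
        token = token_el.text.strip()
    return records, token


def harvest(base_url, metadata_prefix="oai_lom"):
    """Yield every <record> element, following resumption tokens.

    base_url and metadata_prefix are hypothetical placeholders.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        url = base_url + "?" + urllib.parse.urlencode(params)
        with urllib.request.urlopen(url) as resp:
            records, token = parse_list_records(resp.read())
        yield from records
        if token is None:
            break
        # Follow-up requests carry only the verb and the resumption token.
        params = {"verb": "ListRecords", "resumptionToken": token}
```

Keeping the page parser separate from the network loop makes it easy to test offline and to plug in retries or rate limiting without touching the parsing logic.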
GLOBE materials: Keywords • There are around 5.5 million keywords in the sample (~7 keywords per resource) • A large number of keywords were generated via machine translation (identified by language codes starting with “x-mt-”) • Around 3.2 million keywords appear to have been generated by humans (~4 keywords per resource) • Frequencies remain high even for relatively large numbers of keywords per resource (beyond 15), which might be attributed to automated extraction
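The split between machine-translated and other keywords can be computed from the language code of each keyword string. The following minimal Python sketch assumes records are already reduced to a hypothetical `{record_id: [(language, keyword), ...]}` mapping; the "x-mt-" prefix comes from the slide above, while the example language code and the `keyword_stats` helper are illustrative.

```python
from collections import Counter

MT_PREFIX = "x-mt-"  # language-code prefix marking machine-translated strings


def keyword_stats(records):
    """Summarise keywords for records given as {id: [(lang, text), ...]}.

    Returns the total, the machine-translated vs. non-MT split, the average
    number of non-MT keywords per record, and how many records have more
    than 15 non-MT keywords.
    """
    total = mt = 0
    per_record = Counter()  # non-MT keywords counted per record
    for rec_id, keywords in records.items():
        for lang, _text in keywords:
            total += 1
            if (lang or "").lower().startswith(MT_PREFIX):
                mt += 1
            else:
                per_record[rec_id] += 1
    non_mt = total - mt
    n = len(records)
    return {
        "total": total,
        "machine_translated": mt,
        "non_mt": non_mt,
        "avg_non_mt_per_record": non_mt / n if n else 0.0,
        "records_over_15": sum(1 for c in per_record.values() if c > 15),
    }
```

Dividing the totals by the ~770,000 harvested records gives the per-resource figures quoted above: roughly 5.5M / 770k ≈ 7 overall and 3.2M / 770k ≈ 4 once machine translation is excluded.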
GLOBE materials: Classification • A total of ~700k classifications, distributed across ~500k resources and containing ~1 million taxon entries, were found • About 92% of all resources have at most two classifications, and only 187 resources have more than 10 • Only 43 different classification purposes were found: “discipline” accounts for about 60% and “Technical design” for around 18%; the latter comes from a vocabulary specific to the MACE project. Another 11% of the purposes were blank • Keywords and classifications of the same resources were matched against each other (~270k coincidences)
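The purpose distribution and the keyword/classification coincidences can both be computed with simple counting. The following is a minimal Python sketch under stated assumptions: classifications are hypothetical dicts with a `purpose` key, and a coincidence is taken to be a distinct keyword string that exactly equals a taxon entry of the same resource. The exact counting unit behind the ~270k figure is not stated on the slide.

```python
from collections import Counter


def purpose_shares(classifications):
    """Fraction of classifications per purpose; blank purposes are grouped as ''."""
    counts = Counter((c.get("purpose") or "").strip() for c in classifications)
    total = sum(counts.values())
    return {p: n / total for p, n in counts.items()} if total else {}


def count_coincidences(keywords, taxa):
    """Count distinct keywords that also appear as a taxon entry of the same resource.

    keywords and taxa map resource id -> list of strings. Matching is plain
    string equality, so 'Physics' and 'physics' do NOT coincide.
    """
    hits = 0
    for rec_id, kws in keywords.items():
        entries = set(taxa.get(rec_id, ()))
        hits += sum(1 for k in set(kws) if k in entries)
    return hits
```

Restricting the comparison to the same resource keeps the check cheap (one set lookup per keyword) and measures redundancy between the two elements rather than global vocabulary overlap.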
Interlinking to other resources: DBpedia • In Linked Open Data, RDF links exposed through the Web express relationships between resources • DBpedia is the central dataset, and most interlinking tools provide automated ways to interlink with it • Keywords and classifications can also be approached from the perspective of external data sources • Keywords and classifications could be linked to a large knowledge base such as DBpedia (less than 30% matched)
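One way to test whether a keyword or taxon entry has an exact counterpart in DBpedia is a SPARQL `ASK` query over `rdfs:label`. The following minimal Python sketch assumes the public endpoint `https://dbpedia.org/sparql` and standard SPARQL protocol content negotiation. It is an illustrative technique, not necessarily how the authors performed their matching, and the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

DBPEDIA_SPARQL = "https://dbpedia.org/sparql"  # public endpoint (assumed available)


def sparql_literal(text, lang="en"):
    """Serialise text as a SPARQL string literal with a language tag."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"@{lang}'


def label_ask_query(label, lang="en"):
    """Build an ASK query: does any resource carry exactly this rdfs:label?"""
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        f"ASK {{ ?s rdfs:label {sparql_literal(label, lang)} }}"
    )


def has_dbpedia_match(label, lang="en", endpoint=DBPEDIA_SPARQL):
    """Send the ASK query over HTTP GET and return the boolean result."""
    query = urllib.parse.urlencode({"query": label_ask_query(label, lang)})
    req = urllib.request.Request(
        endpoint + "?" + query,
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["boolean"]
```

Because the literal carries a language tag, a keyword only matches labels in the same language, which matters for a collection where English dominates but other languages are present.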
Discussion and conclusion • English dominates the distribution of languages in GLOBE, with only a few other languages represented • There is a considerable number of keywords generated via machine translation • English again dominates the linguistic space of classifications • Classifications result in a more concise representation, as is evident from the contrast between the more than 3 million keywords (excluding machine translation) and the 1 million classification entries
Discussion and conclusion • The amount of coincidence with lexical variants in DBpedia entries is limited, and there is no significant difference between keywords and classifications, so both appear to have similar potential for interlinking • It is important to highlight that the coincidences analysed were based on exact string matching, without any consideration of polysemy or lexical variants • It should be noted that GLOBE must be considered a highly heterogeneous repository in several respects, as described by Ochoa et al. (2011), including the way metadata are created in the repositories, ranging from automatic creation to quality-controlled internal mechanisms
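To illustrate why exact string matching undercounts coincidences, the following minimal Python sketch compares exact matching with a crude normalisation (Unicode NFKC, case-folding, whitespace collapsing). This normalisation is a hypothetical illustration and was not part of the reported analysis. It handles neither inflected forms nor polysemy, which would need stemming or sense disambiguation.

```python
import unicodedata


def normalise(term):
    """Crude lexical normalisation: NFKC, case-folding, collapsed whitespace."""
    return " ".join(unicodedata.normalize("NFKC", term).casefold().split())


def match_counts(terms, vocabulary):
    """Return (exact, normalised) match counts of terms against a vocabulary."""
    vocab = set(vocabulary)
    vocab_norm = {normalise(v) for v in vocab}
    exact = sum(1 for t in terms if t in vocab)
    loose = sum(1 for t in terms if normalise(t) in vocab_norm)
    return exact, loose
```

For example, "algebra" and "Machine  learning" fail an exact comparison against "Algebra" and "Machine learning" but match after normalisation, so the gap between the two counts gives a rough sense of how many coincidences exact matching misses.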