Metadata Domain-Knowledge Driven Search Engine in “HyperManyMedia” E-learning Resources University of Louisville, Dept. of Computer Engineering and Computer Science & Western Kentucky University, Office of Distance Learning. Prof. Robert Wyatt
Overview • Motivation & Insight • What is the problem? Why is it an interesting problem? • Proposed Architecture • System Implementation • Evaluation Methodology • Evaluation Measures • Results • Conclusion and Future Work
Motivation “Nearly 30% of all U.S. higher education students were taking at least one online course in the fall of 2007, a nearly 20% increase over the number reported the previous year” http://www.sloan-c.org/publications/survey/
What is the problem? Why is it an Interesting problem? • Western Kentucky University hosts “HyperManyMedia”, an open-source repository of lectures. • Thousands of online lectures are available in different formats: text, PowerPoint, audio, video, podcast, vodcast, and RSS. • This web-based platform is the main medium of communication between WKU online faculty and online students.
What is the problem? Why is it an Interesting problem? • Searching for a specific college, course name, topic, media format is time consuming, and the results are not always accurate. • Searching for combinations of results is impossible (e.g., finding all video lectures in the business college related to accounting).
System Implementation 1) Domain-knowledge Extraction • As of November 2007, more than 2400 lectures from 11 different colleges: “English”, “Social Work”, “History”, “Chemistry”, “Accounting”, “Math”, “Management”, “Consumer and Family Sciences”, “Architect and Manufacturing Sciences”, “Engineering”, and “Communication Disorders” • Each lecture is delivered in six different media formats: • Text • PowerPoint • Audio • Video • Podcast • Vodcast
System Implementation 2) Parsing Learning Objects (Lectures) and Adding Metadata • Parsing the webpages (lectures) • Adding metadata • college name • course name • professor name • lecture name • media format
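The parsing and metadata step above can be sketched as follows. This is a minimal illustration only: it assumes the five metadata fields are exposed as HTML `<meta name="..." content="...">` tags, and the field names, the `parse_lecture` helper, and the sample page are all hypothetical, not taken from the actual platform.

```python
from html.parser import HTMLParser

# Hypothetical field names mirroring the five metadata items listed on the slide.
METADATA_FIELDS = ("college", "course", "professor", "lecture", "media_format")


class LectureMetaParser(HTMLParser):
    """Collects known <meta name=... content=...> pairs and the page text."""

    def __init__(self):
        super().__init__()
        self.metadata = {}
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attr = dict(attrs)
            name = attr.get("name")
            if name in METADATA_FIELDS:
                self.metadata[name] = attr.get("content", "")

    def handle_data(self, data):
        stripped = data.strip()
        if stripped:
            self.text_parts.append(stripped)


def parse_lecture(html):
    """Return a record holding the metadata fields and the plain text of one lecture page."""
    parser = LectureMetaParser()
    parser.feed(html)
    parser.close()
    return {"metadata": parser.metadata, "text": " ".join(parser.text_parts)}


# Made-up sample page, for illustration only.
SAMPLE_PAGE = """
<html><head>
<meta name="college" content="Accounting">
<meta name="course" content="Intro to Accounting">
<meta name="professor" content="J. Doe">
<meta name="lecture" content="Balance Sheets">
<meta name="media_format" content="video">
</head><body><p>Assets equal liabilities plus equity.</p></body></html>
"""

record = parse_lecture(SAMPLE_PAGE)
```

A record of this shape keeps the metadata separate from the lecture text, so each field could later be indexed and weighted on its own.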
System Implementation • Nutch provides search and indexing components together with a powerful fetcher (crawler robot), and is designed to handle crawling, indexing, and searching of several billion frequently updated web pages. • The Nutch search engine was implemented in two stages: • first, as a “Generic” search engine; • second, as an enhanced “Metadata” search engine.
System Implementation • Nutch Scoring is based on a combination of the Vector Space Model (VSM) and the Boolean Model. • It applies the Boolean Model first to select the most relevant documents for the query; then, it uses the Vector Space Model as a content-based ranking algorithm.
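The two-stage idea (Boolean selection, then Vector Space ranking) can be sketched in a few lines. This is a simplified illustration, not Nutch's or Lucene's actual scoring code: it assumes AND semantics for the Boolean stage and plain TF-IDF cosine similarity for the ranking stage, and the `rank` function and sample documents are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def rank(query, docs):
    """Boolean AND filter first, then cosine similarity over TF-IDF vectors."""
    q_terms = tokenize(query)
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}

    # Stage 1 (Boolean Model): keep only documents containing every query term.
    candidates = {
        doc_id: toks
        for doc_id, toks in tokenized.items()
        if all(term in toks for term in q_terms)
    }

    # Document frequencies over the whole collection, for the IDF weights.
    n_docs = len(tokenized)
    df = Counter(term for toks in tokenized.values() for term in set(toks))

    def idf(term):
        return math.log(1 + n_docs / df[term]) if df[term] else 0.0

    def vector(tokens):
        return {term: count * idf(term) for term, count in Counter(tokens).items()}

    def cosine(a, b):
        dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    # Stage 2 (Vector Space Model): rank the surviving documents by similarity.
    q_vec = vector(q_terms)
    scored = [(cosine(q_vec, vector(toks)), doc_id) for doc_id, toks in candidates.items()]
    return [doc_id for _, doc_id in sorted(scored, key=lambda p: (-p[0], p[1]))]


docs = {
    "a": "accounting video lecture",
    "b": "accounting accounting basics",
    "c": "history lecture",
}
ranking = rank("accounting", docs)
```

Document "c" is dropped by the Boolean stage because it lacks the query term; the remaining documents are then ordered purely by content similarity.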
System Implementation Search Engine Scoring Mechanism
System Implementation 3) Reconfiguring Search Engine Scoring • Modified the Nutch search engine's boosting mechanism: we changed Nutch's boosting algorithm to accommodate metadata, knowing that Nutch uses (3). • We modified Nutch's boosting score as shown in (4).
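Equations (3) and (4) are not reproduced in this transcript, so the following is only a generic sketch of what per-field boosting can look like, not the formula used in the work. The boost values, the `field_match` measure, and the `boosted_score` helper are all assumptions made for illustration.

```python
# Hypothetical boosts; NOT the values from equations (3) or (4).
FIELD_BOOSTS = {
    "content": 1.0,
    "college": 2.0,
    "course": 2.0,
    "professor": 1.5,
    "lecture": 2.5,
    "media_format": 2.0,
}


def field_match(query_terms, field_text):
    """Fraction of query terms that occur in one field (a stand-in for a real field score)."""
    tokens = field_text.lower().split()
    if not tokens:
        return 0.0
    hits = sum(1 for term in query_terms if term in tokens)
    return hits / len(query_terms)


def boosted_score(query, document, boosts=FIELD_BOOSTS):
    """Sum the per-field match scores, each multiplied by its field boost."""
    terms = query.lower().split()
    if not terms:
        return 0.0
    return sum(
        boosts.get(field, 1.0) * field_match(terms, text)
        for field, text in document.items()
    )


# Made-up documents: one with metadata fields, one with content only.
with_metadata = {
    "content": "debits and credits explained",
    "college": "accounting",
    "media_format": "video",
}
content_only = {"content": "accounting video overview"}

score_meta = boosted_score("accounting video", with_metadata)
score_plain = boosted_score("accounting video", content_only)
```

With boosts above 1.0 on the metadata fields, a lecture whose metadata matches the query can outrank a page that merely mentions the same words in its body text.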
System Implementation Designing and Embedding the Parser, Indexer, and Query Plugins
System Implementation 4) Encapsulating the Metadata Search Engine within the “HyperManyMedia” Platform
Evaluation Methodology Research Questions: 1) Will there be an increase in precision when using the metadata search engine compared to the generic search engine? 2) Will relevant documents be ranked higher when using the metadata search engine?
Evaluation Methodology • Selection of Queries: • A great deal of research on search engine queries has found that searchers rarely use Boolean operators; typically, this usage is around 10%. Another study observed that most queries contain between 1 and 3 terms, and that these are primarily noun phrases. • Accordingly, we ran our comparison between the two search engines (generic and metadata) based on “single-term”, “two-term”, and “three-term” queries without Boolean operators. • The queries were taken from query logs containing queries submitted to our “HyperManyMedia” search engine during two semesters (fall & winter terms in 2007-2008).
Evaluation Measures • Most ranking algorithms evaluate ranking quality based on precision and recall. • One limitation of the recall measure is the difficulty of counting the number of relevant documents in the corpus. • We used SEREET, a recently proposed algorithm for evaluating ranking efficiency. • This algorithm evaluates the performance of search engines based on a comparison between the order of relevant documents and retrieved documents. • This algorithm starts with 100 points at the top of the ranking and deducts points each time that a relevant document is not found.
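The deduct-on-miss idea described above can be sketched as follows. This is not the published SEREET procedure: the slide does not state how many points each miss costs, so the sketch assumes a rank-weighted penalty (misses near the top cost more) and normalizes the result to the range 0 to 1. The function name and sample data are hypothetical.

```python
def deduct_on_miss_score(ranked_ids, relevant_ids):
    """Start from a full score and deduct for every ranked position holding a non-relevant document.

    Assumption (not from the source): the penalty at 1-based rank r out of n
    results is proportional to (n - r + 1), so early misses cost more, and a
    ranking with no relevant documents at all scores 0.
    """
    ranked = list(ranked_ids)
    n = len(ranked)
    if n == 0:
        return 0.0
    relevant = set(relevant_ids)
    total_weight = n * (n + 1) // 2
    missed_weight = sum(
        n - position
        for position, doc_id in enumerate(ranked)  # position is 0-based here
        if doc_id not in relevant
    )
    return (total_weight - missed_weight) / total_weight


relevant = {"d1", "d3"}
interleaved = deduct_on_miss_score(["d1", "d2", "d3", "d4"], relevant)
relevant_first = deduct_on_miss_score(["d1", "d3", "d2", "d4"], relevant)
```

Both rankings retrieve the same documents, so their precision is identical; only the order differs, which is the aspect a ranking-efficiency measure is meant to capture.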
Evaluation Methodology 2) Precision: Precision is the ratio of the number of relevant retrieved documents to the number of all retrieved documents (5)
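As a small worked illustration of this definition (the document identifiers below are made up), precision can be computed as:

```python
def precision(retrieved_ids, relevant_ids):
    """Relevant retrieved documents divided by all retrieved documents."""
    retrieved = list(retrieved_ids)
    if not retrieved:
        return 0.0
    relevant = set(relevant_ids)
    hits = sum(1 for doc_id in retrieved if doc_id in relevant)
    return hits / len(retrieved)


# Four results retrieved, two of which are relevant.
p = precision(["d1", "d2", "d3", "d4"], {"d1", "d3", "d9"})
```

Note that "d9" is relevant but was never retrieved; it would lower recall, but it does not affect precision.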
Evaluation measures 3) Selection of Ranking Algorithm:
Precision Results (1) Will there be an increase in precision when using the metadata search engine? We found that the metadata-driven search engine has a significant impact on precision, with overall precision values equal to 0.810 (for single-term queries), 0.856 (for two-term queries), and 0.925 (for three-term queries), compared to 0.619 (for single-term queries), 0.717 (for two-term queries), and 0.851 (for three-term queries) for the generic search engine.
SEREET Results (2) Will relevant documents be ranked higher when using a metadata search engine? We found that the metadata-driven search engine has a significant impact on ranking performance, with overall SEREET values equal to 0.803 (for single-term queries), 0.846 (for two-term queries), and 0.914 (for three-term queries), compared to 0.597 (for single-term queries), 0.684 (for two-term queries), and 0.834 (for three-term queries) for the generic search engine.
Conclusion • In this work, we presented a metadata domain-knowledge driven search engine for “HyperManyMedia” e-learning resources. • Our Precision and SEREET ranking results showed a significant improvement in retrieving resources relevant to the submitted queries when we used the metadata search engine.
Future Work • A hybrid metadata and semantically enriched search engine, built on top of the e-learning domain knowledge • Personalized, ontology-based learner profiles • Visualization of online student communities with their associated learning objects and their relationships.