Vertical Search for Courses of UIUC by Jessica Bell, Alexander Loeb, Sharon Paradesi, Michael Paul, Jing Xia, Jie Zhang
Demo http://greedy.cs.uiuc.edu/dssi/course/search.php
Goals of the project
• Construct a database of UIUC courses across all departments, ultimately creating a centralized knowledge base about each course.
• Augment the database by drawing relations between courses, both within and across departments, and by finding similar courses at universities other than the University of Illinois.
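Architecture (diagram; component labels recovered from the figure)
• Data sources: Course Catalog, course homepages, Book Store, webpages of other universities
• Collection and processing: Heritrix, AgentIDE, PHP and Java scripts, WEKA
• Database contents: basic course info, book info, keywords, related courses
• Query interface (PHP script): query by course name, instructor, description, …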
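The diagram places a single database between the crawled data sources and the PHP query script. A minimal sketch of such a store is shown below, using Python's built-in `sqlite3` instead of the MySQL database the slides mention; the table and column names (`course`, `book`, `related_course`) and the sample rows are hypothetical, chosen only to mirror the diagram labels.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative assumptions
# that loosely follow the labels in the architecture diagram.
SCHEMA = """
CREATE TABLE course (
    code        TEXT PRIMARY KEY,   -- e.g. 'CS 125'
    name        TEXT NOT NULL,
    instructor  TEXT,
    description TEXT
);
CREATE TABLE book (
    course_code TEXT REFERENCES course(code),
    title       TEXT NOT NULL
);
CREATE TABLE related_course (
    course_code  TEXT REFERENCES course(code),
    related_code TEXT NOT NULL      -- may name a course at another university
);
"""


def build_db() -> sqlite3.Connection:
    """Create an in-memory database with the hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def courses_by_instructor(conn: sqlite3.Connection, instructor: str) -> list[tuple[str, str]]:
    """Return (code, name) pairs whose instructor field contains the given text."""
    cur = conn.execute(
        "SELECT code, name FROM course WHERE instructor LIKE ? ORDER BY code",
        (f"%{instructor}%",),
    )
    return cur.fetchall()


if __name__ == "__main__":
    db = build_db()
    # Made-up rows, for illustration only.
    db.executemany(
        "INSERT INTO course VALUES (?, ?, ?, ?)",
        [
            ("ABC 101", "Example Course A", "Smith", "An example description."),
            ("ABC 201", "Example Course B", "Jones", "Another example."),
        ],
    )
    print(courses_by_instructor(db, "Smith"))  # [('ABC 101', 'Example Course A')]
```

The parameterized `LIKE ?` query keeps user input out of the SQL text, which matters for any search box that feeds a database directly.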
Tools used
• Web crawling: Wget, AgentIDE, and Heritrix
• Parsers: Python and Java
• Learning tools: WEKA
• Website design: PHP and MySQL
Tasks finished
• Data mining
  • Basic course information
  • Similar course recommendation
  • Prerequisite course list
  • Recommended book information
• Learning
  • Clustering
  • Classification
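One way the prerequisite course list could be mined is sketched below. It assumes, for illustration only, that a catalog description states prerequisites in a sentence beginning with "Prerequisite:" and that course codes look like "CS 225"; the function name `extract_prerequisites` is hypothetical.

```python
import re

# Assumptions: prerequisites appear in a sentence starting with "Prerequisite:"
# (or "Prerequisites:"), and course codes are 2-4 uppercase letters plus a
# three-digit number, optionally separated by a space.
PREREQ_SENTENCE = re.compile(r"Prerequisites?:\s*([^.]*)", re.IGNORECASE)
COURSE_CODE = re.compile(r"\b([A-Z]{2,4})\s?(\d{3})\b")


def extract_prerequisites(description: str) -> list[str]:
    """Return course codes named in the prerequisite sentence, in order, without duplicates."""
    match = PREREQ_SENTENCE.search(description)
    if match is None:
        return []
    codes: list[str] = []
    for dept, num in COURSE_CODE.findall(match.group(1)):
        code = f"{dept} {num}"  # normalize "CS225" to "CS 225"
        if code not in codes:
            codes.append(code)
    return codes
```

For example, `extract_prerequisites("Prerequisite: CS 173 and CS 225; or consent of instructor.")` returns `["CS 173", "CS 225"]`. Free-text phrases such as "consent of instructor" are ignored, so a real catalog would likely need extra rules.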
Keywords
• Extract keywords from course descriptions
• Remove uninformative or common words
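The keyword step can be sketched as tokenizing each description, dropping stopwords, and discarding words that occur in too large a share of descriptions. The slide does not say how "uninformative" was measured. This sketch assumes a small hand-written stopword list and a document-frequency cutoff (`max_df`, a hypothetical parameter), and it weights each surviving keyword by inverse document frequency (IDF).

```python
import math
import re
from collections import Counter

# Small illustrative stopword list (assumption); a real system would use a larger one.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "with",
             "on", "is", "are", "this", "course"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def extract_keywords(descriptions: dict[str, str],
                     max_df: float = 0.5) -> dict[str, dict[str, float]]:
    """Map each course to its keywords, weighted by IDF.

    Stopwords, and words that appear in more than `max_df` of all
    descriptions, are treated as uninformative and dropped.
    """
    n_docs = len(descriptions)
    tokens = {course: set(tokenize(text)) - STOPWORDS
              for course, text in descriptions.items()}
    df = Counter(word for words in tokens.values() for word in words)
    return {
        course: {w: math.log(n_docs / df[w])
                 for w in words if df[w] / n_docs <= max_df}
        for course, words in tokens.items()
    }
```

A word that appears in almost every description, such as "introduction", gets a low IDF or is removed outright by the `max_df` cutoff. That matches the slide's goal of keeping only words that distinguish one course from another.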
Search
• Search by course name, instructor, or content
• Clean up the search string, e.g.
  • “cs125” becomes “CS 125”
  • “real-time” becomes “real time realtime”
• Split the search string into individual words and query the database for word matches
• Score and rank results by match frequency and keyword informativeness score
• Examine the distribution of scores and display the top results
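The search pipeline on this slide can be sketched end to end as follows. The two clean-up rules reproduce the slide's examples. Everything else is an assumption: IDF stands in for the "keyword informativeness score", and the cutoff `keep_ratio` (keep results scoring at least half of the best score) is a hypothetical stand-in for "examine the distribution of scores". The function names are also hypothetical.

```python
import math
import re
from collections import Counter

COURSE_CODE = re.compile(r"\b([A-Za-z]{2,4})\s*(\d{3})\b")
HYPHENATED = re.compile(r"\b(\w+)-(\w+)\b")


def clean_query(query: str) -> str:
    """Normalize a raw search string using the slide's two example rules."""
    # "cs125" -> "CS 125": uppercase the department, separate it from the number.
    query = COURSE_CODE.sub(lambda m: f"{m.group(1).upper()} {m.group(2)}", query)
    # "real-time" -> "real time realtime": keep both parts and the joined form.
    query = HYPHENATED.sub(lambda m: f"{m.group(1)} {m.group(2)} {m.group(1)}{m.group(2)}", query)
    return " ".join(query.split())


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def rank(query_words: list[str], docs: dict[str, str], keep_ratio: float = 0.5) -> list[str]:
    """Score each course by sum(match count * IDF) and keep the top results."""
    n = len(docs)
    counts = {course: Counter(tokenize(text)) for course, text in docs.items()}
    df = Counter(word for cnt in counts.values() for word in cnt)
    scores = {}
    for course, cnt in counts.items():
        # (n + 1) / df keeps the weight positive even for words found in every course.
        s = sum(cnt[w] * math.log((n + 1) / df[w]) for w in query_words if df[w])
        if s > 0:
            scores[course] = s
    if not scores:
        return []
    best = max(scores.values())
    top = [c for c, s in scores.items() if s >= keep_ratio * best]
    return sorted(top, key=lambda c: (-scores[c], c))


def search(raw_query: str, docs: dict[str, str]) -> list[str]:
    return rank(tokenize(clean_query(raw_query)), docs)
```

Splitting “real-time” into “real”, “time”, and “realtime” lets the query match descriptions that use any of the three spellings. Query words that appear in no description are skipped rather than penalized.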
Classification
• NBTree classifier
• Training set: 34 instances
• Test set: 38 instances
• Attributes: 17
• Accuracy: 94.74%
• Precision: 0.947
• Recall: 0.947
• F-measure: 0.947
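The reported figures are consistent with each other, as the short check below shows. The slide gives only percentages, so the number of correct predictions (36 of 38) is inferred from 94.74% of the test set rather than taken from the original results. The F-measure is the harmonic mean of precision and recall, so it equals both when they are equal.

```python
# Inferred, not reported: the count of correct predictions is derived
# from the 94.74% accuracy on the 38-instance test set.
TEST_INSTANCES = 38
REPORTED_ACCURACY = 0.9474

correct = round(REPORTED_ACCURACY * TEST_INSTANCES)  # 36.0012 rounds to 36
accuracy = correct / TEST_INSTANCES                   # 36 / 38 = 0.947368...


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1)."""
    return 2 * precision * recall / (precision + recall)


print(correct, f"{accuracy:.2%}", round(f_measure(0.947, 0.947), 3))  # 36 94.74% 0.947
```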
Clustering
• Cobweb clustering algorithm
• Instances: 20
• Attributes: 112
• Number of clusters: 17
• Incorrectly clustered instances: 7 (35%)
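Cobweb builds its cluster hierarchy by choosing, at each step, the operation that maximizes category utility. The sketch below computes that criterion for nominal attributes only, following the standard definition: the average over the K clusters of P(C_k) times the gain in expected correctly guessed attribute values. It is not the WEKA implementation, which also handles numeric attributes. The function name and input format (tuples of values, clusters as index lists) are assumptions made for illustration.

```python
from collections import Counter


def category_utility(instances: list[tuple], clusters: list[list[int]]) -> float:
    """Category utility of a partition over nominal attributes.

    instances: equal-length tuples of nominal attribute values.
    clusters:  lists of indices into `instances`, one list per cluster.
    """
    n = len(instances)
    n_attrs = len(instances[0])

    def expected_correct(rows: list[tuple]) -> float:
        # Sum over attributes i and values j of P(A_i = V_ij)^2 within `rows`.
        total = 0.0
        for i in range(n_attrs):
            counts = Counter(row[i] for row in rows)
            total += sum((c / len(rows)) ** 2 for c in counts.values())
        return total

    baseline = expected_correct(instances)
    cu = 0.0
    for members in clusters:
        rows = [instances[idx] for idx in members]
        # Weight each cluster's gain over the baseline by P(C_k).
        cu += (len(rows) / n) * (expected_correct(rows) - baseline)
    return cu / len(clusters)
```

For instances `[("a",), ("a",), ("b",), ("b",)]`, the partition `[[0, 1], [2, 3]]` has a category utility of 0.25, while the mixed partition `[[0, 2], [1, 3]]` has 0.0, so Cobweb would prefer the first grouping.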