Learn about the integration of collaborative filtering with search engine technology to improve search result rankings, as explored in the study by Eugene Cushman, Dan Murphy, George Stuart, and Prof. Mark Claypool.
Smarter Search Engines: Using Personalization to Improve Search Results
Eugene Cushman, Dan Murphy, George Stuart
Advised by Professor Mark Claypool
The Problem
• There are billions of web pages on the Internet
• They vary greatly in quality
• Their number is growing exponentially
• Search engines must adapt to keep up
Existing Systems
• Google
  • Layered architecture
  • PageRank™
• GroupLens
  • Applied to USENET news
  • Different domain space from web search
  • Uses collaborative filtering
Personalization
• “Qualitative” rankings
  • Example: “Good Low-Fat Dessert Recipes”
  • Example: “Theories of dinosaur extinction”
• Contrast with specific, factual searches
  • Example: “The batting lineup for the Boston Red Sox on October 28, 1986”
• Exploratory versus “narrow-band” searches
Collaborative Filtering
• Uses aggregate data to predict user preference
  • User A likes Foo
  • User B trusts User A’s preferences
  • User B can therefore be predicted to like Foo
  • (extremely simplified)
• Algorithms
  • Pearson correlation coefficient
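A minimal Python sketch of a Pearson-based prediction step, assuming explicit numeric ratings per user and per page. The slides name only the Pearson correlation coefficient; the prediction formula used here (a mean-centred, correlation-weighted average of other users' ratings, the style popularized by GroupLens), the function names `pearson` and `predict`, and the sample data are illustrative assumptions, not Foible's actual code.

```python
from math import sqrt

# ratings[user][item] = explicit rating; all names and values are hypothetical
Ratings = dict[str, dict[str, float]]


def pearson(ratings: Ratings, a: str, b: str) -> float:
    """Pearson correlation between users a and b over the items both have rated."""
    common = ratings[a].keys() & ratings[b].keys()
    if len(common) < 2:
        return 0.0  # too little overlap to correlate
    ra = [ratings[a][i] for i in common]
    rb = [ratings[b][i] for i in common]
    mean_a = sum(ra) / len(ra)
    mean_b = sum(rb) / len(rb)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(ra, rb))
    var_a = sum((x - mean_a) ** 2 for x in ra)
    var_b = sum((y - mean_b) ** 2 for y in rb)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant rater carries no correlation signal
    return cov / sqrt(var_a * var_b)


def predict(ratings: Ratings, user: str, item: str) -> float:
    """Predict user's rating of item from correlated users (mean-centred weighting)."""
    mean_u = sum(ratings[user].values()) / len(ratings[user])
    num = den = 0.0
    for other, theirs in ratings.items():
        if other == user or item not in theirs:
            continue
        w = pearson(ratings, user, other)
        mean_o = sum(theirs.values()) / len(theirs)
        num += w * (theirs[item] - mean_o)  # how far other's rating sits above their own mean
        den += abs(w)
    return mean_u if den == 0 else mean_u + num / den


ratings = {
    "A": {"foo": 5, "bar": 1, "baz": 4},
    "B": {"bar": 2, "baz": 5},
}
print(pearson(ratings, "A", "B"))  # 1.0
print(predict(ratings, "B", "foo"))
```

Here User B agrees perfectly with User A on the two pages both have rated, so B's predicted rating for Foo rises above B's own average, mirroring the simplified “User A likes Foo” example.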
Foible: the Best of Both Worlds
• Foible integrates two distinct technologies to provide a more effective web-search experience
  • Search engine indexing
  • Collaborative filtering
• The combination yields a demonstrable improvement in search results
Foible Architecture
• Spider
• Analyzer
• Cache
• Collaborative Engine
• Search Engine
• Web Interface
Web Spider
• Parallelized depth-first crawl of the web
• Builds lists of nodes by parsing HTML for links
• Starts from a link-heavy “seed node”
  • Custom seed node incorporating search results on “dinosaurs” from Yahoo, Google, and others
• Foible statistics
  • More than 27,000 web pages crawled
  • More than 500 MB of web data cached
  • Total database size of 1 GB
  • 7.269 million rows in the word-frequency table
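The crawl loop can be sketched as follows. This is a single-threaded illustration: the parallelization is omitted, and the in-memory `fetch` callable, the `max_pages` limit, and the example URLs are hypothetical stand-ins for real HTTP fetching and whatever limits Foible actually used. Links are extracted with Python's standard `html.parser` module, and an explicit stack gives depth-first order.

```python
from html.parser import HTMLParser
from typing import Callable, Optional
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects absolute URLs from the href attributes of <a> tags."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base_url, value))


def extract_links(url: str, html: str) -> list[str]:
    parser = LinkExtractor(url)
    parser.feed(html)
    parser.close()
    return parser.links


def crawl(seed: str, fetch: Callable[[str], Optional[str]], max_pages: int = 100) -> list[str]:
    """Depth-first crawl from seed; returns URLs in the order they were visited.

    fetch(url) returns the page's HTML, or None if it cannot be retrieved
    (hypothetical stand-in for an HTTP client).
    """
    visited: list[str] = []
    seen: set[str] = set()
    stack = [seed]
    while stack and len(visited) < max_pages:
        url = stack.pop()
        if url in seen:
            continue
        seen.add(url)
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        # Push in reverse so the first link on the page is explored first.
        stack.extend(reversed(extract_links(url, html)))
    return visited


pages = {
    "http://example.com/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.com/a": '<a href="/c">C</a>',
    "http://example.com/b": "",
    "http://example.com/c": '<a href="/">home</a>',
}
print(crawl("http://example.com/", pages.get))
```

The crawl follows `/a` down to `/c` before backtracking to `/b`, which is the depth-first behaviour the slide describes; a parallel version would need a shared, thread-safe `seen` set.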
Analyzer
• Parses HTML to describe the attributes of each web page
  • Document size, number of sentences
  • Reading level (Gunning Fog, Flesch-Kincaid)
  • Number of images
  • Content-to-HTML ratio
  • Number of links
• Precomputes word-frequency tables
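Several of these attributes can be computed in one pass over the HTML. This sketch uses the published Flesch-Kincaid grade-level formula, 0.39 × (words / sentences) + 11.8 × (syllables / words) - 15.59, and the Gunning Fog index, 0.4 × (words / sentences + 100 × complex words / words), where complex words have three or more syllables. The vowel-group syllable counter, the punctuation-based sentence count, and the definition of content-to-HTML ratio as visible-text characters over total HTML characters are simplifying assumptions, not necessarily what Foible's analyzer did.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class PageScanner(HTMLParser):
    """Gathers visible text and counts <img> and <a> tags."""

    def __init__(self):
        super().__init__()
        self.text_parts: list[str] = []
        self.images = 0
        self.links = 0
        self._skip = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.images += 1
        elif tag == "a":
            self.links += 1
        elif tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.text_parts.append(data)


def syllables(word: str) -> int:
    # Crude heuristic (assumption): one syllable per group of consecutive vowels.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def analyze(html: str) -> dict:
    scanner = PageScanner()
    scanner.feed(html)
    scanner.close()
    text = " ".join(scanner.text_parts)
    words = re.findall(r"[A-Za-z']+", text)
    sentences = max(1, len(re.findall(r"[.!?]+", text)))  # count terminators
    n_words = max(1, len(words))
    syl = [syllables(w) for w in words]
    complex_words = sum(1 for s in syl if s >= 3)
    return {
        "size": len(html),
        "sentences": sentences,
        "images": scanner.images,
        "links": scanner.links,
        "content_ratio": len(text.strip()) / max(1, len(html)),
        "flesch_kincaid": 0.39 * n_words / sentences + 11.8 * sum(syl) / n_words - 15.59,
        "fog": 0.4 * (n_words / sentences + 100 * complex_words / n_words),
        "word_freq": Counter(w.lower() for w in words),  # precomputed word-frequency table
    }


page = ('<html><body><p>The cat sat. The dog ran.</p><img src="x.png">'
        '<a href="/y">link</a><script>var z = 1;</script></body></html>')
stats = analyze(page)
print(stats["sentences"], stats["images"], stats["links"], stats["word_freq"]["the"])
```

Script and style contents are excluded from the visible text so that they count as markup, not content, in the content-to-HTML ratio.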
Collaborative Searching
• Three components of the search algorithm
  • Word frequency
  • Profile correlation
  • Recommender system
• Computes a ranking of all pages
• Returns the results to the user
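The slides do not say how the three components are combined into a single ranking. One plausible blend, shown purely as an assumption, normalizes a word-frequency relevance score and the recommender's predicted rating (itself derived from profile correlation) to the range 0 to 1 and takes a weighted sum. The weight `alpha`, the 1 to 5 rating scale, the midpoint default for unrated pages, and the function names are all hypothetical.

```python
from collections import Counter


def word_freq_score(query: str, freqs: Counter) -> float:
    """Share of a page's words that are query terms (hypothetical relevance measure)."""
    total = sum(freqs.values())
    if total == 0:
        return 0.0
    return sum(freqs[term] for term in query.lower().split()) / total


def rank(query: str, page_freqs: dict[str, Counter], predicted: dict[str, float],
         alpha: float = 0.5, lo: float = 1.0, hi: float = 5.0) -> list[str]:
    """Rank pages that match the query by blending text relevance with predicted rating.

    page_freqs: url -> word counts (e.g. from the analyzer's word-frequency tables)
    predicted:  url -> this user's predicted rating from collaborative filtering
    alpha:      weight on text relevance; 1.0 means word frequency only
    lo, hi:     assumed rating scale; unrated pages get the scale's midpoint
    """
    relevance = {url: word_freq_score(query, f) for url, f in page_freqs.items()}
    matches = [url for url, r in relevance.items() if r > 0]
    if not matches:
        return []
    top = max(relevance[url] for url in matches)
    scores = {}
    for url in matches:
        rating = predicted.get(url, (lo + hi) / 2)
        preference = min(1.0, max(0.0, (rating - lo) / (hi - lo)))  # clamp to [0, 1]
        scores[url] = alpha * relevance[url] / top + (1 - alpha) * preference
    return sorted(matches, key=scores.get, reverse=True)


page_freqs = {
    "p1": Counter({"dinosaur": 5, "the": 95}),
    "p2": Counter({"dinosaur": 10, "the": 90}),
    "p3": Counter({"cat": 3}),
}
print(rank("dinosaur", page_freqs, {}, alpha=1.0))                # word frequency only
print(rank("dinosaur", page_freqs, {"p1": 5.0, "p2": 1.0}))       # blended
```

With `alpha=1.0` the ranking reduces to word frequency alone, which corresponds to the control column in the user study; lowering `alpha` lets a user's predicted preferences reorder the pages that match the query.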
User Study
• Approximately 50 users
  • 20 completed the study in its entirety
• Consisted of 5 searches
  • Predefined broad topics
• Users provided explicit feedback
• Search results presented in a two-column format
  • Enhanced collaborative results
  • Control: word frequency only
Results and Conclusion
• Users unanimously preferred the collaborative results to the non-collaborative (control) results
• According to the study, the smarter searches ranked pages in a better order
• Introducing collaborative filtering into traditional search engine technology yields better search results