WWW Search and Navigation Mark Levene SCIS, Birkbeck College University of London www.dcs.bbk.ac.uk/~mark/
Talk Overview • Hypertext and the navigation problem • NavigationZone’s solution • Problems being researched • A Demonstration
Hypertext and Navigation • Long history • Bush 1945, memex – trail blazing • Nelson 1965, Xanadu - network of documents • Problem of “getting lost in hyperspace” • Navigation aids • Bookmarks • History • Overview diagrams • Recommendations
State-of-the-Art Navigation Aids • Novel User-Interfaces to visualise web sites • Clustering (e.g. Self-Organising Maps) • Web data mining – finding user patterns • Semi-automated navigation, BestTrail algorithm – motivation to follow …
A typical search scenario • Submit a query to a search engine • Is it too broad / too specific? • Does it capture my information needs? • Select a URL from the result set • Have I made the right choice? • Start manual navigation • Where am I? Where have I come from? Where am I going? • Return to the first step and reformulate the query
[Figure: Content-centric approach]
Problems with standard search • Page-level relevance scoring • sensitive to query terms • No look-ahead • ‘click and discover’ • No context • results are presented in total isolation • No navigation support • users are left to find their own way
Possible solutions (information retrieval) • Improve basic IR • Link analysis, e.g. PageRank and HITS • Metadata tagging • Keywords and taxonomies (semantic web) • Natural language • Q&A, sentence analysis, synonyms
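Link analysis scores pages from the link structure alone. As an illustration, here is a minimal power-iteration sketch of PageRank; the `pagerank` function, the dict-of-out-links graph format, the damping factor of 0.85 and the fixed iteration count are assumptions made for this example, not details from the talk.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank; `links` maps each page to its list of out-links."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by dangling pages (no out-links) is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping + damping * dangling) / n for p in pages}
        for page, targets in links.items():
            for target in targets:
                # Each page passes an equal share of its rank along every out-link.
                new_rank[target] += damping * rank[page] / len(targets)
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Hypothetical four-page site: page c is linked from a, b and d.
    web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
    ranks = pagerank(web)
    print(max(ranks, key=ranks.get))  # c
```

Because the dangling mass is redistributed, the scores always sum to 1, so they can be read as the long-run probability of a random surfer being on each page.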
Possible solutions (information seeking) • Suggestion engines • Link and content generation • Categories and directories • Explicit manual construction • Automatic classification • Machine learning techniques
Are these feasible? • Re-architecting corporate information infrastructure is extremely expensive • Sophisticated approaches are not always intuitive and are yet to be proven • The same problem recurs every couple of years • e.g. after mergers and acquisitions
There is, actually, a better way! • Treat sequences of pages, or trails, as first-class citizens in search • Consider the topology of the area in which you are searching • Employ navigational aids
[Figure: Context-centric approach]
The information value of a trail is greater than the sum of its parts!
Our approach • Provide information retrieval of the highest quality and, in addition: • Find out what lies beyond the most relevant pages by ‘exploring the area’ • Present users with precise and relevant trails • Provide navigation assistance within the UI
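One way to make a trail the unit of retrieval is to enumerate the short paths leading out of a relevant page and rank them as wholes. The sketch below does this under stated assumptions: the link graph and per-page relevance scores are plain dicts, a trail is a simple path of at most `max_len` pages, and its value is a discounted sum of page relevance. The functions, parameters and scoring rule are hypothetical; this is not the BestTrail algorithm, and a purely additive score does not model the claim that a trail is worth more than the sum of its parts.

```python
def enumerate_trails(graph, start, max_len):
    """Return every simple trail (no page repeated) from `start` that has
    `max_len` pages or ends where no unvisited out-link remains."""
    trails = []

    def extend(trail):
        successors = [p for p in graph.get(trail[-1], []) if p not in trail]
        if len(trail) == max_len or not successors:
            trails.append(trail)
            return
        for nxt in successors:
            extend(trail + [nxt])

    extend([start])
    return trails


def trail_score(trail, relevance, discount=0.75):
    """Hypothetical trail value: the i-th page's relevance is weighted by discount**i."""
    return sum(relevance.get(page, 0.0) * discount ** i for i, page in enumerate(trail))


def best_trails(graph, relevance, start, max_len=3, top_k=3):
    """Rank all trails from `start` by trail_score and keep the top_k."""
    trails = enumerate_trails(graph, start, max_len)
    return sorted(trails, key=lambda t: trail_score(t, relevance), reverse=True)[:top_k]


if __name__ == "__main__":
    # Hypothetical site map and per-page relevance scores for some query.
    site = {"home": ["research", "courses"], "research": ["groups", "papers"],
            "courses": ["dates"]}
    scores = {"home": 0.2, "research": 0.9, "courses": 0.3,
              "groups": 0.6, "papers": 0.8, "dates": 0.1}
    for trail in best_trails(site, scores, "home"):
        print(" -> ".join(trail), round(trail_score(trail, scores), 3))
```

Exhaustive enumeration grows exponentially with `max_len`; a real trail engine would need pruning or a heuristic search, but the exhaustive version keeps the ranking easy to check.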
NavZone Usability Study (First Monday paper) • Task: find answers to 5 types of question • Fact finding: What are the term dates? • Judgement: Is CSIS a “good” place to do research? • Fact comparison: Which train station is closest to the college? • Judgement comparison: Is the research in deptA better than that in deptB? • General navigational: How do you get to the checkout?
NavZone vs. Google and Compass: % of subjects answering 4+ questions correctly • 59% Google • 75% Compass • 83% NavZone
Average number of clicks to complete a task • 44 Google • 40 Compass • 27 NavZone • NavZone is bandwidth “green”!
Average time taken per task (min) • 18 Compass • 17 Google • 13 NavZone • Statistically significant (Wilcoxon test)
The main ingredients [Figure: system architecture diagram; components shown: user interface, robot, crawler, parser (HTML, XML, PDF, PostScript, Word, other), indexer, generic format, inverted file, web graph, trail engine, BestTrail, postprocessor]
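The inverted file among these ingredients is the standard index structure for keyword search: it maps each term to the documents that contain it. A minimal sketch follows; the regular-expression tokeniser, the function names and the AND-only query semantics are assumptions for illustration, not details of the NavigationZone indexer.

```python
import re
from collections import defaultdict

TOKEN = re.compile(r"[a-z0-9]+")  # hypothetical tokeniser: lower-cased alphanumeric runs


def build_inverted_file(docs):
    """Map each term to the sorted list of ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in TOKEN.findall(text.lower()):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def conjunctive_query(index, query):
    """Return ids of documents containing every query term (AND semantics)."""
    terms = TOKEN.findall(query.lower())
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


if __name__ == "__main__":
    docs = {1: "Term dates for 2002",
            2: "Research groups and term projects",
            3: "Train station directions"}
    index = build_inverted_file(docs)
    print(index["term"])                            # [1, 2]
    print(conjunctive_query(index, "term dates"))   # [1]
```

Answering a query touches only the posting lists of its terms rather than scanning every document, which is why the index is built once at crawl time.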
Under Development • Alternative User-Interfaces • Seamless integration with relational databases and file systems • Data mining and personalisation • Mobile/PDA support
Open Problem • How do we make use of statistical regularities that are present in the web to improve search and navigation? • See Levene et al., “A stochastic model for the evolution of the web”, Condensed Matter Archive, cond-mat/0110016, 2001 • Many distributions related to the web graph follow a power law
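A first step towards exploiting such regularities is to check whether a quantity such as page in-degree has a power-law tail and to estimate its exponent. The sketch below uses the standard continuous maximum-likelihood estimator, alpha = 1 + n / sum(ln(x_i / x_min)), over the n samples with x_i >= x_min. The function name and the choice of `x_min` are assumptions, and for integer-valued data such as degrees the continuous formula is only an approximation.

```python
import math
import random


def power_law_exponent(samples, x_min):
    """Continuous MLE of alpha for a tail p(x) proportional to x**(-alpha), x >= x_min > 0."""
    tail = [x for x in samples if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    if not tail or log_sum == 0:
        raise ValueError("need samples strictly above x_min")
    return 1.0 + len(tail) / log_sum


if __name__ == "__main__":
    # Synthetic check: random.paretovariate(a) has density a * x**-(a + 1) on x >= 1,
    # so samples drawn with a = 1.1 should give an estimate close to 2.1.
    random.seed(0)
    data = [random.paretovariate(1.1) for _ in range(10_000)]
    print(round(power_law_exponent(data, x_min=1.0), 2))
```

In practice the estimate is sensitive to `x_min`, so it is usually computed for several cut-offs and checked for stability before drawing conclusions.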