Search, Text Mining, Web Site Usability. Marti Hearst, SIMS. UCB CS Research Fair
BAILANDO Projects: Better Access to Information using Language Analysis and Novel Dynamic Organizations
Current BAILANDO Projects
• CHA-CHA & FLAMENCO:
  • Better Search Interfaces
• LINDI:
  • UI support for Search
  • Text Data Mining
• TANGO:
  • Automated Web Site Usability
Search UIs: Combine Browsing & Search; Place Search Results in Context; Large Category Hierarchies
Cha-Cha Students: Mike Chen, Jamie Laflen, Jason Hong, Jimmy Lin, Shiang Chen
Medical Category Hierarchy
DynaCat (Pratt, Hearst, & Fagan 99)
DynaCat Study
• Design
  • Three queries
  • 24 cancer patients
  • Compared three interfaces: ranked list, clusters, categories
• Results
  • Participants strongly preferred categories
  • Participants found more answers using categories
  • Participants took the same amount of time with all three interfaces
  • Similar results were found in another study by Chen and Dumais (CHI 2000)
FLAMENCO: Improving Search via Large Category Hierarchies
• How to show intersections across category types?
• How to preview related categories in a user-tailored, dynamic manner?
Text Data Mining: Relationships between pieces of information in documents can reveal new facts that were not previously known.
Imagine: You are a medical researcher. Your patient has spinal inflammation, numbness in fingers, low TC levels, and negative results for all tests. How can you help her?
Idea: A new way of searching text. Link pieces of information together to formulate hypotheses …
LINDI: Linking Information for New DIscoveries
• Three main parts
  • Search UI for building and reusing hypothesis-seeking strategies.
  • Statistical language analysis techniques for interpreting the text.
  • Backend for interfacing with various databases and translating different formats.
Gathering Evidence: Spinal Inflammation; Numbness in fingers; Low TC Levels
Gathering Evidence: Find diseases associated with each of Spinal Inflammation; Numbness in fingers; Low TC Levels
Supporting Cascaded Search Operations: Spinal Inflammation; Numbness in fingers; Low TC Levels
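One way to picture the cascaded search on the preceding slides is as a lookup per symptom followed by an intersection of the result sets. The sketch below is only an illustration under stated assumptions: the `ASSOCIATIONS` index, the placeholder disease names, and the functions `diseases_for` and `cascade` are all hypothetical, not part of LINDI, and the data is invented rather than medical fact.

```python
# Hypothetical symptom -> disease index. The disease names are invented
# placeholders, not medical data and not anything from the LINDI system.
ASSOCIATIONS = {
    "spinal inflammation": {"disease A", "disease B", "disease C"},
    "numbness in fingers": {"disease B", "disease C", "disease D"},
    "low TC levels": {"disease C", "disease E"},
}


def diseases_for(symptom, index):
    """First-stage search: diseases associated with a single symptom."""
    return index.get(symptom, set())


def cascade(symptoms, index):
    """Feed each search result into the next step, keeping only the
    diseases that are associated with every symptom seen so far."""
    candidates = None
    for symptom in symptoms:
        found = diseases_for(symptom, index)
        candidates = found if candidates is None else candidates & found
    return candidates or set()


hypotheses = cascade(list(ASSOCIATIONS), ASSOCIATIONS)
print(hypotheses)  # {'disease C'}
```

A real system would replace the dictionary lookup with queries against document collections or databases; the intersection step is the part that corresponds to linking the evidence together.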
New Language Analysis
• First use category labels to retrieve candidate documents
• Then use language analysis to detect causal relationships between concepts
  • Title: Magnesium deficiency implicated in increased stress levels.
  • Interpretation: <nutrient><reduction> related-to <increase><symptom>
• Use these to find relationships and formulate hypotheses
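To make the title-to-interpretation step concrete, here is a deliberately simple sketch that maps the example title to the template shown on the slide. It is an assumption-laden stand-in: the slides call for statistical language analysis, whereas this uses a hand-written lexicon, and `LEXICON`, `RELATION_CUES`, and `interpret` are hypothetical names introduced only for illustration.

```python
# Hand-written toy lexicon (an assumption for illustration only; the slides
# describe statistical language analysis, which this does not implement).
LEXICON = {
    "magnesium": "nutrient",
    "deficiency": "reduction",
    "increased": "increase",
    "stress": "symptom",
}
# Words taken to signal a relationship between the two sides of the title.
RELATION_CUES = {"implicated"}


def interpret(title):
    """Map a title to a '<..> related-to <..>' template, or None if no
    relation cue with tagged concepts on both sides is found."""
    words = [w.strip(".,").lower() for w in title.split()]
    left, right, seen_cue = [], [], False
    for word in words:
        if word in RELATION_CUES:
            seen_cue = True
        elif word in LEXICON:
            (right if seen_cue else left).append(f"<{LEXICON[word]}>")
    if not (seen_cue and left and right):
        return None
    return f"{''.join(left)} related-to {''.join(right)}"


print(interpret("Magnesium deficiency implicated in increased stress levels."))
# <nutrient><reduction> related-to <increase><symptom>
```

Templates of this shape can then be compared across documents, which is where the hypothesis-building step would start.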
Statistical Semantic Parsing
• Modern statistical techniques
  • Mainly applied to syntactic structure
• Probabilistic knowledge representation
  • Represent hypotheses with different degrees of certainty.
Automating Assessment of Web Site Usability
Why Worry?
• Problem: IBM's extranet
  • Heavy use of help and search
  • Unhappy users
• Solution
  • Massive web site redesign
  • Focus on info-organization, not the purchasing process.
  • Cost: "in the millions"
• Results
  • Not announced or trumped up
  • Use of "help" decreased 84%
  • Sales increased 400%
Web TANGO: Tool for Assessing NaviGation & Organization
• Goal: automated support for comparing design alternatives
• How: Assess usability of the information architecture
  • Approximate people's information-seeking behavior (Monte Carlo simulation)
  • Output quantitative usability metrics
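The Monte Carlo idea above can be sketched as many simulated visits to a site graph, with simple metrics collected over the runs. This is a minimal illustration, not the Web TANGO simulator: the uniform-random link choice is a crude stand-in for information-seeking behavior, and `simulate`, `DESIGN_A`, `DESIGN_B`, and the page names are all hypothetical.

```python
import random


def simulate(site, start, target, trials=1000, max_clicks=10, seed=0):
    """Estimate how often a simulated visitor reaches `target` from `start`.

    `site` maps each page to the list of pages it links to. The visitor
    picks links uniformly at random (an assumption, not a validated model).
    Returns (success_rate, mean_clicks_on_successful_visits or None).
    """
    rng = random.Random(seed)
    successes, clicks_on_success = 0, 0
    for _ in range(trials):
        page = start
        for clicks in range(1, max_clicks + 1):
            links = site.get(page, [])
            if not links:
                break  # dead end: the visitor gives up
            page = rng.choice(links)
            if page == target:
                successes += 1
                clicks_on_success += clicks
                break
    mean_clicks = clicks_on_success / successes if successes else None
    return successes / trials, mean_clicks


# Two hypothetical information architectures for the same content.
DESIGN_A = {
    "home": ["products", "support", "about"],
    "products": ["home", "specs"],
    "support": ["home", "faq", "contact"],
}
DESIGN_B = {
    "home": ["products", "faq"],
    "products": ["home", "specs"],
}

for name, design in (("A", DESIGN_A), ("B", DESIGN_B)):
    rate, clicks = simulate(design, "home", "faq")
    print(name, rate, clicks)
```

Comparing the success rate and mean click count across designs gives the kind of quantitative output the slide mentions; a more realistic simulator would weight link choices by how well the link text matches the visitor's goal.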
Guidelines
• There are many usability guidelines
  • A survey of 21 sets of web guidelines found little overlap (Ratner et al. 96)
  • Why?
    • Our hypothesis: not empirically validated
• So … let's figure out what works!
An Empirical Study: Which features distinguish well-designed web pages?
Methodology
• Data collection
  • 1108 pages
  • 163 sites
  • 3 levels per site
• 14 metrics
  • About 85% accurate
  • Text cluster and text positioning counts less accurate
Metrics
Preliminary Results
• Linear regression to predict Webby judges' ratings
  • Top 30% vs bottom 30%
• Prediction accuracy:
  • 72% if categories not taken into account
  • 83% if categories assessed separately
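The regression setup above can be illustrated with ordinary least squares fitted to page metrics, with the prediction thresholded to separate highly rated from poorly rated pages. Everything specific here is an assumption: the two metrics (word count, link count), the six synthetic pages, the 0.5 threshold, and the function names are invented for illustration and are not the study's data, metrics, or method details.

```python
def fit_least_squares(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y,
    solved with Gauss-Jordan elimination. Returns [intercept, w1, w2, ...]."""
    X = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    n = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(n)]
         + [sum(x[i] * t for x, t in zip(X, targets))]
         for i in range(n)]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]


def predict(weights, row):
    """Linear prediction: intercept plus weighted sum of the metrics."""
    return weights[0] + sum(w * v for w, v in zip(weights[1:], row))


def accuracy(weights, rows, labels, threshold=0.5):
    """Fraction of pages whose thresholded prediction matches the label."""
    hits = sum((predict(weights, r) >= threshold) == bool(label)
               for r, label in zip(rows, labels))
    return hits / len(rows)


# Synthetic pages: (word count, link count). Label 1 = highly rated,
# 0 = poorly rated. Invented values, not data from the study.
pages = [(120, 30), (150, 25), (100, 40), (600, 5), (700, 10), (550, 8)]
labels = [1, 1, 1, 0, 0, 0]

weights = fit_least_squares(pages, labels)
print("weights:", weights)
print("training accuracy:", accuracy(weights, pages, labels))
```

Accuracy on the training pages overstates real performance; a fair estimate would hold out pages or whole sites, and fitting separate models per site category would mirror the "categories assessed separately" condition.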
Goals
• Create empirical foundations for what is still guesswork
• Next step:
  • A free online tool
• Long-term goal:
  • A Monte Carlo simulator for comparing potential designs
For More Information: http://webtango.berkeley.edu, hearst@sims.berkeley.edu