Inside Internet Search Engines: Products
William Chang and Jan Pedersen
SIGIR ’99

Web Oracle One, Two, Three...
• Network of computers?
• Network of hypertext?
• Network of people?
• Internet... is a place where you can always find someone to help answer any question, or get anything done.
• Productize that!

Who’s Who and What’s What?
• Query logs
  • what do people look for, besides sex?
• What are indexable terms? Unbounded?
• Can you index all possible phrases?
• Formatting cues help
• Syntax helps
• Stemming helps
• Precision vs. recall
• WordNet -> PhraseNet?

Who Likes What?
• Too many hits!
  • the problem of indistinguishable scores
• Spamming
  • the relevant and irrelevant
• The web to the rescue
  • inside-out indexing

Citation Index or Popularity Contest?
• Counting hyperlinks
• Avoiding double-counting
• Site clustering; what’s a site?
• Judging the source
• Hyperlinks revisited
• Anchor text context; Yanhong Li
• Why is this result hard to duplicate?
• Does adding more context help?

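One way to read "counting hyperlinks" together with "avoiding double-counting" is to count, for each target page, the number of distinct sites that link to it rather than the number of raw links. The sketch below assumes a deliberately crude answer to "what's a site?": the host name with any leading `www.` removed. The helper names `site_of` and `count_inlinks` and the sample URLs are hypothetical, not taken from the talk.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def site_of(url):
    """Crude site key (an assumption): lowercased host without a leading 'www.'."""
    host = urlsplit(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def count_inlinks(links):
    """links: iterable of (source_url, target_url) pairs.

    Returns {target_url: number of distinct external sites linking to it}.
    """
    linking_sites = defaultdict(set)
    for src, dst in links:
        src_site = site_of(src)
        if src_site != site_of(dst):  # skip links that stay within one site
            linking_sites[dst].add(src_site)  # a set, so each site counts once
    return {dst: len(sites) for dst, sites in linking_sites.items()}


links = [
    ("http://a.example/p1", "http://c.example/"),
    ("http://a.example/p2", "http://c.example/"),
    ("http://www.a.example/p3", "http://c.example/"),
    ("http://b.example/", "http://c.example/"),
    ("http://c.example/about", "http://c.example/"),
]
print(count_inlinks(links))
```

Here five raw links collapse to two votes, because three come from the same site and one is a self-link. A real system would need a better notion of "site" than the host name, which is the question the slide raises.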
Who Asks What?
• Query logs revisited
• Query-based indexing – why index things people don’t ask for?
• If they ask for A, give them B
• From atomic concepts to query extensions
• Structure of questions and answers
• Shyam Kapur’s chunks

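The "query-based indexing" bullet can be illustrated with a toy example: derive a vocabulary from the query log and index only the terms that people actually ask for. This is a minimal sketch under stated assumptions (whitespace tokenization, a hypothetical `min_count` threshold, made-up log entries and documents), not the method described in the talk.

```python
from collections import Counter


def query_vocabulary(query_log, min_count=2):
    """Terms that occur at least min_count times across all logged queries."""
    counts = Counter(term for query in query_log for term in query.lower().split())
    return {term for term, count in counts.items() if count >= min_count}


def build_index(docs, vocabulary):
    """Inverted index restricted to vocabulary: {term: [doc_id, ...]}."""
    index = {}
    for doc_id, text in docs.items():
        for term in set(text.lower().split()):
            if term in vocabulary:
                index.setdefault(term, []).append(doc_id)
    return index


query_log = ["cheap flights", "flights to paris", "paris hotels", "weather"]
docs = {
    1: "Cheap flights and hotels",
    2: "Paris weather today",
    3: "Flights to Paris",
}
vocabulary = query_vocabulary(query_log)
index = build_index(docs, vocabulary)
print(sorted(vocabulary))
print(index)
```

The trade-off is visible even at this scale: the index is smaller, but a first-time query such as "weather" finds nothing until the vocabulary is refreshed from newer logs.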
FAQs and Not So FAQs
• Usenet FAQs – Robin Burke’s FAQFinder
• FAQ discovery
• Where are the answers?

Indexing
• Different ways of crawling the web
• Frequency of change
• Frequency of request
• Managing Terabytes or GigaURLs?
• Real-time indexing

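As an illustration of how "frequency of change" and "frequency of request" could both feed a crawl schedule, the sketch below revisits each page at an interval inversely proportional to its estimated change rate and, when two pages are due at the same time, fetches the more requested one first. The policy, the function name `crawl_schedule`, and the numbers are assumptions made for the example, not something the slides specify.

```python
import heapq


def crawl_schedule(pages, horizon_days):
    """pages: {url: (changes_per_day, requests_per_day)}, with changes_per_day > 0.

    Returns the list of (time_in_days, url) fetches up to horizon_days.
    """
    # Heap entries are (due_time, -requests_per_day, url): earliest due first,
    # ties broken in favour of the more frequently requested page.
    heap = [(0.0, -requests, url) for url, (_, requests) in pages.items()]
    heapq.heapify(heap)
    events = []
    while heap:
        due, neg_requests, url = heapq.heappop(heap)
        if due > horizon_days:
            break
        events.append((due, url))
        changes_per_day, _ = pages[url]
        # Assumed policy: revisit once per expected change.
        heapq.heappush(heap, (due + 1.0 / changes_per_day, neg_requests, url))
    return events


pages = {"news": (2.0, 100), "faq": (0.5, 10)}
for when, url in crawl_schedule(pages, horizon_days=2.0):
    print(f"day {when:.1f}: fetch {url}")
```

With these numbers the fast-changing page is fetched five times in two days and the slow one twice. A production crawler would also have to estimate the change rates themselves and respect per-host politeness limits.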
Searching
• Multiway merge and scoring
• Logical operations
• Query parsing and phrase searching
• Query refinement
• Distributed searching and the perfect merge

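"Multiway merge and scoring" can be sketched as follows: each query term has a postings list sorted by document id, the lists are merged in a single pass, and the weights that land on the same document are summed into a score. The additive scoring, the name `merge_and_score`, and the sample postings are assumptions for illustration only.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def merge_and_score(postings, k=10):
    """postings: {term: [(doc_id, weight), ...]}, each list sorted by doc_id.

    Returns the k highest-scoring (doc_id, score) pairs, best first.
    """
    # Multiway merge: one sorted stream of (doc_id, weight) across all terms.
    merged = heapq.merge(*postings.values(), key=itemgetter(0))
    # Entries for the same document are now adjacent, so sum them per doc_id.
    scores = (
        (doc_id, sum(weight for _, weight in group))
        for doc_id, group in groupby(merged, key=itemgetter(0))
    )
    return heapq.nlargest(k, scores, key=itemgetter(1))


postings = {
    "search": [(1, 0.5), (3, 0.25), (7, 1.0)],
    "engine": [(3, 0.5), (7, 0.25)],
    "index": [(2, 0.5), (7, 0.5)],
}
print(merge_and_score(postings, k=2))
```

Because every list is already sorted, the merge never holds more than one pending entry per term, which is what makes the approach attractive for long postings lists. The same top-k step is a natural place to combine partial results from several index servers, although a "perfect merge" across servers also requires comparable scores.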
Design Issues
• Managing complexity
• Managing memory
• Managing parallelism
• Managing data turnover
• Managing scalability

Futures
• Vertical markets – healthcare, real estate, jobs and resumes, etc.
• Localized search
• Search as embedded app
• Shopping 'bots
• Open Problems
• Has the bubble burst?