Search Engines Vynarack Xaykao INF 385F: WIRED Dr. Turnbull September 30, 2004
Outline • Google’s origins • Marketing your site to search engines • Meta Search Engines (MSEs) • Future of web searching
Google’s Origins • Sergey Brin & Lawrence Page (Stanford U.) • Saw commercial search as a “dark art”: advertiser-driven search engines • Argued it was up to academics to build good engines
Google focused on basic elements of IR • Content: scalability (though perfect recall is impossible) • Relevance: PageRank • Information need • Similar pages • Stemming (bowl, bowling, bowler)
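To make the stemming bullet concrete, here is a minimal sketch of suffix stripping. It is not Google's stemmer and not the Porter algorithm; the suffix list, the function name `naive_stem`, and the minimum stem length are all assumptions chosen so that the slide's three example words reduce to the same stem.

```python
# Hypothetical, deliberately tiny suffix list; real stemmers use many more rules.
SUFFIXES = ("ing", "er", "s")


def naive_stem(word, min_stem_length=3):
    """Strip the first matching suffix, if enough of the word remains."""
    word = word.lower()
    for suffix in SUFFIXES:
        stem = word[: len(word) - len(suffix)]
        # Only strip when the remaining stem is long enough, so that
        # short words such as "sing" are left untouched.
        if word.endswith(suffix) and len(stem) >= min_stem_length:
            return stem
    return word


for example in ("bowl", "bowling", "bowler"):
    print(example, "->", naive_stem(example))
```

With a shared stem, a query for one form can match documents containing the others, at the cost of occasionally conflating unrelated words.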
PageRank Factors • Number of links pointing to a site • PageRanks of referring pages • Discussion: Can you think of a disadvantage of using PageRank to order results?
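The two factors above can be sketched as a simple iterative computation. This is an illustrative approximation, not Google's production code: the function name `pagerank`, the damping value of 0.85, the fixed iteration count, and the even redistribution of rank from pages without outgoing links are assumptions made for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank for a small link graph.

    `links` maps each page to the list of pages it links to.
    """
    # Collect every page that appears as a source or as a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start from a uniform distribution.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on, split among its outgoing links,
                # so both the number of referring pages and their own ranks matter.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: a, b and d all link to c; c links back to a.
example_links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
for page, score in sorted(pagerank(example_links).items()):
    print(page, round(score, 3))
```

In this toy graph, page c ranks highest because three pages point to it, and page a outranks b and d because its single inbound link comes from the highly ranked c.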
Google Ranking • Classify words in hit list by type: relative font size, HTML tags, position • IR score: count-weights & type-weights • Final rank: IR score & PageRank
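A rough sketch of how count-weights and type-weights might combine into an IR score, and how that score might be blended with PageRank. The actual weights and combining function are not given in the slides, so every number below, the hit-type names, the logarithmic taper, and the linear blend with `alpha` are assumptions for illustration only.

```python
import math

# Hypothetical type-weights: a hit in a title counts for more than a hit in plain text.
TYPE_WEIGHTS = {
    "title": 5.0,
    "anchor": 4.0,
    "url": 3.0,
    "large_font": 2.0,
    "plain": 1.0,
}


def count_weight(count, cap=8):
    """Grow with the number of hits, then level off (assumed shape)."""
    return math.log1p(min(count, cap))


def ir_score(hit_counts):
    """Sum of type-weight times count-weight over the hit types of one document."""
    return sum(TYPE_WEIGHTS[hit_type] * count_weight(count)
               for hit_type, count in hit_counts.items())


def final_rank(ir, pagerank, alpha=0.7):
    """Assumed linear blend of the IR score and PageRank."""
    return alpha * ir + (1.0 - alpha) * pagerank


# Hypothetical document: the query word appears once in the title, six times in plain text.
doc_hits = {"title": 1, "plain": 6}
print(final_rank(ir_score(doc_hits), pagerank=0.4))
```

Capping the count-weight means that repeating a keyword many times stops helping after a point, which limits the payoff of keyword stuffing.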
Marketing your site to search engines • Search engine optimization: use keywords • Directory submission & link development • Pay-for-placement campaigns: top position guaranteed (Overture) • Trusted feed and paid inclusion programs: frequent indexing guaranteed, top placement not guaranteed
Meta Search Engines • Search several engines simultaneously • Pros: saves the searcher time; relevant results • Cons: engines accept different syntax; searches can be slow and time out
Types of Meta Search Engines • Real MSEs: combine results from different engines (Vivisimo) • Pseudo MSEs type I: group the results by search engine (My Net Crawler) • Pseudo MSEs type II: open a window for each search engine (Multi-Search-Engine.com) • Search utilities: software that searches engines (Copernic)
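The difference between a real MSE and a type I pseudo MSE can be sketched with two small functions over the same input. This is not how Vivisimo or My Net Crawler work internally; the engine names, the URLs, and the reciprocal-rank scoring used for merging are assumptions picked to keep the example short.

```python
from collections import defaultdict


def merge_results(results_by_engine):
    """Real-MSE style: fuse all ranked lists into one deduplicated ranking.

    Each URL earns 1/position from every engine that returned it
    (an assumed, simple fusion rule).
    """
    scores = defaultdict(float)
    for urls in results_by_engine.values():
        for position, url in enumerate(urls, start=1):
            scores[url] += 1.0 / position
    # Highest combined score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda url: (-scores[url], url))


def group_by_engine(results_by_engine):
    """Pseudo-MSE type I style: keep each engine's list separate."""
    return [(engine, list(urls))
            for engine, urls in sorted(results_by_engine.items())]


# Hypothetical results from two hypothetical engines.
results = {
    "engine_a": ["site.example/1", "site.example/2"],
    "engine_b": ["site.example/2", "site.example/3"],
}
print(merge_results(results))
print(group_by_engine(results))
```

Merging rewards pages that several engines agree on, while grouping leaves the comparison to the searcher.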
Future of Web Searching • Search engines give people starting points • Hard part is using sites themselves • Card & Pirolli’s information foraging theory • Maximum benefit for minimum effort • Information has a scent • Don’t want user to resort to the site search
Next Generation Web Searching “We would like a train system that magically lays down new track to suggest useful directions to go based on where we have been so far and what we are trying to do.” (Hearst, 2002, p. 3) • How?
Metadata • Types of metadata: creation, descriptive, administrative • Good for searching collections of similar items (recipes) • Searching metadata yields higher relevance
Faceted classification • S. R. Ranganathan’s Colon Classification (1933) • Example: design of wooden furniture in 18th-century America • personality : furniture • matter : wood • energy : design • space : America • time : 18th century
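The furniture example above can be written as a small record with one field per facet. This is only a sketch of the idea: the class name `FacetedItem`, the `title` field, and the `matches` helper are hypothetical, and the code does not reproduce Colon Classification notation.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class FacetedItem:
    """One item described by Ranganathan's five facets."""
    title: str        # hypothetical label, not one of the five facets
    personality: str
    matter: str
    energy: str
    space: str
    time: str


def matches(item, **facets):
    """True when the item has the requested value for every named facet."""
    record = asdict(item)
    return all(record[name] == value for name, value in facets.items())


# The slide's example, with an invented title.
item = FacetedItem(
    title="Design of wooden furniture in 18th-century America",
    personality="furniture",
    matter="wood",
    energy="design",
    space="America",
    time="18th century",
)
print(matches(item, matter="wood", space="America"))
print(matches(item, matter="metal"))
```

Because each facet is an independent field, the same item can be reached by material, by place, or by period without a single fixed hierarchy.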
Next Generation Web Searching • Figure out people’s tasks • Ideal site incorporates: metadata using facets for browsing; a search tool for refining
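A minimal sketch of the browse-then-refine pattern, using a recipe collection of the kind the metadata slide mentions. The recipes, the facet names `course` and `cuisine`, and the three helper functions are all made up for illustration.

```python
from collections import Counter

# Hypothetical recipe collection with two facets per item.
RECIPES = [
    {"title": "Apple pie", "course": "dessert", "cuisine": "American"},
    {"title": "Apple chutney", "course": "condiment", "cuisine": "Indian"},
    {"title": "Chocolate cake", "course": "dessert", "cuisine": "American"},
]


def facet_counts(items, facet):
    """How many items carry each value of a facet (drives the browsing menu)."""
    return Counter(item[facet] for item in items if facet in item)


def browse(items, **selected):
    """Keep the items that match every selected facet value."""
    return [item for item in items
            if all(item.get(facet) == value for facet, value in selected.items())]


def refine(items, keyword):
    """Narrow an already browsed set with a case-insensitive title search."""
    needle = keyword.lower()
    return [item for item in items if needle in item["title"].lower()]


print(facet_counts(RECIPES, "course"))
desserts = browse(RECIPES, course="dessert")
print([item["title"] for item in refine(desserts, "apple")])
```

Showing counts next to each facet value tells the user what a click will yield before they commit, and the keyword search then works on a set that is already narrowed to the task.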
Additional References • Page, L., Brin, S., Motwani, R., & Winograd, T. (1999). The PageRank citation ranking: Bringing order to the Web. Retrieved September 29, 2004, from http://dbpubs.stanford.edu:8090/aux/index-en.html • Pirolli, P., & Card, S. K. (1995). Information foraging in information access environments. ACM Conference on Human Factors in Computing Systems (CHI '95), Denver, Colorado, 51–58. • Steckel, M. (2002, October 7). Ranganathan for IAs. Boxes and Arrows. Retrieved September 26, 2004, from http://www.boxesandarrows.com/archives/ranganathan_for_ias.php