Search and the ‘Net at 2004: Trends, Challenges and Cutting-Edge Developments in Internet Search Services • Michael Hunter, Reference Librarian, Hobart and William Smith Colleges • For Rochester Regional Library Council Member Libraries’ Staff • Sponsored by the Rochester Regional Library Council • Supported by Library Services and Technology Act (LSTA) and/or Regional Bibliographic Databases and Resources Sharing (RBDB) funds granted by the New York State Library, 2003
For Today …. • State of the ‘Net and its Users • Search Industry Overview • Recent Developments in Established Services • New Services • The Deep Web at 2004 • Tracking the Living Web: Weblogs and RSS • Cutting-edge Developments • Trends and Challenges to Today’s Search Services
How large is the Web? • What do you mean by the Web? • The totality of all Web sites • Sounds simple …. • BUT IS IT?
UC Berkeley’s How Much Information Project • http://www.sims.berkeley.edu/research/projects/how-much-info-2003/internet.htm • NOTE: 10 terabytes = total print collections of the Library of Congress
“Top Ten” things our users do online • http://www.pewinternet.org
Undergraduates and Search Engines • Colaric, S. “Instruction for Web Searching: An Empirical Study.” College and Research Libraries 64 (2), March 2003, pp. 111-116
The Internet Search Industry: • Consolidation • Performance Measures • Popularity
The Shrinking Search Industry • Editorial control of search is shared among a few companies • Yahoo owns AlltheWeb, Altavista, Inktomi, Overture (paid listings) • Google • MSN • AskJeeves owns Teoma • LookSmart owns Wisenut • Gigablast • NOTE: Ownership is different from database affiliation
Database Freshness • http://www.searchengineshowdown.com/stats/freshness.shtml • Based on a series of 6 current topic searches • Pages that are updated daily • AND report that date on the page • Queries submitted May 17, 2003
Database Freshness • http://www.searchengineshowdown.com/stats/freshness.shtml • Most have some results indexed in the last few days • The bulk of most of the databases is about 1 month old • Some pages may not have been re-indexed for much longer
Popularity: Searches per day • Self-reported data, as of 2/28/03 • http://searchenginewatch.com/reports/article.php/2156461
Google • Froogle • Phonebook • Wildcard Words • Info: • Synonym feature • Supplemental Index • Search by location • News Advanced Search and News Alerts • ???
Froogle • Locates information about products for sale online • Gives URLs of sites offering the item • Provides links to the exact page in the site where you can make the purchase
Froogle • Ranking follows normal Google ranking processes • Paid placements always clearly marked • Price range limits available • Access at http://froogle.google.com or via Google Advanced Search
Phonebook Command Search • Searches US residential (rphonebook:) and business (bphonebook:) listings of Yahoo, MapQuest and other services • rphonebook: MUST INCLUDE last name plus city and/or state; MAY INCLUDE first name • bphonebook: MUST INCLUDE business name (min. 1 word) plus city and/or state; MAY INCLUDE full business name
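The required and optional fields above can be checked before a query is typed in. The following is only a sketch: the helper name `phonebook_query`, its parameters, the word order, and the space after the colon are assumptions made for illustration, not documented Google syntax.

```python
def phonebook_query(kind, name, city=None, state=None, first_name=None):
    """Build a phonebook command query string (hypothetical helper).

    kind: "r" for residential (rphonebook:) or "b" for business (bphonebook:).
    name: last name (residential) or business name (business).
    """
    if kind not in ("r", "b"):
        raise ValueError("kind must be 'r' or 'b'")
    if not name or not name.strip():
        raise ValueError("a last name or business name is required")
    if not (city or state):
        raise ValueError("city and/or state is required")
    if kind == "b" and first_name:
        raise ValueError("first_name applies only to residential searches")

    parts = []
    if first_name:
        parts.append(first_name.strip())  # optional for residential searches
    parts.append(name.strip())
    parts.extend(p.strip() for p in (city, state) if p)
    prefix = "rphonebook:" if kind == "r" else "bphonebook:"
    # Assumed layout: command, a space, then the terms separated by spaces.
    return prefix + " " + " ".join(parts)
```

For example, `phonebook_query("r", "Hunter", city="Geneva", state="NY")` builds a residential query from a last name, city and state, while omitting both city and state raises an error.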
Wildcard Words • Google offers a word-sized asterisk to function as a wildcard • Stands for a whole word • Cannot be used for part of a word • “three * mice” = 22,000 • “three bl* mice” = 0
Wildcard Words • Several * can be used together: milosevic “International * * Hague” • Retrieves military tribunal OR military court OR war tribunal OR military tribunal
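The whole-word rule can be tried out on local text. This is a minimal sketch that emulates the behavior described on these slides with a regular expression; it is not how Google matches internally, and the function name `wildcard_phrase_to_regex` is made up for the example.

```python
import re

def wildcard_phrase_to_regex(phrase):
    """Compile a phrase in which a lone * stands for exactly one whole word."""
    pieces = []
    for token in phrase.split():
        if token == "*":
            pieces.append(r"\w+")             # one whole word
        else:
            pieces.append(re.escape(token))   # "bl*" stays literal: no partial-word wildcard
    # Lookarounds keep the match from starting or ending inside a longer word.
    return re.compile(r"(?<!\w)" + r"\s+".join(pieces) + r"(?!\w)", re.IGNORECASE)

whole_word = wildcard_phrase_to_regex("three * mice")
partial = wildcard_phrase_to_regex("three bl* mice")
print(bool(whole_word.search("Three blind mice ran")))  # whole-word wildcard matches
print(bool(partial.search("three blind mice")))         # partial-word wildcard does not
```

Several wildcards in a row each consume one word, so "International * * Hague" only matches when exactly two words sit between the fixed terms.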
info: • Not exactly hidden, but not well-known • Searches for any information Google has about a site • Convenient way to monitor linkage • Typing a URL in the search box will give the same results
Synonym Feature • Place a tilde ~ immediately before a term to retrieve synonyms or related terms from the Google Index • Eliminate the original term by placing a minus sign before it. ~hiking -hiking
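A small sketch of composing such queries for several terms at once. The helper `related_terms_query` is hypothetical; it only assembles the query string following the two rules on this slide (tilde immediately before the term, minus sign before the term to drop the original).

```python
def related_terms_query(terms, exclude_original=True):
    """Prefix each term with ~ and, optionally, exclude the term itself."""
    parts = []
    for term in terms:
        term = term.strip()
        if not term or any(ch.isspace() for ch in term):
            # The tilde must sit immediately before a single word.
            raise ValueError("each term must be a single word")
        parts.append("~" + term)
        if exclude_original:
            parts.append("-" + term)
    return " ".join(parts)

print(related_terms_query(["hiking"]))  # ~hiking -hiking
```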
Google’s Supplemental Index • For obscure or unusual searches • Queried when Google fails to find good matches within its main web index. • Live 9/9/03 • Sample queries: • “St. Andrews United Methodist Church” Homewood IL • “nalanda residential junior college” alumni • “illegal access error” jdk 1.2b4 • supercilious supernovas
Search by Location (beta) • http://labs.google.com/location • U.S. only • Keyword(s) combined with address, city, state or zip • Search results appear on a map
News Advanced Search and News Alerts • Advanced News Search added this Fall • News Alerts • Requires a (free) account • One query per alert; limit of 50 alerts per e-mail address • Alerts contain links to news containing your alert keywords • Cannot edit a query; delete and create a new one instead • Alerts sent once a day or “as it happens”
More about Google…. • Google World http://indicateur.com • Maintained by a French search engine site and listed under Guides. Use the Google translator (see Language Tools) to translate the site. • Google Lab http://labs.google.com • Place for cutting edge developments, many in beta awaiting user feedback and testing.
Beyond Google: AskJeeves • Simpler, cleaner interface • Teoma crawler-based results blended with AJ “answers” • Improved image database • “Smart Answers” • Popular queries mapped to news, image and other sources “appropriate to the query”
ATW (FAST) • http://alltheweb.com • Continued commitment to a large database (2nd to Google) • Powerful, new advanced search capabilities • Extensive page customization options • Results clustered by topic (“Folders”) • Both HTML and Multimedia given, when available • NOTE: Folders located at the BOTTOM of each results screen
Altavista • Simpler interface • More language options • Expanded image and multimedia collections • Results labeled “Refreshed in last 48 hours” • Includes PDF files • “US” and “Local” search options • “Prisma” query refinement
Altavista Prisma Query Refinement • Offers a maximum of 12 terms having the strongest associations with the original query term(s) • Selected from the top 50 results of the original query • NOTE: Clicking on a “Prisma” term adds it to your original query, creating a new set of Prisma terms. • Similar to Refine (1997) but less graphic
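The refinement loop described here can be sketched in a few lines. Altavista's actual association measure is not given in this presentation, so the example substitutes a simple stand-in: count how many of the top results contain each non-query word. The function names, the stopword list, and the sample result texts are all illustrative assumptions.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not Altavista's).
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "for", "on", "is", "with"}

def refinement_terms(query, results, max_terms=12, top_n=50):
    """Suggest up to max_terms words that co-occur with the query in the top_n results."""
    query_words = set(re.findall(r"[a-z]+", query.lower()))
    counts = Counter()
    for text in results[:top_n]:
        words = set(re.findall(r"[a-z]+", text.lower()))  # count each word once per result
        counts.update(words - query_words - STOPWORDS)
    # Most frequent first; ties broken alphabetically so output is stable.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:max_terms]]

def refine(query, term):
    """Clicking a suggested term adds it to the original query."""
    return f"{query} {term}"

results = ["jaguar car dealer", "jaguar car review", "jaguar cat habitat"]
print(refinement_terms("jaguar", results))
print(refine("jaguar", "car"))
```

Running the refined query through `refinement_terms` again, against its own top results, would yield the new set of suggested terms.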
Teoma • Ranking Includes a site’s relationship to other sites with similar content • Results • Ranked database results, with “Related Pages” • Refine • Clustering of your results and other related sites based on term relationships and web community linkages derived from your original results • Resources • “Link Collections from experts and enthusiasts” (Subject metasites)
Hotbot • Searches Hotbot (Inktomi) OR Google OR Lycos OR AskJeeves • Not a true metaengine • Advanced features operable only if supported by source engines
Metacrawler • Along with Dogpile and Webcrawler, owned by Infospace • Simpler interface • Offers the following customizations: • Selection of sources searched • Total number of results retrieved • Length of search (“time-out period”) • Offers a wide range of vertical searches: Images, MP3, Shopping, Subject Directory, Multimedia, News, Message Boards
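The three customizations listed above (sources searched, total results, time-out period) map naturally onto a small metasearch routine. The following is a sketch under assumptions, not Metacrawler's implementation: each source is a hypothetical callable that takes a query and returns a list of URLs, and results are merged in source order with duplicate URLs dropped.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait

def metasearch(query, sources, selected=None, max_results=20, timeout=2.0):
    """Query the selected sources in parallel and merge whatever returns in time.

    sources: dict mapping a source name to a callable(query) -> list of URLs.
    """
    names = [n for n in sources if selected is None or n in selected]
    executor = ThreadPoolExecutor(max_workers=max(1, len(names)))
    futures = {executor.submit(sources[n], query): n for n in names}
    done, _not_done = wait(futures, timeout=timeout)
    executor.shutdown(wait=False)  # do not block on sources that timed out

    merged, seen = [], set()
    for future, name in futures.items():  # submission order keeps output stable
        if future not in done or future.exception() is not None:
            continue  # skip sources that timed out or failed
        for url in future.result():
            if url not in seen:
                seen.add(url)
                merged.append((url, name))
    return merged[:max_results]

# Stand-in sources for demonstration only.
def fast(query):
    return ["http://a.example", "http://b.example"]

def slow(query):
    time.sleep(1.0)
    return ["http://z.example"]

print(metasearch("pope canterbury", {"fast": fast, "slow": slow}, timeout=0.2))
```

A short time-out trades completeness for speed: the slow source above is simply left out of the merged list.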
Gigablast • Launched April, 2002 • Smaller database than others • Over 200 million on 10/4/03 • pope canterbury: Google 83,200; Gigablast 24,919 • Created and maintained by Matt Wells (alone) • Only search engine “continuously updated with index refreshed in real time” (Site submissions are immediately searchable) • Ranking depends less on linkage than Google’s ranking, to avoid penalizing newer pages. • No advertising (to date)
Gigablast Search Features • Basic search: Full Boolean • Advanced Search: Full Boolean and 2 (!) phrase boxes • Limit by site • Limit by domain (URL) • Links to a page available • Most “generic” html metatags indexed, searched and made available for display • Unique to Gigablast!!!
Gigablast Search Features • Field searches include title, IP address and non-html filetypes: • PDF, Word, Excel, PPT, PostScript, ASCII Text • Results from one site clustered • Cached version available • Results include date indexed and last modified (!!) • Linking to Gigablast improves ranking there
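Clustering results from one site can be illustrated by grouping URLs on their host name. This is a generic sketch, assuming the host alone identifies a site; it makes no claim about how Gigablast itself defines or detects one, and `cluster_by_site` is an illustrative name.

```python
from urllib.parse import urlsplit

def cluster_by_site(urls):
    """Group result URLs by lower-cased host name, keeping the original order."""
    clusters = {}
    for url in urls:
        host = urlsplit(url).netloc.lower()
        clusters.setdefault(host, []).append(url)
    return clusters

hits = ["http://a.example/1", "http://b.example/x", "http://A.example/2"]
for site, pages in cluster_by_site(hits).items():
    print(site, len(pages))
```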
KillerInfo • http://www.killerinfo.com • Metaengine searching Google, AOL, Lycos, Gigablast, MSN, Altavista, LookSmart and Open Directory • 9 topical Deep Web channels offered • Boolean and phrase search • No other Advanced Search features • Results clustering (a la Vivisimo) • Number of results not given • Adult content filter
Surfwax • http://surfwax.com • Demo site for federated search software • Simultaneous search of Deep Web, Intranets, Web and more • Metaengine searches Wisenut, AOL, MSN, Yahoo, Encarta, CNN, LookSmart • FOCUS search refinement feature • Online thesaurus of related terms and definitions