
Michael Hunter Reference Librarian Hobart and William Smith Colleges

Search and the ‘Net at 2004: Trends, Challenges and Cutting-Edge Developments in Internet Search Services. Michael Hunter, Reference Librarian, Hobart and William Smith Colleges, for Rochester Regional Library Council Member Libraries’ Staff. Sponsored by the Rochester Regional Library Council.

Presentation Transcript


  1. Search and the ‘Net at 2004: Trends, Challenges and Cutting-Edge Developments in Internet Search Services • Michael Hunter, Reference Librarian, Hobart and William Smith Colleges • for Rochester Regional Library Council Member Libraries’ Staff • Sponsored by the Rochester Regional Library Council • Supported by Library Services and Technology Act (LSTA) and/or Regional Bibliographic Databases and Resources Sharing (RBDB) funds granted by the New York State Library • 2003

  2. For Today …. • State of the ‘Net and its Users • Search Industry Overview • Recent Developments in Established Services • New Services • The Deep Web at 2004 • Tracking the Living Web: Weblogs and RSS • Cutting-edge Developments • Trends and Challenges to Today’s Search Services

  3. The Internet and its Users at 2004

  4. How large is the Web? • What do you mean by the Web? • The totality of all Web sites • Sounds simple …. • BUT IS IT?

  5. UC Berkeley’s How Much Information Project • http://www.sims.berkeley.edu/research/projects/how-much-info-2003/internet.htm • NOTE: 10 terabytes = total print collections of the Library of Congress

  6. Internet Use Worldwide

  7. Internet Use in the US • http://www.pewinternet.org

  8. Internet Use in the US • http://www.pewinternet.org

  9. “Top Ten” Things Our Users Do Online • http://www.pewinternet.org

  10. “Top Ten” Things Our Users Do Online • http://www.pewinternet.org

  11. Undergraduates and Search Engines • Colaric, S. “Instruction for Web Searching: An Empirical Study.” College and Research Libraries 64 (2), March 2003, pp. 111-116

  12. The Internet Search Industry: Consolidation • Performance Measures • Popularity

  13. The Shrinking Search Industry • Editorial control of search is shared among a few companies • Yahoo owns AlltheWeb, Altavista, Inktomi and Overture (paid listings) • Google • MSN • AskJeeves owns Teoma • LookSmart owns Wisenut • Gigablast • NOTE: Ownership is different from database affiliation

  14. Google Database Affiliates

  15. Database Freshness • http://www.searchengineshowdown.com/stats/freshness.shtml • Based on a series of 6 current-topic searches • Targets pages that are updated daily AND display that date on the page • Queries submitted May 17, 2003

  16. Database Freshness • http://www.searchengineshowdown.com/stats/freshness.shtml • Most have some results indexed in the last few days • The bulk of most of the databases is about 1 month old • Some pages may not have been re-indexed for much longer
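
Because the test pages change daily and print their date, the date visible in an engine’s indexed copy roughly marks when that copy was crawled. A minimal Python sketch of the arithmetic, using the May 17, 2003 query date from the slide and hypothetical sample dates (not Search Engine Showdown’s data), turns those dates into ages:

```python
from datetime import date
from statistics import median

# Date the test queries were run (from the slide); sample dates below are hypothetical.
QUERY_DATE = date(2003, 5, 17)


def page_ages(reported_dates, query_date=QUERY_DATE):
    """Age in days of each indexed copy, judged by the date printed on the page."""
    return [(query_date - d).days for d in reported_dates]


def freshness_summary(reported_dates, query_date=QUERY_DATE):
    """Newest, median and oldest age for one engine's set of results."""
    ages = page_ages(reported_dates, query_date)
    return {"newest": min(ages), "median": median(ages), "oldest": max(ages)}


if __name__ == "__main__":
    sample = [date(2003, 5, 15), date(2003, 4, 20), date(2003, 4, 12), date(2003, 1, 2)]
    print(freshness_summary(sample))  # {'newest': 2, 'median': 31.0, 'oldest': 135}
```

A database whose “bulk is about 1 month old” would show a median age near 30 days under this kind of summary, while a few very large ages correspond to pages that have not been re-indexed for much longer.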

  17. Popularity: Searches per Day • Self-reported data, as of 2/28/03 • http://searchenginewatch.com/reports/article.php/2156461

  18. Recent Developments among Established Services

  19. Google • Froogle • Phonebook • Wildcard Words • Info: • Synonym feature • Supplemental Index • Search by location • News Advanced Search and News Alerts • ???

  20. Froogle • Locates information about products for sale online • Gives URLs of sites offering the item • Provides links to the exact page in the site where you can make the purchase

  21. Froogle • Ranking follows normal Google ranking processes • Paid placements always clearly marked • Price range limits available • Access at http://froogle.google.com or via Google Advanced Search

  22. Phonebook Command Search • Searches US residential (rphonebook:) and business (bphonebook:) listings of Yahoo, MapQuest and other services • rphonebook: • MUST INCLUDE • Last name + City and/or State • MAY INCLUDE • First name • bphonebook: • MUST INCLUDE • Business name (min. 1 word) + City and/or State • MAY INCLUDE • Full business name
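
The MUST/MAY rules above amount to a small check before the query is typed. The Python sketch below applies them; the helper names are hypothetical, and the order of terms inside the query string (first name before last name, then city and state) is an assumption, not documented Google syntax.

```python
def rphonebook_query(last_name, city=None, state=None, first_name=None):
    """Residential listing (hypothetical helper): last name AND city and/or state required."""
    if not last_name:
        raise ValueError("rphonebook: needs a last name")
    if not (city or state):
        raise ValueError("rphonebook: needs a city and/or state")
    terms = [first_name, last_name, city, state]  # assumed term order
    return "rphonebook:" + " ".join(t for t in terms if t)


def bphonebook_query(business_name, city=None, state=None):
    """Business listing (hypothetical helper): 1+ word of the name AND city and/or state."""
    if not business_name or not business_name.split():
        raise ValueError("bphonebook: needs at least one word of the business name")
    if not (city or state):
        raise ValueError("bphonebook: needs a city and/or state")
    terms = [business_name, city, state]
    return "bphonebook:" + " ".join(t for t in terms if t)


if __name__ == "__main__":
    print(rphonebook_query("Smith", city="Rochester", state="NY", first_name="Jane"))
    print(bphonebook_query("Acme Hardware", city="Rochester"))
```

Leaving out the city and state, for example, raises an error instead of producing a query the slide says will not work.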

  23. Wildcard Words • Google offers a word-sized asterisk (*) that functions as a wildcard • Stands for a whole word • Cannot be used for part of a word • “three * mice” = 22,000 results • “three bl* mice” = 0 results

  24. Wildcard Words • Several * can be used together • Example: milosevic “International * * Hague” retrieves military tribunal OR military court OR war tribunal
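
The two wildcard slides describe one rule: a standalone * stands for exactly one whole word, and a * attached to letters is not a wildcard at all. This Python sketch models that rule over plain text; it illustrates the described behavior rather than Google’s matcher, and it ignores stop words and punctuation.

```python
import re


def tokenize(text):
    """Lower-case words; punctuation is dropped."""
    return re.findall(r"[a-z0-9']+", text.lower())


def phrase_matches(pattern, text):
    """True if text contains the phrase, where each standalone * is exactly one word.

    A token such as "bl*" is compared literally, so it never matches a real word,
    mirroring the slide's "three bl* mice" = 0 result.
    """
    pat = pattern.lower().split()
    words = tokenize(text)
    n = len(pat)
    for i in range(len(words) - n + 1):
        window = words[i:i + n]
        if all(p == "*" or p == w for p, w in zip(pat, window)):
            return True
    return False


if __name__ == "__main__":
    rhyme = "Three blind mice, see how they run"
    print(phrase_matches("three * mice", rhyme))     # True
    print(phrase_matches("three bl* mice", rhyme))   # False
    print(phrase_matches("International * * Hague", "the International War Tribunal (Hague)"))  # True
```

Because each * fills exactly one slot, “International * * Hague” needs exactly two words between “International” and “Hague”, which is why several asterisks can be chained to cover multi-word gaps.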

  25. info: • Not exactly hidden, but not well-known • Searches for any information Google has about a site • Convenient way to monitor linkage • Typing a URL in the search box will give the same results

  26. Synonym Feature • Place a tilde ~ immediately before a term to retrieve synonyms or related terms from the Google index • Eliminate the original term by placing a minus sign before it • Example: ~hiking -hiking
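
The effect of ~hiking -hiking can be pictured as two filters: accept a page that contains any term related to hiking, then reject it if it contains “hiking” itself. In this Python sketch the RELATED table is hypothetical (Google’s synonym data is not published), and only plain terms, ~term and -term are handled.

```python
# Hypothetical related-terms table; Google's actual synonym data is not public.
RELATED = {"hiking": {"hiking", "trekking", "backpacking", "trails"}}


def matches(query, text):
    """Evaluate a query of plain terms, ~term (term or related) and -term (exclude)."""
    words = set(text.lower().split())
    for token in query.lower().split():
        if token.startswith("-"):
            if token[1:] in words:          # excluded term present: reject
                return False
        elif token.startswith("~"):
            term = token[1:]
            if not (RELATED.get(term, {term}) & words):  # no related term present
                return False
        elif token not in words:            # plain term must appear
            return False
    return True


if __name__ == "__main__":
    print(matches("~hiking -hiking", "backpacking gear for beginners"))  # True
    print(matches("~hiking -hiking", "hiking trails near Geneva"))       # False
```

The combination surfaces pages that use only the related vocabulary, which is a quick way to discover terms worth adding to a search.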

  27. Google’s Supplemental Index • For obscure or unusual searches • Queried when Google fails to find good matches within its main web index. • Live 9/9/03 • Sample queries: • “St. Andrews United Methodist Church” Homewood IL • “nalanda residential junior college” alumni • “illegal access error” jdk 1.2b4 • supercilious supernovas

  28. Search by Location (beta) • http://labs.google.com/location • U.S. only • Keyword(s) combined with address, city, state or zip • Search results appear on a map

  29. News Advanced Search and News Alerts • Advanced News Search added this fall • News Alerts • Requires a (free) account • One query per alert; limit of 50 alerts per e-mail address • Alerts contain links to news containing your alert keywords • Cannot edit a query; delete and create a new one instead • Alerts sent once a day or “as it happens”

  30. More about Google… • Google World http://indicateur.com • Maintained by a French search engine site and listed under Guides (use Google’s translator, under Language Tools, to translate the site) • Google Labs http://labs.google.com • Place for cutting-edge developments, many in beta awaiting user feedback and testing

  31. Beyond Google: AskJeeves • Simpler, cleaner interface • Teoma crawler-based results blended with AJ “answers” • Improved image database • “Smart Answers” • Popular queries mapped to news, image and other sources “appropriate to the query”

  32. ATW (FAST) • http://alltheweb.com • Continued commitment to a large database (second in size only to Google) • Powerful new advanced search capabilities • Extensive page customization options • Results clustered by topic (“Folders”) • Both HTML and multimedia results given, when available • NOTE: Folders are located at the BOTTOM of each results screen

  33. Altavista • Simpler interface • More language options • Expanded image and multimedia collections • Results labeled “Refreshed in last 48 hours” • Includes PDF files • “US” and “Local” search options • “Prisma” query refinement

  34. Altavista Prisma Query Refinement • Offers a maximum of 12 terms having the strongest associations with the original query term(s) • Selected from the top 50 results of the original query • NOTE: Clicking on a “Prisma” term adds it to your original query, creating a new set of Prisma terms • Similar to Refine (1997) but less graphic
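
Prisma’s behavior as described (at most 12 terms, drawn from the top 50 results, with each click appending a term to the query) can be sketched in Python. Altavista did not publish how it measured “strongest association”; the sketch substitutes a simple stand-in, counting how many of the top results contain each term, and the stop-word list is hypothetical.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real system would use a much larger one.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "to", "for", "on", "is"}


def prisma_terms(query, results, top_n=50, max_terms=12):
    """Suggest up to max_terms refinement terms from the first top_n result texts.

    Association is approximated by document frequency: the number of results
    that contain the term. Ties are broken alphabetically so output is stable.
    """
    query_terms = set(query.lower().split())
    counts = Counter()
    for text in results[:top_n]:
        words = set(re.findall(r"[a-z]+", text.lower()))
        counts.update(words - query_terms - STOPWORDS)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:max_terms]]


def refine(query, term):
    """Clicking a suggested term adds it to the original query."""
    return f"{query} {term}"


if __name__ == "__main__":
    results = ["Jaguar cat habitat", "Jaguar car dealer", "jaguar cat speed", "Jaguar habitat and rainforest"]
    terms = prisma_terms("jaguar", results)
    print(terms)                       # ['cat', 'habitat', 'car', 'dealer', 'rainforest', 'speed']
    print(refine("jaguar", terms[0]))  # jaguar cat
```

Running prisma_terms again on the results of the refined query yields a fresh suggestion list, matching the slide’s note that each click creates a new set of Prisma terms.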

  35. Teoma • Ranking includes a site’s relationship to other sites with similar content • Results • Ranked database results, with “Related Pages” • Refine • Clustering of your results and other related sites based on term relationships and web community linkages derived from your original results • Resources • “Link Collections from experts and enthusiasts” (subject metasites)

  36. Hotbot • Searches Hotbot (Inktomi) OR Google OR Lycos OR AskJeeves • Not a true metaengine • Advanced features operable only if supported by source engines

  37. Metacrawler • Along with Dogpile and Webcrawler, owned by Infospace • Simpler interface • Offers the following customizations: • Selection of sources searched • Total number of results retrieved • Length of search (“time-out period”) • Offers a wide range of vertical searches: Images, MP3, Shopping, Subject Directory, Multimedia, News, Message Boards
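
Metacrawler’s three customizations (which sources to search, how many results to keep, and a time-out period) map directly onto a parallel fan-out search. The Python sketch below uses hypothetical stand-in engine functions rather than real services; the round-robin merge with duplicate removal is one common choice, assumed here, not Metacrawler’s documented ranking.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait
from itertools import zip_longest


def metasearch(query, engines, sources=None, max_results=20, timeout=2.0):
    """Query selected engines in parallel; drop any that miss the time-out."""
    selected = {n: f for n, f in engines.items() if sources is None or n in sources}
    pool = ThreadPoolExecutor(max_workers=max(1, len(selected)))
    futures = {pool.submit(fn, query): name for name, fn in selected.items()}
    done, _ = wait(futures, timeout=timeout)
    pool.shutdown(wait=False)  # do not block on engines that timed out
    # Keep only engines that finished in time without raising, in source order.
    lists = [f.result() for f in futures if f in done and f.exception() is None]
    merged, seen = [], set()
    for group in zip_longest(*lists):  # round-robin: 1st of each, then 2nd, ...
        for url in group:
            if url is not None and url not in seen:
                seen.add(url)
                merged.append(url)
    return merged[:max_results]


# Stand-in engines (hypothetical) so the sketch runs offline.
def engine_a(q):
    return [f"http://a.example/{q}", "http://shared.example/"]


def engine_b(q):
    return ["http://shared.example/", f"http://b.example/{q}"]


def engine_slow(q):
    time.sleep(1.0)
    return ["http://slow.example/"]


ENGINES = {"a": engine_a, "b": engine_b, "slow": engine_slow}

if __name__ == "__main__":
    print(metasearch("owls", ENGINES, timeout=0.2))
```

A short time-out trades completeness for speed: the slow source is simply left out of the merged list rather than holding up the whole search.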

  38. New Services Attracting Attention

  39. Gigablast • Launched April 2002 • Smaller database than others • Over 200 million pages as of 10/4/03 • Sample query pope canterbury: Google 83,200 results; Gigablast 24,919 • Created and maintained by Matt Wells (alone) • Only search engine “continuously updated with index refreshed in real time” (site submissions are immediately searchable) • Ranking depends less on linkage than Google’s does, to avoid penalizing newer pages • No advertising (to date)
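
The contrast here, between databases rebuilt from periodic crawls (compare the month-old bulk noted in the freshness slides) and an index where a submitted page is searchable at once, can be illustrated with a toy inverted index. This Python sketch shows the generic technique only; nothing in the slides describes Gigablast’s actual internals.

```python
import re
from collections import defaultdict


def terms_of(text):
    return re.findall(r"[a-z0-9]+", text.lower())


class LiveIndex:
    """Toy inverted index: a submitted page is searchable as soon as add_page returns."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> URLs containing it
        self.pages = {}                   # URL -> its terms, for re-indexing

    def add_page(self, url, text):
        # Re-submitting a URL first removes its old terms.
        for term in self.pages.pop(url, set()):
            self.postings[term].discard(url)
        terms = set(terms_of(text))
        self.pages[url] = terms
        for term in terms:
            self.postings[term].add(url)

    def search(self, query):
        """Return URLs containing every query term (implicit AND)."""
        terms = terms_of(query)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


if __name__ == "__main__":
    idx = LiveIndex()
    idx.add_page("http://x.example/", "Pope visits Canterbury")
    print(idx.search("pope canterbury"))  # {'http://x.example/'}
```

In a batch design the same page would wait for the next crawl and rebuild before appearing; here the postings are updated in place, so there is no gap between submission and searchability.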

  40. Gigablast Search Features • Basic search: full Boolean • Advanced Search: full Boolean and 2 (!) phrase boxes • Limit by site • Limit by domain (URL) • Links to a page available • Most “generic” HTML metatags indexed, searched and made available for display • Unique to Gigablast!!!

  41. Gigablast Search Features • Field searches include title, IP address and non-HTML file types: • PDF, Word, Excel, PPT, PostScript, ASCII text • Results from one site clustered • Cached version available • Results include date indexed and last modified (!!) • Linking to Gigablast improves ranking there
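
“Results from one site clustered” means that pages from the same host are gathered together instead of crowding the result list. A minimal Python sketch, assuming grouping by host name and a hypothetical per-site cap (the example URLs are invented):

```python
from urllib.parse import urlsplit


def cluster_by_site(urls, per_site=2):
    """Group ranked URLs by host, keeping rank order and at most per_site per host.

    Hosts appear in the order of their best-ranked result.
    """
    clusters = {}
    for url in urls:
        host = urlsplit(url).hostname or ""
        hits = clusters.setdefault(host, [])
        if len(hits) < per_site:
            hits.append(url)
    return clusters


if __name__ == "__main__":
    ranked = [
        "http://www.example.edu/library/",
        "http://example.org/search",
        "http://www.example.edu/library/hours",
        "http://www.example.edu/library/staff",
    ]
    for host, hits in cluster_by_site(ranked).items():
        print(host)
        for url in hits:
            print("   ", url)
```

The cap keeps one large site from pushing other sites off the first screen, while the grouped layout still shows that more pages from that site exist.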

  42. KillerInfo • http://www.killerinfo.com • Metaengine searching Google, AOL, Lycos, Gigablast, MSN, Altavista, LookSmart and Open Directory • 9 topical Deep Web channels offered • Boolean and phrase search • No other Advanced Search features • Results clustering (à la Vivisimo) • Number of results not given • Adult content filter

  43. Surfwax • http://surfwax.com • Demo site for federated search software • Simultaneous search of the Deep Web, intranets, the Web and more • Metaengine searches Wisenut, AOL, MSN, Yahoo, Encarta, CNN, LookSmart • FOCUS search refinement feature • Online thesaurus of related terms and definitions
