Week 9 Search Engines and the Invisible Web
Resource Pages • Collections of Links • Compiled by “experts” • Sometimes annotated • Targeted Information for a Specific User Group • Examples: • Voice of the Shuttle: http://vos.ucsb.edu/ • Computer Science Research Guide: http://guides.library.cmu.edu/SCS
Anatomy of a Search Engine • Basically, there are three parts to a search engine: • “Spider” or “Crawler” • - Finds the pages • - Brings them home • “Index” or “Database” • - Storehouse of pages • - Size matters, frequency of updates matters • “Search Tool” • - What we use to find the pages in the engine’s index • - This is the user interface; the only part we see
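The three parts can be sketched in a few lines of Python. This is only a toy illustration, not how any real engine is built: the `PAGES` dictionary is a hypothetical stand-in for what a crawler would have brought home, and the function names `build_index` and `search` are made up for this example.

```python
from collections import defaultdict

# Hypothetical stand-in for pages a crawler has already fetched (URL -> text).
PAGES = {
    "http://example.org/a": "deep web databases hold patents and clinical trials",
    "http://example.org/b": "search engines crawl static web pages",
    "http://example.org/c": "patents are stored in searchable databases",
}


def build_index(pages):
    """The 'index': map each word to the set of URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """The 'search tool': return the URLs that contain every query word."""
    words = query.lower().split()
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)


INDEX = build_index(PAGES)
print(search(INDEX, "patents databases"))
```

A page that was never crawled never enters the index, so the search tool cannot return it, however well it matches the query.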
How Search Engines Rank Pages • Relevance retrieval • Location of search terms • Frequency of search terms • Meta-tags (in the HTML source code of a Web page)
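A rough sketch of how term location and term frequency might combine into a score. The weighting (`title_weight=3.0`), the page records, and the `score` function are assumptions made for illustration; real engines do not publish their formulas.

```python
def score(page, term, title_weight=3.0):
    """Count occurrences of term, weighting title hits above body hits.

    title_weight is an arbitrary illustrative value, not a real engine setting.
    """
    term = term.lower()
    title_hits = page["title"].lower().split().count(term)
    body_hits = page["body"].lower().split().count(term)
    return title_weight * title_hits + body_hits


# Hypothetical pages used only for this example.
pages = [
    {
        "url": "http://example.org/news",
        "title": "Weekly news",
        "body": "a story that mentions patents once",
    },
    {
        "url": "http://example.org/patents",
        "title": "Patents database",
        "body": "search patents by number or inventor",
    },
]

# Highest score first.
ranked = sorted(pages, key=lambda p: score(p, "patents"), reverse=True)
for page in ranked:
    print(page["url"], score(page, "patents"))
```

Under these assumptions, a page that uses the term in its title outranks one that mentions it only in passing in the body.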
Other Ranking Methods • Positions of Words • Term Co-Occurrence • Proximity • Pay for Placement • “Featured Web Sites!” • Link Analysis • Search Engine Showdown Chart: http://www.searchengineshowdown.com/features/
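Proximity is the easiest of these to show concretely. The sketch below measures the smallest distance, in words, between two search terms on a page; a smaller distance would suggest a closer match. The function name `min_distance` is hypothetical and the approach is a simplification, not any engine's actual method.

```python
def min_distance(text, term_a, term_b):
    """Return the smallest word distance between two terms, or None if either is absent."""
    words = text.lower().split()
    positions_a = [i for i, w in enumerate(words) if w == term_a.lower()]
    positions_b = [i for i, w in enumerate(words) if w == term_b.lower()]
    if not positions_a or not positions_b:
        return None
    return min(abs(i - j) for i in positions_a for j in positions_b)


print(min_distance("deep web search tools", "deep", "search"))
```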
What Many Search Engines Cannot Find • Some file types (support varies from engine to engine) • Dynamically-generated pages • Pages locked behind firewalls or in fee-based online databases (such as Dialog) • Much of the “Deep Web”: http://www.completeplanet.com
Differences Between the “Deep Web” and Search Engine Results • The Deep Web is another phrase for the Invisible Web • Deep Web resources usually: • Are subject specific / more focused • Hold less content, but content that tends to be of higher quality • Are updated more frequently • Have specialized search interfaces • Have a target audience in mind
Overview of the Deep Web • What is Still Invisible: • Disconnected, loose pages • Password-protected pages and sites • Pages excluded by “robots.txt” files • Dynamically-created pages: no static URLs • Information bound in database structures that are uncrawlable by many search engines
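To see how a “robots.txt” file keeps pages out of an index, the sketch below uses Python's standard `urllib.robotparser` module to check URLs the way a well-behaved crawler would. The robots.txt content, the `example.org` URLs, and the `ExampleBot` agent name are all made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: every crawler is asked to stay out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(url, agent="ExampleBot"):
    """Return True if the rules above permit this agent to fetch the URL."""
    return parser.can_fetch(agent, url)


print(allowed("http://example.org/index.html"))
print(allowed("http://example.org/private/report.html"))
```

A crawler that honors these rules skips everything under `/private/`, so those pages stay invisible to searchers even though they are publicly reachable by URL.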
When to Consider the Deep Web • When you are familiar with a topic • When you want authoritative information • When you want specific information • When you want timely information
Popular Deep Web Information • Clinical Trials • Environmental Information • Grant Information • Historical Documents and Images • Art Collections • Patents • Demographic and Economic Data • Government Information
Look at Some Deep Web Resources • Salary.com Database: http://www.ecomponline.com/ • U.S. Patent & Trademark Office: http://www.uspto.gov • Los Angeles Municipal Code: http://www.municode.com/Library/clientCodePage.aspx?clientID=6662
How to Find the Deep Web • Use a search engine: add “database” as a search term • Use a print directory: try OCLC WorldCat to find those specific to your subject need • Ask your colleagues • Watch for mentions in the professional literature
How to Find the Deep Web (cont.) • Use Alerting Services: • The Scout Report (Internet Scout Project): http://scout.wisc.edu/ • INFOMINE: http://infomine.ucr.edu/
Evaluation of Web-Based Information • Continuously evaluate as you look at “information” on the free Web. The key principles to look for are: • Currency / Timeliness • Authenticity • Objectivity • Completeness and Accuracy • Verifiability • Example: Thinking Critically about Web 2.0 and Beyond: http://www2.library.ucla.edu/libraries/college/11605_12008.cfm
Staying Current • Subscribe to alerting services for Deep Web resources • Look at reviewing tools • Research Buzz: http://www.researchbuzz.org/wp/ • Search Engine Watch and Search Engine Report: http://searchenginewatch.com/
Search Engines Don’t Find Information—People Do! • Use the right combination of tools for the job, including offline (paper) resources • Use the right tools the best way possible • Sometimes a search engine, Deep Web resource or other Web finding tool is not appropriate to the information need A “good” search engine is one that finds what you want.