Surfing the Invisible Web

The Invisible Web, also known as the Deep Web or Hidden Web, contains vast amounts of valuable information that is not easily accessible through traditional search engines. Learn about the different types of invisible web content, why search engines can't find it, and discover powerful research tools to navigate and uncover hidden resources.

cavender

Presentation Transcript


  1. Surfing the Invisible Web

  2. A great deal of Web content is invisible to search engines… The Invisible Web… • General-purpose search engines do not have comprehensive coverage of the Web

  3. Also referred to as the “Deep Web” or the “Hidden Web” • Consists largely of content-rich databases from universities, libraries, associations, businesses and government

  4. BrightPlanet Study (2010) shows that: • The Deep/Invisible Web is approximately 500 times larger than the visible Web and growing faster • 7,500 terabytes of information on the Deep Web • 19 terabytes on the visible Web
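
As a quick arithmetic check on the two size figures quoted on this slide, the following sketch computes the ratio they imply. The variable names are illustrative; only the terabyte figures come from the slide.

```python
# Size figures quoted on the slide, in terabytes.
deep_web_tb = 7500
visible_web_tb = 19

# Ratio of Deep Web volume to visible Web volume.
ratio = deep_web_tb / visible_web_tb
print(f"The Deep Web is about {ratio:.0f} times the visible Web by volume")
```

By these two figures the ratio is roughly 395 to 1, the same order of magnitude as the rounded "500 times" headline.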

  5. BrightPlanet breakdown of Invisible Web content: • Topical databases = 54% • Internal sites = 13% • Publications = 11% • Shopping / Auctions = 5% • Classifieds = 5% • Libraries = 2% • Yellow & White Pages = 2% • Jobs = 1% • Message or Chat = 1% • General Search = 1%
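
A small sketch, using only the percentages listed on this slide, that totals the categories and picks out the largest one. The dictionary name and layout are illustrative assumptions, not part of the presentation.

```python
# Category shares as listed on the slide, in percent.
breakdown = {
    "Topical databases": 54,
    "Internal sites": 13,
    "Publications": 11,
    "Shopping / Auctions": 5,
    "Classifieds": 5,
    "Libraries": 2,
    "Yellow & White Pages": 2,
    "Jobs": 1,
    "Message or Chat": 1,
    "General Search": 1,
}

listed_total = sum(breakdown.values())
unlisted = 100 - listed_total              # share not itemized on the slide
largest = max(breakdown, key=breakdown.get)

print(f"Listed categories cover {listed_total}%; {unlisted}% is not itemized")
print(f"Largest category: {largest}")
```

The listed categories add up to 95%, so about 5% of the content falls into categories the slide does not name.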

  6. Why Search Engines Can’t Find It • Technical and non-technical issues prevent search engines from indexing the Invisible Web • Spiders/crawlers don’t index information stored in databases • These issues also keep search engines from crawling more often or more deeply • Some content is non-textual, which is a problem for search engines

  7. Requires registration or login • Fee-based or licensed • Resides on an Intranet • Archives (newspapers) • “Noindex” meta tags

  8. Examining URLs = easiest way to determine if a Web page is invisible
     • Direct URLs
       • Point to a specific Web page
       • E.g. www.yahoo.ca or www.tc.gc.ca/en/modes/htm
       • Crawlers can follow these URLs
     • Indirect URLs
       • Don’t point to a specific page
       • Contain information to be executed by a script on the server
       • Contain symbols (?) or words (cgi-bin or javascript)
       • E.g. www.elections.ca/scripts/info/edMap_e.asp?edID=35059&showLink=no
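
The URL test described on this slide can be sketched as a simple rule of thumb. The function name `is_indirect_url` and the marker tuple are hypothetical helpers introduced for illustration; the markers themselves (`?`, `cgi-bin`, `javascript`) and the first three sample URLs are taken from the slide, while the last URL is made up.

```python
# Words the slide associates with script-driven ("indirect") URLs.
SCRIPT_MARKERS = ("cgi-bin", "javascript")


def is_indirect_url(url: str) -> bool:
    """Heuristic: True if the URL looks like input for a server-side script."""
    lowered = url.lower()
    # A '?' introduces parameters that a script on the server will process.
    if "?" in lowered:
        return True
    return any(marker in lowered for marker in SCRIPT_MARKERS)


samples = [
    "www.yahoo.ca",
    "www.tc.gc.ca/en/modes/htm",
    "www.elections.ca/scripts/info/edMap_e.asp?edID=35059&showLink=no",
    "example.org/cgi-bin/search",  # hypothetical URL, not from the slide
]
for url in samples:
    kind = "indirect" if is_indirect_url(url) else "direct"
    print(f"{kind:8} {url}")
```

This only inspects the text of the URL, so it is a first guess at whether a page is likely to be invisible, not a guarantee of how any particular crawler behaves.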

  9. <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
     <html xmlns="http://www.w3.org/1999/xhtml">
     <head>
     <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
     <title>Welcome to the Temples of God’s Own Country</title>
     <META NAME="KEYWORDS" CONTENT="Concept of temples,Temple Complex,Dieties,Rites and Rituals,Temple Customs,Priests,Temple Arts,Classification of Temples,Temple Administration, Offerings,Festivals and an exhaustive directory of district-wise Temples.">
     <META NAME="DESCRIPTION" CONTENT="The website gives an overview of Kerala Temples such as Concept of temples,Temple Complex,Dieties,Rites and Rituals,Temple Customs,Priests,Temple Arts,Classification of Temples,Temple Administration, Offerings,Festivals and an exhaustive directory of district-wise Temples. etc. ">
     <TITLE>Welcome to the Temples of God’s Own Country</TITLE>
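
To illustrate how a program might read the keyword and description meta tags shown in this page source, here is a minimal sketch using Python's standard `html.parser` module. The class name `MetaTagCollector` is a hypothetical helper, and `sample_head` is an abridged version of the slide's markup, shortened for readability.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collect <meta name="..." content="..."> pairs from a page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names before calling this.
        if tag != "meta":
            return
        attr_map = dict(attrs)
        name = attr_map.get("name")
        if name:
            self.meta[name.lower()] = attr_map.get("content") or ""


# Abridged from the slide's page source (assumption: shortened for the example).
sample_head = """
<head>
<title>Welcome to the Temples of God's Own Country</title>
<META NAME="KEYWORDS" CONTENT="Concept of temples,Temple Complex">
<META NAME="DESCRIPTION" CONTENT="The website gives an overview of Kerala Temples.">
</head>
"""

collector = MetaTagCollector()
collector.feed(sample_head)
for name, content in collector.meta.items():
    print(f"{name}: {content}")
```

The `http-equiv` tag in the original source has no `name` attribute, so a collector like this one would skip it.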

  10. Four Types of Invisibility 1) The Opaque Web 2) The Private Web 3) The Proprietary Web 4) The Truly Invisible Web

  11. The Opaque Web • Files are not included in search engine indices because of • Disconnected URLs

  12. The Private Web • Pages have been deliberately excluded from the index by the Webmaster, using: • Password protection • “noindex” meta tag
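
As an illustration of the “noindex” mechanism named on this slide, the sketch below checks a page for a robots meta tag that asks search engines not to index it. The names `RobotsMetaParser` and `is_excluded_from_index`, and both sample pages, are assumptions made up for the example.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Gather directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "meta" and (attr_map.get("name") or "").lower() == "robots":
            content = attr_map.get("content") or ""
            # Directives are comma-separated, e.g. "noindex, nofollow".
            for token in content.split(","):
                token = token.strip().lower()
                if token:
                    self.directives.add(token)


def is_excluded_from_index(html: str) -> bool:
    """True if the page carries a robots meta tag containing 'noindex'."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Hypothetical pages for demonstration only.
private_page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
public_page = '<html><head><meta name="description" content="Open page"></head></html>'

print(is_excluded_from_index(private_page))
print(is_excluded_from_index(public_page))
```

A check like this covers only the meta tag; password-protected pages stay out of the index for a different reason, because the crawler cannot retrieve them in the first place.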

  13. The Proprietary Web • Sites only accessible to those who register • Fee-based sites

  14. The Truly Invisible Web • Crawlers can’t handle the file formats • Stored in relational databases

  15. Invisible Web Research Tools • Librarians Index to the Internet (look for Databases) • Digital Librarian • Library of Congress Online Catalog • CompletePlanet • Union Institute A-Z Database List • New York Public Library Databases Online • SearchSystems Public Records Directories • GeniusFind

  16. OAIster (pronounced "oyster") • LookSmart's FindArticles.com: popular magazines to scholarly journals • The Library Spot, a collection of databases, online libraries, references, and other useful information from the Invisible Web

  17. Infoplease.com and its searchable Invisible Web databases • World Factbook, a searchable directory of flags of the world, reference maps, country profiles, etc.

  18. Lund University Libraries' Directory of Open Access Journals, a collection of searchable scientific and scholarly journals on the Invisible Web • USDA's Plants Database on the Invisible Web • The Human Genome Database: the human genome on the Deep Web

  19. The Combined Health Information Database (CHID Online), for human health information • The National Database of Nonprofit Organizations, an extensive site on the Invisible Web that not only provides locations and contact information for nonprofits, but also gives detailed fiscal reports • EEVL Xtra: cross-search 20 engineering, mathematics and computing databases, including content from 50 publishers

  20. DeeperWeb • Mednar
