The Invisible Web, also known as the Deep Web or Hidden Web, contains vast amounts of valuable information that is not easily accessible through traditional search engines. Learn about the different types of invisible web content, why search engines can't find it, and discover powerful research tools to navigate and uncover hidden resources.
The Invisible Web • A great deal of Web content is invisible to search engines • General-purpose search engines do not have comprehensive coverage of the Web
Also referred to as the “Deep Web” or the “Hidden Web” • Consists largely of content-rich databases from • universities, • libraries, • associations, • businesses and • government
A BrightPlanet study (2001) estimated that • the Deep/Invisible Web is approx. 500 times larger than the visible Web and is growing faster • 7,500 terabytes of information on the Deep Web • 19 terabytes on the visible Web
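As a rough check on the figures above, the two quoted sizes can simply be divided. This is only a sketch using the slide's own numbers; the variable names are illustrative. The quotient comes out somewhat below the rounded "approx. 500 times" figure, but in the same order of magnitude.

```python
# Sizes quoted on the slide, in terabytes.
deep_web_tb = 7500
visible_web_tb = 19

# How many times larger the Deep Web is, based on these two numbers alone.
ratio = deep_web_tb / visible_web_tb
print(f"Deep Web is roughly {ratio:.0f} times the size of the visible Web")
```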
BrightPlanet breakdown of Invisible Web content: • Topical databases = 54% • Internal sites = 13% • Publications = 11% • Shopping / Auctions = 5% • Classifieds = 5% • Libraries = 2% • Yellow & White Pages = 2% • Jobs = 1% • Message or Chat = 1% • General Search = 1%
Why Search Engines Can’t Find It • Technical and non-technical issues prevent search engines from indexing the Invisible Web • Spiders/crawlers don’t index information stored in databases • Search engines are kept from crawling sites more often or more deeply • Some content is non-textual, which is a problem for search engines
Requires registration or login • Fee-based or licensed • Resides on an Intranet • Archives (newspapers) • “Noindex” meta tags
Examining URLs = easiest way to determine if a Web page is invisible • Direct URLs • Point to a specific Web page • E.g. www.yahoo.ca or www.tc.gc.ca/en/modes/htm • Crawlers can follow these URLs • Indirect URLs • Don’t point to a specific page • Contain information to be executed by a script on the server • Contain symbols (?) or words (cgi-bin or javascript) • E.g. www.elections.ca/scripts/info/edMap_e.asp?edID=35059&showLink=no
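The URL test described above can be sketched as a small heuristic. This is a minimal illustration, not a definitive rule: the function name `looks_indirect` and the marker list are assumptions built only from the symbols and words named on the slide, and a match is a hint rather than proof that a page is invisible.

```python
# Markers named on the slide: a "?" symbol, or the words "cgi-bin" / "javascript".
SCRIPT_WORDS = ("cgi-bin", "javascript")


def looks_indirect(url: str) -> bool:
    """Return True if the URL contains a '?' or one of the script-related words."""
    lowered = url.lower()
    return "?" in lowered or any(word in lowered for word in SCRIPT_WORDS)


# The example URLs from the slide.
examples = [
    "www.yahoo.ca",
    "www.tc.gc.ca/en/modes/htm",
    "www.elections.ca/scripts/info/edMap_e.asp?edID=35059&showLink=no",
]

for url in examples:
    label = "indirect" if looks_indirect(url) else "direct"
    print(f"{label:8} {url}")
```

With the slide's examples, the first two URLs are reported as direct and the elections.ca URL as indirect, because it carries a query string after the "?".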
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> • <html xmlns="http://www.w3.org/1999/xhtml"> • <head> • <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" /> • <title>Welcome to the Temples of God’s Own Country</title> • <META NAME="KEYWORDS" CONTENT="Concept of temples,Temple Complex,Dieties,Rites and Rituals,Temple Customs,Priests,Temple Arts,Classification of Temples,Temple Administration, Offerings,Festivals and an exhaustive directory of district-wise Temples."> • <META NAME="DESCRIPTION" CONTENT="The website gives an overview of Kerala Temples such as Concept of temples,Temple Complex,Dieties,Rites and Rituals,Temple Customs,Priests,Temple Arts,Classification of Temples,Temple Administration, Offerings,Festivals and an exhaustive directory of district-wise Temples. etc. "> • <TITLE>Welcome to the Temples of God’s Own Country</TITLE>
Four Types of Invisibility • 1) The Opaque Web • 2) The Private Web • 3) The Proprietary Web • 4) The Truly Invisible Web
The Opaque Web • Files are left out of search engine indices because of • Disconnected URLs (pages that no other page links to)
The Private Web • Pages that could be indexed but have been excluded by the Webmaster through • Password protection • “noindex” meta tag
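To make the “noindex” meta tag concrete, here is a minimal sketch that checks a page's HTML for a robots meta tag carrying that directive. It uses only Python's standard `html.parser`; the class name `RobotsMetaParser`, the helper `has_noindex`, and the sample markup are illustrative assumptions, not part of the original slides.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives found in any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for part in attr.get("content", "").split(","):
                self.directives.add(part.strip().lower())


def has_noindex(html: str) -> bool:
    """Return True if the page asks search engines not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Hypothetical page head used only for illustration.
sample = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
print(has_noindex(sample))
```

A page whose only meta tags are KEYWORDS and DESCRIPTION would not be flagged, since neither carries a robots directive.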
The Proprietary Web • Sites only accessible to those who register • Fee-based sites
The Truly Invisible Web • Crawlers can’t handle the file formats • Stored in relational databases
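The four types above can be summarised as a small decision sketch. Everything here is an assumption made for illustration: the `Page` flags are hypothetical fields mirroring the bullet points on the slides, and the order in which the checks run is a choice of this sketch, not something the slides specify.

```python
from dataclasses import dataclass
from enum import Enum


class Invisibility(Enum):
    OPAQUE = "The Opaque Web"
    PRIVATE = "The Private Web"
    PROPRIETARY = "The Proprietary Web"
    TRULY_INVISIBLE = "The Truly Invisible Web"
    VISIBLE = "Visible Web"


@dataclass
class Page:
    # Hypothetical flags, one per bullet point on the slides.
    disconnected_url: bool = False       # no other page links to it
    password_protected: bool = False
    noindex_meta_tag: bool = False
    requires_registration: bool = False
    fee_based: bool = False
    unsupported_file_format: bool = False
    stored_in_database: bool = False


def classify(page: Page) -> Invisibility:
    """Map a page's flags onto one of the four types (check order is assumed)."""
    if page.password_protected or page.noindex_meta_tag:
        return Invisibility.PRIVATE
    if page.requires_registration or page.fee_based:
        return Invisibility.PROPRIETARY
    if page.unsupported_file_format or page.stored_in_database:
        return Invisibility.TRULY_INVISIBLE
    if page.disconnected_url:
        return Invisibility.OPAQUE
    return Invisibility.VISIBLE


print(classify(Page(fee_based=True)).value)
print(classify(Page(stored_in_database=True)).value)
```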
Invisible Web Research Tools • Librarians' Index to the Internet (look for Databases) • Digital Librarian • Library of Congress Online Catalog • CompletePlanet • Union Institute A-Z Database List • New York Public Library Databases Online • SearchSystems Public Records Directories • GeniusFind
OAIster (pronounced "oyster") • LookSmart's FindArticles.com: articles from popular magazines to scholarly journals • The Library Spot: a collection of databases, online libraries, references, and other useful information from the Invisible Web
Infoplease.com and its searchable Invisible Web databases • World Factbook: a searchable directory of flags of the world, reference maps, country profiles, etc.
Lund University Libraries Directory of Open Access Journals: a collection of searchable scientific and scholarly journals on the Invisible Web • USDA's Plants Database on the Invisible Web • The Human Genome Database: human genome data on the Deep Web
The Combined Health Information Database (CHID Online): human health information • The National Database of Nonprofit Organizations: an extensive site on the Invisible Web that provides not only locations and contact information for nonprofits but also detailed fiscal reports • EEVL Xtra: cross-searches 20 engineering, mathematics and computing databases, including content from 50 publishers
Deeperweb • Mednar