Slide 1:Exploring the Invisible Web
Kevin R. Morgan, Ed.D., Professor/Designer, St. Petersburg College
“I readily believe there are more invisible than visible things in the universe.” (Burnett, 1692), motto in Coleridge’s The Rime of the Ancient Mariner
Introduction: The Visible and Invisible Web
There are vast amounts of information available online, but even more information exists beyond the grasp of the general search engine. There is a much larger universe of invisible information in databases and directories which can’t be accessed by general purpose search engines, but it is nevertheless online, free, and of the highest academic standards.
Slide 3:The Information Age and the WWW
The Internet: Network to Knowledge
A 21st century library of Alexandria
A content-rich and interactive information network
Internet: information resources for teaching and learning
Utilizing the WWW for research and learning
The World Wide Web is estimated to contain over 3 billion documents. (Barker, 2003)
The Invisible Web is estimated to be 2-50 times bigger than the visible web.
Slide 4:What is the Invisible Web?
The “Invisible Web” is a metaphor used to describe the vast depth or domain of information that lies beyond the visibility of our tools for gathering information. It consists of content that has been excluded from general purpose search engines and Web directories, and it makes up a substantial part of the total Internet. It is not really invisible, just passed over or missed.
The Invisible Web includes the following and more: databases from universities, libraries, organizations, businesses, and government agencies.
How Big is the Invisible Web?
One study conducted by the search company BrightPlanet estimated that the inaccessible part of the web is about 500 times larger than what search engines already provide access to. The study estimated that about 500 billion pages of information were available on the web, and that only 1/500 of that information could be reached via general search engines. (Sullivan, 2000) More conservative estimates place the Invisible Web at 2-50 times bigger than the visible web. (Barker, 2003) Even using the most conservative estimates, the Invisible Web represents a considerable quantity of information that lies beneath the surface of the Web. It is deeper than we thought!
[Diagram: Visible Web, 3 billion documents, captured in general purpose search engines. Invisible Web, 100-500% larger than the Visible Web, 6 billion-30 billion documents (???): government directories; educational research; scientific research; ERIC; Library of Congress; institutional directories; colleges and universities; organizations, public/private; specialized search engines and directories.]
The Web: Visible and Invisible Information
The visible Web is made up of HTML Web pages that search engines have chosen to include in their indices. Google, AltaVista, LookSmart, and other general purpose search engines all cover the surface of the Web but are limited in going into the deeper reaches of cyberspace. There is an even greater amount of invisible information in databases which can’t be directly accessed by general purpose search engines, but it is nevertheless online and freely available to the savvy searcher.
Search Engines: Robots, Knowbots and Spiders
Search engines do not really search the Web directly. Computer robot programs, sometimes referred to as "crawlers," "knowledge-bots," or "knowbots," are used by search engines to roam the World Wide Web. Most large search engines operate several robots or spiders all the time. Even so, the Web is so enormous that it can take six months for spiders to cover it, resulting in a degree of "out-of-datedness" in results. (Barker, 2003) Spiders or crawlers are programmed to retrieve general information while avoiding unfriendly or dangerous URLs, known as spider traps, that can trap them in endless loops of information.
Reasons for Invisibility of Some Pages
Some pages present technical barriers to web crawlers and are passed over by general search engines for time and efficiency. For example, a spider or crawler will back off when encountering a question mark (?) in a URL. There are also certain types of pages that search engine companies routinely exclude by policy to save time and money: spiders are programmed to avoid or exclude many sites, including educational, governmental, and organizational databases.
Slide 10:Visibility and Invisibility
[Diagram contrasting the Visible Web and the Invisible Web. Labels shown: General Search Engines and Subject Gateways; Educator’s Reference Desk; ERIC Database; The Library of Congress; Special Collections; Institutions and Organizations; Internal directories; URLs ending in edu, org, gov; a page has a ? in its URL.]
It is very difficult to predict what sites will or won't be part of the Invisible Web. As search engines change their policies, what is invisible today can become visible tomorrow. Many sites are already hybrid, with both visible and invisible components.
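The question-mark rule mentioned above can be illustrated with a minimal Python sketch. It assumes, as a simplification, that a spider passes over any URL that carries a query string; actual search engine policies differ and change over time. The function name likely_skipped_by_spider and the example.org address are hypothetical and used only for illustration.

```python
from urllib.parse import urlsplit


def likely_skipped_by_spider(url: str) -> bool:
    """Hypothetical rule: a URL with a query string (text after "?") is passed over."""
    return bool(urlsplit(url).query)


# The first two URLs appear on the gateway slides; the third is a made-up example.
urls = [
    "http://www.eduref.org/",
    "http://www.completeplanet.com/",
    "http://example.org/search?subject=history",
]

for url in urls:
    label = "likely skipped" if likely_skipped_by_spider(url) else "likely crawled"
    print(f"{label}: {url}")
```

Under this assumed rule only the third URL is flagged, which matches the point that a site can be hybrid: its home page is visible while its database search results are not.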
The Value in Using the Invisible Web
Invisible Web resources offer the highest level of authority, as educational institutions and government organizations maintain a high level of quality control over their information.
Specialized search interfaces provide more control over search input and output, with increased precision.
The search can yield exhaustive results of timely content.
Invisible Web databases have the most current information available online, as they are updated often.
Comprehensive resources allow searchers to perform exhaustive searches within a specific subject area and stay current.
Slide 12:Understanding the Invisible Web
The data found in the Invisible Web cannot be accessed easily via general purpose search engines. The Invisible Web is not the sole solution to all one’s information needs. It should be used in conjunction with other informational sources, including general searches. Invisible Web resources clearly identify who is providing the information, making it easy to judge the authority of the content and its provider. Targeted crawlers offer more comprehensive coverage of their subjects than general purpose search engines.
Finding the Subject Databases and Directories
Much of the Invisible Web is made up of the contents of thousands of specialized databases accessible online. Have a clear subject in mind to find the best specialized databases for your subject of study or field of research. Many databases can be found by using the word "database" after a subject term, such as “humanities database” or “history database.” Another tip is to search using the words "web directory" and then your topic. If a directory web page refers to itself using the words "web directory," you will locate it.
Searching Tip: Use Subject Gateways
Problem: Searching through subject databases and web directories may be unfruitful for the novice searcher or student. Many of the databases are password protected, and many of these independent searches can end in blocked access.
Solution: An easier and more fruitful method for finding databases relating to a specific subject area is to use some of the gateway sites that have already been organized by subject and content. These subject gateways are organized from general to specific, enabling students, educators, and researchers to find valuable visible and invisible sources on the Internet.
Slide 15:Educational Gateways
Infomine provides a gateway to scholarly Internet resource collections: http://infomine.ucr.edu/
Academic Info also provides an educational subject directory and subject gateways: http://www.academicinfo.net/
The Educator’s Reference Desk has become the new access gateway to the ERIC databases: http://www.eduref.org/
The Alliance for Life-Long Learning offers online classes from Stanford, Yale, and Oxford Universities and provides a library of online resources through its Academic Subject directories that meet the highest academic standards: http://www.alllearn.org/er/directories.cgi
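Returning to the two search tips given earlier (a subject term followed by the word "database," and the words "web directory" followed by a topic), the query strings can be sketched in Python as follows. The function names are hypothetical, and wrapping "web directory" in quotation marks is an assumption made here so that a search engine treats the two words as a phrase; the slides do not specify it.

```python
from urllib.parse import quote_plus


def database_query(subject: str) -> str:
    """Tip 1: put the word "database" after the subject term."""
    return f"{subject} database"


def directory_query(topic: str) -> str:
    """Tip 2: the words "web directory" followed by the topic.

    Quoting the phrase is an assumption, not something the slides require.
    """
    return f'"web directory" {topic}'


for query in (database_query("humanities"), directory_query("history")):
    # quote_plus encodes the query so it can be placed in a search URL.
    print(query, "->", quote_plus(query))
```

The encoded form is what would follow the query parameter in a search engine's URL; which engine and parameter name to use is left open here.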
Slide 16:General Purpose Subject Gateways
Use the Invisible Web Directory from Sherman and Price’s companion site to The Invisible Web: http://www.invisible-web.net/
See this multi-subject guide to specialized search engines: http://www.searchability.com/
Explore CompletePlanet to link to over 103,000 searchable databases and specialty search engines: http://www.completeplanet.com/
Slide 17:Evaluating Invisible Web Resources
The Librarians' Index to the Internet provides an annotated directory with cross-reference links to both visible and invisible content: http://lii.org/
ResearchBuzz provides daily updates on search engines, new software, browser technology, Web directories, and databases: http://researchbuzz.com
The Scout Report provides academics, researchers, librarians, and the K-12 community with valuable online information: http://scout.cs.wisc.edu/index.php
The Internet Resources Newsletter is a monthly newsletter for academics, students, scientists, and social scientists: http://www.hw.ac.uk/libWWW/irn/irn.html
References
Barker (2003). “Recommended Search Engines: Table of Features.” UC Berkeley. http://www.lib.berkeley.edu/TeachingLib/Guides/Internet/SearchEngines.html
Sherman & Price (2001). The Invisible Web: Uncovering Information Sources Search Engines Can’t Find. CyberAge.
Sullivan (2000). “Invisible Web Gets Deeper.” The Search Engine Report, August 2000. http://searchenginewatch.com/sereport/article.php/2162871
Slide 19:Exploring the Invisible Web
Contact Information
Dr. Kevin R. Morgan
St. Petersburg College: eCampus
Seminole, Florida
morgank@spcollege.edu