The Invisible Web

jaguar

Presentation Transcript


  1. The Invisible Web • Definition • Searching

  2. The Invisible Web Also called: • deep content • hidden internet • dark matter

  3. The Invisible Web The vast number of pages that search engines cannot or will not index • Restricted: login, password (such as intranets, databases; private, proprietary) • Sites not linked from anywhere (undiscovered) • Sites that use a robots.txt file to keep files off limits from spiders • Unsearchable or un-indexable file formats • Non-static - searchable databases that only produce results dynamically in response to a specific search request (such as CGI, ASP, CFM) • Real-time data – changes rapidly – too “fresh” • Sites that are too “deep”
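
The robots.txt bullet above can be illustrated with a minimal sketch using Python's standard-library parser. The robots.txt contents, the crawler name "ExampleBot", and the example.com URLs are assumptions made up for illustration; they do not come from the presentation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, invented for this illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /cgi-bin/
"""


def build_parser(robots_text: str) -> RobotFileParser:
    """Parse robots.txt text directly, without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# "ExampleBot" is a made-up crawler name; it falls under the "*" rules.
for url in (
    "http://www.example.com/index.html",
    "http://www.example.com/private/report.html",
    "http://www.example.com/cgi-bin/getpage.cgi",
):
    verdict = "may crawl" if parser.can_fetch("ExampleBot", url) else "off limits"
    print(f"{url}: {verdict}")
```

A well-behaved spider would skip the two disallowed paths, so any pages under them stay out of its index.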

  4. The Invisible Web • Search engines often avoid indexing web pages that are delivered dynamically, such as by database programs. • Often the problem is the URL used to retrieve the document: many dynamic delivery mechanisms put their parameters after a ? symbol. • For example, a page may be found this way: http://www.website.com/cgi-bin/getpage.cgi?name=sitemap • Most search engines will not read past the ? in that URL.
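
As a rough sketch of what "not reading past the ?" means, the snippet below splits the slide's example URL with Python's standard library. The function name `crawler_view` is an assumption for illustration; it models a hypothetical crawler that discards the query string, not the behavior of any particular search engine.

```python
from urllib.parse import parse_qs, urlsplit


def crawler_view(url: str) -> str:
    """Return the URL as seen by a hypothetical crawler that drops
    everything from the '?' onward."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}{parts.path}"


url = "http://www.website.com/cgi-bin/getpage.cgi?name=sitemap"

print(crawler_view(url))              # http://www.website.com/cgi-bin/getpage.cgi
print(parse_qs(urlsplit(url).query))  # {'name': ['sitemap']}
```

Under this assumption every page served by getpage.cgi collapses to the same address, so the crawler cannot tell the individual dynamic pages apart.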

  5. The Invisible Web Invisible Web sources tend to be: • More current • More comprehensive • Searchable (though not by search engines) • More specific/targeted • Deeper and broader • Often better quality

  6. The Invisible Web Top types of “invisible” information • News • RSS • Blogs • Public company filings, stock prices • Customized maps and directions • Clinical trials • Telephone numbers and addresses, postal codes • Definitions • Job postings • Grant information • Statistics • Weather • Museum, gallery, and library holdings

  7. Finding the “Dark Matter” • Search Engines • Specialized Search Engines • Directories • Vortals

  8. Traditional Search Engines Traditional search engines incorporate some “invisible” databases: • Weather • Maps • Phone directories • Catalogs • Stock prices

  9. Traditional Search Engines Unless specially programmed, though, spiders can’t find all the valuable resources available.

  10. Specialized Search Engines Search deeper into sites: • Go beyond top page, or homepage • Choose sources to spider—topical sites only • “Smart” ranking and indexing based on knowledge of the specific subject

  11. Specialized Search Engines There are hundreds of specialized search engines, covering almost every topic: • Search Engine Guide • Specialty Search Engines

  12. Directories • Collections of pre-screened websites organized into categories based on a controlled ontology • Ontology: classification of human knowledge into topics, similar to traditional library catalogs

  13. Directories • Closed Model: paid editors; quality control (LookSmart, Yahoo) • Open Model: volunteer editors (Open Directory Project, Google)

  14. Directories • Easier access to relevant results • Faster • Access to materials not always indexed by search engines—content in databases or file types not searched by spiders

  15. Directories Issues with directories: • Inherently small • Unseen editorial policies • May charge for listing • Lopsided coverage • Timeliness: harder to keep updated

  16. Search

  17. Vortals • Vortals: vertical portals. Instead of being a horizontal, all-inclusive entry point into the Web, they are vertical, specialized entry points. • Comprehensive sites focused on gathering and providing links to the best resources on a specific topic. • Usually combine subject-specific search engines and subject-specific directories • Also called “focused crawlers”; metasites; guru; authority; industry guide; subject directory site

  18. Vortals Advantages – best of directories and subject-specific search engines • More up-to-date: crawl subject-specific pages more often • Deeper crawl: gets more of the content on each server • More precision, less recall
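
The "more precision, less recall" trade-off can be made concrete with a small worked sketch. The document IDs and the two result sets below are invented for illustration; they are not measurements of any real vortal or search engine.

```python
def precision_recall(retrieved: set, relevant: set) -> tuple:
    """Precision = share of retrieved items that are relevant.
    Recall = share of relevant items that were retrieved."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical document IDs: four documents are truly relevant.
relevant = {"a", "b", "c", "d"}

# A focused tool returns few results, all on topic.
focused = {"a", "b"}

# A broad tool returns every relevant document plus off-topic ones.
broad = relevant | {"w", "x", "y", "z"}

print(precision_recall(focused, relevant))  # (1.0, 0.5)
print(precision_recall(broad, relevant))    # (0.5, 1.0)
```

In this toy case the focused tool finds only half of the relevant documents but returns nothing off topic, which is the pattern the slide attributes to vortals.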

  19. Searching the Invisible Web How do you find these sites? • Use known directories to find invisible web searching and browsing tools: • Librarians’ Index to the Internet • Open Directory • Google Directory • Teoma works well, too.

  20. Searching the Invisible Web Rethink your search: • Think key terms, then specific details (macro vs. micro) • Example: you want to find the melting point of hydrogen peroxide. On the general web, you would enter the keywords melting, point, and “hydrogen peroxide”. On the invisible web, you would first look for a chemical database that includes melting points as one of its features; once in the database, you would search for hydrogen peroxide.

  21. Searching the Invisible Web Remember that some concepts are assumed: • Do not use the subject as a search term • Example: if you are looking for information on gender inequity in math education, exclude terms like “education” from your search in AskERIC, an education-specific search tool
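
The advice above amounts to removing terms that the tool already assumes. A minimal sketch follows; the function name `strip_assumed_terms` and the set of assumed terms are hypothetical, and the code only prepares a query string. It does not model AskERIC's actual search interface.

```python
def strip_assumed_terms(query: str, assumed: set) -> str:
    """Drop words that a subject-specific tool already takes for granted."""
    kept = [word for word in query.split() if word.lower() not in assumed]
    return " ".join(kept)


# Hypothetical: terms implied by an education-specific search tool.
ASSUMED_BY_TOOL = {"education"}

query = "gender inequity math education"
print(strip_assumed_terms(query, ASSUMED_BY_TOOL))  # gender inequity math
```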

  22. Mining the Invisible Web • Tips: Certain kinds of sites can prove to be clearinghouses of information: • Government - statistics of all kinds • Professional organizations - archives of relevant research and statistics • Media sites (TV and Radio) – transcripts and speeches • College and university professor sites – lectures and personal publications

  23. Mining the Invisible Web • Look for library guides and commercial portals for more guidance in finding the hidden, valuable content available for free on the Web (more on this in the next lesson): • My Ready Reference on the Web Resource
