Characteristics of Web Searching

Intelligent Meta-Search and Clustering Technology. Tamas Doszkocs, Ph.D., Computer Scientist, National Library of Medicine (doszkocs@nlm.nih.gov). http://tamas.nlm.nih.gov/metasearch/ and http://toxseek.nlm.nih.gov

Presentation Transcript


  1. Intelligent Meta-Search and Clustering Technology
     http://tamas.nlm.nih.gov/metasearch/
     http://toxseek.nlm.nih.gov
     Tamas Doszkocs, Ph.D., Computer Scientist, National Library of Medicine, doszkocs@nlm.nih.gov

  2. Characteristics of Web Searching
     • Content is created by diverse organizations and individuals.
     • Information on the Web is inherently heterogeneous.
     • Content is distributed across multiple servers in multiple locations, formats and languages, and is aimed at diverse audiences and purposes (Netcraft's April 2005 survey received responses from 62,286,451 web sites).
     • The "Open Web" of billions of static Web pages is indexed and searched via multiple search engines and directories.

  3. Problems in Web Searching
     • Even the largest current search engines index only a fraction of all Web pages (the Internet Archive's Wayback Machine has indexed 40 billion pages, Google about 8.1 billion, Yahoo about 20.8 billion; August 2005).
     • The not-so-"Hidden Web" of content databases (e.g. PubMed, Web of Science) is estimated to be thousands of times larger than the Open Web.
     • Both the Open Web and the Hidden Web suffer from problems of information coverage, quality, overload, relevancy, currency and completeness, as well as inherent language ambiguity and incompatible user interfaces.

  4. Meta-Searching
     • Meta-search engines may simultaneously search multiple Open Web and Hidden Web sites in order to increase content coverage, precision, relevance and/or search efficiency and effectiveness.
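The "simultaneous search" step can be sketched as a parallel fan-out: send one query to every source at once, wait up to a fixed deadline, and keep whatever came back. This is a minimal sketch under stated assumptions, not ToxSeek's implementation: each source is modeled as a hypothetical Python callable that takes a query string and returns a list of result URLs (a real source would wrap an HTTP request to a search engine or a Hidden Web database such as PubMed), and the 5-second default `timeout` is an arbitrary choice.

```python
from concurrent.futures import ThreadPoolExecutor, wait
from typing import Callable, Dict, List

# Hypothetical source type: query string in, list of result URLs out.
Source = Callable[[str], List[str]]


def broadcast_search(query: str, sources: Dict[str, Source],
                     timeout: float = 5.0) -> Dict[str, List[str]]:
    """Send one query to all sources in parallel and collect their result lists.

    A source that raises, or has not finished within `timeout` seconds overall,
    contributes an empty list, so one slow or broken source does not sink
    the whole meta-search.
    """
    pool = ThreadPoolExecutor(max_workers=max(1, len(sources)))
    try:
        futures = {name: pool.submit(fn, query) for name, fn in sources.items()}
        # One shared deadline for all sources, not one deadline per source.
        wait(list(futures.values()), timeout=timeout)
        results: Dict[str, List[str]] = {}
        for name, fut in futures.items():
            if fut.done() and fut.exception() is None:
                results[name] = fut.result()
            else:
                results[name] = []  # still running or failed
        return results
    finally:
        # Return without blocking on stragglers; their threads finish in the background.
        pool.shutdown(wait=False)
```

For example, `broadcast_search("hiv", {"engine_a": lambda q: ["http://a.example/" + q]})` returns `{"engine_a": ["http://a.example/hiv"]}`. Using a single shared deadline (rather than a timeout per source) keeps the user's total wait bounded regardless of how many sources are queried.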

  5. Overlap Among 3 Major Search Engines
     http://missingpieces.dogpile.com/whitepaper.pdf
     http://comparesearchengines.dogpile.com/OverlapAnalysis.pdf

  6. Overlap Among Ask Jeeves, Google, MSN and Yahoo: Google Isn’t Everything!
     http://www.forbes.com/business/free_forbes/2005/0815/056.html?partner=yahoomag
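Overlap figures like those in the studies cited on slides 5 and 6 boil down to set arithmetic over each engine's results. The sketch below is not the methodology of those studies; it assumes each engine's results are available as a list of URLs and that a crude normalization (lowercase, drop the scheme, a leading `www.` and a trailing slash) is enough to recognize the same page across engines. It reports, as a fraction of all distinct results, how many were found by only one engine and how many were found by every engine.

```python
from typing import Dict, List, Set


def normalize(url: str) -> str:
    """Crude URL normalization (an assumption, not a standard):
    lowercase, strip http(s)://, a leading 'www.' and trailing slashes."""
    u = url.strip().lower()
    for prefix in ("https://", "http://"):
        if u.startswith(prefix):
            u = u[len(prefix):]
            break
    if u.startswith("www."):
        u = u[4:]
    return u.rstrip("/")


def overlap_report(results: Dict[str, List[str]]) -> Dict[str, float]:
    """Fractions of the union of all results that are unique to each engine,
    plus the fraction returned by every engine."""
    sets: Dict[str, Set[str]] = {
        engine: {normalize(u) for u in urls} for engine, urls in results.items()
    }
    union: Set[str] = set().union(*sets.values())
    if not union:
        return {}
    report: Dict[str, float] = {}
    for engine, found in sets.items():
        # Everything the other engines returned.
        others: Set[str] = set().union(*(s for e, s in sets.items() if e != engine))
        report[f"unique:{engine}"] = len(found - others) / len(union)
    shared = set.intersection(*sets.values())
    report["shared_by_all"] = len(shared) / len(union)
    return report
```

With three engines where only one page is common to all three, `shared_by_all` comes out small, which is the "Google isn't everything" effect the slides describe: each additional engine contributes pages the others missed.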

  7. Generations of Meta-Search Engines
     • First Generation: “Broadcast” or “Federated” search; List of results
     • Second Generation: Merging and Ranking; Increased coverage
     • Third Generation: Result Clustering; Focused drill-down; Dynamic query modification
     • Next Generation: Semantic and Pragmatic Intelligence
     • Examples: tamas.nlm.nih.gov/metasearch/, toxseek.nlm.nih.gov, http://bestmeta.com
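The second-generation step, merging and ranking the result lists from several engines, needs some rule for combining ranks. The slides do not say which rule ToxSeek uses, so the sketch below swaps in a well-known, simple one: reciprocal rank fusion (Cormack, Clarke and Büttcher, 2009), where a document at rank r in any list earns 1/(k + r) and the scores are summed across lists. The constant `k = 60` is the value suggested in that paper; the input is assumed to be lists of already normalized URLs.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def reciprocal_rank_fusion(ranked_lists: Dict[str, List[str]],
                           k: int = 60) -> List[Tuple[str, float]]:
    """Merge several ranked URL lists into one ranking.

    Each URL earns 1 / (k + rank) from every list it appears in (rank is
    1-based), so pages ranked highly by several engines rise to the top.
    """
    scores: Dict[str, float] = defaultdict(float)
    for urls in ranked_lists.values():
        seen = set()
        for rank, url in enumerate(urls, start=1):
            if url in seen:  # count a URL only once per engine
                continue
            seen.add(url)
            scores[url] += 1.0 / (k + rank)
    # Highest fused score first; ties broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

A rank-based rule like this is attractive for meta-search because engines do not expose comparable relevance scores; only the order of their results can be trusted to mean the same thing across sources.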

  8. Moving Targets: Nine Search Engines Compared, by Ben Patterson (May 9, 2005)
     http://reviews.cnet.com/4520-10572_7-6219242-2.html?tag=txt

  9. Moving Targets: the Need for Automatic Change Detection, Monitoring and Integration of New Capabilities
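Search engines are moving targets: when an engine redesigns its result page, any meta-search wrapper that parses that page can silently stop working. One simple way to detect such a change automatically, offered here as an assumption rather than as ToxSeek's documented mechanism, is to fingerprint the *structure* of a result page (the distinct tag/class combinations it uses) while ignoring its text, and to raise an alert when the fingerprint differs from the stored one. Using the set of distinct combinations means a page with more or fewer hits, or different hit text, keeps the same fingerprint.

```python
import hashlib
from html.parser import HTMLParser
from typing import List, Optional


class _StructureCollector(HTMLParser):
    """Records 'tag.class1.class2' tokens for every start tag; text is ignored."""

    def __init__(self) -> None:
        super().__init__()
        self.tokens: List[str] = []

    def handle_starttag(self, tag, attrs):
        classes = dict(attrs).get("class") or ""
        self.tokens.append(tag + "." + ".".join(sorted(classes.split())))


def structure_fingerprint(html: str) -> str:
    """SHA-256 over the sorted set of distinct tag/class tokens on the page."""
    collector = _StructureCollector()
    collector.feed(html)
    collector.close()
    distinct = sorted(set(collector.tokens))
    return hashlib.sha256("\n".join(distinct).encode("utf-8")).hexdigest()


def layout_changed(previous: Optional[str], html: str) -> bool:
    """True if a stored fingerprint exists and the new page's structure differs."""
    return previous is not None and previous != structure_fingerprint(html)
```

A monitoring job could run a fixed probe query against each engine on a schedule and flag the engine's wrapper for review whenever `layout_changed` returns `True`. The trade-off is that purely cosmetic markup changes (a renamed CSS class) also trigger an alert, which errs on the side of checking too often.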

  10. The ToxSeek Meta-Search and Clustering Project
     • Goals:
        • Integrate best-practice Information Retrieval and Natural Language Processing techniques with AI heuristics to create an advanced general-purpose meta-search, result clustering and knowledge discovery tool
        • Apply ToxSeek to efficiently access diverse biomedical and environmental health information resources
        • Create specialized applications for accessing high-quality information sources on HIV/AIDS, consumer health, homeland security, public health law, library research and other areas

  11. ToxSeek Features
     • Integrates multiple spellcheckers and sophisticated lexical, morphological, syntactic and semantic resources
     • Merges and ranks the results from heterogeneous information sources
     • Employs an efficient natural language phrase parser and AI heuristics to automatically identify Key Concepts and their Associations in queries and retrieved documents
     • Uses the automatically identified Key Concepts and Associations to create topical Result Clusters
     • Supports focused multi-concept drill-down, dynamic query refinement, multimedia and limited question answering
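The idea of building topical result clusters from key concepts, and then drilling down by selecting several concepts at once, can be illustrated with a deliberately simple stand-in. ToxSeek's phrase parser and AI heuristics are not described in the slides, so this sketch swaps in a frequency heuristic: candidate concepts are single words and two-word phrases that do not start or end with a stopword, a concept becomes a cluster if it occurs in at least `min_docs` retrieved documents, and the query's own terms are excluded because they match everything. The tiny `STOPWORDS` list and all function names are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

# Tiny illustrative stopword list (an assumption; a real system needs a fuller lexicon).
STOPWORDS = {"a", "an", "and", "the", "of", "in", "on", "for", "to", "is", "with", "by"}


def candidate_phrases(text: str) -> Set[str]:
    """Words (longer than 2 chars) and word pairs that avoid stopwords at either end."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    phrases = {w for w in words if w not in STOPWORDS and len(w) > 2}
    for w1, w2 in zip(words, words[1:]):
        if w1 not in STOPWORDS and w2 not in STOPWORDS:
            phrases.add(f"{w1} {w2}")
    return phrases


def cluster_by_concepts(docs: Dict[str, str], query: str = "",
                        min_docs: int = 2) -> Dict[str, Set[str]]:
    """Map each shared concept to the ids of the documents that contain it."""
    query_terms = candidate_phrases(query)
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for phrase in candidate_phrases(text):
            index[phrase].add(doc_id)
    return {concept: ids for concept, ids in index.items()
            if len(ids) >= min_docs and concept not in query_terms}


def drill_down(clusters: Dict[str, Set[str]], concepts: List[str]) -> Set[str]:
    """Multi-concept drill-down: documents that fall in every selected cluster."""
    selected = [clusters.get(c, set()) for c in concepts]
    return set.intersection(*selected) if selected else set()
```

For a query on "lead", documents mentioning "lead poisoning" and "drinking water" would form two overlapping clusters, and selecting both concepts narrows the results to documents that discuss both, which is the kind of focused multi-concept drill-down the slide describes.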

  12. ToxSeek Implementation
     • Production applications and research prototypes have been implemented for meta-searching diverse content on:
        • Toxicology and Environmental Health
        • Consumer Health
        • Library Catalogs and Proprietary Databases
        • HIV/AIDS
        • BioDefense
        • Homeland Security
     • “Shift Happens…”: http://library.nps.navy.mil/home/staff/gmarlatt/HSDL%20ALI%20April%202005%20%20final%20rev%207%20april.ppt

  13. ToxSeek Web Search Query: “terrorism”

  14. ToxSeek Query: “police state”

  15. Win the Search Engine Wars with Intelligent Meta-Search and Clustering Technology
     http://tamas.nlm.nih.gov/metasearch/
     http://toxseek.nlm.nih.gov
     Tamas Doszkocs, Ph.D., Computer Scientist, National Library of Medicine, doszkocs@nlm.nih.gov
