Intelligent Meta-Search and Clustering Technology
http://tamas.nlm.nih.gov/metasearch/
http://toxseek.nlm.nih.gov
Tamas Doszkocs, Ph.D.
Computer Scientist, National Library of Medicine
doszkocs@nlm.nih.gov
Characteristics of Web Searching
• Content is created by diverse organizations and individuals
• Information on the Web is inherently heterogeneous
• Content is distributed across multiple servers and locations, in multiple formats and languages, and aimed at diverse audiences and purposes (in its April 2005 survey, Netcraft received responses from 62,286,451 web sites)
• The “Open Web” of billions of static Web pages is indexed and searched via multiple search engines and directories
Problems in Web Searching
• Even the largest of the current search engines index only a fraction of all Web pages (as of August 2005, the Internet Archive's Wayback Machine has indexed 40 billion pages, Google about 8.1 billion and Yahoo about 20.8 billion)
• The not so “Hidden Web” of content databases (e.g. PubMed, Web of Science) is estimated to be thousands of times larger than the Open Web
• Both the Open Web and the Hidden Web are characterized by problems of information coverage, quality, overload, relevancy, currency and completeness, as well as inherent language ambiguity and incompatible user interfaces
Meta-Searching
• Meta-Search Engines may simultaneously search multiple Open Web and Hidden Web sites in order to increase content coverage, precision, relevance and/or search efficiency and effectiveness
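
The simultaneous search described above can be sketched as a parallel fan-out of one query to several sources. This is a minimal illustration, not the ToxSeek implementation: the two backend functions, their names and the example.org URLs are hypothetical stand-ins for real Open Web and Hidden Web sources.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for real search backends (assumptions, not real APIs).
# Each returns a ranked list of result URLs for the query.
def search_open_web(query):
    return [f"https://example.org/web/{query}/1", f"https://example.org/web/{query}/2"]

def search_hidden_web(query):
    return [f"https://example.org/db/{query}/1"]

def meta_search(query, backends):
    """Send one query to every backend in parallel and collect results per source."""
    with ThreadPoolExecutor(max_workers=len(backends)) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in backends.items()}
        results = {}
        for name, future in futures.items():
            try:
                results[name] = future.result(timeout=5)
            except Exception:
                # A slow or failing source should not sink the whole search.
                results[name] = []
        return results

BACKENDS = {"open_web": search_open_web, "hidden_web": search_hidden_web}
results = meta_search("benzene", BACKENDS)
```

Keeping results keyed by source leaves merging and ranking as a separate step, so sources can be added or dropped without touching the fan-out.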
Overlap Among 3 Major Search Engines
http://missingpieces.dogpile.com/whitepaper.pdf
http://comparesearchengines.dogpile.com/OverlapAnalysis.pdf
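
One way to measure this kind of overlap is to count how many distinct results are returned by exactly one engine, by two, and so on. The sketch below assumes results are compared by exact URL; the engine names and result identifiers are made-up sample data, not figures from the linked studies.

```python
from collections import Counter

def overlap_profile(results_by_engine):
    """Count how many distinct URLs are returned by exactly 1, 2, ... engines."""
    appearances = Counter()
    for urls in results_by_engine.values():
        for url in set(urls):  # count each URL once per engine
            appearances[url] += 1
    # Map "number of engines" -> "number of URLs found by that many engines".
    return Counter(appearances.values())

# Illustrative sample data only (hypothetical engines and URLs).
sample = {
    "engine_a": ["u1", "u2", "u3"],
    "engine_b": ["u2", "u3", "u4"],
    "engine_c": ["u3", "u5"],
}
profile = overlap_profile(sample)
```

In this sample, three of the five distinct URLs are found by a single engine only, which is the situation a meta-search engine is meant to address.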
Overlap Among AskJeeves, Google, MSN and Yahoo
Google Isn’t Everything!
http://www.forbes.com/business/free_forbes/2005/0815/056.html?partner=yahoomag
Generations of Meta-Search Engines
• First Generation: “Broadcast” or “Federated” search; list of results
• Second Generation: merging and ranking; increased coverage
• Third Generation: result clustering; focused drill-down
• Next Generation: dynamic query modifications; semantic and pragmatic intelligence
tamas.nlm.nih.gov/metasearch/ toxseek.nlm.nih.gov http://bestmeta.com
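
The merging and ranking step that separates second-generation engines from a plain list of results can be illustrated with reciprocal rank fusion, a common rank-merging technique. The slides do not say which method ToxSeek or any other engine uses, so this is a swapped-in example; the function name and the constant `k=60` are assumptions, and each input list is assumed to contain no duplicates.

```python
from collections import defaultdict

def merge_and_rank(ranked_lists, k=60):
    """Merge ranked result lists with reciprocal rank fusion.

    Each result scores 1 / (k + rank) in every list where it appears;
    results found by several sources, or ranked high, rise to the top.
    """
    scores = defaultdict(float)
    for urls in ranked_lists:
        for rank, url in enumerate(urls, start=1):
            scores[url] += 1.0 / (k + rank)
    # Sort by descending score, breaking ties alphabetically for stable output.
    return sorted(scores, key=lambda u: (-scores[u], u))

merged = merge_and_rank([["a", "b", "c"], ["b", "a", "d"], ["b", "e"]])
```

Rank-based fusion avoids comparing raw relevance scores, which are rarely on the same scale across heterogeneous sources.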
Moving Targets: Nine Search Engines Compared
By Ben Patterson (May 9, 2005)
http://reviews.cnet.com/4520-10572_7-6219242-2.html?tag=txt
Moving Targets and the need for Automatic Change Detection and Monitoring and Integrating New Capabilities
The ToxSeek Meta-Search and Clustering Project
Goals:
• Integrate best-practice Information Retrieval and Natural Language Processing techniques with AI heuristics to create an advanced general-purpose meta-search, result clustering and knowledge discovery tool
• Apply ToxSeek to efficiently access diverse biomedical and environmental health information resources
• Create specialized applications for accessing quality information sources on HIV/AIDS, consumer health, homeland security, public health law, library research and other areas
ToxSeek Features
• Integrates multiple spellcheckers and sophisticated lexical, morphologic, syntactic and semantic resources
• Merges and ranks the results from heterogeneous information sources
• Employs an efficient Natural Language Phrase Parser and AI heuristics to automatically identify Key Concepts and their Associations in queries and retrieved documents
• Uses the automatically identified Key Concepts and Associations to create topical Result Clusters
• Supports focused multi-concept drill-down, dynamic query refinement, multi-media and limited question answering
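
The idea of grouping results under automatically identified concepts can be sketched with a deliberately naive stand-in: plain word matching on result titles replaces the Natural Language Phrase Parser and AI heuristics, whose details the slides do not give. The function names, the stopword list and the sample titles are all assumptions made for illustration.

```python
from collections import defaultdict

# Tiny illustrative stopword list (an assumption, not ToxSeek's lexical resources).
STOPWORDS = {"a", "an", "and", "in", "of", "the", "to", "for", "on"}

def key_terms(text, query_terms):
    """Naive key-concept extraction: lowercase words minus stopwords and query terms."""
    words = [w.strip(".,:;()").lower() for w in text.split()]
    return {w for w in words if w and w not in STOPWORDS and w not in query_terms}

def cluster_results(titles, query, min_size=2):
    """Group titles under each shared term; a title may fall into several clusters."""
    query_terms = set(query.lower().split())
    clusters = defaultdict(list)
    for title in titles:
        for term in key_terms(title, query_terms):
            clusters[term].append(title)
    # Keep only terms shared by at least min_size results.
    return {t: docs for t, docs in clusters.items() if len(docs) >= min_size}

titles = [
    "Benzene exposure and leukemia risk",
    "Occupational benzene exposure limits",
    "Leukemia in workers handling benzene",
]
clusters = cluster_results(titles, "benzene")
```

Here the query term itself is excluded from cluster labels, so the clusters ("exposure", "leukemia") describe subtopics; drill-down could then be modeled as re-clustering the titles inside one chosen cluster.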
ToxSeek Implementation
• Production applications and research prototypes have been implemented for meta-searching diverse content on:
  • Toxicology and Environmental Health
  • Consumer Health
  • Library Catalogs and Proprietary Databases
  • HIV/AIDS
  • BioDefense
  • Homeland Security
• “Shift Happens…”
• http://library.nps.navy.mil/home/staff/gmarlatt/HSDL%20ALI%20April%202005%20%20final%20rev%207%20april.ppt
Win the Search Engine Wars with Intelligent Meta-Search and Clustering Technology
http://tamas.nlm.nih.gov/metasearch/
http://toxseek.nlm.nih.gov
Tamas Doszkocs, Ph.D.
Computer Scientist, National Library of Medicine
doszkocs@nlm.nih.gov