
SwetsWise Searcher

SwetsWise Searcher, powered by Explorit Research Accelerator. By Abe Lederman, President and CTO. Copenhagen, Denmark, 11 June 2012.

Presentation Transcript


  1. SwetsWise Searcher, Powered by Explorit Research Accelerator • By Abe Lederman • President and CTO • Copenhagen, Denmark • 11 June 2012

  2. About Deep Web Technologies... • Founded by Abe Lederman in 2002 • Co-founder of Verity (later acquired by Autonomy) • BS & MS degrees in Computer Science from MIT • 25 years of experience in information retrieval • 20-person company based in Santa Fe, New Mexico • Over $5M in DOE SBIR grants (2002-2011) • Pioneer/trailblazer in federated search

  3. Customers Include... • Government: Defense Technical Info Center (DTIC), Office of Sci. & Tech. Info (DOE-OSTI), UNECA, European Space Agency • Corporate: Boeing, BASF, Intel, HP, P&G • Academic: Stanford University, George Mason University, Texas Medical Center, University College of Cork, Tennessee Community College Consortia • Public Portals: WorldWideScience.org, Science.gov, Biznar, Mednar, ScienceResearch.com

  4. What is the Deep Web? The Deep Web is a collection of internet information sources that are generally not accessible to web spiders or crawlers and therefore cannot be indexed for search by popular search engines such as Google, Yahoo! or Bing (the Surface Web). It is estimated that the Deep Web holds more than 500 times as much content as the Surface Web.

  5. What is “Federated Search”? “Federated Search is an application or service that allows users to submit a real-time search in parallel to multiple, distributed information sources and retrieve aggregated, ranked and de-duplicated results.”
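The definition above (parallel, real-time queries, then aggregated, ranked and de-duplicated results) can be illustrated with a minimal Python sketch. It is not Explorit's implementation: the two sources are hypothetical in-memory functions, duplicates are assumed to share a URL, and every source is assumed to return a `score` on a comparable scale.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for remote sources; each returns result dicts with
# "title", "url" and a relevance "score" on a shared 0-1 scale (assumption).
def source_a(query):
    return [
        {"title": f"{query} in physics", "url": "http://a.example/1", "score": 0.9},
        {"title": f"{query} review", "url": "http://shared.example/r", "score": 0.7},
    ]

def source_b(query):
    return [
        {"title": f"{query} review", "url": "http://shared.example/r", "score": 0.8},
        {"title": f"{query} dataset", "url": "http://b.example/2", "score": 0.4},
    ]

def federated_search(query, sources):
    """Query every source in parallel, then aggregate, de-duplicate and rank."""
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        result_lists = list(pool.map(lambda search: search(query), sources))

    merged = {}
    for results in result_lists:
        for r in results:
            key = r["url"]  # assumption: the same URL means the same record
            if key not in merged or r["score"] > merged[key]["score"]:
                merged[key] = r  # keep the highest-scoring copy of a duplicate

    return sorted(merged.values(), key=lambda r: r["score"], reverse=True)

for hit in federated_search("graphene", [source_a, source_b]):
    print(hit["score"], hit["title"])
```

The run prints three unique results in score order; the review that both sources returned appears once, at its higher score of 0.8. Comparing raw scores across sources is the weak point here, which is one reason real engines apply their own ranking instead.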

  6. One Search, Many Sources • One search box (“Enter Your Search…”, “Begin Search”) reaches OPACs, blogs, subscription sources, eBooks, wikis, internal databases, public web sources and journals

  7. Why Federated Search? 4 Big Reasons… • Provides greater efficiency than searching sources one by one • Returns the most current information because sources are searched in real-time • Eliminates learning disparate publisher interfaces • Simplifies discovery of the most relevant results

  8. Best Science-Focused Engines • Science.gov • WorldWideScience.org • ScienceResearch.com • Science Accelerator • Scitopia.org • 5 of 9 created by DWT

  9. Science.gov (2002)

  10. WorldWideScience.org (2007)

  11. Science Accelerator (2006)

  12. ScienceResearch.com (2005)

  13. Scitopia.org (2007-2011)

  14. Presentation available at: www.deepwebtech.com/ala2011.ppt

  15. Federated Search Has Gotten a Bad Reputation • It is too slow • Connectors break • Brings back too few results from each source • Brings back too many results • Unable to rank results well (metadata differences, lack of information)

  16. SW Searcher vs. Discovery Services

  17. Drawbacks of Discovery Services • Lack of transparency about what is in the service • Incomplete coverage of publisher content • Lag between when content appears on the publisher's site and when it is available in the discovery service • Normalized metadata loses source-specific metadata • Content in the service is limited by the provider's relationships and to content of general interest

  18. Landscape is Not So Clear • Summon (ProQuest): Discovery Service • EDS (EBSCO): Discovery Service + Federated Search • WorldCat Local (OCLC): Discovery Service + Federated Search • Primo (Ex Libris): Discovery Service + Federated Search • Encore Synergy (Innovative Interfaces): Limited Discovery Service + Federated Search • Explorit (Deep Web Technologies): Federated Search

  19. When Should You Choose Federated Search? • Access to up-to-date information is important • You want control of your sources • You want to search internal or non-mainstream sources • Your research is specialized (e.g., medical and legal) • You have a wide range of subscribed content (e.g., EBSCO and ProQuest)

  20. Partners since January 2010

  21. Major Advantages of SwetsWise Searcher • Rich, easy-to-use interface • Incremental display of results • Sophisticated connector technology • Retrieve 50-100 results or more per source • Relevance ranking • Smart clustering • Alerts and Search Builder • Metrics

  22. Easy-to-use Interface: Simple Search Box • One-search, “Google-like” box • Can be embedded in your home page, blog or intranet

  23. Advanced Search Page • Unlimited categories (sources can be in multiple categories) • Select sources to search • One or two columns • Fielded searching • Boolean searching (AND, OR, NOT)

  24. Incremental Results
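Incremental display means showing each source's results as soon as that source answers, instead of waiting for the slowest one. A minimal Python sketch of that idea follows, assuming hypothetical sources simulated with `time.sleep` and an illustrative overall deadline; it is not how Explorit schedules its connectors.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError, as_completed

def make_source(name, delay, hits):
    """Build a hypothetical (name, search) pair that answers after `delay` seconds."""
    def search(query):
        time.sleep(delay)
        return [f"{name}: {query} #{i}" for i in range(hits)]
    return name, search

def incremental_search(query, sources, timeout=2.0):
    """Yield (source_name, results) as each source finishes, fastest first."""
    pool = ThreadPoolExecutor(max_workers=len(sources))
    futures = {pool.submit(search, query): name for name, search in sources}
    try:
        for fut in as_completed(futures, timeout=timeout):
            yield futures[fut], fut.result()
    except TimeoutError:
        pass  # sources still running at the deadline are left out of this display
    finally:
        pool.shutdown(wait=False, cancel_futures=True)  # cancel_futures needs Python 3.9+

sources = [make_source("fast", 0.05, 2), make_source("medium", 0.3, 1), make_source("slow", 1.5, 3)]
for name, results in incremental_search("fusion", sources, timeout=0.8):
    print(name, len(results))  # a results page could be refreshed at this point
```

Here the fast and medium sources are shown as they arrive and the slow one misses the 0.8-second deadline. A generator keeps the display loop separate from the scheduling logic.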

  25. Connectors: Think “Connections” • Connectors make it possible to talk to other data sources • Each source is unique, so connectors “normalize” a query into the form each source expects • Submit proper authentication to sources • Extract the right results • Parse results to display the data
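The four connector duties listed above can be sketched as an abstract Python class with one method per duty. Everything here is hypothetical: the `Connector` interface, the imaginary source's `ti:`/`au:` query syntax, the `X-Api-Key` header, and the canned tab-separated response that stands in for a real HTTP call so the example runs offline.

```python
from abc import ABC, abstractmethod

class Connector(ABC):
    """Hypothetical connector interface with one method per duty on the slide."""

    @abstractmethod
    def normalize_query(self, query: dict) -> str:
        """Translate a generic fielded query into the source's own syntax."""

    @abstractmethod
    def auth_headers(self) -> dict:
        """Return whatever credentials the source requires."""

    @abstractmethod
    def fetch(self, native_query: str, headers: dict) -> str:
        """Send the query and return the raw response."""

    @abstractmethod
    def parse(self, raw: str) -> list:
        """Extract result records from the raw response."""

    def search(self, query: dict) -> list:
        native = self.normalize_query(query)
        raw = self.fetch(native, self.auth_headers())
        return self.parse(raw)

class ExampleJournalConnector(Connector):
    """Connector for an imaginary source whose syntax is 'ti:<words> AND au:<name>'."""

    def __init__(self, api_key):
        self.api_key = api_key
        self.last_query = None

    def normalize_query(self, query):
        fields = {"title": "ti", "author": "au"}  # generic field -> source field
        return " AND ".join(f"{fields[k]}:{v}" for k, v in query.items() if k in fields)

    def auth_headers(self):
        return {"X-Api-Key": self.api_key}  # hypothetical header name

    def fetch(self, native_query, headers):
        if not headers.get("X-Api-Key"):
            raise PermissionError("missing credentials")
        self.last_query = native_query
        # A real connector would make an HTTP request here; canned
        # tab-separated lines keep the sketch runnable offline.
        return "Deep web crawling\t2011\nFederated ranking\t2010\n"

    def parse(self, raw):
        records = []
        for line in raw.strip().splitlines():
            title, year = line.split("\t")
            records.append({"title": title, "year": int(year)})
        return records

connector = ExampleJournalConnector("demo-key")
print(connector.search({"title": "deep web", "author": "Lederman"}))
```

Keeping `search()` in the base class means each new source only supplies its own translation, authentication, fetching and parsing steps.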

  26. Connector Monitoring • Proactively monitor connectors • Monitor source health, speed, responsiveness and errors • Evaluated by dedicated software maintenance engineers • Errors are generally discovered by our team before users ever notice a problem
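Proactive monitoring can be as simple as recording the outcome of scheduled test queries and flagging sources whose error rate or average response time crosses a threshold. The Python sketch below assumes hypothetical source names (`alpha`, `beta`, `gamma`) and illustrative thresholds; it does not describe DWT's actual monitoring tools.

```python
from dataclasses import dataclass, field
from statistics import mean

@dataclass
class SourceHealth:
    checks: int = 0
    errors: int = 0
    latencies: list = field(default_factory=list)

class ConnectorMonitor:
    """Hypothetical tracker fed by scheduled test queries against each source."""

    def __init__(self, max_error_rate=0.2, max_avg_latency=5.0):
        self.max_error_rate = max_error_rate    # thresholds are illustrative
        self.max_avg_latency = max_avg_latency  # seconds
        self.stats = {}

    def record(self, source, latency=None, error=False):
        s = self.stats.setdefault(source, SourceHealth())
        s.checks += 1
        if error:
            s.errors += 1
        elif latency is not None:
            s.latencies.append(latency)

    def problems(self):
        """Return {source: reason} for sources an engineer should look at."""
        issues = {}
        for name, s in self.stats.items():
            if s.errors / s.checks > self.max_error_rate:
                issues[name] = "error rate"
            elif s.latencies and mean(s.latencies) > self.max_avg_latency:
                issues[name] = "slow"
        return issues

monitor = ConnectorMonitor()
for latency in (1.0, 1.2):
    monitor.record("alpha", latency)
monitor.record("beta", error=True)
monitor.record("beta", error=True)
monitor.record("beta", 0.9)
for latency in (8.0, 9.5):
    monitor.record("gamma", latency)
print(monitor.problems())  # {'beta': 'error rate', 'gamma': 'slow'}
```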

  27. Relevance Ranking • Occurrence of search terms within titles & snippets • Assigning weight to sources • More current results are assigned greater weight • Read: “Ranking: The Secret Sauce for Searching the Deep Web”
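The three ranking factors on this slide (term occurrences in titles and snippets, per-source weights, and recency) can be combined into a toy score. The Python sketch below is only an illustration: the title/snippet weights, the source weights, the recency formula and the fixed reference date are all assumptions, not the actual Explorit ranking function.

```python
import re
from datetime import date

TITLE_WEIGHT, SNIPPET_WEIGHT = 2.0, 1.0  # assumption: title hits count double

def term_hits(text, query_terms):
    words = re.findall(r"\w+", text.lower())
    return sum(words.count(term.lower()) for term in query_terms)

def relevance(result, query_terms, source_weights, today=date(2012, 6, 11)):
    """Toy score combining the three factors named on the slide."""
    term_score = (TITLE_WEIGHT * term_hits(result["title"], query_terms)
                  + SNIPPET_WEIGHT * term_hits(result["snippet"], query_terms))
    source_weight = source_weights.get(result["source"], 1.0)  # unknown sources: neutral
    age_years = max((today - result["published"]).days, 0) / 365.25
    recency_boost = 1.0 + 1.0 / (1.0 + age_years)  # between 1 and 2; newer is higher
    return term_score * source_weight * recency_boost

weights = {"trusted_journal": 1.5, "general_web": 0.5}  # hypothetical source weights
base = {"title": "Solar cell efficiency", "snippet": "A new solar cell design"}
results = [
    {**base, "source": "general_web", "published": date(2012, 1, 10)},
    {**base, "source": "trusted_journal", "published": date(2012, 1, 10)},
    {**base, "source": "trusted_journal", "published": date(2002, 1, 10)},
]
ranked = sorted(results, key=lambda r: relevance(r, ["solar", "cell"], weights), reverse=True)
```

With identical text, the recent result from the higher-weighted source ranks first, the older result from that source second, and the result from the low-weighted source last.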

  28. Clustering • Real-time semantic analysis of results creates clusters on-the-fly. • Discover relationships behind the results, not just “keywords.” Read: “Clusters That Think”
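The slide describes real-time semantic analysis. The Python sketch below is a much cruder stand-in that should not be read as that technique: it labels on-the-fly clusters with the most frequent non-query terms shared by at least two result titles. The stopword list, the `max_clusters` and `min_size` parameters, and the sample titles are assumptions.

```python
import re
from collections import Counter

STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}

def tokens(text):
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}

def cluster_titles(titles, query, max_clusters=5, min_size=2):
    """Label clusters with the most frequent non-query terms shared by 2+ titles."""
    query_terms = tokens(query)
    doc_terms = [tokens(t) - query_terms for t in titles]
    freq = Counter(term for terms in doc_terms for term in terms)
    clusters = {}
    for term, count in freq.most_common():
        if count < min_size or len(clusters) == max_clusters:
            break
        clusters[term] = [t for t, terms in zip(titles, doc_terms) if term in terms]
    return clusters

titles = [
    "Graphene battery electrodes",
    "Graphene battery safety",
    "Graphene battery cost",
    "Graphene sensors for medicine",
    "Graphene sensors review",
]
for label, members in cluster_titles(titles, "graphene").items():
    print(label, len(members))
```

The query term itself is excluded because every result would share it. A single result can appear in several clusters, which matches the slide's goal of surfacing relationships rather than a strict partition.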

  29. Alerts • Delivery online or via email • Daily, Weekly, Monthly • Pick and choose your sources • Export to RSS reader • Maintain database of past results
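The core of an alert is re-running a saved search on a schedule (daily, weekly or monthly) and delivering only results that were not sent before, which is where a database of past results comes in. The Python sketch below shows just that "what is new?" step using the standard `sqlite3` module. The `AlertStore` class, its table layout and the alert IDs are hypothetical, and scheduling and email/RSS delivery are left out.

```python
import sqlite3

class AlertStore:
    """Hypothetical store recording which result URLs each alert has already delivered."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS seen ("
            " alert TEXT, url TEXT, delivered TEXT DEFAULT CURRENT_TIMESTAMP,"
            " PRIMARY KEY (alert, url))"
        )

    def new_results(self, alert_id, results):
        """Return results this alert has not delivered before and mark them as seen."""
        fresh = []
        for r in results:
            cur = self.db.execute(
                "INSERT OR IGNORE INTO seen (alert, url) VALUES (?, ?)",
                (alert_id, r["url"]),
            )
            if cur.rowcount == 1:  # 0 means the row already existed
                fresh.append(r)
        self.db.commit()
        return fresh

store = AlertStore()
store.new_results("weekly-graphene", [{"url": "u1"}, {"url": "u2"}])        # both are new
print(store.new_results("weekly-graphene", [{"url": "u2"}, {"url": "u3"}]))  # [{'url': 'u3'}]
```

The composite primary key tracks delivery per alert, so two alerts can each receive the same result once.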

  30. Search Builder • Create search pages easily • Choose collections and search fields • Integrates with Course Management Software • Embed search box using built-in widget

  31. SwetsWise Searcher Metrics • Graphics-based or tabular • Single day (hourly breakdown) or entire month • Downloadable to spreadsheet • Reports include: • Number of queries run • Number of results retrieved per source • Average time to retrieve results from a source • Average rank of results retrieved per source • Timeouts/errors by source • Searches run (query strings) • Clickthrough stats
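Several of the per-source reports listed above (results retrieved, average retrieval time, average rank, timeouts) are plain aggregations over a query log, and the spreadsheet download can be a CSV export. The Python sketch below assumes a hypothetical log format in which each row records one source's response to one query, including the positions its results reached in the merged list. This is not the product's actual log schema.

```python
import csv
import io
from collections import defaultdict

# Hypothetical per-(query, source) log rows; "ranks" are the positions the
# source's results reached in the merged result list.
LOG = [
    {"query": "graphene", "source": "alpha", "seconds": 1.2, "ranks": [1, 4, 6], "status": "ok"},
    {"query": "graphene", "source": "beta", "seconds": 3.0, "ranks": [2, 3], "status": "ok"},
    {"query": "fusion", "source": "alpha", "seconds": 0.8, "ranks": [2], "status": "ok"},
    {"query": "fusion", "source": "beta", "seconds": 10.0, "ranks": [], "status": "timeout"},
]

def source_report(rows):
    """Aggregate a few of the per-source metrics listed on the slide."""
    agg = defaultdict(lambda: {"results": 0, "times": [], "ranks": [], "timeouts": 0})
    for row in rows:
        a = agg[row["source"]]
        if row["status"] == "timeout":
            a["timeouts"] += 1
            continue  # timed-out calls are excluded from the timing average
        a["results"] += len(row["ranks"])
        a["times"].append(row["seconds"])
        a["ranks"].extend(row["ranks"])
    return {
        source: {
            "results": a["results"],
            "avg_seconds": sum(a["times"]) / len(a["times"]) if a["times"] else None,
            "avg_rank": sum(a["ranks"]) / len(a["ranks"]) if a["ranks"] else None,
            "timeouts": a["timeouts"],
        }
        for source, a in agg.items()
    }

def to_csv(report):
    """Render the report as CSV text that a spreadsheet can open."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["source", "results", "avg_seconds", "avg_rank", "timeouts"])
    for source, m in sorted(report.items()):
        writer.writerow([source, m["results"], m["avg_seconds"], m["avg_rank"], m["timeouts"]])
    return buf.getvalue()

print(to_csv(source_report(LOG)))
```

Counting timeouts separately from the timing average avoids letting a single stalled source distort its own response-time figure.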

  32. Hosted vs. Installed Solutions

  33. Multilingual WorldWideScience.org

  34. WorldWideScience.org is an Excellent Candidate for Multilingual Search • A global gateway to international science databases and portals • All content is from national governments or vetted by national governments • Developed in partnership with the DOE Office of Scientific and Technical Information (OSTI), the WWS Alliance and Microsoft Research • One-stop searching • Includes databases from China, Japan, Korea, Germany and other non-English-speaking countries

  35. How Multilingual Federated Search Works • The user's query, in the user's language, goes to EXPLORIT • Microsoft Translator translates the query for each source • The query, in each source's language, is sent to foreign-language search engines (German, Chinese, Russian) • Results come back in each source's language and are ranked • Microsoft Translator translates the ranked results into the user's language • Ranked results are returned to the user in the user's language
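The flow on this slide (translate the query per source, search, rank, translate the ranked results back) can be sketched as a small Python pipeline. It does not call Microsoft Translator. Instead, a hypothetical phrase table, `PHRASES`, and a `translate()` helper stand in for the translation service, and the two foreign-language sources and their scores are invented for illustration.

```python
# Tiny hypothetical phrase table standing in for a machine-translation service.
PHRASES = {
    ("en", "de"): {"solar energy": "Solarenergie"},
    ("de", "en"): {"Solarenergie in Deutschland": "Solar energy in Germany"},
    ("en", "ru"): {"solar energy": "солнечная энергия"},
    ("ru", "en"): {"солнечная энергия в России": "Solar energy in Russia"},
}

def translate(text, src, dst):
    """Look the text up in the phrase table; fall back to the original text."""
    if src == dst:
        return text
    return PHRASES.get((src, dst), {}).get(text, text)

# Hypothetical foreign-language sources: name -> (language, search function).
SOURCES = {
    "german_db": ("de", lambda q: [{"title": f"{q} in Deutschland", "score": 0.7}]),
    "russian_db": ("ru", lambda q: [{"title": f"{q} в России", "score": 0.9}]),
}

def multilingual_search(query, user_lang="en"):
    results = []
    for name, (lang, search) in SOURCES.items():
        native_query = translate(query, user_lang, lang)  # query in source's language
        for r in search(native_query):                    # results in source's language
            results.append({**r, "source": name, "lang": lang})
    results.sort(key=lambda r: r["score"], reverse=True)  # ranking step
    for r in results:                                     # ranked results -> user's language
        r["title"] = translate(r["title"], r["lang"], user_lang)
    return results

for r in multilingual_search("solar energy"):
    print(r["source"], r["title"])
```

In this sketch, ranking uses source-supplied scores before translation, which follows the slide's order of "ranking" followed by "ranked results translated".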

  36. Coming in the Fall • Visualization • Full-Faceted Navigation • Mendeley Integration • Document Type and Document Format Clusters • Full Text Filter

  37. Visualization Using our clustering technology, results visualization allows users to see relationships between topics easily.

  38. Mendeley

  39. Document Type and Document Format Clusters

  40. Full Text Filter • Access Full Text!

  41. Future - Mobile Searching

  42. Thank you! Abe Lederman abe@deepwebtech.com
