
Federated & Meta Search



Presentation Transcript


  1. Federated & Meta Search • What are they? • Environment • Library (institutional), Everywhere (Web) • Content • Web, Databases, Catalogs (books), (numerical) data • Users • Researchers, Students, Academics, Anyone • How are they used? • Comparing results • Widest possible information set for retrieval

  2. Do you use Metasearch? • For research? • Research papers • General information seeking • When shopping? • Trips • Books • Technical support (help)? • What else?

  3. What are Digital Libraries? • What’s not a digital library? • The Web, Lexis-Nexis, UTNetCAT, ACM DigLib, YouTube, Amazon.com, your laptop’s hard drive? • Users think they’re content • Librarians think they’re institutions & services • Are they digital content only? • An easier, digital way to find physical content or help? • “Content, collections & communities” • How do all of these fit together for Info Retrieval? • Organizing everything for effective retrieval seems to be the key challenge • Making everything (possible) searchable is the key feature for users. • Metasearch is the key to Digital Libraries

  4. Digital Library = Virtual Library? • Freely available Web content is a pretty good digital library • Your own content is a good library (for re-finding content) • Databases & Indexes are traditional library content. Now more digital • Should it matter where the content is? • Costs? • Findability? • Scalability?

  5. Federated Search • Everything is accessible • Legal issues & pricing are coordinated • Clustering & redundant information are processed accordingly (cheapest first?) • Query syntax is universal & transformed for each dataset • Databases, catalogs & text • Relevancy is weighted & precise • Multiple vendors & open access sources • A balance? How “deep” in the deep Web?
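
The "universal query syntax, transformed for each dataset" idea can be sketched in a few lines. This is a minimal illustration only: the `Query` class, the three target syntaxes, and the `records` table are hypothetical and do not correspond to any particular vendor or product named in the slides.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Query:
    """A source-neutral query: every term must match, optionally in one field."""
    terms: Tuple[str, ...]
    field: Optional[str] = None  # e.g. "title"; None means any field


def to_catalog(q: Query) -> str:
    # Hypothetical catalog syntax: field-qualified phrases joined with AND.
    prefix = f"{q.field}=" if q.field else ""
    return " AND ".join(f'{prefix}"{t}"' for t in q.terms)


def to_database(q: Query) -> Tuple[str, Tuple[str, ...]]:
    # Hypothetical SQL backend: parameterized LIKE clauses.
    # The column name is assumed to come from a trusted, fixed field list.
    column = q.field or "fulltext"
    where = " AND ".join(f"{column} LIKE ?" for _ in q.terms)
    params = tuple(f"%{t}%" for t in q.terms)
    return f"SELECT id FROM records WHERE {where}", params


def to_text(q: Query) -> str:
    # Hypothetical plain-text engine: bare keywords, field restriction dropped.
    return " ".join(q.terms)


TRANSLATORS = {"catalog": to_catalog, "database": to_database, "text": to_text}


def translate(q: Query) -> dict:
    """Produce one native query per source from the same universal query."""
    return {name: fn(q) for name, fn in TRANSLATORS.items()}
```

Calling `translate(Query(("federated", "search"), "title"))` yields one query per source. Note that the plain-text translation silently loses the field restriction, which is one reason parallel queries are not always equivalent.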

  6. Web Dynamics & Metasearch • Different documents have many different characteristics • Web documents vs. other types of content • Links, Metadata, Genre, Dynamically changing • How well is the Web indexed? • In terms of completeness? 60%? • Metasearch is an index of the indices • Parallel queries are not always the same • Special purpose search engines a better idea? • Google Scholar vs. Google • Is Personalized (meta) search the answer? • Special purpose is your purpose • Relevance, ranking & importance • Pricing, availability, locality
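
One way to picture an "index of the indices" is to merge the ranked lists returned by several engines into a single list. The sketch below uses reciprocal rank fusion, a common merging technique chosen here for illustration; the slides do not say which method any metasearch engine uses, and the engine names and URLs in the usage note are made up.

```python
from collections import defaultdict
from typing import Dict, List


def fuse(rankings: Dict[str, List[str]], k: int = 60) -> List[str]:
    """Merge per-engine ranked URL lists with reciprocal rank fusion.

    Each URL earns 1 / (k + rank) from every engine that returned it,
    so results found by several engines rise toward the top.
    """
    scores = defaultdict(float)
    for urls in rankings.values():
        for rank, url in enumerate(urls, start=1):
            scores[url] += 1.0 / (k + rank)
    # Highest score first; break ties alphabetically for a stable order.
    return sorted(scores, key=lambda u: (-scores[u], u))
```

For example, `fuse({"a": ["x", "y", "z"], "b": ["y", "w"], "c": ["y", "x"]})` places `y` first because all three engines returned it, followed by `x`, which two engines returned.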

  7. Categorizing Web search results • The interface on metasearch may be more important to users than the content • Understanding results over finding (all) content • Show results in context - use categories • Understanding searches • Building a taxonomy for results • Customized for each result set? • Show when there aren’t any results • When results don’t rank high enough • Do we need more overviews for results? • Visualization for clustering

  8. Category Building for Search • How deep, shallow, lean or rich should categories be? • Should the content be the main criterion for categories? • Host, links, user perspective, genre? • What features of content should be used to cluster results? • For a metasearch?

  9. Fast-feature categorization • Online lean techniques • DNS, time visited, format, language, size, index date • Online rich techniques • Fit to existing categories such as ODP, Yahoo!, Music, Gov, Inventory • Offline techniques • Directory hierarchy • Query probing • Results, pages, words, (category) nodes, depth & type of hierarchy • Understanding the content is critical
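
The "online lean" features are cheap because they can be read off the result URL without fetching the page. A minimal sketch, covering only two of the listed features (domain and format) and assuming that a URL without a file extension is an HTML page:

```python
from collections import defaultdict
from pathlib import PurePosixPath
from urllib.parse import urlparse


def lean_features(url: str) -> dict:
    """Extract cheap, URL-only features from a search result."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    # Top-level domain as a rough proxy for the kind of site (edu, gov, ...).
    domain = host.rsplit(".", 1)[-1] if "." in host else ""
    # File extension as a proxy for format; no extension is assumed to be HTML.
    fmt = PurePosixPath(parsed.path).suffix.lstrip(".").lower() or "html"
    return {"domain": domain, "format": fmt}


def group_by(urls, feature: str) -> dict:
    """Bucket result URLs by one lean feature, preserving result order."""
    groups = defaultdict(list)
    for url in urls:
        groups[lean_features(url)[feature]].append(url)
    return dict(groups)
```

Grouping the same result set by `"domain"` and then by `"format"` gives two different category views at almost no cost. Richer techniques, such as fitting results to an existing directory, require looking at the content itself.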

  10. Yahoo! Cataloging the Web • A non-automated technique • How do information professionals build an “index” of the Web? • Cataloging applies to the Web • Indexing with synonyms • Browsing indexes vs. searching them • Comprehensive index not the goal • Quality • Information Density • Yahoo’s own ontology – points to site for full info • Subject Trees with aliases (@) to other locations • “More like this” comparisons as checksums
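
A subject tree with aliases can be modeled as nested dictionaries in which an alias entry stores the path of the category it points to. This is an illustrative sketch only: the category names, the trailing `@` on alias keys, and the path format are assumptions, not Yahoo!'s actual data model.

```python
# Hypothetical subject tree: dicts are categories, lists are site entries,
# and a string value is an alias holding the canonical path of its target.
TREE = {
    "Computers": {"Internet": {"Searching": ["metasearch-site"]}},
    "Reference": {"Libraries": {"Searching@": "Computers/Internet/Searching"}},
}


def resolve(tree: dict, path: str):
    """Walk a '/'-separated path, following aliases to their canonical node."""
    node = tree
    for part in path.split("/"):
        node = node[part]
        if isinstance(node, str):
            # Alias: restart from the root at the canonical location.
            # Alias cycles are not detected in this sketch.
            node = resolve(tree, node)
    return node
```

Here `resolve(TREE, "Reference/Libraries/Searching@")` returns the same entries as `resolve(TREE, "Computers/Internet/Searching")`, so a site is cataloged once but can be reached by browsing from more than one subject.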

  11. Yahoo uses tools for indexing

  12. More metasearch tools • Scroogle • Thumbshots.org Ranking • Jux2 • Search Engine Relationship Chart
