Open Search

Open Search David Wolber

Overview • Proliferation of Digital Libraries • Metasearch and Fixed Lists of Sources • Open Search Architecture • PublishMe for P2P knowledge Sharing • Webtop Metasearch Clients

Contributors • Michael Kepe • Igor Ranitovic • Iman Sadreddin • Senior Team ’03 • Ken Chong • Rudd Stevens • Colin Bean • Tim Chan • Julian Chan • Pooja Garg

Information Source Explosion • Google, Amazon APIs • Internet Archive • Technorati: The World Live Web • Domain-specific: ACM Digital Library for CS, Lexis-Nexis for law, MLA for literature

End-User Created Digital Libraries • Personal Web (shared Google desktop) • Personal Web Neighborhood • Topic-Specific Personal Crawlers • Ordinary people creating search engines as easily as web pages [Figure labels: Personal Web, 1st Degree, 2nd Degree, Nth Degree]

Subsets of the Web

Motivation for Small, Independent Subsets of the Web • Avoid information being channeled through a single portal: Googleopoly • Google does no evil, but… • Censorship in China • Creeping level of commercialization • Unregulated manipulation of secret ranking algorithms (see SearchKing case) • Other media are lost; this is the last frontier

Little support for using multiple search engines

Metasearch • Help users discover and use digital libraries • Send queries to multiple, selected search engines • Filter, process, and unify results • A9.com: Amazon’s metasearch

Web Services Basis [Figure: Web Page Model, in which a server returns HTML; Web Service Model, in which a server returns XML to software]

How does metasearch evolve? • A new digital library appears • Metasearch clients discover it • Metasearch programmers write an adaptor/scraper • Users can access it within metasearch • SLOWLY…

Goal: Automate the Process • Metasearch engines should provide users with up-to-date lists of existing digital libraries • Digital libraries should be able to register and be made immediately available to all metasearch clients • Metasearch and library development are independent

What is Necessary? • Standard Search API • So metasearch clients can use polymorphism to access sources: for each source s in sourceList { searchEngine.endPointUrl = s.endPointUrl; resultList += searchEngine.keywordSearch(keywords); } • Search API Registry • So metasearch clients can get a dynamic list of sources
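
The loop on this slide can be sketched in Python as follows. This is only an illustration under stated assumptions: the `Source` and `SearchEngine` classes, the in-memory backends, and the example.org endpoints are hypothetical stand-ins, not part of the Open Search code; a real proxy would send each query to the endpoint over the network.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Source:
    """One registry entry: a display name and the endpoint to query."""
    name: str
    end_point_url: str


class SearchEngine:
    """Hypothetical client proxy for the standard search API.

    Each endpoint is backed by an in-memory function so the sketch
    runs offline; a real proxy would call the remote service instead.
    """

    def __init__(self, backends: Dict[str, Callable[[str], List[str]]]):
        self._backends = backends
        self.end_point_url = ""

    def keyword_search(self, keywords: str) -> List[str]:
        return self._backends[self.end_point_url](keywords)


def metasearch(engine: SearchEngine, source_list: List[Source], keywords: str) -> List[str]:
    """Query every source through the same interface and pool the results."""
    result_list: List[str] = []
    for s in source_list:
        engine.end_point_url = s.end_point_url  # same proxy, different source
        result_list += engine.keyword_search(keywords)
    return result_list


# Two made-up sources; a real list would come from the registry.
backends = {
    "http://example.org/libA": lambda q: [f"libA: {q} paper"],
    "http://example.org/libB": lambda q: [f"libB: {q} article", f"libB: {q} book"],
}
sources = [
    Source("Library A", "http://example.org/libA"),
    Source("Library B", "http://example.org/libB"),
]
results = metasearch(SearchEngine(backends), sources, "metasearch")
```

Because every source answers the same `keyword_search` call, adding a source only means adding an entry to the list; the client code does not change.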

Web Service Standards • WSDL: Web Services Description Language • SOAP: Simple Object Access Protocol • UDDI: Universal Description, Discovery, and Integration

Standards on top of Web Services • WSDL, SOAP, and UDDI are the basis for standards in many domains • e.g., a standard initiated by MS for securities information providers • Businesses agree on a standard; then client applications can use polymorphism and new businesses can register services • In this case, we want a cross-domain standard

Open Search Architecture • Open Search Protocol (OSP) • Cross-Domain: Search-related services • Not just keyword search, but citations, authorOf, etc. • Open Search Registry • Based on UDDI • Can add customization, e.g., parsing to find out which search operations are implemented. • Web and web service access

Open Search Architecture [Figure: metasearch clients get the source list from the OS Registry and query OSP-conforming libraries via OSP; libraries register their service with the OS Registry]

User Can Choose Sources

Open Search Protocol • Keyword search • Citations (inward links, outward links) • AuthorOf and other associative operations… • Metadata object results based on Dublin Core • Restriction object for “advanced search” stuff
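
One way to picture the operations listed on this slide is as an abstract interface. The sketch below is an assumption-laden illustration, not the OSP WSDL: the class names, method names, and the `Restriction` fields are hypothetical, and the `Metadata` fields simply borrow a few Dublin Core element names (title, creator, identifier, date).

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Metadata:
    """Result object; field names follow Dublin Core elements."""
    title: str
    creator: str
    identifier: str
    date: Optional[str] = None


@dataclass
class Restriction:
    """Hypothetical "advanced search" constraints."""
    creator: Optional[str] = None
    max_results: int = 10


class OpenSearchLibrary(ABC):
    """Operations a conforming library would implement (names assumed)."""

    @abstractmethod
    def keyword_search(self, keywords: str, restriction: Optional[Restriction] = None) -> List[Metadata]: ...

    @abstractmethod
    def inward_links(self, identifier: str) -> List[Metadata]: ...

    @abstractmethod
    def outward_links(self, identifier: str) -> List[Metadata]: ...

    @abstractmethod
    def author_of(self, creator: str) -> List[Metadata]: ...


class InMemoryLibrary(OpenSearchLibrary):
    """Toy library holding documents and (source, target) citation links."""

    def __init__(self, docs: List[Metadata], links: List[Tuple[str, str]]):
        self._docs = {d.identifier: d for d in docs}
        self._links = links

    def keyword_search(self, keywords, restriction=None):
        restriction = restriction or Restriction()
        words = keywords.lower().split()
        hits = [
            d for d in self._docs.values()
            if all(w in d.title.lower() for w in words)
            and (restriction.creator is None or d.creator == restriction.creator)
        ]
        return hits[:restriction.max_results]

    def inward_links(self, identifier):
        return [self._docs[s] for s, t in self._links if t == identifier]

    def outward_links(self, identifier):
        return [self._docs[t] for s, t in self._links if s == identifier]

    def author_of(self, creator):
        return [d for d in self._docs.values() if d.creator == creator]


docs = [
    Metadata("Open Search Protocol", "A. Author", "doc1"),
    Metadata("Metasearch Clients", "B. Author", "doc2"),
    Metadata("Open Search Registry", "B. Author", "doc3"),
]
lib = InMemoryLibrary(docs, links=[("doc2", "doc1"), ("doc3", "doc1")])
hits = lib.keyword_search("open search", Restriction(creator="B. Author"))
```

A client written against `OpenSearchLibrary` can call any conforming library the same way, which is the point of a cross-domain protocol.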

Publishing a Library • Access OSP WSDL Specification from webtop.cs.usfca.edu • Generate code in language of choice • Implement the search operations for the digital library • Deploy the service • Register with Open Search registry

Deploying an Open Search Library 1. Programmer obtains the OS WSDL 2. Programmer passes the WSDL to wsdl2java 3. wsdl2java generates skeleton code 4. Service is deployed on the library server 5. Registration info is sent to the Open Search Registry

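Wrapping a Library 1. Metasearch client sends an OSP query to the Open Search wrapper (located on a 3rd-party server) 2. Wrapper sends a custom query to the custom search API, e.g., Google API 3. Custom API returns a custom result 4. Wrapper returns an OSP result to the metasearch client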
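
The wrapper flow on this slide is essentially the adapter pattern. A minimal sketch, assuming a made-up `CustomEngine` with its own call signature and result shape (it is not the Google API) and an assumed OSP result shape of title plus identifier:

```python
from typing import Dict, List


class CustomEngine:
    """Stand-in for a proprietary search API with its own result format."""

    def do_search(self, q: str, num: int) -> List[Dict[str, str]]:
        pages = [
            {"URL": "http://example.org/a", "pageTitle": "Open search basics"},
            {"URL": "http://example.org/b", "pageTitle": "Cooking at home"},
        ]
        return [p for p in pages if q.lower() in p["pageTitle"].lower()][:num]


class OpenSearchWrapper:
    """Translates an OSP-style query into a custom call and back."""

    def __init__(self, engine: CustomEngine):
        self._engine = engine

    def keyword_search(self, keywords: str, max_results: int = 10) -> List[Dict[str, str]]:
        # Steps 2 and 3: custom query out, custom result back.
        custom = self._engine.do_search(q=keywords, num=max_results)
        # Step 4: map the custom fields onto the common result shape.
        return [{"title": r["pageTitle"], "identifier": r["URL"]} for r in custom]


wrapper = OpenSearchWrapper(CustomEngine())
osp_results = wrapper.keyword_search("open search")
```

The metasearch client only ever sees the common result shape, so the wrapped engine's API can change without touching client code.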

Wrappers Developed at USF • Google • Amazon (sort of) • Internet Archive • Technorati • Feedster

PublishMe • Like Google Desktop, but shared • Periodically updates an inverted index and linkbase on the PC • Deploys a Web Service on the user’s PC • Auto-registers with the Open Search Registry
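
The periodic index and linkbase update can be pictured with a small in-memory sketch. Everything here is an assumption for illustration (the `PersonalIndex` class, its methods, and the file names); it is not PublishMe's actual data structure, and a real tool would persist the index and scan the file system.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Set


class PersonalIndex:
    """Toy inverted index (term -> documents) plus linkbase (document -> links)."""

    def __init__(self) -> None:
        self.index: Dict[str, Set[str]] = defaultdict(set)
        self.linkbase: Dict[str, Set[str]] = {}

    def remove(self, doc_id: str) -> None:
        """Drop all postings and links for a document."""
        for postings in self.index.values():
            postings.discard(doc_id)
        self.linkbase.pop(doc_id, None)

    def update(self, doc_id: str, text: str, links: Iterable[str] = ()) -> None:
        """Re-index a new or changed document, replacing its old entries."""
        self.remove(doc_id)
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            self.index[term].add(doc_id)
        self.linkbase[doc_id] = set(links)

    def search(self, keywords: str) -> List[str]:
        """Return documents containing every keyword, in sorted order."""
        terms = re.findall(r"[a-z0-9]+", keywords.lower())
        if not terms:
            return []
        postings = [self.index.get(t, set()) for t in terms]
        return sorted(set.intersection(*postings))


idx = PersonalIndex()
idx.update("notes.txt", "Open search registry notes", ["http://example.org/osp"])
idx.update("todo.txt", "Search for bugs")
before = idx.search("search")
idx.update("todo.txt", "Fix bugs")  # the file changed; re-index it
after = idx.search("search")
```

Re-running `update` on a changed file replaces its old postings, which is what a periodic refresh needs.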

Metasearch with P2P Knowledge Sharing WEBTOP

Integrating Global and Personal Libraries

Motivation for Sharing Personal Webs • People create knowledge every day when they bookmark, annotate, link, organize, and synthesize • Communication is a separate step, which often doesn’t happen

Motivation for Sharing Personal Webs • Collaborative Work • Experts

Computers are designed using our brains for a model • Knowledge creation and dissemination separate • Explicit effort required to communicate • Just as we model our word processors on paper.

Additions to OSP for P2P • GetFile • OnLine(ip) • Handles user starting up • Dynamic IPs • OffLine
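
The OnLine/OffLine additions amount to presence tracking for peers whose addresses change. A rough sketch, assuming the registry keys peers by a stable id (the slide's `OnLine(ip)` shows only the IP, so `peer_id` is an added assumption) and assuming a made-up `/opensearch` endpoint path:

```python
from typing import Dict, List, Optional


class PeerRegistry:
    """Tracks which personal libraries are reachable and at what address."""

    def __init__(self) -> None:
        self._online: Dict[str, str] = {}  # peer id -> current IP

    def on_line(self, peer_id: str, ip: str) -> None:
        """Called when a user starts up; overwrites any stale IP."""
        self._online[peer_id] = ip

    def off_line(self, peer_id: str) -> None:
        """Called on shutdown; unknown peers are ignored."""
        self._online.pop(peer_id, None)

    def end_point(self, peer_id: str) -> Optional[str]:
        """Current service URL for a peer, or None if it is offline."""
        ip = self._online.get(peer_id)
        return None if ip is None else f"http://{ip}/opensearch"

    def online_peers(self) -> List[str]:
        return sorted(self._online)


registry = PeerRegistry()
registry.on_line("alice-pc", "192.0.2.10")
registry.on_line("bob-pc", "192.0.2.20")
registry.on_line("alice-pc", "192.0.2.77")  # new IP after a restart
registry.off_line("bob-pc")
```

Metasearch clients would then query only peers that `online_peers` reports, using the most recently announced address.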

But What About PRIVACY? The Big Question: How much of the information hidden within your personal web is hidden due to privacy concerns?

I Want you to be a Search Engine!

Overview • Proliferation of Digital Libraries • Metasearch and Fixed Lists of Sources • Open Search Architecture • PublishMe for P2P knowledge Sharing • Metasearch Clients

Goal: Implement Vannevar Bush’s Association Trails • View a document/thing in context • History of an idea

Thinkmap-like Interface

Association Types • Outward links • Inward links • Similar-Content links • People Links • author, people referenced in paper • Domain-Specific links • law citations • movie-actor • Associations specified by Annotators

Webtop Tree View webtop.cs.usfca.edu

Expanding a Tree • Bird’s Eye View • Local/Web files integrated • Follow different Associative Trails • Ins of Outs of Ins, etc. • Siblings • Weird though, as ins and outs both expand right
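
Following a trail such as "Ins of Outs of Ins" can be sketched as repeated expansion over a link graph. The functions, the edge list, and the reading of "siblings" as pages that share an inward link are assumptions made for illustration, not Webtop's implementation.

```python
from typing import List, Set, Tuple

Links = List[Tuple[str, str]]  # (source, target) pairs


def outs(links: Links, node: str) -> List[str]:
    """Pages that node links to."""
    return sorted(t for s, t in links if s == node)


def ins(links: Links, node: str) -> List[str]:
    """Pages that link to node."""
    return sorted(s for s, t in links if t == node)


def follow_trail(links: Links, start: str, steps: List[str]) -> List[List[str]]:
    """Apply "in"/"out" steps in order; return the node set after each step."""
    frontier: Set[str] = {start}
    levels: List[List[str]] = []
    for step in steps:
        expanded: Set[str] = set()
        for node in frontier:
            expanded.update(ins(links, node) if step == "in" else outs(links, node))
        frontier = expanded
        levels.append(sorted(frontier))
    return levels


def siblings(links: Links, node: str) -> List[str]:
    """Assumed meaning: other pages sharing at least one inward link."""
    trail = follow_trail(links, node, ["in", "out"])
    return [n for n in trail[-1] if n != node]


links = [("a", "x"), ("b", "x"), ("a", "y"), ("c", "y")]
# "Ins of Outs of Ins" of x: take the ins first, then their outs, then those pages' ins.
trail = follow_trail(links, "x", ["in", "out", "in"])
```

Each entry of `trail` corresponds to one level of the expanded tree, so the mixed in/out expansion the slide calls "weird" is just a different step sequence.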

Webtop Side Panel View

Project Status Too many bugs, Dad

Future Work • Open Search Protocol • In-depth study of existing search APIs • Provide a REST alternative to SOAP • Metasearch development • Complete and refine existing clients • Dream up new ones • Thinkmap Graph • Automated Source Selection and Reputation System • Page Ranking • Initiate grass-roots involvement

Future Work: Documents and Things [Figure labels: resource, associations, annotations, document, person, creative work, html, word, pdf, film, book]

Stop talking about Webtop daddy! webtop.cs.usfca.edu
