220 likes | 339 Views
Haystack: Per-User Information Environment 1999 Conference on Information and Knowledge Management. Eytan Adar et al Presented by Xiao Hu CS491CXZ . Outline. What is Haystack Haystack and IR Data Model System Architecture Information Gathering Problems. What is Haystack?.
E N D
Haystack: Per-User Information Environment1999 Conference on Information and Knowledge Management Eytan Adar et al Presented by Xiao Hu CS491CXZ
Outline • What is Haystack • Haystack and IR • Data Model • System Architecture • Information Gathering • Problems
What is Haystack? • A software for organizing and retrieving personal information • Totally personalized • One user, one Haystack • Personal digital bookshelf • A prototype
Haystack and IR • Haystack • personal collection • user’s satisfaction • particular user • focus on searching • specific to one user • can observe user’s implicit information needs • IR • large corpus • precision-recall metric • “expert”relevance judge • IF (collaborative filtering) • preference for similar users • require explicit user input All Users / Groups ofUsers A Single User
Haystack Functionality • Automated data gathering • Information maximization • gathering as much information as possible • Customized information collection • Adaptation to individual query needs A IR system that adapts to its user ?
General Data Model • Accommodate all information • arbitrary pieces of data • metadata • links between them • Facilitate data growth • new data • user’s annotation • user’s information behavior • A semantic network • full text searching • bibliographic info. Searching • associate searching • adapt to the user
General Data Model (summary) • Inheritance hierarchy • Straw needle: primitive information • bale: collection of related straws • tie: relationship b/w straws • Metadata representation • Recursive metadata annotation • Interface Haystack to external “services” • Index agents controlling external devices
System Architecture • Database, searching engine • an adapter as interface to various engines • Core Haystack system (root server) • data model implementation • operation-system-like services • Client level services • user interface • proxy services • data augmenting services • annotation, querying, browsing • observing interaction with external information resources • modifying data, adding links,…
Indexing in Haystack • Straws generate textual information • IR system stores such information • Info. from each straw will be regarded as one unit of indexing • allows to associate pieces of information • Incrementally indexing • whenever a series of changes happen
Outline • What is Haystack • Haystack and IR • Data Model • System Architecture • Data Gathering • Problems
Information gathering • User’s explicit annotation • User’s behaviors observed by the system • interaction with outside world (www, emails) • interaction with Haystack • building query paths – adapting to the user’s style • Analyzing corpus already in Haystack • indexing • metadata extraction • adding links between documents
User’s explicit annotation • Probably the best information source • Might not be realistic • Nicer interface to encourage users • HCI studies
Observers • Proxy services • WWW, email proxies • Recording webpages the user sees • Tracing the path of browsing • Recording visiting time • …… • Query observer • Using query interactions to mold the data model to the user • Plug in new data • Adding links b/w nodes • Facilitating retrieval
Query Observer • Integrates queries into the data model • Query straw • a bale, containing query text, rank of docs, …. • attached nodes of matched documents • annotations from user’s choices relevance feedback • Query path • a chain of query straws in a single searching • good for future retrievals: • presenting similar query terms • adapting relevance of documents by reindexing documents with text of the query path tuned to a particular user
Information gathering • User’s explicit annotation • User’s behaviors observed by the system • interaction with outside world (www, emails) • interaction with Haystack • building query paths – adapting to the user’s style • Analyzing corpus already in Haystack • indexing • metadata extraction • adding links between documents Data augmenting clients Data driven clients
Data augmenting clients • digesting existing information, generating new information • Independent but cooperating • Fetch clients • Type inference clients • Extractor clients • Field finder clients • Triggered by events: data changes in Haystack
Summary • A prototype of a personalized information organization and retrieval system • Relationship with IR • General Data Model • graph, straws, … • System Architecture • three layers: DB, core, clients • Data Gathering • three approaches
Problems • Information maximization assumption • the more, the better? • for one user, but has to be prepared for all users • what are useful clues? • Efficiency issues • dynamic indexing • a slow system (512M memory, 2G disk…) • Today’s haystack project • semantic web, RDF, ontology, user interface …