Interfaces for Selecting and Understanding Collections
Selecting from Collections • Collections are sets of documents that have been coalesced by a human or system. • Traditional collections: • NLM’s MedLine • ACM Digital Library • LEXIS-NEXIS • Library/museum resources from a particular donor • How do people with information needs locate and identify the appropriate collections?
Does it Matter? • Web search engines (e.g., Google) get us the information we need … • well maybe • Web search drops users into the middle of a collection without any understanding of the collection or its overall characteristics. • Web search misses • Many more-structured materials • “the hidden web” • Subscription-based content • Which is likely the best edited, most accurate, and most valuable in specialized domains
Interfaces over Multiple Collections • Interfaces for Selecting and Understanding Collections • Lists • Overviews • Examples • Automated source selection
Lists of Collections • Usually just provide a list of collection names. • Difficult to select from if the user does not know the collections beforehand • Over time people bookmark collections of value • Need tools for helping users who are outside of their areas of expertise
Overviews of Collections • Overviews provide a sense of what is in a collection • Overviews can be • Based on a category or directory structure • Automatically derived from the collection • Presentation of an overview is often a form of information visualization
Category-based Overviews • MedLine – biomedical collection • Medical Subject Headings (MeSH) consists of 18,000 categories in a directed acyclic graph • ACM Digital Library – computer science collection • Hierarchy of 1200 category (keyword) labels • Yahoo – the Web • Graph of directories (probably a DAG) • Humans have to place documents in categories • Authors for ACM DL, subject experts for MedLine, surfers for Yahoo
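A category structure that is a directed acyclic graph, rather than a strict tree, lets one category sit under several parents. The sketch below illustrates what that means for an overview: counting the documents under a category has to avoid visiting a shared subcategory twice. The class name `CategoryGraph`, its methods, and the sample category labels are hypothetical and are not taken from MeSH, the ACM DL, or Yahoo.

```python
from collections import defaultdict


class CategoryGraph:
    """Toy category DAG: a category may have more than one parent."""

    def __init__(self):
        self.children = defaultdict(set)  # category -> direct subcategories
        self.docs = defaultdict(set)      # category -> documents placed in it

    def add_edge(self, parent, child):
        self.children[parent].add(child)

    def assign(self, doc_id, category):
        # In the systems on the slide, a human makes this placement.
        self.docs[category].add(doc_id)

    def documents_under(self, category):
        """All documents in `category` or any of its descendants."""
        seen, stack, found = set(), [category], set()
        while stack:
            node = stack.pop()
            if node in seen:  # a shared descendant is visited only once
                continue
            seen.add(node)
            found |= self.docs.get(node, set())
            stack.extend(self.children.get(node, ()))
        return found


# Illustrative labels only: "heart-disease" has two parents.
graph = CategoryGraph()
graph.add_edge("diseases", "heart-disease")
graph.add_edge("anatomy", "heart-disease")
graph.add_edge("diseases", "infections")
graph.assign("doc1", "heart-disease")
graph.assign("doc2", "infections")
graph.assign("doc3", "anatomy")

for top in ("diseases", "anatomy"):
    print(top, len(graph.documents_under(top)))
```

Because `doc1` is reachable from both top-level categories, per-category counts in such an overview can add up to more than the size of the collection.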
Automatically Derived Overviews • Apply clustering algorithms to document collection • Remember Automatic Global Analysis • Use of co-occurrence and co-citation • Use of distance-based clustering approaches like hierarchic agglomerative clustering • Need methods to determine labels for clusters • Could be a document • Identification of centroid (document most similar to all others) • Identification of hubs (document most mentioned by cluster) • Could be one or more terms • Use most common / best differentiator (using TF-IDF) • No human intervention required • but people are likely to be valuable as editors
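Two of the labelling options above, a centroid document and high-weight terms, can be sketched as follows. This is a minimal sketch under stated assumptions: whitespace tokenization, raw term frequency times log(N/df) as the TF-IDF weight, cosine similarity between documents, and a cluster that has already been produced by some clustering step and is given as a list of document indices. The function names and the sample documents are illustrative and do not come from the slides.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """One {term: weight} dict per document, weight = tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    return [
        {t: tf * math.log(n / df[t]) for t, tf in Counter(tokens).items()}
        for tokens in tokenized
    ]


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid_document(vectors, cluster):
    """Index of the cluster member most similar to all other members."""
    return max(
        cluster,
        key=lambda i: sum(cosine(vectors[i], vectors[j]) for j in cluster if j != i),
    )


def label_terms(vectors, cluster, k=2):
    """The k terms with the highest summed TF-IDF weight in the cluster."""
    totals = Counter()
    for i in cluster:
        totals.update(vectors[i])  # adds the float weights term by term
    return [term for term, _ in totals.most_common(k)]


# Toy collection; the two clusters are assumed to be given.
docs = [
    "heart disease heart surgery",
    "heart attack heart treatment",
    "heart disease treatment",
    "compiler parsing compiler optimization",
    "compiler parsing grammar",
]
vectors = tfidf_vectors(docs)
for cluster in ([0, 1, 2], [3, 4]):
    print(docs[centroid_document(vectors, cluster)], label_terms(vectors, cluster))
```

A term that appears in every document of the collection gets weight zero under this scheme, which is what makes TF-IDF a differentiator and not just a frequency count. As the slide notes, a human editor may still want to revise the resulting labels.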
Evaluation of Scatter-Gather • Scatter-Gather conveyed an overview of collection contents • Scatter-Gather without search was less effective than a basic search • Need to combine clustering with search
Evaluation of Graphical Overviews • One study reported that non-experts found the clustering results difficult to use (worse than text-based views like Scatter-Gather) • Comparison of Kohonen map and Yahoo • 11 of 15 subjects found an “interesting” page using Kohonen • 8 of them were able to find the same page using Yahoo • 14 of 16 subjects found an “interesting” page using Yahoo • 2 of them were able to find the same page using Kohonen • Subjects liked the ability to jump between categories without backing out of the current category • Unsupervised thematic overviews are probably better for giving a gist of what is in a collection than for search.
Examples, Dialogs, Wizards • Retrieval by reformulation • Start with example queries • Rabbit, Helgon • Can be difficult to find an appropriate starting query • Wizards • Found to help users without the necessary domain knowledge get through multi-step processes • Not helpful when the wizard is not accompanied by help • Not useful when the goal is teaching how to use the interface. • Guided tours • Present a logical sequence of navigation choices for accomplishing a goal (e.g. Walden's Paths) • Not evaluated with regard to information access
Automated Source Selection • Selecting collection automatically (but explicitly) • Need a model of each collection • What it covers, need model of topics • What it is good at, need metric for good • Develop a model of the user’s information need • Match the information need to the most valuable collections for that topic • Used in meta-search – interesting area of research • Could be starting point for interactive collection selection.
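The matching step above can be illustrated with a small sketch. It assumes a deliberately simple collection model (the number of documents in the collection and, per term, how many of them contain the term) and a deliberately simple metric (the average fraction of documents containing each query term). Neither is claimed to be what any particular meta-search system uses. The names `CollectionModel`, `score`, and `select_collections`, and all figures in the example, are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class CollectionModel:
    """Assumed summary of a collection: its size and per-term document counts."""
    name: str
    num_docs: int
    doc_freq: dict  # term -> number of documents containing the term


def score(model, query_terms):
    """Average fraction of the collection's documents containing each query term."""
    if not query_terms or model.num_docs == 0:
        return 0.0
    hits = sum(model.doc_freq.get(t, 0) / model.num_docs for t in query_terms)
    return hits / len(query_terms)


def select_collections(models, query, k=1):
    """Names of the k best-scoring collections; zero-score collections are dropped."""
    terms = query.lower().split()  # the "information need" is just the query terms here
    scored = [(score(m, terms), m.name) for m in models]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [name for s, name in ranked[:k] if s > 0]


# Hypothetical collections with invented statistics.
models = [
    CollectionModel("medical", 1000, {"heart": 300, "disease": 400}),
    CollectionModel("computing", 2000, {"compiler": 500, "heart": 2}),
]
print(select_collections(models, "heart disease"))
print(select_collections(models, "compiler"))
```

Returning a ranked list instead of a single winner fits the last bullet: the ranking could be shown to the user as a starting point for interactive collection selection.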