Recognizing Document Value from Reading and Organizing Activities in Document Triage

Rajiv Badi, Soonil Bae, J. Michael Moore, Konstantinos Meintanis, Anna Zacchi, Haowei Hsieh, Frank Shipman (Center for the Study of Digital Libraries & Department of Computer Science, Texas A&M University); Catherine C. Marshall (Microsoft Corporation)

Document Triage • Document triage is the rapid evaluation of a set of documents for later use. • Document triage places different demands on attention than single-document reading activities • Continuum of types of reading: • working in overview (metadata), • reading at various levels of depth (skimming), • reading intensively

Visual Knowledge Builder (VKB)

Search in VKB

Supporting Document Triage • Central problem in document triage is limited time. • VKB enables rapid expression of human assessment using visual cues • Goal: have system aid in selecting documents • How: observe user’s triage activities to provide cues that will aid in the selection of further documents

Process for Providing Support • Recognize user interest in and interpretation of documents • Generate an abstract representation of user interests • Identify documents that match these interests • Provide visual cues to indicate the potential value of documents
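The four steps above can be sketched as a small pipeline. This is only an illustration under stated assumptions: the event format, the term-weight profile, and the function names (`recognize_interest`, `build_profile`, `match_documents`, `visual_cue`) are all hypothetical and are not taken from VKB.

```python
from collections import Counter

def recognize_interest(events):
    """Step 1: tally observed actions per document; each event is (doc_id, action)."""
    activity = {}
    for doc_id, action in events:
        activity.setdefault(doc_id, Counter())[action] += 1
    return activity

def build_profile(activity, documents):
    """Step 2: abstract interest into term weights (assumed representation).

    Each term of a document is weighted by how many actions that document received.
    """
    profile = Counter()
    for doc_id, actions in activity.items():
        weight = sum(actions.values())
        for term in documents[doc_id].lower().split():
            profile[term] += weight
    return profile

def match_documents(profile, documents, seen):
    """Step 3: score documents not yet handled by their overlap with the profile."""
    scores = {}
    for doc_id, text in documents.items():
        if doc_id in seen:
            continue
        terms = text.lower().split()
        scores[doc_id] = sum(profile[t] for t in terms) / max(len(terms), 1)
    return scores

def visual_cue(score, threshold=0.5):
    """Step 4: map a score to a (hypothetical) visual cue."""
    return "highlight" if score >= threshold else "none"

# Hypothetical data: the user scrolled and moved document d1 only.
documents = {
    "d1": "ethnomathematics classroom activities",
    "d2": "ethnomathematics lesson plans",
    "d3": "stock market news",
}
events = [("d1", "scroll"), ("d1", "move"), ("d1", "scroll")]

activity = recognize_interest(events)
profile = build_profile(activity, documents)
scores = match_documents(profile, documents, seen=set(activity))
cues = {doc_id: visual_cue(score) for doc_id, score in scores.items()}
```

In this toy run only `d2` shares a term with the document the user worked on, so it is the only one that receives a cue.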

Acquiring User Interest Model • Explicit Methods • Users tend not to provide explicit feedback • Implicit Methods • Reading time has been used in many cases • Scrolling and mouse events have been shown to be somewhat predictive • Annotations have been used to identify passages of interest • Problem: Individuals vary greatly and have idiosyncratic work practices

Data from an Earlier Study • Task: subjects placed in the role of a reference librarian, selecting and organizing information on ethnomathematics for a teacher • Setting: top 20 search results from NSDL and top 20 search results from Google • Subjects were given as much time as they deemed necessary (after training) • After completing the task, the 24 subjects were asked to identify: • 5 documents they found most valuable • 5 documents they found least valuable

[Slide: screenshot of VKB with IE]

What Actions Were Correlated with Document Preferences? Lots (ordered from most to least correlated) • Number of object moves • Scroll offset • Number of scrolls • Number of border color changes • Number of object resizes • Total number of scroll groups • Number of scrolling direction changes • Number of background color changes • Time spent in document • Number of border width changes • Number of object deletions • Number of document accesses • Length of document in characters
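A ranking like the one above can be produced by correlating each activity count with the subjects' document preferences. The sketch below assumes a Pearson correlation and a +1 / -1 coding for "most valuable" / "least valuable"; the slides do not state which statistic was used, and every number here is made up for illustration.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)

# Hypothetical per-document data: +1 = rated most valuable, -1 = least valuable.
preference = [1, 1, 1, -1, -1, -1]
features = {
    "object_moves": [5, 4, 6, 1, 0, 1],
    "time_in_document": [30, 90, 40, 60, 20, 50],
    "document_length": [100, 200, 300, 100, 200, 300],
}

# Rank features from most to least correlated (by absolute value).
ranked = sorted(
    ((name, pearson(values, preference)) for name, values in features.items()),
    key=lambda item: abs(item[1]),
    reverse=True,
)
for name, r in ranked:
    print(f"{name:20s} r = {r:+.3f}")
```

With this invented data the number of object moves ranks first and document length last, mirroring the order on the slide only by construction.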

Modeling Based on Reading and Interpretation • Document triage combines multiple forms of reading and interpretation • Infrastructure for applications to construct and share interest models [Diagram labels: Reading Application (x3), Organizing Application, Location/Overview Application, Interest Profile Manager, User Interest Estimation Engine, Interest Profile]
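One way to picture a shared interest-model infrastructure is a small manager object that applications report to and subscribe to. The class below is a hypothetical sketch, not the actual VKB component: the method names, the 0-to-1 score range, and the choice to combine estimates by simple averaging are all assumptions.

```python
class InterestProfileManager:
    """Hypothetical hub that collects per-application interest estimates."""

    def __init__(self):
        self._estimates = {}   # doc_id -> {app_name: score in [0, 1]}
        self._listeners = []   # callbacks invoked as callback(doc_id, combined_score)

    def subscribe(self, callback):
        """Register an application that wants to hear about updated interest."""
        self._listeners.append(callback)

    def report(self, app_name, doc_id, score):
        """Record one application's latest estimate and notify subscribers."""
        self._estimates.setdefault(doc_id, {})[app_name] = score
        combined = self.interest(doc_id)
        for callback in self._listeners:
            callback(doc_id, combined)

    def interest(self, doc_id):
        """Combined estimate for a document (assumed: mean over applications)."""
        scores = self._estimates.get(doc_id, {})
        return sum(scores.values()) / len(scores) if scores else None


manager = InterestProfileManager()
updates = []
manager.subscribe(lambda doc_id, score: updates.append((doc_id, score)))

# A reading application and an organizing application report on the same document.
manager.report("reading_app", "doc-1", 0.8)
manager.report("organizing_app", "doc-1", 0.4)
```

After both reports, `manager.interest("doc-1")` is the average of the two estimates, and the subscriber has been notified twice.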

Interest Models Based on data from an earlier study, we developed four interest models • Three were mathematically derived • Reading-Activity Model • Organizing-Activity Model • Combined Model • One hand-tuned model included human assessment based on observations of user activity and interviews with users.
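The slides do not say how the three mathematically derived models were built. A minimal sketch, assuming each model is a weighted sum over activity counts, shows how the reading-activity, organizing-activity, and combined models could differ only in which features they use. The feature names and weights below are invented; in a fitted combined model the weights would be re-estimated rather than merged as done here.

```python
# Hypothetical weights; none of these values come from the study.
READING_WEIGHTS = {"num_scrolls": 0.02, "time_in_document_s": 0.002}
ORGANIZING_WEIGHTS = {"object_moves": 0.1, "border_color_changes": 0.15}
COMBINED_WEIGHTS = {**READING_WEIGHTS, **ORGANIZING_WEIGHTS}

def estimate_interest(activity, weights, bias=0.0):
    """Weighted sum of activity counts, clamped to the range [0, 1]."""
    raw = bias + sum(w * activity.get(name, 0) for name, w in weights.items())
    return min(1.0, max(0.0, raw))

# Hypothetical activity observed for one document.
activity = {
    "num_scrolls": 10,
    "time_in_document_s": 100,
    "object_moves": 2,
    "border_color_changes": 1,
}

reading_score = estimate_interest(activity, READING_WEIGHTS)
organizing_score = estimate_interest(activity, ORGANIZING_WEIGHTS)
combined_score = estimate_interest(activity, COMBINED_WEIGHTS)
```

The combined model sees evidence from both kinds of activity, which is the intuition behind comparing it with the two single-source models.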

Quick Comparison of Models • How much of the difference in the original data did each model account for? • Reading-activity model: 47.7% • Organizing-activity model: 63.6% • Combined model: 70.8% • How well would the models do on new data?

Evaluation of Models • 16 subjects with the same • Task (collecting information on ethnomathematics for a teacher) and • Setting (20 NSDL and 20 Google results) • Different display configuration • A single display in this study, where the earlier study used two displays • Different rating of documents • Subjects rated all documents on a 5-point Likert scale (with 1 meaning “not useful” and 5 meaning “very useful”)

Predictive Power of Models • Models were conservative due to data from original study. • Used aggregated user activity and user evaluations to evaluate models

Model                        Avg. Residue   Std. Dev.
Reading-activity model       0.258          0.192
Organizing-activity model    0.216          0.146
Combined model               0.175          0.138
Hand-tuned model             0.197          0.134
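The slides do not define "residue". A plausible reading, assumed here, is the absolute difference between a model's predicted value and the subjects' rating once the 1-to-5 Likert score is rescaled to 0-to-1. The sketch below computes the average and standard deviation of such residues on made-up numbers.

```python
from statistics import mean, stdev

def normalize_likert(rating, low=1, high=5):
    """Rescale a Likert rating to the range [0, 1] (assumed normalization)."""
    return (rating - low) / (high - low)

def residues(predictions, ratings):
    """Absolute difference between each prediction and its normalized rating."""
    return [abs(p - normalize_likert(r)) for p, r in zip(predictions, ratings)]

# Hypothetical model outputs and aggregated subject ratings for four documents.
predictions = [0.75, 0.5, 0.25, 1.0]
ratings = [4, 3, 3, 5]

errors = residues(predictions, ratings)
avg_residue = mean(errors)
std_residue = stdev(errors)  # sample standard deviation
print(f"avg residue = {avg_residue:.4f}, std dev = {std_residue:.4f}")
```

Lower values on both measures indicate predictions that track the ratings more closely, which is how the table above is read.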

Size of Errors

Next Steps • Update models • Revise weights based on Likert-scale data • Incorporate additional features of user activity • Run another set of subjects with same form of document evaluation • Evaluate predictive power for individuals • Evaluation with other domains/tasks • Effect of document set • Effect of domain/subject matter expertise

Summary • Our goal is to support document triage by inferring user interest • Developed infrastructure for applications to share interest model • Compared reading-activity, organizing-activity, and combined models • Combined model better than reading-activity model (p=0.02) and organizing-activity model (p=0.07). • Lots of work left to do …

Contact Information Frank Shipman shipman@cs.tamu.edu Download VKB 2 from: http://www.csdl.tamu.edu/VKB
