Recognizing Document Value from Reading and Organizing Activities in Document Triage

This study explores the process of document triage and proposes the use of visual cues to aid in the selection of valuable documents. The study evaluates different interest models and their predictive power in identifying valuable documents.


Presentation Transcript


  1. Recognizing Document Value from Reading and Organizing Activities in Document Triage
     Rajiv Badi, Soonil Bae, J. Michael Moore, Konstantinos Meintanis, Anna Zacchi, Haowei Hsieh, Frank Shipman
     Center for the Study of Digital Libraries & Department of Computer Science, Texas A&M University
     Catherine C. Marshall
     Microsoft Corporation

  2. Document Triage
     • Document triage is the rapid evaluation of a set of documents for later use.
     • Document triage places different demands on attention than single-document reading activities.
     • Continuum of types of reading:
       • working in overview (metadata),
       • reading at various levels of depth (skimming),
       • reading intensively

  3. Visual Knowledge Builder (VKB)

  4. Search in VKB

  5. Supporting Document Triage
     • Central problem in document triage is limited time.
     • VKB enables rapid expression of human assessment using visual cues
     • Goal: have system aid in selecting documents
     • How: observe user’s triage activities to provide cues that will aid in the selection of further documents

  6. Process for Providing Support
     • Recognize user interest in and interpretation of documents
     • Generate a representation of user interests
     • Identify documents that match these interests
     • Provide visual cues to indicate the potential value of documents

  7. Process for Providing Support
     • Recognize user interest in and interpretation of documents
     • Generate an abstract representation of user interests
     • Identify documents that match these interests
     • Provide visual cues to indicate the potential value of documents
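
The four steps above can be sketched end to end. This is only an illustrative sketch, not the authors' implementation: the slides do not say what the abstract representation looks like, so the interest-weighted term counts, the weights, the threshold, and every function and variable name below are assumptions made for the example.

```python
from collections import Counter

def estimate_interest(activity, weights):
    """Step 1: turn observed activity counts into an interest score (weights are made up)."""
    return sum(weights.get(name, 0.0) * count for name, count in activity.items())

def build_profile(documents, interest):
    """Step 2: assumed abstract representation, i.e. interest-weighted term counts."""
    profile = Counter()
    for doc_id, text in documents.items():
        for term in text.lower().split():
            profile[term] += interest.get(doc_id, 0.0)
    return profile

def match_score(text, profile):
    """Step 3: average profile weight over the terms of an unseen document."""
    terms = text.lower().split()
    if not terms:
        return 0.0
    return sum(profile.get(term, 0.0) for term in terms) / len(terms)

def visual_cue(score, threshold):
    """Step 4: map the match score to a (hypothetical) visual cue."""
    return "highlight" if score >= threshold else "none"

# Hypothetical observations for two documents the user has already triaged.
weights = {"object_moves": 1.0, "scrolls": 0.5}
activities = {
    "d1": {"object_moves": 2, "scrolls": 4},
    "d2": {"object_moves": 0, "scrolls": 0},
}
seen = {"d1": "ethnomathematics classroom activities", "d2": "sports scores"}
unseen = {"d3": "ethnomathematics lesson plans", "d4": "sports news"}

interest = {doc_id: estimate_interest(a, weights) for doc_id, a in activities.items()}
profile = build_profile(seen, interest)
cues = {doc_id: visual_cue(match_score(text, profile), threshold=1.0)
        for doc_id, text in unseen.items()}
print(cues)
```

With this toy data only the unseen document that shares a term with the heavily handled document receives a cue; a real system would need a far richer text representation.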

  8. Acquiring User Interest Model
     • Explicit Methods
       • Users tend not to provide explicit feedback
     • Implicit Methods
       • Reading time has been used in many cases
       • Scrolling and mouse events have been shown to be somewhat predictive
       • Annotations have been used to identify passages of interest
     • Problem: Individuals vary greatly and have idiosyncratic work practices

  9. Data from an Earlier Study
     Task: subjects placed in role of a reference librarian, selecting and organizing information on ethnomathematics for a teacher
     Setting: top 20 search results from NSDL & top 20 search results from Google
     Subjects given as much time as they deemed necessary (after training)
     After completing task, the 24 subjects were asked to identify:
     • 5 documents they found most valuable
     • 5 documents they found least valuable

  10. slide w/vkb + IE

  11. What Actions Were Correlated with Document Preferences?
     Lots (ordered from most to least correlated):
     • Number of object moves
     • Scroll offset
     • Number of scrolls
     • Number of border color changes
     • Number of object resizes
     • Total number of scroll groups
     • Number of scrolling direction changes
     • Number of background color changes
     • Time spent in document
     • Number of border width changes
     • Number of object deletions
     • Number of document accesses
     • Length of document in characters
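
An ordering like the one on this slide could be produced by correlating each logged activity measure with the subjects' document preferences. The sketch below assumes Pearson correlation and a +1 / 0 / -1 coding for most valuable / unmarked / least valuable; the slides name neither, and all data values and identifiers here are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)

def rank_features(activity_log, preference):
    """Sort activity measures by the strength (absolute value) of their correlation."""
    scores = {name: pearson(values, preference) for name, values in activity_log.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)

# Hypothetical log: one value per document for each activity measure.
activity_log = {
    "object_moves": [4, 3, 0, 1, 0],
    "time_in_document_s": [30, 90, 20, 60, 40],
    "document_length_chars": [900, 400, 800, 500, 700],
}
# Assumed coding: +1 = most valuable, -1 = least valuable, 0 = not marked.
preference = [1, 1, 0, 0, -1]

ranking = rank_features(activity_log, preference)
for name, r in ranking:
    print(f"{name}: r = {r:.3f}")
```

Sorting by absolute value keeps strongly negative correlations near the top, since they are as informative as strongly positive ones.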

  12. Modeling based on Reading and Interpretation
     [Architecture diagram; recoverable labels: Reading Application (three instances), Organizing Application, Location/Overview Application, Interest Profile Manager, Interest Estimation Engine, User Interest Profile]
     • Document triage combines multiple forms of reading and interpretation
     • Infrastructure for applications to construct and share interest models

  13. Interest Models
     Based on data from an earlier study, we developed four interest models
     • Three were mathematically derived
       • Reading-Activity Model
       • Organizing-Activity Model
       • Combined Model
     • One hand-tuned model included human assessment based on observations of user activity and interviews with users.

  14. Quick Comparison of Models
     • How much of difference in original data was modeled?
       • Reading-activity model 47.7%
       • Organizing-activity model 63.6%
       • Combined model 70.8%
     • How well would models do for new data?
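
One common way to quantify "how much of the difference was modeled" is the coefficient of determination (R²) of a fitted model. The slides do not state how the three models were derived or which measure produced these percentages, so the single-feature least-squares fit below is purely an assumption for illustration, with hypothetical data and names.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def r_squared(xs, ys, slope, intercept):
    """Share of the variance in ys that the fitted line accounts for."""
    mean_y = sum(ys) / len(ys)
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    ss_resid = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return 1 - ss_resid / ss_total

# Hypothetical data: object moves per document and the document's assessed value in [0, 1].
object_moves = [0, 1, 2, 3, 4]
doc_value = [0.1, 0.3, 0.4, 0.8, 0.9]

slope, intercept = fit_line(object_moves, doc_value)
r2 = r_squared(object_moves, doc_value, slope, intercept)
print(f"value = {slope:.2f} * moves + {intercept:.2f}, R^2 = {r2:.1%}")
```

A high R² on the data used for fitting says nothing about new subjects, which is the question the last bullet of the slide raises.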

  15. Evaluation of Models
     • 16 subjects with the same
       • Task (collecting information on ethnomathematics for a teacher) and
       • Setting (20 NSDL and 20 Google results)
     • Different display configuration
       • A single display was used in this study, where two displays were used before
     • Different rating of documents
       • Subjects rated all documents on a 5-point Likert scale (with 1 meaning “not useful” and 5 meaning “very useful”)

  16. Predictive Power of Models
     • Models were conservative due to data from original study.
     • Used aggregated user activity and user evaluations to evaluate models

     Model                        Avg. Residue   Std. Dev.
     Reading-activity model       0.258          0.192
     Organizing-activity model    0.216          0.146
     Combined model               0.175          0.138
     Hand-tuned model             0.197          0.134
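
The two columns of this comparison can be computed as follows. This is a minimal sketch under stated assumptions: the slides do not define "residue", so it is taken here to be the absolute difference between a model's predicted value and the user's rating, with Likert ratings rescaled to the range 0 to 1; whether the original analysis used the sample or population standard deviation is also unknown. All data and names are hypothetical.

```python
from statistics import mean, stdev

def likert_to_unit(rating, low=1, high=5):
    """Rescale a Likert rating (1..5 by default) to the range 0..1."""
    return (rating - low) / (high - low)

def residues(predicted, rated):
    """Absolute prediction error per document (assumed meaning of 'residue')."""
    return [abs(p - r) for p, r in zip(predicted, rated)]

# Hypothetical model outputs in [0, 1] and the matching user ratings.
predicted = [0.8, 0.5, 0.2, 0.6]
likert_ratings = [5, 3, 2, 4]

rated = [likert_to_unit(r) for r in likert_ratings]
errors = residues(predicted, rated)
avg_residue = mean(errors)
std_dev = stdev(errors)  # sample standard deviation; an assumption
print(f"avg. residue = {avg_residue:.3f}, std. dev. = {std_dev:.3f}")
```

A lower average residue means predictions sit closer to the ratings, and a lower standard deviation means the errors are more uniform across documents.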

  17. Size of Errors

  18. Next Steps
     • Update models
     • Revise weights based on Likert-scale data
     • Incorporate additional features of user activity
     • Run another set of subjects with same form of document evaluation
     • Evaluate predictive power for individuals
     • Evaluation with other domains/tasks
     • Effect of document set
     • Effect of domain/subject matter expertise

  19. Summary
     • Our goal is to support document triage by inferring user interest
     • Developed infrastructure for applications to share interest model
     • Compared reading-activity, organizing-activity, and combined models
     • Combined model better than reading-activity model (p=0.02) and organizing-activity model (p=0.07).
     • Lots of work left to do …

  20. Contact Information
     Frank Shipman
     shipman@cs.tamu.edu
     Download VKB 2 from: http://www.csdl.tamu.edu/VKB
