Email History Analysis for Mining Virtual Reference Questions
Sample dataset: a virtual library's archived emails from 2002 (11,051 emails).
Objective: mine the virtual reference questions in the reference email repository in order to develop a well-targeted, efficient virtual reference service.
Presented here are some warm-up projects that apply various visualization technologies in an attempt to find or design good tools and algorithms for further research.
Ning Yu, SLIS, IU-Bloomington
Literature Review of Email Research
Source: www.emailresearch.org
Task Management; User Analysis; Agent / Collaborative Filtering; Application / Case Studies; Burst Detection
Data Mining: LSA (Latent Semantic Analysis)
Visualization: PathFinder
Hierarchy Structure: get a whole view of the mailbox and quickly find pending email
Data Mining: Java, XML
Visualization: Hyperbolic Tree
A Demo of Searching via Hyperbolic Tree
Figure label: Email Body
Data Mining: Java, XML
Visualization: Hyperbolic Tree
GRIDL Repository for Email: a better fit for more interactive emails (e.g., personal)
Data Mining: Access
Visualization: GRIDL
Time-Series Statistic Visualization: some interesting findings on library email string
Time of day: by analyzing the volume of incoming email over a day, librarians can make a more reasonable work schedule. (Chart annotation: may be from the international students.)
Day of week: interestingly, the average curves of both email length and email number over a week are nearly symmetric. (Axis: Sun, Mon, Tues, Wed, Thur, Fri, Sat)
Data Mining: Java, Excel
Visualization: TimeSearcher
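The time-of-day and day-of-week summaries above could be computed along these lines. This is only a minimal sketch, not the Java/Excel processing used for the slides: the function name `hourly_and_weekday_stats`, the `(timestamp, body)` input shape, and the sample emails are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime

# Monday is 0 in datetime.weekday(), so the labels start with Mon.
WEEKDAYS = ["Mon", "Tues", "Wed", "Thur", "Fri", "Sat", "Sun"]


def hourly_and_weekday_stats(emails):
    """Summarize emails given as (timestamp, body) pairs (hypothetical shape).

    Returns:
      per_hour:    {hour of day: number of emails}
      per_weekday: {weekday label: (number of emails, average body length)}
    """
    per_hour = defaultdict(int)
    counts = defaultdict(int)
    total_length = defaultdict(int)
    for timestamp, body in emails:
        per_hour[timestamp.hour] += 1
        day = WEEKDAYS[timestamp.weekday()]
        counts[day] += 1
        total_length[day] += len(body)
    per_weekday = {
        day: (counts[day], total_length[day] / counts[day]) for day in counts
    }
    return dict(per_hour), per_weekday


# Made-up sample data, not taken from the 2002 archive.
sample_emails = [
    (datetime(2002, 3, 4, 9, 15), "abcd"),       # a Monday morning
    (datetime(2002, 3, 4, 9, 40), "abcdefgh"),   # same Monday, same hour
    (datetime(2002, 3, 5, 22, 5), "abcdef"),     # Tuesday night
]
per_hour, per_weekday = hourly_and_weekday_stats(sample_emails)
```

Plotting `per_hour` gives the time-of-day curve, and the two values stored per weekday give the email-number and email-length curves that the slide compares.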
Are We Working Hard Enough? Monthly Statistics
Figure labels: Searching field; Raw data; Average line; Summary windows
X axis: day of month
Series: Student / email number; Librarian / email number; Student / email length; Librarian / email length
Data Mining: Java, Excel
Visualization: TimeSearcher
How Much Shall We Help This Student? Student-Based Statistics
X: email number
A query for the students who exchanged more than 31 emails with the librarians
Figure labels: An active student; Transaction 1; Transaction 2
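The query on this slide amounts to counting emails per student and keeping those above a threshold. Below is a minimal sketch under assumed inputs: the function name `active_students`, the `(student_id, librarian_id)` record shape, and the sample records are hypothetical, and only the threshold of 31 comes from the slide.

```python
from collections import Counter


def active_students(messages, threshold=31):
    """Return {student_id: email count} for students above the threshold.

    `messages` is an iterable of (student_id, librarian_id) pairs, one per
    email exchanged in either direction (an assumed record shape).
    """
    counts = Counter(student for student, _librarian in messages)
    return {student: n for student, n in counts.items() if n > threshold}


# Made-up records: s1 has 32 emails, s2 has exactly 31, s3 has 5.
sample_messages = (
    [("s1", "libA")] * 32 + [("s2", "libB")] * 31 + [("s3", "libA")] * 5
)
result = active_students(sample_messages)
```

With a strict "more than 31" comparison, only `s1` is returned; `s2` sits exactly on the threshold and is excluded.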
Goal: observe the evolution of virtual reference questions by identifying the highest bursts (terms) in the full text of the email history. The result can help librarians set up their knowledge base right on target and update their knowledge and materials on time.
Problems and possible solutions:
Hard to overcome the synonymy and polysemy problems. Possible solution: combine with LSA.
Cannot find the fundamental problems (How, Why, When, etc.). Possible solution: an automatic classification method?
Assumption: if two terms burst together and have similar burst spans, they may be related to each other. What is more, if the weights of the bursts are similar, the terms tend to have a strong relationship and belong to one topic (e.g., "gum" and "chewing").
Top 100 Bursts
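The assumption about co-occurring bursts can be illustrated with a small sketch. It presumes bursts have already been detected by some other tool and only pairs them up. The `Burst` record, the interval-overlap and weight-ratio measures, and both 0.8 thresholds are illustrative assumptions, not the method behind these slides.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Burst:
    term: str
    start: int     # first time step of the burst (e.g., a day index)
    end: int       # last time step of the burst, inclusive
    weight: float  # burst weight as reported by the detector


def span_overlap(a, b):
    """Jaccard overlap of two inclusive [start, end] spans, from 0.0 to 1.0."""
    shared = min(a.end, b.end) - max(a.start, b.start) + 1
    if shared <= 0:
        return 0.0
    covered = max(a.end, b.end) - min(a.start, b.start) + 1
    return shared / covered


def weight_similarity(a, b):
    """Ratio of the smaller weight to the larger one (weights assumed >= 0)."""
    larger = max(a.weight, b.weight)
    return min(a.weight, b.weight) / larger if larger > 0 else 0.0


def related_pairs(bursts, min_overlap=0.8, min_weight_sim=0.8):
    """Pair terms whose bursts have similar spans and similar weights."""
    return [
        (a.term, b.term)
        for a, b in combinations(bursts, 2)
        if span_overlap(a, b) >= min_overlap
        and weight_similarity(a, b) >= min_weight_sim
    ]


# Made-up bursts echoing the slide's "gum" and "chewing" example.
sample_bursts = [
    Burst("gum", 10, 20, 5.0),
    Burst("chewing", 11, 20, 4.5),
    Burst("renewal", 40, 60, 5.0),
]
pairs = related_pairs(sample_bursts)
```

Here "gum" and "chewing" are paired because their spans nearly coincide and their weights are close, while "renewal" bursts at a different time and stays unpaired. Terms paired this way would still need a check against synonymy and polysemy, as the slide notes.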
Top 100 Bursts in virtual reference emails
Any comments are welcome (especially on ethical issues in email research).
InfoVis Lab Open House, 2003-12-05
Ning Yu, nyu@indiana.edu