
Email history

Email history analysis for mining virtual reference questions. Sample dataset: a virtual library's archived emails from 2002 (#: 11051). Objective: mine the virtual reference questions in the reference email repository in order to develop a well-targeted, efficient virtual reference service.


Presentation Transcript


  1. Email History Analysis for Mining Virtual Reference Questions. Sample dataset: a virtual library's archived emails from 2002 (#: 11051). Objective: mine the virtual reference questions in the reference email repository in order to develop a well-targeted, efficient virtual reference service. Presented here are some warm-up projects that attempt to find or design good tools and algorithms for further research by applying various visualization technologies. Ning Yu, SLIS, IU-Bloomington

  2. Literature Review of Email Research. Source: www.emailresearch.org. Topics: Task Management; User Analysis; Agent / Collaborative Filtering; Application / Case Studies; Burst Detection. Data mining: LSA (Latent Semantic Analysis). Visualization: PathFinder

  3. Hierarchy Structure: get a whole view of the mailbox and quickly find pending emails. Data mining: Java, XML. Visualization: Hyperbolic Tree

  4. A demo of searching via the hyperbolic tree. (Figure label: Email Body.) Data mining: Java, XML. Visualization: Hyperbolic Tree

  5. GRIDL repository for email: better suited to more interactive emails (e.g., personal ones). Data mining: Access. Visualization: GRIDL

  6. Time-Series Statistics Visualization: some interesting findings on the library email string. Time of day: by analyzing the volume of incoming email over a day, librarians can make a more reasonable work schedule. (Chart annotation: "May be from the international students".) Day of week: interestingly, the average curves of both email length and email number over a week are nearly symmetric. (Axis labels: Sun, Mon, Tues, Wed, Thur, Fri, Sat.) Data mining: Java, Excel. Visualization: TimeSearcher
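The time-of-day and day-of-week counts described on this slide could be computed along the following lines. This is only a minimal sketch in Python, not the Java/Excel pipeline named on the slide; the function name `volume_profiles`, the timestamp format, and the sample timestamps are all assumptions made for illustration, and no real data from the archive is reproduced.

```python
from collections import Counter
from datetime import datetime

# Weekday names indexed by datetime.weekday() (Monday == 0); avoids locale issues.
WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

# Hypothetical arrival times, invented for this sketch.
SAMPLE_TIMESTAMPS = [
    "2002-03-04 09:15",
    "2002-03-04 09:40",
    "2002-03-04 23:05",
    "2002-03-05 14:30",
    "2002-03-09 09:10",
]


def volume_profiles(stamps, fmt="%Y-%m-%d %H:%M"):
    """Count incoming emails per hour of day and per day of week."""
    by_hour = Counter()
    by_weekday = Counter()
    for stamp in stamps:
        t = datetime.strptime(stamp, fmt)
        by_hour[t.hour] += 1
        by_weekday[WEEKDAYS[t.weekday()]] += 1
    return by_hour, by_weekday


if __name__ == "__main__":
    by_hour, by_weekday = volume_profiles(SAMPLE_TIMESTAMPS)
    for hour in sorted(by_hour):
        print(f"{hour:02d}:00  {by_hour[hour]}")
    for day in WEEKDAYS:
        print(f"{day}  {by_weekday[day]}")
```

Plotting the hourly counts gives the kind of daily curve a librarian could use for scheduling; averaging the weekday counts over many weeks would give the weekly curve.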

  7. Are we working hard enough? Monthly statistics. (Figure labels: searching field; raw data; average line; summary windows; x-axis: day of month; series: student email number, librarian email number, student email length, librarian email length.) Data mining: Java, Excel. Visualization: TimeSearcher

  8. How much shall we help this student? Student-based statistics. X axis: email number. A query for the students who exchanged more than 31 emails with the librarians. (Figure labels: an active student; Transaction 1; Transaction 2.)
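The query on this slide (students with more than 31 emails exchanged with the librarians) amounts to a per-student count followed by a threshold filter. The sketch below assumes each message is available as a `(student_id, direction)` pair; that record layout, the function name `active_students`, and the sample data are hypothetical, and only the threshold of 31 comes from the slide.

```python
from collections import Counter


def active_students(messages, threshold=31):
    """Return {student_id: email_count} for students with more than `threshold` emails.

    `messages` is an iterable of (student_id, direction) pairs, an assumed
    layout in which direction marks who sent the message.
    """
    counts = Counter(student_id for student_id, _direction in messages)
    return {sid: n for sid, n in counts.items() if n > threshold}


if __name__ == "__main__":
    # Invented sample: s1 has 32 emails, s2 has exactly 31, s3 has 5.
    sample = (
        [("s1", "to_librarian")] * 20 + [("s1", "to_student")] * 12
        + [("s2", "to_librarian")] * 31
        + [("s3", "to_librarian")] * 5
    )
    print(active_students(sample))
```

Because the comparison is strict, a student with exactly 31 emails is not returned, which matches the "more than 31" wording.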

  9. Goal: observe the evolution of virtual reference questions by identifying the highest bursts (terms) in the historical email (full text). The result can help librarians set up their knowledge base right on target and update their knowledge and material on time. Problems and possible solutions: (a) the synonymy and polysemy problems are hard to overcome, so burst detection needs to be combined with LSA; (b) it cannot find the fundamental question types (How, Why, When, etc.); possible solution: an auto-classification method? Assumption: if two terms burst together and have similar burst spans, they may be related to each other. Moreover, if the weights of the bursts are similar, the terms tend to have a strong relationship and belong to one topic (e.g., "gum" and "chewing"). Top 100 Bursts
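The assumption on this slide (terms whose bursts share a span and have similar weights probably belong to one topic) can be turned into a simple pairwise check. The sketch below does not perform burst detection itself; it assumes bursts have already been found and are given as term, start, end, and weight. The `Burst` record, the Jaccard span overlap, the weight ratio, and both 0.8 cutoffs are illustrative choices not taken from the slides, and the sample values are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Burst:
    term: str
    start: int     # first time step of the burst (e.g., a day index)
    end: int       # last time step of the burst, inclusive
    weight: float  # burst weight reported by the detector


def span_overlap(a, b):
    """Jaccard overlap of two inclusive integer spans, in [0, 1]."""
    inter = min(a.end, b.end) - max(a.start, b.start) + 1
    if inter <= 0:
        return 0.0
    union = max(a.end, b.end) - min(a.start, b.start) + 1
    return inter / union


def weight_similarity(a, b):
    """Ratio of the smaller weight to the larger one, in [0, 1]."""
    larger = max(a.weight, b.weight)
    return min(a.weight, b.weight) / larger if larger > 0 else 0.0


def related(a, b, min_overlap=0.8, min_weight_sim=0.8):
    """True if two bursts co-occur and have similar weight (assumed cutoffs)."""
    return (span_overlap(a, b) >= min_overlap
            and weight_similarity(a, b) >= min_weight_sim)


if __name__ == "__main__":
    gum = Burst("gum", 10, 20, 5.0)
    chewing = Burst("chewing", 11, 20, 4.5)
    printer = Burst("printer", 40, 45, 5.0)
    print(related(gum, chewing))   # spans nearly coincide, weights are close
    print(related(gum, printer))   # spans do not overlap
```

A check like this only groups terms that burst together; it does not address the synonymy and polysemy problem noted on the slide, which is why the slide proposes combining bursts with LSA.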

  10. Handout

  11. Top 100 Bursts in virtual reference emails. Any comments are welcome (especially on ethical issues in email research). InfoVis Lab Open House, 2003-12-05. Ning Yu, nyu@indiana.edu
