
WordSieve: Learning Task Differentiating Keywords Automatically



Presentation Transcript


  1. WordSieve: Learning Task Differentiating Keywords Automatically Travis Bauer Sandia National Laboratories (Research discussed today was done at Indiana University)

  2. Learning Task Contexts: Calvin • Learn what characterizes a user’s task contexts • Unobtrusive Observing • Keyword Extraction • Index based on Context

  3. Currently Used Algorithms • TFIDF • Latent Semantic Analysis • Log-Entropy

  4. Currently Used Algorithms • TFIDF • "One of the most successful and well tested techniques in Information Retrieval." - Pazzani • Syskill & Webert (Pazzani '96) • Hierarchical Feature Map (Merkl '97) • Learning in Document Filtering (Callan '98) • Topic Detection (Schultz '99) • Remembrance Agent (Rhodes '00) • Lexical Signatures (Park '02) • Latent Semantic Analysis • Log-Entropy
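As a point of reference for the TFIDF baseline named on this slide, here is a minimal sketch of one common TFIDF variant (relative term frequency times log inverse document frequency). The function name `tfidf` and the toy documents are assumptions for illustration; the slides do not say which TFIDF variant was used in the experiments.

```python
import math
from collections import Counter

def tfidf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document.

    Illustrative variant only: weight = (count / doc length) * ln(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append(
            {t: (c / len(doc)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        )
    return weights

# Hypothetical toy corpus, loosely echoing the browsing topics in the talk.
docs = [
    "gore campaign speech policy".split(),
    "bush campaign speech policy".split(),
    "thai curry recipe coconut".split(),
]
scores = tfidf(docs)
```

Note that the document frequencies require a pass over the whole corpus before any document can be weighted, which is the "comprehensive statistics" requirement the talk contrasts with stream processing.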

  5. Currently Used Algorithms • TFIDF • Latent Semantic Analysis • Well known, popular, well covered in the literature • Grading Essay Tests • Taking Physics tests • Taking synonym exams • Cross Linguistic IR (Dumais '97) • Assigning papers for peer review (Dumais '92) • Information Filtering (Foltz '90) • Log-Entropy

  6. Currently Used Algorithms • TFIDF • Latent Semantic Analysis • Log-Entropy • Not used as much for Personal Information Retrieval • Higher overhead than TFIDF • Indexes based on the distribution of terms across documents – potentially better performance
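To make "the distribution of terms across documents" concrete, the following is a minimal sketch of the standard log-entropy weighting: a local weight ln(1 + tf) multiplied by a global weight 1 + Σ p·ln(p) / ln(N), where p is the share of a term's occurrences that falls in each document. The function name `log_entropy` and the toy corpus are assumptions; the slides do not give the exact formulation used.

```python
import math
from collections import Counter

def log_entropy(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)
    counts = [Counter(doc) for doc in docs]
    # Global frequency: total occurrences of each term in the corpus.
    gf = Counter()
    for c in counts:
        gf.update(c)
    global_w = {}
    for term, total in gf.items():
        entropy = 0.0
        for c in counts:
            if c[term]:
                p = c[term] / total
                entropy += p * math.log(p)
        # Terms spread evenly over all documents get a global weight near 0;
        # terms concentrated in a single document get a global weight of 1.
        global_w[term] = (1.0 + entropy / math.log(n_docs)) if n_docs > 1 else 1.0
    return [
        {t: math.log(1 + tf) * global_w[t] for t, tf in c.items()} for c in counts
    ]

# Hypothetical toy corpus: "a" appears everywhere, the other terms in one document each.
le_docs = [["a", "b"], ["a", "c"], ["a", "d"]]
le_scores = log_entropy(le_docs)
```

The extra pass to compute per-term entropy is one reason this weighting carries more overhead than plain TFIDF.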

  7. Comparison to Current Techniques • Current Techniques: Static Corpora, Comprehensive Statistics • WordSieve: Neural Network-like processing, Stream of data, Local learning, Competitive Learning
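The contrast on this slide (a stream of data, local updates, competition for limited capacity) can be illustrated with a toy sketch. This is not the WordSieve algorithm, whose details are not given in these slides; the class `CompetitiveLayer`, its parameters, and the evict-the-weakest rule are all hypothetical choices made only to show the general idea of competitive learning over a word stream without corpus-wide statistics.

```python
class CompetitiveLayer:
    """Toy fixed-capacity layer in which words compete for a limited number of nodes.

    Hypothetical illustration only; parameter names and update rule are assumptions.
    """

    def __init__(self, capacity=2, boost=1.0, decay=0.9):
        self.capacity = capacity
        self.boost = boost
        self.decay = decay
        self.activation = {}  # word -> current activation

    def observe(self, word):
        # Local update: every existing node decays a little on each step.
        for w in self.activation:
            self.activation[w] *= self.decay
        # Competition: a new word displaces the weakest node when the layer is full.
        if word not in self.activation and len(self.activation) >= self.capacity:
            weakest = min(self.activation, key=self.activation.get)
            del self.activation[weakest]
        # The observed word's node is excited.
        self.activation[word] = self.activation.get(word, 0.0) + self.boost

    def top(self):
        """Words currently held, strongest first."""
        return sorted(self.activation, key=self.activation.get, reverse=True)

# Hypothetical word stream; no document counts or corpus totals are ever stored.
layer = CompetitiveLayer(capacity=2)
for token in "curry rice curry coconut curry rice election curry rice".split():
    layer.observe(token)
```

Each update touches only the layer's own small state, so memory stays bounded by `capacity` regardless of how long the stream runs.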

  8. Good Discriminator of Context

  9. WordSieve Concept User Browsing Attributes Term Activation Priming

  10. WordSieve 1 (diagram labels: Doc Stream • Words Currently Occurring Frequently • Words Occurring in Document Sequences • Words Absent in Document Sequences • User Profile • Context Profile)

  11. WordSieve 2 (diagram labels: Doc Stream • Words Currently Occurring Frequently • Words Reflecting Context • User Profile • Context Profile)

  12. Web Browsing Data Set • Sixteen Users • Four Topics, 10 minutes Each • Political Life Al Gore • Political Life George Bush • Traditional Indonesian Cooking • Traditional Thai Cooking • Categorized Document Set • Automatically Generated Queries

  13. Browsing Results

  14. Contributions • It is possible to extract context differentiating terms from document streams using unsupervised competitive learning. • Comprehensive statistics are not necessary in the described situations given an ordering of the documents. • Performance is comparable to LSI and better than Log-Entropy and TFIDF.

  15. Potential Next Steps • WordSieve • Automate Parameter Optimization • Co-occurrence of terms • Other Domains • Multi-dimensional data stream • Machine Vision

  16. Support This work was conducted under the advisement of David Leake at Indiana University. It was sponsored in part by the GAANN fellowship. The original version of the personal information agent was designed and written with partial support from NASA under award No. NCC 2-1035.

  17. For More Information Travis Bauer www.cs.indiana.edu/~trbauer/publications.htm

  18. Usenet Data Set Three sets of 5 newsgroups • alt.atheism, talk.religion.misc, soc.religion.christian, rec.sport.baseball, rec.sport.hockey • comp.os.ms-windows.misc, comp.sys.ibm.pc.hardware, comp.sys.mac.hardware, rec.autos, rec.motorcycles • talk.politics.guns, talk.politics.misc, sci.electronics, sci.med, sci.space • Categorized Document Set • Automatically Generated Queries

  19. Usenet Results
