
Incremental Context Mining for Adaptive Document Classification

Advisor: Dr. Hsu. Graduate: Chien-Shing Chen. Authors: Rey-Long Liu, Yun-Ling Lu. Outline: Motivation, Objective, Introduction, Overview of the approach, Incremental context mining for ACclassifier, Experiments.


Presentation Transcript


  1. Incremental Context Mining for Adaptive Document Classification Advisor: Dr. Hsu Graduate: Chien-Shing Chen Authors: Rey-Long Liu, Yun-Ling Lu

  2. Outline • Motivation • Objective • Introduction • Overview of the approach • Incremental context mining for ACclassifier • Experiments • Conclusions • Personal Opinion • Review

  3. Motivation • Adaptive document classification (ADC) adapts a document classification (DC) system to the evolving contextual requirements of each document category, so that input documents can be classified based on their contexts of discussion.

  4. Objective • 1. Contextual requirement (CR) terms should be mined by analyzing multiple documents from multiple categories. • 2. Inappropriate features may introduce inefficiency and errors. • 3. ADC may serve as the basis for efficient and high-precision DC.

  5. 1. Introduction ACclassifier (Adaptive Context-based Classifier) has two components: 1. An incremental context miner 2. A document classifier. Both components work on a given text hierarchy in which each node corresponds to a document category.

  6. 2. Overview of the approach

  7. 3-1. An incremental context miner

  8. 3-2. An incremental context miner

  9. 3-3. CR CR: the contextual requirement of the category

  10. 3-4. TFIDF Strength: the strength of a word w serving as a context word for the documents under category c. TFIDF: Term Frequency × Inverse Document Frequency.
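The slide names TFIDF as the basis of the strength measure, but the exact strength formula is not in the transcript. As a rough illustration only, here is the textbook form of TF × IDF in Python. The function name `tf_idf`, the toy corpus, and the natural-log IDF are all assumptions, and this is not necessarily the formula the authors use for Strength(w, c).

```python
import math
from collections import Counter

def tf_idf(word, doc_tokens, corpus):
    """Textbook TF-IDF of `word` in one tokenized document (illustrative only)."""
    # Term frequency: share of the document's tokens that are `word`.
    tf = Counter(doc_tokens)[word] / len(doc_tokens)
    # Document frequency: number of documents in the corpus containing `word`.
    df = sum(1 for d in corpus if word in d)
    if df == 0:
        return 0.0
    # Inverse document frequency (natural log; an assumed variant).
    idf = math.log(len(corpus) / df)
    return tf * idf

# Hypothetical toy corpus, not data from the slides.
corpus = [
    ["computer", "dos", "computer"],
    ["computer", "network"],
    ["finance", "market"],
]
score_dos = tf_idf("dos", corpus[0], corpus)
score_computer = tf_idf("computer", corpus[0], corpus)
```

In this toy corpus "dos" scores higher than "computer" for the first document even though it occurs less often, because "computer" also appears in another document and is therefore less distinctive.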

  11. 3-5. TFIDF Strength(w_computer, c_MIS) = [value shown in slide figure]; Strength(w_dos, c_MIS) = [value shown in slide figure]

  12. 3-6. The incremental context miner [Figure labels: Electrical Engineering; S(computer) > 0.909; S(computer) = 0.909; S(dos) = 2; S(EC) = 0.476; S(computer) = 0.022]

  13. 3-7. An incremental context miner

  14. 4-1. DOA Given a document d to be classified, the basic idea is to compute its degree of acceptance (DOA) by each category c. The DOA is computed from the strengths of d's distinct words on c.

  15. 4-2. Two phases of the classifier • The estimation of the DOA for each category. • The identification of the winner category.
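The two phases can be sketched as a small Python function. This is a minimal sketch under stated assumptions: `classify` and `estimate_doa` are hypothetical names, the scores in `toy_scores` are invented for illustration, and picking the category with the highest DOA as the winner is an assumption, since the transcript does not say how the winner is identified.

```python
def classify(doc, categories, estimate_doa):
    """Two-phase classification sketch (hypothetical helper, not the authors' code)."""
    # Phase 1: estimate the DOA of the document for each category.
    doa = {c: estimate_doa(doc, c) for c in categories}
    # Phase 2: identify the winner category (assumed here: highest DOA).
    winner = max(doa, key=doa.get)
    return winner, doa

# Invented scores standing in for a real DOA estimator.
toy_scores = {"MIS": 0.9, "DSS": 0.1}
winner, doa = classify("some document", ["MIS", "DSS"],
                       lambda doc, c: toy_scores[c])
```

Passing the estimator in as a function keeps the winner-selection step independent of how the DOA itself is computed.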

  16. 4-3. Estimation of DOA for each C

  17. 4-4. DOA If w is a strong context word in c and occurs many times in d, c is more likely to “accept” d. Frequency: 5; D1: 5000; minSupport: 0.001
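One possible reading of the numbers on this slide is that a word occurring 5 times in a document D1 of 5000 words has a relative frequency of 5/5000 = 0.001, which just reaches minSupport. The Python sketch below encodes that reading. Treating "D1: 5000" as the document length, using `>=` as the comparison, and the name `meets_min_support` are all assumptions, not statements from the slides.

```python
def meets_min_support(frequency, doc_length, min_support):
    """Return True if the word's relative frequency reaches the threshold.

    Assumed interpretation: support = occurrences / document length.
    """
    return frequency / doc_length >= min_support

# Slide numbers under the assumed reading: 5 / 5000 = 0.001.
at_threshold = meets_min_support(5, 5000, 0.001)
below_threshold = meets_min_support(4, 5000, 0.001)
```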

  18. 4-5. Constraint I New Di: Computer 20/40, DOS 10/40, Java 2/40, Mouse 3/40, Delphi 1/40

  19. 4-6. Constraint II

  20. 4-7. Given a document to be classified If w is a strong context word in c and occurs many times in d, c is more likely to “accept” d. New Di: Computer 20/40, DOS 10/40. Categories: MIS, DSS. Strengths: S(computer) = 0.909; S(dos) = 2; S(EC) = 0.476; S(computer) = 0.022

  21. 4-8. DOA Contribution of “computer”: 0.909 × 20/40 = 0.4545. Contribution of “dos”: 2 × 10/40 = 0.5. DOA_MIS of D_new = 0.4545 + 0.5 = 0.9545

  22. 4-9. Compute the DOA for all categories
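The arithmetic on this slide can be reproduced in a few lines of Python. The sketch reads the DOA as the sum, over the document's words, of the word's strength in the category times its relative frequency in the document, which is what the slide's numbers imply. The function name `degree_of_acceptance` and the choice to skip words that have no strength in the category are assumptions.

```python
def degree_of_acceptance(strengths, doc_freqs):
    """Sum of strength(w, c) * relative frequency of w in d.

    Words without a strength in the category are skipped (an assumption).
    """
    return sum(strengths[w] * f for w, f in doc_freqs.items() if w in strengths)

# Values taken from the slides: strengths in MIS and frequencies in the new document.
mis_strengths = {"computer": 0.909, "dos": 2.0}
d_new = {"computer": 20 / 40, "dos": 10 / 40}

doa_mis = degree_of_acceptance(mis_strengths, d_new)  # 0.4545 + 0.5
```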

  23. 4-9. The document classifier

  24. 5-1. Correct classification • Built from the 1100 documents used for initial training.

  25. 5-2. Correct classification • Baselines: allowed to use 5000 features in their feature sets.

  26. 5-3. Correct classification • Using all training documents to build their feature sets and classifiers.

  27. 5-4. Consider the test document entitled “Setting up Email in DOS with today’s ISP using a dialup PPP TCP/IP connection”. • Baseline systems: “Software”, “Windows”, and “Operating Systems” • ACclassifier: “TCP/IP”, “connection”, “computer networking”, “userID”

  28. 5-5. Cumulative training & testing time (sec.) • The time spent by ACclassifier grew more slowly once about 1400 training documents had been entered.

  29. 5-6. Cumulative training & testing time (sec.) • The time spent by ACclassifier grew more slowly once about 1400 training documents had been entered.

  30. 6. Conclusions 1. Efficient mining of the contextual requirements for high-precision DC. 2. Incremental mining without reprocessing previous documents. 3. Evolutionary maintenance of the feature set. 4. Efficient and fault-tolerant hierarchical DC.

  31. 7. Personal Opinion The approach is acceptable with respect to purity in the hierarchy.

  32. 8. Review
