Incremental Context Mining for Adaptive Document Classification

Advisor: Dr. Hsu • Graduate: Chien-Shing Chen • Authors: Rey-Long Liu, Yun-Ling Lu

Outline • Motivation • Objective • Introduction • Overview of the approach • Incremental context mining for ACclassifier • Experiments • Conclusions • Personal Opinion • Review

Motivation • Adaptive document classification (ADC) adapts a DC system to the evolving contextual requirement of each document category, so that input documents can be classified based on their contexts of discussion.

Objective • 1. CR (contextual requirement) terms should be mined by analyzing multiple documents from multiple categories. • 2. Inappropriate features may introduce inefficiency and errors. • 3. ADC may serve as the basis for supporting efficient and high-precision DC.

1. Introduction • ACclassifier (Adaptive Context-based Classifier) has two components: 1. an incremental context miner; 2. a document classifier. • Both components work on a given text hierarchy in which each node corresponds to a document category.

2. Overview of the approach

3-1. An incremental context miner

3-3. CR • CR: the contextual requirement of a category.

3-4. TFIDF • Strength: how well a word w serves as a context word for the documents under category c. • TFIDF: Term Frequency × Inverse Document Frequency.
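To make the TF × IDF idea concrete, here is a minimal sketch of a generic category-level TF × IDF score. It is not the strength formula used by ACclassifier, which does not survive in this transcript (slide 3-6 lists strengths above 1, so the authors' weighting differs). The function name `tfidf_strength`, the choice to compute IDF over categories, and the toy documents are all assumptions made for illustration.

```python
import math
from collections import Counter

def tfidf_strength(word, category, docs_by_category):
    """Generic TF x IDF score of `word` for `category` (illustrative only)."""
    # Term frequency: share of the category's word occurrences that are `word`.
    counts = Counter(w for doc in docs_by_category[category] for w in doc)
    total = sum(counts.values())
    tf = counts[word] / total if total else 0.0
    # Inverse document frequency, taken over categories (an assumption):
    # a word found in fewer categories gets a larger weight.
    n_categories = len(docs_by_category)
    n_containing = sum(
        1 for docs in docs_by_category.values()
        if any(word in doc for doc in docs)
    )
    idf = math.log(n_categories / n_containing) if n_containing else 0.0
    return tf * idf

# Toy data, invented for this example; each document is a list of words.
docs_by_category = {
    "MIS": [["computer", "dos", "computer"], ["dos", "database"]],
    "DSS": [["decision", "computer"], ["decision", "model"]],
}

print(tfidf_strength("dos", "MIS", docs_by_category))       # only in MIS: positive
print(tfidf_strength("computer", "MIS", docs_by_category))  # in every category: 0.0
```

In this sketch a word that occurs in every category scores zero, which matches the intuition that a context word should separate one category from the others.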

3-5. TFIDF • Strength(w_computer, c_MIS) = • Strength(w_dos, c_MIS) =

3-6. The incremental context miner • Figure labels: Electrical Engineering • S(computer) > 0.909 • S(computer) = 0.909 • S(dos) = 2 • S(EC) = 0.476 • S(computer) = 0.022

4-1. DOA • Given a document d to be classified, the basic idea is to compute its degree of acceptance (DOA) by a category c. • The DOA is computed from the strengths of d's distinct words on c.

4-2. Two phases of classifier • The estimation of DOA for each category. • The identification of the winner category.

4-3. Estimation of DOA for each category c

4-4. DOA • If w is a strong context word in c and occurs many times in d, c is more likely to “accept” d. • Frequency: 5 • D1: 5000 • minSupport: 0.001

4-5. Constraint I • New Di: Computer 20/40, DOS 10/40, Java 2/40, Mouse 3/40, Delphi 1/40

4-6. Constraint II

4-7. Given a document to be classified • If w is a strong context word in c and occurs many times in d, c is more likely to “accept” d. • New Di: Computer 20/40, DOS 10/40 • Categories: MIS, DSS • Strengths: S(computer) = 0.909, S(dos) = 2, S(EC) = 0.476, S(computer) = 0.022

4-8. DOA • Contribution of “computer” to DOA_MIS: 0.909 × 20/40 = 0.4545 • Contribution of “DOS” to DOA_MIS: 2 × 10/40 = 0.5 • DOA_MIS of D_new = 0.4545 + 0.5 = 0.9545

4-9. Compute the DOA for every category
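The two phases (estimating the DOA for each category, then identifying the winner) can be sketched as below. The sketch assumes that DOA is the sum, over the document's distinct words, of strength times relative frequency, which is what the arithmetic on slide 4-8 suggests. Constraints I and II are not modeled, and the assignment of S(EC) = 0.476 and S(computer) = 0.022 to the DSS category is a guess from the slide layout. The names `doa`, `new_doc`, and `strengths` are hypothetical.

```python
def doa(category_strengths, rel_freq):
    """Degree of acceptance: sum of strength(w, c) * relative frequency of w in d."""
    return sum(
        category_strengths.get(word, 0.0) * freq
        for word, freq in rel_freq.items()
    )

# Relative word frequencies of the new document (slide 4-7).
new_doc = {"computer": 20 / 40, "dos": 10 / 40}

# Context-word strengths per category. The MIS values follow slide 4-8;
# the DSS values are an assumed reading of the figure labels.
strengths = {
    "MIS": {"computer": 0.909, "dos": 2.0},
    "DSS": {"ec": 0.476, "computer": 0.022},
}

# Phase 1: estimate the DOA for each category.
scores = {category: doa(s, new_doc) for category, s in strengths.items()}

# Phase 2: the winner is the category with the highest DOA.
winner = max(scores, key=scores.get)

print(scores)
print(winner)
```

Under these assumptions MIS scores 0.909 × 0.5 + 2 × 0.25 = 0.9545 and DSS scores 0.022 × 0.5 = 0.011, so MIS is the winner category.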

4-10. The document classifier

5-1. Correct classification • Built from the 1100 documents used for initial training.

5-2. Correct classification • Baselines: allowed to use 5000 features in their feature sets.

5-3. Correct classification • Baselines: using all training documents to build their feature sets and classifiers.

5-4. Consider the test document entitled “Setting up Email in DOS with today’s ISP using a dialup PPP TCP/IP connection”. • Baseline systems: “Software”, “Windows”, and “Operating Systems” • ACclassifier: “TCP/IP”, “connection”, “computer networking”, “userID”

5-5. Cumulative training & testing time (sec.) • The time spent by ACclassifier grew more slowly once about 1400 training documents had been entered.

5-6. Cumulative training & testing time (sec.) • The time spent by ACclassifier grew more slowly once about 1400 training documents had been entered.

6. Conclusions • 1. Efficient mining of the contextual requirements for high-precision DC. • 2. Incremental mining without reprocessing previous documents. • 3. Evolutionary maintenance of the feature set. • 4. Efficient and fault-tolerant hierarchical DC.

7. Personal Opinion • The approach is acceptable in terms of purity within the hierarchy.

8. Review
