Efficient Process for Constructing a Hierarchical Classification System Yong-wook Yoon Dec 22, 2003 NLP Lab., POSTECH
Contents • Introduction • Related Work • Measure for Hierarchical Classifier • Hierarchical Classification Demo • Experiment • Contribution • Future Work
Introduction • New trends in text categorization • Massive amounts of documents are produced every day • Often require on-line classification • Flat vs. hierarchical classification • The hierarchical method is feasible for a large collection of documents organized in many levels of hierarchy • Advantages of hierarchical classification • Well suited to a large number of categories • Efficient in training time • Better performance than a flat classifier
Issues in Hierarchical Classification • There is no appropriate measure • To evaluate the performance of a hierarchical classifier • There is no systematic process • To construct a hierarchical classification system with many levels • Our suggestions • A new evaluation scheme well suited to a hierarchical classification system • An efficient process to construct an optimal hierarchical classification system
Flat vs. Hierarchical Classification [Diagram: a flat classifier assigns documents directly to categories C1 … Cn, while a hierarchical classifier descends a tree rooted at Business through subtrees such as Grain and Oil]
Variations in Hierarchical Classification • Virtual category tree vs. category tree • Categories are organized as trees (cf. DAG) • Documents can be assigned to • Leaf categories only (cf. category tree) • Two methods in hierarchical classification • Big-bang approach • By only one classification step • A document is assigned to a leaf-node class or an internal-node class • Top-down level-based approach • A classifier at each node of the hierarchy tree • A document is classified by applying a sequence of classifiers from the root node down to a leaf node
Virtual Category Tree with Top-down Level-based Classification • At the root node there exist k classifiers, where k is the # of child nodes • Each classifier determines whether to pass the document down to the lower level according to the sign of its SVM score → called a 'Pachinko machine' • Finally, at the leaf nodes, the correctness of the prediction is examined [Diagram: a document descends from the root through classifiers such as Comp., Alt.atheism, and Talk. toward leaf classes Class_1 … Class_N]
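The top-down descent described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the node names and the toy scoring functions stand in for trained per-node SVMs.

```python
# Sketch of top-down level-based ("Pachinko-machine") classification:
# at each node, a binary classifier per child decides (by the sign of
# its score) whether the document descends into that subtree.

def classify_top_down(doc, node, classifiers, children):
    """Descend from `node`, following every child whose classifier
    returns a positive (SVM-style) score for `doc`."""
    if node not in children:            # leaf reached: predict this class
        return [node]
    predicted = []
    for child in children[node]:
        score = classifiers[child](doc)  # signed score; > 0 means descend
        if score > 0:
            predicted += classify_top_down(doc, child, classifiers, children)
    return predicted

# Toy two-level hierarchy (illustrative names only).
children = {"root": ["comp", "talk"], "talk": ["talk.politics", "talk.religion"]}
classifiers = {
    "comp": lambda d: 1.0 if "cpu" in d else -1.0,
    "talk": lambda d: 1.0 if "debate" in d else -1.0,
    "talk.politics": lambda d: 1.0 if "election" in d else -1.0,
    "talk.religion": lambda d: -1.0,
}

print(classify_top_down({"debate", "election"}, "root", classifiers, children))
# → ['talk.politics']
```

Note that a document can reach zero leaves (rejected at every node) or several leaves, which is what makes multi-label evaluation of such a cascade non-trivial.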
Contents • Introduction • Related Work • Measure for Hierarchical Classifier • Hierarchical Classification Demo • Experiment • Contribution • Future Work
Previous Evaluations in Hierarchical Classification • Dumais and Chen (SIGIR 2000) • Traditional precision and recall of each leaf-node classifier • The probability output of the leaf-node classifier • L1: internal-node classifier, L2: leaf-node classifier • Boolean scoring function: P(L1) && P(L2) • Multiplicative scoring function: P(L1) * P(L2) • Limitations of Dumais and Chen • May only be feasible for simple cases such as a 2-level hierarchy (but what about a large hierarchy?) • No concern for internal-node performance
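The two scoring functions above can be contrasted with a small sketch; the thresholds are illustrative assumptions, not the values Dumais and Chen used.

```python
# Boolean vs. multiplicative combination of a first-level (L1) and a
# second-level (L2) classifier probability, as in Dumais & Chen.

def boolean_score(p1, p2, t1=0.5, t2=0.5):
    # P(L1) && P(L2): both levels must clear their own thresholds.
    return (p1 >= t1) and (p2 >= t2)

def multiplicative_score(p1, p2, t=0.25):
    # P(L1) * P(L2): the product of the two probabilities is thresholded,
    # so a confident L1 can compensate for a borderline L2.
    return p1 * p2 >= t

p1, p2 = 0.9, 0.4
print(boolean_score(p1, p2))         # L2 misses its threshold
print(multiplicative_score(p1, p2))  # 0.36 clears the product threshold
```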
Previous Evaluations in Hierarchical Classification (2) • Aixin Sun et al. (JASIST '03) • "Expanded precision and recall" • Considers category similarity and • The contributions of misclassified documents • Limitations • Difficult to compare with flat methods directly • Too complex to calculate • No concern for internal-node performance
SVM in Text Categorization • First suggested by Joachims (1997) • Showed the superiority of SVMs over other methods • With experiments on Reuters-21578 (flat method) • Theoretical learning model in TC (SIGIR '01) • SVM with the hierarchical method • Dumais and Chen (SIGIR '00) • LookSmart Web directory (www.looksmart.com) • 17,173 categories organized into a 7-level hierarchy • Tao Li et al. (SIGIR '03) • 20 Newsgroups, optimally clustered 2-level hierarchy • Measures only the accuracy of a classifier
Contents • Introduction • Related Work • Measure for Hierarchical Classifier • Hierarchical Classification Demo • Experiment • Contribution • Future Work
New Evaluation of Hierarchical Classification • Intermediate precision and recall • For an internal-node classifier • Selects the classifier with the optimal performance at an intermediate level • Approximate precision and recall • Performance of the entire system in the middle of the construction process • Overall P and R of the hierarchical system • Applicable to a hierarchical classifier • Compatible with the traditional P and R of flat classification
Evaluation of Multi-labeled Hierarchical Classification • Given 4 categories (A, B, C, D) and 10 test documents • # of predictions: 4 × 10 = 40 • Delayed evaluation (the internal node BC is scored as one category) • Doc_1: A (−, −) TN; BC (+, +) TP; D (−, +) FP • Doc_2: A (−, −) TN; BC (+, −) FN; D (−, +) FP • Pre-expanded evaluation (BC is expanded into its leaves B and C before scoring) • Doc_1: A (−, −) TN; B (+, −) FN; C (+, +) TP; D (−, +) FP • Doc_2: A (−, −) TN; B (+, −) FN; C (+, −) FN; D (−, +) FP • Ac: actual class, Pr: predicted class; each pair is (Ac, Pr)
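The two evaluation schemes can be made concrete with a small counting sketch. The A/B/C/D labels follow the slide's example; treating BC as the set {B, C} under pre-expansion is an assumption about the intended semantics.

```python
# Delayed vs. pre-expanded evaluation of one multi-labeled document.

def confusion(actual, predicted, categories):
    """Count TP/FP/FN/TN over a fixed category set for one document."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for c in categories:
        a, p = c in actual, c in predicted
        key = "TP" if a and p else "FP" if p else "FN" if a else "TN"
        counts[key] += 1
    return counts

# Delayed evaluation: the internal node BC is scored as one category.
delayed = confusion(actual={"BC"}, predicted={"BC", "D"},
                    categories=["A", "BC", "D"])

# Pre-expanded evaluation: BC is expanded into its leaves B and C first,
# so a miss at BC turns into per-leaf errors.
expanded = confusion(actual={"B", "C"}, predicted={"C", "D"},
                     categories=["A", "B", "C", "D"])

print(delayed)   # {'TP': 1, 'FP': 1, 'FN': 0, 'TN': 1}
print(expanded)  # {'TP': 1, 'FP': 1, 'FN': 1, 'TN': 1}
```

The same prediction can thus score differently under the two schemes, which is why the choice of scheme matters when comparing hierarchical classifiers.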
Intermediate Recall of an Internal Classifier • NLCj is the weighting factor • The number of all leaf-node classifiers that are descendants of node j • Reasonable in a micro-averaged evaluation
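The slide's equation image did not survive extraction, so the following is a sketch under an assumption: a micro-averaged recall in which the TP/FN counts of an internal classifier j are weighted by NLCj, its number of descendant leaf classifiers, before being pooled with leaf-classifier counts.

```python
# NLC-weighted micro-averaged recall pooling leaf and internal
# classifiers (assumed form; the exact formula is in the missing image).

def micro_recall(leaf_counts, internal_counts):
    """leaf_counts: list of (TP, FN) pairs at leaf classifiers.
    internal_counts: list of (TP, FN, NLC) triples at internal
    classifiers; NLC = # of descendant leaf classifiers."""
    tp = sum(t for t, f in leaf_counts) \
       + sum(n * t for t, f, n in internal_counts)
    fn = sum(f for t, f in leaf_counts) \
       + sum(n * f for t, f, n in internal_counts)
    return tp / (tp + fn)

# One leaf classifier plus the internal node 'Meat' with NLC = 5,
# matching the slide's example; the raw counts are illustrative.
print(micro_recall(leaf_counts=[(8, 2)], internal_counts=[(9, 1, 5)]))
```

The weighting reflects that an error at an internal node affects as many eventual leaf decisions as there are leaf classifiers below it.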
Internal Node Classifier vs. Leaf Node Classifier [Diagram: a Business tree with subtrees Oil, Grain, and Meat; the internal node Meat, with leaves such as Pork, has five descendant leaf classifiers] • Ex) NLCj(Meat) = 5
Approximate Precision and Recall at Level k • where TPi: # of true positives at leaf classifier i • TPj, FPj, and FNj: the corresponding counts at internal classifier j
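The equation images for these measures are missing; a plausible form, using only the symbols named above (an assumption, not necessarily the paper's exact formula), pools the built leaf classifiers i with the frontier internal classifiers j at level k:

```latex
\tilde{P}_k =
  \frac{\sum_i TP_i + \sum_j TP_j}
       {\sum_i TP_i + \sum_j TP_j + \sum_j FP_j},
\qquad
\tilde{R}_k =
  \frac{\sum_i TP_i + \sum_j TP_j}
       {\sum_i TP_i + \sum_j TP_j + \sum_j FN_j}
```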
Overall Recall in HTC, Rh • Definition • where TPi: # of true positives at leaf classifier i • FNj: # of false negatives at leaf classifier j • WFNk: weighted FNk at internal classifier k
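The defining equation was an image and is missing; consistent with the symbols above and the NLC weighting, a plausible reconstruction (an assumption) is:

```latex
R_h =
  \frac{\sum_i TP_i}
       {\sum_i TP_i + \sum_j FN_j + \sum_k WFN_k},
\qquad
WFN_k = NLC_k \cdot FN_k
```

The WFN term charges a false negative at an internal classifier once for every leaf classifier it blocks, which is what makes Rh comparable to the recall of a flat classifier over the same leaves.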
Contents • Introduction • Related Work • Measure for Hierarchical Classifier • Hierarchical Classification Demo • Experiment • Contribution • Future Work
20 Newsgroups Dataset • Usenet news article collection • 19,997 documents in 20 newsgroups • Each document consists of two parts: header and body • Allows us to exploit the intrinsic hierarchy of the group names • 4.5% of the articles have been posted to more than one newsgroup • The source of multi-class (multi-label) documents • e.g. 'alt.atheism' and 'talk.religion.misc'
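The intrinsic hierarchy mentioned above can be read directly off the group names by splitting on the dots. The list below is the standard set of 20 groups (the same names that scikit-learn's `fetch_20newsgroups` exposes as `target_names`); the grouping code is a sketch.

```python
# Deriving the top level of the 20 Newsgroups hierarchy from group names.

groups = [
    "alt.atheism", "comp.graphics", "comp.os.ms-windows.misc",
    "comp.sys.ibm.pc.hardware", "comp.sys.mac.hardware", "comp.windows.x",
    "misc.forsale", "rec.autos", "rec.motorcycles", "rec.sport.baseball",
    "rec.sport.hockey", "sci.crypt", "sci.electronics", "sci.med",
    "sci.space", "soc.religion.christian", "talk.politics.guns",
    "talk.politics.mideast", "talk.politics.misc", "talk.religion.misc",
]

# Group leaves under their top-level branch (root -> branch -> leaf).
hierarchy = {}
for g in groups:
    branch = g.split(".")[0]
    hierarchy.setdefault(branch, []).append(g)

print(sorted(hierarchy))       # the 7 top-level branches
print(len(hierarchy["comp"]))  # number of comp.* groups
```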
[Diagram: three-level hierarchy of the 20 newsgroups — root → {alt, comp, misc, rec, sci, soc, talk} → leaf groups such as comp.graphics, comp.sys.ibm.hardware, rec.sport.baseball, sci.crypt, talk.politics.mideast] A total of 8 classifiers is required.
Classification Result • 20 Newsgroups in a three-level tree
Selection of Optimal Internal Classifier • using Intermediate P and R
Approximate P and R • For the three-level hierarchy • Results for other hierarchies • The 2-level hierarchy shows better performance than the 3-level one • The clustered hierarchy is comparable to ours
Contents • Introduction • Related Work • Measure for Hierarchical Classifier • Hierarchical Classification Demo • Experiment • Contribution • Future Work
Contribution • An evaluation measure for a hierarchical classification system • Final performance in terms of P and R • Fully compatible with the previous measures • Makes it possible to compare performance • Between flat and hierarchical classifiers • Between different hierarchical classifiers • An algorithm to efficiently construct a hierarchical classification system • With good performance • Intermediate evaluation in the middle of the construction process • Maintains the original benefits of the hierarchical method • Saves training time • Good performance of SVM • Easily applicable in an on-line execution environment
Future Work • Further research is required • The appropriate number of subclasses • 2-level performs better than 3-level • The criterion for selecting the optimal internal-node classifier • Recall, BEP, or interpolation? • Expanding the set of test document collections • Reuters news articles • WebKB • Real Web documents
The End. Thank you.
Approximate Precision and Recall • Another helpful measure for constructing a hierarchical classification system • Given a category tree of height K, • Compute the approximate P and R at level k • Helpful for recognizing • How close the approximate performance is to the final performance of the entire system
Selection Criteria for the Optimal Internal Classifier • The performance at cost 150 is superior to that at cost 80!
Clustered 2-level Hierarchy [Diagram: root → 8 clusters (1–8) covering the 20 groups, e.g. the comp.* groups in one cluster, rec.sport.baseball/hockey in another, and the alt/soc/talk religion and politics groups grouped by topic]
Support Vector Machine • Widely used in text categorization recently • Shows good performance in classification tasks with large amounts of data and high dimensionality • Given l data points {(x1, y1), …, (xl, yl)} • SVM training involves solving a quadratic program (for αi, b) • The optimal solution gives rise to a decision function, which we use in the prediction phase
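The equations on this slide were images; the standard soft-margin formulation they describe is the following (a sketch of the textbook form, with kernel K and cost parameter C):

```latex
% Training: solve the dual quadratic program for the alpha_i
\max_{\alpha}\; \sum_{i=1}^{l} \alpha_i
  - \frac{1}{2} \sum_{i=1}^{l}\sum_{j=1}^{l}
      \alpha_i \alpha_j \, y_i y_j \, K(x_i, x_j)
\quad \text{s.t.}\quad 0 \le \alpha_i \le C,\;\; \sum_{i=1}^{l} \alpha_i y_i = 0.

% Prediction: the optimal (alpha_i, b) yield the decision function
f(x) = \operatorname{sign}\!\Bigl(\sum_{i=1}^{l} \alpha_i y_i \, K(x_i, x) + b\Bigr)
```

In the top-down setting, the unthresholded sum inside the sign is exactly the signed SVM score whose sign decides whether a document descends past each node.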
Focus of Our Paper • Suggestion of a new measuring scheme • Well suited to hierarchical classification • Efficient construction of a hierarchical classifier • Compatible with the previous measures • Enables easy comparison between flat and hierarchical classifiers • An efficient hierarchical classification model • Virtual category structure + SVM • Evaluation by intermediate precision and recall