Partially Supervised Classification of Text Documents
Authors: Bing Liu, Wee Sun Lee, Philip S. Yu, Xiaoli Li
Presented by: Swetha Nandyala
CIS 525: Neural Computation
Overview
• Introduction
• Theoretical Foundation
• Background Methodology
  • NB-C
  • EM-Algorithm
• Proposed Strategy
• Evaluation Measures & Experiments
• Conclusion
Text Categorization
“… the activity of labeling natural language texts with thematic categories from a pre-defined set” [Sebastiani, 2002]
• Text categorization is the task of automatically assigning to a text document d from a given domain D a category label c selected from a predefined set of category labels C
[Diagram: a categorization system maps each document in the domain D to one of the category labels c1, c2, …, ck in C]
Text Categorization (contd.)
• Text categorization is a standard supervised learning problem
• Bottleneck: a very large number of labeled training documents is needed to build an accurate classifier
• Goal here: identify a particular class of documents from a set of mixed, unlabeled documents
• Standard classification methods are not directly applicable
• Partially supervised classification is used instead
Theoretical Foundations
• Aim: show that partially supervised classification (PSC) is a constrained optimization problem
• Fixed distribution D over X × Y, where Y = {0, 1}
• X, Y: the sets of possible documents and classes
• Two sets of documents:
  • a labeled positive set P of size n1, drawn independently according to D_{X|Y=1}
  • an unlabeled set U of size n2, drawn independently according to D_X
• Goal: find the positive documents in U
Theoretical Foundations (contd.)
• The learning algorithm selects a function f from a class of functions F: X → {0, 1} to classify the unlabeled documents
• Probability of error: Pr[f(X) ≠ Y] is the sum of the “false positive” and “false negative” cases:
  Pr[f(X) ≠ Y] = Pr[f(X) = 1 ∧ Y = 0] + Pr[f(X) = 0 ∧ Y = 1]
• After transformation:
  Pr[f(X) ≠ Y] = Pr[f(X) = 1] − Pr[Y = 1] + 2 Pr[f(X) = 0 | Y = 1] Pr[Y = 1]
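For completeness, the omitted transformation step is a short calculation, using Pr[f(X)=1 ∧ Y=1] = Pr[Y=1] − Pr[f(X)=0 ∧ Y=1]:

```latex
\begin{align*}
\Pr[f(X)\neq Y]
  &= \Pr[f(X)=1 \wedge Y=0] + \Pr[f(X)=0 \wedge Y=1] \\
  &= \bigl(\Pr[f(X)=1] - \Pr[f(X)=1 \wedge Y=1]\bigr) + \Pr[f(X)=0 \wedge Y=1] \\
  &= \Pr[f(X)=1] - \bigl(\Pr[Y=1] - \Pr[f(X)=0 \wedge Y=1]\bigr) + \Pr[f(X)=0 \wedge Y=1] \\
  &= \Pr[f(X)=1] - \Pr[Y=1] + 2\,\Pr[f(X)=0 \mid Y=1]\,\Pr[Y=1].
\end{align*}
```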
Theoretical Foundations (contd.)
  Pr[f(X) ≠ Y] = Pr[f(X) = 1] − Pr[Y = 1] + 2 Pr[f(X) = 0 | Y = 1] Pr[Y = 1]
• Note that Pr[Y = 1] is constant
• Approximation: if Pr[f(X) = 0 | Y = 1] is kept small, then
  error ≈ Pr[f(X) = 1] − Pr[Y = 1] = Pr[f(X) = 1] − const,
  i.e. minimizing Pr[f(X) = 1] ≈ minimizing the error
• In practice: minimize Pr_U[f(X) = 1] while keeping Pr_P[f(X) = 1] ≥ r (i.e. retaining at least a fraction r of the positive set)
• This is nothing but a constrained optimization problem, so learning is possible
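Written out as an optimization problem (same notation as above; in practice the probabilities are replaced by their empirical estimates on U and P):

```latex
\min_{f \in F} \; \Pr\nolimits_U[f(X) = 1]
\qquad \text{subject to} \qquad
\Pr\nolimits_P[f(X) = 1] \;\ge\; r
```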
Naïve Bayesian Text Classification (NB-C)
• D: the set of training documents
• C = {c1, c2, ..., c|C|}: the predefined classes; here only c1 and c2
• For each di ∈ D, the posterior probabilities Pr[cj | di] are computed
• In the NB model, the class with the highest Pr[cj | di] is assigned to the document
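As a rough illustration only (not the authors' code), the NB decision rule can be sketched as follows; `doc`, `log_prior` and `log_word_prob` are hypothetical names for a tokenized document and pre-estimated model parameters:

```python
def nb_log_posterior(doc, log_prior, log_word_prob):
    """Unnormalized log Pr[c | d] = log Pr[c] + sum of log Pr[w | c] over the words in d."""
    return {c: log_prior[c] + sum(log_word_prob[c].get(w, 0.0) for w in doc)
            for c in log_prior}

def nb_classify(doc, log_prior, log_word_prob):
    """Assign the class with the highest posterior, as the NB model does."""
    scores = nb_log_posterior(doc, log_prior, log_word_prob)
    return max(scores, key=scores.get)
```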
The EM Algorithm
• An iterative algorithm for maximum-likelihood estimation in problems with incomplete data
• A two-step method:
  • Expectation step: fills in the missing data (here, the missing class labels)
  • Maximization step: re-estimates the parameters after the missing data has been filled in
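Schematically, the alternation looks like the loop below; `e_step` and `m_step` are placeholders for the model-specific computations, so this is only a skeleton:

```python
def run_em(e_step, m_step, params, data, max_iters=50, tol=1e-4):
    """Generic EM loop: alternate the two steps until the log-likelihood stops improving."""
    prev_ll = float("-inf")
    for _ in range(max_iters):
        expectations, log_likelihood = e_step(params, data)  # fill in the missing data
        params = m_step(expectations, data)                  # re-estimate the parameters
        if abs(log_likelihood - prev_ll) < tol:              # convergence check
            break
        prev_ll = log_likelihood
    return params
```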
Proposed Strategy
• Step 1: Re-initialization
  • Iterative EM (I-EM): apply the EM algorithm over P and U
  • Identify a set of reliable negative documents in the unlabeled set by introducing spies
• Step 2: Building and selecting a classifier
  • Spy EM (S-EM): build a set of classifiers iteratively
  • Select a good classifier from the set of classifiers constructed above
Iterative EM (I-EM) with NB-C
• Assign each document in P(ositive) the class label c1 and each document in U(nlabeled) the class label c2:
  • Pr[c1 | di] = 1 and Pr[c2 | di] = 0 for each di in P
  • Pr[c2 | dj] = 1 and Pr[c1 | dj] = 0 for each dj in U
• After this initial labeling, an NB-C is built and used to classify the documents in U, i.e. to revise the posterior probabilities of the documents in U
• A new NB-C is then built from the revised posterior probabilities
• The iterative process continues until EM converges
• Setback: the result is strongly biased towards the positive documents
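A minimal, self-contained sketch of I-EM with a soft-label naïve Bayes model is shown below. Assumed setup: documents are token lists and `vocab` is the word set; a fixed iteration cap stands in for the convergence test. This is an illustration, not the paper's implementation.

```python
import math
from collections import Counter

def train_nb_soft(docs, probs, classes, vocab):
    """Multinomial NB trained from probabilistically labeled docs (Laplace smoothing)."""
    prior = {c: sum(p[c] for p in probs) for c in classes}
    counts = {c: Counter() for c in classes}
    for doc, p in zip(docs, probs):
        for w in doc:
            for c in classes:
                counts[c][w] += p[c]
    model = {}
    for c in classes:
        total = sum(counts[c].values())
        log_prior = math.log((prior[c] + 1e-9) / len(docs))
        log_like = {w: math.log((counts[c][w] + 1) / (total + len(vocab))) for w in vocab}
        model[c] = (log_prior, log_like)
    return model

def posterior(doc, model):
    """Normalized Pr[c | doc] under the NB model (log-sum-exp for stability)."""
    logp = {c: lp + sum(ll.get(w, 0.0) for w in doc) for c, (lp, ll) in model.items()}
    m = max(logp.values())
    z = sum(math.exp(v - m) for v in logp.values())
    return {c: math.exp(v - m) / z for c, v in logp.items()}

def i_em(P, U, vocab, iters=10):
    """I-EM sketch: P is fixed to class 'c1', U starts as 'c2'; each round builds an
    NB-C from the current probabilistic labels and revises the posteriors of U only."""
    docs = list(P) + list(U)
    probs = [{"c1": 1.0, "c2": 0.0} for _ in P] + [{"c1": 0.0, "c2": 1.0} for _ in U]
    model = None
    for _ in range(iters):
        model = train_nb_soft(docs, probs, ("c1", "c2"), vocab)  # build NB-C (M-step)
        for i, doc in enumerate(U):                              # revise U (E-step)
            probs[len(P) + i] = posterior(doc, model)
    return model, probs
```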
Step 1: Re-Initialization
• Sample a certain percentage of the positive examples, call them S, and put them into the unlabeled set to act as “spies”
• The I-EM algorithm is run as before, but the U(nlabeled) set now contains the spy documents
• After EM completes, the probabilistic labels of the spies are used to decide which documents are most likely negative (LN)
• A threshold t is used for the decision, for each dj in U:
  • if Pr[c1 | dj] < t: dj is denoted L(ikely)N(egative)
  • if Pr[c1 | dj] ≥ t: dj remains U(nlabeled)
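One way Step 1 could be coded, reusing the `i_em` and `posterior` helpers from the previous sketch; `spy_frac` and `noise` are illustrative parameters, not the paper's settings:

```python
import random

def step1_spy_reinit(P, U, vocab, spy_frac=0.15, noise=0.05):
    """Step-1 sketch: move a fraction of P into U as spies, run I-EM, then use the
    spies' posterior probabilities to pick a threshold t and split U into LN
    (likely negative) and the remaining unlabeled documents.
    Assumes i_em() and posterior() from the earlier I-EM sketch are in scope;
    spy_frac and noise are illustrative values, not the paper's settings."""
    spy_idx = set(random.sample(range(len(P)), max(1, int(spy_frac * len(P)))))
    spies = [P[i] for i in spy_idx]
    P_rest = [d for i, d in enumerate(P) if i not in spy_idx]
    model, _ = i_em(P_rest, list(U) + spies, vocab)
    # choose t so that most spies score above it
    # (here: the noise-quantile of the spies' Pr[c1 | d] values)
    spy_scores = sorted(posterior(d, model)["c1"] for d in spies)
    t = spy_scores[int(noise * len(spy_scores))]
    u_scores = [(d, posterior(d, model)["c1"]) for d in U]
    LN = [d for d, s in u_scores if s < t]
    U_rest = [d for d, s in u_scores if s >= t]
    return LN, U_rest, spies
```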
Step-1 Effect
[Diagram: BEFORE — U (unlabeled, mixing unknown positives and negatives) with spies from P added, and P (positive); AFTER — LN (likely negative), U (unlabeled, containing some spies), and P (positive)]
• Initial situation: U contains both positives and negatives, with no clue which are which; spies from P are added to U
• With the help of the spies: most positives in U stay in the unlabeled set, while most negatives end up in LN; the purity of LN is higher than that of U
Step 2: S-EM
• Apply EM over P, LN and U
• The algorithm proceeds as follows:
  • put all spies S back into P (where they were before)
  • di ∈ P: class c1 (i.e. Pr[c1 | di] = 1), fixed through the iterations
  • dj ∈ LN: class c2 (i.e. Pr[c2 | dj] = 1), changing through EM
  • dk ∈ U: initially assigned no label (labels are assigned after EM(1))
  • run EM using P, LN and U until it converges
• The final classifier is produced when EM stops
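A sketch of Step 2, again reusing the hypothetical `train_nb_soft` and `posterior` helpers from the I-EM sketch:

```python
def step2_s_em(P, LN, U_rest, spies, vocab, iters=8):
    """Step-2 sketch: spies go back into P; P is fixed as c1, LN starts as c2, and the
    remaining unlabeled documents receive labels only after the first NB-C is built.
    Each iteration yields one classifier. Assumes train_nb_soft() and posterior()
    from the I-EM sketch are in scope; this is an illustration, not the paper's code."""
    P_full = list(P) + list(spies)
    docs = P_full + list(LN) + list(U_rest)
    probs = ([{"c1": 1.0, "c2": 0.0} for _ in P_full] +     # fixed through iterations
             [{"c1": 0.0, "c2": 1.0} for _ in LN] +         # revised by EM
             [None for _ in U_rest])                        # labeled after EM(1)
    classifiers = []
    for _ in range(iters):
        labeled = [(d, p) for d, p in zip(docs, probs) if p is not None]
        model = train_nb_soft([d for d, _ in labeled], [p for _, p in labeled],
                              ("c1", "c2"), vocab)
        classifiers.append(model)
        for i in range(len(P_full), len(docs)):             # revise LN and U; P stays fixed
            probs[i] = posterior(docs[i], model)
    return classifiers
```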
Selecting a Classifier
  Pr[f(X) ≠ Y] = Pr[f(X) = 1] − Pr[Y = 1] + 2 Pr[f(X) = 0 | Y = 1] Pr[Y = 1]
• S-EM generates a set of classifiers, but the classification is not necessarily improving from iteration to iteration
• Remedy: stop iterating EM at some point
• Estimate the change in the probability of error between iterations i and i+1:
  Δi = Pr[f_{i+1}(X) ≠ Y] − Pr[f_i(X) ≠ Y]
• If Δi > 0 for the first time, the i-th classifier produced is taken as the final classifier
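One plausible way to implement this stopping rule uses the error decomposition above, with Pr[f(X) = 1] estimated on U, Pr[f(X) = 0 | Y = 1] estimated on P, and a supplied estimate of Pr[Y = 1]; this is a sketch under those assumptions, not the paper's exact estimator:

```python
def estimated_error(model, P, U, pr_y1):
    """Plug-in estimate of Pr[f(X) != Y] via the decomposition above:
    Pr[f(X)=1] estimated on U, Pr[f(X)=0 | Y=1] estimated on P,
    pr_y1 = a supplied estimate of Pr[Y=1]. Assumes posterior() is in scope."""
    def predict(d):
        post = posterior(d, model)
        return max(post, key=post.get)
    pr_f1 = sum(predict(d) == "c1" for d in U) / max(1, len(U))
    pr_f0_given_y1 = sum(predict(d) == "c2" for d in P) / max(1, len(P))
    return pr_f1 - pr_y1 + 2 * pr_f0_given_y1 * pr_y1

def select_classifier(classifiers, P, U, pr_y1):
    """Return the classifier built just before the estimated error first increases."""
    errors = [estimated_error(m, P, U, pr_y1) for m in classifiers]
    for i in range(len(errors) - 1):
        if errors[i + 1] - errors[i] > 0:    # Delta_i > 0 for the first time
            return classifiers[i]
    return classifiers[-1]
```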
Evaluation Measures
• Accuracy (of a classifier): A = m / (m + i), where m and i are the numbers of correct and incorrect decisions, respectively
• F-score: F = 2pr / (p + r) is a classification performance measure, where
  • recall r = a / (a + c)
  • precision p = a / (a + b)
  (a = true positives, b = false positives, c = false negatives)
• The F-value reflects the combined effect of precision and recall
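In code, the two measures reduce to a few lines given a 2×2 confusion table (`tp`, `fp`, `fn`, `tn` are hypothetical variable names):

```python
def evaluation_measures(tp, fp, fn, tn):
    """Accuracy and F-score from a 2x2 confusion table (positive class = c1)."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f_score = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return accuracy, precision, recall, f_score
```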
Experiments
• 30 datasets created from 2 large document corpora
• Objective: recover the positive documents placed into the mixed (unlabeled) sets
• For each experiment, the full positive set is divided into two subsets, P and R:
  • P: the positive set used by the algorithm, containing a% of the full positive set
  • R: the set of remaining positive documents, of which b% are put into U (not all of R is put into U)
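An illustrative version of this data construction is sketched below; the fractions `a` and `b` are placeholders, since the slide does not fix their values:

```python
import random

def make_experiment_sets(full_positive, negatives, a=0.5, b=0.5, seed=0):
    """Illustrative split (a and b are placeholder fractions):
    P gets a% of the full positive set; of the remainder R, b% is hidden inside U
    together with the negative documents."""
    rng = random.Random(seed)
    pos = list(full_positive)
    rng.shuffle(pos)
    cut = int(a * len(pos))
    P, R = pos[:cut], pos[cut:]
    hidden_pos = rng.sample(R, int(b * len(R)))
    U = hidden_pos + list(negatives)
    rng.shuffle(U)
    return P, U, hidden_pos      # hidden_pos are the positives to be recovered from U
```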
Experiments (contd.)
• Techniques compared:
  • NB-C: applied directly to P (as c1) and U (as c2) to build a classifier, which is then used to classify the data in U
  • I-EM: applies EM to P and U until it converges (no spies yet); the final classifier is applied to U to identify its positives
  • S-EM: spies are used to re-initialize; I-EM builds the final classifier; the threshold t is used
Experiments (contd.)
• S-EM dramatically outperforms NB-C and I-EM in F-score
• S-EM outperforms NB-C and I-EM in accuracy as well
• Comment: the datasets are skewed, so accuracy is not a reliable measure of classifier performance
Experiments (contd.)
• The results show the great effect of re-initialization with spies: S-EM outperforms I-EMbest
• Re-initialization is not, however, the only factor of improvement: S-EM outperforms S-EM4
• Conclusion: both Step 1 (re-initialization) and Step 2 (selecting the best model) are needed
Conclusion
• Gives an overview of the theory of learning from positive and unlabeled examples
• Describes a two-step strategy for learning that produces highly accurate classifiers
• Partially supervised classification is most helpful when the initial model is insufficiently trained
Questions?