Automated Text Categorization: The Two-Dimensional Probability Model Abdulaziz Alsharikh
Agenda • Introduction • Background on ATC • The Two-Dimensional Model • Probabilities • Peculiarity • Document Coordinates • Experiments • Analyses and Results • Critiques
Introduction • What is ATC? • Build a classifier by observing the properties of a set of pre-classified documents (as in the Naïve Bayes model) • Simple to implement • Gives remarkable accuracy • Gives directions for SVM parameter tuning • 2DPM starts from hypotheses different from NB's: • Terms are seen as disjoint events • Documents are seen as unions of these events • Provides a visualization tool for understanding the relationships between categories • Helps users visually audit the classifier and identify suspicious training data
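The slides omit the formulas, so as a hedged sketch in my own notation, the two hypotheses lead to different document likelihoods:

```latex
% NB hypothesis: terms are conditionally independent given the category,
% so the document likelihood is a product
P(d \mid c) = \prod_{t_k \in d} P(t_k \mid c)

% 2DPM hypothesis: terms are disjoint events and a document is their union,
% so the likelihood becomes a sum
P(d \mid c) = P\!\left(\bigcup_{t_k \in d} t_k \,\middle|\, c\right)
            = \sum_{t_k \in d} P(t_k \mid c)
```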
Background on ATC • A set of categories C and a set of documents D • Categorization as a function D × C → {T, F} • Multi-label vs. single-label categorization • CSV_i : D → [0, 1] gives the degree of membership of a document in category c_i • Binary categorization: given C categories, each of the D documents is assigned either to a category c or to its complement • The collection is split into D_tr for training the classifier and D_te for measuring performance
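In symbols, reconstructed from the definitions above (the paper's exact notation may differ):

```latex
% Target classification function and categorization status value
\Phi : D \times C \to \{T, F\}
\qquad
CSV_i : D \to [0, 1]

% Binary view: for each category c_i, every document is assigned
% either to c_i or to its complement \bar{c}_i
```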
The Two-Dimensional Model • Represents each document as a point on a two-dimensional Cartesian plane • Based on two parameters, presence and expressiveness, which measure the frequency of a term in the documents of the categories and how distinctive it is for them • Advantages • No explicit need for feature selection to reduce dimensionality • Limited space required to store objects compared to NB • Lower computational cost for classifier training
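A plausible form of the resulting document coordinates, assuming the disjoint-event sum sketched in the introduction (the paper's exact normalization may differ):

```latex
% One axis per side of the binary decision: the category c and its complement
X(d) = P(d \mid c) = \sum_{t_k \in d} P(t_k \mid c)
\qquad
Y(d) = P(d \mid \bar{c}) = \sum_{t_k \in d} P(t_k \mid \bar{c})
```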
Basic probabilities • Given a set of categories C = {c_1, c_2, ...} • Given a vocabulary of terms V = {t_1, t_2, ...} • Find the space of term-category pairs Ω = {(t_k, c_i)}
Basic probabilities • Probability of a term given a category, or given a set of categories (a sketch of both follows)
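The formulas were images in the original slides; a standard relative-frequency estimate consistent with the text, offered as a sketch (N(t_k, c_i) is my notation for a term count, not the paper's):

```latex
% N(t_k, c_i): occurrences of term t_k in the training documents of c_i
% N(c_i): total term occurrences in the training documents of c_i
P(t_k \mid c_i) = \frac{N(t_k, c_i)}{N(c_i)}

% For a set of categories C' \subseteq C, pool the counts
P(t_k \mid C') = \frac{\sum_{c_i \in C'} N(t_k, c_i)}{\sum_{c_i \in C'} N(c_i)}
```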
Basic probabilities • Probability of a set of terms given a category or a set of categories • Since terms are disjoint events, these probabilities simply add (sketch below)
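A sketch in the same assumed notation, using additivity over disjoint events:

```latex
% The probability of a set of terms T' is the sum of the term probabilities
P(T' \mid c_i) = \sum_{t_k \in T'} P(t_k \mid c_i)
\qquad
P(T' \mid C') = \sum_{t_k \in T'} P(t_k \mid C')
```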
Basic probabilities • "Peculiarity" of a term, given a chosen set of categories • It is the probability of finding the term in the documents of the set, combined with the probability of not finding the same term in the documents of the complement • Presence: how frequently the term occurs in the categories • Expressiveness: how distinctive the term is for that set of categories • It is useful when computing the probabilities of complements of sets (sketch below)
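One plausible formalization, treating peculiarity as the product of presence and expressiveness; this is my reading of the slide text, not a formula taken from the paper:

```latex
% \bar{C}' is the complement of the chosen set C', made precise below
\mathrm{peculiarity}(t_k, C')
  = \underbrace{P(t_k \mid C')}_{\text{presence}}
    \cdot
    \underbrace{P(\bar{t}_k \mid \bar{C}')}_{\text{expressiveness}}
```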
Basic probabilities • Probability of a term given the complementary set of categories (sketch below)
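In the same assumed notation, pooling counts over the complement of the chosen set:

```latex
% \bar{C}' = C \setminus C' is the complement of the chosen set
P(t_k \mid \bar{C}') = \frac{\sum_{c_i \in \bar{C}'} N(t_k, c_i)}
                            {\sum_{c_i \in \bar{C}'} N(c_i)}
\qquad
P(\bar{t}_k \mid \bar{C}') = 1 - P(t_k \mid \bar{C}')
```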
Basic probabilities • Probability of a term, having chosen a set of categories • It indicates finding the term in the documents of the chosen set and not finding the same term in the documents of the complement (sketch below)
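This description mirrors the peculiarity defined above; a hedged guess at the intended expression:

```latex
P(t_k; C') = P(t_k \mid C') \cdot P(\bar{t}_k \mid \bar{C}')
```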
2-D Representation and Categorization • Breaking the classification expression down so that each document can be plotted as a point • Natural way to assign d to c: compare the two coordinates • A threshold q improves the separation • An angular coefficient m rotates the decision line • Both are applied to the probability of a set of terms in a set of categories (see the sketch after this list)
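A minimal sketch of the decision rule in Python, assuming the coordinates X(d) and Y(d) defined earlier and a separating line y = m·x + q; function names, the simplified probability estimator, and the orientation of the inequality are my assumptions, not the paper's code:

```python
def term_probs(docs, vocab):
    """Simplified relative-frequency estimate of P(t | class) from tokenized docs."""
    counts = {t: 0 for t in vocab}
    total = 0
    for doc in docs:
        for t in doc:
            if t in counts:
                counts[t] += 1
                total += 1
    return {t: c / total for t, c in counts.items()} if total else counts

def coordinates(doc, p_c, p_not_c):
    """Map a document to the plane: X = sum of P(t|c), Y = sum of P(t|c-bar)."""
    x = sum(p_c.get(t, 0.0) for t in doc)
    y = sum(p_not_c.get(t, 0.0) for t in doc)
    return x, y

def classify(doc, p_c, p_not_c, m=1.0, q=0.0):
    """Assign doc to c if its point lies on or below the line y = m*x + q."""
    x, y = coordinates(doc, p_c, p_not_c)
    return y <= m * x + q
```

With m = 1 and q = 0 this reduces to the natural rule X(d) ≥ Y(d); tuning m (the rotation) and q (the threshold) for better separation is the role of the Focused Angular Region algorithm mentioned in the conclusion.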
Experiments • Datasets • Reuters-21578 (135 potential categories) • First experiment with the 10 most frequent categories • Second experiment with the 90 most frequent categories • Reuters Corpus Volume 1 (RCV1) • 810,000 newswire stories (Aug. 1996 to Aug. 1997) • 30,000 training documents • 36,000 test documents
Experiments • Pre-processing • Remove all punctuation and convert to lower case • Remove the most frequent terms of the English language • K-fold cross-validation (k = 5) for the FAR algorithm, to find the best separation • Recall and precision are computed for each category and combined into a single measure (F1) • Macro-averaging and micro-averaging are computed for each of the previous measures
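For reference, a small sketch of micro- and macro-averaged F1 from per-category counts; these are the standard definitions, and the variable names are mine:

```python
def f1(tp, fp, fn):
    """F1 from true positives, false positives, and false negatives."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

def macro_micro_f1(per_category):
    """per_category: list of (tp, fp, fn) tuples, one per category."""
    # Macro: average the per-category F1 scores (every category weighs equally)
    macro = sum(f1(*c) for c in per_category) / len(per_category)
    # Micro: pool the counts first (frequent categories dominate)
    tp = sum(c[0] for c in per_category)
    fp = sum(c[1] for c in per_category)
    fn = sum(c[2] for c in per_category)
    return macro, f1(tp, fp, fn)
```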
Analyses and Results • Comparing 2DPM with multinomial NB • Case 1: NB performs better than 2DPM • Case 2: almost the same, but the macro-averaged F1 is halved • Case 3: same as case 2, but the macro-averaged F1 increases • Case 4: NB performs better than 2DPM on the micro average but worse on the macro average
Conclusion • Plotting makes the classifier's decisions easy to understand • Rotating the decision line results in better separation of the two classes (Focused Angular Region) • 2DPM performance is at least equal to the NB model • 2DPM is better on macro-averaged F1
Critiques • The paper is not clear in guiding the reader toward the final results • It derives the probability in many cases but does not show how those probabilities are used • It focuses on the theoretical side, while the results and analysis part is barely 10% of the paper • The main algorithm the paper depends on, the Focused Angular Region, is only mentioned, without sufficient explanation