Opinion Mining and Topic Categorization with Novel Term Weighting. Roman Sergienko, Ph.D. student; Tatiana Gasanova, Ph.D. student; Ulm University, Germany. Shaknaz Akhmedova, Ph.D. student; Siberian State Aerospace University, Krasnoyarsk, Russia.
Contents • Motivation • Databases • Text preprocessing methods • The novel term weighting method • Feature selection • Classification algorithms • Results of numerical experiments • Conclusions
Motivation • The goal of the work is to evaluate the competitiveness of the novel term weighting method in comparison with standard techniques for opinion mining and topic categorization. • The criteria are: • Macro F-measure on the test set • Computational time
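As an illustration of the first criterion, here is a minimal sketch of a macro F-measure. It assumes the common definition as the unweighted mean of per-class F1 scores; the slides do not say which variant was used (some evaluations instead compute F from macro-averaged precision and recall), and the function name `macro_f_measure` is hypothetical.

```python
from collections import Counter


def macro_f_measure(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (assumed definition)."""
    classes = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1          # correct prediction for class t
        else:
            fp[p] += 1          # p was predicted but is wrong
            fn[t] += 1          # t was the true class but was missed
    f_scores = []
    for c in classes:
        denom = 2 * tp[c] + fp[c] + fn[c]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        f_scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(f_scores) / len(f_scores)


score = macro_f_measure(["pos", "pos", "neg", "neg"],
                        ["pos", "neg", "neg", "neg"])
```

Because every class contributes equally to the mean, a small class that is classified badly lowers the score as much as a large one would.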
Existing text preprocessing methods • Binary preprocessing • TF-IDF (Salton and Buckley, 1988) • Confident Weights (ConfWeight; Soucy and Mineau, 2005)
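The first two baselines can be sketched as follows. This is an illustration under assumptions, not the exact setup from the experiments: documents are taken to be pre-tokenized lists of words, and TF-IDF uses one common variant, raw term frequency times log(N / df), since the slides do not state which variant was applied. ConfWeight is not sketched here. The function names are hypothetical.

```python
import math
from collections import Counter


def binary_weights(doc_tokens, vocabulary):
    """1.0 if the term occurs in the document, else 0.0."""
    present = set(doc_tokens)
    return [1.0 if term in present else 0.0 for term in vocabulary]


def tf_idf_weights(docs, vocabulary):
    """Raw term frequency times log(N / document frequency) (assumed variant)."""
    n_docs = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))      # count each term once per document
    vectors = []
    for tokens in docs:
        tf = Counter(tokens)
        vectors.append([
            tf[term] * math.log(n_docs / df[term]) if df[term] else 0.0
            for term in vocabulary
        ])
    return vectors


docs = [["good", "film"], ["bad", "film"]]
vocabulary = ["bad", "film", "good"]
binary_vec = binary_weights(docs[0], vocabulary)
tfidf_vecs = tf_idf_weights(docs, vocabulary)
```

In this toy corpus "film" occurs in every document, so its TF-IDF weight is zero, while the binary representation still marks it as present.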
The novel term weighting method $L$: the number of classes; $n_i$: the number of instances of the $i$-th class; $N_{ji}$: the number of occurrences of the $j$-th word in all instances of the $i$-th class; $T_{ji} = N_{ji} / n_i$: the relative frequency of the $j$-th word in the $i$-th class; $R_j = \max_i T_{ji}$: the largest relative frequency of the $j$-th word over all classes; $S_j = \arg\max_i T_{ji}$: the index of the class assigned to the $j$-th word.
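A minimal sketch of how the quantities defined above could be computed from a labeled corpus. It covers only $T_{ji}$, $R_j$ and $S_j$ as defined on this slide; the function name `class_relative_frequencies`, the dictionary layout, and the tie-breaking rule (the first class seen wins) are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def class_relative_frequencies(docs, labels):
    """Return (T, R, S) keyed by word, following the slide's definitions."""
    n = Counter(labels)                 # n_i: instances per class
    N = defaultdict(Counter)            # N[word][class]: occurrence counts N_ji
    for tokens, label in zip(docs, labels):
        for word in tokens:
            N[word][label] += 1
    # T_ji = N_ji / n_i for every word j and class i
    T = {w: {c: N[w][c] / n[c] for c in n} for w in N}
    # R_j = max_i T_ji
    R = {w: max(T[w].values()) for w in T}
    # S_j = argmax_i T_ji (ties resolved by class insertion order)
    S = {w: max(T[w], key=T[w].get) for w in T}
    return T, R, S


docs = [["good", "good", "film"], ["bad", "film"], ["bad", "bad"]]
labels = ["pos", "neg", "neg"]
T, R, S = class_relative_frequencies(docs, labels)
```

Everything is obtained in a single pass over the corpus plus one pass over the vocabulary, which fits the claim that the method costs about as much as TF-IDF.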
Feature selection • Calculate the relative frequency of each word in each class • For each word, choose the class with the maximum relative frequency • For each utterance to be classified, calculate the sum of the weights of the words assigned to each class • Number of attributes = number of classes
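The steps above can be sketched as a function that maps an utterance to one attribute per class. The word weights and class assignments are passed in as plain dictionaries because the weight formula is not shown on these slides; the values below are made up for illustration, the name `class_sum_features` is hypothetical, and letting every occurrence of a word contribute (rather than each distinct word once) is an assumption.

```python
def class_sum_features(tokens, weights, assigned_class, classes):
    """Sum the weights of an utterance's words, grouped by assigned class."""
    sums = {c: 0.0 for c in classes}
    for word in tokens:
        # Words unseen in training carry no weight and are skipped.
        if word in weights and word in assigned_class:
            sums[assigned_class[word]] += weights[word]
    # One attribute per class, in a fixed order.
    return [sums[c] for c in classes]


# Illustrative values only, not taken from the experiments.
weights = {"good": 2.0, "film": 1.0, "bad": 1.5}
assigned_class = {"good": "pos", "film": "pos", "bad": "neg"}
features = class_sum_features(
    ["good", "film", "bad", "unknown"], weights, assigned_class, ["pos", "neg"]
)
```

The resulting vector has as many entries as there are classes, regardless of vocabulary size, which is what keeps the downstream classifiers cheap.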
Classification algorithms • k-nearest neighbors algorithm with distance weighting (k varied from 1 to 15); • kernel Bayes classifier with Laplace correction; • neural network with error backpropagation (standard settings in RapidMiner); • Rocchio classifier with different metrics and parameters; • support vector machine (SVM) generated and optimized with Co-Operation of Biology Related Algorithms (COBRA) (Akhmedova and Semenkin, 2013).
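To illustrate the first classifier in the list, here is a minimal sketch of k-nearest neighbors with distance weighting. It assumes Euclidean distance and inverse-distance votes; the slides do not specify the metric or weighting scheme used in the experiments, and the function name `knn_predict` is hypothetical.

```python
import math
from collections import defaultdict


def knn_predict(train_x, train_y, query, k=3):
    """Predict a label by inverse-distance-weighted vote of the k nearest points."""
    neighbors = sorted(
        ((math.dist(x, query), y) for x, y in zip(train_x, train_y)),
        key=lambda pair: pair[0],
    )
    votes = defaultdict(float)
    for distance, label in neighbors[:k]:
        # Closer neighbors get larger votes; the epsilon avoids division by zero.
        votes[label] += 1.0 / (distance + 1e-9)
    return max(votes, key=votes.get)


# Toy two-attribute vectors (one attribute per class), made up for illustration.
train_x = [[3.0, 0.0], [2.5, 0.5], [0.0, 2.0], [0.5, 3.0]]
train_y = ["pos", "pos", "neg", "neg"]
label_a = knn_predict(train_x, train_y, [2.8, 0.2], k=3)
label_b = knn_predict(train_x, train_y, [0.2, 2.5], k=3)
```

With only as many attributes as classes, each distance computation is cheap, so trying every k from 1 to 15 stays inexpensive.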
Computational effectiveness (figure panels: DEFT’08, DEFT’07)
Conclusions • The novel term weighting method achieves classification quality similar to or better than that of the ConfWeight method, while requiring the same amount of computational time as TF-IDF.