910 likes | 1.01k Views
Novel representations and methods in text classification . Manuel Montes, Hugo Jair Escalante Instituto Nacional de Astrofísica, Óptica y Electrónica, México. http://ccc.inaoep.mx/~mmontesg / http://ccc.inaoep.mx/~hugojair/ { mmontesg , hugojair } @ inaoep.mx
E N D
Novel representations and methods in text classification Manuel Montes, Hugo Jair Escalante Instituto Nacional de Astrofísica, Óptica y Electrónica, México.http://ccc.inaoep.mx/~mmontesg/ http://ccc.inaoep.mx/~hugojair/ {mmontesg, hugojair}@inaoep.mx 7th Russian Summer School in Information RetrievalKazan, Russia, September 2013
Overview of thecourse • Day 1: Introductiontotextclassification • Day 2: Bag of conceptsrepresentations • Day 3: Representationsthatincorporatesequential and syntacticinformation • Day 4: Methodsfor non-conventionalclassification • Day 5: Automatedconstruction of classificationmodels
Novel represensations and methods in textclassification Automated construction of classification models: full model selection
Outline • Patternclassification • Modelselection in a broadsense: FMS • Relatedworks • PSO for full modelselection • Experimental results and applications • Conclusions and futureworkdirections
Pattern classification Answers 1 (is a car) Training data Learning machine Trained machine -1 (is not a car) Queries
Applications • Natural language processing, • Computer vision, • Robotics, • Information technology, • Medicine, • Science, • Entertainment, • ….
Instances Example: galaxyclassificationfromimages Features … Featureselection Classificationmethod Data Data Preprocessing Training model Modelevaluation ERROR [ ] ?
Some issues with the cycle of design • How to preprocess the data? • What method to use? Featureselection Classificationmethod Data Data Preprocessing Training model Modelevaluation
Some issues with the cycle of design • What feature selection method to use? • How many features are enough? • How to set the hyperparameters of the FS method? Featureselection Classificationmethod Data Data Preprocessing Training model Modelevaluation
Some issues with the cycle of design • What learning machine to use? • How to set the hyperparameters of the chosen method? Featureselection Classificationmethod Data Data Preprocessing Training model Modelevaluation
Some issues with the cycle of design • What combination of methods works better? • How to evaluate model’s performance • How to avoid overfitting? Featureselection Classificationmethod Data Data Preprocessing Training model Modelevaluation
Some issues with the cycle of design • The above issues are usually fixed manually: • Domain expert’s knowledge • Machine learning specialists’ knowledge • Trial and error • The design/development of a pattern classification system relies on the knowledge and biases of humans, which may be risky, expensive and time consuming • Automated solutions are available but only for particular processes (e.g., either feature selection, or classifier selection but not both) Is it possible to automate the whole process?
Full model selection Full modelselection Data H. J. Escalante. Towards a ParticleSwarmModelSelectionalgorithm. Multi-levelinferenceworkshop and modelselectiongame, NIPS, Whistler, VA, BC, Canada, 2006. H. J. Escalante, E. Sucar, M. Montes. ParticleSwarmModelSelection, In Journal of Machine LearningResearch, 10(Feb):405--440, 2009.
Full model selection • Given a set of methods for data preprocessing, feature selection and classification select the combination of methods (together with their hyperparameters) that minimizes an estimate of classification performance H. J. Escalante, E. Sucar, M. Montes. ParticleSwarmModelSelection, In Journal of Machine LearningResearch, 10(Feb):405--440, 2009.
Full model selection • Full model: A model composed of methods for data preprocessing, feature selection and classification • Example: chain { 1:standardize center=1 2:svcrfe svckernel linear coef0=0 degree=1 gamma=0 shrinkage=0.001 f_max=Inf 3:pc_extract f_max=2000 4:svm kernel linear C=Infridge=1e-013 balanced_ridge=0 nu=0 alpha_cutoff=-1 b0=0 nob=0 } chain { 1:normalize center=0 2:relief f_max=Infw_min=-Infk_num=4 3:neural units=10 shrinkage=1e-014 balance=0 maxiter=100 }
Full model selection • Pros • The job of the data analyst is considerably reduced • Neither knowledge on the application domain nor on machine learning is required • Different methods for preprocessing, feature selection and classification are considered • It can be used in any classification problem • Cons • It is real function + combinatoric optimization problem • Computationally expensive • Risk of overfitting
Novel represensations and methods in textclassification Overview of related works
Modelselectionviaheuristicoptimization • A single modelisconsidered and theirhyperparameters are optimizedviaheuristicoptimization: • Swarmoptimization, • Geneticalgorithms, • Patternsearch, • Geneticprogramming • …
GEMS • GEMS (Gene Expression Model Selection) is a system for automated development and valuation of high-quality cancer diagnostic models and biomarker discovery from microarray gene expression data A. Statnikov, I. Tsamardinos, Y. Dosbayev, C.F. Aliferis. GEMS: A SystemforAutomatedCancer Diagnosis and BiomarkerDiscoveryfromMicroarray Gene Expression Data. International Journal of MedicalInformatics, 2005 Aug;74(7-8):491-503. N. Fanananapazir, A. Statnikov, C.F. Aliferis. TheFast-AIMS ClinicalMassSpectrometryAnalysisSystem. Advaces in Bioinformatics, 2009, Article ID 598241.
GEMS • The user specifies the models, and methods to be considered • GEMS explores all of the combinations of methods, using grid search to optimize hyperparameters
GEMS • The user specifies the models, and methods to be considered • GEMS explores all of the combinations of methods, using grid search to optimize hyperparameters
Modeltypeselectionforregression • Geneticalgorithms are usedfortheselection of modeltype (learningmethod, featureselection, preprocessing) and parameteroptimizationforregressionproblems http://www.sumo.intec.ugent.be/ D. Gorissen, T. Dhaene, F. de Turck. Evolutionary Model Type Selection for Global Surrogate Modeling. In Journal of Machine LearningResearch, 10(Jul):2039-2078, 2009
Meta-learning: learningtolearn • Evaluates and compares the application of learning algorithms on (many) previous tasks/domains to suggest learning algorithms (combinations, rankings) for new tasks • Focuses on the relation between tasks/domains and learning algorithms • Accumulating experience on the performance of multiple applications of learning methods Brazdil P., Giraud-Carrier C., Soares C., Vilalta R. Metalearning: Applicationsto Data Mining. SpringerVerlag. ISBN: 978-3-540-73262-4, 2008. Brazdil P., Vilalta R, Giraud-Carrier C., Soares C.. Metalearning. Encyclopedia of Machine learning. Springer, 2010.
Meta-learning: learningtolearn Brazdil P., Giraud-Carrier C., Soares C., Vilalta R. Metalearning: Applicationsto Data Mining. SpringerVerlag. ISBN: 978-3-540-73262-4, 2008. Brazdil P., Vilalta R, Giraud-Carrier C., Soares C.. Metalearning. Encyclopedia of Machine learning. Springer, 2010.
Google prediction API • “Machine learning as a service in the cloud” • Upload your data, train a model and perform queries • Nothing is for free! https://developers.google.com/prediction/
Google prediction API Your data become property of Google!
Novel represensations and methods in textclassification PSMS: our approach to full model selection
Full model selection Full modelselection Data
PSMS: Ourapproachtofull modelselection • Particle swarm model selection: Use particle swarm optimization for exploring the search space of full models in a particular ML-toolbox Normalize + RBF-SVM (γ = 0.01) Neural Net ( 3 units) PCA + Neural-Net (10 h. units) Normalize + PolySVM (d= 2) s2n (feat. Sel.) +K-ridge Relief (feat. Sel.) +Naïve Bayes H. J. Escalante. Towards a ParticleSwarmModelSelectionalgorithm. Multi-levelinferenceworkshop and modelselectiongame, NIPS, Whistler, VA, BC, Canada, 2006. H. J. Escalante, E. Sucar, M. Montes. ParticleSwarmModelSelection, In Journal of Machine LearningResearch, 10(Feb):405--440, 2009.
Particle swarm optimization • Population-based search heuristic • Inspired on the behavior of biological communities that exhibit local and social behaviors A. P. Engelbrecht. Fundamentals of Computational Swarm Intelligence. Wiley,2006
Particle swarm optimization • Each individual (particle) i has: • A position in the search space (Xit), which represents a solution to the problem at hand, • A velocity vector (Vit), which determines how a particle explores the search space • After random initialization, particles update their positions according to: A. P. Engelbrecht. Fundamentals of Computational Swarm Intelligence. Wiley,2006
Particle swarm optimization • Randomly initialize a population of particles (i.e., the swarm) • Repeat the following iterative process until stop criterion is meet: • Evaluate the fitness of each particle • Find personal best (pi) and global best (pg) • Update particles • Update best solution found (if needed) • Return the best particle (solution) Fitness value ... ... Init t=1 t=2 t=max_it
PSMS : PSO for full model selection • Set of methods (not restricted to this set) Classification Feature selection Preprocessing http://clopinet.com/CLOP
PSMS : PSO for full model selection • Codification of solutions as real valued vectors Preprocessingbeforefeatureselection? Hyperparameters fortheselectedmethods Choice of methodsforpreprocessing, featureselection and classification
PSMS : PSO for full model selection • Fitness function: • K-fold cross-validation balanced error rate • K-fold cross-validation area under the ROC curve Train Error fold 1 Test Error fold 2 CV estimate Error fold 3 Error fold 4 Training data Error fold 5
Novel represensations and methods in textclassification Some experimental results of PSMS
PSMS in the ALvsPK challenge • Five data sets for binary classification • Goal: to obtain the best classification model for each data set • Two tracks: • Prior knowledge • Agnostic learning http://www.agnostic.inf.ethz.ch/
PSMS in the ALvsPK challenge • Best configuration of PSMS: Comparison of the performance of modelsselectedwith PSMS withthatobtainedbyothertechniques in theALvsPKchallenge Modelsselectedwith PSMS forthedifferent data sets http://www.agnostic.inf.ethz.ch/
PSMS in the ALvsPK challenge • Official ranking: http://www.agnostic.inf.ethz.ch/
Some results in benchmark data • Comparison of PSMS and pattern search
Some results in benchmark data • Comparison of PSMS and pattern search
Some results in benchmark data • Comparison of PSMS and pattern search
PSMS: Interactive demo http://clopinet.com/CLOP Isabelle Guyon, Amir Saffari, Hugo Jair Escalante, GokanBakir, and GavinCawley, CLOP: a Matlab LearningObjectPackage. NIPS 2007 Demonstrations, Vancouver, British Columbia, Canada 2007.
Novel represensations and methods in textclassification PSMS: Applications and extensions
Authorship verification The task of deciding whether given text documents were or were not written by a certain author (i.e., abinary classification problem) Applications include: fraud detection, spam filtering, computer forensics and plagiarism detection
Authorship verification Sample documents from author “X” Feature extraction Model construction and testing Prediction Dear Mary, I have been busy in the last few weeks, however, I would like to ….…. … … This paper describes an novel approach to region labeling by using a Markov random field model …. This document was not written by “X” Model for author “X” Sample documents from other authors than “X” Hereby I am sending my application for the PhD position offered in your research group … Hi Jane, how things are going on in Mexico; I hope you are not having problems because … Our paper introduces a novel method for … … … Today in Brazil an Air France plane crashed down, during a storm…. Unseen document