ECML PKDD 2008, 15-19 September 2008, Antwerp, Belgium
A Genetic Algorithm for Text Classification Rule Induction
A. Pietramala 1, V. Policicchio 1, P. Rullo 1,2, I. Sidhu 3
1 Università della Calabria (Rende, Italy) {a.pietramala,policicchio,rullo}@mat.unical.it
2 Exeura Srl (Rende, Italy)
3 Kenetica Ltd (Chicago, IL, USA) {isidhu}@computer.org
Outline
• Motivations
• The Olex Hypothesis Language
• The Genetic Algorithm Approach (Olex-GA)
• Experimental Results and Comparative Evaluation
• Discussion
• Conclusions and Future Work
Motivations
• Rule learning algorithms have become a successful strategy for classifier induction.
• Rule-based classifiers have the desirable property of being readable and, thus, easy to understand (and, possibly, modify).
• Genetic Algorithms (GAs) are stochastic search methods inspired by biological evolution.
• GAs have proven able to provide good solutions for classical optimization tasks (e.g., TSP and Knapsack).
Rule Induction and GAs
• Rule induction is one of the application fields of GAs. The basic idea is that:
  • each individual in the population represents a candidate solution (a classification rule or a whole classifier);
  • the fitness of an individual is evaluated in terms of its predictive accuracy.
• We present a GA approach, called Olex-GA, for the induction of rule-based text classifiers.
Olex-GA - The hypothesis language
• A classifier Hc(Pos, Neg) is of the form: "if any of the terms t1, …, tn occurs in d and none of the terms tn+1, …, tn+m occurs in d, then classify d under category c", where Pos = {t1, …, tn} and Neg = {tn+1, …, tn+m}.
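Below is a minimal Python sketch of how such a classifier is applied to a document; the names (classify, doc_terms, pos_terms, neg_terms) are illustrative assumptions, not the authors' implementation.

```python
# A classifier Hc(Pos, Neg) fires on a document d iff at least one
# positive term occurs in d and no negative term occurs in d.

def classify(doc_terms: set, pos_terms: set, neg_terms: set) -> bool:
    """Return True iff the document is assigned to category c."""
    return bool(doc_terms & pos_terms) and not (doc_terms & neg_terms)

# Example with Pos = {"wheat", "grain"} and Neg = {"stock"}:
assert classify({"wheat", "price"}, {"wheat", "grain"}, {"stock"}) is True
assert classify({"wheat", "stock"}, {"wheat", "grain"}, {"stock"}) is False
```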
Olex-GA - The hypothesis language
• The terms in Pos and Neg are chosen among those belonging to the local vocabulary Vc(k, f).
• Intuitively, Vc(k, f) is the set of the k best terms for category c according to a given scoring function f.
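As an illustration, a sketch of how Vc(k, f) could be built; the scoring function used here (document frequency within category c) is only a placeholder assumption, since the slide leaves f generic.

```python
from collections import Counter

def vocabulary(train_docs, train_labels, c, k):
    """Return the k best-scoring terms for category c, i.e. Vc(k, f)."""
    score = Counter()
    for terms, label in zip(train_docs, train_labels):
        if label == c:                # placeholder f: document frequency in c
            score.update(set(terms))  # count each term once per document
    return {t for t, _ in score.most_common(k)}

docs   = [["wheat", "price"], ["wheat", "export"], ["stock", "market"]]
labels = ["wheat", "wheat", "stock"]
print(vocabulary(docs, labels, "wheat", k=2))  # e.g. {'wheat', 'price'}
```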
Olex-GA - Problem statement
• The Olex-GA learning problem is stated as an optimization problem:
PROBLEM MAX-F. Let a category c ∈ C and a vocabulary V(k, f) over the training set TS be given. Find two subsets of V(k, f), Pos = {t1, …, tn} and Neg = {tn+1, …, tn+m}, with Pos ≠ Ø, such that Hc(Pos, Neg) applied to TS yields a maximum value of Fc,α over TS, for a given α ∈ [0, 1].
• Problem MAX-F is NP-hard.
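For reference, assuming Fc,α denotes the standard van Rijsbergen F-measure over the precision Pc and recall Rc of Hc(Pos, Neg) on TS (a reconstruction, since the slide does not spell the formula out):

```latex
F_{c,\alpha} \;=\; \frac{1}{\alpha\,\frac{1}{P_c} + (1-\alpha)\,\frac{1}{R_c}}
            \;=\; \frac{P_c\,R_c}{\alpha\,R_c + (1-\alpha)\,P_c}
```

With α = 0.5 this reduces to the balanced F1 = 2·Pc·Rc / (Pc + Rc).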
Olex-GA - A Genetic Algorithm to Solve MAX-F
• Problem MAX-F is a combinatorial optimization problem aimed at finding a best combination of terms taken from a given vocabulary.
• MAX-F is a typical problem for which GAs are known to be a good candidate resolution method.
Olex-GA - Our implementation of the GA
• In the following, we describe our choices concerning:
  • population encoding
  • fitness function
  • evolutionary operators
Olex-GA - Population Encoding
• Each individual represents an entire classifier.
• An individual is simply a binary representation of the sets Pos and Neg of a classifier Hc(Pos, Neg).
• EXAMPLE: given the vocabulary {t1, t2, t3, t4, t5}, the chromosome K = 1 0 1 0 0 | 0 1 0 1 0 (the first half, over t1, …, t5, encoding Pos; the second half encoding Neg) represents the classifier Hc({t1, t3}, {t2, t4}).
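A minimal sketch of decoding such a chromosome back into the sets Pos and Neg; names are illustrative assumptions.

```python
def decode(chromosome, terms):
    """Split a bit string into Pos (first half) and Neg (second half)."""
    n = len(terms)
    pos = {t for t, bit in zip(terms, chromosome[:n]) if bit}
    neg = {t for t, bit in zip(terms, chromosome[n:]) if bit}
    return pos, neg

# The example above: K = 1 0 1 0 0 | 0 1 0 1 0 over t1, ..., t5
pos, neg = decode([1, 0, 1, 0, 0, 0, 1, 0, 1, 0],
                  ["t1", "t2", "t3", "t4", "t5"])
assert pos == {"t1", "t3"} and neg == {"t2", "t4"}
```

(In practice the two halves range over the candidate sets Pos* and Neg* introduced on the next slide, which need not coincide.)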
Olex-GA - Population Encoding
• We restrict the search for positive and negative terms, respectively, to:
  • Pos*, the set of terms belonging to Vc(k, f) (candidate positive terms);
  • Neg*, the set of terms occurring in some document that contains a candidate positive term but does not belong to the training set TSc of c (candidate negative terms).
• This reduction of the search space:
  • improves the efficiency of the algorithm;
  • yields quick convergence toward good solutions.
Olex-GA - Fitness Function
• The fitness of a chromosome K representing Hc(Pos, Neg) is the value of the F-measure resulting from applying Hc(Pos, Neg) to the training set TS.
• This choice follows naturally from the formulation of problem MAX-F.
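A minimal sketch of this fitness evaluation, reusing the illustrative decode and classify helpers sketched earlier; the balanced default α = 0.5 is an assumption.

```python
def fitness(chromosome, terms, train_docs, train_labels, c, alpha=0.5):
    """F-measure of the decoded classifier Hc(Pos, Neg) over TS."""
    pos, neg = decode(chromosome, terms)
    tp = fp = fn = 0
    for doc_terms, label in zip(train_docs, train_labels):
        predicted = classify(set(doc_terms), pos, neg)
        if predicted and label == c:
            tp += 1
        elif predicted:
            fp += 1
        elif label == c:
            fn += 1
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    # F_alpha = P*R / (alpha*R + (1 - alpha)*P); 0 if either is undefined
    return p * r / (alpha * r + (1 - alpha) * p) if p and r else 0.0
```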
Olex-GA - Evolutionary Operators
• We perform (see the sketch below):
  • selection via the roulette-wheel method;
  • crossover via the uniform crossover scheme;
  • mutation, which consists in flipping each single bit with a given (low) probability;
  • elitism, to ensure that the best individuals of the current generation are passed unaltered to the next one.
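A minimal sketch of one generation combining these four operators; the elite count and mutation rate shown are illustrative assumptions, not the paper's settings.

```python
import random

def next_generation(pop, fit, n_elite=2, p_mut=0.01):
    """Produce the next population from `pop` given a fitness function."""
    scores = [fit(k) for k in pop]
    # Elitism: the best individuals pass to the next generation unaltered.
    ranked = sorted(zip(scores, pop), key=lambda x: x[0], reverse=True)
    new_pop = [k for _, k in ranked[:n_elite]]
    total = sum(scores)
    weights = [s / total for s in scores] if total > 0 else None  # roulette wheel
    while len(new_pop) < len(pop):
        a, b = random.choices(pop, weights=weights, k=2)  # selection
        child = [x if random.random() < 0.5 else y        # uniform crossover
                 for x, y in zip(a, b)]
        child = [bit ^ (random.random() < p_mut)          # bit-flip mutation
                 for bit in child]
        new_pop.append(child)
    return new_pop
```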
Olex-GA - Experimentation
• We experimentally evaluated our algorithm on two standard benchmark corpora:
• REUTERS-21578 (R10)
  • It consists of 12,902 documents, manually classified with respect to 135 categories.
  • We considered the subset of the 10 most populated categories.
• OHSUMED
  • We used the collection consisting of the first 20,000 of the 50,216 medical abstracts of the year 1991.
  • The classification scheme consisted of the 23 MeSH disease categories.
Experimental settings
• We applied the stratified holdout method:
• REUTERS: ModApté split - 9,603 documents form the training corpus (seen data) and 3,299 form the test set (unseen data).
• OHSUMED: the first 10,000 documents were used as seen data and the second 10,000 as unseen data.
• In both cases, we randomly split the seen data (as sketched below) into:
  • a training set (70%), on which to run the GA;
  • a validation set (30%), on which to tune the model parameters.
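A minimal sketch of the 70/30 split on toy data, assuming scikit-learn's train_test_split (the slides do not name the tooling actually used):

```python
from sklearn.model_selection import train_test_split

docs   = ["wheat price up", "stock market falls",
          "grain exports rise", "stocks rally"]
labels = ["wheat", "stock", "wheat", "stock"]

train_docs, val_docs, train_y, val_y = train_test_split(
    docs, labels,
    test_size=0.30, random_state=0,
    stratify=labels)  # preserve category proportions in both splits
```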
Experimental settings
• GA parameters: for each chromosome K in the population, we initialized the positive half K+ at random, while we set K-[t] = 0 for each t ∈ Neg* (thus, K initially encodes a classifier Hc(Pos, Neg) with no negative terms).
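A minimal sketch of this initialization; names are illustrative assumptions.

```python
import random

def init_chromosome(n_pos, n_neg):
    """K+ (over Pos*) at random; K-[t] = 0 for every t in Neg*."""
    k_pos = [random.randint(0, 1) for _ in range(n_pos)]
    k_neg = [0] * n_neg  # no negative terms in the initial classifier
    return k_pos + k_neg
```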
Comparative Evaluation
• On both corpora, we carried out a direct comparison with the following systems:
  • SVM (both polynomial and radial basis function kernels)
  • Ripper (with two optimization steps)
  • C4.5
  • Naive Bayes
  • Olex-Greedy
• Apart from Olex-Greedy, all systems were evaluated using their implementations in the Weka library of ML algorithms.
Performance Comparison on Reuters
• Efficacy: SVMpoly > SVMrbf > Ripper ≈ Olex-GA > C4.5 > Olex-Greedy > NB
• Efficiency: NB > Olex-Greedy > SVMpoly > Olex-GA > C4.5 > SVMrbf > Ripper
Performance Comparison on OHSUMED
• Efficacy: Olex-GA > Ripper > SVMpoly > Olex-Greedy > SVMrbf ≈ NB > C4.5
• Efficiency: NB > Olex-Greedy > SVMpoly > Olex-GA > C4.5 > SVMrbf > Ripper
Discussion - Relation to other inductive rule learners
• Conventional rule learners (Ripper, C4.5):
  • usually rely on a two-stage process: rule induction and rule pruning;
  • each of these stages in turn consists of several steps.
• Olex-GA relies on a single-step process which does not need any post-induction optimization.
• With respect to Olex-Greedy, Olex-GA provides better predictive accuracy, but is less efficient.
Conclusions
• Olex-GA encodes a classifier as an individual in a very natural and compact way.
• The fitness of an individual is evaluated as the F-measure of the encoded classifier.
• Experimental results show that:
  • Olex-GA quickly converges to very accurate classifiers;
  • Olex-GA performs at a competitive level with standard algorithms;
  • its time efficiency is lower than that of Olex-Greedy but higher than that of other rule learning methods, such as Ripper and C4.5.
Future work
• Extension of the proposed technique to deal with classifiers of the form Hc(Pos, Neg) where Pos = {T1, …, Tn} and each Ti is a conjunction of "simple" terms.
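A minimal illustration of how such an extended classifier might be evaluated; the representation (a list of term sets for the conjunctions) is our assumption about the intended form, not a published design.

```python
def classify_conj(doc_terms: set, pos_conjs, neg_terms: set) -> bool:
    """Assign the document iff some conjunction Ti is fully contained
    in it and no negative term occurs in it."""
    return (any(conj <= doc_terms for conj in pos_conjs)
            and not (doc_terms & neg_terms))

# Example: T1 = {wheat, export}, T2 = {grain}, Neg = {stock}
assert classify_conj({"wheat", "export", "eu"},
                     [{"wheat", "export"}, {"grain"}],
                     {"stock"}) is True
```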