Breeding Decision Trees Using Evolutionary Techniques Papagelis Athanasios - Kalles Dimitrios, Computer Technology Institute & AHEAD RM
Introduction • We use GAs to evolve simple and accurate binary decision trees • Simple genetic operators over tree structures • Experiments with UCI datasets • notably smaller trees • competitive accuracy • Experiments with synthetic datasets • superior accuracy
Current tree induction algorithms… • … use greedy heuristics • to guide the search during tree building • to prune the resulting trees • Fast implementations • Accurate results on widely used benchmark datasets (like the UCI datasets) • Optimal results? • No • Good for real-world problems? • There are not many real-world datasets available for research.
More on greedy heuristics • They can quickly guide us to desirable solutions • On the other hand, they can deviate substantially from the optimum • WHY? • They are very strict • which means they perform very well only on a limited part of the problem space
Why should GAs work? • GAs are not • hill climbers • blind on complex search spaces • exhaustive searchers • extremely expensive • They are… • beam searchers • balancing the time needed against the space searched • Applicable to a bigger problem space • Good results for many more problems • No need to tune or derive new algorithms
Another way to see it… • Biases • Preference bias • characteristics of the output • something we should choose • e.g. small trees • Procedural bias • how we will search • something we should not have to choose • Unfortunately we do have to: • greedy heuristics make strong hypotheses about the search space • GAs make weak hypotheses about the search space
The real-world question… • Are there datasets where hill-climbing techniques are really inadequate? • e.g. unnecessarily big, misleading output • Yes, there are… • conditionally dependent attributes • e.g. XOR • irrelevant attributes • Many existing solutions use GAs as a preprocessor to select adequate attributes • Direct genetic search can prove more efficient on those datasets
The proposed solution • Select the desired decision tree characteristics (e.g. small size) • Adopt a decision tree representation with appropriate genetic operators • Create an appropriate fitness function • Produce a representative initial population • Evolve for as long as you wish!
Initialization procedure • Population of minimal decision trees • Simple and fast • Choose a random attribute value as the test value • Choose two random classes as leaves • (example minimal tree: root test A=2, leaves Class=2 and Class=1)
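A minimal sketch of this initialization step, assuming a simple binary node representation; the Node class, function names, and dataset format below are illustrative, not the authors' actual code.

```python
import random

class Node:
    """Binary decision-tree node: either a test (attr, value) or a leaf (label)."""
    def __init__(self, attr=None, value=None, label=None, left=None, right=None):
        self.attr, self.value, self.label = attr, value, label
        self.left, self.right = left, right

def random_minimal_tree(attributes, values, classes, rng=random):
    """Build a one-test tree: a random attribute/value test with two random class leaves."""
    attr = rng.choice(attributes)
    return Node(attr=attr,
                value=rng.choice(values[attr]),
                left=Node(label=rng.choice(classes)),
                right=Node(label=rng.choice(classes)))

# Example: attribute A with values {1, 2, 3} and classes {1, 2}
tree = random_minimal_tree(["A"], {"A": [1, 2, 3]}, [1, 2])
```

The initial population is simply a list of such one-test trees, which keeps initialization cheap and leaves tree growth entirely to the genetic operators.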
Payoff function • Balance between accuracy and size • Set x depending on the desired output characteristics • Small trees? x near 1 • Emphasis on accuracy? x grows large
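The slide describes the payoff only qualitatively (the formula itself appears to have been a figure). One form consistent with this description, shown here as an assumption rather than the authors' exact definition, multiplies a correctness term by a size-penalty factor governed by x:

```latex
\mathrm{payoff}(T) \;=\; \mathrm{correct}(T)^{2}\cdot\frac{x}{\mathrm{size}(T)^{2}+x}
```

With x near 1 the penalty factor decays quickly as the tree grows, so small trees dominate; as x grows large the factor stays close to 1 for any realistic size and accuracy dominates, matching the two bullet points above.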
Advanced system characteristics • Scaled payoff function (Goldberg, 1989) • Alternative crossovers • evolution towards fit subtrees • accurate subtrees had less chance of being used for crossover or mutation • Limited Error Fitness (LEF) (Gathercole & Ross, 1997) • significant CPU-time savings with insignificant accuracy losses
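For the scaled payoff, a plausible reading is Goldberg-style linear fitness scaling, i.e. f' = a·f + b chosen so the population mean is preserved while the best individual is mapped to a fixed multiple of that mean. A sketch under that assumption (the function name and c_mult value are illustrative):

```python
def linear_scale(fitnesses, c_mult=2.0):
    """Goldberg-style linear scaling: keep the mean fitness unchanged,
    map the best fitness to c_mult * mean, and clamp results at zero."""
    f_avg = sum(fitnesses) / len(fitnesses)
    f_max = max(fitnesses)
    if f_max == f_avg:                     # flat population: nothing to scale
        return list(fitnesses)
    a = (c_mult - 1.0) * f_avg / (f_max - f_avg)
    b = f_avg * (1.0 - a)
    return [max(0.0, a * f + b) for f in fitnesses]
```

Scaling of this kind keeps selection pressure reasonable both early on (when a few lucky trees would otherwise take over) and late in the run (when raw payoffs become nearly equal).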
Second-layer GA • Tests the effectiveness of all those components • encodes the mutation/crossover rates, the different heuristics, and a number of other optimizing parameters • Most recurring results: • mutation rate 0.005 • crossover rate 0.93 • use of a crowding-avoidance technique • alternative crossover/mutation techniques did not produce better results than basic crossover/mutation
Search space / Induction costs • 10 leaves, 6 values, 2 classes • Search space > 50,173,704,142,848 (HUGE!) • Greedy feature selection • O(ak), a = attributes, k = instances (Quinlan, 1986) • O(a²k²) for one level of lookahead (Murthy and Salzberg, 1995) • O(aᵈkᵈ) for d-1 levels of lookahead • Proposed heuristic • O(gen·k²·a) • Extended heuristic • O(gen·k·a)
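One way to arrive at that figure (an assumption about how it was computed, though the numbers match exactly): count the binary tree shapes with 10 leaves via the Catalan number C_9, then the choice of one of 6 test values at each of the 9 internal nodes and one of 2 classes at each of the 10 leaves:

```latex
C_9 \cdot 6^{9} \cdot 2^{10}
  \;=\; 4862 \cdot 10{,}077{,}696 \cdot 1024
  \;=\; 50{,}173{,}704{,}142{,}848
```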
How does it work? An example (a) • An artificial dataset with eight rules (26 possible values, three classes) • The first two activation rules: • (15.0%) c1: A=(a or b or t) & B=(a or h or q or x) • (14.0%) c1: B=(f or l or s or w) & C=(c or e or f or k) • Huge search space!!!
Illustration of the greedy-heuristics problem • An example dataset (XOR over A1 & A2)
(Tree diagrams: an XOR tree testing A1 at the root and A2 below it, with leaves t, f, f, t; the C4.5 result tree first tests the irrelevant attribute A3 and only then applies the same A1/A2 tests.) Totally unacceptable!!!
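A small, self-contained illustration (not from the slides) of why a greedy one-step criterion is misled here: on XOR data neither A1 nor A2 alone reduces class entropy, while a noisy irrelevant attribute such as A3 can show a small spurious gain and be picked first.

```python
import math, random
from collections import Counter

def entropy(labels):
    counts = Counter(labels)
    total = len(labels)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def info_gain(rows, attr):
    labels = [r["class"] for r in rows]
    gain = entropy(labels)
    for value in set(r[attr] for r in rows):
        subset = [r["class"] for r in rows if r[attr] == value]
        gain -= len(subset) / len(rows) * entropy(subset)
    return gain

random.seed(0)
# Class is XOR of A1 and A2; A3 is irrelevant noise
rows = [{"A1": a1, "A2": a2, "A3": random.choice([True, False]),
         "class": a1 != a2}
        for a1 in (True, False) for a2 in (True, False) for _ in range(25)]

for attr in ("A1", "A2", "A3"):
    print(attr, round(info_gain(rows, attr), 4))
# A1 and A2 show zero gain; A3 may show a small spurious gain,
# so a greedy splitter can prefer the irrelevant attribute.
```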
C4.5 / OneR deficiencies • Similar preference biases • accurate, small decision trees • this is acceptable • Non-optimized procedural biases • emphasis on accuracy (C4.5) • tree size not optimized • emphasis on size (OneR) • trivial search policy • Pruning, as a greedy heuristic, has similar disadvantages
(Chart: average needed re-classifications.) Future work • Minimize evolution time • crossover/mutation operators change the tree from a node downwards • so we can re-classify only the instances that reach the changed node's subtree • but we need to maintain more node statistics
Future work (2) • Choose the output class by a majority vote over the produced tree forest (expert voting) • Pruning is a greedy heuristic • a GA-based pruning?