Simplifying Decision Trees Learned by Genetic Programming
University of Essex
Alma Lilia García Almanza, Edward P.K. Tsang
Outline
• Motivation
• Problem description
• Scenario Method
• Procedure
• Experiment design
• Experimental results
Objective
The objective of this research is to build a financial forecasting tool using Genetic Programming (GP). The research has two main goals: the first is to improve the accuracy of the predictions achieved by a GP forecaster; the second is to reduce the rate of false positives.
Problem description
The training data is composed of indicators derived from financial technical analysis. The first column of the table is the signal, which is generated by looking ahead over a horizon of n time steps for a possible increase or decrease in price of r%.
Classes: Buy and No Buy, or Sell and No Sell
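The signal generation described above can be sketched in code. This is a minimal illustration: the function name, the exact look-ahead convention, and the 1/0 encoding of Buy / No Buy are our assumptions, not the authors' implementation.

```python
def label_buy_signals(prices, n, r):
    """Label each time step 1 (Buy) if the price rises by at least a
    fraction r at any point within the next n steps, else 0 (No Buy).
    A sketch of the signal generation described in the slides; the
    authors' exact rule (e.g. Sell / No Sell labelling) may differ."""
    labels = []
    for i, p in enumerate(prices):
        horizon = prices[i + 1 : i + 1 + n]
        labels.append(1 if any(q >= p * (1 + r) for q in horizon) else 0)
    return labels

# Example: a 10% rise target within the next 2 steps
print(label_buy_signals([100, 101, 112, 100, 100], n=2, r=0.10))  # [1, 1, 0, 0, 0]
```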
Scenario Method description
We propose analysing the decision trees produced by GP in order to explore the predictive power of their rules. The Scenario Method is composed of the following steps:
• Class division
• Rule extraction
• Rule evaluation
• Rule selection
• Tree pruning
Class division
Discriminator grammar (conditional nodes):
S → <Root>
<Root> → “If-then-else”, <Conjunction> | <Condition>, “Class”, “No Class”
<Condition> → <Operation>, <Variable>, <Threshold> | <Variable>
<Conjunction> → “And” | “Or”, <Conjunction> | <Condition>, <Conjunction> | <Condition>
<Operation> → “<”, “>”
<Variable> → “V1” | “V2” | … | “Vn”
<Threshold> → Real number
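The discriminator grammar can be mirrored as a small set of Python types with an evaluator. This is a sketch under our own assumptions about the concrete representation (the dataclass names and dict-based data rows are ours); only the grammar itself comes from the slide.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Condition:          # <Condition> → <Operation>, <Variable>, <Threshold>
    op: str               # "<" or ">"
    var: str              # "V1", "V2", ...
    threshold: float

@dataclass
class Conjunction:        # <Conjunction> → "And"|"Or", <child>, <child>
    op: str               # "And" or "Or"
    left: Union["Conjunction", Condition]
    right: Union["Conjunction", Condition]

def evaluate(node, row):
    """Evaluate a grammar tree against one data row (a dict of variables);
    True means the 'Class' branch fires, False means 'No Class'."""
    if isinstance(node, Condition):
        value = row[node.var]
        return value > node.threshold if node.op == ">" else value < node.threshold
    results = evaluate(node.left, row), evaluate(node.right, row)
    return all(results) if node.op == "And" else any(results)

tree = Conjunction("And", Condition(">", "V1", 0.5), Condition("<", "V2", 0.3))
print(evaluate(tree, {"V1": 0.7, "V2": 0.1}))  # True
```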
Rule extraction
Trees generated by the discriminator grammar can be expressed as
T = { R1 ∪ R2 ∪ … ∪ Rk }
where Ri is a minimal set of conditions whose intersection satisfies T. Conditions are represented by conditional nodes. For the example tree, T = { R1 ∪ R2 ∪ R3 } and
map(T) = { {2, 7, 10}, {2, 14, 18}, {2, 14, 21} }
R1 = {2 ∩ 7 ∩ 10}, R2 = {2 ∩ 14 ∩ 18}, R3 = {2 ∩ 14 ∩ 21}
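The flattening of a tree into a union of rules can be sketched as follows, assuming a hypothetical tuple encoding ("And" | "Or", left, right) with leaf condition-node ids.

```python
def extract_rules(node):
    """Flatten a tree into the union of rules R1 ∪ R2 ∪ ..., each rule
    being the set of conditional-node ids whose conjunction must hold."""
    if isinstance(node, int):              # a single conditional node
        return [{node}]
    op, left, right = node
    lrules, rrules = extract_rules(left), extract_rules(right)
    if op == "Or":                         # union of alternatives
        return lrules + rrules
    # "And": every combination of a left rule with a right rule
    return [l | r for l in lrules for r in rrules]

# The tree from the slides: 2 And ((7 And 10) Or (14 And (18 Or 21)))
tree = ("And", 2, ("Or", ("And", 7, 10), ("And", 14, ("Or", 18, 21))))
print([sorted(r) for r in extract_rules(tree)])
# [[2, 7, 10], [2, 14, 18], [2, 14, 21]]
```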
Rule evaluation
Once the node map is created, each rule is evaluated and its confusion matrix Mk is calculated:
Mk = | TPk FPk |
     | FNk TNk |
where TPk = true positives, FPk = false positives, FNk = false negatives, TNk = true negatives.
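Filling Mk can be sketched as below; the callable-rule representation and the example data are our own illustration, not the authors' code.

```python
def confusion_matrix(rule, rows, labels):
    """Score one rule against the training data.
    rule: callable mapping a data row to True/False (rule fires or not);
    labels: 1 for the positive class, 0 otherwise.
    Returns (TP, FP, FN, TN)."""
    tp = fp = fn = tn = 0
    for row, y in zip(rows, labels):
        fired = rule(row)
        if fired and y == 1:
            tp += 1
        elif fired and y == 0:
            fp += 1
        elif not fired and y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

rows   = [{"V1": 0.9}, {"V1": 0.8}, {"V1": 0.2}, {"V1": 0.7}]
labels = [1, 0, 1, 1]
rule   = lambda r: r["V1"] > 0.5
print(confusion_matrix(rule, rows, labels))  # (2, 1, 1, 0)
```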
Rule selection
Best scenario of RB ∩ Rn, where RB is the best rule of the tree T, and where
PT = TPk + FNk, NT = FPk + TNk for any k.
Rule selection
Worst scenario of RB ∩ Rn, where RB is the best rule of the tree T, and where
PT = TPk + FNk, NT = FPk + TNk for any k.
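One plausible reading of the best/worst scenarios is interval arithmetic on the confusion matrices: since the coverage of a combination of rules can only grow, the TP and FP of the combined rule are bracketed without touching the training data. The bound below is our reconstruction, not necessarily the authors' exact formula.

```python
def scenario_bounds(mB, mk):
    """mB, mk: (TP, FP, FN, TN) for the best rule RB and a candidate Rk.
    Returns (worst, best) confusion matrices for their combination,
    using PT = TPk + FNk and NT = FPk + TNk as in the slides."""
    TPB, FPB, FNB, TNB = mB
    TPk, FPk, FNk, TNk = mk
    PT = TPk + FNk                 # total positives in the data
    NT = FPk + TNk                 # total negatives in the data
    # Best scenario: maximal overlap on false positives, none on true positives
    tp_hi, fp_lo = min(TPB + TPk, PT), max(FPB, FPk)
    best = (tp_hi, fp_lo, PT - tp_hi, NT - fp_lo)
    # Worst scenario: the reverse
    tp_lo, fp_hi = max(TPB, TPk), min(FPB + FPk, NT)
    worst = (tp_lo, fp_hi, PT - tp_lo, NT - fp_hi)
    return worst, best

print(scenario_bounds((30, 5, 20, 45), (10, 8, 40, 42)))
# ((30, 13, 20, 37), (40, 8, 10, 42))
```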
Rule selection
Mβk+ and Mβk− are the confusion matrices corresponding to the best and worst scenarios. Once the worst and best scenarios for the union with Rβk are calculated, we have the interval (Eval(Rβk−), Eval(Rβk+)).
• Why Eval() as opposed to using training data?
• Cost
• Overfitting
Rule selection
Let us define the probability that the new rule Rβk is better than Rβ:
P(Improve Eval) = P(Eval(Rβk) ≥ Eval(Rβ))
The probability that Rβk is worse than Rβ is:
P(Decrease Eval) = P(Eval(Rβk) ≤ Eval(Rβ))
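Assuming Eval(Rβk) is treated as uniformly distributed over the interval (Eval(Rβk−), Eval(Rβk+)) — an assumption on our part, not stated on the slide — the two probabilities can be written as:

```latex
P(\text{Improve Eval})
  = \frac{\mathrm{Eval}(R_{\beta k}^{+}) - \mathrm{Eval}(R_{\beta})}
         {\mathrm{Eval}(R_{\beta k}^{+}) - \mathrm{Eval}(R_{\beta k}^{-})},
\qquad
P(\text{Decrease Eval})
  = \frac{\mathrm{Eval}(R_{\beta}) - \mathrm{Eval}(R_{\beta k}^{-})}
         {\mathrm{Eval}(R_{\beta k}^{+}) - \mathrm{Eval}(R_{\beta k}^{-})}.
```

The two quantities sum to one, which is consistent with "improve" and "decrease" partitioning the outcomes at the boundary value Eval(Rβ).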
Tree pruning
The tree is composed of the following rules, where the numbers denote the tree's conditional nodes:
R1 = {2, 7, 10}, R2 = {2, 14, 18}, R3 = {2, 14, 21}
Tree structure (node numbers in parentheses):
If-then-else (0)
  And (1)
    > (2): Var2 (3), Var3 (4)
    Or (5)
      And (6)
        > (7): Var2 (8), .43 (9)
        > (10): Var3 (11), .67 (12)
      And (13)
        > (14): Var2 (15), Var5 (16)
        Or (17)
          > (18): Var7 (19), .68 (20)
          > (21): Var9 (22), Var3 (23)
After pruning: R1 = {2, 7, 10}, R2 = {2, 14, 21}
Procedure
• Evaluate every decision rule.
• R2 is not contributing to the classification task, so we analyse every condition in R2 to determine which conditions are involved in other rules.
• The only condition that is not involved in the other rules is 18.
• Remove that condition.
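The pruning procedure above can be sketched as set operations on the extracted rules; the function name and the list-of-sets representation are our assumptions.

```python
def prune_rule(rules, drop_index):
    """Drop a non-contributing rule. rules: list of sets of condition-node
    ids. Returns the surviving rules plus the set of condition nodes that
    are safe to delete from the tree (used by no surviving rule)."""
    doomed = rules[drop_index]
    survivors = [r for i, r in enumerate(rules) if i != drop_index]
    shared = set().union(*survivors)       # conditions other rules still need
    removable = doomed - shared
    return survivors, removable

rules = [{2, 7, 10}, {2, 14, 18}, {2, 14, 21}]
survivors, removable = prune_rule(rules, 1)   # R2 does not contribute
print([sorted(r) for r in survivors], sorted(removable))
# [[2, 7, 10], [2, 14, 21]] [18]
```

Conditions 2 and 14 survive because R1 and R3 still use them; only node 18 is unique to R2, matching the slide's example.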
Experiment design
To test our approach at different stages of the evolutionary process, populations from different points of the evolution were created. A population of 1,000 individuals was created and evolved for 100 generations. Every twenty generations the whole population was saved, yielding five populations of 1,000 individuals each; call them P20, P40, · · · , P100. The procedure was repeated 25 times. Finally, the experimental results were grouped and averaged by generation and pruning threshold.
Experiment design
Financial indicators that were used as variables in the training/testing data
Experiment design
GP parameters
Experiment results
SM was tested using different pruning thresholds (PT) over the populations at generations 20, 40, 60, 80 and 100.