Machine Learning

Machine Learning Decision Trees. Exercise Solutions

Exercise 1 a) Machine learning methods are often categorised into three main types: supervised, unsupervised and reinforcement learning. Explain each in no more than one sentence, and state which category Decision Tree Learning falls into and why.

Answer
• Supervised learning is learning with a teacher: input-output examples are given to the system in the training phase, and after training the system is asked to predict the output for new inputs. E.g. classification.
• Unsupervised learning is structure discovery with no teacher: only input data are seen in both the training and the testing phase. E.g. ICA, clustering.
• Reinforcement learning is learning with no teacher but with feedback from the environment; the feedback consists of rewards, which are typically delayed. E.g. Q-learning.
Decision Trees are supervised learning methods: they learn to classify from given labelled examples.

c) For the sunbathers example given in the lecture, calculate the Disorder function for the attribute ‘height’ at the root node.

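Disorder of height
[Tree diagram: root split on Height, target is_sunburned. Short: Alex, Annie, Katie. Tall: Dana, Pete. Average: Sarah, Emily, John.]

Disorder of height (contd)
[Branches carried into the calculation: Alex, Annie, Katie and Sarah, Emily, John.]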
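The lecture's definition of the Disorder function is not reproduced in these solutions. As a reference sketch, assuming it is the usual entropy-based average disorder, it can be written as follows, where n_b is the number of examples in branch b, n_t is the total number of examples over all branches, and n_bc is the number of examples in branch b that belong to class c (these symbol names are assumptions, not taken from the slides). Terms with n_bc = 0 are taken to be zero.

```latex
\[
\text{Average disorder}
  = \sum_{b} \frac{n_b}{n_t}
    \left( \sum_{c} -\frac{n_{bc}}{n_b} \log_2 \frac{n_{bc}}{n_b} \right)
\]
```

A branch whose examples all share one class contributes 0, and a two-class branch split evenly contributes its full weight n_b / n_t.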

Exercise 2
For the sunbathers example given in the lecture, calculate the Disorder function associated with the possible branches of the decision tree once the root node (hair colour) has been chosen.

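Answer: 1st branch (Hair colour = Blonde: Sarah, Annie, Dana, Katie; target is_sunburned)
Candidate attributes and their average disorder within this branch:
• Height: Short: Annie, Katie. Tall: Dana. Average: Sarah. Disorder = 0.5
• Weight: Light: Sarah, Katie. Average: Annie, Dana. Disorder = 1.0
• Lotion used: Yes: Dana, Katie. No: Sarah, Annie. Disorder = 0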

• In this branch (the 1st branch), “Lotion used” is therefore the next attribute to split on, since it has the lowest disorder.
• After that split both sub-branches are pure, so this branch is finished.
• The computation for the other two branches (red and brown) proceeds in exactly the same way.
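The three disorder values for the blonde branch can be re-derived with a short sketch. The attribute values below follow the groupings on the slide. The is_sunburned labels (Sarah and Annie sunburned, Dana and Katie not) and the helper names `entropy` and `average_disorder` are assumptions made for illustration, using the usual entropy-based disorder.

```python
from collections import Counter
from math import log2

# Blonde branch of the sunbathers example. Attribute values follow the slide
# groupings; the "sunburned" labels are an ASSUMPTION chosen to be consistent
# with the disorder values 0.5, 0 and 1.0 quoted on the slide.
blonde = [
    {"name": "Sarah", "height": "average", "weight": "light", "lotion": "no", "sunburned": True},
    {"name": "Annie", "height": "short", "weight": "average", "lotion": "no", "sunburned": True},
    {"name": "Dana", "height": "tall", "weight": "average", "lotion": "yes", "sunburned": False},
    {"name": "Katie", "height": "short", "weight": "light", "lotion": "yes", "sunburned": False},
]


def entropy(labels):
    """Disorder (in bits) of one list of class labels."""
    total = len(labels)
    return sum(-(c / total) * log2(c / total) for c in Counter(labels).values())


def average_disorder(rows, attribute, target="sunburned"):
    """Branch disorders weighted by the fraction of rows in each branch."""
    branches = {}
    for row in rows:
        branches.setdefault(row[attribute], []).append(row[target])
    return sum(len(b) / len(rows) * entropy(b) for b in branches.values())


scores = {a: average_disorder(blonde, a) for a in ("height", "weight", "lotion")}
best = min(scores, key=scores.get)  # attribute with the lowest average disorder
```

Under these assumptions `scores` holds 0.5 for height, 1.0 for weight and 0 for lotion, so `best` is `lotion`.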

Exercise 3
Using the decision tree learning algorithm, calculate the decision tree for the following data set.

Data for Exercise 3

Ex 3: Search for Root. Candidate: Hair Colour
[Tree diagram, target is_sunburned. Blonde: Sarah, Annie, Dana, Julie, Ruth. Brown: Alex, Pete, John.]
Av Disorder = (5/8) * 0.971 = 0.6069

Ex 3: Search for Root. Candidate: Height
[Tree diagram, target is_sunburned. Short: Alex, Annie. Average: Sarah, Julie, John, Ruth. Tall: Dana, Pete.]
Av Disorder = 1/4 + (1/2) * 0.8113 + 0 = 0.6556

Ex 3: Search for Root. Candidate: Weight
[Tree diagram, target is_sunburned. Light: Sarah, Julie, Ruth. Heavy: Pete, John. Average: Dana, Alex, Annie.]
Av Disorder = 2 * (3/8) * 0.9183 = 0.6887

Ex 3: Search for Root. Candidate: Lotion
[Tree diagram, target is_sunburned. No: Sarah, Annie, Julie, Pete, John, Ruth. Yes: Dana, Alex.]
Av Disorder = (3/4) * 0.9183 = 0.6887
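The four root candidates can be compared with a small script. The class counts per branch are not taken from the data table; they are assumptions inferred from the fractions in the formulas above (for example, 0.971 corresponds to a 2-versus-3 split of five examples). The function names are illustrative.

```python
from math import log2


def disorder(counts):
    """Entropy in bits of one branch, given its class counts."""
    total = sum(counts)
    return sum(-(c / total) * log2(c / total) for c in counts if c > 0)


def average_disorder(branches):
    """Weight each branch's disorder by its share of all examples."""
    n = sum(sum(counts) for counts in branches)
    return sum(sum(counts) / n * disorder(counts) for counts in branches)


# (sunburned, not sunburned) counts per branch. These are ASSUMPTIONS inferred
# from the fractions in the slide formulas, not the original data table.
candidates = {
    "hair colour": [(2, 3), (0, 3)],
    "height": [(1, 1), (1, 3), (0, 2)],
    "weight": [(1, 2), (0, 2), (1, 2)],
    "lotion": [(2, 4), (0, 2)],
}

scores = {name: average_disorder(b) for name, b in candidates.items()}
root = min(scores, key=scores.get)  # candidate with the lowest average disorder

for name, value in scores.items():
    print(f"{name}: {value:.4f}")
print("root:", root)
```

Hair colour has the lowest average disorder, so it is selected as the root. Computed without intermediate rounding, its value is 0.6068; the 0.6069 quoted above comes from rounding the branch entropy to 0.971 first.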

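Ex 3: Next
[Tree diagram, target is_sunburned. Root: Hair colour. Brown → No. Blonde (Sarah, Annie, Dana, Julie, Ruth) → next attribute still to be chosen (? ? ?).]
Candidate attributes for the Blonde branch:
• Lotion used: Yes: Dana. No: Sarah, Annie, Julie, Ruth.
• Height: Short: Annie. Av: Sarah, Julie, Ruth. Tall: Dana.
• Weight: Light: Sarah, Julie, Ruth. Av: Dana, Annie. Heavy: none.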

Ex 3: Next
[Tree diagram, target is_sunburned. Hair colour: Brown → No. Blonde → Height: Short (Annie) → Yes, Tall (Dana) → No, Av → Sarah, Julie, Ruth.]
No further split will improve the classification accuracy on the training data. We can assign a decision to this leaf node based on the majority. That gives a ‘No’.
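The majority rule for the remaining leaf can be sketched in a few lines. The labels below are an assumption: the slides imply a one-versus-two split among Sarah, Julie and Ruth, and Sarah is taken here to be the sunburned one. The helper name `majority_label` is illustrative.

```python
from collections import Counter

# Examples reaching the Blonde / Average-height leaf. Which of the three is
# sunburned is an ASSUMPTION (Sarah here); only the 1-vs-2 split matters.
leaf = {"Sarah": "Yes", "Julie": "No", "Ruth": "No"}


def majority_label(labels):
    """Return the most common class label among the examples at a leaf."""
    return Counter(labels).most_common(1)[0][0]


decision = majority_label(leaf.values())
# Number of training examples at this leaf that the majority label gets wrong.
training_errors = sum(label != decision for label in leaf.values())
```

With these labels `decision` is `"No"` and one training example at the leaf remains misclassified, which is the price of stopping at a leaf that is not pure.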
