
Machine Learning


Presentation Transcript


  1. Machine Learning Georg Pölzlbauer December 11, 2006

  2. Outline • Exercises • Data Preparation • Decision Trees • Model Selection • Random Forests • Support Vector Machines

  3. Exercises • Groups of 2 or 3 students • UCI ML Repository: pick 3 data sets with different characteristics (e.g. number of samples, number of dimensions, number of classes) • Estimate the classification error with 3 classifiers of your choice; compare the results • Estimate appropriate parameters for these classifiers • Implement in Matlab, R, WEKA, YALE, or KNIME

  4. Exercises: Software • Matlab • YALE http://rapid-i.com/ • WEKA http://www.cs.waikato.ac.nz/ml/weka/ • KNIME http://www.knime.org/ • R http://www.r-project.org/

  5. Exercises: Software • WEKA: recommended; easy to use, easy to learn, no programming • KNIME, YALE: also easy to use • R: most advanced and powerful software; do not use if you do not know R really well! • Matlab: not recommended; requires installation of packages from internet etc.

  6. Exercises: Written Report • Report should be 5-10 pages • Discuss characteristics of the data sets (e.g. handling of missing values, scaling, etc.) • Summarize the classifiers used (one paragraph each) • Discuss experimental results (tables, figures) • Do not include code in the report

  7. Exercises: How to proceed • It is not necessary to implement anything; rely on libraries, modules, etc. • UCI ML Repository: http://www.ics.uci.edu/~mlearn/MLSummary.html • Import the data file, scale the data, apply model selection, write down any problems/findings

  8. Grading • No written/oral exam • End of January: submission of report • Ca. 15 minutes discussion of results and code (individually for each group) • Grading bonus: use of sophisticated models, detailed comparison of classifiers, thorough discussion of experiments, justification of choices

  9. Questions? • Questions regarding theory: • poelzlbauer@ifs.tuwien.ac.at • musliu@dbai.tuwien.ac.at • Questions regarding R, Weka, …: • Forum

  10. Machine Learning: Setting

  11. Machine Learning: Setting [Figure: Train ML Model]

  12. Machine Learning: Setting [Figure: Train ML Model]

  13. Data Preparation • Example: adult census data • Table format data (data matrix) • Missing values • Categorical data • Quantitative (continuous) data with different scales

  14. Categorical variables • Non-numeric variables with a finite number of levels • E.g. "red", "blue", "green" • Some ML algorithms can only handle numeric variables • Solution: 1-to-N coding

  15. 1-to-N Coding
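The 1-to-N coding example on slide 15 is shown only as a figure. Here is a minimal sketch of the idea in Python; the variable "color" and its levels are made-up illustrations, not the lecture's data:

```python
# Minimal sketch of 1-to-N (one-hot) coding; the variable "color" and
# its three levels are hypothetical examples, not from the lecture data.
levels = ["red", "blue", "green"]            # the N levels of the variable
samples = ["red", "green", "green", "blue"]  # observed categorical values

def one_to_n(value, levels):
    """Map one categorical value to a vector of N 0/1 indicator variables."""
    return [1 if value == level else 0 for level in levels]

encoded = [one_to_n(s, levels) for s in samples]
print(encoded)  # [[1, 0, 0], [0, 0, 1], [0, 0, 1], [0, 1, 0]]
```

Each categorical variable with N levels becomes N numeric indicator variables, which numeric-only ML algorithms can then handle.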

  16. Scaling of continuous variables • Many ML algorithms rely on measuring the distance between 2 samples • There should be no difference if a length variable is measured in cm, inch, or km • To remove the unit of measure (e.g. kg, mph, …) each variable dimension is normalized: • subtract mean • divide by standard deviation
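A minimal sketch of this normalization (z-scoring) in Python with NumPy; the matrix `X` is a made-up example with samples in rows and variables in columns:

```python
import numpy as np

# Hypothetical data matrix: rows = samples, columns = variables measured
# on different scales (e.g. height in cm, weight in kg).
X = np.array([[180.0, 75.0],
              [165.0, 60.0],
              [172.0, 68.0]])

# Subtract the column mean, divide by the column standard deviation.
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_scaled.mean(axis=0))  # approximately 0 for every variable
print(X_scaled.std(axis=0))   # 1 for every variable
```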

  17. Scaling of continuous variables • Data set now has mean 0, variance 1 • Chebyshev's inequality: • at least 75% of the data between -2 and +2 • at least 89% of the data between -3 and +3 • at least 94% of the data between -4 and +4
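These percentages follow from Chebyshev's inequality applied to standardized data (mean 0, standard deviation 1); the bounds hold for any distribution:

```latex
\[
P\bigl(|X - \mu| \ge k\sigma\bigr) \le \frac{1}{k^2}
\quad\Longrightarrow\quad
P(-k < X < k) \ge 1 - \frac{1}{k^2}
\qquad (\mu = 0,\ \sigma = 1).
\]
% k = 2: at least 1 - 1/4  = 75%
% k = 3: at least 1 - 1/9  (approx. 89%)
% k = 4: at least 1 - 1/16 (approx. 94%)
```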

  18. Output variables • ML requires categorical output (continuous output = regression) • ML methods can be applied by binning continuous output (loss of prediction accuracy) • [Figure: household income from $10,000 to $200,000 binned into very low / low / average / high / very high]
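A minimal sketch of binning a continuous output into categories with NumPy; the bin edges and incomes are made-up numbers that only roughly match the figure on the slide:

```python
import numpy as np

# Hypothetical continuous output: household income in dollars.
income = np.array([12_000, 35_000, 60_000, 95_000, 180_000])

# Bin edges and labels are illustrative only (not taken from the slides).
edges = [20_000, 40_000, 80_000, 150_000]
labels = ["very low", "low", "average", "high", "very high"]

bins = np.digitize(income, edges)      # index of the bin each value falls into
categories = [labels[i] for i in bins]
print(categories)  # ['very low', 'low', 'average', 'high', 'very high']
```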

  19. Binary Decision Trees • Rely on Information Theory (Shannon) • Recursive algorithm that splits feature space into 2 areas at each recursion step • Classification works by going through the tree from the root node until arriving at a leaf node
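A minimal sketch (not the lecture's code) of the classification step: walk from the root node to a leaf node. The nested-dictionary tree, its features "age" and "income", and the thresholds are all hypothetical:

```python
# Hypothetical binary decision tree: internal nodes test one feature
# against a threshold, leaves store a class label.
tree = {
    "feature": "age", "threshold": 30.0,
    "left":  {"label": "class A"},                        # age < 30
    "right": {"feature": "income", "threshold": 50_000.0,
              "left":  {"label": "class A"},
              "right": {"label": "class B"}},
}

def classify(node, sample):
    """Walk from the root to a leaf and return the leaf's class label."""
    while "label" not in node:
        if sample[node["feature"]] < node["threshold"]:
            node = node["left"]
        else:
            node = node["right"]
    return node["label"]

print(classify(tree, {"age": 45, "income": 70_000}))  # 'class B'
```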

  20. Decision Trees: Example

  21. Information Theory, Entropy • Introduced by Claude Shannon • Applications in data compression • Concerned with measuring actual information vs. redundancy • Measures information in bits

  22. What is „Entropy“? • In Machine Learning, Entropy is a measure of the impurity of a set • High Entropy => bad for prediction • High Entropy => needs to be reduced (Information Gain)

  23. Calculating H(X)
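The formula on this slide is shown only as a figure; for a set X whose classes occur with relative frequencies p(x), Shannon's Entropy in bits is:

```latex
\[
H(X) = -\sum_{x} p(x)\,\log_2 p(x),
\qquad \text{with the convention } 0 \cdot \log_2 0 = 0 .
\]
```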

  24. H(X): Case studies
      Case   p(x_red)   p(x_blue)   H(X)
      I      0.5        0.5         1
      II     0.3        0.7         0.88
      III    0.7        0.3         0.88
      IV     0          1           0
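A minimal Python check that reproduces the H(X) values in the table above:

```python
from math import log2

def entropy(probs):
    """Shannon Entropy in bits of a list of relative frequencies."""
    return -sum(p * log2(p) for p in probs if p > 0)

for case, p_red, p_blue in [("I", 0.5, 0.5), ("II", 0.3, 0.7),
                            ("III", 0.7, 0.3), ("IV", 0.0, 1.0)]:
    print(case, round(entropy([p_red, p_blue]), 2))
# I 1.0   II 0.88   III 0.88   IV -0.0 (i.e. 0)
```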

  25. H(X): Relative vs. absolute frequencies • [Figure: set I vs. set II, same class proportions but different numbers of samples] => H(X_I) = H(X_II) • Only relative frequencies matter!

  26. Information Gain • Given a set and a choice between possible splits into subsets, which one is preferable? • Information Gain: choose the split whose subsets reduce the Entropy by the largest amount • [Figure: example set with H(X) = 1]
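The formula is not spelled out on the slides, but the standard definition matches the description above: splitting X into subsets X_1, ..., X_k gives

```latex
\[
IG(X; X_1, \dots, X_k)
  \;=\; H(X) \;-\; \sum_{i=1}^{k} \frac{|X_i|}{|X|}\, H(X_i),
\]
```

i.e. the Entropy of the original set minus the size-weighted average Entropy of the subsets; for a binary Decision Tree split, k = 2.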

  27. Information Gain (Properties) • IG is at most as large as the Entropy of the original set • IG is the amount by which the original Entropy is reduced by splitting into subsets • IG is at least zero (zero if the Entropy is not reduced at all) • 0 <= IG <= H(X)

  28. Building (binary) Decision Trees • Data set: categorical or quantitative variables • Iterate over the variables and calculate IG for every possible split (see the sketch below) • categorical variables: one level vs. the rest • quantitative variables: sort the values, try a split between each pair of adjacent values • recurse until the prediction is good enough
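As referenced above, a minimal sketch (not the lecture's implementation) of scoring every candidate split of one quantitative variable by Information Gain; the data at the bottom is made up:

```python
from math import log2
from collections import Counter

def entropy(labels):
    """Shannon Entropy in bits of a list of class labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def info_gain(labels, left, right):
    """Entropy reduction achieved by splitting labels into left/right."""
    n = len(labels)
    return entropy(labels) - (len(left) / n * entropy(left)
                              + len(right) / n * entropy(right))

def best_split(values, labels):
    """Try a split between each pair of adjacent sorted values and
    return (threshold, Information Gain) of the best one."""
    pairs = sorted(zip(values, labels))
    best = (None, 0.0)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between two equal values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        ig = info_gain([lab for _, lab in pairs], left, right)
        if ig > best[1]:
            best = (threshold, ig)
    return best

# Hypothetical data: one quantitative variable, two classes.
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
labels = ["A", "A", "A", "B", "B", "B"]
print(best_split(values, labels))  # (6.5, 1.0): a perfect split
```

A full tree builder would apply this to every variable, pick the overall best split, and recurse on the two resulting subsets until a stopping criterion is met.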

  29. Decision Trees: Quantitative variables • [Figure: sorted values of a quantitative variable with candidate split points between adjacent values, each annotated with its Information Gain; original H: 0.99]

  30. Decision Trees: Quantitative variables

  31. Decision Trees: Classification

  32. Decision Trees: Classification

  33. Decision Trees: Classification

  34. Decision Trees: More than 2 classes

  35. Decision Trees: Non-binary trees

  36. Decision Trees: Overfitting • Fully grown trees are usually too complex and overfit the training data

  37. Decision Trees: Stopping Criteria • Stop when absolute number of samples is low (below a threshold) • Stop when Entropy is already relatively low (below a threshold) • Stop if IG is low • Stop if decision could be random (Chi-Square test) • Threshold values are hyperparameters
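A minimal sketch of how such threshold-based stopping criteria can be bundled during tree growing; the function name, parameters, and default values are made up for illustration (the Chi-Square test is omitted):

```python
# Hypothetical stopping check applied at each node while growing a tree;
# the thresholds are the hyperparameters mentioned on the slide.
def should_stop(n_samples, node_entropy, best_ig,
                min_samples=10, max_entropy=0.1, min_ig=0.01):
    return (n_samples < min_samples        # too few samples in the node
            or node_entropy <= max_entropy  # node is already almost pure
            or best_ig < min_ig)            # no split reduces Entropy enough

print(should_stop(n_samples=200, node_entropy=0.95, best_ig=0.30))  # False: keep splitting
print(should_stop(n_samples=6,   node_entropy=0.65, best_ig=0.02))  # True: make a leaf
```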

  38. Decision Trees: Pruning • "Pruning" means removing nodes from a tree after training has finished • Stopping criteria are sometimes referred to as "pre-pruning" • Redundant nodes are removed, and sometimes the tree is restructured

  39. Example: Pruning

  40.–50. Decision Trees: Stability • [Sequence of eleven figure-only slides illustrating decision tree stability]
