CSC 4510 – Machine Learning

Lecture 3: Classification and Decision Trees. Dr. Mary-Angela Papalaskari, Department of Computing Sciences, Villanova University. Course website: www.csc.villanova.edu/~map/4510/

Presentation Transcript


  1. Lecture 3: Classification and Decision Trees CSC 4510 – Machine Learning Dr. Mary-Angela Papalaskari Department of Computing Sciences Villanova University Course website: www.csc.villanova.edu/~map/4510/ CSC 4510 - M.A. Papalaskari - Villanova University

  2. Last time: Machine learning overview • Supervised learning (classification, regression) • Unsupervised learning • Others: reinforcement learning, recommender systems • Also: practical advice for applying learning algorithms

  3. Supervised or Unsupervised learning? Iris Data

  4. Resources: Datasets • UCI Repository: http://www.ics.uci.edu/~mlearn/MLRepository.html • UCI KDD Archive: http://kdd.ics.uci.edu/summary.data.application.html • Statlib: http://lib.stat.cmu.edu/ • Delve: http://www.cs.utoronto.ca/~delve/

  5. Example: adult.data (dataset description from the UCI Repository)

  6. UCI Repository: adult.data

  7. Our Sample Data
     %,Class data,,,,,
     %,major 1=CS; 2=psych; 3=other,class (1=freshman; 2=sophomore;=graduate or other),birthday month (number),eyecolor (0=blue; =brown; 2=other),Do you prefer apples(1) or oranges (0)?,
     T: major,class,bmonth,eyecolor,aORo;
     A: 2,3,6,1,0;
     A: 1,2,3,1,1;
     A: 2,3,5,1,1;
     A: 3,4,7,1,1;
     A: 1,4,10,1,0;
     A: 3,4,6,1,0;
     A: 2,3,10,0,1;
     A: 1,4,7,1,0;
     A: 2,3,3,1,1;
     A: 3,3,7,1,1;
     A: 1,4,8,2,1;
     A: 1,4,4,1,0;
     A: 3,4,3,0,1;
     A: 3,4,2,2,1;
     A: 3,4,8,1,1;
     A: 1,4,2,2,0;
     A: 1,5,8,1,1;
     A: 1,5,4,0,1;
     A: 2,5,11,2,0;

  8. Classification (Categorization) • Given: • A description of an instance, x ∈ X, where X is the instance language or instance space. • A fixed set of categories: C = {c1, c2, …, cn} • Determine: • The category of x: c(x) ∈ C, where c(x) is a categorization function whose domain is X and whose range is C. • If c(x) is a binary function, i.e. C = {0, 1} (equivalently {true, false} or {positive, negative}), then it is called a concept.

  9. Tiny Example of Category Learning • Instance attributes: <size, color, shape> • size ∈ {small, medium, large} • color ∈ {red, blue, green} • shape ∈ {square, circle, triangle} • C = {positive, negative} • D: [table of training examples not included in the transcript]

  10. Hypothesis Selection • Many hypotheses are usually consistent with the training data: • red & circle • (small & circle) or (large & red) • (small & red & circle) or (large & red & circle) • not [ (red & triangle) or (blue & circle) ] • not [ (small & red & triangle) or (large & blue & circle) ] • Bias: any criterion other than consistency with the training data that is used to select a hypothesis.
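
To make "consistent with the training data" concrete, here is a minimal Python sketch. The training table D from the previous slide is not in the transcript, so the four examples below are an assumption chosen purely for illustration; the five hypotheses are the ones listed on the slide.

```python
# Hypothetical training set D (an assumption: the slide's table is missing).
D = [
    ({"size": "small", "color": "red", "shape": "circle"}, True),
    ({"size": "large", "color": "red", "shape": "circle"}, True),
    ({"size": "small", "color": "red", "shape": "triangle"}, False),
    ({"size": "large", "color": "blue", "shape": "circle"}, False),
]

# The five hypotheses from the slide, written as Boolean functions.
hypotheses = {
    "red & circle":
        lambda x: x["color"] == "red" and x["shape"] == "circle",
    "(small & circle) or (large & red)":
        lambda x: (x["size"] == "small" and x["shape"] == "circle")
        or (x["size"] == "large" and x["color"] == "red"),
    "(small & red & circle) or (large & red & circle)":
        lambda x: x["color"] == "red" and x["shape"] == "circle"
        and x["size"] in ("small", "large"),
    "not [(red & triangle) or (blue & circle)]":
        lambda x: not ((x["color"] == "red" and x["shape"] == "triangle")
                       or (x["color"] == "blue" and x["shape"] == "circle")),
    "not [(small & red & triangle) or (large & blue & circle)]":
        lambda x: not ((x["size"] == "small" and x["color"] == "red"
                        and x["shape"] == "triangle")
                       or (x["size"] == "large" and x["color"] == "blue"
                           and x["shape"] == "circle")),
}


def is_consistent(h, data):
    """A hypothesis is consistent if it agrees with every training label."""
    return all(h(x) == label for x, label in data)


consistent = [name for name, h in hypotheses.items() if is_consistent(h, D)]
print(len(consistent), "of", len(hypotheses), "hypotheses are consistent")

# The same hypotheses can still disagree on an instance outside D.
unseen = {"size": "medium", "color": "red", "shape": "circle"}
predictions = {name: h(unseen) for name, h in hypotheses.items()}
for name, answer in predictions.items():
    print(name, "->", answer)
```

With this assumed D, every hypothesis fits the training data, yet they split on the unseen medium red circle. Consistency alone cannot pick between them, which is why a bias is needed.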

  11. Generalization • Hypotheses must generalize to correctly classify instances not in the training data. • Simply memorizing training examples is a consistent hypothesis that does not generalize. • Occam’s razor: finding a simple hypothesis helps ensure generalization.

  12. Hypothesis Space • Restrict learned functions a priori to a given hypothesis space, H, of functions h(x) that can be considered as definitions of c(x). • For learning concepts on instances described by n discrete-valued features, consider the space of conjunctive hypotheses represented by a vector of n constraints <c1, c2, …, cn>, where each ci is one of: • ?, a wild card indicating no constraint on the ith feature • A specific value from the domain of the ith feature • Ø, indicating that no value is acceptable • Sample conjunctive hypotheses: • <big, red, ?> • <?, ?, ?> (most general hypothesis) • <Ø, Ø, Ø> (most specific hypothesis)
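
The constraint-vector representation can be sketched in a few lines of Python. The function name `matches` and the use of tuples are illustrative assumptions, not part of the original slides.

```python
WILDCARD = "?"  # no constraint on this feature
EMPTY = "Ø"     # no value is acceptable for this feature


def matches(hypothesis, instance):
    """Return True if the instance satisfies every constraint in the vector."""
    if len(hypothesis) != len(instance):
        raise ValueError("hypothesis and instance must have the same length")
    for constraint, value in zip(hypothesis, instance):
        if constraint == EMPTY:
            return False  # an empty constraint rejects every instance
        if constraint != WILDCARD and constraint != value:
            return False  # a specific value must match exactly
    return True


instance = ("big", "red", "circle")
print(matches(("big", "red", "?"), instance))  # True
print(matches(("?", "?", "?"), instance))      # True: most general hypothesis
print(matches(("Ø", "Ø", "Ø"), instance))      # False: most specific hypothesis
```

The most general hypothesis accepts every instance and the most specific one accepts none, so every other conjunctive hypothesis lies between the two.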

  13. Decision Tree Creation • Example: Do We Want to Wait in a Restaurant?

  14. Decision Tree Creation • One Possible Decision Tree:

  15. Creating Efficient Decision Trees

  16. Decision Tree Induction • Many trees fit the data; which should we prefer? • Occam’s Razor: The most likely explanation for a set of observations is the simplest explanation. • Assumption: “Smallest Tree” == “Simplest”

  17. Decision Tree Induction Issues • UNFORTUNATELY: • Finding the smallest tree is intractable! • (What does this mean?)

  18. Heuristics to the Rescue! • Algorithm:
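
The algorithm shown on this slide is not in the transcript. As a stand-in, the following Python sketch shows the usual greedy, top-down recursion used for decision tree induction; it is an assumption about the general shape of the method, not a copy of the slide. The names `learn_tree`, `classify`, and the pluggable `choose` function are hypothetical.

```python
from collections import Counter


def majority_label(examples):
    """Most common label among (features, label) pairs."""
    return Counter(label for _, label in examples).most_common(1)[0][0]


def learn_tree(examples, attributes, choose, default=None):
    """Greedy top-down induction.

    examples   : list of (feature_dict, label) pairs
    attributes : attribute names still available for testing
    choose     : function(attributes, examples) -> attribute to test next
    Returns a label (leaf) or a tuple (attribute, {value: subtree}).
    """
    if not examples:
        return default                   # no data: use the caller's fallback
    labels = {label for _, label in examples}
    if len(labels) == 1:
        return labels.pop()              # all examples agree: make a leaf
    if not attributes:
        return majority_label(examples)  # no tests left: majority vote
    best = choose(attributes, examples)
    remaining = [a for a in attributes if a != best]
    fallback = majority_label(examples)
    branches = {}
    for value in {x[best] for x, _ in examples}:
        subset = [(x, y) for x, y in examples if x[best] == value]
        branches[value] = learn_tree(subset, remaining, choose, fallback)
    return (best, branches)


def classify(tree, instance):
    """Follow the tests from the root until a leaf label is reached."""
    while isinstance(tree, tuple):
        attribute, branches = tree
        tree = branches.get(instance[attribute])
    return tree


# Toy data (hypothetical) and a deliberately naive chooser: first attribute.
examples = [
    ({"color": "red", "shape": "circle"}, "positive"),
    ({"color": "red", "shape": "triangle"}, "negative"),
    ({"color": "blue", "shape": "circle"}, "negative"),
]
tree = learn_tree(examples, ["color", "shape"], choose=lambda attrs, ex: attrs[0])
print(tree)
print(classify(tree, {"color": "red", "shape": "circle"}))  # positive
```

The heuristic lives entirely in `choose`: the recursion never backtracks, so the quality of the tree depends on how well each attribute is picked.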

  19. Informal Argument: Choosing Attributes • Some attributes just discriminate better than others.

  20. Choosing and Ordering Attribute-Tests • Information Theory • “How many bits is a question’s answer worth?” • Coin Toss: Fair vs. Rigged • Observation: 1 bit is enough to answer a yes/no question about which one has NO idea. • If the answers Vi have probabilities P(Vi), then we weight the number of bits for each answer by its probability to get the average number of bits required to represent any answer.
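
The probability-weighted average described above is the Shannon information measure, the sum over answers of -P(Vi) log2 P(Vi). A minimal Python sketch follows; the 99%/1% split for the rigged coin is an assumed example value.

```python
import math


def information(probabilities):
    """Average bits per answer: sum of -P(v) * log2 P(v), skipping zero terms."""
    return sum(-p * math.log2(p) for p in probabilities if p > 0)


fair = information([0.5, 0.5])       # no idea which side comes up
rigged = information([0.99, 0.01])   # assumed: heads 99% of the time
certain = information([1.0, 0.0])    # outcome already known

print(fair)     # 1.0
print(rigged)   # about 0.08
print(certain)  # zero bits
```

The fair coin needs a full bit, while the rigged coin's answer is worth far less because we can already guess it almost every time.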

  21. Choosing Attributes • Given p positive examples of a concept F(x) and n negative examples, what is I(“correctly identify instances of concept F”)?
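
The formula that answers this question is not in the transcript. Assuming the usual information measure (the probability-weighted number of bits from the previous slide), the two answers "positive" and "negative" have probabilities p/(p+n) and n/(p+n), which gives:

```latex
I\!\left(\frac{p}{p+n}, \frac{n}{p+n}\right)
  = -\frac{p}{p+n}\log_2\frac{p}{p+n}
    \;-\; \frac{n}{p+n}\log_2\frac{n}{p+n}
```

For example, with p = n the two probabilities are both 1/2 and the expression evaluates to exactly 1 bit; if n = 0 (or p = 0) it is 0 bits, taking 0 log2 0 = 0 by convention.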

  22. Choosing/Ordering Decision Tree Attributes • If we know the value of an attribute, how much information about the overall concept are we still missing?
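
The slide's formula is missing from the transcript. A standard way to write the answer, stated here as an assumption, is the expected information still needed after testing attribute A. Suppose A has v distinct values and splits the examples into subsets with p_i positive and n_i negative examples each; the name Remainder(A) is a common textbook label rather than one confirmed by the slides.

```latex
\mathrm{Remainder}(A)
  = \sum_{i=1}^{v} \frac{p_i + n_i}{p + n}\;
    I\!\left(\frac{p_i}{p_i + n_i}, \frac{n_i}{p_i + n_i}\right)
```

Each subset's information is weighted by the fraction of examples that fall into it. An attribute that splits the data into pure subsets has a remainder of 0; an attribute whose subsets all keep the original positive/negative ratio leaves the remainder equal to the information we started with.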

  23. Heuristically Choosing Attributes • When adding tests to a tree, always add next the attribute that gives the largest information gain. • What happens when a leaf node is ambiguous (has both + and - examples)? • When the decision path reaches such a node, randomly give a yes/no answer according to the yes/no probabilities at that node.
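
The gain formula itself is not in the transcript. The sketch below assumes the standard definition: gain is the information needed before the test minus the weighted information still needed after it. It also shows the probabilistic answer at an ambiguous leaf. All function names and the toy data are hypothetical.

```python
import math
import random
from collections import Counter


def information(counts):
    """Bits needed to describe a label drawn from the given class counts."""
    total = sum(counts)
    return sum(-c / total * math.log2(c / total) for c in counts if c > 0)


def gain(examples, attribute):
    """Information before the test minus weighted information after it."""
    before = information(list(Counter(y for _, y in examples).values()))
    remainder = 0.0
    for value in {x[attribute] for x, _ in examples}:
        subset = [y for x, y in examples if x[attribute] == value]
        weight = len(subset) / len(examples)
        remainder += weight * information(list(Counter(subset).values()))
    return before - remainder


def best_attribute(attributes, examples):
    """Greedy choice: the attribute with the largest information gain."""
    return max(attributes, key=lambda a: gain(examples, a))


def answer_at_leaf(labels, rng=random):
    """Ambiguous leaf: sample a label with probability equal to its share."""
    counts = Counter(labels)
    return rng.choices(list(counts), weights=list(counts.values()))[0]


# Toy data (hypothetical): color decides the label, size is irrelevant.
examples = [
    ({"color": "red", "size": "small"}, "yes"),
    ({"color": "red", "size": "large"}, "yes"),
    ({"color": "blue", "size": "small"}, "no"),
    ({"color": "blue", "size": "large"}, "no"),
]
print(gain(examples, "color"))                      # 1.0
print(gain(examples, "size"))                       # 0.0
print(best_attribute(["size", "color"], examples))  # color
print(answer_at_leaf(["yes", "yes", "no"]))         # "yes" about 2/3 of the time
```

Splitting on color removes all uncertainty (gain of 1 bit), while splitting on size leaves the yes/no ratio unchanged in each branch (gain of 0).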

  24. When to Use/Not Use Decision Trees • Expressiveness • Pro: any Boolean function can be represented • Con: many Boolean functions don’t have compact trees • Overfitting: finding meaningless regularities in data • Solution 1 (pruning): don’t use attributes whose gain G(A) is close to zero; use chi-squared tests for significance. • Solution 2 (cross-validation): prefer trees with higher predictive accuracy on set-aside data.
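
One common way to apply a chi-squared test to a candidate split, sketched here as an assumption rather than as the course's exact procedure, is to compare the observed positive/negative counts in each branch with the counts expected if the attribute were irrelevant. The function name and the example counts are hypothetical; 3.841 is the standard 5% critical value for one degree of freedom (an attribute with two values).

```python
def chi_squared_deviation(splits):
    """Deviation of a split from what an irrelevant attribute would produce.

    splits: list of (p_i, n_i) counts, one pair per attribute value.
    """
    p = sum(pi for pi, _ in splits)
    n = sum(ni for _, ni in splits)
    total = p + n
    if total == 0:
        return 0.0
    deviation = 0.0
    for pi, ni in splits:
        size = pi + ni
        expected_p = p * size / total  # positives expected if A is irrelevant
        expected_n = n * size / total  # negatives expected if A is irrelevant
        if expected_p > 0:
            deviation += (pi - expected_p) ** 2 / expected_p
        if expected_n > 0:
            deviation += (ni - expected_n) ** 2 / expected_n
    return deviation


CRITICAL_5PCT_1DOF = 3.841  # chi-squared critical value, 1 degree of freedom

irrelevant = chi_squared_deviation([(5, 5), (5, 5)])   # same ratio in each branch
decisive = chi_squared_deviation([(10, 0), (0, 10)])   # branches are pure

print(irrelevant, irrelevant > CRITICAL_5PCT_1DOF)  # 0.0 False -> prune
print(decisive, decisive > CRITICAL_5PCT_1DOF)      # 20.0 True -> keep the test
```

A split whose deviation stays below the critical value is statistically indistinguishable from noise, so the test can be dropped and the node turned into a leaf.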

  25. Our Sample Data • Let’s revisit the sample data from our class in AIspace: • Download and save the file with student data • From the main tools page in AIspace.org select “Decision Trees” • Launch the decision trees tool using Java Web Start (use the first link on that page) • Load the example and use the “Step” button to build the tree. • Observe the choice of nodes split by the decision tree algorithm

  26. Class Exercise • Practice using decision tree learning on some of the sample datasets available in AIspace • Some of the slides in this presentation are adapted from: • Prof. Frank Klassner’s ML class at Villanova • the University of Manchester ML course http://www.cs.manchester.ac.uk/ugt/COMP24111/ • The Stanford online ML course http://www.ml-class.org/