Machine Learning

Presentation Transcript


  1. Machine Learning CPS4801

  2. Research Day • Keynote Speaker • Tuesday 9:30-11:00 STEM Lecture Hall (2nd floor) • Meet-and-Greet 11:30 STEM 512 • Faculty Presentation • Tuesday 11:00-3:00 STEM • Prof. Liou 2:00 Room 415 • Student Poster • Wednesday 10:00-3:00 • Computer Science 10:00-12:00 STEM Atrium • Schedule: http://orsp.kean.edu/ResearchDays_Schedule.html

  3. Outline • Introduction • Decision tree learning • Clustering • Artificial Neural Networks • Genetic algorithms

  4. Learning from Examples • An agent is learning if it improves its performance on future tasks after making observations about the world. • One class of learning problem: • from a collection of input-output pairs, learn a function that predicts the output for new inputs.

  5. Why learning? • The designer cannot anticipate all possible situations • A robot designed to navigate mazes must learn the layout of each new maze. • The designer cannot anticipate all changes • A program designed to predict tomorrow’s stock market prices must learn to adapt when conditions change. • Programmers sometimes have no idea how to program a solution • recognizing faces

  6. Types of Learning • Supervised learning • the agent observes example input-output pairs and learns a function that maps inputs to outputs • Unsupervised learning • correct answers are not given • clustering: a taxi agent must develop a concept of “good traffic days” and “bad traffic days” • Reinforcement learning • the agent learns from rewards or punishments • taxi agent: the lack of a tip • chess game: two points for a win

  7. Supervised Learning • Learning a function/rule from specific input-output pairs is also called inductive learning. • Given a training set of N example pairs: • (x1,y1), (x2,y2), ..., (xN, yN) • each generated by an unknown target function y = f(x) • Problem: find a hypothesis h such that h ≈ f • h generalizes well if it correctly predicts the value of y for novel examples (the test set).

  8. Supervised Learning • When the output y is one of a finite set of values (sunny, cloudy, rainy), the learning problem is called classification. • Boolean or binary classification: only two possible values • When y is a number (tomorrow’s temperature), the problem is called regression.

  9. Inductive learning method • The points are in the (x,y) plane, where y = f(x). • We approximate f with h selected from a hypothesis space H. • Construct/adjust h to agree with f on training set

  10. Inductive learning method • Construct/adjust h to agree with f on training set • E.g., curve fitting:

  11. Inductive learning method • Construct/adjust h to agree with f on training set • E.g., curve fitting:

  12. Inductive learning method • Construct/adjust h to agree with f on training set • (h is consistent if it agrees with f on all examples) • E.g., curve fitting:

  13. Inductive learning method • Construct/adjust h to agree with f on training set • (h is consistent if it agrees with f on all examples) • E.g., curve fitting: • How to choose from among multiple consistent hypotheses?

  14. Inductive learning method • Ockham’s razor: prefer the simplest hypothesis consistent with data (14th-century English philosopher William of Ockham) • There is a tradeoff between complex hypotheses that fit the training data well and simpler hypotheses that may generalize better.
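
As an illustration of the curve-fitting idea on slides 9-14, here is a minimal Python sketch (not from the course): it fits a simple and a complex polynomial hypothesis to a small synthetic data set and compares training and test error. The data, polynomial degrees, and error metric are all assumptions chosen for illustration.

```python
# Curve-fitting sketch: a simple vs. a complex hypothesis for y = f(x).
# The data set is synthetic, not the example plotted on the slides.
import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 8)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.1, size=x_train.shape)

# Two candidate hypotheses from a polynomial hypothesis space H:
# a degree-1 fit, and a degree-7 fit that passes through every training point
# (i.e., it is consistent with the training set).
h_simple = np.polynomial.Polynomial.fit(x_train, y_train, deg=1)
h_complex = np.polynomial.Polynomial.fit(x_train, y_train, deg=7)

# Novel examples drawn from the true function f.
x_test = np.linspace(0, 1, 100)
y_test = np.sin(2 * np.pi * x_test)

def mse(h, x, y):
    """Mean squared error of hypothesis h on examples (x, y)."""
    return float(np.mean((h(x) - y) ** 2))

print("training error:", mse(h_simple, x_train, y_train), mse(h_complex, x_train, y_train))
print("test error:    ", mse(h_simple, x_test, y_test), mse(h_complex, x_test, y_test))
```

The complex hypothesis fits the training data almost perfectly but tends to generalize worse, which is the tradeoff this slide describes.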

  15. Cross-Validation • Labeled data: 1,566 examples • Split into 10 folds • Train the model on 9 folds (approx. 1,409 examples) • Evaluate on the held-out fold (approx. 157 examples) • Lather, rinse, repeat (10 times) • Report the average
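
A minimal sketch of the 10-fold recipe above, assuming NumPy arrays and hypothetical `train` and `evaluate` callables that stand in for whatever learner and metric are actually used.

```python
# 10-fold cross-validation sketch matching the slide's recipe.
# X, y: NumPy arrays of inputs and labels; `train` and `evaluate` are
# hypothetical placeholders for the learner and the scoring function.
import numpy as np

def cross_validate(X, y, train, evaluate, k=10, seed=0):
    indices = np.random.default_rng(seed).permutation(len(X))
    folds = np.array_split(indices, k)                # e.g. 1,566 -> folds of ~157
    scores = []
    for i in range(k):                                # lather, rinse, repeat (k times)
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        model = train(X[train_idx], y[train_idx])     # fit on 9 folds (~1,409 examples)
        scores.append(evaluate(model, X[test_idx], y[test_idx]))  # score on the held-out fold
    return float(np.mean(scores))                     # report the average
```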

  16. Learning decision trees • One of the simplest and yet most successful forms of machine learning. • A decision tree represents a function that takes as input a vector of attribute values and returns a “decision” – a single output. • discrete input, Boolean classification

  17. Learning decision trees Problem: decide whether to wait for a table at a restaurant, based on the following attributes: • Alternate: is there an alternative restaurant nearby? • Bar: is there a comfortable bar area to wait in? • Fri/Sat: is today Friday or Saturday? • Hungry: are we hungry? • Patrons: number of people in the restaurant (None, Some, Full) • Price: price range ($, $$, $$$) • Raining: is it raining outside? • Reservation: have we made a reservation? • Type: kind of restaurant (French, Italian, Thai, Burger) • WaitEstimate: estimated waiting time (0-10, 10-30, 30-60, >60)

  18. Decision trees • One possible representation for hypotheses (no Price and Type) • “true” tree for deciding whether to wait:

  19. Expressiveness • Decision trees can express any function of the input attributes. • E.g., for Boolean functions, truth table row → path to leaf: • Goal <==> (Path1 v Path2 v Path3 v ...) • Trivially, there is a consistent decision tree for any training set with one path to leaf for each example. • Prefer to find more compact decision trees
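
To make the truth-table-row-to-path idea concrete, here is a small sketch (my own illustration, not an example from the slides) expressing XOR as nested attribute tests, with one leaf per truth-table row.

```python
# A decision tree for a Boolean function is just nested attribute tests,
# one leaf per truth-table row.  XOR is used only as an illustration.
def xor_tree(a: bool, b: bool) -> bool:
    if a:                   # test attribute A first
        if b:
            return False    # path A=T, B=T
        return True         # path A=T, B=F
    if b:
        return True         # path A=F, B=T
    return False            # path A=F, B=F

# Goal <=> (Path "A=T,B=F") v (Path "A=F,B=T"): the tree returns True exactly
# on the paths that end in a positive leaf.
assert all(xor_tree(a, b) == (a != b) for a in (True, False) for b in (True, False))
```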

  20. Decision trees • One possible representation for hypotheses (no Price and Type) • “true” tree for deciding whether to wait:

  21. Constructing the Decision Tree • Goal: find the smallest decision tree consistent with the examples • Divide-and-conquer: test the most important attribute first; this divides the problem into smaller subproblems that can be solved recursively • “Most important”: the attribute that best splits the examples • Form a tree with root = best attribute • For each value vi (or range) of the best attribute: • select those examples with best = vi • construct subtreei by recursively calling decision tree learning on that subset of examples, with all attributes except best • add a branch to the tree with label = vi and subtree = subtreei
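
A rough Python sketch of this divide-and-conquer recursion, under the assumption that examples are (attribute-dict, label) pairs and that a `choose_best_attribute` function (e.g., based on information gain, slide 27) is supplied. It follows the steps listed above rather than any specific course implementation.

```python
# Recursive divide-and-conquer tree construction following the slide's steps.
# `choose_best_attribute` is assumed to be provided (e.g., information gain);
# the majority-vote fallback is a common convention, not quoted from the course.
from collections import Counter

def plurality_value(examples):
    """Most common classification among the examples."""
    return Counter(y for _, y in examples).most_common(1)[0][0]

def learn_tree(examples, attributes, choose_best_attribute):
    # examples: list of (attribute_dict, label) pairs; attributes: set of names
    labels = {y for _, y in examples}
    if len(labels) == 1:                      # all examples agree: return a leaf
        return labels.pop()
    if not attributes:                        # no attributes left: majority leaf
        return plurality_value(examples)

    best = choose_best_attribute(examples, attributes)   # most important attribute
    tree = {best: {}}                                    # root = best attribute
    for value in {x[best] for x, _ in examples}:         # for each value v_i of best
        subset = [(x, y) for x, y in examples if x[best] == value]
        subtree = learn_tree(subset, attributes - {best}, choose_best_attribute)
        tree[best][value] = subtree                      # branch label = v_i
    return tree
```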

  22. Decision tree learning • Aim: find a small tree consistent with the training examples • Idea: (recursively) choose "most significant" attribute as root of (sub)tree

  23. Choosing an attribute • Idea: a good attribute splits the examples into subsets that are (ideally) "all positive" or "all negative" • Which is a better choice?

  24. Attribute-based representations • Examples described by attribute values • A training set of 12 examples • E.g., situations where I will/won't wait for a table: • Classification of examples is positive (T) or negative (F)
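
Since the 12-example table itself is a figure not reproduced in the transcript, here is a sketch of how one attribute-based example could be represented for the tree-learning code above. The specific attribute values shown are hypothetical placeholders, not a row from the slide's table.

```python
# One attribute-based example as an (attributes, classification) pair.
# The values are hypothetical, not copied from the slide's 12-example table.
example = (
    {
        "Alternate": "Yes", "Bar": "No", "Fri/Sat": "No", "Hungry": "Yes",
        "Patrons": "Some", "Price": "$$", "Raining": "No",
        "Reservation": "Yes", "Type": "Thai", "WaitEstimate": "0-10",
    },
    "T",   # classification: positive (T) means "will wait"
)

# A training set is then just a list of such pairs:
training_set = [example]  # ... plus the remaining 11 examples
```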

  25. Choosing the Best Attribute: Binary Classification • Want a formal measure that returns its maximum value when an attribute makes a perfect split and its minimum when it makes no distinction • Information theory (Shannon and Weaver, 1949) • Entropy: a measure of the uncertainty of a random variable • A coin that always comes up heads → 0 bits • A flip of a fair coin (heads or tails) → 1 bit • The roll of a fair four-sided die → 2 bits • Information gain: the expected reduction in entropy caused by partitioning the examples according to this attribute

  26. Formula for Entropy • H(p₁, ..., pₙ) = −Σᵢ pᵢ log₂ pᵢ (in bits) • Example: a collection of 10 examples, 5 positive and 5 negative: H(1/2, 1/2) = −(1/2) log₂(1/2) − (1/2) log₂(1/2) = 1 bit • Example: a collection of 100 examples, 1 positive and 99 negative: H(1/100, 99/100) = −0.01 log₂(0.01) − 0.99 log₂(0.99) ≈ 0.08 bits
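
A small Python helper, following the definition above, that reproduces the entropy values quoted on slides 25-26.

```python
# Entropy of a discrete distribution, in bits.
import math

def entropy(probabilities):
    """H(p1, ..., pn) = -sum_i p_i * log2(p_i); terms with p_i = 0 contribute 0."""
    return sum(-p * math.log2(p) for p in probabilities if p > 0)

print(entropy([1.0]))          # coin that always comes up heads -> 0.0 bits
print(entropy([0.5, 0.5]))     # fair coin                       -> 1.0 bit
print(entropy([0.25] * 4))     # fair four-sided die             -> 2.0 bits
print(entropy([0.01, 0.99]))   # 1 positive, 99 negative         -> ~0.08 bits
```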

  27. Information gain • Information gain (from an attribute test) = the difference between the original information requirement and the new requirement • Information Gain (IG), the reduction in entropy from the attribute test: Gain(A) = I(p/(p+n), n/(p+n)) − Σₖ (pₖ+nₖ)/(p+n) · I(pₖ/(pₖ+nₖ), nₖ/(pₖ+nₖ)) • Choose the attribute with the largest IG

  28. Information gain For the training set, p = n = 6, I(6/12, 6/12) = 1 bit Consider the attributes Patrons and Type (and others too): Patrons has the highest IG of all attributes and so is chosen by the DTL algorithm as the root
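
A sketch of the gain computation behind this slide. The per-value positive/negative counts below are those of the standard 12-example restaurant data set from the textbook; if the course's table differs, the counts (and resulting numbers) would change.

```python
# Information gain for the 12-example restaurant set (p = n = 6, so the
# prior entropy is I(6/12, 6/12) = 1 bit).  The per-value (positive, negative)
# counts are the standard textbook ones, assumed here for illustration.
import math

def entropy2(p, n):
    """Entropy of a Boolean classification with p positives and n negatives."""
    if p == 0 or n == 0:
        return 0.0
    q = p / (p + n)
    return -(q * math.log2(q) + (1 - q) * math.log2(1 - q))

def information_gain(splits, total_p=6, total_n=6):
    """Gain = prior entropy minus the expected entropy after the split."""
    total = total_p + total_n
    remainder = sum((p + n) / total * entropy2(p, n) for p, n in splits)
    return entropy2(total_p, total_n) - remainder

patrons = [(0, 2), (4, 0), (2, 4)]            # None, Some, Full
types = [(1, 1), (1, 1), (2, 2), (2, 2)]      # French, Italian, Thai, Burger

print(information_gain(patrons))   # ~0.541 bits
print(information_gain(types))     #  0.0  bits -> Patrons is the better root
```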

  29. Example contd. • Decision tree learned from the 12 examples: • Substantially simpler than the “true” tree
