Machine Learning: Making Computer Science Scientific Thomas G. Dietterich Department of Computer Science Oregon State University Corvallis, Oregon 97331 http://www.cs.orst.edu/~tgd
Acknowledgements • VLSI Wafer Testing: Tony Fountain • Robot Navigation: Didac Busquets, Carles Sierra, Ramon Lopez de Mantaras • NSF grants IIS-0083292 and ITR-085836
Outline • Three scenarios where standard software engineering methods fail • Machine learning methods applied to these scenarios • Fundamental questions in machine learning • Statistical thinking in computer science
Scenario 1: Reading Checks • Find and read the “courtesy amount” (the amount written in numerals) on checks
Possible Methods: • Method 1: Interview humans to find out what steps they follow in reading checks • Method 2: Collect examples of checks and the correct amounts. Train a machine learning system to recognize the amounts
Scenario 2: VLSI Wafer Testing • Wafer test: Functional test of each die (chip) while on the wafer
Which chips (and how many) should be tested? • Tradeoff: • Test all chips on the wafer: avoid the cost of packaging bad chips, but incur the cost of testing every chip • Test none of the chips on the wafer: no on-wafer testing cost, but some bad chips may be packaged
Possible Methods • Method 1: Guess the right tradeoff point • Method 2: Learn a probabilistic model that captures the probability that each chip will be bad • Plug this model into a Bayesian decision making procedure to optimize expected profit
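A minimal sketch of the decision step in Method 2, assuming a perfect wafer test and made-up per-chip figures. The function names and the values of `revenue`, `package_cost`, and `test_cost` are hypothetical and do not come from the talk; the point is only that, once a probability of failure is available, the tradeoff can be computed instead of guessed.

```python
def profit_package_untested(p_bad, revenue, package_cost):
    # Packaging is always paid for; revenue arrives only if the chip is good.
    return (1.0 - p_bad) * revenue - package_cost


def profit_test_first(p_bad, revenue, package_cost, test_cost):
    # The test is always paid for; only chips that pass are packaged and sold.
    # Assumption: the wafer test never misclassifies a chip.
    return (1.0 - p_bad) * (revenue - package_cost) - test_cost


def best_action(p_bad, revenue=10.0, package_cost=2.0, test_cost=0.5):
    # Hypothetical costs; pick the action with the larger expected profit.
    options = {
        "package_untested": profit_package_untested(p_bad, revenue, package_cost),
        "test_first": profit_test_first(p_bad, revenue, package_cost, test_cost),
    }
    return max(options, key=options.get)


print(best_action(0.10))  # low failure probability
print(best_action(0.42))  # high failure probability
```

Under these assumptions testing pays off exactly when `p_bad * package_cost > test_cost`, so the break-even point moves with both the learned probability and the cost ratio.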
Scenario 3: Allocating mobile robot camera • Binocular camera • No GPS
Camera tradeoff • Mobile robot uses camera both for obstacle avoidance and landmark-based navigation • Tradeoff: • If camera is used only for navigation, robot collides with objects • If camera is used only for obstacle avoidance, robot gets lost
Possible Methods • Method 1: Manually write a program to allocate the camera • Method 2: Experimentally learn a policy for switching between obstacle avoidance and landmark tracking
Challenges for SE Methodology • Standard SE methods fail when… • System requirements are hard to collect • The system must resolve difficult tradeoffs
(1) System requirements are hard to collect • There are no human experts (e.g., cellular telephone fraud) • Human experts are inarticulate (e.g., handwriting recognition) • The requirements are changing rapidly (e.g., computer intrusion detection) • Each user has different requirements (e.g., e-mail filtering)
(2) The system must resolve difficult tradeoffs • VLSI wafer testing: tradeoff point depends on the probability of bad chips and the relative costs of testing versus packaging • Camera allocation for mobile robot: tradeoff depends on the probability of obstacles and the number and quality of landmarks
Machine Learning: Replacing guesswork with data • In all of these cases, the standard SE methodology requires engineers to make guesses • Guessing how to do character recognition • Guessing the tradeoff point for wafer test • Guessing the tradeoff for camera allocation • Machine Learning provides a way of making these decisions based on data
Outline • Three scenarios where software engineering methods fail • Machine learning methods applied to these scenarios • Fundamental questions in machine learning • Statistical thinking in computer science
Basic Machine Learning Methods • Supervised Learning • Density Estimation • Reinforcement Learning
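Supervised Learning [Figure: training examples (handwritten digits labeled 1, 0, 6, 3, 8) are fed to a learning algorithm, which produces a classifier; the classifier labels a new example as 8]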
AT&T/NCR Check Reading System • Recognition transformer is a neural network trained on 500,000 examples of characters • The entire system is trained given entire checks as input and dollar amounts as output • LeCun, Bottou, Bengio & Haffner (1998), “Gradient-Based Learning Applied to Document Recognition”
Check Reader Performance • 82% of machine-printed checks correctly recognized • 1% of checks incorrectly recognized • 17% “rejected” – check is presented to a person for manual reading • Fielded by NCR in June 1996; reads millions of checks per month
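The three outcomes above (correct, incorrect, rejected) correspond to a classifier with a reject option. The sketch below illustrates that idea only; the confidence scores, the threshold, and the four sample checks are invented for illustration and are not data from the fielded system.

```python
def triage(predictions, threshold):
    """Return (correct, wrong, rejected) fractions.

    predictions: list of (predicted_amount, confidence, true_amount).
    A check whose confidence falls below the threshold is rejected,
    i.e. handed to a person instead of being read automatically.
    """
    correct = wrong = rejected = 0
    for predicted, confidence, truth in predictions:
        if confidence < threshold:
            rejected += 1
        elif predicted == truth:
            correct += 1
        else:
            wrong += 1
    n = len(predictions)
    return correct / n, wrong / n, rejected / n


# Hypothetical sample: two confident correct reads, one confident misread,
# one low-confidence read that is sent for manual reading.
sample = [
    ("125.00", 0.97, "125.00"),
    ("40.10", 0.91, "40.10"),
    ("7236.07", 0.85, "1236.07"),
    ("18.00", 0.42, "78.00"),
]
print(triage(sample, threshold=0.8))
```

Raising the threshold trades automatic reads for manual ones: fewer errors get through, but more checks are rejected.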
Supervised Learning Summary • Desired classifier is a function y = f(x) • Training examples are desired input-output pairs (x_i, y_i)
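To make the summary concrete, here is a minimal sketch that builds a function f from (x_i, y_i) pairs. It uses a 1-nearest-neighbor rule purely because it fits in a few lines; it is not the neural network used in the check reader, and the two-dimensional feature vectors are made up.

```python
def train_nearest_neighbor(examples):
    """Build a classifier f from (x, y) training pairs."""
    stored = list(examples)

    def f(x):
        # Predict the label of the stored example closest to x
        # (squared Euclidean distance).
        def squared_distance(pair):
            xi, _ = pair
            return sum((a - b) ** 2 for a, b in zip(xi, x))

        _, label = min(stored, key=squared_distance)
        return label

    return f


# Hypothetical 2-D features standing in for digit images.
training = [
    ((0.0, 0.0), "0"),
    ((0.1, 0.2), "0"),
    ((1.0, 1.0), "8"),
    ((0.9, 1.1), "8"),
]
f = train_nearest_neighbor(training)
print(f((0.95, 0.9)))
```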
Density Estimation [Figure: training examples are fed to a learning algorithm, which produces a density estimator; given a partially-tested wafer, the estimator outputs P(chip_i is bad) = 0.42]
On-Wafer Testing System [Figure: mixture variable W connected to chip variables C1, C2, C3, …, C209] • Trained density estimator on 600 wafers from a mature product (HP; Corvallis, OR) • Probability model is a “naïve Bayes” mixture model with four components (trained with EM)
One-Step Value of Information • Choose the larger of: • Expected profit if we predict the remaining chips, package, and re-test • Expected profit if we test chip C_i, then predict the remaining chips, package, and re-test (for all C_i not yet tested)
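The comparison above can be sketched on a toy wafer with two chips. Everything specific here is an assumption for illustration: the joint distribution is an explicit table rather than the learned model, the test is assumed perfect, and `REVENUE`, `PACKAGE_COST`, and `TEST_COST` are invented.

```python
REVENUE, PACKAGE_COST, TEST_COST = 10.0, 4.0, 0.5  # hypothetical figures


def conditional(joint, observed):
    """Renormalise the joint over outcomes that agree with the observed chips."""
    kept = {o: p for o, p in joint.items()
            if all(o[i] == v for i, v in observed.items())}
    total = sum(kept.values())
    return {o: p / total for o, p in kept.items()}


def profit_if_stop(joint, observed):
    """Expected profit if testing stops now and each chip is packaged
    only when that has positive expected value."""
    cond = conditional(joint, observed)
    n_chips = len(next(iter(joint)))
    profit = 0.0
    for i in range(n_chips):
        p_bad = sum(p for o, p in cond.items() if o[i])
        profit += max(0.0, (1.0 - p_bad) * REVENUE - PACKAGE_COST)
    return profit


def profit_if_test(joint, observed, chip):
    """Expected profit if we pay for one more test, then stop."""
    cond = conditional(joint, observed)
    p_bad = sum(p for o, p in cond.items() if o[chip])
    value = -TEST_COST
    for outcome, weight in ((True, p_bad), (False, 1.0 - p_bad)):
        if weight > 0.0:
            value += weight * profit_if_stop(joint, {**observed, chip: outcome})
    return value


def one_step_choice(joint, observed):
    n_chips = len(next(iter(joint)))
    options = {"stop": profit_if_stop(joint, observed)}
    for chip in range(n_chips):
        if chip not in observed:
            options[f"test chip {chip}"] = profit_if_test(joint, observed, chip)
    return max(options, key=options.get)


# Outcomes are (chip 0 bad?, chip 1 bad?); the two chips are strongly correlated.
joint = {(False, False): 0.45, (True, True): 0.45,
         (False, True): 0.05, (True, False): 0.05}
print(one_step_choice(joint, {}))
print(one_step_choice(joint, {0: False}))
```

Because the chips are correlated, one test is informative about both chips, so the first test is worth its cost; once chip 0 is known to be good, a second test no longer is.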
On-Wafer Chip Test Results • 3.8% increase in profit
Density Estimation Summary • Desired output is a joint probability distribution P(C1, C2, …, C209) • Training examples are points X = (C1, C2, …, C209) sampled from this distribution
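A minimal sketch of how a naïve Bayes mixture turns test results on some chips into a failure probability for an untested chip. The parameters are invented, only two mixture components and three chips are used to keep the arithmetic visible, and the EM training step is not shown.

```python
def posterior_over_components(prior, p_bad, observed):
    """P(W = k | observed chips).

    prior[k]     = P(W = k)
    p_bad[k][i]  = P(chip i is bad | W = k)
    observed     = {chip index: True if the chip tested bad}
    """
    weights = []
    for k, prior_k in enumerate(prior):
        likelihood = 1.0
        for i, is_bad in observed.items():
            # Chips are conditionally independent given the component W.
            likelihood *= p_bad[k][i] if is_bad else 1.0 - p_bad[k][i]
        weights.append(prior_k * likelihood)
    total = sum(weights)
    return [w / total for w in weights]


def prob_bad(prior, p_bad, observed, chip):
    """P(chip is bad | observed chips), averaging over the components."""
    post = posterior_over_components(prior, p_bad, observed)
    return sum(post[k] * p_bad[k][chip] for k in range(len(prior)))


# Hypothetical parameters: a "healthy wafer" and a "faulty wafer" component.
prior = [0.5, 0.5]
p_bad = [[0.1, 0.1, 0.1],
         [0.8, 0.8, 0.8]]

print(prob_bad(prior, p_bad, {}, 1))         # before any test
print(prob_bad(prior, p_bad, {0: True}, 1))  # after chip 0 tests bad
```

Observing one bad chip shifts weight toward the faulty-wafer component, which raises the predicted failure probability of every untested chip.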
Reinforcement Learning [Figure: agent-environment loop; the environment sends state s and reward r to the agent, and the agent sends action a to the environment] • Agent’s goal: choose actions to maximize total reward • Action selection rule is called a “policy”: a = π(s)
Reinforcement Learning for Robot Navigation • Learning from rewards and punishments in the environment • Give reward for reaching goal • Give punishment for getting lost • Give punishment for collisions
Experimental Results: % of trials in which the robot reaches the goal [Figure: results chart] • Busquets, Lopez de Mantaras, Sierra, Dietterich (2002)
Reinforcement Learning Summary • Desired output is an action selection policy π • Training examples are <s,a,r,s’> tuples collected by the agent interacting with the environment
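A minimal sketch of learning a policy from <s,a,r,s’> tuples, using tabular Q-learning on a toy five-state corridor. This is a generic textbook method chosen for brevity, not necessarily the method used in the robot study; the corridor, the rewards, the discount factor `gamma`, and the learning rate `alpha` are all assumptions.

```python
import random

GOAL, LOST = 4, 0     # terminal states of a hypothetical 5-state corridor
ACTIONS = (-1, +1)    # move left / move right


def step(s, a):
    """Deterministic toy environment: returns (next state, reward)."""
    s2 = s + a
    if s2 == GOAL:
        return s2, 1.0    # reward for reaching the goal
    if s2 == LOST:
        return s2, -1.0   # punishment for getting lost
    return s2, 0.0


def collect_tuples(episodes, rng):
    """Interact with the environment using random actions."""
    tuples = []
    for _ in range(episodes):
        s = 2
        while s not in (GOAL, LOST):
            a = rng.choice(ACTIONS)
            s2, r = step(s, a)
            tuples.append((s, a, r, s2))
            s = s2
    return tuples


def q_learning(tuples, alpha=0.5, gamma=0.9):
    """One pass of Q-learning updates over the collected tuples."""
    q = {(s, a): 0.0 for s in range(5) for a in ACTIONS}
    for s, a, r, s2 in tuples:
        future = 0.0 if s2 in (GOAL, LOST) else max(q[(s2, b)] for b in ACTIONS)
        q[(s, a)] += alpha * (r + gamma * future - q[(s, a)])
    return q


def greedy_policy(q):
    """pi(s): the action with the highest learned value in each non-terminal state."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in (1, 2, 3)}


policy = greedy_policy(q_learning(collect_tuples(2000, random.Random(0))))
print(policy)
```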
Outline • Three scenarios where software engineering methods fail • Machine learning methods applied to these scenarios • Fundamental questions in machine learning • Statistical thinking in computer science
Fundamental Issues in Machine Learning • Incorporating Prior Knowledge • Incorporating Learned Structures into Larger Systems • Making Reinforcement Learning Practical • Triple Tradeoff: accuracy, sample size, hypothesis complexity
Incorporating Prior Knowledge • How can we incorporate our prior knowledge into the learning algorithm? • Difficult for decision trees, neural networks, support-vector machines, etc.: mismatch between the form of our knowledge and the way the algorithms work • Easier for Bayesian networks: express knowledge as constraints on the network
Incorporating Learned Structures into Larger Systems • Success story: digit recognizer incorporated into check reader • Challenges: • Larger system may make several coordinated decisions, but the learning system treats each decision as independent • Larger system may have a complex cost function, e.g., an error in the thousands place versus the cents place of $7,236.07
Making Reinforcement Learning Practical • Current reinforcement learning methods do not scale well to large problems • Need robust reinforcement learning methodologies
The Triple Tradeoff • Fundamental relationship between • amount of training data • size and complexity of hypothesis space • accuracy of the learned hypothesis • Explains many phenomena observed in machine learning systems
Learning Algorithms • Set of data points • Class H of hypotheses • Optimization problem: find the hypothesis h in H that best fits the data [Figure: training data select a hypothesis h within the hypothesis space]
Triple Tradeoff • Amount of Data – Hypothesis Complexity – Accuracy [Figure: accuracy (y-axis) versus hypothesis space complexity (x-axis), with one curve each for N = 10, N = 100, and N = 1000]
Triple Tradeoff (2) [Figure: accuracy (y-axis) versus number of training examples N (x-axis), with one curve each for hypothesis spaces H1, H2, and H3, labeled by hypothesis complexity]
Intuition • With only a small amount of data, we can only discriminate between a small number of different hypotheses • As we get more data, we have more evidence, so we can consider more alternative hypotheses • Complex hypotheses give better fit to the data
Fixed versus Variable-Sized Hypothesis Spaces • Fixed size: ordinary linear regression, Bayes net with fixed structure, neural networks • Variable size: decision trees, Bayes nets with variable structure, support vector machines
Corollary 1: Fixed H will underfit [Figure: accuracy (y-axis) versus number of training examples N (x-axis) for H1 and H2, with a region marked “underfit”]
Corollary 2: Variable-sized H will overfit [Figure: accuracy (y-axis) versus hypothesis space complexity (x-axis) for N = 100, with a region marked “overfit”]
Ideal Learning Algorithm: Adapt complexity to data [Figure: accuracy (y-axis) versus hypothesis space complexity (x-axis), with one curve each for N = 10, N = 100, and N = 1000]
Adapting Hypothesis Complexity to Data Complexity • Find hypothesis h to minimize error(h) + λ · complexity(h) • Many methods for adjusting λ • Cross-validation • MDL
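A minimal sketch of the penalized objective, assuming a hypothesis space of piecewise-constant fits on [0, 1) in which complexity(h) is the number of bins k. The data, the candidate values of k, and the hand-picked values of λ (`lam`) are illustrative; choosing λ by cross-validation or MDL is not shown.

```python
def fit_bins(xs, ys, k):
    """Piecewise-constant fit with k equal-width bins on [0, 1)."""
    sums, counts = [0.0] * k, [0] * k
    for x, y in zip(xs, ys):
        b = min(int(x * k), k - 1)
        sums[b] += y
        counts[b] += 1
    overall = sum(ys) / len(ys)  # fallback level for empty bins
    return [sums[b] / counts[b] if counts[b] else overall for b in range(k)]


def training_error(levels, xs, ys):
    """Mean squared error of the fit on the training data."""
    k = len(levels)
    return sum((y - levels[min(int(x * k), k - 1)]) ** 2
               for x, y in zip(xs, ys)) / len(xs)


def select_complexity(xs, ys, lam, candidates=(1, 2, 4, 8, 16, 40)):
    """Pick k minimizing error(h) + lam * complexity(h)."""
    def score(k):
        return training_error(fit_bins(xs, ys, k), xs, ys) + lam * k
    return min(candidates, key=score)


# Hypothetical data: a step at x = 0.5 plus a small alternating disturbance.
xs = [i / 40 for i in range(40)]
ys = [(0.0 if x < 0.5 else 1.0) + (0.1 if i % 2 == 0 else -0.1)
      for i, x in enumerate(xs)]

print(select_complexity(xs, ys, lam=0.0))   # no penalty: favors many bins
print(select_complexity(xs, ys, lam=0.01))  # with penalty: favors few bins
```

With no penalty the search chases the disturbance with a fine-grained fit; a small penalty recovers the two-level structure that actually generated the data.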
Outline • Three scenarios where software engineering methods fail • Machine learning methods applied to these scenarios • Fundamental questions in machine learning • Statistical thinking in computer science
The Data Explosion • NASA data: 284 Terabytes (as of August 1999) • Earth Observing System: 194 GB/day • Landsat 7: 150 GB/day • Hubble Space Telescope: 0.6 GB/day • Source: http://spsosun.gsfc.nasa.gov/eosinfo/EOSDIS_Site/index.html
The Data Explosion (2) • Google indexes 2,073,418,204 web pages • US Year 2000 Census: 62 Terabytes of scanned images • Walmart Data Warehouse: 7 (500?) Terabytes • Missouri Botanical Garden TROPICOS plant image database: 700 Gbytes
Old Computer Science Conception of Data [Figure: Store → Retrieve]
New Computer Science Conception of Data [Figure: Store → Build Models → Solve Problems, with Problems as input and Solutions as output]