
Machine Learning


Presentation Transcript


  1. Machine Learning
  CPSC 315 – Programming Studio, Spring 2009
  Project 2, Lecture 5

  2. Forms of Learning
  • Supervised
    • Learns from examples that provide desired outputs for given inputs
  • Unsupervised
    • Learns patterns in input data when no specific output values are given
  • Reinforcement
    • Learns from an indication of correctness at the end of some reasoning

  3. Supervised Learning
  • Must have training data including:
    • Inputs (features) to be considered in the decision
    • Outputs (correct decisions) for those inputs
  • Inductive reasoning
    • Given a collection of examples of a function f, return a function h that approximates f
    • Difficulty: many functions h may be possible
    • Hope to pick a function h that generalizes well
    • Tradeoff between the complexity of the hypothesis and the degree of fit to the data (sketched below)
    • Consider data modeling
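
A minimal sketch (not from the slides) of the complexity-vs-fit tradeoff: fit a simple and a very flexible hypothesis h to noisy examples of a function f and compare how each generalizes to inputs it was not trained on. The choice of f, the noise level, and the polynomial degrees are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    f = lambda x: 2.0 * x + 1.0                        # the underlying function f
    x_train = rng.uniform(0, 10, 10)
    y_train = f(x_train) + rng.normal(0, 1.0, 10)      # noisy examples of f
    x_test = rng.uniform(0, 10, 100)                   # held-out inputs

    h_simple = np.poly1d(np.polyfit(x_train, y_train, 1))   # low-complexity hypothesis
    h_complex = np.poly1d(np.polyfit(x_train, y_train, 6))  # nearly memorizes the examples

    for name, h in (("degree 1", h_simple), ("degree 6", h_complex)):
        train_err = np.mean((h(x_train) - y_train) ** 2)
        test_err = np.mean((h(x_test) - f(x_test)) ** 2)
        print(name, "train error:", round(train_err, 2), "test error:", round(test_err, 2))

The flexible hypothesis typically fits the training examples more closely while doing worse on the held-out inputs, which is the tradeoff the slide refers to.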

  4. Evaluating Supervised Learning Algorithms
  • Collect a large set of examples (input/output pairs)
  • Divide it into two disjoint sets:
    • Training data
    • Testing data
  • Apply the learning algorithm to the training data, generating a hypothesis h
  • Measure the percentage of examples in the testing data that h classifies correctly (or the amount of error, for continuous-valued outputs)
  • Repeat the steps above for different training-set sizes and different randomly selected training sets (see the sketch below)
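
A minimal sketch of the evaluation procedure described above, assuming examples are (features, label) pairs and `learn` is any function that maps training data to a hypothesis h (both names are mine, not the slides'):

    import random

    def evaluate(examples, learn, train_fraction=0.8, seed=0):
        """examples: list of (features, label) pairs; learn: training data -> hypothesis h."""
        rng = random.Random(seed)
        shuffled = examples[:]
        rng.shuffle(shuffled)
        split = int(train_fraction * len(shuffled))
        train, test = shuffled[:split], shuffled[split:]   # two disjoint sets
        h = learn(train)
        correct = sum(1 for features, label in test if h(features) == label)
        return correct / len(test)

    # Usage: repeat for several seeds and training-set sizes, as the slide suggests.
    # accuracies = [evaluate(examples, learn, seed=s) for s in range(10)]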

  5. Decision Trees
  • Map the features of a situation to a decision
  • Example from a classification of unsafe acts

  6. Decision Trees
  • Relation to rule-based reasoning
    • Features of an element are used to classify the element
    • Features of a situation are used to select an action
  • Used as the basis for many “how to” books
    • How to identify a type of snake? Observable features of the snake
    • How to fix an automobile? Features related to the problem and the state of the automobile
  • If the features are understandable, the decision tree can be used to explain its decision (see the sketch below)
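
A hypothetical illustration (the slides do not give this tree): a decision tree is just nested tests on observable features, so it reads like the rules in a field guide and can explain its own decision.

    def identify_snake(features):
        """features: dict of observable attributes, e.g. {'rattle': True, 'hood': False}."""
        if features.get("rattle"):
            return "rattlesnake"
        elif features.get("hood"):
            return "cobra"
        elif features.get("banded") and features.get("red_touches_yellow"):
            return "coral snake"
        else:
            return "unknown; consult a field guide"

    print(identify_snake({"rattle": False, "hood": True}))  # -> "cobra"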

  7. Learning Decision Trees
  • Types of decision trees
    • Learning a discrete-valued function is classification learning
    • Learning a continuous-valued function is regression
  • Assumption: applying Ockham’s razor will result in a more general function
    • Want the smallest decision tree, but finding it is not tractable
    • Will be satisfied with a smallish tree

  8. Algorithm for Decision Tree Learning
  • Basic idea
    • Recursively select the feature that splits the data (most) unevenly
    • No need to use all features
  • Heuristic approach (sketched below)
    • Compare features by their ability to meaningfully split the data
    • Feature score = greatest difference in average output value(s) * size of the smaller subset
    • Avoids splitting out individuals too early
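
A sketch of the split-scoring heuristic described above, assuming binary features and numeric outputs (the data representation is my assumption, not the slides'): the score multiplies the difference in average output between the two subsets by the size of the smaller subset, so features that split off tiny groups score poorly.

    def split_score(examples, feature):
        """examples: list of (features_dict, output) pairs."""
        yes = [out for feats, out in examples if feats[feature]]
        no = [out for feats, out in examples if not feats[feature]]
        if not yes or not no:
            return 0.0                        # feature does not split the data at all
        avg = lambda values: sum(values) / len(values)
        return abs(avg(yes) - avg(no)) * min(len(yes), len(no))

    def best_feature(examples, features):
        """Pick the feature to split on next; recurse on each subset to grow the tree."""
        return max(features, key=lambda f: split_score(examples, f))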

  9. Unsupervised Learning
  • Used to characterize/explain the key features of a set of data
  • No notion of a desired output
  • Example: identifying fast-food vs. fine-dining restaurants when the classes are not known ahead of time
  • Techniques
    • Clustering (k-means, HAC) (sketched below)
    • Self-Organizing Maps
    • Gaussian Mixture Models
  • More on this topic in Project 3
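
A minimal k-means sketch, one of the clustering techniques listed above, written for 2-D points in plain Python; in practice a library implementation would be used.

    import random

    def k_means(points, k, iterations=20, seed=0):
        """points: list of (x, y) tuples; returns final centers and their clusters."""
        rng = random.Random(seed)
        centers = rng.sample(points, k)                  # start from k random points
        clusters = [[] for _ in range(k)]
        for _ in range(iterations):
            clusters = [[] for _ in range(k)]
            for p in points:                             # assign each point to its nearest center
                i = min(range(k),
                        key=lambda c: (p[0] - centers[c][0]) ** 2 + (p[1] - centers[c][1]) ** 2)
                clusters[i].append(p)
            for i, cluster in enumerate(clusters):       # move each center to its cluster's mean
                if cluster:
                    centers[i] = (sum(p[0] for p in cluster) / len(cluster),
                                  sum(p[1] for p in cluster) / len(cluster))
        return centers, clusters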

  10. Reinforcement Learning
  • Many large problems do not have desired outputs that can be used as training data
  • Process (a minimal sketch follows)
    • The agent (system) performs a set of actions
    • The agent occasionally receives a reward to indicate that something went right, or a penalty to indicate that something went wrong
    • The agent has to learn the relationship between the model of the situation, the chosen actions, and the rewards/penalties
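
A minimal sketch of this loop (the particular update rule is my choice, not the slides', and it ignores the situation model for brevity): the agent tries actions, occasionally receives a reward or penalty, and keeps a running estimate of how good each action is.

    import random

    def run_agent(actions, get_reward, episodes=1000, epsilon=0.1, alpha=0.1, seed=0):
        """get_reward(action) -> numeric reward or penalty; value[a] estimates each action's worth."""
        rng = random.Random(seed)
        value = {a: 0.0 for a in actions}
        for _ in range(episodes):
            if rng.random() < epsilon:                        # occasionally explore
                action = rng.choice(actions)
            else:                                             # otherwise exploit current estimates
                action = max(actions, key=lambda a: value[a])
            reward = get_reward(action)
            value[action] += alpha * (reward - value[action])  # move the estimate toward the outcome
        return value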

  11. Analogy to Animal Training
  • We cannot tell our pets what is right and wrong as (preconditions, action) pairs
  • Instead we reward good behavior (giving treats) and penalize bad behavior (spraying water or making a loud noise)
  • The pet has to learn what is appropriate, and when and where
    • Can result in incorrect interpretations (go in the corner vs. go outside)
  • Difficulty: which of the prior/recent actions caused the positive/negative outcome?
    • Clicker training for animals is meant to help with this

  12. Reinforcement Learning in Games
  • Simplest reinforcements
    • Winning or losing
    • Requires lots of games/time to learn
  • Other potential reinforcements
    • The opponent’s action selection
      • Did they minimize your goodness value?
      • Modify the goodness function to better match their moves
      • Potential to learn an individual’s values/strategy
    • Predicted goodness value vs. observed goodness value
      • Can be used on small (a few moves) or large (a whole game) time scales
      • Similar to a person reflecting on when things went wrong
  • Need to be careful in the implementation, or else the goodness function will end up returning a constant (thus being totally consistent)

  13. Modifying a Goodness Function
  • Consider the game of chess
  • Presume the goodness function has three linear components:
    • BoardControl: the difference between the number of board positions that Player1 and Player2 can move a piece to in one move
    • Threatened: the difference between the number of opponent pieces threatened (can be taken in one move) by Player1 and by Player2
    • Pieces: the difference in the summed values of the pieces remaining for Player1 and Player2, where Queen = 10, Rook = 6, Bishop = 3, Knight = 3, Pawn = 1 (sketched below)
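
A sketch of the Pieces component using the values on the slide; representing each side's remaining pieces as a list of names is my assumption about the data, not something the slides specify.

    PIECE_VALUES = {"queen": 10, "rook": 6, "bishop": 3, "knight": 3, "pawn": 1}

    def pieces_component(player1_pieces, player2_pieces):
        """Difference in summed piece values for Player1 vs. Player2."""
        total = lambda pieces: sum(PIECE_VALUES.get(p, 0) for p in pieces)
        return total(player1_pieces) - total(player2_pieces)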

  14. Modifying a Goodness Function
  • G(s) = a*BoardControl + b*Pieces + c*Threatened
  • Modify the coefficients to learn an appropriate weighting of the terms
  • The size of the overall modification should relate to the difference between the predicted goodness and the observed goodness
  • The direction of the modification to each linear component should be related to whether that component is consistent with or disagrees with the outcome
  • Could modify the coefficients by fixed amounts (e.g. +/- 0.1), or by amounts that are a function of each component's effect on the overall G for the state being considered (one possible update is sketched below)
  • In theory, such a computer player could recognize that BoardControl is more important early in the game, Pieces is more important mid-game, and Threatened is more important in the end game
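
One possible implementation of the update described above, a delta-rule style adjustment; the slides do not fix an exact rule, so the learning rate and the proportional form are assumptions. Each coefficient moves in proportion to the overall prediction error and to that component's value for the state being considered.

    def goodness(coeffs, features):
        """G(s) = a*BoardControl + b*Pieces + c*Threatened for one state's feature values."""
        a, b, c = coeffs
        return (a * features["BoardControl"]
                + b * features["Pieces"]
                + c * features["Threatened"])

    def update_coeffs(coeffs, features, observed, rate=0.01):
        predicted = goodness(coeffs, features)
        error = observed - predicted                 # size of the overall modification
        a, b, c = coeffs
        # each term's direction and size follow its own feature value for this state
        return (a + rate * error * features["BoardControl"],
                b + rate * error * features["Pieces"],
                c + rate * error * features["Threatened"])

Applied after each move (or each game), coefficients whose components agreed with the observed outcome grow, and those that disagreed shrink, which is how the weighting could drift toward BoardControl early in the game and toward Pieces and Threatened later.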
