Learning: Nearest Neighbor Artificial Intelligence CMSC 25000 January 31, 2002
Agenda • Machine learning: Introduction • Nearest neighbor techniques • Applications: Robotic motion, Credit rating • Efficient implementations: • k-d trees, parallelism • Extensions: K-nearest neighbor • Limitations: • Distance, dimensions, & irrelevant attributes
Machine Learning • Learning: Acquiring a function from inputs to values, based on past (input, value) examples • Learn concepts, classifications, values • Identify regularities in data
Machine Learning Examples • Pronunciation: • Spelling of word => sounds • Speech recognition: • Acoustic signals => sentences • Robot arm manipulation: • Target => torques • Credit rating: • Financial data => loan qualification
Machine Learning Characterization • Distinctions: • Are output values known for any inputs? • Supervised vs unsupervised learning • Supervised: training consists of inputs + true output values • E.g. letters + pronunciations • Unsupervised: training consists only of inputs • E.g. letters only • This course studies supervised methods
Machine Learning Characterization • Distinctions: • Are output values discrete or continuous? • Discrete: “Classification” • E.g. Qualified/Unqualified for a loan application • Continuous: “Regression” • E.g. Torques for robot arm motion • This distinction is a characteristic of the task itself
Machine Learning Characterization • Distinctions: • What form of function is learned? • Also called the “inductive bias” • Graphically, the decision boundary • E.g. Single linear separator • Rectangular boundaries - ID trees • Voronoi regions, etc. [Figure: + and - examples divided by a single linear separator]
Machine Learning Functions • Problem: Can the representation effectively model the class to be learned? • Motivates selection of the learning algorithm • For this function, a linear discriminant is GREAT! Rectangular boundaries (e.g. ID trees) are TERRIBLE! • Pick the right representation! [Figure: - and + examples separated cleanly by a diagonal line]
Machine Learning Features • Inputs: • E.g. words, acoustic measurements, financial data • Vectors of features: • E.g. word: letters • ‘cat’: L1 = c; L2 = a; L3 = t • Financial data: F1 = # late payments/yr : Integer • F2 = Ratio of income to expenses : Real
Machine Learning Features • Question: • Which features should be used? • How should they relate to each other? • Issue 1: How do we define distance in feature space if features have different scales? • Solution: Scaling/normalization • Issue 2: Which ones are important? • If instances differ only in an irrelevant feature, that difference should be ignored
Complexity & Generalization • Goal: Predict values accurately on new inputs • Problem: • We train on sample data • We can make an arbitrarily complex model to fit it • BUT it will probably perform badly on NEW data • Strategy: • Limit the complexity of the model (e.g. the degree of the equation) • Split data into training and validation sets • Hold out data to check for overfitting (sketch below)
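The hold-out strategy is easy to make concrete. A minimal sketch in Python (the function name and hold-out fraction are illustrative, not from the lecture):

    import random

    def split_data(data, held_out=0.2, seed=0):
        # Shuffle, then hold out a fraction of the labeled data.
        # A model that fits the training split but scores badly on the
        # validation split is too complex: it has overfit.
        data = data[:]
        random.Random(seed).shuffle(data)
        cut = int(len(data) * (1 - held_out))
        return data[:cut], data[cut:]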
Nearest Neighbor • Memory- or case-based learning • Supervised method: Training • Record labeled instances and their feature-value vectors • For each new, unlabeled instance: • Identify the “nearest” labeled instance • Assign the same label • Consistency heuristic: Assume that a property is the same as that of the nearest reference case
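The whole method fits in a few lines. A minimal sketch, assuming plain (unweighted) Euclidean distance over numeric feature vectors:

    import math

    def train(labeled_instances):
        # "Training" is just memorization: store (feature_vector, label) pairs.
        return list(labeled_instances)

    def classify(memory, x):
        # Consistency heuristic: label x like its nearest stored case.
        def dist(a, b):
            return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
        nearest = min(memory, key=lambda case: dist(case[0], x))
        return nearest[1]

Note the classify step scans every stored case, which is the O(n) cost addressed by the efficient implementations later in the deck.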
Nearest Neighbor Example • Problem: Robot arm motion • Difficult to model analytically • Kinematic equations • Relate joint angles and manipulator positions • Dynamics equations • Relate motor torques to joint angles • Difficult to achieve good results modeling robotic or human arms analytically • Many factors & measurements
Nearest Neighbor Example • Solution: • Move the robot arm around • Record parameters and trajectory segments • Table: torques, positions, velocities, squared velocities, velocity products, accelerations • To follow a new path: • Break it into segments • Find the closest segments in the table • Use those torques (interpolating as necessary)
Nearest Neighbor Example • Issue: Big table • The first time on a new trajectory, the “closest” entry isn’t close • Table is sparse - few entries • Solution: Practice • Each attempted trajectory fills in more of the table • After a few attempts, matches are very close
Nearest Neighbor Example II • Credit Rating: • Classifier: Good / Poor • Features: • L = # late payments/yr • R = Income/Expenses

Name   L    R     G/P
A      0    1.2   G
B      25   0.4   P
C      5    0.7   G
D      20   0.8   P
E      30   0.85  P
F      11   1.2   G
G      7    1.15  G
H      15   0.8   P
Nearest Neighbor Example II [Figure: the same eight labeled cases plotted in the (L, R) feature plane, with L on the horizontal axis (0-30) and R on the vertical axis]
Nearest Neighbor Example II • New, unlabeled instances to classify:

Name   L    R     G/P
I      22   0.45  ??
J      15   1.2   ??

• Distance Measure: sqrt((L1 - L2)^2 + [sqrt(10) * (R1 - R2)]^2) - a scaled distance [Figure: I and J plotted among the labeled cases in the (L, R) plane]
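A brute-force sketch of this example, using the table's eight cases and the stated metric (the sqrt(10) factor stretches the narrow R range so it is commensurate with L):

    import math

    # Labeled cases from the table: name -> (L, R, rating)
    cases = {"A": (0, 1.2, "G"), "B": (25, 0.4, "P"), "C": (5, 0.7, "G"),
             "D": (20, 0.8, "P"), "E": (30, 0.85, "P"), "F": (11, 1.2, "G"),
             "G": (7, 1.15, "G"), "H": (15, 0.8, "P")}

    def scaled_dist(L1, R1, L2, R2):
        # sqrt((L1-L2)^2 + [sqrt(10)*(R1-R2)]^2)
        return math.sqrt((L1 - L2) ** 2 + 10 * (R1 - R2) ** 2)

    def nearest(L, R):
        return min(cases, key=lambda n: scaled_dist(L, R, cases[n][0], cases[n][1]))

    print(nearest(22, 0.45))   # I: nearest case is D (Poor)
    print(nearest(15, 1.2))    # J: nearest case is H (Poor) under this scaling

With the table as given, the sqrt(10) weight on R makes H the nearest case to J even though J sits right beside F in an unscaled plot: the choice of scaling can flip a classification.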
Efficient Implementations • Classification cost: • Find nearest neighbor: O(n) • Compute distance between the unknown and all instances • Compare distances • Problematic for large data sets • Alternative: • Use a binary tree search (k-d trees, next slide) to reduce cost to O(log n)
Efficient Implementation: K-D Trees • Divide instances into sets based on features • Binary branching: E.g. feature > value • With binary splits, a tree of depth d has 2^d leaves; 2^d = n gives d = O(log n) • To split cases into sets (sketch below): • If there is one element in the set, stop • Otherwise pick a feature to split on • Find the average position of the two middle objects on that dimension • Split the remaining objects based on that average position • Recursively split the subsets
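A sketch of this construction in Python. The slide leaves the feature-selection rule open; here I rotate through features round-robin, which is one common choice:

    def build_kd(points, depth=0):
        # points: list of (feature_vector, label) pairs.
        if len(points) == 1:
            return points[0][1]                      # one element: stop at a leaf
        axis = depth % len(points[0][0])             # pick a feature (round-robin)
        points = sorted(points, key=lambda p: p[0][axis])
        mid = len(points) // 2
        # Average position of the two middle objects on that dimension.
        threshold = (points[mid - 1][0][axis] + points[mid][0][axis]) / 2
        # With distinct values, points[:mid] are exactly those <= threshold.
        return (axis, threshold,
                build_kd(points[:mid], depth + 1),   # below the split
                build_kd(points[mid:], depth + 1))   # above the split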
K-D Trees: Classification

R > 0.825?
|-- Yes: L > 9?
|      |-- Yes: R > 1.025? -> Yes: Good (F) / No: Poor (E)
|      |-- No:  R > 1.175? -> Yes: Good (A) / No: Good (G)
|-- No:  L > 17.5?
       |-- Yes: R > 0.6?  -> Yes: Poor (D) / No: Poor (B)
       |-- No:  R > 0.75? -> Yes: Poor (H) / No: Good (C)
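Written out as code, the same tree makes the cost visible: three comparisons instead of eight distance computations. A self-contained sketch:

    def classify(L, R):
        # Each test discards half of the remaining training cases.
        if R > 0.825:
            if L > 9:
                return "Good" if R > 1.025 else "Poor"   # F's leaf / E's leaf
            return "Good"                                # A's and G's leaves
        if L > 17.5:
            return "Poor"                                # D's and B's leaves
        return "Poor" if R > 0.75 else "Good"            # H's leaf / C's leaf

    print(classify(22, 0.45))   # instance I descends to a Poor leaf
    print(classify(15, 1.2))    # instance J descends to F's leaf: Good

One caveat: this descent finds the leaf whose region contains the query, which is not always the true nearest neighbor; exact k-d search must sometimes back up and examine neighboring regions.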
Efficient Implementation: Parallel Hardware • Classification cost: • # distance computations: constant time with O(n) processors, one per stored instance • Cost of finding the closest: • Compute pairwise minimums, successively • O(log n) time (sketch below)
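A sequential simulation of that reduction, where each pass of the loop stands in for one parallel step in which all pair comparisons happen at once:

    def tree_min(values):
        # Pairwise minimum, successively: n -> n/2 -> n/4 -> ... -> 1,
        # so O(log n) rounds if each round's comparisons run in parallel.
        while len(values) > 1:
            values = [min(values[i:i + 2]) for i in range(0, len(values), 2)]
        return values[0]

    print(tree_min([7.1, 2.3, 9.0, 4.4, 5.5]))   # 2.3, after three rounds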
Nearest Neighbor: Issues • Prediction can be expensive if there are many features • Affected by classification noise and feature noise • One bad entry can change a prediction • Definition of the distance metric • How to combine different features • Different types and ranges of values • Sensitive to feature selection
Nearest Neighbor Analysis • Problem: • Ambiguous labeling, training noise • Solution: • K-nearest neighbors • Not just the single nearest instance • Compare to the K nearest neighbors • Label according to the majority of the K • What should K be? • Often 3; it can also be tuned on held-out data (sketch below)
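A sketch of the K-nearest-neighbor variant (majority vote; ties are broken arbitrarily by Counter's ordering):

    import math
    from collections import Counter

    def knn_classify(memory, x, k=3):
        # Vote among the k nearest stored cases instead of trusting one,
        # which smooths over mislabeled or noisy training instances.
        def dist(a, b):
            return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
        neighbors = sorted(memory, key=lambda case: dist(case[0], x))[:k]
        votes = Counter(label for _, label in neighbors)
        return votes.most_common(1)[0][0]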
Nearest Neighbor: Analysis • Issue: • What is a good distance metric? • How should features be combined? • Strategy: • (Typically weighted) Euclidean distance • Feature scaling: Normalization • Good starting point: • (Feature - Feature_mean) / Feature_standard_deviation • Rescales all values: centered on 0 with standard deviation 1 (sketch below)
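That normalization is one line per feature. A sketch of per-column z-scoring (statistics.stdev is the sample standard deviation):

    import statistics

    def normalize(column):
        # (feature - feature_mean) / feature_standard_deviation:
        # recenters the feature on 0 with standard deviation 1, so no
        # feature dominates the distance just because of its scale.
        mu = statistics.mean(column)
        sigma = statistics.stdev(column)
        return [(v - mu) / sigma for v in column]

    print(normalize([0, 25, 5, 20, 30, 11, 7, 15]))   # the L column from the example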
Nearest Neighbor: Analysis • Issue: • What features should we use? • E.g. Credit rating: Many possible features • Tax bracket, debt burden, retirement savings, etc. • Nearest neighbor uses ALL of them • Irrelevant features can mislead the distance measure • A fundamental problem with nearest neighbor
Nearest Neighbor: Advantages • Fast training: • Just record the (feature vector, output value) pairs • Can model a wide variety of functions • Complex decision boundaries • Weak inductive bias • Very generally applicable
Summary • Machine learning: • Acquire function from input features to value • Based on prior training instances • Supervised vs Unsupervised learning • Classification and Regression • Inductive bias: • Representation of function to learn • Complexity, Generalization, & Validation
Summary: Nearest Neighbor • Nearest neighbor: • Training: record input vectors + output value • Prediction: closest training instance to new data • Efficient implementations • Pros: fast training, very general, little bias • Cons: distance metric (scaling), sensitivity to noise & extraneous features