Machine Learning Spring 2010 Rong Jin
CSE847 Machine Learning • Instructor: Rong Jin • Office Hours: • Tuesday 4:00pm-5:00pm • Thursday 4:00pm-5:00pm • Textbooks • Machine Learning • The Elements of Statistical Learning • Pattern Recognition and Machine Learning • Many topics are drawn from research papers • Web site: http://www.cse.msu.edu/~cse847
Requirements • ~10 homework assignments • Course project • Topic: visual object recognition • Data: over one million images with extracted visual features • Objective: build a classifier that automatically identifies the class of objects in images • Midterm exam & final exam
Goal • Familiarize you with the state of the art in machine learning • Breadth: many different techniques • Depth: project • Hands-on experience • Develop a machine-learning way of thinking • Learn how to model real-world problems with machine learning techniques • Learn how to deal with practical issues
Course Outline • Theoretical Aspects • Information Theory • Optimization Theory • Probability Theory • Learning Theory • Practical Aspects • Supervised Learning Algorithms • Unsupervised Learning Algorithms • Important Practical Issues • Applications
Today’s Topics • Why machine learning? • Example: learning to play backgammon • General issues in machine learning
Why Machine Learning? • Past: most computer programs were written entirely by hand • Future: computers should be able to program themselves through interaction with their environment
Recent Trends • Recent progress in algorithms and theory • Growing flood of online data • Growing availability of computational power • A growing industry
Three Niches for Machine Learning • Data mining: using historical data to improve decisions • Medical records → medical knowledge • Software applications that are difficult to program by hand • Autonomous driving • Image classification • User modeling • Automatic recommender systems
Typical Data Mining Task • Given: • 9147 patient records, each describing a pregnancy and birth • Each record contains 215 features • Task: • Identify classes of future patients at high risk for emergency Cesarean section
Data Mining Results • One of 18 learned rules: If no previous vaginal delivery, and abnormal 2nd-trimester ultrasound, and malpresentation at admission, then the probability of emergency C-section is 0.6
Credit Risk Analysis • Learned rules: If Other-Delinquent-Accounts > 2 and Number-Delinquent-Billing-Cycles > 1, then Profitable-Customer? = no If Other-Delinquent-Accounts = 0 and (Income > $30K or Years-of-Credit > 3), then Profitable-Customer? = yes
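To make the rule format concrete, here is a minimal sketch of these two learned rules as a Python predicate; the dictionary record format and field names are assumptions for illustration, not taken from any real system.

```python
# Hypothetical encoding of the two learned credit rules as a predicate.
# The field names mirror the slide; the dict format is assumed.

def profitable_customer(record):
    """Return True/False if a rule fires, None if neither rule applies."""
    if (record["other_delinquent_accounts"] > 2
            and record["delinquent_billing_cycles"] > 1):
        return False
    if (record["other_delinquent_accounts"] == 0
            and (record["income"] > 30_000 or record["years_of_credit"] > 3)):
        return True
    return None  # no rule fires; abstain

print(profitable_customer({"other_delinquent_accounts": 0,
                           "delinquent_billing_cycles": 0,
                           "income": 45_000,
                           "years_of_credit": 1}))   # True
```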
Programs too Difficult to Program By Hand • ALVINN drives at 70 mph on highways
Programs too Difficult to Program By Hand • Visual object recognition
Software that Models Users: What to Recommend? • Description: A high-school boy is given the chance to write a story about an up-and-coming rock band as he accompanies it on their concert tour. Recommend: ? • Description: A homicide detective and a fire marshal must stop a pair of murderers who commit videotaped crimes to become media darlings. Rating: • Description: A biography of sports legend Muhammad Ali, from his early days to his days in the ring. Rating: No • Description: A young adventurer named Milo Thatch joins an intrepid group of explorers to find the mysterious lost continent of Atlantis. Recommend: ? • Description: Benjamin Martin is drawn into the American Revolutionary War against his will when a brutal British commander kills his son. Rating: Yes • (The rated titles form the user's history; the system must decide whether to recommend the unrated ones.)
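One simple way such a system can use the rating history is to compare a new movie's description with descriptions the user already rated; the word-overlap scoring below is purely an illustrative stand-in for a real learned recommender, and all names and data are made up.

```python
# Illustrative sketch only: recommend a new title if its description shares
# more words with positively rated descriptions than with negative ones.
# A real recommender would learn a proper model from the user's history.

def words(text):
    return set(text.lower().split())

def recommend(new_desc, history):
    """history: list of (description, liked: bool) pairs from the user."""
    score = 0.0
    for desc, liked in history:
        overlap = len(words(new_desc) & words(desc))
        score += overlap if liked else -overlap
    return score > 0

history = [("biography of a sports legend in the ring", False),
           ("an adventurer explores a mysterious lost continent", True)]
print(recommend("a young adventurer searches for the lost continent of atlantis",
                history))   # True
```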
Relevant Disciplines • Artificial Intelligence • Statistics (particularly Bayesian Stat.) • Computational complexity theory • Information theory • Optimization theory • Philosophy • Psychology • …
Today’s Topics • Why machine learning? • Example: learning to play backgammon • General issues in machine learning
What is the Learning Problem? • Learning = improving with experience at some task • Improve over task T • With respect to performance measure P • Based on experience E • Example: learning to play backgammon • T: play backgammon • P: % of games won in the world tournament • E: opportunity to play against itself
Backgammon • More than 10²⁰ states (boards) • Best human players see only a small fraction of all boards during their lifetime • Search is hard because of dice (branching factor > 100)
TD-Gammon by Tesauro (1995) • Trained by playing against itself • Now approximately equal to the best human players
Learn to Play Chess • Task T: play chess • Performance P: percent of games won in the world tournament • Experience E: • What experience? • How shall it be represented? • What exactly should be learned? • What specific algorithm should be used to learn it?
Choose a Target Function • Goal: • Policy π: B → M • Choice of value function • V: B × M → ℝ • V: B → ℝ • (B = boards, M = moves, ℝ = real values)
Value Function V(b): Example Definition • If b is a final board that is won: V(b) = 1 • If b is a final board that is lost: V(b) = −1 • If b is not a final board: V(b) = E[V(b*)], where b* is the final board reached after playing optimally
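A minimal runnable sketch of this definition, with the expectation over dice rolls collapsed to a single best successor; the dict-based board encoding is a hypothetical stand-in for a real backgammon engine.

```python
# Sketch of the ideal target function V(b).  A "board" here is a
# hypothetical dict: final boards carry "won"; non-final boards carry
# their successor under optimal play in "next" (standing in for E[V(b*)]).

def V(board):
    if "won" in board:                 # b is a final board
        return 1 if board["won"] else -1
    return V(board["next"])            # value of the board reached by optimal play

won_game = {"next": {"next": {"won": True}}}
print(V(won_game))                     # 1
```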
Representation of Target Function V(b) • Same value for each board → no learning • Lookup table (one entry for each board) → no generalization • Summarize experience into: • Polynomials • Neural networks
Example: Linear Feature Representation • Features: • pb(b), pw(b) = number of black (white) pieces on board b • ub(b), uw(b) = number of unprotected black (white) pieces • tb(b), tw(b) = number of black (white) pieces threatened by the opponent • Linear function: • V(b) = w0·pb(b) + w1·pw(b) + w2·ub(b) + w3·uw(b) + w4·tb(b) + w5·tw(b) • Learning: • Estimation of the parameters w0, …, w5
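As a concrete sketch, the linear representation might look as follows; reducing a board directly to its six feature values is an assumption made to keep the example self-contained.

```python
# Minimal sketch of the slide's linear value function.  A board is
# already reduced to its six feature values (pb, pw, ub, uw, tb, tw);
# a real system would extract these from an actual backgammon position.

def linear_value(features, weights):
    """V(b) = w0*pb(b) + w1*pw(b) + ... + w5*tw(b)."""
    return sum(w * f for w, f in zip(weights, features))

weights  = [0.5, -0.5, -0.2, 0.2, -0.3, 0.3]   # w0..w5 (illustrative values)
features = [10, 12, 2, 1, 3, 0]                # pb, pw, ub, uw, tb, tw
print(linear_value(features, weights))         # predicted V(b)
```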
Tuning Weights • Given: • board b • predicted value V(b) • desired value V*(b) • Calculate error(b) = V*(b) − V(b) • For each board feature fi: wi ← wi + c·error(b)·fi • This stochastically minimizes Σb (V*(b) − V(b))² • (Gradient descent optimization)
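A minimal sketch of this update rule (a least-mean-squares step), reusing the feature-vector convention from the previous sketch; the learning rate c and the toy training pair are illustrative assumptions.

```python
# LMS-style weight tuning from the slide:
#   error(b) = V*(b) - V(b);  w_i <- w_i + c * error(b) * f_i
# Repeated over boards, this stochastically minimizes sum_b (V*(b) - V(b))^2.

def update_weights(weights, features, target, c=0.001):
    predicted = sum(w * f for w, f in zip(weights, features))
    error = target - predicted                      # V*(b) - V(b)
    return [w + c * error * f for w, f in zip(weights, features)]

weights = [0.0] * 6
features, target = [10, 12, 2, 1, 3, 0], 1.0        # toy training example
for _ in range(200):                                # error shrinks each pass
    weights = update_weights(weights, features, target)
print(sum(w * f for w, f in zip(weights, features)))  # close to 1.0
```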
Obtain Boards • Random boards • Boards from beginners' games • Boards from professionals' games
Obtain Target Values • A person provides the value V(b) • Play until termination; if the outcome is: • Win: V(b) ← 1 for all boards • Loss: V(b) ← −1 for all boards • Draw: V(b) ← 0 for all boards • Play one move: b → b′, V(b) ← V(b′) • Play n moves: b → b′ → … → b(n), V(b) ← V(b(n))
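The last two bullets are the temporal-difference idea: use the value of a later board, under the current estimate, as the training target for an earlier one. A minimal sketch, assuming a game is a list of boards (feature vectors) in the order they occurred:

```python
# Sketch of the slide's two ways to label boards from play.

def targets_from_outcome(boards, outcome):
    """Play until termination: V(b) <- +1 / -1 / 0 for every board in a
    won / lost / drawn game."""
    label = {"win": 1.0, "loss": -1.0, "draw": 0.0}[outcome]
    return [(b, label) for b in boards]

def targets_from_successors(boards, value_fn):
    """Play one move: V(b) <- V(b') for consecutive boards b -> b'."""
    return [(b, value_fn(nxt)) for b, nxt in zip(boards, boards[1:])]

game = [[10, 12, 2, 1, 3, 0], [9, 12, 1, 1, 2, 0]]   # toy feature vectors
print(targets_from_outcome(game, "win"))             # both boards labeled +1
```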
A General Framework • Machine Learning = Mathematical Modeling (Statistics) + Finding Optimal Parameters (Optimization)
Today’s Topics • Why machine learning? • Example: learning to play backgammon • General issues in machine learning
Important Issues in Machine Learning • Obtaining experience • How to obtain experience? • Supervised learning vs. unsupervised learning • How many examples are enough? • PAC learning theory • Learning algorithms • What algorithms can approximate the target function well, and when? • How does the complexity of a learning algorithm impact learning accuracy? • Is the target function learnable? • Representing inputs • How to represent the inputs? • How to remove irrelevant information from the input representation? • How to reduce the redundancy of the input representation?