Machine Learning. Spring 2010 Rong Jin. CSE847 Machine Learning. Instructor: Rong Jin Office Hour: Tuesday 4:00pm-5:00pm Thursday 4:00pm-5:00pm Textbook Machine Learning The Elements of Statistical Learning Pattern Recognition and Machine Learning Many subjects are from papers
Machine Learning Spring 2010 Rong Jin
CSE847 Machine Learning • Instructor: Rong Jin • Office Hour: • Tuesday 4:00pm-5:00pm • Thursday 4:00pm-5:00pm • Textbook • Machine Learning • The Elements of Statistical Learning • Pattern Recognition and Machine Learning • Many subjects are from papers • Web site: http://www.cse.msu.edu/~cse847
Requirements • ~10 homework assignments • Course project • Topic: visual object recognition • Data: over one million images with extracted visual features • Objective: build a classifier that automatically identifies the class of objects in images • Midterm exam & final exam
Goal • Familiarize you with the state of the art in Machine Learning • Breadth: many different techniques • Depth: Project • Hands-on experience • Develop the machine learning way of thinking • Learn how to model real-world problems with machine learning techniques • Learn how to deal with practical issues
Course Outline • Theoretical Aspects • Information Theory • Optimization Theory • Probability Theory • Learning Theory • Practical Aspects • Supervised Learning Algorithms • Unsupervised Learning Algorithms • Important Practical Issues • Applications
Today’s Topics • Why machine learning? • Example: learning to play backgammon • General issues in machine learning
Why Machine Learning? • Past: most computer programs were written by hand • Future: computers should be able to program themselves through interaction with their environment
Recent Trends • Recent progress in algorithms and theory • Growing flood of online data • Computational power is available • Growing industry
Three Niches for Machine Learning • Data mining: using historical data to improve decisions • Medical records → medical knowledge • Software applications that are difficult to program by hand • Autonomous driving • Image classification • User modeling • Automatic recommender systems
Typical Data Mining Task • Given: • 9147 patient records, each describing pregnancy and birth • Each patient record contains 215 features • Task: • Predict classes of future patients at high risk for Emergency Cesarean Section
Data Mining Results • One of 18 learned rules: • If no previous vaginal delivery, and abnormal 2nd Trimester Ultrasound, and Malpresentation at admission • Then probability of Emergency C-Section is 0.6
Credit Risk Analysis • Learned Rules: • If Other-Delinquent-Account > 2 and Number-Delinquent-Billing-Cycles > 1 Then Profitable-Customer? = no • If Other-Delinquent-Account = 0 and (Income > $30K or Years-of-Credit > 3) Then Profitable-Customer? = yes
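The two learned credit rules can be read as a small decision procedure. The following is a minimal Python sketch, assuming the conditions inside each rule are joined by "and" (the slide lists them without a connective). The function name and argument names are hypothetical, and customers matched by neither rule get no prediction.

```python
from typing import Optional


def profitable_customer(
    other_delinquent_accounts: int,
    delinquent_billing_cycles: int,
    income: float,
    years_of_credit: float,
) -> Optional[str]:
    """Apply the two learned rules; return None when neither rule fires."""
    # Rule 1: several delinquent accounts and billing cycles -> not profitable.
    if other_delinquent_accounts > 2 and delinquent_billing_cycles > 1:
        return "no"
    # Rule 2: no delinquent accounts plus enough income or credit history.
    if other_delinquent_accounts == 0 and (income > 30_000 or years_of_credit > 3):
        return "yes"
    # Assumption: the slide shows only two rules, so other cases stay undecided.
    return None


if __name__ == "__main__":
    print(profitable_customer(3, 2, 50_000, 10))
    print(profitable_customer(0, 0, 20_000, 5))
```

A learned rule set usually contains more rules than these two, so the `None` branch stands in for whatever the remaining rules would decide.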
Programs too Difficult to Program By Hand • ALVINN drives 70mph on highways
Programs too Difficult to Program By Hand • Visual object recognition
Software that Models Users • History • Description: A homicide detective and a fire marshal must stop a pair of murderers who commit videotaped crimes to become media darlings. Rating: • Description: A biography of sports legend Muhammad Ali, from his early days to his days in the ring. Rating: No • Description: Benjamin Martin is drawn into the American Revolutionary War against his will when a brutal British commander kills his son. Rating: Yes • What to Recommend? • Description: A high-school boy is given the chance to write a story about an up-and-coming rock band as he accompanies it on their concert tour. Recommend: ? • Description: A young adventurer named Milo Thatch joins an intrepid group of explorers to find the mysterious lost continent of Atlantis. Recommend: ?
Relevant Disciplines • Artificial Intelligence • Statistics (particularly Bayesian Stat.) • Computational complexity theory • Information theory • Optimization theory • Philosophy • Psychology • …
Today’s Topics • Why machine learning? • Example: learning to play backgammon • General issues in machine learning
What is the Learning Problem • Learning = Improving with experience at some task • Improve over task T • With respect to performance measure P • Based on experience E • Example: Learning to Play Backgammon • T: Play backgammon • P: % of games won in world tournament • E: opportunity to play against itself
Backgammon • More than 10^20 states (boards) • The best human players see only a small fraction of all boards during their lifetime • Searching is hard because of the dice (branching factor > 100)
TD-Gammon by Tesauro (1995) • Trained by playing against itself • Now approximately equal to the best human player
Learn to Play Chess • Task T: Play chess • Performance P: Percent of games won in the world tournament • Experience E: • What experience? • How shall it be represented? • What exactly should be learned? • What specific algorithm to learn it?
Choose a Target Function • Goal: • Policy: π: b → m • Choice of value function • V: b, m → ℝ • V: b → ℝ • b = board, m = move, ℝ = real values
Value Function V(b): Example Definition • If b is a final board that is won: V(b) = 1 • If b is a final board that is lost: V(b) = -1 • If b is not a final board: V(b) = E[V(b*)], where b* is the final board reached after playing optimally
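The recursive definition of V(b) can be illustrated on a toy game tree. The following is a minimal Python sketch, not a backgammon engine. It assumes a hypothetical tree encoding with four node kinds: final boards ("won", "lost"), dice rolls ("chance", with explicit probabilities), and moves chosen by the learner ("max") or the opponent ("min"). Optimal play is modeled as maximizing or minimizing the expected final value.

```python
def value(node):
    """Return V(b) for a toy game-tree node (hypothetical dict encoding)."""
    kind = node["kind"]
    if kind == "won":       # final board that is won
        return 1.0
    if kind == "lost":      # final board that is lost
        return -1.0
    if kind == "chance":    # dice roll: expectation over the outcomes
        return sum(p * value(child) for p, child in node["outcomes"])
    if kind == "max":       # learner moves: pick the best continuation
        return max(value(child) for child in node["children"])
    if kind == "min":       # opponent moves: pick the worst for the learner
        return min(value(child) for child in node["children"])
    raise ValueError(f"unknown node kind: {kind!r}")


WON = {"kind": "won"}
LOST = {"kind": "lost"}

# The learner chooses between a fair gamble and a more favorable one.
toy_board = {
    "kind": "max",
    "children": [
        {"kind": "chance", "outcomes": [(0.5, WON), (0.5, LOST)]},
        {"kind": "chance", "outcomes": [(0.75, WON), (0.25, LOST)]},
    ],
}

if __name__ == "__main__":
    print(value(toy_board))
```

This exhaustive recursion is only feasible on tiny trees. With more than 10^20 boards, V(b) has to be approximated, which is what motivates the choice of representation.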
Representation of Target Function V(b) • Same value for each board: no learning • Lookup table (one entry for each board): no generalization • Summarize experience into • Polynomials • Neural networks
Example: Linear Feature Representation • Features: • pb(b), pw(b) = number of black (white) pieces on board b • ub(b), uw(b) = number of unprotected black (white) pieces • tb(b), tw(b) = number of black (white) pieces threatened by the opponent • Linear function: • V(b) = w0 pb(b) + w1 pw(b) + w2 ub(b) + w3 uw(b) + w4 tb(b) + w5 tw(b) • Learning: • Estimation of parameters w0, …, w5
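The linear representation is a weighted sum of the six board features. The following is a minimal Python sketch, assuming the features have already been extracted into a sequence ordered as (pb, pw, ub, uw, tb, tw). The function name and the example numbers are illustrative, not taken from the course.

```python
from typing import Sequence

# Feature order assumed throughout: pb, pw, ub, uw, tb, tw.
FEATURE_NAMES = ("pb", "pw", "ub", "uw", "tb", "tw")


def linear_value(weights: Sequence[float], features: Sequence[float]) -> float:
    """V(b) = w0*pb(b) + w1*pw(b) + w2*ub(b) + w3*uw(b) + w4*tb(b) + w5*tw(b)."""
    if len(weights) != len(FEATURE_NAMES) or len(features) != len(FEATURE_NAMES):
        raise ValueError("expected one weight and one feature per name")
    return sum(w * f for w, f in zip(weights, features))


if __name__ == "__main__":
    # Made-up weights: black pieces count in favor, white pieces against.
    weights = [1.0, -1.0, -0.5, 0.5, -0.25, 0.25]
    # Made-up board: 12 black, 11 white, 2/3 unprotected, 1/4 threatened.
    features = [12, 11, 2, 3, 1, 4]
    print(linear_value(weights, features))
```

Only the six weights are learned. The feature definitions stay fixed, so the quality of the features bounds how well any weight setting can approximate V(b).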
Tuning Weights • Given: • board b • Predicted value V(b) • Desired value V*(b) • Calculate error(b) = V*(b) - V(b) • For each board feature fi: wi ← wi + c · error(b) · fi • Stochastically minimizes Σb (V*(b) - V(b))^2 • Gradient descent optimization
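The weight-tuning rule can be sketched as one stochastic gradient step per training board. This minimal Python sketch uses the signed difference V*(b) - V(b) in the update, which is what a gradient step on the squared error requires. The function names, the learning rate c = 0.1, and the example data are assumptions for illustration.

```python
from typing import List, Sequence


def predict(weights: Sequence[float], features: Sequence[float]) -> float:
    """Linear prediction V(b) = sum_i w_i * f_i."""
    return sum(w * f for w, f in zip(weights, features))


def lms_update(
    weights: Sequence[float],
    features: Sequence[float],
    target: float,
    c: float = 0.1,
) -> List[float]:
    """One step: w_i <- w_i + c * (V*(b) - V(b)) * f_i for every feature."""
    error = target - predict(weights, features)
    return [w + c * error * f for w, f in zip(weights, features)]


if __name__ == "__main__":
    weights = [0.0, 0.0]
    features, target = [1.0, 2.0], 1.0  # one made-up training board
    for step in range(5):
        weights = lms_update(weights, features, target)
        squared_error = (target - predict(weights, features)) ** 2
        print(step, squared_error)
```

The step size c has to be small relative to the feature magnitudes. If it is too large, the update overshoots and the squared error grows instead of shrinking.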
Obtain Boards • Random boards • Beginner plays • Professional plays
Obtain Target Values • Person provides value V(b) • Play until termination. If outcome is • Win: V(b) ← 1 for all boards • Loss: V(b) ← -1 for all boards • Draw: V(b) ← 0 for all boards • Play one move: b → b’, V(b) ← V(b’) • Play n moves: b → b’ → … → b(n), V(b) ← V(b(n))
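Two of these strategies can be sketched directly: labeling every board of a finished game with its outcome, and bootstrapping each board's target from the current estimate n moves later. This is a minimal Python sketch with hypothetical function names. It assumes a game is recorded as a list of boards in play order, and it adds one assumption the slide does not state: near the end of a game, where fewer than n moves remain, the final board's estimate is used.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

Board = TypeVar("Board")

OUTCOME_VALUE = {"win": 1.0, "loss": -1.0, "draw": 0.0}


def targets_from_outcome(
    boards: Sequence[Board], outcome: str
) -> List[Tuple[Board, float]]:
    """Play until termination: every board in the game gets the final outcome."""
    value = OUTCOME_VALUE[outcome]
    return [(board, value) for board in boards]


def bootstrapped_targets(
    boards: Sequence[Board], value_fn: Callable[[Board], float], n: int = 1
) -> List[Tuple[Board, float]]:
    """Play n moves: V(b) <- V(b(n)), using the current estimate value_fn."""
    last = len(boards) - 1
    targets = []
    for i, board in enumerate(boards):
        successor = boards[min(i + n, last)]  # clamp at the final board
        targets.append((board, value_fn(successor)))
    return targets


if __name__ == "__main__":
    game = ["b0", "b1", "b2"]  # placeholder boards
    estimates = {"b0": 0.1, "b1": 0.4, "b2": 1.0}
    print(targets_from_outcome(game, "win"))
    print(bootstrapped_targets(game, estimates.__getitem__, n=1))
```

The resulting (board, target) pairs are exactly what a weight-tuning step consumes as its desired values V*(b).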
A General Framework • Machine Learning: • Mathematical Modeling (Statistics) + Finding Optimal Parameters (Optimization)
Today’s Topics • Why machine learning? • Example: learning to play backgammon • General issues in machine learning
Important Issues in Machine Learning • Obtaining experience • How to obtain experience? • Supervised learning vs. unsupervised learning • How many examples are enough? • PAC learning theory • Learning algorithms • Which algorithms can approximate the target function well, and when? • How does the complexity of a learning algorithm affect its accuracy? • Is the target function learnable? • Representing inputs • How to represent the inputs? • How to remove irrelevant information from the input representation? • How to reduce redundancy in the input representation?