
Machine Learning


Presentation Transcript


  1. Machine Learning SCE 5820: Machine Learning Instructor: Jinbo Bi Computer Science and Engineering Dept.

  2. Course Information • Instructor: Dr. Jinbo Bi • Office: ITEB 233 • Phone: 860-486-1458 • Email: jinbo@engr.uconn.edu • Web: http://www.engr.uconn.edu/~jinbo/ • Time: Tue / Thur. 2:00pm – 3:15pm • Location: BCH 302 • Office hours: Thur. 3:15-4:15pm • HuskyCT • http://learn.uconn.edu • Login with your NetID and password

  3. Regression and classification • Both regression and classification problems are typically supervised learning problems • The main property of supervised learning: each training example contains the input variables and the corresponding target label • The goal is to find a good mapping from the input variables to the target variable

  4. Classification: Definition • Given a collection of examples (training set) • Each example contains a set of variables (features) and the target variable (class). • Find a model for the class attribute as a function of the values of the other variables. • Goal: previously unseen examples should be assigned a class as accurately as possible. • A test set is used to determine the accuracy of the model. Usually, the given data set is divided into training and test sets, with the training set used to build the model and the test set used to validate it, as in the sketch below.
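As a concrete illustration of the train/test protocol just described, here is a minimal scikit-learn sketch (the synthetic data, the 70/30 split ratio, and the choice of a decision tree are assumptions for illustration, not part of the slides):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Toy data: X holds the input variables (features), y the class labels
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Divide the given data into a training set and a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Build the model on the training set ...
model = DecisionTreeClassifier().fit(X_train, y_train)

# ... and use the test set to estimate how accurately unseen examples are classified
print(accuracy_score(y_test, model.predict(X_test)))
```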

  5. Classification: Application 1 • Fraud detection – goal: predict fraudulent cases in credit card transactions • Training set: past transaction records (categorical and continuous attributes), labeled as fraudulent or not • Learn a classifier (model) from the training set • Test set: current data; apply the model to predict the class

  6. Classification: Application 2 • Handwritten Digit Recognition • Goal: Identify the digit of a handwritten number • Approach: • Align all images to derive the features • Model the class (identity) based on these features

  7. Illustrating Classification Task

  8. Classification algorithms • K-Nearest-Neighbor classifiers • Naïve Bayes classifier • Neural Networks • Linear Discriminant Analysis (LDA) • Support Vector Machines (SVM) • Decision Tree • Logistic Regression • Graphical models

  9. Regression: Definition • Goal: predict the value of one or more continuous target attributes given the values of the input attributes • The difference between classification and regression lies only in the target attribute • Classification: discrete or categorical target • Regression: continuous target • Widely studied in statistics and the neural network field

  10. Regression: Application 1 • Goal: predict the possible loss from a customer (continuous target) • Training set: past transaction records (categorical and continuous attributes), labeled with the loss • Learn a regressor from the training set; apply the model to the test set (current data) to predict the loss

  Tid   Refund   Marital Status   Taxable Income   Loss
  1     Yes      Single           125K              100
  2     No       Married          100K              120
  3     No       Single            70K             -200
  4     Yes      Married          120K             -300
  5     No       Divorced          95K             -400
  6     No       Married           60K             -500
  7     Yes      Divorced         220K             -190
  8     No       Single            85K              300
  9     No       Married           75K             -240
  10    No       Single            90K               90

  11. Regression applications • Examples: • Predicting sales amounts of new product based on advertising expenditure. • Predicting wind velocities as a function of temperature, humidity, air pressure, etc. • Time series prediction of stock market indices.

  12. Regression algorithms • Least squares methods • Regularized linear regression (ridge regression) • Neural networks • Support vector machines (SVM) • Bayesian linear regression

  13. Practical issues in training • Underfitting • Overfitting • Before introducing these important concepts, let us study a simple regression algorithm – linear regression

  14. Least squares • We wish to use some real-valued input variables x to predict the value of a target y • We collect training data of pairs (x_i, y_i), i = 1, …, N • Suppose we have a model f that maps each example x to a predicted value y’ • Sum-of-squares function: the sum of the squares of the deviations between the observed target value y and the predicted value y’
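In symbols, the criterion described above is commonly written as (the factor 1/2 is a standard convention and only rescales the objective):

\[ E(f) = \frac{1}{2}\sum_{i=1}^{N}\big(f(x_i) - y_i\big)^2 \]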

  15. Least squares • Find a function f such that the sum of squares is minimized • For example, your function is in the form of a linear function f(x) = w^T x • Least squares with a linear function of the parameters w is called “linear regression”

  16. Linear regression • Linear regression has a closed-form solution for w • The minimum is attained at the zero derivative
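A sketch of how the closed form falls out, in the usual matrix notation (X denotes the N×d matrix whose rows are the training inputs and y the vector of targets; this notation is assumed here rather than taken from the slide):

\[ E(w) = \tfrac{1}{2}\,\|Xw - y\|^2, \qquad \nabla_w E = X^{\top}(Xw - y) = 0 \;\Rightarrow\; w^{*} = (X^{\top}X)^{-1}X^{\top}y, \]

assuming X^T X is invertible.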

  17. Polynomial Curve Fitting • x is evenly spaced on [0, 1] • y = f(x) + random error • y = sin(2πx) + ε, ε ~ N(0, σ)
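A minimal numpy sketch of this toy setup (the sample size N = 10 and noise level σ = 0.3 are illustrative assumptions, not values from the slides):

```python
import numpy as np

# Toy data: x evenly spaced on [0, 1], y = sin(2*pi*x) + Gaussian noise
rng = np.random.default_rng(0)
N, sigma = 10, 0.3
x = np.linspace(0.0, 1.0, N)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, sigma, N)

# Fit polynomials of increasing order by least squares and
# compare their sum-of-squares errors on the training points
for order in (0, 1, 3, 9):
    w = np.polyfit(x, y, order)           # coefficients, highest degree first
    y_hat = np.polyval(w, x)              # predictions at the training inputs
    sse = 0.5 * np.sum((y_hat - y) ** 2)  # sum-of-squares error
    print(f"order {order}: training error {sse:.4f}")
```

The 9th-order fit drives the training error to nearly zero while oscillating wildly between the points, which is exactly the over-fitting behaviour the next slides illustrate.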

  18. Polynomial Curve Fitting

  19. Sum-of-Squares Error Function

  20. 0th Order Polynomial

  21. 1st Order Polynomial

  22. 3rd Order Polynomial

  23. 9th Order Polynomial

  24. Over-fitting Root-Mean-Square (RMS) Error:
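The RMS error referred to here is conventionally defined from the minimized sum-of-squares error E(w*) over a set of N examples as

\[ E_{\mathrm{RMS}} = \sqrt{2E(w^{*})/N}, \]

which puts errors measured on training and test sets of different sizes on a comparable scale.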

  25. Polynomial Coefficients

  26. Data Set Size: 9th Order Polynomial

  27. Data Set Size: 9th Order Polynomial

  28. Regularization • Penalize large coefficient values • Ridge regression
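Penalizing large coefficient values amounts to adding a penalty term to the sum-of-squares error; in the standard ridge-regression form, with λ controlling the trade-off between data fit and coefficient size:

\[ \tilde{E}(w) = \frac{1}{2}\sum_{i=1}^{N}\big(f(x_i; w) - y_i\big)^2 + \frac{\lambda}{2}\|w\|^2 \]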

  29. Regularization:

  30. Regularization:

  31. Regularization: vs.

  32. Polynomial Coefficients

  33. Ridge Regression • Derive the analytic solution to the optimization problem for ridge regression • Using the KKT condition – first-order derivative = 0
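A sketch of that derivation, in the same matrix notation used for ordinary least squares above:

\[ \nabla_w\Big(\tfrac{1}{2}\|Xw - y\|^2 + \tfrac{\lambda}{2}\|w\|^2\Big) = X^{\top}(Xw - y) + \lambda w = 0 \;\Rightarrow\; w^{*} = (X^{\top}X + \lambda I)^{-1}X^{\top}y \]

Unlike X^T X alone, X^T X + λI is always invertible for λ > 0, which is one practical benefit of the regularizer.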

  34. Neural networks • Introduction • Different designs of NN • Feed-forward Network (MLP) • Network Training • Error Back-propagation • Regularization

  35. Introduction • Neuroscience studies how networks of neurons produce intellectual behavior, cognition, emotion and physiological responses • Computer science studies how to simulate knowledge in cognitive science, including the way neurons process signals • Artificial neural networks simulate the connectivity of the neural system and the way signals pass through it, and mimic the massively parallel operations of the human brain

  36. Common features • Dendrites

  37. Different types of NN • Adaptive NN: have a set of adjustable parameters that can be tuned • Topological NN • Recurrent NN

  38. Different types of NN • Feed-forward NN • Multi-layer perceptron • Linear perceptron (figure: input layer → hidden layer → output layer)

  39. Different types of NN • Radial basis function NN (RBFN)

  40. Multi-Layer Perceptron • Layered perceptron networks can realize any logical function; however, there is no simple way to estimate the parameters or to generalize the (single-layer) perceptron convergence procedure • Multi-layer perceptron (MLP) networks are a class of models formed from layered sigmoidal nodes, which can be used for regression or classification purposes. • They are commonly trained using gradient descent on a mean-squared-error performance function, using a technique known as error back-propagation to calculate the gradients. • Widely applied to many prediction and classification problems over the past 15 years.

  41. Linear perceptron • Inputs x1 … xt with weights w1 … wt feed a summation node Σ (input layer → output layer) • y = w1*x1 + w2*x2 + … + wt*xt • Many functions cannot be approximated using a perceptron

  42. Multi-Layer Perceptron • XOR (exclusive OR) problem • 0+0=0 • 1+1=2=0 mod 2 • 1+0=1 • 0+1=1 • The perceptron does not work here! A single layer generates only a linear decision boundary (a hand-built two-layer fix is sketched below)
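One way to see why a hidden layer fixes this: XOR can be written as OR-and-not-AND, each of which a single threshold unit can compute. A minimal sketch with hand-picked (not learned) weights:

```python
import numpy as np

def step(z):
    # Threshold activation: 1 if the activation is positive, else 0
    return (z > 0).astype(int)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

# Hidden layer: one unit computes OR, the other computes AND
h_or  = step(X @ np.array([1, 1]) - 0.5)
h_and = step(X @ np.array([1, 1]) - 1.5)

# Output unit: OR and not-AND, i.e. XOR
y = step(h_or - h_and - 0.5)
print(y)  # [0 1 1 0]
```

No single linear unit can produce that output, since the four points are not linearly separable.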

  43. Multi-Layer Perceptron • Input layer → hidden layer → output layer (figure: inputs x1 … xt connect through weights W^(1) to hidden units f(Σ), which connect through weights W^(2) to the output y) • Each link is associated with a weight, and these weights are the tuning parameters to be learned • Each neuron except those in the input layer receives inputs from the previous layer and reports an output to the next layer

  44. Each neuron is a weighted summation of its inputs (weights w1 … wn) followed by an activation function f • The activation function f can be • Identity function f(x) = x • Sigmoid function • Hyperbolic tangent
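A small Python sketch of these choices, with a single neuron built from a weighted summation followed by the activation (the explicit bias term b is an assumption; slides often fold it into the weights):

```python
import numpy as np

def identity(a):
    return a

def sigmoid(a):
    # Logistic sigmoid: squashes the activation into (0, 1)
    return 1.0 / (1.0 + np.exp(-a))

def tanh(a):
    # Hyperbolic tangent: squashes the activation into (-1, 1)
    return np.tanh(a)

def neuron(x, w, b=0.0, f=sigmoid):
    # Weighted summation of the inputs, then the activation function
    return f(np.dot(w, x) + b)
```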

  45. Universal Approximation of MLP (figure: 1st layer → 2nd layer → 3rd layer) • Universal approximation: a three-layer network can in principle approximate any function to any accuracy!

  46. Feed-forward network function • Signals flow forward from the inputs x1, x2, …, xt through the hidden layer to the output y • The output from each hidden node: an activation applied to a weighted sum of the inputs • The final output: computed from the hidden-node outputs (see the sketch below)
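A minimal sketch of that forward computation for one hidden layer (the shapes, the choice of tanh hidden units, and the linear output are assumptions; biases are omitted):

```python
import numpy as np

def forward(x, W1, W2, f=np.tanh):
    # Hidden-node outputs: activation of a weighted sum of the inputs
    # (W1 has one row per hidden node, one column per input)
    z = f(W1 @ x)
    # Final output: a weighted sum of the hidden-node outputs
    # (linear output unit, as is typical for regression)
    return W2 @ z
```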

  47. Network Training • A supervised neural network is a function h(x;w) that maps from inputs x to targets y • Usually, training a NN does not involve changing the NN structure (such as how many hidden layers or how many hidden nodes) • Training a NN refers to adjusting the values of the connection weights so that h(x;w) adapts to the problem • Use the sum of squares as the error metric • Use gradient descent to minimize it
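With the sum-of-squares metric, the training error over N examples (x_n, y_n) is

\[ E(w) = \frac{1}{2}\sum_{n=1}^{N}\big\|h(x_n; w) - y_n\big\|^2, \]

and gradient descent adjusts the weights w to reduce it.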

  48. Gradient descent • Review of gradient descent • Iterative algorithm containing many iterations • In each iteration, the weights w receive a small update • Terminate • when the network is stable (in other words, the training error cannot be reduced further, i.e. E(w_new) < E(w) no longer holds) • or when the error on a validation set starts to climb up (early stopping); see the sketch below
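A minimal sketch of such a loop with both termination rules (the learning rate, the patience counter, and the E / grad_E callables supplied by the model are assumptions, not part of the slide):

```python
import numpy as np

def train(w, E, grad_E, train_data, val_data, lr=0.01, max_iter=10000, patience=10):
    # Plain gradient descent with the two termination rules from the slide:
    # stop when the training error no longer decreases, or when the
    # validation error starts to climb (early stopping).
    best_w, best_val, bad = w.copy(), np.inf, 0
    prev_train = E(w, train_data)
    for _ in range(max_iter):
        w_new = w - lr * grad_E(w, train_data)    # small update each iteration
        new_train = E(w_new, train_data)
        if not new_train < prev_train:            # E(w_new) < E(w) no longer holds
            break                                 # -> the network is stable
        w, prev_train = w_new, new_train
        val_err = E(w, val_data)
        if val_err < best_val:
            best_w, best_val, bad = w.copy(), val_err, 0
        else:
            bad += 1
            if bad >= patience:                   # validation error climbing: early stop
                break
    return best_w
```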

  49. Error Back-propagation • Signals flow forwards (figure: inputs x1, x2, …, xt, weights W_ij and W_jk, output y = h(x;w)) • Learning is backwards: the update of the weights goes backwards because we have to use the chain rule to evaluate the gradient of E(w)

  50. Error Back-propagation • Learning is backwards • Update the weights in the output layer first • Propagate errors from the higher layers to the lower layers
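A sketch of one back-propagation step for a single-hidden-layer network (tanh hidden units, a linear output, and squared error are assumptions consistent with the earlier slides; biases omitted):

```python
import numpy as np

def backprop_step(x, y, W1, W2, lr=0.01):
    # Forward pass (signals flow forwards)
    a = W1 @ x
    z = np.tanh(a)            # hidden-node outputs
    y_hat = W2 @ z            # network output h(x; w)

    # Backward pass (errors flow backwards, via the chain rule)
    delta_out = y_hat - y                          # error at the output layer
    delta_hid = (W2.T @ delta_out) * (1 - z ** 2)  # propagated through tanh'

    # Output-layer weights are updated first, then the hidden-layer weights
    W2 -= lr * np.outer(delta_out, z)
    W1 -= lr * np.outer(delta_hid, x)
    return W1, W2
```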
