
Linear Models for Prediction with Numeric Attributes

Explore linear regression and other linear models for predicting the behavior of entities based on numeric attributes. Learn how to minimize error and find the best-fit line for the data.


Presentation Transcript


  1. Data Mining, CSCI 307, Spring 2019, Lecture 19: Linear Models

  2. Linear Models: Work Naturally with Numeric Attributes

  • Linear Regression
    • Does reasonably well even with a small training set.
    • Finds a weight for each attribute.
    • Plug in a new instance's values to produce the class value.
    • The class value is continuous; the model produces a line.
  • Other Models
    • Logistic Regression: the predictors are continuous, but the class value is discrete (e.g. dead or alive); produces an S-shaped division.
    • The Perceptron algorithm (additive weight-update scheme).
    • The Winnow algorithm (multiplicative weight-update scheme).

  3. Goal: Build a Mathematical Model to Predict the Behavior of a Group/Entity

  The variables:
  • Explanatory (predictors): x1, x2, ..., xn. Attributes (one or many) of the entity; may be numerical (continuous or discrete) or categorical (two or more groups, often binary), with categories mapped to numerical values (0, 1).
  • Response: y. Quantifies the behavior of the entity; may be numerical or categorical.

  4. Model #1: Linear

  Simple linear model: Ypred = a + bX, where x and y are often both continuous real values.

  Given a known Xi, the prediction is Yi_pred = a + b*Xi, where a and b are unknown. We want to find a and b based on the known input data (the attributes).

  Linear regression finds the line that best fits the data. We do this by minimizing the error between the predicted y and the actual y.

  5. Linear Regression Error

  Model: Y = a + bX. Given M training pairs {(xi, yi)}, i = 1, ..., M, find a and b such that the sum of squared differences between the predicted and actual values of y,

      E = sum_{i=1..M} (yi - (a + b*xi))^2,

  is minimized (this is the least squares method).
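For a simple line, the least-squares minimization above has a well-known closed-form solution: the slope is the sum of cross-deviations over the sum of squared x-deviations, and the intercept follows from the means. A minimal sketch (the function name fit_line is hypothetical, not from the slides):

```python
def fit_line(xs, ys):
    """Fit y = a + b*x by minimizing the sum of squared errors.

    Closed form: b = sum((x - mean_x)(y - mean_y)) / sum((x - mean_x)^2),
                 a = mean_y - b * mean_x.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
    a = mean_y - b * mean_x
    return a, b

# Points lying exactly on y = 1 + 2x recover a = 1, b = 2.
a, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
```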

  6. Regression Analysis

  Regression analysis: a collective name for techniques for the modeling and analysis of numerical data consisting of values of a dependent variable (also called the response variable or measurement) and of one or more independent variables (also called explanatory variables or predictors).

  [Figure: data points with the fitted line y = x + 1; axes x and y.]

  • The parameters are estimated so as to give a "best fit" of the data.
  • Most commonly the best fit is evaluated using the least squares method, but other criteria have also been used.

  7. Linear Models: Linear Regression

  • Work most naturally with numeric attributes.
  • Standard technique for numeric prediction.
  • Outcome is a linear combination of the attributes:
        x = w0 + w1*a1 + w2*a2 + ... + wk*ak
  • Weights are calculated from the training data.
  • Predicted value for the first training instance a(1), assuming each instance is extended with a constant attribute a0 with value 1:
        w0*a0(1) + w1*a1(1) + w2*a2(1) + ... + wk*ak(1)
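The prediction step is just the weighted sum above. A small sketch, assuming the instance is extended with the constant attribute a0 = 1 as on the slide (the function name predict is hypothetical):

```python
def predict(weights, attrs):
    """Linear model output: w0*a0 + w1*a1 + ... + wk*ak,
    where attrs is first extended with the constant attribute a0 = 1."""
    extended = [1.0] + list(attrs)
    return sum(w * a for w, a in zip(weights, extended))

# With weights [w0, w1, w2] = [2.0, 0.5, -1.0] and attributes [4.0, 3.0]:
# x = 2.0 + 0.5*4.0 - 1.0*3.0 = 1.0
x = predict([2.0, 0.5, -1.0], [4.0, 3.0])
```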

  8. Minimizing the Squared Error

  • Choose the k+1 coefficients to minimize the squared error on the training data.
  • Squared error (over n training instances, with x(i) the actual outcome of instance i):
        E = sum_{i=1..n} ( x(i) - sum_{j=0..k} wj*aj(i) )^2
  • Derive the coefficients using standard matrix operations.
  • Can be done if there are more instances than attributes (roughly speaking).
  • Minimizing the absolute error is more difficult.
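The "standard matrix operations" are the normal equations: stacking the (extended) instances into a matrix A and the outcomes into a vector x, the minimizing weights satisfy (AᵀA) w = Aᵀx. A sketch with NumPy on toy data (the data values are illustrative, not from the slides):

```python
import numpy as np

# One row per training instance; the first column is the constant attribute a0 = 1.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
x = np.array([1.0, 3.0, 5.0, 7.0])   # actual outcome values

# Normal equations: (A^T A) w = A^T x. Solvable when A^T A is invertible,
# which (roughly speaking) requires more instances than attributes.
w = np.linalg.solve(A.T @ A, A.T @ x)
```

Here the data lie exactly on x = 1 + 2*a1, so the recovered weights are w = [1, 2]; with noisy data the same computation gives the least-squares fit.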

  9. Linear Regression

  • Most common type of regression analysis.
  • Assumes a linear relation exists between the dependent variable and the independent variable(s) we choose to evaluate.
  • Produces an equation (or "model") for a "best fit" line describing the relation.
  • If the data exhibits a nonlinear dependency, a line will be found anyway, and it may not fit well.

  10. Classification

  • Any regression technique can be used for classification.
  • Training: perform a regression for each class, setting the output to 1 for training instances that belong to the class and 0 for those that don't.
  • Prediction: predict the class whose model produces the largest output value (membership value).
  • For linear regression this is known as multi-response linear regression.
  • Problem: membership values are not confined to the [0,1] range, so they aren't proper probability estimates.
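The train-one-regression-per-class scheme above can be sketched as follows; this is a minimal illustration with hypothetical function names and toy one-attribute data, using least squares for each per-class regression:

```python
import numpy as np

def fit_multiresponse(A, labels, classes):
    """One least-squares regression per class: the target is 1 for
    instances of that class and 0 for all others."""
    ws = {}
    for c in classes:
        target = np.where(labels == c, 1.0, 0.0)
        ws[c], *_ = np.linalg.lstsq(A, target, rcond=None)
    return ws

def classify(ws, a):
    """Predict the class whose model gives the largest output (membership value)."""
    return max(ws, key=lambda c: a @ ws[c])

# Toy data, constant attribute prepended: class 'a' near a1=0, class 'b' near a1=5.
A = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 4.0], [1.0, 5.0]])
labels = np.array(['a', 'a', 'b', 'b'])
ws = fit_multiresponse(A, labels, ['a', 'b'])
```

Note that the per-class outputs can fall outside [0,1] for instances far from the training data, which is exactly the probability-estimate problem the slide points out.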
