Prediction with Regression: An Introduction to Linear Regression and Shrinkage Methods
Ehsan Khoddam Mohammadi
Outline • Prediction • Estimation • Bias-Variance Trade-Off • Regression • Ordinary Least Squares • Ridge Regression • Lasso
Prediction: definition • Set of inputs: $X_1, X_2, \ldots, X_p$ • The output: $Y$ • We want to analyze the relationship between these variables (interpretation) • We want to estimate the output based on the inputs (prediction)
Prediction: the same concept in different fields • Machine learning: supervised learning • Finance: forecasting • Politics: prediction • Estimation theory: function approximation
Regression: why? • Performs well and is accurate for both interpretation and prediction • Strong foundations in mathematics, statistics, and computation • Many modern and advanced methods are based on regression, or are variants of it • New regression methods are still being invented; Nobel prizes are still awarded to work built on regression; it remains a hot topic • It can be formulated as an optimization problem: that is why I chose it for this class, since it is more closely related to the subject of the class than any other prediction method I know
Regression: classification • Linear Regression • Least squares • Best-subset selection (regression with feature selection) • Stepwise regression • Shrinkage (regularization) for regression: • Ridge Regression • Lasso Regression • Non-linear Regression • Numerical data fitting • ANN • Discrete regression • Logistic Regression
Before proceeding with regression, let's investigate some statistical properties of ESTIMATION
Estimating the parameter • Assume that we have i.i.d. (independent and identically distributed) samples $X_1, \ldots, X_n$ from an unknown distribution. • Estimating their p.d.f. is too hard in many situations; instead, we want to estimate a parameter $\theta$. • $\hat{\theta}$ is an estimate of $\theta$; it is a function of $X_1, \ldots, X_n$.
Bias-Variance dilemma • Definition 1: The bias of an estimator is $\mathrm{Bias}(\hat{\theta}) = E[\hat{\theta}] - \theta$. If it is 0, the estimator is said to be unbiased. • Definition 2: The mean squared error (MSE) of an estimator is $\mathrm{MSE}(\hat{\theta}) = E\big[(\hat{\theta} - \theta)^2\big]$. • An interesting equation: $\mathrm{MSE}(\hat{\theta}) = \mathrm{Bias}(\hat{\theta})^2 + \mathrm{Var}(\hat{\theta})$. What does it really mean? (see the derivation below)
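To unpack what this "interesting equation" says, here is a short worked derivation of the bias-variance decomposition (standard textbook material, not taken from the slides):

```latex
\begin{align*}
\mathrm{MSE}(\hat{\theta})
  &= E\big[(\hat{\theta} - \theta)^2\big] \\
  &= E\big[(\hat{\theta} - E[\hat{\theta}] + E[\hat{\theta}] - \theta)^2\big] \\
  &= E\big[(\hat{\theta} - E[\hat{\theta}])^2\big]
     + 2\,(E[\hat{\theta}] - \theta)\,E\big[\hat{\theta} - E[\hat{\theta}]\big]
     + (E[\hat{\theta}] - \theta)^2 \\
  &= \mathrm{Var}(\hat{\theta}) + \mathrm{Bias}(\hat{\theta})^2
\end{align*}
```

The cross term vanishes because $E\big[\hat{\theta} - E[\hat{\theta}]\big] = 0$. So a biased estimator can have a lower MSE than an unbiased one if its variance is sufficiently smaller; this is exactly the trade-off that ridge and lasso exploit.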
[Image from “More on Regularization and (Generalized) Ridge Operators”, Takane (2007)]
Test and training error as a function of model complexity. [Image from “The Elements of Statistical Learning”, Second Edition, Hastie et al. (2008)]
Linear Regression: model • Set of training data: $(x_1, y_1), \ldots, (x_N, y_N)$, with $x_i = (x_{i1}, \ldots, x_{ip})$ • Linear regression model: $f(X) = \beta_0 + \sum_{j=1}^{p} X_j \beta_j$ • The real-valued coefficients $\beta$ need to be estimated
Linear Regression: least squares • Most popular estimation method • Minimize the Residual Sum of Squares: $\mathrm{RSS}(\beta) = \sum_{i=1}^{N} \big(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j\big)^2$ • How do we minimize it?
Linear Regression: least squares • Let's rewrite the last formula in matrix form: $\mathrm{RSS}(\beta) = (\mathbf{y} - \mathbf{X}\beta)^T (\mathbf{y} - \mathbf{X}\beta)$ • A quadratic function in $\beta$ (not the point here, but we shall use this property later) • Differentiating with respect to $\beta$ and setting it to zero: $\mathbf{X}^T(\mathbf{y} - \mathbf{X}\beta) = 0$ • Unique solution: $\hat{\beta} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$; under which assumptions can we obtain a unique solution?
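A minimal numpy sketch of this closed-form fit (my own illustration; the synthetic data and variable names are assumptions, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: N samples, p features, known coefficients plus noise
N, p = 100, 3
X = rng.normal(size=(N, p))
beta_true = np.array([2.0, -1.0, 0.5])
y = 1.0 + X @ beta_true + rng.normal(scale=0.1, size=N)

# Prepend a column of ones so the intercept beta_0 is estimated too
X1 = np.hstack([np.ones((N, 1)), X])

# Normal equations: beta_hat = (X^T X)^{-1} X^T y
beta_hat = np.linalg.solve(X1.T @ X1, X1.T @ y)
print(beta_hat)  # close to [1.0, 2.0, -1.0, 0.5]
```

In practice `np.linalg.lstsq` (or a QR decomposition) is usually preferred over forming $\mathbf{X}^T\mathbf{X}$ explicitly, for numerical stability.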
Linear Regression: least squares, assumptions • $\mathbf{X}$ should be of full column rank; hence $\mathbf{X}^T\mathbf{X}$ is positive definite and invertible, and a unique solution can be obtained • In other words, the feature vectors should be linearly independent (not perfectly correlated) • What happens to $\beta$ if $\mathbf{X}$ is not of full rank, or if some features are highly correlated? (see the sketch below)
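A small experiment (again my own, on assumed synthetic data) hinting at the answer: with two nearly collinear features, the individual OLS coefficients become wildly unstable from sample to sample, even though their sum stays well determined.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 50

coefs = []
for _ in range(200):
    x1 = rng.normal(size=N)
    x2 = x1 + rng.normal(scale=1e-3, size=N)  # nearly collinear with x1
    X = np.column_stack([x1, x2])
    y = x1 + rng.normal(scale=0.1, size=N)    # true signal only depends on x1
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    coefs.append(beta_hat)

coefs = np.array(coefs)
# Individual coefficients vary wildly across resamples...
print(coefs.std(axis=0))        # large standard deviations
# ...even though their sum stays close to 1
print(coefs.sum(axis=1).std())  # much smaller
```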
Linear Regression: least squares, flaws • Low bias but high variance: $\mathrm{Var}(\hat{\beta}) = (\mathbf{X}^T\mathbf{X})^{-1}\sigma^2$, and one can estimate the noise variance by $\hat{\sigma}^2 = \frac{1}{N - p - 1}\sum_{i=1}^{N}(y_i - \hat{y}_i)^2$ • It's hard to find a meaningful relation if we have too many features • What would you recommend to solve these problems?
Linear Regression: improvements • Model selection (feature selection): • Best-subset selection (leaps and bounds, Furnival and Wilson (1974)) • Stepwise selection (greedy approach, sub-optimal but preferred) • mRMR (using a mutual information criterion for selection) • Shrinkage methods: impose a constraint on $\beta$ • Ridge Regression • Lasso Regression
Ridge Regression • When you have a problem to be solved in statistics, there is always a Russian statistician waiting to solve it for you. (Be careful! I guarantee this only in statistics; they will betray you in any other situation.) • Andrey Nikolayevich Tychonoff provided Tikhonov (!!!) regularization for ill-posed problems, also known as Ridge Regression in statistics.
Ridge Regression: first attempt • Remember this? $\hat{\beta} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$ • Tychonoff added a term to avoid singularity and changed the formula above to: $\hat{\beta}^{\mathrm{ridge}} = (\mathbf{X}^T\mathbf{X} + \lambda\mathbf{I})^{-1}\mathbf{X}^T\mathbf{y}$ • Now the inverse can be computed even if $\mathbf{X}^T\mathbf{X}$ is not of full rank; also, $\beta$ is still a linear function of $\mathbf{y}$ • Everything starts from the formula above, but now we have a better point of view than Tychonoff; let's take a look!
Ridge Regression: better motivation • To avoid the high variance of $\beta$ we simply impose a constraint on it; our problem is now a constrained optimization problem: $\hat{\beta}^{\mathrm{ridge}} = \arg\min_{\beta} \sum_{i=1}^{N} \big(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j\big)^2 \quad \text{subject to} \quad \sum_{j=1}^{p} \beta_j^2 \le t$
• An even better representation, using the Lagrangian form: $\hat{\beta}^{\mathrm{ridge}} = \arg\min_{\beta} \Big\{ \sum_{i=1}^{N} \big(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j\big)^2 + \lambda \sum_{j=1}^{p} \beta_j^2 \Big\}$ • Or, again even better, in matrix form: $\mathrm{RSS}(\lambda) = (\mathbf{y} - \mathbf{X}\beta)^T(\mathbf{y} - \mathbf{X}\beta) + \lambda\,\beta^T\beta$; we can differentiate this formula and set it to zero • Could you guess the solution? Could you find a relation between $\hat{\beta}$ and $\hat{\beta}^{\mathrm{ridge}}$ when the inputs are orthonormal?
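A minimal sketch of the ridge solution $(\mathbf{X}^T\mathbf{X} + \lambda\mathbf{I})^{-1}\mathbf{X}^T\mathbf{y}$, reusing the nearly collinear data from the earlier sketch (centering is assumed so the intercept is not penalized; the function name and data are my own, not from the slides):

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Ridge solution (X^T X + lam*I)^{-1} X^T y; assumes X and y are centered."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

rng = np.random.default_rng(2)
N = 50
x1 = rng.normal(size=N)
x2 = x1 + rng.normal(scale=1e-3, size=N)   # nearly collinear features
X = np.column_stack([x1, x2])
y = x1 + rng.normal(scale=0.1, size=N)

X_c = X - X.mean(axis=0)                   # center so no intercept term is needed
y_c = y - y.mean()

print(ridge_fit(X_c, y_c, lam=0.0))        # (near-)OLS: large, offsetting coefficients
print(ridge_fit(X_c, y_c, lam=1.0))        # ridge: both shrunk toward roughly 0.5
```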
LASSO Least Absolute Shrinkage and Selection Operator
LASSO • We impose an L1-norm constraint on our regression: $\hat{\beta}^{\mathrm{lasso}} = \arg\min_{\beta} \sum_{i=1}^{N} \big(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j\big)^2 \quad \text{subject to} \quad \sum_{j=1}^{p} |\beta_j| \le t$ • No closed form exists; the solution is a non-linear function of $\mathbf{y}$ • How could you solve the above problem? (hint: ask Mr. Iranmehr!) One possible approach is sketched below.
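One standard way to solve it, sketched below under my own assumptions (centered data, a hand-rolled coordinate descent with soft thresholding; in practice a library such as scikit-learn's `Lasso` would be used):

```python
import numpy as np

def soft_threshold(z, lam):
    """Soft-thresholding operator S(z, lam) = sign(z) * max(|z| - lam, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for (1/2)||y - X b||^2 + lam * ||b||_1.
    Assumes X and y are centered (no intercept)."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual: remove every feature's contribution except feature j's
            r_j = y - X @ beta + X[:, j] * beta[j]
            beta[j] = soft_threshold(X[:, j] @ r_j, lam) / (X[:, j] @ X[:, j])
    return beta

rng = np.random.default_rng(3)
N, p = 100, 5
X = rng.normal(size=(N, p))
beta_true = np.array([3.0, 0.0, 0.0, -2.0, 0.0])   # sparse true model
y = X @ beta_true + rng.normal(scale=0.1, size=N)

X_c, y_c = X - X.mean(axis=0), y - y.mean()
print(lasso_cd(X_c, y_c, lam=5.0))  # irrelevant features get exact zeros; the rest are slightly shrunk
```

Note how the L1 penalty drives some coefficients exactly to zero, which is what makes the lasso useful for sparse model selection.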
LASSO: why? • One of the first uses of the L1-norm; showed significant results in signal processing and denoising [Chen et al. (1998)] • The basis for LAR (a newer method for regression, not covered here) [Efron et al. (2004)] • Good for sparse model selection where p > N [Donoho (2006b)]
REFERENCES • “The Elements of Statistical Learning”, Second Edition, Hastie et al., 2008 • “More on Regularization and (Generalized) Ridge Operators”, Takane, 2007 • “Bias, Variance and MSE of Estimators”, Guy Lebanon, 2004 • “Least Squares Optimization with L1-Norm Regularization”, Mark Schmidt, 2005 • “Regularization: Ridge Regression and the LASSO”, Tibshirani, 2006