Lecture 1a: Linear regression with one predictor variable 732G21/732A35/732G28
Course structure
• 732G21 Sambandsmodeller, http://www.ida.liu.se/~732G21: one semester = regression analysis + analysis of variance (teacher: Lotta Hallberg)
• 732G28 Regression methods, http://www.ida.liu.se/~732G28: half a semester = regression analysis
• 732A35 Linear statistical models, http://www.ida.liu.se/~732A22: almost one semester = regression analysis + analysis of variance (teacher: Lotta Hallberg)
Course structure (regression part)
• Course language: English, but you may use Swedish
• We use It’s learning (accessed via Student portal) (show…)
• 9 lectures
• 8 labs (computer). Deadlines are around 5 days after each lab ends
• 8 lessons = I solve problems on the whiteboard + lab discussion
• One written final exam
• Course book: Kutner, M.H., Nachtsheim, C.J., Neter, J. and Li, W. Applied Linear Statistical Models with Student Data CD, 5th Edition, ISBN 0073108742.
Regression analysis
• Linear statistical models are widely used in
  • Business
  • Economics
  • Engineering
  • Social and biological sciences
  • etc.
Example: A database contains the prices of houses sold in Linköping in 2009, together with their age, size and other parameters.
• Given the parameters of a new house:
  • Determine its approximate market price
  • Determine reasonable price bounds
What we analyse
• Analysis of databases
  • Observations (records, cases) in rows
  • Variables in columns
• Explanatory variables (predictors, inputs) Xi
• Response Y; we assume Y = f(X1, …, Xn)
In this lecture: models with only one explanatory variable
Statistical relation and functional relation
• Real data can seldom be represented exactly as Y = βX (observation errors, missing inputs, etc.)
Example: Age and salary for a sample of eight persons from a company.
[Figure: scatterplot of age and salary]
Statistical relation and functional relation
• The relation shown in the scatterplot is almost linear
• Linear regression analysis: find a linear function that is as close as possible to the data
Regression models
• For each X, there is a probability distribution P(Y=y|X=x) of Y
• The aim is to find the regression function E(Y|X=x)
Regression models
Construction of regression models
• Selection of predictor variables (variance reduction)
• Functional form (from theory, approximation)
• Domain of the model
Software
• MINITAB
• SAS
• SPSS
• Matlab
• Excel
Simple linear model
Formal statement: Yi = β0 + β1·Xi + εi
• Yi is the i-th response value
• β0, β1 are the model parameters (regression parameters): intercept and slope
• Xi is the i-th predictor value
• εi are i.i.d. random variables with expectation zero and variance σ²
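To make the formal statement concrete, here is a minimal sketch that draws responses from a simple linear model. The predictor values, the parameter values and the choice of normally distributed errors are assumptions made only for illustration; the model itself only requires errors with expectation zero and variance σ².

```python
import random


def simulate_simple_linear_model(x_values, beta0, beta1, sigma, seed=0):
    """Draw one response Y_i = beta0 + beta1 * X_i + eps_i for each X_i."""
    rng = random.Random(seed)
    # Assumption: normal errors. The model only needs mean 0 and variance sigma^2.
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in x_values]


# Hypothetical predictor values and parameters, chosen only for illustration.
ages = [25, 30, 35, 40, 45, 50, 55, 60]
salaries = simulate_simple_linear_model(ages, beta0=10.0, beta1=0.5, sigma=2.0)

for x, y in zip(ages, salaries):
    # E(Y) is the value on the regression line; Y scatters around it.
    print(f"X = {x:2d}  E(Y) = {10.0 + 0.5 * x:5.1f}  Y = {y:6.2f}")
```

Running it with a larger `sigma` shows the points scattering further from the line, while `sigma=0.0` reproduces a purely functional relation.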
Simple linear model
Features (show…)
• All Yi and Yj (i ≠ j) are uncorrelated
Meaning of regression parameters
• β0: expected response at X = 0
• β1: change in E(Y) per unit increase in X
Estimation of regression function
Given data set (X1, Y1), …, (Xn, Yn)
Method of least squares:
• Observed response: Yi
• Estimated response: Ŷi = b0 + b1·Xi
• Deviation: Yi − Ŷi
• The regression fit is good when all deviations are small (see picture) -> minimize the sum of squares Q = Σ (Yi − b0 − b1·Xi)²
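Estimation of regression function
• How to find the minimum of Q? Set the partial derivatives of Q with respect to b0 and b1 to zero
• Estimators of β0 and β1:
  • b1 = Σ (Xi − X̄)(Yi − Ȳ) / Σ (Xi − X̄)²
  • b0 = Ȳ − b1·X̄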
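The minimization step can be written out as a short derivation. This is a sketch in standard notation, with Q denoting the sum of squared deviations between the observed responses and the line b0 + b1·X:

```latex
\begin{align*}
Q(b_0, b_1) &= \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i)^2 \\
\frac{\partial Q}{\partial b_0} &= -2 \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i) = 0 \\
\frac{\partial Q}{\partial b_1} &= -2 \sum_{i=1}^{n} X_i (Y_i - b_0 - b_1 X_i) = 0 \\
\text{Normal equations:}\quad
\sum_{i=1}^{n} Y_i &= n\, b_0 + b_1 \sum_{i=1}^{n} X_i \\
\sum_{i=1}^{n} X_i Y_i &= b_0 \sum_{i=1}^{n} X_i + b_1 \sum_{i=1}^{n} X_i^2
\end{align*}
```

Solving this system of two linear equations for b0 and b1 yields the least squares estimators.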
Estimation of regression function
Exercise (for salary data, MINITAB):
• Make a scatterplot (Scatterplot…, with and without regression line)
• Perform regression using ”Regression…”
• Perform regression using ”Fitted Line Plot…”
• Calculate the coefficients by hand
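The "by hand" calculation can also be sketched in a few lines of Python. The (age, salary) pairs below are hypothetical and are not the salary data used in the lecture; the function name `least_squares_fit` is likewise only illustrative.

```python
def least_squares_fit(x, y):
    """Return (b0, b1) that minimize sum((y_i - b0 - b1 * x_i) ** 2)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-products and sum of squares around the means.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical (age, salary) pairs; NOT the salary data from the lecture.
age = [25, 30, 35, 40, 45, 50, 55, 60]
salary = [22.0, 26.5, 27.0, 31.5, 32.0, 36.0, 37.5, 41.0]

b0, b1 = least_squares_fit(age, salary)
print(f"intercept b0 = {b0:.3f}, slope b1 = {b1:.3f}")
```

The printed coefficients can be compared with those reported by a statistics package for the same data.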
Estimation of regression function
Gauss-Markov theorem
• Estimators b0 and b1 are unbiased and have minimum variance among all unbiased linear estimators
• Unbiased: bias = E(b0) − β0 = 0, i.e. E(b0) = β0
• Analogously, E(b1) = β1
Show illustration…
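Unbiasedness can be illustrated with a small simulation: generate many data sets from a model with known β0 and β1, fit each one, and average the estimates. The sketch below assumes normal errors and uses hypothetical parameter values; it illustrates the unbiasedness claim only, not the minimum-variance part of the theorem.

```python
import random


def least_squares_fit(x, y):
    """Return (b0, b1) that minimize sum((y_i - b0 - b1 * x_i) ** 2)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def average_estimates(beta0, beta1, sigma, x, replications=2000, seed=1):
    """Average b0 and b1 over many simulated data sets with fixed X values."""
    rng = random.Random(seed)
    sum_b0 = 0.0
    sum_b1 = 0.0
    for _ in range(replications):
        # Assumption: normal errors with mean 0 and standard deviation sigma.
        y = [beta0 + beta1 * xi + rng.gauss(0.0, sigma) for xi in x]
        b0, b1 = least_squares_fit(x, y)
        sum_b0 += b0
        sum_b1 += b1
    return sum_b0 / replications, sum_b1 / replications


# Hypothetical design points and true parameters.
x = [25, 30, 35, 40, 45, 50, 55, 60]
mean_b0, mean_b1 = average_estimates(beta0=10.0, beta1=0.5, sigma=2.0, x=x)
print(f"average b0 = {mean_b0:.3f} (true beta0 = 10.0)")
print(f"average b1 = {mean_b1:.4f} (true beta1 = 0.5)")
```

Individual estimates vary from sample to sample, but their averages should settle close to the true parameter values as the number of replications grows.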
Estimation of regression function
• Mean (expected response): E(Y) = β0 + β1·X
• Point estimator of mean response (fitted value): Ŷ = b0 + b1·X
Residuals: ei = Yi − Ŷi
Estimation of regression function
• Plot of residuals (obtain it with MINITAB)
Estimation of regression function
• Properties of residuals
  1. Σ ei = 0 (follows from the first normal equation)
  2. Σ ei² is the minimum possible
  3. Σ Yi = Σ Ŷi (because of 1)
  4. Σ Xi·ei = 0, Σ Ŷi·ei = 0 (can be shown)
  5. The regression line always goes through (X̄, Ȳ)
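These residual properties are easy to check numerically. The following sketch fits a line to hypothetical data (not the lecture's salary data) and evaluates the sums; each should be zero up to floating-point rounding.

```python
def least_squares_fit(x, y):
    """Return (b0, b1) that minimize sum((y_i - b0 - b1 * x_i) ** 2)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


# Hypothetical data, for illustration only.
x = [25, 30, 35, 40, 45, 50, 55, 60]
y = [22.0, 26.5, 27.0, 31.5, 32.0, 36.0, 37.5, 41.0]

b0, b1 = least_squares_fit(x, y)
fitted = [b0 + b1 * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

sum_e = sum(residuals)                                    # sum of residuals
sum_xe = sum(xi * ei for xi, ei in zip(x, residuals))     # residuals weighted by X
sum_fe = sum(fi * ei for fi, ei in zip(fitted, residuals))  # weighted by fitted values
x_bar = sum(x) / len(x)
y_bar = sum(y) / len(y)
gap_at_mean = (b0 + b1 * x_bar) - y_bar                   # line height at X-bar minus Y-bar

print(f"sum e_i        = {sum_e:.2e}")
print(f"sum X_i e_i    = {sum_xe:.2e}")
print(f"sum Yhat_i e_i = {sum_fe:.2e}")
print(f"line at X-bar minus Y-bar = {gap_at_mean:.2e}")
```

Replacing `b0` or `b1` with any other values makes at least one of these sums clearly non-zero, which is another way to see that the properties are specific to the least squares line.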
Estimation of error term variance
• Estimate of the variance of a single population (sample variance): s² = Σ (Yi − Ȳ)² / (n − 1)
• In regression, we compute s² using residuals (look at the residual plot): s² = MSE = SSE / (n − 2), where SSE = Σ ei²
Estimation of error term variance
• Why divide by n − 2? Because then E(MSE) = σ²
• Important: in general, the unbiased estimator is s² = SSE / (n − d), where d is the number of model parameters (leaving n − d degrees of freedom)
Example: Compute residuals, SSE and MSE, and find them in the MINITAB output
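The example can be mirrored outside MINITAB with a short sketch. The data below are hypothetical and the function name `error_variance_estimate` is only illustrative; the divisor n − 2 reflects the two estimated parameters b0 and b1.

```python
def least_squares_fit(x, y):
    """Return (b0, b1) that minimize sum((y_i - b0 - b1 * x_i) ** 2)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def error_variance_estimate(x, y):
    """Return (SSE, MSE) for the least squares line, with MSE = SSE / (n - 2)."""
    b0, b1 = least_squares_fit(x, y)
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sse = sum(e ** 2 for e in residuals)
    # Two parameters (b0, b1) were estimated, so n - 2 degrees of freedom remain.
    mse = sse / (len(x) - 2)
    return sse, mse


# Hypothetical data, for illustration only.
x = [25, 30, 35, 40, 45, 50, 55, 60]
y = [22.0, 26.5, 27.0, 31.5, 32.0, 36.0, 37.5, 41.0]

sse, mse = error_variance_estimate(x, y)
print(f"SSE = {sse:.4f}, MSE = {mse:.4f}, s = {mse ** 0.5:.4f}")
```

The square root of MSE is the estimated standard deviation of the error term, in the same units as the response.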
Simple regression using software
• Minitab
  • Graph → Scatterplot
  • Stat → Regression
  • Stat → Fitted Line Plot
Reading
• Course book, Ch. 1 up to page 27.