Regression using lm (lmRegression.R)
• Basics
• Prediction
• World Bank CO2 Data
Simple Linear Regression
• Simple linear model: y = b1 + x b2 + error
  • y: the dependent variable
  • x: the independent variable
  • b1, b2: intercept and slope coefficients
  • error: random departures between the model and the response
• The coefficients are estimated by least squares
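To make "estimated by least squares" concrete, here is a minimal sketch on simulated data (the numbers and the names x, y, b1, b2 are illustrative assumptions, not taken from the slides). It computes the closed-form least-squares intercept and slope by hand and checks that lm() returns the same values.

```r
# Simulated data (hypothetical; not taken from the slides)
set.seed(1)
x <- 1:20
y <- 3 + 0.5 * x + rnorm(20, sd = 1)

# Closed-form least-squares estimates for y = b1 + x b2 + error
b2 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)  # slope
b1 <- mean(y) - b2 * mean(x)                                      # intercept

# lm() minimizes the same sum of squared errors, so it should agree
lmFit <- lm(y ~ x)
print(coef(lmFit))            # named "(Intercept)" and "x"
print(c(b1 = b1, b2 = b2))
```

The hand formulas only cover one predictor; lm() handles any number of predictors with the same least-squares criterion.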
Multiple regression
• y = b0 + x1 b1 + x2 b2 + x3 b3 + … + error
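With several predictors, least squares amounts to solving the normal equations (X'X) b = X'y, where X holds a column of 1s (for b0) plus one column per predictor. The sketch below uses simulated x1, x2, x3 (hypothetical data) and compares that direct solution with lm(), which internally uses a QR decomposition rather than forming X'X.

```r
# Simulated predictors and response (hypothetical)
set.seed(2)
n  <- 50
x1 <- runif(n)
x2 <- runif(n)
x3 <- runif(n)
y  <- 1 + 2 * x1 - 1 * x2 + 0.5 * x3 + rnorm(n, sd = 0.1)

# Design matrix: a column of 1s for the intercept b0, then x1, x2, x3
X <- cbind(1, x1, x2, x3)

# Solve the normal equations (X'X) b = X'y
b <- solve(crossprod(X), crossprod(X, y))

lmFit <- lm(y ~ x1 + x2 + x3)
print(cbind(normal_eq = drop(b), lm = coef(lmFit)))
```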
Annual Boulder Temperatures
• Temperature is the dependent variable; Year is the independent variable
• What do the errors look like?
• Is the relationship linear?
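One way to approach the two questions is sketched below. Because the Boulder series itself is not included here, the code simulates a hypothetical stand-in (the data frame name `boulder` and all numbers are assumptions). "Linear?" is checked with an F-test comparing a straight line to a quadratic fit, and "Errors?" with a residual plot.

```r
# Simulated stand-in for an annual temperature series (hypothetical, NOT the Boulder data)
set.seed(3)
Year <- 1950:2000
Temperature <- 50 + 0.02 * (Year - 1950) + rnorm(length(Year), sd = 1)
boulder <- data.frame(Year, Temperature)

# "Linear?": does adding a quadratic term in Year improve the fit?
linFit  <- lm(Temperature ~ Year, data = boulder)
quadFit <- lm(Temperature ~ poly(Year, 2), data = boulder)  # orthogonal polynomial avoids collinearity
print(anova(linFit, quadFit))  # a small p-value would suggest curvature

# "Errors?": residuals should look like patternless scatter around 0
res <- residuals(linFit)
plot(boulder$Year, res, xlab = "Year", ylab = "Residual")
abline(h = 0, lty = 2)
```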
CO2 Emissions by Country
• Independent variable: GDP per capita
• Dependent variable: CO2 emissions
• Linear? What do the errors look like?
The R lm function
• Takes a formula that describes the regression; ~ plays the role of the equals sign (read it as "is modeled as")
• Works best when the data set is a data frame
• Returns a complicated list (an object of class "lm") that can be passed to summary(), predict(), print(), and plot()
lmFit <- lm(y ~ x1 + x2)
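A short sketch of what lm() returns and how the helper functions use it, with simulated x1, x2, y (hypothetical data). The object is an ordinary R list with class "lm", which is why generic functions such as summary() and predict() know how to handle it.

```r
# Simulated predictors and response (hypothetical)
set.seed(4)
x1 <- rnorm(30)
x2 <- rnorm(30)
y  <- 2 + 1.5 * x1 - 0.7 * x2 + rnorm(30, sd = 0.5)

lmFit <- lm(y ~ x1 + x2)  # y is modeled as x1 + x2; the intercept is included by default

print(class(lmFit))       # "lm"
print(is.list(lmFit))     # TRUE: a list with a class attribute
print(names(lmFit))       # includes "coefficients", "residuals", "fitted.values"
print(lmFit)              # the call and the estimated coefficients
print(summary(lmFit))     # standard errors, t-tests, R-squared, residual standard error
print(predict(lmFit, newdata = data.frame(x1 = 0, x2 = 1)))  # prediction at a new point
```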
Or, more generally, using a data frame:
lmFit <- lm(y ~ x1 + x2, data = dataset)
Here y, x1, and x2 refer to the columns dataset$y, dataset$x1, and dataset$x2.
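A minimal sketch of the data-frame form, assuming a simulated data frame named `dataset` to match the slide's placeholder. It also shows the two kinds of intervals predict() can return for a linear model: a confidence interval for the mean response and a wider prediction interval for a single new observation.

```r
# Hypothetical data frame called `dataset`, standing in for the slide's placeholder
set.seed(5)
dataset <- data.frame(x1 = runif(40, 0, 10), x2 = runif(40, 0, 5))
dataset$y <- 4 + 0.8 * dataset$x1 + 1.2 * dataset$x2 + rnorm(40)

# Column names are looked up inside `dataset`, so no dataset$ prefixes are needed
lmFit <- lm(y ~ x1 + x2, data = dataset)

# New predictor values must be supplied as a data frame with the same column names
newPts  <- data.frame(x1 = c(2, 8), x2 = c(1, 4))
confInt <- predict(lmFit, newdata = newPts, interval = "confidence")  # uncertainty in the mean response
predInt <- predict(lmFit, newdata = newPts, interval = "prediction")  # uncertainty for one new observation
print(confInt)
print(predInt)
```

Both matrices have columns fit, lwr, and upr; the fitted values agree, and the prediction interval is always wider because it adds the scatter of an individual observation.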
Analysis of World Bank data set
• It is best to work on a log scale; of the variables considered, GDP per capita has the strongest linear relationship with CO2 emissions
• Some additional pattern is left over in the residuals
• Next steps:
  • Try other variables
  • Try a more complex curve
  • Check the predictions using cross-validation
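The sketch below illustrates the log-scale fit and one "more complex curve" (a quadratic in log GDP). It does not use the World Bank data: `wb`, `gdpPC`, and `co2PC` are hypothetical simulated stand-ins, so any numbers it prints say nothing about the real data set.

```r
# Simulated stand-in for the World Bank data (hypothetical numbers, not real)
set.seed(6)
gdpPC <- exp(runif(100, log(500), log(60000)))                # GDP per capita, right-skewed
co2PC <- exp(-8 + 1.0 * log(gdpPC) + rnorm(100, sd = 0.5))    # CO2 per capita
wb    <- data.frame(gdpPC, co2PC)

# On the log scale the relationship is closer to linear
logFit <- lm(log(co2PC) ~ log(gdpPC), data = wb)

# A more complex curve: add a quadratic term in log GDP and test whether it helps
quadFit <- lm(log(co2PC) ~ log(gdpPC) + I(log(gdpPC)^2), data = wb)
print(anova(logFit, quadFit))

# Residuals vs fitted values: look for curvature or changing spread
plot(logFit, which = 1)
```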
Leave-one-out Cross-validation
• A robust way to check a model's predictions and its uncertainty measure
• Four steps:
  • Sequentially leave out each observation
  • Refit the model with the remaining data
  • Predict the omitted observation
  • Compare the prediction and its confidence interval to the actual observation
• A check on the consistency of the statistical model, because the omitted observation is not used to make its own prediction
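The four steps can be sketched as a small R function. The helper `loocv()` and the example data `dat` are hypothetical. For the interval we chose predict()'s prediction interval, since the target is a single omitted observation. If the model is consistent, roughly 95% of the held-out observations should fall inside their 95% intervals.

```r
# Hypothetical helper: leave-one-out cross-validation for a linear model
loocv <- function(formula, dat, level = 0.95) {
  n   <- nrow(dat)
  out <- data.frame(obs = numeric(n), pred = numeric(n), lwr = numeric(n), upr = numeric(n))
  for (i in seq_len(n)) {
    fit <- lm(formula, data = dat[-i, ])                      # steps 1-2: leave out i, refit
    row <- dat[i, , drop = FALSE]
    p   <- predict(fit, newdata = row,
                   interval = "prediction", level = level)     # step 3: predict omitted obs
    obs <- model.response(model.frame(formula, data = row))   # response on the model's scale
    out[i, ] <- c(obs, p[1, "fit"], p[1, "lwr"], p[1, "upr"])
  }
  out
}

# Simulated example data (hypothetical)
set.seed(7)
dat   <- data.frame(x = runif(30, 0, 10))
dat$y <- 1 + 2 * dat$x + rnorm(30)

cv     <- loocv(y ~ x, dat)
cv$err <- cv$obs - cv$pred                                   # step 4: compare with the actual values
print(mean(cv$obs >= cv$lwr & cv$obs <= cv$upr))             # coverage; near 0.95 if consistent
print(sqrt(mean(cv$err^2)))                                  # leave-one-out root-mean-square error
```

Refitting n times is simple and works for any formula. For ordinary least squares the same errors can also be obtained from a single fit as residual / (1 - leverage), which is a handy cross-check.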