Regression and Correlation, Module 8
Relationship between two variables
• Changing wind speed, humidity, or other meteorological parameters, and
• Pollutant concentrations
Causality?
• Simultaneous change does not imply causality
• Seat belt use on airplanes
• ADHD rate among children and the number of child therapists in the U.S.
• Snoring and sleeping with someone else in your bedroom
• Another factor may be the root cause of both, or the apparent relationship may be an “artifact” of your data
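The last bullet can be illustrated with a small simulation. This is only a sketch with made-up random data: the variable names (`hidden`, `x`, `y`) and the helper `pearson_r` are hypothetical, and the noise levels are arbitrary assumptions. Neither `x` nor `y` influences the other, yet they come out strongly correlated because both depend on the same hidden factor.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient r between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical hidden factor that drives both measured variables.
hidden = [random.gauss(0, 1) for _ in range(1000)]

# x and y each equal the hidden factor plus independent noise;
# x never appears in the formula for y, and vice versa.
x = [h + random.gauss(0, 0.5) for h in hidden]
y = [h + random.gauss(0, 0.5) for h in hidden]

r_xy = pearson_r(x, y)
print("correlation between x and y:", round(r_xy, 3))
```

A high correlation here says nothing about x causing y; removing the shared `hidden` term from either formula would make the correlation fall to roughly zero.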
Linear regression:
• y = mx + b
• The difference between each actual y value and the y value predicted by the straight line is the “residual”
• The residuals are used to calculate R squared
• R squared measures how closely your data follow the fitted line (a value of 1 means every point falls exactly on the line)
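The residual and R squared calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the method Excel uses internally; the helper names `fit_line` and `r_squared` are hypothetical, and the wind speed and concentration numbers are invented for the example.

```python
def fit_line(xs, ys):
    """Least-squares slope m and intercept b for y = m*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b


def r_squared(xs, ys, m, b):
    """R squared = 1 - (sum of squared residuals) / (total sum of squares)."""
    mean_y = sum(ys) / len(ys)
    # Residual: actual y minus the y predicted by the straight line.
    residuals = [y - (m * x + b) for x, y in zip(xs, ys)]
    ss_res = sum(r ** 2 for r in residuals)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Invented illustrative data: pollutant concentration falling as wind speed rises.
wind_speed = [1.0, 2.0, 3.0, 4.0, 5.0]
concentration = [9.8, 8.1, 6.2, 3.9, 2.0]

m, b = fit_line(wind_speed, concentration)
r2 = r_squared(wind_speed, concentration, m, b)
print("slope:", m, "intercept:", b, "R squared:", r2)
```

Because the invented points lie close to a straight line, the residuals are small and R squared comes out close to 1.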
Methods in Excel:
• Create an XY chart
• With the chart selected, click on Chart, Add Trendline
Within the chart method, cont.:
• Click on Options
• Display equation and R-squared value on chart
• Can also fit a nonlinear trendline when the relationship is not linear
Excel method:
• Use functions
• =SLOPE(Ys, Xs) (Ys FIRST)
• =INTERCEPT(Ys, Xs)
• =STEYX(Ys, Xs)
• =FORECAST(x, Ys, Xs) (the x value to predict for comes first)
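For readers who want to check the worksheet functions by hand, the sketch below computes the same quantities in plain Python. The function names and argument order are chosen to mirror the Excel calls (Ys before Xs); they are hypothetical helpers written for this example, not part of any library, and the data are invented.

```python
import math


def slope(ys, xs):
    """Least-squares slope, like =SLOPE(Ys, Xs)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def intercept(ys, xs):
    """Least-squares intercept, like =INTERCEPT(Ys, Xs)."""
    return sum(ys) / len(ys) - slope(ys, xs) * sum(xs) / len(xs)


def steyx(ys, xs):
    """Standard error of the predicted y, like =STEYX(Ys, Xs).

    Uses n - 2 degrees of freedom, so it needs at least three points.
    """
    m, b = slope(ys, xs), intercept(ys, xs)
    ss_res = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(ss_res / (len(xs) - 2))


def forecast(x, ys, xs):
    """Predicted y at a new x, like =FORECAST(x, Ys, Xs)."""
    return intercept(ys, xs) + slope(ys, xs) * x


# Invented illustrative data.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.1, 4.9, 7.2, 8.8]

m = slope(ys, xs)
b = intercept(ys, xs)
se = steyx(ys, xs)
y_at_5 = forecast(5.0, ys, xs)
print(m, b, se, y_at_5)
```

Keeping the Ys-first argument order in the helpers makes it easier to compare each result with the matching worksheet cell.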
Method in Excel:
• Data Analysis ToolPak
• Regression
• Advantage: creates a normal probability plot, if you select this option
• Creates a tabled output (be careful not to write over your data)
R squared:
• Ranges from 0 to 1
• Shows how closely the estimated values for the trendline correspond to your actual data
• A trendline is most reliable when its R-squared value is at or near 1
• Also known as the coefficient of determination
Regression vs Correlation:
• Regression is based on how far the Ys differ from their predicted values
• Regression looks at the variability in X and uses it to predict variability in Y
• Correlation (aka Pearson correlation coefficient, r) measures the strength and direction of the linear x-y relationship
• R squared, the square of r, is the proportion of the variation in y that is explained by the x-y relationship
• =RSQ(known_y's, known_x's)
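Correlation:
• Three Excel methods:
• =RSQ(Ys, Xs) returns R squared
• =CORREL(array1, array2) returns r
• =PEARSON(array1, array2) returns r
• Cautions: arrange the data first so the two arrays are the same length and line up point for point
• Check whether the R or the R squared value is returned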
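The difference between r and R squared in the last caution can be seen in a short Python sketch. The helper `pearson_r` is a hypothetical name written for this example, and the data are invented; the point is only that r keeps the sign of the relationship while R squared does not.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient r (what CORREL and PEARSON return)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


# Invented illustrative data with a downward trend.
wind_speed = [1.0, 2.0, 3.0, 4.0, 5.0]
concentration = [9.8, 8.1, 6.2, 3.9, 2.0]

r = pearson_r(wind_speed, concentration)
rsq = r ** 2  # what RSQ returns: the square of r

print("r:", r)          # negative, because y falls as x rises
print("R squared:", rsq)  # always between 0 and 1
```

A value near 1 from RSQ therefore does not tell you whether the relationship is increasing or decreasing; check the sign of r or of the slope for that.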