Session 1

Outline for Session 1
• Course Objectives & Description
• Review of Basic Statistical Ideas
• Intercept, Slope, Correlation, Causality
• Simple Linear Regression
• Statistical Model and Concepts
• Regression in Excel
Applied Regression -- Prof. Juran

Course Themes
• Learn useful and practical tools of regression and data analysis
• Learn by example and by doing
• Learn enough theory to use regression safely

Shape the course experience to meet your goals
• The agenda is flexible
• Pick your own project
• The professor also enjoys learning
• Let’s enjoy ourselves – life is too short

Basic Information
www.columbia.edu/~dj114/b8114.htm
Teaching Assistant:
• Davide Crapis

Basic Requirements
• Come to class and participate
• Cases about once per week
• Project

What is Regression Analysis?
A Procedure for Data Analysis
• Regression analysis is a family of mathematical procedures for fitting functions to data.
• The most basic procedure -- simple linear regression -- fits a straight line to a set of data so that the sum of the squared “y deviations” (the vertical distances between the observed y values and the line) is minimal.
Regression can be used on a purely pragmatic basis, as a curve-fitting tool, without any statistical assumptions.

What is Regression Analysis?
A Foundation for Statistical Inference
If special statistical conditions hold, the regression analysis:
• Produces statistically “best” estimates of the “true” underlying relationship and its components
• Provides measures of the quality and reliability of the fitted function
• Provides the basis for hypothesis tests and confidence and prediction intervals

Some Regression Applications
• Determining the factors that influence energy consumption in a detergent plant
• Measuring the volatility of financial securities
• Determining the influence of ambient launch temperature on Space Shuttle O-ring burn-through
• Identifying demographic and purchase history factors that predict high consumer response to catalog mailings
• Mounting a legal defense against a charge of sex discrimination in pay
• Determining the cause of leaking antifreeze bottles on a packing line
• Measuring the fairness of CEO compensation
• Predicting monthly champagne sales

Course Outline
• Basics of regression
  • Bottom: inferences about effects of independent variables on the dependent variable
  • Middle: Analysis of Variance
  • Top: summary measures for the model

Course Outline
• Advanced Regression Topics
  • Interval Estimation
  • Full Model with Arrays
  • Qualitative Variables
  • Residual Analysis
  • Thoughts on Nonlinear Regression
  • Model-building Ideas
  • Multicollinearity
  • Autocorrelation, serial correlation

Course Outline
• Related Topics
  • Chi-square Goodness-of-Fit Tests
  • Forecasting Methods
    • Exponential Smoothing
    • Regression
  • Two Multivariate Methods
    • Cluster Analysis
    • Discriminant Analysis
  • Binary Logistic Regression

The Theory Underlying Simple Linear Regression
Regression can always be used to fit a straight line to a set of data. It is a relatively easy computational task (Excel, Minitab, etc.).
If specified conditions hold, statistical theory can be employed to evaluate the quality and reliability of the line for prediction of future events.

The Standard Statistical Model
• Y: the “dependent” random variable, the effect or outcome that we wish to predict or understand.
• X: the “independent” deterministic variable, an input, cause or determinant that may cause, influence, explain or predict the values of Y.
$$Y = \beta_0 + \beta_1 X + \varepsilon$$
Here $Y$ is the dependent random variable, $X$ is the independent deterministic variable, $\beta_0$ and $\beta_1$ are the parameters of the “true” regression relationship, and $\varepsilon$ is a random “noise” factor.
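
To make the roles of the terms concrete, here is a minimal simulation sketch of the model above. The function name `simulate`, the parameter values, and the normal noise distribution are assumptions chosen for illustration; they are not taken from the course materials.

```python
import random


def simulate(beta0, beta1, sigma, xs, seed=0):
    """Draw one y per fixed x from y = beta0 + beta1 * x + eps, eps ~ N(0, sigma^2)."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]


# The x values are fixed (deterministic); only y is random.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = simulate(beta0=4.0, beta1=15.0, sigma=5.0, xs=xs)  # hypothetical parameters

for x, y in zip(xs, ys):
    print(f"x={x}  y={y:.1f}")
```

Rerunning with a different `seed` gives different y values at the same x values, which is the sense in which Y is random and X is not.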

Regression Assumptions
• The expected value of Y is a linear function of X: $E(Y \mid X) = \beta_0 + \beta_1 X$
• The variance of Y does not change with X: $\mathrm{Var}(Y \mid X) = \sigma^2$

Regression Assumptions
• Random variations $\varepsilon_i$ at different X values are uncorrelated: $\mathrm{Cov}(\varepsilon_i, \varepsilon_j) = 0$ for $i \ne j$
• Random variations from the regression line are normally distributed: $\varepsilon_i \sim N(0, \sigma^2)$

Thoughts on Linearity
The word “linear” in the linear regression model refers not to linearity in the X’s but to linearity in the Betas (the slope coefficients).
Consider the following variants – both of which are linear:
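
The two variants shown on the original slide did not survive extraction. As a stand-in, the sketch below uses a hypothetical model, y = b0 + b1·x², which is curved in x but linear in the coefficients. Replacing x with z = x² turns it into an ordinary straight-line fit. The helper `least_squares` and the data are illustrative assumptions.

```python
def least_squares(xs, ys):
    """Closed-form simple linear regression; returns (intercept, slope)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


# Made-up, noiseless data generated from y = 2 + 3 * x^2.
xs = [0, 1, 2, 3, 4]
ys = [2 + 3 * x**2 for x in xs]

# Transform the predictor, then fit a straight line in (z, y).
zs = [x**2 for x in xs]
b0, b1 = least_squares(zs, ys)
print(b0, b1)
```

The fit recovers the coefficients because the model is a straight line in z, even though it is a parabola in x.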

There are many creative ways to fit non-linear functions by linear regression. Consider a few popular linearizations:
Time permitting, we will look at some of these possibilities later in the course. These may present interesting opportunities for student term projects.
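
The linearizations listed on the original slide are not recoverable. One widely used example, offered here as an assumption about the kind of transformation meant, is the exponential curve y = a·exp(b·x): taking logs gives ln y = ln a + b·x, a straight line in (x, ln y). The data and names below are hypothetical.

```python
import math


def least_squares(xs, ys):
    """Closed-form simple linear regression; returns (intercept, slope)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


# Made-up, noiseless data from y = 2 * exp(0.5 * x).
xs = [0, 1, 2, 3, 4]
ys = [2.0 * math.exp(0.5 * x) for x in xs]

# Regress ln(y) on x, then transform the intercept back.
ln_a, b_hat = least_squares(xs, [math.log(y) for y in ys])
a_hat = math.exp(ln_a)
print(a_hat, b_hat)
```

With noisy data this approach minimizes squared errors on the log scale, not on the original y scale, so the fitted curve can differ from a direct non-linear fit.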

Regression Estimators
We are given the data set: $(x_i, y_i),\ i = 1, 2, \ldots, n$
We seek good estimators $\hat{\beta}_0$ of $\beta_0$ and $\hat{\beta}_1$ of $\beta_1$ that minimize the sum of the squared residuals (errors). The $i$th residual is
$$e_i = y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i), \quad i = 1, 2, \ldots, n$$
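
The residual definition and the quantity being minimized can be sketched in a few lines. The data are made up for illustration, and the function names `residuals` and `sse` are assumptions. The second candidate line uses the least-squares values for this particular data set.

```python
def residuals(xs, ys, b0, b1):
    """Return e_i = y_i - (b0 + b1 * x_i) for every observation."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]


def sse(xs, ys, b0, b1):
    """Sum of squared residuals for the candidate line y = b0 + b1 * x."""
    return sum(e * e for e in residuals(xs, ys, b0, b1))


# Made-up illustrative data (not from the course).
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

sse_guess = sse(xs, ys, 0.0, 2.0)    # an eyeballed line, y = 2x
sse_best = sse(xs, ys, 0.05, 1.99)   # the least-squares line for this data
print(sse_guess, sse_best)
```

Any other choice of intercept and slope gives a sum of squared residuals at least as large as the least-squares line's.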

Computer Repair Example

Statistical Basics
Basic statistical computations and graphical displays are very helpful in doing and interpreting a regression. We should always compute:
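
The list of quantities on the original slide is missing. As an assumption about what such a list typically contains, the sketch below computes the sample size, mean, sample standard deviation, minimum, and maximum of each variable. The data and the helper name `summarize` are hypothetical.

```python
import statistics


def summarize(values):
    """Basic descriptive statistics for one variable."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),  # sample standard deviation (n - 1)
        "min": min(values),
        "max": max(values),
    }


# Made-up illustrative data.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

summary = {"x": summarize(xs), "y": summarize(ys)}
for name, stats in summary.items():
    print(name, stats)
```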

Graphical Analysis
We should always plot histograms of the y and x values, a time order plot of x and y (if appropriate), and a scatter plot of y against x.

Estimating Parameters
• Using Excel
• Using Solver
• Using analytical formulas

Using Excel (Scatter Diagram)

Using Excel (Data Analysis)
Data Tab – Data Analysis

Using Excel (Data Analysis)

Using Solver
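
The Solver screenshot is not available. The usual idea is to treat the sum of squared residuals as an objective and let a numerical optimizer adjust the intercept and slope. The sketch below imitates that idea with plain gradient descent, which is a substitute technique and not the algorithm Excel's Solver uses. The step size, iteration count, data, and function names are all assumptions.

```python
def sse_gradient(xs, ys, b0, b1):
    """Partial derivatives of sum((y - b0 - b1*x)^2) with respect to b0 and b1."""
    g0 = g1 = 0.0
    for x, y in zip(xs, ys):
        e = y - (b0 + b1 * x)
        g0 += -2.0 * e
        g1 += -2.0 * e * x
    return g0, g1


def minimize_sse(xs, ys, lr=0.002, steps=50_000):
    """Gradient descent from (0, 0); lr is tuned for this small data set."""
    b0, b1 = 0.0, 0.0
    for _ in range(steps):
        g0, g1 = sse_gradient(xs, ys, b0, b1)
        b0 -= lr * g0
        b1 -= lr * g1
    return b0, b1


# Made-up illustrative data.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = minimize_sse(xs, ys)
print(round(b0, 4), round(b1, 4))
```

A numerical search like this should land on the same intercept and slope as the analytical formulas; its practical value is that the same approach still works when the objective has no closed-form solution.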

Using Formulas
RABE 2.13
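
The formula images on this slide were lost, so it is an assumption that they match what follows. The standard closed-form least-squares estimators are slope = Σ(x_i − x̄)(y_i − ȳ) / Σ(x_i − x̄)² and intercept = ȳ − slope · x̄. A minimal sketch on made-up data, with the hypothetical helper name `least_squares`:

```python
def least_squares(xs, ys):
    """Closed-form simple linear regression; returns (intercept, slope)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of cross-deviations and sum of squared x deviations.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Made-up illustrative data.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = least_squares(xs, ys)
print(round(b0, 4), round(b1, 4))
```

The intercept formula implies that the fitted line always passes through the point of means (x̄, ȳ).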

Correlation and Regression
There is a close relationship between regression and correlation. The correlation coefficient, $\rho$, measures the degree to which random variables X and Y move together.
$\rho = +1$ implies a perfect positive linear relationship, while $\rho = -1$ implies a perfect negative linear relationship.
$\rho = 0$ implies no linear relationship; this is equivalent to independence only in special cases, such as when X and Y are jointly normal.
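
A small made-up example shows why zero correlation should not be read as independence in general. Here y is completely determined by x (y = x²), yet the sample correlation is zero because the relationship is symmetric and not linear. The helper name `pearson_r` is an assumption.

```python
import math


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# y depends on x exactly, but not linearly.
xs = [-2, -1, 0, 1, 2]
ys = [x**2 for x in xs]

r = pearson_r(xs, ys)
print(r)
```

This is one reason a scatter plot belongs alongside any correlation figure.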

Statistical Basics: Covariance
The covariance can be calculated using:
or equivalently
Usually, we find it more useful to consider the coefficient of correlation. That is,
Sometimes the inverse relation is useful:
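
The formulas on this slide were images and are not recoverable. The standard population definitions that fit the surrounding sentences are given below; whether the slide used these population forms or their sample analogues is an assumption.

```latex
% Covariance, in two equivalent forms
\mathrm{Cov}(X, Y) = E\big[(X - \mu_X)(Y - \mu_Y)\big]
                   = E[XY] - \mu_X \mu_Y

% Coefficient of correlation
\rho = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \, \sigma_Y}

% Inverse relation
\mathrm{Cov}(X, Y) = \rho \, \sigma_X \, \sigma_Y
```

The second form of the covariance follows from expanding the product inside the expectation and using $E[X] = \mu_X$ and $E[Y] = \mu_Y$.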

Correlation and Regression
• The sample (Pearson) correlation coefficient is $r = \dfrac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \, \sum_i (y_i - \bar{y})^2}}$
• Regressions automatically produce an estimate of the squared correlation called $R^2$ or R-square. Values of R-square close to 1 indicate a strong linear relationship, while values close to 0 indicate a weak or non-existent linear relationship.
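
In simple linear regression, R-square equals the square of the sample correlation between x and y. The sketch below checks this numerically on made-up data, computing R-square as 1 − SSE/SST. The function names are assumptions.

```python
import math


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def r_square(xs, ys):
    """R-square of the least-squares line: 1 - SSE / SST."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - y_bar) ** 2 for y in ys)
    return 1.0 - sse / sst


# Made-up illustrative data.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(xs, ys)
r2 = r_square(xs, ys)
print(r ** 2, r2)
```

This identity holds for a single predictor with an intercept; with several predictors, R-square is the squared correlation between the observed and fitted y values.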

Some Validity Issues
• We need to evaluate the strength of the relationship, whether we have the proper functional form, and the validity of the several statistical assumptions from a practical and theoretical viewpoint using a multiplicity of tools.
• Fitted regression functions are interpolations of the data in hand, and extrapolation is always dangerous. Moreover, the functional form that fits the data in our range of “experience” may not fit beyond it.

• Regressions are based on past data. Why should the same functional form and parameters hold in the future?
• In some uses of regression the future value of x may not be known – this adds greatly to our uncertainty.
• When collecting data for a regression, choose x values wisely when you have a choice. They should:
  • Be in the range where you intend to work
  • Be spread out along the range with some observations near practical extremes
  • Have replicated values at the same x or at very nearby x values for good estimation of the error variance
• Whenever possible, test the stability of your model with a “holdout” sample that was not used in the original model fitting.
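
The holdout idea in the last bullet can be sketched as follows: fit the line on part of the data, then measure prediction error on the observations that were set aside. The data, the fixed choice of holdout rows, and the function names are assumptions for illustration; in practice the holdout rows are usually chosen at random.

```python
import math


def least_squares(xs, ys):
    """Closed-form simple linear regression; returns (intercept, slope)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - b1 * x_bar, b1


def rmse(xs, ys, b0, b1):
    """Root mean squared prediction error of the line on the given points."""
    return math.sqrt(sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys)) / len(xs))


# Made-up illustrative data.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 19.9]

holdout_idx = {2, 5, 8}  # fixed here for reproducibility
train = [(x, y) for i, (x, y) in enumerate(zip(xs, ys)) if i not in holdout_idx]
hold = [(x, y) for i, (x, y) in enumerate(zip(xs, ys)) if i in holdout_idx]

# Fit on the training rows only.
b0, b1 = least_squares([x for x, _ in train], [y for _, y in train])

train_rmse = rmse([x for x, _ in train], [y for _, y in train], b0, b1)
holdout_rmse = rmse([x for x, _ in hold], [y for _, y in hold], b0, b1)
print(train_rmse, holdout_rmse)
```

A holdout error much larger than the training error is a warning that the fitted model may not be stable.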

Summary
• Course Objectives & Description
• Review of Basic Statistical Ideas
• Intercept, Slope, Correlation, Causality
• Simple Linear Regression
• Statistical Model and Concepts
• Regression in Excel
• Computer Repair Example