Session 1

Presentation Transcript


  1. Session 1

  2. Outline for Session 1 • Course Objectives & Description • Review of Basic Statistical Ideas • Intercept, Slope, Correlation, Causality • Simple Linear Regression • Statistical Model and Concepts • Regression in Excel Applied Regression -- Prof. Juran

  3. Course Themes • Learn useful and practical tools of regression and data analysis • Learn by example and by doing • Learn enough theory to use regression safely

  4. Shape the course experience to meet your goals • The agenda is flexible • Pick your own project • The professor also enjoys learning • Let’s enjoy ourselves – life is too short

  5. Basic Information www.columbia.edu/~dj114/b8114.htm Teaching Assistant: • Davide Crapis

  6. Basic Requirements • Come to class and participate • Cases about once per week • Project

  7. What is Regression Analysis? • A Procedure for Data Analysis • Regression analysis is a family of mathematical procedures for fitting functions to data. • The most basic procedure, simple linear regression, fits a straight line to a set of data so that the sum of the squared “y deviations” (the vertical distances from the points to the line) is minimal. Regression can be used on a completely pragmatic basis, purely as a curve-fitting tool.
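To make the least-squares criterion concrete, here is a minimal sketch that scores a few candidate lines by their sum of squared y deviations. The data values and the three candidate lines are hypothetical, invented only for illustration; they are not from the course.

```python
def sum_squared_deviations(xs, ys, intercept, slope):
    """Sum of squared vertical distances from the points to the line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data, invented for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.1, 5.9, 8.2, 9.8]

# Three hypothetical candidate lines, given as (intercept, slope).
candidates = {"flat": (6.0, 0.0), "steep": (0.0, 2.5), "close": (0.0, 2.0)}

scores = {
    name: sum_squared_deviations(xs, ys, b0, b1)
    for name, (b0, b1) in candidates.items()
}
# The candidate with the smallest score is the best of these three lines.
best_name = min(scores, key=scores.get)
print(best_name, scores)
```

Simple linear regression finds the line that minimizes this same quantity over all possible intercepts and slopes, not just over a short list of candidates.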

  8. What is Regression Analysis? • A Foundation for Statistical Inference • If special statistical conditions hold, the regression analysis: • Produces statistically “best” estimates of the “true” underlying relationship and its components • Provides measures of the quality and reliability of the fitted function • Provides the basis for hypothesis tests and confidence and prediction intervals

  9. Some Regression Applications • Determining the factors that influence energy consumption in a detergent plant • Measuring the volatility of financial securities • Determining the influence of ambient launch temperature on Space Shuttle O-ring burn-through • Identifying demographic and purchase history factors that predict high consumer response to catalog mailings • Mounting a legal defense against a charge of sex discrimination in pay • Determining the cause of leaking antifreeze bottles on a packing line • Measuring the fairness of CEO compensation • Predicting monthly champagne sales

  10. Course Outline • Basics of regression, organized by the three parts of the regression output • Bottom: inferences about effects of independent variables on the dependent variable • Middle: Analysis of Variance • Top: summary measures for the model

  11. Course Outline • Advanced Regression Topics • Interval Estimation • Full Model with Arrays • Qualitative Variables • Residual Analysis • Thoughts on Nonlinear Regression • Model-building Ideas • Multicollinearity • Autocorrelation, serial correlation

  12. Course Outline • Related Topics • Chi-square Goodness-of-Fit Tests • Forecasting Methods: Exponential Smoothing, Regression • Two Multivariate Methods: Cluster Analysis, Discriminant Analysis • Binary Logistic Regression

  13. The Theory Underlying Simple Linear Regression Regression can always be used to fit a straight line to a set of data. It is a relatively easy computational task (Excel, Minitab, etc.). If specified conditions hold, statistical theory can be employed to evaluate the quality and reliability of the line for prediction of future events.

  14. The Standard Statistical Model • Y: the “dependent” random variable, the effect or outcome that we wish to predict or understand. • X: the “independent” deterministic variable, an input, cause or determinant that may cause, influence, explain or predict the values of Y. The model is $Y = \beta_0 + \beta_1 X + \varepsilon$, where $Y$ is the dependent random variable, $X$ is the independent deterministic variable, $\beta_0$ and $\beta_1$ are the parameters of the “true” regression relationship, and $\varepsilon$ is a random “noise” factor.

  15. Regression Assumptions The expected value of Y is a linear function of X: $E(Y \mid X) = \beta_0 + \beta_1 X$. The variance of Y does not change with X: $\mathrm{Var}(Y \mid X) = \sigma^2$.

  16. Regression Assumptions Random variations at different X values are uncorrelated: $\mathrm{Cov}(\varepsilon_i, \varepsilon_j) = 0$ for $i \neq j$. Random variations from the regression line are normally distributed: $\varepsilon_i \sim N(0, \sigma^2)$.
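One way to get a feel for these assumptions is to generate data that satisfies all of them. The sketch below does this; the function name `simulate` and the parameter values are hypothetical, chosen only for illustration.

```python
import random

def simulate(xs, beta0, beta1, sigma, seed=0):
    """Draw one y per x from Y = beta0 + beta1 * x + noise.

    The noise terms are independent normal draws with mean 0 and the same
    standard deviation sigma at every x, so the mean of Y is linear in x,
    the variance is constant, and the errors are uncorrelated and normal.
    """
    rng = random.Random(seed)  # seeded so that the draw is reproducible
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]

# Hypothetical "true" parameters, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = simulate(xs, beta0=1.0, beta1=2.0, sigma=0.5)
print(ys)
```

Setting `sigma=0.0` removes the noise, and every point then falls exactly on the true line.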

  17. Thoughts on Linearity The significance of the word “linear” in the linear regression model is not linearity in the X’s, it is linearity in the Betas (the slope coefficients). Consider the following variants – both of which are linear:
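The slide's own variants are not preserved in this transcript. As a stand-in, here is a hedged sketch of one such case: the model Y = β0 + β1·X² is curved in X but linear in the betas, so it can be fitted by ordinary simple regression on the transformed input Z = X². The data and the helper name `fit_line` are hypothetical.

```python
def fit_line(zs, ys):
    """Least-squares (intercept, slope) for ys regressed on zs."""
    n = len(zs)
    z_bar = sum(zs) / n
    y_bar = sum(ys) / n
    s_zy = sum((z - z_bar) * (y - y_bar) for z, y in zip(zs, ys))
    s_zz = sum((z - z_bar) ** 2 for z in zs)
    slope = s_zy / s_zz
    return y_bar - slope * z_bar, slope

# Hypothetical noise-free data generated from y = 3 + 0.5 * x**2.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [3.0 + 0.5 * x * x for x in xs]

# Transform the input, then fit a straight line in the new variable.
zs = [x * x for x in xs]
b0, b1 = fit_line(zs, ys)
print(b0, b1)
```

By contrast, a model such as Y = β0 + exp(β1·X) is not linear in the betas and cannot be handled by a transformation of X alone.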

  18. There are many creative ways to fit non-linear functions by linear regression. Consider a few popular linearizations: Time permitting, we will look at some of these possibilities later in the course. These may present interesting opportunities for student term projects.
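The specific linearizations shown on the slide are not preserved here. One commonly used example, offered as an assumption rather than as the slide's content, is the exponential curve y = a·exp(b·x): taking logs gives ln y = ln a + b·x, which is a straight line in x. The function names and data below are hypothetical.

```python
import math

def fit_line(xs, ys):
    """Least-squares (intercept, slope) for ys regressed on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope

def fit_exponential(xs, ys):
    """Fit y = a * exp(b * x) by regressing ln(y) on x (needs every y > 0)."""
    ln_a, b = fit_line(xs, [math.log(y) for y in ys])
    return math.exp(ln_a), b

# Hypothetical noise-free data generated from y = 2 * exp(0.3 * x).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * math.exp(0.3 * x) for x in xs]
a, b = fit_exponential(xs, ys)
print(a, b)
```

Note that this fit minimizes squared errors on the log scale, so with noisy data it will generally differ from a direct least-squares fit of the curve to y.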

  19. Regression Estimators We are given the data set $(x_i, y_i)$, $i = 1, 2, \ldots, n$. We seek good estimators $\hat{\beta}_0$ of $\beta_0$ and $\hat{\beta}_1$ of $\beta_1$ that minimize the sum of the squared residuals (errors). The $i$th residual is $e_i = y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i)$, $i = 1, 2, \ldots, n$.

  20. Computer Repair Example

  21. Statistical Basics Basic statistical computations and graphical displays are very helpful in doing and interpreting a regression. We should always compute:

  23. Graphical Analysis We should always plot histograms of the y and x values, a time-order plot of x and y (if appropriate), and a scatter plot of y against x.

  27. Estimating Parameters • Using Excel • Using Solver • Using analytical formulas

  28. Using Excel (Scatter Diagram)

  30. Using Excel (Data Analysis) Data tab > Data Analysis

  31. Using Excel (Data Analysis)

  32. Using Solver
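The Solver approach treats estimation as a numerical search: start from a guess for the intercept and slope and adjust both until the sum of squared residuals stops decreasing. The sketch below imitates that idea with plain gradient descent, which is a stand-in and not the algorithm Excel's Solver uses. The data, step size, and iteration count are hypothetical choices that happen to work for this small example.

```python
def fit_by_search(xs, ys, step=0.05, iterations=5000):
    """Minimize the mean squared residual by gradient descent.

    Returns (intercept, slope). The step size must be small enough for
    the data at hand; 0.05 is adequate for the example below only.
    """
    n = len(xs)
    b0, b1 = 0.0, 0.0  # starting guess
    for _ in range(iterations):
        residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
        # Partial derivatives of the mean squared residual.
        grad_b0 = -2.0 * sum(residuals) / n
        grad_b1 = -2.0 * sum(r * x for r, x in zip(residuals, xs)) / n
        b0 -= step * grad_b0
        b1 -= step * grad_b1
    return b0, b1

# Hypothetical noise-free data lying exactly on y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]
b0, b1 = fit_by_search(xs, ys)
print(b0, b1)
```

For simple linear regression the search is unnecessary because closed-form formulas exist, but the same search idea carries over to models that have no closed form.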

  35. Using Formulas (RABE 2.13)
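The formulas themselves are not preserved in this transcript. The standard closed-form least-squares estimators are slope = Σ(x_i − x̄)(y_i − ȳ) / Σ(x_i − x̄)² and intercept = ȳ − slope·x̄; the sketch below assumes these are what the slide refers to. The data values are hypothetical.

```python
def least_squares(xs, ys):
    """Closed-form least-squares estimates (intercept, slope)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of cross-products and sum of squares about the means.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical data, invented for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.1, 5.9, 8.2, 9.8]
b0, b1 = least_squares(xs, ys)
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
print(b0, b1)
```

A useful check on any least-squares fit with an intercept is that the residuals sum to zero, up to rounding.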

  37. Correlation and Regression There is a close relationship between regression and correlation. The correlation coefficient, $\rho$, measures the degree to which random variables X and Y move together. $\rho = +1$ implies a perfect positive linear relationship, while $\rho = -1$ implies a perfect negative linear relationship. $\rho = 0$ implies no linear relationship; it implies independence only in special cases, such as when X and Y are jointly normal.

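  38. Statistical Basics: Covariance The covariance can be calculated using $\mathrm{Cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)]$ or, equivalently, $\mathrm{Cov}(X, Y) = E[XY] - \mu_X \mu_Y$. Usually, we find it more useful to consider the coefficient of correlation, that is, $\rho = \mathrm{Cov}(X, Y) / (\sigma_X \sigma_Y)$. Sometimes the inverse relation is useful: $\mathrm{Cov}(X, Y) = \rho \, \sigma_X \sigma_Y$.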

  39. Correlation and Regression • The sample (Pearson) correlation coefficient is $r = \sum_i (x_i - \bar{x})(y_i - \bar{y}) / \sqrt{\sum_i (x_i - \bar{x})^2 \sum_i (y_i - \bar{y})^2}$ • Regressions automatically produce an estimate of the squared correlation called $R^2$ or R-square. Values of R-square close to 1 indicate a strong linear relationship, while values close to 0 indicate a weak or non-existent one.
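The link between correlation and R-square can be checked numerically. The sketch below computes the sample covariance, the Pearson correlation, and the R-square of the fitted line, using the usual definition R² = 1 − SSE/SST (an assumption here, since the transcript does not show the formula). The data and function names are hypothetical.

```python
import math

def sample_cov(xs, ys):
    """Sample covariance with the n - 1 divisor."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)

def sample_corr(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    return sample_cov(xs, ys) / math.sqrt(sample_cov(xs, xs) * sample_cov(ys, ys))

def r_squared(xs, ys):
    """R-square of the least-squares line: 1 - SSE / SST."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    slope = sample_cov(xs, ys) / sample_cov(xs, xs)
    intercept = y_bar - slope * x_bar
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - y_bar) ** 2 for y in ys)
    return 1.0 - sse / sst

# Hypothetical data, invented for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.1, 5.9, 8.2, 9.8]
cov = sample_cov(xs, ys)
r = sample_corr(xs, ys)
r2 = r_squared(xs, ys)
print(cov, r, r2)
```

In simple linear regression the two quantities agree: the square of `r` equals `r2` up to rounding.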

  40. Some Validity Issues • We need to evaluate the strength of the relationship, whether we have the proper functional form, and the validity of the several statistical assumptions from a practical and theoretical viewpoint using a multiplicity of tools. • Fitted regression functions are interpolations of the data in hand, and extrapolation is always dangerous. Moreover, the functional form that fits the data in our range of “experience” may not fit beyond it.

  41. • Regressions are based on past data. Why should the same functional form and parameters hold in the future? • In some uses of regression the future value of x may not be known – this adds greatly to our uncertainty. • In collecting data to do a regression, choose x values wisely when you have a choice. They should: • Be in the range where you intend to work • Be spread out along the range with some observations near practical extremes • Have replicated values at the same x or at very nearby x values for good estimation of $\sigma$ • Whenever possible, test the stability of your model with a “holdout” sample that was not used in the original model fitting.
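The holdout idea in the last bullet can be sketched as follows: set some observations aside, fit the line on the rest, and compare the error on the two groups. The data, the choice of holdout positions, and the function names are hypothetical; in practice the holdout cases are usually chosen at random before any fitting is done.

```python
import math

def fit_line(xs, ys):
    """Least-squares (intercept, slope) for ys regressed on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope

def rmse(points, intercept, slope):
    """Root mean squared prediction error over (x, y) pairs."""
    return math.sqrt(
        sum((y - (intercept + slope * x)) ** 2 for x, y in points) / len(points)
    )

def split_holdout(xs, ys, holdout_idx):
    """Split the data into (training pairs, holdout pairs) by position."""
    held = set(holdout_idx)
    pairs = list(zip(xs, ys))
    train = [p for i, p in enumerate(pairs) if i not in held]
    holdout = [p for i, p in enumerate(pairs) if i in held]
    return train, holdout

# Hypothetical data, invented for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]
train, holdout = split_holdout(xs, ys, holdout_idx=[2, 5])
b0, b1 = fit_line([x for x, _ in train], [y for _, y in train])
print(rmse(train, b0, b1), rmse(holdout, b0, b1))
```

A holdout error much larger than the training error suggests that the fitted model is not stable.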

  42. Summary • Course Objectives & Description • Review of Basic Statistical Ideas • Intercept, Slope, Correlation, Causality • Simple Linear Regression • Statistical Model and Concepts • Regression in Excel • Computer Repair Example
