
Survey of Statistical Methods


Presentation Transcript


  1. Survey of Statistical Methods April 18, 2005

  2. A little History about Regression…
  • Sir Francis Galton coined the term regression during his study of heredity laws.
  • He observed that physical characteristics of children were correlated with those of their fathers.
  • He noted that tall fathers tended to have shorter sons, and short fathers tended to have taller sons. He called this phenomenon “regression toward the mean”.
  Image from: http://www.york.ac.uk/depts/maths/histstat/people/sources.htm

  3. Correlation vs. Regression
  • Correlation provides a measurement of the overall relationship between two variables
  • In bivariate regression, you are attempting to predict a score on Y from a score on X
  • In multiple regression, you are attempting to predict a score on Y from several X variables
  • No causal association can be assumed
  • Predictions are based on a set of rules that define “best fit”
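The link between the two ideas on this slide can be made concrete: in simple linear regression, the squared Pearson correlation between X and Y equals the proportion of variance explained (R2). A minimal pure-Python sketch, using invented data (not from the presentation):

```python
import math

# Invented illustrative data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

def pearson_r(xs, ys):
    """Pearson correlation: overall strength of the linear relationship."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

r = pearson_r(xs, ys)
# In bivariate regression on the same data, r**2 is exactly R-squared.
```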

  4. Types of Regression Analyses
  • Simple Linear Regression: two variables, X and Y; both are interval/ratio level data
  • Multiple Regression: multiple X variables (X’s can be at any level of measurement) are used to predict a single Y variable (Y must be continuous and interval/ratio)
  • Logistic Regression: multiple X variables (any level of measurement) are used to predict a single categorical Y variable
  • Canonical Correlation: multiple X variables are used to predict multiple Y variables

  5. Key Assumptions for Simple Linear Regression Analysis
  • Representative: the sample must be representative of the population to which the inference will be made
  • X and Y are linearly related: when the two scores are graphed, they should tend to form a straight line
  • Normality: the dependent variable (Y) must be approximately normally distributed
  • Homoscedasticity: for every value of X, the distribution of Y scores must have approximately equal variability

  6. Equation for a line: Y = bX + a, where b is the slope and a is the intercept

  7. Slope & Intercept

  8. “Best” is most often defined by the Least Squares Criterion
  • The “Least Squares Criterion” means that the sum of the squared residuals is a minimum.
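The least squares criterion has a closed-form solution in the bivariate case, which can be sketched in a few lines of pure Python (data are invented for illustration; a real analysis would use a statistics package such as the SPSS procedure the later slides show):

```python
# Closed-form least-squares fit for simple linear regression:
# b = sum((x - mean_x)(y - mean_y)) / sum((x - mean_x)^2),  a = mean_y - b * mean_x
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n

b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
a = my - b * mx  # the least-squares line always passes through (mean_x, mean_y)

def sse(slope, intercept):
    """Sum of squared residuals for a candidate line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

# "Least squares" means any other line gives a larger sum of squared residuals:
assert sse(b, a) <= sse(b + 0.1, a) and sse(b, a) <= sse(b, a + 0.1)
```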

  9. Y = bX + a vs. Y = bX + a + ε
  • All regression equations have an element of “error” (the ε term, which the plain line equation omits)
  • When reporting a regression equation, you need to also report statistics describing the “error” in your equation
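A quick sketch of what the “error” element looks like in practice: the residuals e = Y − (bX + a) are the sample counterpart of the model’s error term, and for a least-squares fit they always sum to (numerically) zero. Data are invented for illustration:

```python
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
a = my - b * mx

# Residuals: the observed "error" for each data point.
residuals = [y - (b * x + a) for x, y in zip(xs, ys)]

# Their spread (not their sum, which is ~0 by construction) is what the
# error statistics on the later slides (SEE, R2) summarize.
```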

  10. Assumption #1 - Linear Relationship

  11. Assumption #2 - Homoscedasticity

  12. Checking for Violation of Assumptions: “Analysis of the Residuals”

  13. Statistics that describe the “Error” in a Regression Equation
  • Standard Error of the Estimate (SEE): the average error made in estimating Y from X; this is an indication of the accuracy of estimation
  • R2: proportion of explained variance, expressed as a %
  • Confidence Intervals: for b (slope) and for a (intercept)
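The confidence intervals in the last bullet come directly from the SEE. A hedged sketch for the slope, on invented data: SE(b) = SEE / sqrt(sum of squared X deviations), and the 95% interval uses a t critical value (hard-coded here as 3.182 for df = n − 2 = 3; a real analysis would look it up or let the software supply it):

```python
import math

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
sxx = sum((x - mx) ** 2 for x in xs)
b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
a = my - b * mx

sse = sum((y - (b * x + a)) ** 2 for x, y in zip(xs, ys))
see = math.sqrt(sse / (n - 2))   # standard error of the estimate
se_b = see / math.sqrt(sxx)      # standard error of the slope

t_crit = 3.182  # two-tailed 95% t value for df = 3 (from a t table)
ci_low, ci_high = b - t_crit * se_b, b + t_crit * se_b
```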

  14. Standard Error of the Estimate
  • Standard Error of the Estimate (SEE): the average error made in estimating Y from X; this is an indication of the accuracy of estimation
  • For any given X, 95% of the Y scores will lie within 1.96 * SEE of the predicted Y score
  • Assumption of homoscedasticity: equal variance of Y scores for each value of X
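The SEE and the 95% band around a prediction can be sketched in pure Python (invented data; the band assumes homoscedastic, approximately normal errors, as the slide describes):

```python
import math

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
a = my - b * mx

# SEE: square root of (sum of squared residuals / degrees of freedom n - 2).
sse = sum((y - (b * x + a)) ** 2 for x, y in zip(xs, ys))
see = math.sqrt(sse / (n - 2))

# Rough 95% band around the predicted Y at x = 3, per the slide's rule of thumb.
y_hat = b * 3 + a
band = (y_hat - 1.96 * see, y_hat + 1.96 * see)
```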

  15. R2
  • The proportion of the variance in Y that is explained by X
  • Values can range from 0 (no variability explained) to 1 (perfect prediction)
  • R2 is sensitive to sample size
  • The adjusted R2 value is usually reported in the literature since it corrects for sample size
  • Adjusted R2 < R2

  16. Using SPSS to conduct a Multiple Regression Analysis
  • Analyze – Regression – Linear

  17. Using SPSS to conduct a Multiple Regression Analysis
  • Analyze – Regression – Linear

  18. What should you be looking for in the output from the Regression Analysis?
  • Check to see if you have met the assumptions (“Analysis of the Residuals”): normality and homoscedasticity
  • Check to see that the regression equation is ‘significant’
  • If all is OK with these values, you can proceed
  • If not, stop; you cannot use a regression analysis
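In practice these checks are done graphically from the software’s residual output (e.g., SPSS residual plots). As a crude numeric stand-in, on invented data, one can at least confirm the residuals center on zero and eyeball their spread across the range of X:

```python
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
a = my - b * mx
residuals = [y - (b * x + a) for x, y in zip(xs, ys)]

# Crude homoscedasticity check: compare residual spread in the lower and
# upper halves of X (a formal analysis would plot residuals vs. predicted Y).
half = n // 2
low_spread = max(abs(r) for r in residuals[:half])
high_spread = max(abs(r) for r in residuals[half:])
```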

  19. What values do you report?
  • Adjusted R2
  • Standard Error of the Estimate
  • Regression Equation
  • If the purpose of the study is to identify the ‘greatest predictors’ of Y, report the equation using the standardized Beta coefficients
  • If the purpose of the study is to create a prediction equation that someone can use to predict a person’s score on Y, report the unstandardized B coefficients along with their standard error

  20. Assignment for next class
  • Using the Pico (2002) article as a guide, create one summary table for the bodyfat data analysis
  • Hint: focus on Table 3, and use the discussion about Table 3 found on page 132 as a guide
