Survey of Statistical Methods. April 18, 2005. A little History about Regression…. Sir Francis Galton coined the term regression during his study of heredity laws. He observed that physical characteristics of children were correlated with those of their fathers.
Survey of Statistical Methods April 18, 2005
A Little History about Regression… • Sir Francis Galton coined the term regression during his study of the laws of heredity. • He observed that physical characteristics of children were correlated with those of their fathers. • He noted that tall fathers tended to have sons shorter than themselves, and short fathers tended to have sons taller than themselves. He called this phenomenon “regression toward the mean”. Image from: http://www.york.ac.uk/depts/maths/histstat/people/sources.htm
Correlation vs. Regression • Correlation provides a measure of the strength and direction of the overall linear relationship between two variables • In bivariate regression, you are attempting to predict a score on Y from a score on X • In multiple regression, you are attempting to predict a score on Y from several X variables • No causal association can be assumed in either case • Predictions are based on a set of rules that define “best fit”
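The contrast between correlation and regression can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the five (X, Y) points are made up, and the function names are hypothetical. It shows that the correlation is the same whichever variable comes first, while the regression slope depends on which variable is being predicted.

```python
import math

def sums_of_squares(x, y):
    """Return (Sxy, Sxx, Syy): sums of cross-products and squares about the means."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy, sxx, syy

def pearson_r(x, y):
    """Pearson correlation: one symmetric number describing the relationship."""
    sxy, sxx, syy = sums_of_squares(x, y)
    return sxy / math.sqrt(sxx * syy)

def slope(x, y):
    """Least-squares slope for predicting y from x (direction matters)."""
    sxy, sxx, _ = sums_of_squares(x, y)
    return sxy / sxx

# Made-up data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

print("r(X, Y) =", round(pearson_r(x, y), 4), " r(Y, X) =", round(pearson_r(y, x), 4))
print("slope predicting Y from X =", round(slope(x, y), 4))
print("slope predicting X from Y =", round(slope(y, x), 4))
```

Swapping the arguments leaves r unchanged but gives a different slope, which is why regression requires deciding in advance which variable is the predictor.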
Types of Regression Analyses • Simple Linear Regression • Two variables X and Y – both are interval/ratio level data • Multiple Regression • Multiple X variables (X’s can be at any level of measurement) are used to predict a single Y variable (Y must be continuous and interval/ratio) • Logistic Regression • Multiple X variables (any level of measurement) are used to predict a single Y variable (categorical variable) • Canonical Correlation • Multiple X variables are used to predict multiple Y variables
Key Assumptions for Simple Linear Regression Analysis • Representative • The sample must be representative of the population to which the inference will be made • X and Y are linearly related • When the two scores are graphed, they should tend to form a straight line • Normality • The dependent variable (Y) must be approximately normally distributed at each value of X (that is, the residuals should be approximately normal) • Homoscedasticity • For every value of X, the distribution of Y scores must have approximately equal variability
Equation for a line: Y = bX + a, where b is the slope and a is the intercept
“Best” is most often defined by the Least Squares Criterion • “Least Squares Criterion” means that the sum of the squared residuals is a minimum.
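The Least Squares Criterion can be sketched as follows. This is an illustrative example on made-up data (the helper names `fit_line` and `sse` are hypothetical, not from the slides), using the standard closed-form solution for a single predictor: b = Sxy / Sxx and a = mean(Y) − b · mean(X).

```python
def fit_line(x, y):
    """Return (b, a) for the least-squares line Y = bX + a."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx                # slope
    a = mean_y - b * mean_x      # intercept
    return b, a

def sse(x, y, b, a):
    """Sum of squared residuals for the line Y = bX + a."""
    return sum((yi - (b * xi + a)) ** 2 for xi, yi in zip(x, y))

# Made-up data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b, a = fit_line(x, y)
best = sse(x, y, b, a)
nudged = sse(x, y, b + 0.1, a)   # any other line should do worse

print("slope b =", round(b, 4), " intercept a =", round(a, 4))
print("SSE of fitted line =", round(best, 4))
print("SSE after nudging the slope =", round(nudged, 4))
```

Nudging the slope (or the intercept) away from the fitted values increases the sum of squared residuals, which is what “best fit” means under this criterion.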
Y = bX + a • Y = bX + a + e, where e is the error (residual) • All regression equations have an element of “error” • When reporting a regression equation, you need to also report statistics describing the “error” in your equation.
Checking for Violation of Assumptions: “Analysis of the Residuals”
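As a rough sketch of what an analysis of the residuals starts from, the snippet below computes the predicted Y and the residual (observed minus predicted) for each case. The data are made up, and the slope and intercept are simply the least-squares values for those five points; in practice these pairs would be plotted (residual against predicted value) and inspected for curvature or a fan shape.

```python
# Made-up data and its least-squares line Y = 0.6X + 2.2, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b, a = 0.6, 2.2

predicted = [b * xi + a for xi in x]
residuals = [yi - pi for yi, pi in zip(y, predicted)]

# Each row is one point on a residual plot: (predicted Y, residual).
for pi, ri in zip(predicted, residuals):
    print(f"predicted = {pi:.2f}   residual = {ri:+.2f}")

# For a least-squares line with an intercept, the residuals sum to zero
# (up to floating-point rounding).
print("sum of residuals =", round(sum(residuals), 10))
```

A residual plot with no visible pattern and roughly constant spread is consistent with the linearity and homoscedasticity assumptions; it does not prove them.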
Statistics that describe the “Error” in a Regression Equation • Standard Error of the Estimate (SEE) • The average error made in estimating Y from X. This is an indication of the accuracy of estimation. • R² • Proportion of explained variance. Expressed as a % • Confidence Intervals • for b (slope) • for a (intercept)
Standard Error of the Estimate • Standard Error of the Estimate (SEE) • The average error made in estimating Y from X. This is an indication of the accuracy of estimation. For any given X, approximately 95% of the Y scores will lie within 1.96*SEE of the predicted Y score, provided the normality and homoscedasticity assumptions hold. • Assumption of Homoscedasticity • equal variance of Y scores for each value of X
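A minimal sketch of the SEE calculation, on made-up data. It assumes the usual definition for a single predictor, SEE = sqrt(SSE / (n − 2)), where SSE is the sum of squared residuals; the function name and the data are hypothetical, and the slope and intercept are the least-squares values for these five points.

```python
import math

def standard_error_of_estimate(x, y, b, a):
    """SEE for the line Y = bX + a with one predictor: sqrt(SSE / (n - 2))."""
    n = len(x)
    sse = sum((yi - (b * xi + a)) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(sse / (n - 2))

# Made-up data and its least-squares line Y = 0.6X + 2.2, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b, a = 0.6, 2.2

see = standard_error_of_estimate(x, y, b, a)

# Approximate 95% band around the predicted Y at a chosen X.
x_new = 3
y_hat = b * x_new + a
lower = y_hat - 1.96 * see
upper = y_hat + 1.96 * see

print("SEE =", round(see, 4))
print(f"predicted Y at X={x_new}: {y_hat:.2f}, approx. 95% band: {lower:.2f} to {upper:.2f}")
```

With only five points this band is purely illustrative; the 1.96 multiplier is a large-sample approximation.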
R² • The proportion of the variance in Y that is explained by X (often expressed as a percent) • Values can range from 0 (no variability explained) to 1 (perfect prediction). • R² is sensitive to sample size • Adjusted R² is usually reported in the literature since it corrects for sample size and the number of predictors. • Adjusted R² < R²
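The relationship between R² and adjusted R² can be sketched as below. This assumes the common formulas R² = 1 − SSE/SST and adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1), where n is the sample size and p the number of predictors. The data and predicted values are made up (the predictions come from the least-squares line Y = 0.6X + 2.2 for these points), and the function names are hypothetical.

```python
def r_squared(y, y_hat):
    """Proportion of the variance in y explained by the predictions y_hat."""
    mean_y = sum(y) / len(y)
    sse = sum((yi - pi) ** 2 for yi, pi in zip(y, y_hat))   # unexplained
    sst = sum((yi - mean_y) ** 2 for yi in y)               # total
    return 1 - sse / sst

def adjusted_r_squared(r2, n, p):
    """Adjust R² for sample size n and number of predictors p."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Made-up observed scores and their predictions, for illustration only.
y = [2, 4, 5, 4, 5]
y_hat = [2.8, 3.4, 4.0, 4.6, 5.2]

r2 = r_squared(y, y_hat)
adj = adjusted_r_squared(r2, n=len(y), p=1)

print("R2 =", round(r2, 4))
print("Adjusted R2 =", round(adj, 4))
```

The penalty shrinks as n grows relative to p, so the two values converge in large samples.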
Using SPSS to conduct a Multiple Regression Analysis • Analyze – Regression – Linear
What should you be looking for in the output from the Regression Analysis? • Check to see if you have met the assumptions (“Analysis of the Residuals”) • Normality • Homoscedasticity • Check to see that the Regression Equation is ‘significant’ • If all is OK with these values… you can proceed • If not, stop… you cannot use a regression analysis
What values do you report? • Adjusted R² • Standard Error of the Estimate • Regression Equation • If the purpose of the study is to identify the ‘greatest predictors’ of Y, then report the equation using the Standardized Beta coefficients • If the purpose of the study is to create a prediction equation that someone can use to predict a person’s score on Y, report the unstandardized B coefficients along with their standard errors.
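The difference between unstandardized B and standardized Beta coefficients can be sketched with the standard conversion Beta = B × (SD of X / SD of Y). This is an illustration on made-up data, not SPSS output; the function name is hypothetical, and B = 0.6 is the least-squares slope for these five points.

```python
import statistics

def standardized_beta(b, x, y):
    """Convert an unstandardized slope b into a standardized Beta."""
    return b * statistics.stdev(x) / statistics.stdev(y)

# Made-up data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b = 0.6   # unstandardized slope: change in Y per one-unit change in X

beta = standardized_beta(b, x, y)
print("unstandardized B =", b)
print("standardized Beta =", round(beta, 4))
```

Because Beta is expressed in standard-deviation units, coefficients for predictors measured on different scales can be compared; B keeps the original units, which is what a reader needs in order to plug in raw scores. With a single predictor, Beta equals the Pearson correlation between X and Y.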
Assignment for next class • Using the Pico (2002) article as a guide, create a (i.e. one) summary table for the bodyfat data analysis • Hint: Focus on Table 3 and the discussion of Table 3 on page 132.