Multiple Regression Analysis (MRA) • Design requirements • Multiple regression model • R² • Comparing standardized regression coefficients
Steps in data analysis • Look first at each variable separately • Then at relationships among the variables • Examine the distribution of each variable to be used in the multiple regression to detect unusual patterns (e.g., outliers or strong skewness) that may matter when building the regression model.
Correlation Analysis • If you are interested only in determining whether a relationship exists, use correlation analysis. • Example: students’ height and weight.
Correlation Analysis • Correlation coefficient close to +1 = strong positive linear relationship. • Correlation coefficient close to -1 = strong negative linear relationship. • Correlation coefficient close to 0 = no linear relationship.
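The correlation coefficient can be computed directly from its definition: the covariance of X and Y divided by the product of their standard deviations. In this sketch, the cutoffs for "strong" (|r| >= 0.7) and "close to 0" (|r| < 0.1) are assumed conventions, not values from the slides. The last example shows why "close to 0" means no linear relationship: y = x² depends perfectly on x, yet r = 0.

```python
import math

def pearson_r(x, y):
    """Pearson correlation: sum of cross-products of deviations divided by
    the square root of the product of the sums of squared deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def describe_strength(r, strong=0.7, near_zero=0.1):
    # Both thresholds are assumed conventions for illustration.
    if r >= strong:
        return "strong positive"
    if r <= -strong:
        return "strong negative"
    if abs(r) < near_zero:
        return "little or no linear relationship"
    return "weak to moderate"

print(describe_strength(pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])))  # strong positive
print(describe_strength(pearson_r([1, 2, 3, 4, 5], [10, 8, 6, 4, 2])))  # strong negative
# Perfect but non-linear dependence (y = x**2) still gives r = 0.
print(describe_strength(pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])))  # little or no linear relationship
```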
Example: Self-Concept and Academic Achievement (N = 103): correlation table
Multiple Regression Analysis (MRA) • Method for studying the relationship between a dependent variable and two or more independent variables. • Purposes: • Prediction • Explanation • Theory building
Design Requirements • One dependent variable (criterion) • Two or more independent variables (predictor variables) • Sample size: at least 50 cases, and at least 10 times as many cases as independent variables
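One reading of the sample-size rule takes the larger of the two requirements, and it fits in a one-line function. The function name and the "whichever is larger" interpretation are assumptions made for this sketch.

```python
def min_sample_size(n_predictors, floor=50, cases_per_predictor=10):
    """Hypothetical helper: at least `floor` cases and at least
    `cases_per_predictor` cases per independent variable, whichever is larger."""
    if n_predictors < 2:
        raise ValueError("multiple regression needs two or more predictors")
    return max(floor, cases_per_predictor * n_predictors)

for k in (2, 5, 8):
    print(k, "predictors ->", min_sample_size(k), "cases")
```

With these defaults the 50-case floor dominates up to five predictors. Beyond that, the 10-cases-per-predictor rule takes over.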
Assumptions • Independence: The scores of any particular subject are independent of the scores of all other subjects. • Normality: In the population, the scores on the dependent variable are normally distributed for each of the possible combinations of the levels of the X variables; each of the variables is normally distributed.
Assumptions • Homoscedasticity: In the population, the variances of the dependent variable for each of the possible combinations of the levels of the X variables are equal. • Linearity: In the population, the relation between the dependent variable and each independent variable is linear when all the other independent variables are held constant.
Linear regression • Simple linear regression models the relationship between one explanatory variable (IV) and one response variable (DV). • In multiple regression, several explanatory variables work together to explain the dependent variable.
What is a Model? • Representation of Some Phenomenon (Non-Math/Stats Model)
What is a Math/Stats Model? • Describes Relationships between Variables • Types: • Deterministic Models (no randomness) • Probabilistic Models (with randomness)
Deterministic Models • Hypothesize Exact Relationships • Suitable When Prediction Error is Negligible • Example: Body mass index (BMI) is a measure of body fat based on height and weight, computed with this formula: • Non-metric formula: $\text{BMI} = \dfrac{\text{weight (pounds)} \times 703}{(\text{height in inches})^2}$
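The BMI formula is deterministic in exactly this sense. The same weight and height always give the same BMI, with no error term. The sketch below is a direct translation, with the metric form (kg / m²) added for comparison. The function names are illustrative.

```python
def bmi_imperial(weight_lb, height_in):
    """Deterministic model: BMI = 703 * weight (lb) / height (in) ** 2.
    No random error term: identical inputs always give identical output."""
    return 703 * weight_lb / height_in ** 2

def bmi_metric(weight_kg, height_m):
    """Metric form of the same formula: weight (kg) / height (m) ** 2."""
    return weight_kg / height_m ** 2

print(round(bmi_imperial(150, 65), 1))  # 25.0
print(round(bmi_metric(68, 1.75), 1))   # 22.2
```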
Probabilistic Models • Hypothesize 2 Components • Deterministic • Random Error • Example: Systolic blood pressure (SBP) of newborns is 6 Times the Age in days + Random Error • $\text{SBP} = 6 \times \text{age (days)} + \varepsilon$, where $\varepsilon$ is the random error • Random Error May Be Due to Factors Other than Age in Days (e.g., Birth Weight)
Regression Models • Relationship between one dependent variable and explanatory variable(s) • Use equation to set up relationship • Numerical Dependent (Response) Variable • 1 or More Numerical or Categorical Independent (Explanatory) Variables • Used Mainly for Prediction & Estimation
Regression Modeling Steps • 1. Hypothesize Deterministic Component • Estimate Unknown Parameters • 2. Specify Probability Distribution of Random Error Term • Estimate Standard Deviation of Error • 3. Evaluate the fitted Model • 4. Use Model for Prediction & Estimation
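The four steps can be sketched end to end in plain Python:

- least-squares estimation via the normal equations (X'X)b = X'y;
- the residual standard deviation s = sqrt(SSE / (n - k - 1)) as the estimate of the error SD;
- R² to evaluate the fit;
- prediction from the fitted equation.

This is an illustrative implementation under the usual OLS setup, not production code. Statistical packages use more numerically stable matrix decompositions. The data are hypothetical and constructed to fit y = 1 + 2*x1 + 3*x2 exactly.

```python
import math

def fit_ols(X, y):
    """Step 1: estimate b0..bk by least squares, solving (X'X) b = X'y.
    X is a list of rows of predictor values; an intercept column is added."""
    rows = [[1.0] + [float(v) for v in r] for r in X]
    p = len(rows[0])
    # Augmented normal-equation matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        pivot = max(range(c, p), key=lambda k: abs(A[k][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for k in range(p):
            if k != c:
                f = A[k][c] / A[c][c]
                A[k] = [a - f * b for a, b in zip(A[k], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]

def predict(b, x_row):
    """Step 4: prediction from the fitted equation."""
    return b[0] + sum(bj * xj for bj, xj in zip(b[1:], x_row))

def evaluate(b, X, y):
    """Steps 2-3: residual SD (estimate of the error SD) and R^2."""
    n, k = len(y), len(b) - 1
    resid = [yi - predict(b, r) for r, yi in zip(X, y)]
    sse = sum(e * e for e in resid)
    my = sum(y) / n
    sst = sum((yi - my) ** 2 for yi in y)
    return math.sqrt(sse / (n - k - 1)), 1 - sse / sst

X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6], [6, 5]]
y = [9, 8, 19, 18, 29, 28]            # exactly 1 + 2*x1 + 3*x2
b = fit_ols(X, y)
s, r2 = evaluate(b, X, y)
print([round(v, 6) for v in b], round(r2, 6), round(predict(b, [1, 1]), 6))
```

On real data s would be positive and R² below 1. This example is noise-free only so the recovered coefficients are easy to check.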
Multiple Regression • Very popular among social scientists. • Most social phenomena have more than one cause. • Very difficult to manipulate just one social variable through experimentation. • Social scientists must attempt to model complex social realities to explain them.
Multiple Regression • Allows us to: • Use several variables at once to explain the variation in a continuous dependent variable. • Isolate the unique effect of one variable on the continuous dependent variable while taking into account that other variables are affecting it too. • Write a mathematical equation that tells us the overall effect of several variables together and the unique effect of each on a continuous dependent variable. • Control for other variables to test whether bivariate relationships are spurious.
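A small simulation illustrates the "control for other variables" idea. Here a hypothetical third variable z drives both x and y, while x has no effect on y at all. The bivariate slope of y on x is clearly nonzero (about 1 in theory). Once z enters the equation, the coefficient on x drops to about 0, which exposes the bivariate relationship as spurious. The data-generating model and the seed are assumptions for illustration.

```python
import random

def bivariate_slope(x, y):
    """Slope of y on x alone (simple regression)."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))

def fit_two_predictors(x1, x2, y):
    """OLS for y = b0 + b1*x1 + b2*x2 using centered sums of squares/products."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return my - b1 * m1 - b2 * m2, b1, b2

# Hypothetical simulation: z causes both x and y; x has NO effect on y.
rng = random.Random(42)
z = [rng.gauss(0, 1) for _ in range(2000)]
x = [zi + rng.gauss(0, 1) for zi in z]
y = [2 * zi + rng.gauss(0, 1) for zi in z]

print(f"slope of y on x alone:     {bivariate_slope(x, y):.2f}")  # about 1
_, b_x, b_z = fit_two_predictors(x, z, y)
print(f"slope of x, z controlled:  {b_x:.2f}")                    # about 0
```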
Multiple Regression • For example: A researcher may be interested in the relationship of Education and Family Income to the Number of Children in a family. • Independent variables: Education, Family Income • Dependent variable: Number of Children
Multiple Regression • For example: • Research Hypothesis: As education of respondents increases, the number of children in families will decline (negative relationship). • Research Hypothesis: As family income of respondents increases, the number of children in families will decline (negative relationship). • Independent variables: Education, Family Income • Dependent variable: Number of Children
Multiple Regression • For example: • Null Hypothesis: There is no relationship between education of respondents and the number of children in families. • Null Hypothesis: There is no relationship between family income and the number of children in families. • Independent variables: Education, Family Income • Dependent variable: Number of Children
Multiple Regression • R² = .57: 57% of the variation in number of children is explained by education and income together!
Explaining Variation: How much? • Total variation in Y = predictable variation (explained by the combination of independent variables) + unpredictable variation
Proportion of Predictable and Unpredictable Variation • R² = predictable (explained) proportion of variation in Y • 1 - R² = unpredictable (unexplained) proportion of variation in Y • Where: Y = # children, X1 = education, X2 = income
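R² can be computed as 1 - SS_residual / SS_total, so 1 - R² is the share of variation left unexplained. The minimal sketch below uses hypothetical fitted values in its example. The "proportion explained" reading holds for least-squares fits that include an intercept.

```python
def r_squared(y, y_hat):
    """R^2 = 1 - SS_residual / SS_total: share of the variation in Y
    reproduced by the fitted values; 1 - R^2 is the unexplained share."""
    mean_y = sum(y) / len(y)
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_resid = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    return 1 - ss_resid / ss_total

# Hypothetical observed and fitted values: SS_total = 20, SS_residual = 4.
r2 = r_squared([2, 4, 6, 8], [3, 3, 7, 7])
print(f"explained = {r2:.2f}, unexplained = {1 - r2:.2f}")  # explained = 0.80, unexplained = 0.20
```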
Multiple Regression Now… More Variables! • The social world is very complex. • What happens when you have even more variables? • For example: A researcher may be interested in the effects of Education, Income, Sex, and Gender Attitudes on Number of Children in a family. • Dependent variable: Number of Children • Independent variables: Education, Family Income, Sex, Gender Attitudes
Simple vs. Multiple Regression • Simple regression: • One dependent variable Y predicted from one independent variable X • One regression coefficient • r²: proportion of variation in dependent variable Y predictable from X • Multiple regression: • One dependent variable Y predicted from a set of independent variables (X1, X2, …, Xk) • One regression coefficient for each independent variable • R²: proportion of variation in dependent variable Y predictable by the set of independent variables (X’s)
Different Ways of Building Regression Models • Simultaneous (Enter): All independent variables entered together • Stepwise (Forward, Backward): Independent variables entered or removed one at a time according to statistical criteria • By size of correlation with the dependent variable • By statistical significance • Hierarchical: Independent variables entered in stages, in an order determined by the researcher (theory)
Multiple Regression: BLUE Criteria • Regression forces a best-fitting model onto data. If the model is appropriate for the data, regression should be used. • How do we know that our model is appropriate for the data? • Criteria for determining whether a regression model is appropriate for the data are nicknamed “BLUE” for best linear unbiased estimator.
Multiple Regression: BLUE Criteria • Violating the BLUE assumptions may result in biased estimates or incorrect significance tests. (However, OLS is robust to most violations.) • Data (constellation) should meet these criteria: • The relationship between the dependent variable and its predictors is linear. • No relevant variables are omitted from the equation, and no irrelevant variables are included in it. (Good luck!) • All variables are measured without error. (Good luck!)