Multiple Regression
What Techniques Can Tell Us • Chi-square • Do groups differ (nominal data)? • t test • Do groups/variables differ? • Gamma/Lambda/Kendall’s tau, etc. • Are variables related to each other? (nominal/ordinal data) • Correlation • Are variables related to each other? (ratio/interval data)
Interpreting Correlations • 3 questions we can answer • Is there a relationship between 2 variables? • What is the direction of the relationship? • What is the strength of the relationship?
Interpreting Correlations • Are there limitations here? And if so, what? • Don’t know amount of effect of one variable on other • Don’t know impact of other variables
Basic Equations • Let your DV (Y) = total cost of bananas • Suppose you buy X lbs of bananas at $.49 a lb • How would you express this as an equation to figure out how much your bananas cost? • Y = .49X • Can use for prediction • 10 lbs = $4.90 • 2 lbs = $.98
Multivariate Equations • Suppose you have a phone plan that charges • $5.95 a month • $.10 a minute in-state long distance • $.08 a minute interstate long distance • $.01 a minute local calls • How would you represent this? • Total = .10x1 + .08x2 + .01x3 + 5.95
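The phone-plan equation can be written as a small Python function. This is only a sketch: the function and argument names are illustrative and not from the slides, and the usage figures are made up for the example.

```python
def monthly_total(instate_min, interstate_min, local_min):
    """Total = .10*x1 + .08*x2 + .01*x3 + 5.95, using the rates on the slide."""
    return 0.10 * instate_min + 0.08 * interstate_min + 0.01 * local_min + 5.95


# Hypothetical usage: 30 in-state, 20 interstate, and 100 local minutes.
example_total = monthly_total(30, 20, 100)
print(f"${example_total:.2f}")  # prints $11.55
```

Each coefficient is the amount the total changes for one more minute of that type of call, and the constant is what you pay with zero minutes.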
Regression Analysis • Lets you work the problem Backwards • How much do different IVs contribute to a DV • How do different IVs relate to DV • Lets you build a model of more complicated relationships • In addition to existence, direction, strength, gives you the amount of change
Expressing a Regression Equation • Y = b1x1 + b2x2 + … + bixi + constant + error • Error is part of the probabilistic nature of social science • Constant: what Y would equal if all Xs = 0 • Estimation process: fit a line to the data that minimizes the squared distance to the observed data points
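In standard notation the same equation and the usual least-squares criterion look like this. The symbols b_0 for the constant, e for the error, and j for the observation index are labeling choices for this sketch, not taken from the slides.

```latex
\[
Y = b_1 x_1 + b_2 x_2 + \dots + b_i x_i + b_0 + e
\]
\[
\hat{Y}_j = b_1 x_{1j} + b_2 x_{2j} + \dots + b_i x_{ij} + b_0,
\qquad
\min_{b_0, b_1, \dots, b_i} \sum_{j=1}^{n} \left( Y_j - \hat{Y}_j \right)^2
\]
```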
Scatter Plots and Regression Lines • PID and Ideology • Correlation here is .37; not bad, but you can see there are deviations in some cases
Fitting the Regression Line • Goal: minimize the squared distances (error) between predicted values of Y and observed values • Goal: explain the variance in Y in terms of X • Error in prediction is unexplained variance
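A minimal sketch of fitting the line with NumPy, assuming ordinary least squares. The data are synthetic (generated from a made-up constant of 2.0 and slope of 0.5), not the PID/ideology data, and all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data for illustration only: Y = 2.0 + 0.5*X + random error.
x = rng.uniform(0, 10, size=200)
y = 2.0 + 0.5 * x + rng.normal(0, 1, size=200)

# Design matrix: a column of ones for the constant, then X.
X = np.column_stack([np.ones_like(x), x])

# Least squares picks the coefficients that minimize the sum of squared errors.
coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
constant, slope = coefs

predicted = X @ coefs
residuals = y - predicted

total_ss = np.sum((y - y.mean()) ** 2)              # total variation in Y
unexplained_ss = np.sum(residuals ** 2)             # error in prediction
explained_ss = np.sum((predicted - y.mean()) ** 2)  # variation accounted for by X

print(f"constant={constant:.2f}, slope={slope:.2f}")
print(f"explained + unexplained = {explained_ss + unexplained_ss:.2f}")
print(f"total                   = {total_ss:.2f}")
```

The last two printed numbers agree, which is the sense in which the error in prediction is the unexplained part of the variance in Y.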
Party and Ideology • Set up PID as DV, Ideology as IV, run analysis • Can also do Ideology as DV
Goodness of Fit • Measure of how much variance is explained by the model you build • R² = correlation coefficient squared (two-variable case) • R² = proportion of variance explained • R² is symmetrical (same value whichever variable is the DV) • In previous example R² = .256 • R² ranges from 0 to 1 • Adjusted R² takes into account the degrees of freedom; a more appropriate measure
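The following sketch checks these properties on synthetic two-variable data (not the party/ideology data). The helper names are illustrative, and the adjusted R² uses the common formula 1 - (1 - R²)(n - 1)/(n - k - 1) for n observations and k predictors.

```python
import numpy as np


def r_squared(x, y):
    """R^2 from a one-predictor least-squares fit of y on x."""
    X = np.column_stack([np.ones_like(x), x])
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ coefs
    return 1.0 - np.sum(residuals ** 2) / np.sum((y - y.mean()) ** 2)


def adjusted_r_squared(r2, n, k):
    """Adjusted R^2 for n observations and k predictors."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


rng = np.random.default_rng(1)
x = rng.normal(size=300)             # synthetic IV
y = 0.6 * x + rng.normal(size=300)   # synthetic DV

r = np.corrcoef(x, y)[0, 1]
r2_y_on_x = r_squared(x, y)          # y as the DV
r2_x_on_y = r_squared(y, x)          # x as the DV
adj = adjusted_r_squared(r2_y_on_x, n=len(x), k=1)

print(f"r^2 = {r ** 2:.4f}")
print(f"R^2 (y on x) = {r2_y_on_x:.4f}, R^2 (x on y) = {r2_x_on_y:.4f}")
print(f"adjusted R^2 = {adj:.4f}")
```

The symmetry holds only with a single IV; once a model has several predictors, swapping the DV for one of the IVs gives a different model and a different R².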
Run for the Border Using Multiple Regression • Suppose that you and some friends ate at Taco Bell every week for a year. • For each meal, you know the total amount spent, and the number of each item, but not what each item cost. • You could use multiple regression to get parameter estimates of the true values. • Data set was constructed by choosing a random number (between 0 and 4) of Bean Burritos, Tacos, Chalupas, Chicken Tacos, Beef Burritos, 7 Layer Burritos, and Soft drinks • Data matrix includes a variable for number of each
Border Model 1 • We’ll look at impact of bean burritos on total
Border Model 2 • Bean Burritos and Tacos
Model 4 Revisited • Bean Burrito .69, Taco .79, Chalupa 1.19, Chicken Taco 1.39, Beef Burrito 1.59, 7 Layer Burrito 1.89, Drink 1.29
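The Taco Bell exercise can be re-created roughly as follows. This is a sketch, not the original data set: it assumes 52 meals (one per week), item counts drawn uniformly from 0 to 4 inclusive, and totals with no tax or other noise. The prices are the ones listed above; the variable names are illustrative.

```python
import numpy as np

# Prices listed on the slide, in dollars.
prices = {
    "bean_burrito": 0.69,
    "taco": 0.79,
    "chalupa": 1.19,
    "chicken_taco": 1.39,
    "beef_burrito": 1.59,
    "seven_layer_burrito": 1.89,
    "drink": 1.29,
}
true_prices = np.array(list(prices.values()))

rng = np.random.default_rng(42)
n_meals = 52  # assumption: one meal a week for a year

# Random number of each item per meal (0 to 4 inclusive is an assumption).
counts = rng.integers(0, 5, size=(n_meals, len(prices)))

# The total spent is all the regression gets to see about prices.
totals = counts @ true_prices

# Regress totals on the item counts plus a constant.
X = np.column_stack([counts, np.ones(n_meals)])
estimates, *_ = np.linalg.lstsq(X, totals, rcond=None)

for name, est in zip(prices, estimates[:-1]):
    print(f"{name}: {est:.2f}")
```

Because these simulated totals contain no error, the full model recovers every price and a constant of essentially zero. Regressing on bean burritos alone, as in Border Model 1, leaves the cost of every other item in the error term, so that estimate will generally not come out at exactly .69.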
Some Data Requirements for Regression • DV must be interval or ratio, and continuous • IVs should not be highly correlated with each other • Error variance should be constant at high and low predicted values (homoscedasticity) • Relationship must be linear • Errors of subsequent observations should not be correlated (no serial correlation)
For Next Time • Multicollinearity • Heteroscedasticity • Interaction terms • Pass out Stat Assignment II