Correlation, regression and reading tables
Review – What are the chances?
• “Test” statistics, such as the correlation coefficient “r”, help us evaluate whether a relationship between variables goes beyond chance
• If it does, one can reject the null hypothesis of no relationship
• But in the social sciences, one cannot accept more than five chances in one hundred of incorrectly rejecting the null hypothesis
• Here is how we proceed:
  • Computers automatically determine whether the test statistic’s coefficient (expressed numerically, such as .03) is large enough to reject the null hypothesis
  • How large must a coefficient be? That varies, mainly with the number of cases. In any case, if the computer decides that it’s large enough, it automatically assigns one, two or three asterisks (*, **, ***)
• One asterisk is the minimal level required for rejecting the null hypothesis. It is written p < .05, meaning there are fewer than five chances in 100 that a coefficient of that magnitude (size) could be produced by chance alone, if there were really no relationship
• If the coefficient is so large that the probability it was produced by chance is less than one in one hundred (p < .01), the computer assigns two asterisks (**)
• An even better result is three asterisks (***), where the probability that a coefficient was produced by chance is less than one in a thousand (p < .001)
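The asterisk convention above can be sketched as a small Python function. This is a minimal illustration, assuming the p value has already been computed by the statistical software; the function name `significance_stars` is hypothetical and not part of any package.

```python
def significance_stars(p: float) -> str:
    """Return the asterisks conventionally attached to a coefficient with p value `p`."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("a p value must lie between 0 and 1")
    if p < 0.001:
        return "***"  # fewer than 1 chance in 1,000
    if p < 0.01:
        return "**"   # fewer than 1 chance in 100
    if p < 0.05:
        return "*"    # fewer than 5 chances in 100
    return ""         # not significant: the null hypothesis is not rejected


# Hypothetical p values, one for each outcome
for p in (0.20, 0.03, 0.004, 0.0002):
    print(p, significance_stars(p) or "n.s.")
```

Note that the thresholds are strict: a p value of exactly .05 earns no asterisk.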
Correlation
• r – correlation: a simple relationship between two variables; coefficients range between -1 and +1 (0 = no relationship)
• R – multiple correlation: the cumulative effect of multiple IVs
• Computers automatically test correlations for statistical significance (this does not imply there is a causal relationship; that’s up to researchers to hypothesize)
• Sig. (2-tailed) means that the significance level was computed without assuming in advance the direction (positive or negative) in which the variables might affect each other
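For readers curious about what the computer does, here is a minimal Pearson r in plain Python. The height and weight numbers are made up for illustration, and `pearson_r` is a hypothetical helper; real analyses would use a statistics package, which also reports the Sig. (2-tailed) value that this sketch does not compute.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient r, which always lies between -1 and +1."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two values")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of each variable's deviations from its mean
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical data: height in inches (IV) and weight in pounds (DV)
height = [60, 62, 65, 68, 70, 72, 74]
weight = [115, 120, 140, 150, 155, 170, 180]
r = pearson_r(height, weight)
print(f"r = {r:.2f}")
```

Because taller cases in this invented data are also heavier, r comes out close to +1; reversing one list would flip its sign.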
OLS (ordinary least squares) regression
Used when independent and dependent variables are continuous
• r² – coefficient of determination: the proportion of the variation in the dependent variable accounted for by variation in the independent variable (R² – the combined effect of multiple IVs)
• b: the unit change in the dependent variable when the independent variable changes one unit, shown as B in most tables; “Beta” in software output is the standardized version (not easily interpretable – focus on asterisks & actual p values, when shown)
[Table: OLS regression; DV: perception of social disorder; columns: Indep. variables, B, SE, p]
[Figure: height (IV) and weight (DV), r² = .52**]
r² = .52**, the square of the correlation coefficient, means that variation in height (IV) accounts for fifty-two percent of the variation in weight (DV). The probability that one could get an r² coefficient of this magnitude by chance is less than 1 in 100.
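To make b and r² concrete, here is a minimal OLS fit for a single IV in plain Python. It is a sketch for illustration only (`ols_simple` is a hypothetical name and the four data points are invented); statistical software also reports standard errors and p values, which this sketch leaves out.

```python
def ols_simple(x: list[float], y: list[float]) -> tuple[float, float, float]:
    """Fit y = a + b*x by ordinary least squares; return (a, b, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx             # change in y for a one-unit change in x
    a = mean_y - b * mean_x   # intercept: predicted y when x = 0
    # r^2 = 1 - (unexplained variation / total variation in y)
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return a, b, 1 - ss_res / ss_tot


# Hypothetical data
x = [1, 2, 3, 4]
y = [2, 4, 5, 4]
a, b, r2 = ols_simple(x, y)
print(f"a = {a:.2f}, b = {b:.2f}, r2 = {r2:.2f}")
```

With one IV, the r² computed this way equals the square of Pearson’s r for the same two variables, which is why the slide calls r² “the square of the correlation coefficient.”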
Logistic regression
Used when the dependent variable is nominal with two mutually exclusive categories (coded 0/1) and independent variables are nominal or continuous
Hypothesis: arresting domestic abusers reduces the risk that their victims will be assaulted in the future. IVs are down the left. The DV, repeat victimization (Yes/No), is embedded in the table.
• b is the logistic regression coefficient
• Exp b, the “odds ratio,” reports the effect on the DV of a one-unit change in the IV. An Exp b of exactly 1 means that a one-unit change in the IV leaves the odds of the DV unchanged (they are multiplied by 1), so no relationship between the variables can be assumed.
• Exp b values greater than 1 indicate a positive relationship, less than 1 a negative relationship
• Arrest decreases (negative b) the odds of repeat victimization by 22 percent (1 - .78 = .22), but the effect is non-significant
• Not reporting (positive b) increases the odds of repeat victimization by 89 percent (1.89 - 1 = .89); that is, the odds are 1.89 times as large, a statistically significant change
• Prior victimization increases the odds of repeat victimization by 408 percent, making them 5.08 times as large, also statistically significant (it’s not 508 percent because Exp b values begin at 1: 5.08 - 1 = 4.08)
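The percentage arithmetic above follows one rule: subtract 1 from Exp b and multiply by 100. A minimal Python sketch, using the Exp b values implied by the slide’s examples (.78, 1.89 and 5.08); `describe_odds_ratio` is a hypothetical helper written for this illustration.

```python
def describe_odds_ratio(exp_b: float) -> str:
    """Translate an odds ratio (Exp b) into a percent change in the odds of the DV."""
    if exp_b <= 0:
        raise ValueError("an odds ratio must be positive")
    change = (exp_b - 1.0) * 100.0  # an Exp b of 1 means no change
    if abs(change) < 1e-9:
        return "no change in the odds"
    direction = "increases" if change > 0 else "decreases"
    return f"{direction} the odds by {abs(change):.0f} percent"


for label, exp_b in [("Arrest", 0.78), ("Not reporting", 1.89), ("Prior victimization", 5.08)]:
    print(label, describe_odds_ratio(exp_b))
```

Whether a change is statistically significant is a separate question; this sketch only converts the size of the effect.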
OLS regression v. logistic regression
[Table: Logistic regression analysis predicting feeling unsafe (DV); columns: IVs, B, Exp B, S.E., p]
DV is nominal – 0 and 1
[Table: OLS regression analysis predicting perception of social disorder (DV); columns: IVs, B, S.E., p]
DV is continuous
Logistic regression – Effects of broken homes on children
Dependent variable: conviction for crime of violence
[Table: logistic regression]
Research questions
• Use the column Exp(B) and percentages to describe the effects of significant variables
• Describe the levels of significance using words
• Youths from broken homes had 236 percent higher odds of being convicted of a crime of violence. The effect was significant, with less than 1 chance in 100 that it was produced by chance.
• Youths with poor parental supervision had 128 percent higher odds of being convicted of a violent crime. The effect was significant, with less than 5 chances in 100 that it was produced by chance.
Logistic regression – going from B to exp(B)
• Use an exponents calculator
• http://www.rapidtables.com/calc/math/Exponent_Calculator.htm
• For “number,” always enter the constant e, rounded to 2.72
• For “exponent,” enter the B value, also known as the “log-odds”
• The result is the odds ratio, also known as exp(B)
• In the left example the B is 1.21, and the exp(B) is 3.36
  • Meaning, for each unit change in the IV, the odds of the DV increase 236 percent (3.36 - 1 = 2.36)
• In the right example the B is -.610, and the exp(B) is .543
  • Meaning, for each unit change in the IV, the odds of the DV decrease 46 percent (1.00 - .54)
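The calculator step is just exponentiation: exp(B) = e^B, where e ≈ 2.718. The Python sketch below does the same conversion with the standard `math.exp`; the helper names `exp_b` and `log_odds` are hypothetical. Because `math.exp` uses the exact constant e, results can differ from the 2.72 shortcut in the last digit: exp(1.21) is about 3.353, while 2.72^1.21 is about 3.356, which rounds to the 3.36 used above.

```python
import math


def exp_b(b: float) -> float:
    """Convert a logistic regression coefficient B (log-odds) into an odds ratio, exp(B)."""
    return math.exp(b)


def log_odds(odds_ratio: float) -> float:
    """Go the other way: recover B from a reported odds ratio."""
    return math.log(odds_ratio)


for b in (1.21, -0.610):
    ratio = exp_b(b)
    print(f"B = {b:+.3f}  exp(B) = {ratio:.3f}  change in odds = {(ratio - 1) * 100:+.0f}%")
```

With the exact e, the first line reports +235 percent rather than 236; the difference comes only from rounding 2.718 up to 2.72.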
“Poisson” regression – effects of audience characteristics on substance use
Alcohol and cannabis use at adolescent parties
Poisson regression is used when the dependent variable is a count (a number of events); it is not a form of logistic regression.
Research questions
• What is the relationship between the size of gatherings and substance use?
• What is the relationship between the presence of peers and substance use?
• What is the relationship between the behavior of peers and substance use?
[Table: “Poisson” regression]
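Poisson regression models the logarithm of the expected count, so exp(b) is read as a multiplicative change in the expected count (a “rate ratio”) rather than in the odds. The sketch below is a hand-rolled Newton-Raphson fit for a single IV, for illustration only: `poisson_fit` is a hypothetical helper, and the gathering sizes and drink counts are invented, not taken from the study on the slide. Statistical packages fit this model (with standard errors and p values) directly.

```python
import math


def poisson_fit(x: list[float], y: list[int], iters: int = 50) -> tuple[float, float]:
    """Fit log(expected y) = b0 + b1*x by maximum likelihood (Newton-Raphson)."""
    b0 = math.log(sum(y) / len(y))  # start at the overall mean count (assumes sum(y) > 0)
    b1 = 0.0
    for _ in range(iters):
        mu = [math.exp(b0 + b1 * xi) for xi in x]  # current expected counts
        # Score: gradient of the log-likelihood
        u0 = sum(yi - m for yi, m in zip(y, mu))
        u1 = sum(xi * (yi - m) for xi, yi, m in zip(x, y, mu))
        # Information matrix [[i00, i01], [i01, i11]]
        i00 = sum(mu)
        i01 = sum(xi * m for xi, m in zip(x, mu))
        i11 = sum(xi * xi * m for xi, m in zip(x, mu))
        det = i00 * i11 - i01 * i01
        # Newton step = inverse(information) times score
        step0 = (i11 * u0 - i01 * u1) / det
        step1 = (-i01 * u0 + i00 * u1) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < 1e-10:
            break
    return b0, b1


sizes = [2, 3, 4, 5, 6, 8, 10, 12]   # hypothetical number of people at the gathering
drinks = [5, 4, 4, 3, 3, 2, 1, 1]    # hypothetical number of drinks per adolescent
b0, b1 = poisson_fit(sizes, drinks)
print(f"b = {b1:.3f}, exp(b) = {math.exp(b1):.3f}")
```

A negative b (exp(b) below 1) would mean fewer expected drinks for each additional person present, the pattern the Findings slide describes for larger gatherings.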
Findings
• Higher levels of substance use tend to occur in smaller gatherings
• Less alcohol use in the presence of close friends
• Except that alcohol/cannabis use is higher when friends are also using
• Peer behavior is the key
“Tobit” regression – poor academic performance → delinquency
Tobit regression is used when a continuous DV has many cases piled up at a limit, such as a score of zero.
• A “model” is a different combination of independent variables
• IVs run down the left; the DV, delinquency, is embedded in the table
• Regression coefficient: positive means the IV and DV go up and down together; negative means that as one rises the other falls
• Different ways to measure the IVs (each is a separate independent variable)
• Additional, “control” independent variables. In regression each is normally measured on a scale or is a categorical/nominal variable coded 0–1 (e.g., F = 0, M = 1). Nominal variables with more than two categories, such as race, are split into several 0–1 variables, one per category, leaving out a common “reference” category that is coded 0 on all of them. Here the reference category for race is “White.”
• Numbers in parentheses are the standard errors of the estimated relationships between the IVs and the DV
* p < .05 ** p < .01 *** p < .001
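The 0–1 coding of control variables can be sketched in Python. Each category other than the reference becomes its own indicator, and a case in the reference category scores 0 on all of them, which is why each coefficient is read as a comparison with that category. This is a minimal sketch with made-up cases; `dummy_code` is a hypothetical helper, and statistical packages normally perform this step themselves.

```python
def dummy_code(values: list[str], reference: str) -> dict[str, list[int]]:
    """Turn one nominal variable into 0-1 indicator variables, omitting the reference category."""
    categories = sorted(set(values))
    if reference not in categories:
        raise ValueError(f"reference category {reference!r} not found in the data")
    return {
        cat: [1 if v == cat else 0 for v in values]
        for cat in categories
        if cat != reference  # the reference category gets no column of its own
    }


# Hypothetical cases
race = ["White", "Black", "Hispanic", "White", "Asian", "American Indian"]
for name, column in dummy_code(race, reference="White").items():
    print(f"{name:16} {column}")

sex = ["F", "M", "M"]
print(dummy_code(sex, reference="F"))  # a two-category variable yields a single 0-1 column
```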
Reference categories (see previous slide): “Everything is in comparison to something!”
• This made-up data mimics Model 4, which takes all the variables into account.
• Female is the reference category for Male.
• Males are much more likely to be delinquent than females (males have a positive, statistically significant relationship with delinquency).
• This made-up data also mimics Model 4.
• The reference category for race was White.
• Blacks and Asians are significantly less likely than Whites to be delinquent (note their negative relationship with delinquency).
• Hispanics seem as likely to be delinquent as Whites.
• American Indians seem less likely to be delinquent than Whites, but the difference is not statistically significant.
Probabilities that a coefficient was generated by chance: * <.05 ** <.01 *** <.001
Instead of using asterisks, sometimes the actual probability is given
“Odds ratio” is the same as Exp(B)
Probabilities that a coefficient was generated by chance: * <.05 ** <.01 *** <.001
Sometimes different measures of the same dependent variable (or different dependent variables) run across the top
Hypothesis: poor academic performance → delinquency
GPA has a negative relationship with three measures of the DV: truancy, arrests and convictions. As GPA increases, each of the others decreases. Each relationship is significant; the strongest is with truancy, where the probability that chance would yield a coefficient of this magnitude is less than one in one thousand (p < .001).

                        Delinquency measures
                        Truancy      Arrest      Conviction
Grade point average     -.163***     -.124**     -.092*
                        (.035)       (.037)      (.043)

Probabilities that a coefficient was generated by chance: * <.05 ** <.01 *** <.001
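Reading a cell like those in the GPA row comes down to two questions: what is the sign, and how many asterisks are there? This Python sketch separates the two using the legend’s thresholds. `read_cell` is a hypothetical helper; it only reads what the table reports and ignores the standard errors in parentheses.

```python
import re

# Conventional thresholds attached to each number of asterisks
STAR_LEVELS = {"***": "p < .001", "**": "p < .01", "*": "p < .05"}


def read_cell(cell: str) -> tuple[float, str, str]:
    """Split a cell such as '-.163***' into (coefficient, direction, significance)."""
    match = re.fullmatch(r"\s*([-+]?\d*\.?\d+)\s*(\*{0,3})\s*", cell)
    if match is None:
        raise ValueError(f"cannot read table cell {cell!r}")
    coef = float(match.group(1))
    stars = match.group(2)
    direction = "negative" if coef < 0 else "positive" if coef > 0 else "none"
    significance = STAR_LEVELS.get(stars, "not significant")  # no asterisks
    return coef, direction, significance


row = {"Truancy": "-.163***", "Arrest": "-.124**", "Conviction": "-.092*"}
for measure, cell in row.items():
    coef, direction, sig = read_cell(cell)
    print(f"GPA and {measure}: {direction} ({coef}), {sig}")
```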
And just when you thought you had it “down”…
It’s fairly rare, but sometimes categories of the dependent variable run in rows, and the independent variable categories run in columns.
Hypothesis: SOCP (intensive supervision) → fewer violations
Final exam question on tables
• The final exam will ask students to interpret a table. The hypothesis will be provided.
• Students will have to identify the dependent and independent variables
• Students must recognize whether relationships are positive or negative
• Students must recognize whether relationships are statistically significant, and if so, to what extent
• Students must be able to explain the effects described by odds ratios (Exp b) using percentages
• Students must be able to describe how relationships change:
  • As one moves across models (different combinations of the independent variables)
  • As one moves across different levels of the dependent variable
• IMPORTANT: Tables must be interpreted strictly according to the techniques learned in this course. Leave personal opinions behind. For example, if a relationship supports the notion that wealth causes crime, then wealth causes crime!