Help! My mentor gave me data and asked me to analyze it…. Pathways to Careers in Clinical and Translational Research (PACCTR) Curriculum Core
Help! My mentor gave me data and asked me to analyze it….. • Very common project for a mentor to give to a student • BUT, may not be an appropriate project if you don’t have experience in statistical analysis or access to a statistician • Also known as secondary data analysis
Secondary Data Analysis • Often there is extra data left over that the mentor thinks could be interesting. • Example: An RCT has been completed (and published) examining the efficacy of a medication for some disease. At baseline, subjects were asked a lot of questions about quality of life (QOL) and sexual function. The researcher wants you to analyze this data.
Step 1: Define what you have • Sometimes this is not clear • Get a list of variables, questionnaire/ survey, data abstraction instrument • Read the research protocol • Read any papers/posters already published
Step 2: Research Q • Always start with a research question. • Ask your mentor what the research question is. • If there isn’t a clear research question, proceed with caution. There may not be an interesting project here.
Research Q Example Possible RQ’s from RCT example 1. How does QOL change in treated vs placebo group? (An RCT) 2. What is the QOL and sexual functioning of people with this disease at baseline (ie before treatment). (A descriptive study, ie cross-sectional study) 3. What are the determinants of low QOL in people with this disease? (similar to above but you determine if certain groups have lower QOL than others eg by race, education, comorbidities etc. This could be done with multivariate analysis.)
Step 2: Novel? • Is this novel? • Again, often there is leftover data that the researcher thinks might be interesting. • Your job is to figure out if it would be interesting! • Do a lit search to see what’s been done, and talk to clinicians to see if it is interesting. • If not, consider choosing a different project.
Step 3: initial data analysis • If there is a research question and it is interesting, proceed with initial data analysis. • Type of analysis depends on type of data: • Continuous outcomes =means, t-tests, linear regression • Dichotomous or categorical outcomes =proportions (%’s), chi square tests, logistic regression
Step 3: initial data analysis • Do you have a programmer or statistician? • NO. If you don’t have data analysis experience, consider a different project unless you have a lot of time to teach yourself or take a class. See Hulley’s Designing Clinical Research • Yes—you have a programmer or statistician to help you: • your job is to communicate with him/her in order to get the info you need. Ask for: • list of variables • list of means and proportions for the variables you are interested in • Compilation of cross-tabs and/or t-tests for selected variables to see if there are differences between groups. (see next slide)
Cross-tabs? • Cross-tabs are shorthand for chi-square tests (or Fisher's exact test) • If you ask for sex by high vs low QOL, you would get:
Fisher's exact = 0.000
1-sided Fisher's exact = 0.000
Risk ratio | 2.026644 | 1.575926 2.606269
Each cell shows the count, then the row %, then the column %.

          Low QOL    High QOL    Total
Male          100          44      144
            69.44       30.56   100.00
            67.11       31.88    50.17
Female         49          94      143
            34.27       65.73   100.00
            32.89       68.12    49.83
Total         149         138      287
            51.92       48.08   100.00
           100.00      100.00   100.00

Fisher's exact = 0.001
1-sided Fisher's exact = 0.001
Risk ratio | 2.026644 | 1.575926 2.606269

How to interpret?
• There are 287 total with about half male (144) and half female (143)
• Men are more likely to have low QOL (100 of 144 or 69.44%) than women (49 of 143 or 34.27%)
• This difference is significant with a p=0.001
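The row percentages quoted above can be checked by hand from the four cell counts. The sketch below does that in plain Python. It also computes a Pearson chi-square test, which is an assumption on our part: the printed output reports Fisher's exact test, and the chi-square is used here only because it can be written in a few lines without a statistics package. The function names are illustrative.

```python
import math

# Counts from the sex-by-QOL table above: rows = sex, columns = (low QOL, high QOL).
table = {"Male": (100, 44), "Female": (49, 94)}

def row_percent_low(counts):
    """Percentage of a row's subjects who fall in the low-QOL column."""
    low, high = counts
    return 100.0 * low / (low + high)

def pearson_chi2(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom the upper-tail p-value is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

pct_low = {sex: row_percent_low(counts) for sex, counts in table.items()}
chi2, p_value = pearson_chi2(*table["Male"], *table["Female"])

for sex, pct in pct_low.items():
    print(f"{sex}: {pct:.2f}% low QOL")
print(f"chi-square = {chi2:.2f}; p below 0.001: {p_value < 0.001}")
```

Both tests lead to the same conclusion here: the difference between men and women is very unlikely to be due to chance.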
How to interpret?
Risk ratio | 2.026644 | 1.575926 2.606269
• Sometimes the output will instead come to you as a risk ratio (relative risk) or an odds ratio
• Interpretation: Men are about twice as likely to have low QOL (RR=2.03)
• This difference is significant because the 95% confidence interval does not include 1.0 (i.e., 1.58 to 2.61)
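To see where the risk ratio and its confidence interval come from, the sketch below recomputes them from the table counts (100 of 144 men and 49 of 143 women with low QOL). It assumes the usual log-scale standard error for a risk ratio. That choice is ours, not something the slides state, although it reproduces the printed interval. The function name is illustrative.

```python
import math

def risk_ratio_ci(a, n1, c, n2, z=1.959964):
    """Risk ratio of group 1 vs group 2 with a log-scale 95% CI.

    a of n1 subjects in group 1, and c of n2 subjects in group 2, have the outcome.
    """
    rr = (a / n1) / (c / n2)
    # Standard error of ln(RR).
    se_log = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n2)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper

# Men: 100 of 144 with low QOL; women: 49 of 143 with low QOL.
rr, lower, upper = risk_ratio_ci(100, 144, 49, 143)

# The result is "significant" at the 5% level when the interval excludes 1.0.
significant = not (lower <= 1.0 <= upper)
print(f"RR = {rr:.2f}, 95% CI {lower:.2f} to {upper:.2f}, significant: {significant}")
```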
What about t-tests? If you asked for BMI vs High/Low QOL you would get this:

Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
 Low QOL |      64    24.74505    .8713092    6.970474    23.00388    26.48622
High QOL |      62    25.43989      1.0174    8.011014    23.40548    27.47431
---------+--------------------------------------------------------------------
combined |     126    25.08696    .6662367    7.478489    23.76839    26.40552
---------+--------------------------------------------------------------------
    diff |           -.6948402    1.336548               -3.340244    1.950563
------------------------------------------------------------------------------
Degrees of freedom: 124

          Ho: mean(Low QOL) - mean(High QOL) = diff = 0

     Ha: diff < 0           Ha: diff ~= 0            Ha: diff > 0
       t = -0.5199            t = -0.5199              t = -0.5199
   P < t =  0.3020        P > |t| = 0.6041         P > t =  0.6980
How to interpret?
• The low QOL subjects (n=64) have a mean BMI of 24.7 with a std dev of 7.0 and a 95% CI of 23.0 to 26.5
• The high QOL subjects (n=62) have a mean BMI of 25.4
• Is this significantly different? No: look at the middle column (Ha: diff ~= 0), p=0.6041
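The t statistic and p-value in the output can be rebuilt from the group sizes, means and standard deviations alone. The sketch below uses only the Python standard library. Because that library has no t distribution, the two-sided p-value is obtained by numerically integrating the t density with Simpson's rule, which is an implementation choice of this example and not how Stata does it. All function names are illustrative.

```python
import math

def pooled_t_test(n1, mean1, sd1, n2, mean2, sd2):
    """Equal-variance two-sample t test from summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, se, df

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p(t, df, steps=2000):
    """Two-sided p-value: 1 - 2 * (area under the density from 0 to |t|)."""
    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1.0 - 2.0 * (h / 3.0) * total

# Low QOL group first, high QOL group second, as in the output above.
t_stat, se, df = pooled_t_test(64, 24.74505, 6.970474, 62, 25.43989, 8.011014)
p_value = two_sided_p(t_stat, df)
print(f"t = {t_stat:.4f}, df = {df}, two-sided p = {p_value:.4f}")
```

The two-sided p-value is the one to report unless the direction of the difference was specified before looking at the data.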
What about multivariate analysis? • Predictors of Low QOL (Low QOL is the outcome so this is a logistic regression b/c it is a dichotomous outcome) • Choose variables to place in your model. Choice depends on both biologic plausibility and on results of the bivariate analysis (the cross-tabs and t-tests you did above)
Model selection, multivariate analysis • You may choose to put all variables in the model that were significant in bivariate analysis at a p of <0.10 (usually you choose p=0.10 to 0.20 b/c if you limit it to <0.05 you may miss some variables that become significant in a multivariate model due to confounding by other variables) • And, even if not significant in the bivariate model, you may choose to include variables that you think are important biologically or b/c others have reported an association (eg co-morbid conditions)
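The screening rule described above can be written down as a small function. This is only a sketch of the rule of thumb, not a recommended automatic procedure. The p-values are the bivariate results quoted in these slides (sex and BMI), while the `comorbidity` variable name is hypothetical and stands in for anything you include on biological grounds.

```python
def select_candidates(bivariate_p, threshold=0.10, forced=()):
    """Pick variables for the multivariate model.

    Keeps every variable whose bivariate p-value is below `threshold`,
    then adds any `forced` variables regardless of their p-value.
    """
    chosen = [name for name, p in bivariate_p.items() if p < threshold]
    for name in forced:
        if name not in chosen:
            chosen.append(name)
    return chosen

# Bivariate p-values from the cross-tab (sex) and t-test (BMI) examples.
bivariate_p = {"male": 0.001, "bmi": 0.6041}

# "comorbidity" is a hypothetical variable kept for biological plausibility.
candidates = select_candidates(bivariate_p, threshold=0.10, forced=("comorbidity",))
print(candidates)
```

With a threshold of 0.10, sex enters the model and BMI does not, which matches the model shown in the slides.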
Results: Multivariate analysis
You ask for the model to be run and get this:

. xi: logistic lowqol i.trirace i.agecat2 male private q33job lesshs married

Logit Estimates                                   Number of obs =    371
                                                  chi2(16)      =  79.29
                                                  Prob > chi2   = 0.0000
Log Likelihood = -202.4476                        Pseudo R2     = 0.1638

------------------------------------------------------------------------------
  lowqol | Odds Ratio   Std. Err.       z     P>|z|      [95% Conf. Interval]
---------+--------------------------------------------------------------------
Itrira_1 |   .9543597   .3067677     -0.145   0.884      .5082804    1.791929
Itrira_2 |    .404713   .1310575     -2.793   0.005      .2145379     .763467
Iageca_1 |   2.149653    .749715      2.194   0.028      1.085182     4.25828
Iageca_2 |   2.007573   .6533771      2.141   0.032      1.060822     3.79927
    male |   2.227047   .9420758      1.893   0.058      .9719808    5.102711
 private |   1.085656   .8550493      0.104   0.917      .2318977    5.082625
  q33job |   .8852046   .2355718     -0.458   0.647      .5254371    1.491305
  lesshs |   .8078212   .2238751     -0.770   0.441      .4692648    1.390633
 married |   .9584556    .268145     -0.152   0.879      .5539024    1.658482
------------------------------------------------------------------------------
Interpretation?
Outcome variable: low QOL
Variables in model: Race (3 categories, ref=white), Age (3 categories, ref=<30), Male (vs female), Private insurance (vs Medicaid), Employed (vs unemployed), Education < high school (vs more), Married (vs unmarried). Note that BMI is not in the model b/c it wasn’t significant in bivariate analysis (t-test).
• Look at the P>|z| column to see which variables are significantly associated with low QOL after adjustment for the other variables in the model
• Odds ratios > 1.0 indicate higher odds (loosely, a higher risk) of low QOL; odds ratios < 1.0 indicate lower odds of low QOL
Variables associated with increased risk of low QOL:
1. Age 40-50: about 2-fold increased risk (OR 2.15, p=0.028)
2. Age >50: about 2-fold increased risk (OR 2.01, p=0.032)
3. Male has a trend toward significance with p=0.06 (OR 2.23)
Variables associated with decreased risk of low QOL:
1. Asian (race category 2): about 60% decreased risk (OR 0.40, p=0.005)
All other variables are no longer significantly associated with the outcome.
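The z, P>|z| and confidence interval columns all follow from the odds ratio and its standard error, so the significance calls can be checked directly. The sketch below assumes the reported standard error is on the odds-ratio scale, so that dividing it by the odds ratio gives the standard error of the log odds ratio. That is our reading of the output, and it reproduces the printed z values and intervals. The function and variable names are illustrative.

```python
import math

# (label, odds ratio, std. err.) copied from the logistic regression output.
rows = [
    ("Itrira_1", 0.9543597, 0.3067677),
    ("Itrira_2", 0.404713, 0.1310575),
    ("Iageca_1", 2.149653, 0.749715),
    ("Iageca_2", 2.007573, 0.6533771),
    ("male", 2.227047, 0.9420758),
    ("private", 1.085656, 0.8550493),
    ("q33job", 0.8852046, 0.2355718),
    ("lesshs", 0.8078212, 0.2238751),
    ("married", 0.9584556, 0.268145),
]

def summarize(odds_ratio, se_or, z_crit=1.959964):
    """Return (z, two-sided p, (ci_low, ci_high)) for one odds ratio."""
    log_or = math.log(odds_ratio)
    se_log = se_or / odds_ratio  # SE of ln(OR), assuming se_or is on the OR scale
    z = log_or / se_log
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    ci = (math.exp(log_or - z_crit * se_log), math.exp(log_or + z_crit * se_log))
    return z, p, ci

results = {name: summarize(odds_ratio, se) for name, odds_ratio, se in rows}
significant = [name for name, (z, p, ci) in results.items() if p < 0.05]

for name, (z, p, ci) in results.items():
    print(f"{name:9s} z = {z:6.3f}  p = {p:.3f}  95% CI {ci[0]:.2f} to {ci[1]:.2f}")
print("Significant at p < 0.05:", significant)
```

A variable is flagged exactly when its 95% confidence interval excludes 1.0, which is why `male` (interval 0.97 to 5.10) is reported only as a trend.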
Summary: Data analysis • Clearly define the research question and ensure it is novel • Understand the data: get variable list, read questionnaire, read research proposal and already published posters/papers • Preliminary analysis=bivariate (t-test, chi square) • Advanced analysis: multivariate
PACCTR* Curriculum Core • Rebecca Jackson MD, School of Medicine • Roberta Oka RN, ANP, DNSc, School of Nursing • George Sawaya MD, School of Medicine • Susan Hyde DDS, MPH, PhD, School of Dentistry • Jennifer Cocohoba PharmD, School of Pharmacy • Joel Palefsky MD, School of Medicine * Pathways to Careers in Clinical and Translational Research