This study analyzes survey data from 2,008 college-aged women to understand how their knowledge of, and access to, emergency contraception (EC) relate to variables such as race, high school class size, year in college, sexual history, and EC use. The study also provides frequency tables for selected variables.
Unintended Pregnancy in College-aged Women
Yi Su and Zhe Zhao
May 3, 2010
Survey Data
• 2,008 subjects
• 65 questions
 - Age (range 18-35)
 - Race
 - High school class size
 - Year in college
 - Sexual history (Hx Sex)
 - Use of emergency contraception (EC)
 - Questions on knowledge of EC
 - Questions on accessibility of EC
Client’s Goal
• Obtain frequency tables for selected variables
• Find the relationship between knowledge of EC and the given variables
• Find the relationship between accessibility of EC and the given variables
Before Analysis…
• The eight race columns should be combined into a single variable that holds all race levels
 -- Race variable (SAS code)
• New variables should summarize each subject's knowledge of EC and level of access to EC
 -- Knowledge_index, Access_indicator, Access_index (SAS code)
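The race recoding step can be sketched as follows. The slide does not reproduce its SAS code, so this is a hypothetical Python illustration of the logic only. The column names `race_1` … `race_8` are placeholders. Treating a subject who checked zero races, or more than one, as missing is also an assumption, not necessarily what the authors did.

```python
from typing import Mapping, Optional

# Hypothetical names for the eight 0/1 race indicator columns;
# the real survey column names are not given on the slide.
RACE_COLUMNS = [f"race_{i}" for i in range(1, 9)]


def code_race(row: Mapping[str, int]) -> Optional[int]:
    """Collapse eight 0/1 race indicators into one code 1..8.

    Assumption: a row with no race checked, or with more than one,
    is returned as None (missing)."""
    checked = [pos for pos, col in enumerate(RACE_COLUMNS, start=1)
               if row.get(col) == 1]
    return checked[0] if len(checked) == 1 else None


subjects = [
    {"race_2": 1},                # one race checked -> 2
    {"race_1": 1, "race_5": 1},   # two races checked -> None
    {},                           # nothing checked -> None
]
print([code_race(s) for s in subjects])  # [2, None, None]
```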
Frequency Tables
• All subjects
• Hx sex: Yes
• Hx sex: No
• EC use: Yes
• EC use: No
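Frequency Tables (continued)
• Create, customize and manage output via SAS ODS
• Code
    ods listing close;          /* turn off the default listing output */
    /* HTML written with an .xls extension so that Excel opens it */
    ods html body='C:\Consulting for Melissa\OUTPUT\all_freq.xls' style=Minimal;
    ods noproctitle;            /* suppress procedure titles */
    proc freq data = mydata.Survey_V03;
    ……
    ods html close;
    ods listing;                /* reopen the listing destination */
• Output: Excel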
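By default, a one-way PROC FREQ table has four columns: Frequency, Percent, Cumulative Frequency and Cumulative Percent. The Python sketch below shows that computation. It assumes missing answers are coded as `None` and, as PROC FREQ does by default, leaves them out of the totals.

```python
from collections import Counter


def freq_table(values):
    """Return rows (level, frequency, percent, cum_frequency, cum_percent),
    with levels sorted and missing values (None) excluded from the totals."""
    counts = Counter(v for v in values if v is not None)
    total = sum(counts.values())
    rows, cumulative = [], 0
    for level in sorted(counts):
        cumulative += counts[level]
        rows.append((level, counts[level], 100 * counts[level] / total,
                     cumulative, 100 * cumulative / total))
    return rows


# Hypothetical EC-use answers for a handful of subjects.
for row in freq_table(["Yes", "No", "Yes", None]):
    print(row)
```

To get the subgroup tables (Hx sex Yes/No, EC use Yes/No), filter the input list before calling `freq_table`.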
Relationship between Knowledge Index and Race
• Knowledge Index: values in [0,1)
• Race: eight race categories
Initial Try: ANOVA on original data
    proc glm data=mydata.survey_final;
      class Race_Coded;
      model Knowledge_Index = Race_Coded;
      lsmeans Race_Coded / adjust=tukey pdiff;   /* Tukey-adjusted pairwise p-values */
      output out=o p=pred r=resid;               /* save predicted values and residuals */
    run;
    quit;
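For reference, the overall F test behind a one-way model such as `model Knowledge_Index = Race_Coded;` compares variation between groups with variation within groups. Below is a minimal Python sketch of that F statistic. It does not reproduce the Tukey-adjusted comparisons that LSMEANS produces.

```python
from statistics import fmean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    groups: dict mapping a group label (e.g. a race code) to its responses."""
    means = {g: fmean(ys) for g, ys in groups.items()}
    all_values = [y for ys in groups.values() for y in ys]
    grand_mean = fmean(all_values)
    ss_between = sum(len(ys) * (means[g] - grand_mean) ** 2
                     for g, ys in groups.items())
    ss_within = sum((y - means[g]) ** 2
                    for g, ys in groups.items() for y in ys)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


print(one_way_anova({"a": [1, 2, 3], "b": [4, 5, 6]}))  # (13.5, 1, 4)
```

The residuals here are each response minus its group mean, which is what the OUTPUT statement saves as `resid`.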
Test Results
Least Squares Means for effect Race_Coded
Pr > |t| for H0: LSMean(i)=LSMean(j)
Dependent Variable: Knowledge_Index
i/j       1       2       3       4       5       6       7       8
1         .  0.9970  0.9917  0.9954  0.2634  0.9995  1.0000  1.0000
2    0.9970       .  0.9999  0.9494  <.0001  1.0000  0.9993  0.9885
3    0.9917  0.9999       .  0.9311  0.1671  0.9999  0.9971  0.9790
4    0.9954  0.9494  0.9311       .  0.5436  0.9654  0.9927  0.9917
5    0.2634  <.0001  0.1671  0.5436       .  0.0828  0.3404  0.0395
6    0.9995  1.0000  0.9999  0.9654  0.0828       .  0.9999  0.9992
7    1.0000  0.9993  0.9971  0.9927  0.3404  0.9999       .  1.0000
8    1.0000  0.9885  0.9790  0.9917  0.0395  0.9992  1.0000       .
Model Diagnostics
Tests for Normality
Test                  --Statistic---     -----p Value------
Shapiro-Wilk          W      0.81658     Pr < W      <0.0001
Kolmogorov-Smirnov    D     0.189102     Pr > D      <0.0100
Cramer-von Mises      W-Sq  17.61803     Pr > W-Sq   <0.0050
Anderson-Darling      A-Sq  104.3814     Pr > A-Sq   <0.0050
Conclusion
• Non-normality: all four normality tests reject at the 0.01 level
• Non-constant variance
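Second Try
• Box-Cox transformation for ANOVA
    proc transreg data=mydata.survey_final;
      /* parameter= adds a small constant before the transformation,
         so that zero values of Knowledge_Index can be transformed */
      model boxcox(Knowledge_Index / parameter=0.00001) = class(Race_Coded);
    run;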
Box Cox transformation result
The TRANSREG Procedure
Transformation Information for BoxCox(Knowledge_Index)
Lambda   R-Square   Log Like
 -3.00       0.00   -58498.8
 -2.75       0.00   -53051.7
 -2.50       0.00   -47621.1
   ...
  1.75       0.02     3953.4
  2.00       0.02     3989.9
  2.25       0.01     4018.2
  2.50       0.01     4039.6
  2.75       0.01     4055.2
  3.00 +     0.01     4065.8 <
< Best Lambda
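The λ search can be sketched with the textbook Box-Cox profile log-likelihood for a one-way model. This is an illustration, not TRANSREG's own code, and its values may differ from the reported Log Like by a constant. Adding a small `shift` before transforming is assumed to mirror PARAMETER=0.00001. The default grid matches the -3.00 to 3.00 by 0.25 range in the output.

```python
import math


def boxcox(y, lam):
    """Box-Cox transform of a positive value y."""
    return math.log(y) if lam == 0 else (y ** lam - 1) / lam


def profile_loglik(groups, lam, shift=1e-5):
    """Box-Cox profile log-likelihood, up to an additive constant,
    for a one-way model (one mean per group).

    groups: dict mapping a group label to its raw responses (may include 0).
    shift:  constant added before transforming (assumed to mirror
            PARAMETER=0.00001 on the slide)."""
    n = sum(len(ys) for ys in groups.values())
    sse = 0.0
    sum_log_y = 0.0
    for ys in groups.values():
        shifted = [y + shift for y in ys]
        z = [boxcox(v, lam) for v in shifted]
        mean_z = sum(z) / len(z)
        sse += sum((zi - mean_z) ** 2 for zi in z)
        sum_log_y += sum(math.log(v) for v in shifted)
    # -n/2 * log(SSE/n) is the profiled normal log-likelihood; the second
    # term is the log Jacobian that makes different lambdas comparable.
    return -n / 2 * math.log(sse / n) + (lam - 1) * sum_log_y


def best_lambda(groups, grid=None, shift=1e-5):
    """Return the grid value of lambda with the largest profile log-likelihood."""
    if grid is None:
        grid = [k / 4 for k in range(-12, 13)]   # -3.00, -2.75, ..., 3.00
    return max(grid, key=lambda lam: profile_loglik(groups, lam, shift))
```

On the slide the maximum sits at λ = 3.00, the upper edge of the grid. The likelihood may therefore keep rising beyond it, and passing a wider `grid` would show whether it does.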
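ANOVA with cube transformation (lambda = 3)
    /* dataset a holds Knowledgetr, the cube-transformed Knowledge_Index
       (the step that creates it is not shown) */
    proc glm data=a;
      class Race_Coded;
      model Knowledgetr = Race_Coded;
      lsmeans Race_Coded / adjust=tukey pdiff;
      output out=o p=pred r=resid;
    run;
    quit;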
Pairwise Comparison Result
Least Squares Means for effect Race_Coded
Pr > |t| for H0: LSMean(i)=LSMean(j)
Dependent Variable: knowledgetr
i/j       1       2       3       4       5       6       7       8
1         .  0.9974  0.9922  0.9791  0.3095  0.9996  1.0000  1.0000
2    0.9974       .  0.9998  0.8736  <.0001  1.0000  0.9988  0.9743
3    0.9922  0.9998       .  0.8427  0.2235  0.9999  0.9954  0.9606
4    0.9791  0.8736  0.8427       .  0.4110  0.9052  0.9750  0.9735
5    0.3095  <.0001  0.2235  0.4110       .  0.1135  0.3510  0.0368
6    0.9996  1.0000  0.9999  0.9052  0.1135       .  0.9998  0.9976
7    1.0000  0.9988  0.9954  0.9750  0.3510  0.9998       .  1.0000
8    1.0000  0.9743  0.9606  0.9735  0.0368  0.9976  1.0000       .
Conclusion
• The transformation fixed the non-constant variance problem
• The residuals are still not normally distributed
Third Try: Non-parametric test
• Wilcoxon two-sample test: the Wilcoxon-Mann-Whitney (rank-sum) test is a non-parametric analog of the independent-samples t-test. It can be used when the dependent variable is not assumed to be normally distributed.
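The rank-sum idea can be illustrated in Python. Pool the two samples, rank them with ties sharing their average rank, and compare the first sample's rank-based U statistic with its expected value. This sketch uses the large-sample normal approximation with a tie-corrected variance. SAS's NPAR1WAY may differ in details such as the continuity correction.

```python
from collections import Counter
from math import sqrt
from statistics import NormalDist


def midranks(values):
    """Ranks 1..N; tied values share the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def mann_whitney(x, y):
    """Two-sided Wilcoxon-Mann-Whitney test (normal approximation,
    tie-corrected variance, no continuity correction). Returns (U, z, p)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    ranks = midranks(pooled)
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u - n1 * n2 / 2) / sqrt(var_u)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u, z, p
```

For example, `mann_whitney([1, 2, 3], [4, 5, 6])` gives U = 0 and a two-sided p of about 0.05.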
Kruskal-Wallis Test: The Kruskal-Wallis test is used when there is one independent variable (factor) with two or more levels. It is the non-parametric version of one-way ANOVA. It also generalizes the Mann-Whitney test, which compares only two groups, to two or more groups.
Kruskal-Wallis Test: This test is an alternative to independent-groups ANOVA when the normality assumption is not met. Like many non-parametric tests, it computes the statistic from the ranks of the data rather than their raw values. Because it makes no normality assumption, it is generally less powerful than ANOVA when the ANOVA assumptions do hold. It does not remove the need for comparable variances, though: read as a comparison of medians, it still assumes the group distributions have a similar shape and spread.
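PROC NPAR1WAY reports the rank-based H statistic as "Chi-Square" in its Kruskal-Wallis table. The Python sketch below writes out the computation, including the average-rank handling of ties and the usual tie correction. It is meant to show how the statistic is built, not to replace PROC NPAR1WAY.

```python
from collections import Counter


def midranks(values):
    """Ranks 1..N; tied values share the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def kruskal_wallis(groups):
    """Kruskal-Wallis H with the usual tie correction.

    groups: list of lists of responses, one list per level (e.g. race).
    Returns (H, df); H is compared with a chi-square on df degrees of freedom."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = midranks(pooled)
    weighted, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        weighted += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * weighted - 3 * (n + 1)
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    h /= 1 - tie_term / (n ** 3 - n)   # undefined if every value is tied
    return h, len(groups) - 1
```

For two toy groups `[[1, 2, 3], [4, 5, 6]]` this gives H = 27/7 ≈ 3.857 on 1 degree of freedom.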
SAS code for Kruskal-Wallis Test
    /* with more than two CLASS levels, the WILCOXON option gives the Kruskal-Wallis test */
    proc npar1way data=mydata.survey_final wilcoxon;
      class Race_Coded;
      var Knowledge_Index;
    run;
Test Result
Kruskal-Wallis Test
Chi-Square          27.4365
DF                  7
Pr > Chi-Square     0.0003
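The reported p-value is the upper tail of a chi-square distribution with k - 1 = 7 degrees of freedom. For integer degrees of freedom this tail has a closed form. It can be built up two degrees of freedom at a time with Q(k + 2, x) = Q(k, x) + (x/2)^(k/2) e^(-x/2) / Γ(k/2 + 1), starting from Q(1, x) = erfc(√(x/2)) or Q(2, x) = e^(-x/2). The Python sketch below uses this recurrence to check the value on the slide.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with an
    integer number of degrees of freedom."""
    if x <= 0:
        return 1.0
    half = x / 2
    if df % 2:                       # odd df: start from df = 1
        q, a = math.erfc(math.sqrt(half)), 0.5
    else:                            # even df: start from df = 2
        q, a = math.exp(-half), 1.0
    while 2 * a < df:                # each step adds 2 degrees of freedom
        q += math.exp(a * math.log(half) - half - math.lgamma(a + 1))
        a += 1
    return q
```

`chi2_sf(27.4365, 7)` evaluates to about 0.00028, which rounds to the 0.0003 shown.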
Pairwise Comparison
• Not provided by PROC NPAR1WAY
Solutions
• 1) Carry out all pairwise tests one by one, controlling the familywise error rate
• 2) SAS macro
Pairwise Comparison P-values
i/j       1       2       3       4       5       6       7
0    0.4739  0.4692  0.2810  0.0371  0.6427  0.9011  0.9637
1            0.7909  0.1200  <.0001  0.7874  0.4259  0.2596
2                    0.1546  0.0234  0.7117  0.4199  0.3096
3                            0.0518  0.1860  0.3709  0.2524
4                                    0.0148  0.0469  0.0038
5                                            0.5720  0.5267
6                                                    0.9816
• Compare each p-value with the Bonferroni threshold 0.05/28 ≈ 0.00179 (8 groups give 28 pairs); only the 1 vs 4 comparison (p < .0001) falls below it
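The Bonferroni comparison can be scripted directly from the p-values listed above. In this Python sketch, "<.0001" is entered as its upper bound 0.0001. The number of comparisons m is taken as the number of pairs actually tested, i.e. the length of the dictionary.

```python
# Pairwise Wilcoxon p-values copied from the slide; "<.0001" entered as 0.0001.
P_VALUES = {
    (0, 1): 0.4739, (0, 2): 0.4692, (0, 3): 0.2810, (0, 4): 0.0371,
    (0, 5): 0.6427, (0, 6): 0.9011, (0, 7): 0.9637,
    (1, 2): 0.7909, (1, 3): 0.1200, (1, 4): 0.0001, (1, 5): 0.7874,
    (1, 6): 0.4259, (1, 7): 0.2596,
    (2, 3): 0.1546, (2, 4): 0.0234, (2, 5): 0.7117, (2, 6): 0.4199,
    (2, 7): 0.3096,
    (3, 4): 0.0518, (3, 5): 0.1860, (3, 6): 0.3709, (3, 7): 0.2524,
    (4, 5): 0.0148, (4, 6): 0.0469, (4, 7): 0.0038,
    (5, 6): 0.5720, (5, 7): 0.5267,
    (6, 7): 0.9816,
}


def bonferroni(p_values, alpha=0.05):
    """Return (threshold, significant pairs) with threshold = alpha / m,
    where m is the number of comparisons made."""
    threshold = alpha / len(p_values)
    significant = sorted(pair for pair, p in p_values.items() if p < threshold)
    return threshold, significant


threshold, significant = bonferroni(P_VALUES)
print(len(P_VALUES), round(threshold, 5), significant)  # 28 0.00179 [(1, 4)]
```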
SAS Macro (1)
    ODS OUTPUT WilcoxonScores=wlx(drop=variable);   * save the Wilcoxon scores table;
    ODS EXCLUDE WilcoxonScores;
    PROC NPAR1WAY DATA=mydata.survey_final WILCOXON;
      CLASS Race_Coded;
      VAR Knowledge_Index;
    RUN;
    PROC PRINT DATA=wlx NOOBS;
    RUN;
    * macro variable k = number of groups;
    DATA _null_;
      SET wlx nobs=nobs;
      CALL SYMPUT("k", LEFT(nobs));
    RUN;
    %put &k.;
    * group sizes _n1-_nk;
    PROC TRANSPOSE DATA=wlx OUT=cnts(drop=_name_ _label_) prefix=_n;
      VAR n;
      ID class;
    RUN;
    * mean ranks _mn1-_mnk;
    PROC TRANSPOSE DATA=wlx OUT=mns(drop=_name_ _label_) prefix=_mn;
      VAR meanscore;
      ID class;
    RUN;
    PROC PRINT DATA=cnts; RUN;
    PROC PRINT DATA=mns; RUN;
    %LET alpha=.05;   * familywise alpha;
    DATA results;
      SET cnts; SET mns;
      DROP nn _n1-_n&k. _mn1-_mn&k.;
      LENGTH reject $2;
      RETAIN reject ' ';
      LABEL compare='Critical Value' abs_diff='Absolute Difference in Mean Ranks';
SAS Macro (2)
      c = ((&k.*(&k.-1))/2);                   * number of pairwise tests;
      z = PROBIT( (1 - ((&alpha./2) / c)) );   * Bonferroni-adjusted z multiplier;
      nn = SUM(of _n1-_n&k.);                  * total number of observations;
      ARRAY nc{&k.} _n1 - _n&k.;
      ARRAY mn{&k.} _mn1 - _mn&k.;
      DO i = 1 TO (&k.-1);
        DO j = (i+1) TO &k.;
          sc1 = mn{i}; sc2 = mn{j};
          ABS_diff = abs(sc1 - sc2);
          compare = z * SQRT( nn*(nn+1)/12 * ((1/nc{i}) + (1/nc{j})) );
          IF abs_diff > compare THEN reject='**';   * ** marks a significant difference;
          OUTPUT results;
          reject=' ';                               * reset marker to blank;
        END;
      END;
    RUN;
    PROC PRINT DATA=results NOOBS LABEL;
      VAR i j sc1 sc2 ABS_diff compare reject;
      FORMAT abs_diff 6.3 compare 6.2;
    RUN;
abs_diff = Absolute Difference in Mean Ranks; compare = Critical Value; ** = significant
i  j      sc1      sc2  abs_diff   compare  reject
1  2  1000.41   976.19    24.216    287.78
1  3  1000.41  1605.50    605.09   1257.79
1  4  1000.41   717.08    283.32    193.10      **
1  5  1000.41  1026.73    26.318    322.07
1  6  1000.41  1138.90    138.49    563.76
1  7  1000.41  1136.55    136.14    390.23
1  8  1000.41        .         .         .
2  3   976.19  1605.50    629.31   1288.92
2  4   976.19   717.08    259.11    341.40
2  5   976.19  1026.73    50.533    427.78
2  6   976.19  1138.90    162.71    630.15
2  7   976.19  1136.55    160.36    481.19
2  8   976.19        .         .         .
3  4  1605.50   717.08    888.42   1271.13
3  5  1605.50  1026.73    578.77   1297.00
3  6  1605.50  1138.90    466.60   1377.07
3  7  1605.50  1136.55    468.95   1315.59
3  8  1605.50        .         .         .
4  5   717.08  1026.73    309.64    370.76
4  6   717.08  1138.90    421.82    592.93
4  7   717.08  1136.55    419.46    431.29
4  8   717.08        .         .         .
5  6  1026.73  1138.90    112.17    646.53
5  7  1026.73  1136.55    109.82    502.45
5  8  1026.73        .         .         .
6  7  1138.90  1136.55     2.352    683.05
6  8  1138.90        .         .         .
7  8  1136.55        .         .         .
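The critical-value rule in the macro is Dunn's procedure, with α split Bonferroni-style over c = k(k - 1)/2 pairs. Below is a Python sketch of the same rule. The slide does not show the group sizes, so the example call uses made-up toy numbers. As in the macro, no tie correction is applied.

```python
from math import sqrt
from statistics import NormalDist


def dunn_pairs(mean_ranks, sizes, alpha=0.05):
    """Pairwise comparisons of mean ranks using the macro's critical value.

    mean_ranks[i] and sizes[i] describe group i + 1. For each pair the
    critical value is z * sqrt(N(N+1)/12 * (1/n_i + 1/n_j)), with
    z = Phi^-1(1 - (alpha/2)/c) and c = k(k-1)/2. No tie correction.
    Returns tuples (i, j, abs_diff, critical, reject)."""
    k = len(mean_ranks)
    c = k * (k - 1) / 2
    z = NormalDist().inv_cdf(1 - (alpha / 2) / c)
    n_total = sum(sizes)
    rows = []
    for i in range(k - 1):
        for j in range(i + 1, k):
            abs_diff = abs(mean_ranks[i] - mean_ranks[j])
            critical = z * sqrt(n_total * (n_total + 1) / 12
                                * (1 / sizes[i] + 1 / sizes[j]))
            rows.append((i + 1, j + 1, abs_diff, critical, abs_diff > critical))
    return rows


# Toy numbers only: three groups of three subjects each.
for row in dunn_pairs([2.0, 5.0, 8.0], [3, 3, 3]):
    print(row)
```

With these toy groups (mean ranks 2, 5 and 8), only the 1 vs 3 difference of 6 exceeds the critical value of about 5.35.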