SAS Macro Coding for Jackknife Repeated Replication
• Jackknife Repeated Replication (JRR) is well suited to macro coding because the SAS macro language is iterative and flexible
• This presentation demonstrates how to use a general JRR macro to correctly calculate variance estimates for means and regression coefficients (logistic and OLS models)
SI Workshop: July 15, 2005
Analysis of Complex Sample Survey Data
• Data from complex sample surveys must be analyzed using techniques that adjust for the clustering of the sample design
• The standard procedures in SAS, SPSS, and Stata assume a simple random sample and therefore do not correctly calculate variances and standard errors for such data
Analysis of Complex Survey Data
• SAS offers survey procedures and Stata offers svy procedures, both of which use the Taylor Series Linearization approach
• JRR, a widely used replication approach, offers an alternative to the Taylor Series method
• JRR is flexible and can be adapted to many types of statistics, such as means and regression coefficients
Visual Representation of JRR Process
• JRR systematically removes a small portion of the sample, and the statistics of interest are computed repeatedly, once for each sub-sample
• In this example, the cases with str=42 and secu=2 are deleted and the weights of the cases with str=42 and secu=1 are doubled
• This process is repeated for each stratum until the entire dataset is covered
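The deletion-and-doubling step described above can be sketched outside SAS. The following is a minimal Python illustration, not the presenter's code: the function name `replicate_weights`, the dictionary layout, and the toy rows are assumptions made for the example, and it assumes every stratum holds exactly two SECUs coded 1 and 2.

```python
def replicate_weights(rows, n_strata, weight_key="weight"):
    """Build one list of replicate weights per stratum (hypothetical helper).

    For replicate h, cases in stratum h with secu == 1 get double weight,
    cases in stratum h with secu == 2 get weight zero, and all other
    cases keep their original weight.
    """
    replicates = []
    for h in range(1, n_strata + 1):
        wts = []
        for row in rows:
            w = row[weight_key]
            if row["str"] == h and row["secu"] == 1:
                w = w * 2   # retained SECU stands in for the deleted one
            elif row["str"] == h and row["secu"] == 2:
                w = 0.0     # deleted SECU contributes nothing
            wts.append(w)
        replicates.append(wts)
    return replicates


# Toy data (invented): 2 strata, 2 SECUs each, one case per SECU.
rows = [
    {"str": 1, "secu": 1, "weight": 1.5},
    {"str": 1, "secu": 2, "weight": 2.0},
    {"str": 2, "secu": 1, "weight": 1.0},
    {"str": 2, "secu": 2, "weight": 3.0},
]
reps = replicate_weights(rows, n_strata=2)
print(reps[0])  # [3.0, 0.0, 1.0, 3.0]
print(reps[1])  # [1.5, 2.0, 2.0, 0.0]
```

Each inner list plays the role of one pwt variable in the SAS data step: one full-length weight vector per stratum.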
SAS JRR Macro: Logistic Regression
*Logistic Regression Jackknife for Analysis of Complex Survey Data ;
*Pat Berglund, July 2003 for Summer Institute Workshop ;
libname d 'd:\sumclass' ;
options compress=yes nofmterr symbolgen ;
options macrogen mprint ;
*create outer jackknife macro with parameters ;
*Parameters to fill in: ;
*ncluster=number of clusters (the macro loops over str=1 to ncluster), in the NCS I dataset this is 42 ;
*weight=case weight ;
*depend=dependent variable for the logistic model ;
*preds=predictor variables entered with a space between each one ;
*indata=input dataset ;
%macro jacklogods(ncluster,weight,depend,preds,indata);
*section 1: jackknife using strata and secu variables to do 42 jackknife selections ;
*each iteration of the do loop selects one stratum, doubles the weight for strata=x and secu=1, and sets the weight for strata=x and secu=2 to zero ;
*all other combinations stay the same ;
%let nclust=%eval(&ncluster);
data one;
  set &indata;
  %macro wgtcal ;
    %do i=1 %to &nclust ;
      pwt&i=&weight;
      if str=&i and secu=1 then pwt&i=pwt&i*2 ;
      if str=&i and secu=2 then pwt&i=0 ;
    %end;
  %mend wgtcal ;
  %wgtcal ;
run ;
*section 2: run base model/statistic of interest for entire sample using full weight ;
%macro base ;
  ods output parameterestimates=parms (keep=variable estimate ) ;
  ods listing close ;
  proc logistic des data=ONE ;
    model &depend=&preds ;
    weight &weight ;
  run ;
  ods listing ;
  proc print data=parms ;
  run ;
  proc sort data=parms ;
    by variable ;
  run ;
%mend base ;
%base ;
*Section 3: Run Replicate Models ;
*replicate models, one for each stratum, using the weights developed in jackknife section 1 ;
*save statistic of interest for use with variance estimation ;
%macro reps ;
  %do j=1 %to &nclust ;
    ods output parameterestimates=parms&j (keep=estimate variable rename=(estimate=estimate&j )) ;
    ods listing close ;
    proc logistic des data=ONE ;
      model &depend=&preds ;
      weight pwt&j ;
    run ;
    proc sort data=parms&j ;
      by variable ;
    run ;
  %end ;
%mend reps;
%reps ;
*Section 4: Merge Base and Replicate files together for calculation of statistics of interest ;
data rep ;
  merge parms %do k=1 %to &nclust; parms&k %end;;
  by variable ;
run ;
proc print ;
run ;
*Section 5: Calculate complex design corrected variance and standard errors ;
*variance = sum of the squared differences between the base statistic and the replicate statistics ;
*standard error = square root of the sum of the squared differences (variance) ;
*Odds Ratio = exponent of the coefficient ;
*Confidence Intervals = OR +/- 1.96*corrected standard error ;
*note: stderr is on the log-odds scale, so the usual Wald limits for the odds ratio are exp(estimate +/- 1.96*stderr) ;
ods listing ;
data calculate ;
  set rep ;
  %macro it ;
    %do j=1 %to &nclust ;
      sqdiff&j=(estimate-estimate&j)**2;
    %end;
    sumdiff=sum(of sqdiff1-sqdiff&nclust);
    stderr=sqrt(sumdiff) ;
    or=exp(estimate) ;
    lowor=or-(1.96*stderr) ;
    upor=or+(1.96*stderr) ;
  %mend it ;
  %it;
run ;
proc print ;
  var variable estimate stderr or lowor upor ;
run ;
%mend jacklogods ;
%jacklogods(42,p2wtv3,deplt1,sexf,d.ncsdxdm3 ) ;
*comparison with SRS logistic regression ;
proc logistic des data=d.ncsdxdm3 ;
  weight p2wtv3 ;
  model deplt1=sexf ;
run ;
*comparison with SAS surveylogistic ;
proc surveylogistic data=d.ncsdxdm3 ;
  strata str ;
  cluster secu ;
  weight p2wtv3 ;
  model deplt1 (event='1') =sexf ;
run ;
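The variance step in Section 5 of the macro reduces to a few lines of arithmetic. The sketch below restates it in Python for illustration only. The function name `jrr_stderr` and the numbers are invented. It assumes the paired-deletion design used in the macro, where each replicate corresponds to one stratum.

```python
import math


def jrr_stderr(base, replicates):
    """JRR standard error: square root of the summed squared differences
    between the full-sample estimate and each replicate estimate."""
    sumdiff = sum((base - r) ** 2 for r in replicates)
    return math.sqrt(sumdiff)


# Invented example: a full-sample coefficient and three replicate coefficients.
base = 0.50
replicates = [0.53, 0.46, 0.50]
se = jrr_stderr(base, replicates)
print(round(se, 6))  # 0.05
```

Here the squared differences are 0.0009, 0.0016, and 0, so the variance is 0.0025 and the standard error is 0.05.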
Results from Logistic JRR
Design Corrected Results:
Variable   Estimate   stderr     or        lowor     upor
SEXF       0.7434     0.088842   2.10315   1.92902   2.27728
SRS Results
Analysis of Maximum Likelihood Estimates
Parameter   DF   Estimate   Std. Error   Wald Chi-Square   Pr > ChiSq
SEXF        1    0.7434     0.0724       105.3802          <.0001
SAS Surveylogistic Results
Parameter   DF   Estimate   Std. Error   Chi-Square   Pr > ChiSq
Intercept   1    2.0084     0.0776       669.6525     <.0001
SEXF        1    -0.7434    0.0889       70.0103      <.0001
Odds Ratio Estimates
Effect   Point Estimate   95% Wald Confidence Limits
SEXF     0.475            0.399   0.566
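The JRR macro builds its odds-ratio limits by adding and subtracting 1.96 standard errors on the odds-ratio scale, whereas Wald limits are conventionally formed on the log-odds scale and then exponentiated. The Python sketch below applies that conventional form to the JRR estimate and standard error reported above. The function name `odds_ratio_ci` is an assumption for illustration and this is not part of the original macro.

```python
import math


def odds_ratio_ci(estimate, stderr, z=1.96):
    """Wald interval computed on the log-odds scale, then exponentiated."""
    point = math.exp(estimate)
    lower = math.exp(estimate - z * stderr)
    upper = math.exp(estimate + z * stderr)
    return point, lower, upper


# JRR estimate and standard error for SEXF, taken from the results slide.
or_, lo, hi = odds_ratio_ci(0.7434, 0.088842)
print(round(or_, 3), round(lo, 3), round(hi, 3))
```

The limits come out at roughly 1.77 and 2.50. Because the surveylogistic run reports the coefficient with the opposite sign, its limits are reciprocals of these: 1/0.566 is about 1.77 and 1/0.399 is about 2.51, so the two methods agree closely once the interval is formed on the log-odds scale.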
Another approach: Linear Regression
%macro jackgenmod(ncluster,weight,depend,preds,indata);
%let nclust=%eval(&ncluster);
data one;
  set &indata;
  %macro wgtcal ;
    %do i=1 %to &nclust ;
      pwt&i=&weight;
      if str=&i and secu=1 then pwt&i=pwt&i*2 ;
      if str=&i and secu=2 then pwt&i=0 ;
    %end;
  %mend wgtcal ;
  %wgtcal ;
run ;
Base Model for OLS
%macro base ;
  ods output parameterestimates=parms (keep=variable estimate ) ;
  title "Example of Proc Reg without design correction" ;
  proc reg data=ONE ;
    model &depend=&preds ;
    weight &weight ;
  run ;
  quit ;
  proc sort data=parms ;
    by variable ;
  run ;
%mend base ;
%base ;
Replicate Models
%macro reps ;
  %do j=1 %to &nclust ;
    ods output parameterestimates=parms&j (keep=estimate variable rename=(estimate=estimate&j )) ;
    ods listing close ;
    proc reg data=ONE ;
      model &depend=&preds ;
      weight pwt&j ;
    run ;
    quit ;
    proc sort data=parms&j ;
      by variable ;
    run ;
  %end ;
%mend reps;
%reps ;
Merge Replicate Datasets with Base Dataset
data rep ;
  merge parms %do k=1 %to &nclust; parms&k %end;;
  by variable ;
run ;
proc print ;
run ;
ods listing ;
Calculate Corrected Standard Errors from Distribution of Replicate Coefficients
data calculate ;
  set rep ;
  %macro it ;
    %do j=1 %to &nclust ;
      sqdiff&j=(estimate-estimate&j)**2;
    %end;
    sumdiff=sum(of sqdiff1-sqdiff&nclust);
    stderr=sqrt(sumdiff) ;
  %mend it ;
  %it;
run ;
Code to Print Results from JRR and Execute Outer Macro
proc print ;
  title "Results from JRR for OLS regression" ;
  var variable estimate stderr ;
run ;
%mend jackgenmod ;
%jackgenmod(42,p2wtv3,incpers,sexf ag25 ag35 ag45,d.ncsdxdm3 ) ;
Proc SurveyReg Code
proc surveyreg data=d.ncsdxdm3 ;
  title "Example of Proc SurveyReg" ;
  strata str ;
  cluster secu ;
  weight p2wtv3 ;
  model incpers=sexf ag25 ag35 ag45 ;
run ;
Parameter Estimates from OLS SRS Regression
Parameter Estimates
Variable    DF   Parameter Estimate   Std. Error   t Value
Intercept   1    11077                485.53334    22.81
SEXF        1    -12096               434.45468    -27.84
AG25        1    15227                586.69609    25.95
AG35        1    22194                600.60265    36.95
AG45        1    21404                683.46087    31.32
JRR Results
Results from JRR for OLS regression
Obs   Variable    Estimate   stderr
1     Intercept   11077      529.49
2     AG25        15227      698.83
3     AG35        22194      1026.29
4     AG45        21404      1055.67
5     SEXF        -12096     689.31
Proc SurveyReg Results
Estimated Regression Coefficients
Parameter   Estimate     Standard Error   t Value   Pr > |t|
Intercept   11077.003    532.95062        20.78     <.0001
SEXF        -12095.819   690.29149        -17.52    <.0001
AG25        15227.170    698.54031        21.80     <.0001
AG35        22194.355    1017.50689       21.81     <.0001
AG45        21403.763    1062.42802       20.15     <.0001
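One way to read the three OLS tables together is to divide each design-corrected standard error by its simple-random-sample counterpart. The Python sketch below does this with the SRS and JRR values copied from the tables above. The dictionary names and the choice of ratio as a summary are assumptions for illustration, not part of the presentation.

```python
# Standard errors copied from the SRS (Proc Reg) and JRR result tables.
srs_se = {
    "Intercept": 485.53334,
    "SEXF": 434.45468,
    "AG25": 586.69609,
    "AG35": 600.60265,
    "AG45": 683.46087,
}
jrr_se = {
    "Intercept": 529.49,
    "SEXF": 689.31,
    "AG25": 698.83,
    "AG35": 1026.29,
    "AG45": 1055.67,
}

# Ratio above 1 means the SRS analysis understates the standard error.
ratio = {name: jrr_se[name] / srs_se[name] for name in srs_se}
for name, value in ratio.items():
    print(f"{name:10s} {value:.2f}")
```

Every ratio exceeds 1 (about 1.09 for the intercept and about 1.71 for AG35), which is consistent with the point that ignoring the design understates the standard errors. The JRR values are also close to the Proc SurveyReg standard errors.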
Conclusions
• JRR is a flexible and convenient alternative to canned software procedures/programs
• Any statistic/procedure can be used within the JRR structure, provided it makes statistical sense
• SAS macro coding allows parsimonious syntax and is ideal for repetitive and flexible coding
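The flexibility claimed above can be illustrated by passing the statistic in as a function. The following Python sketch is a hypothetical restatement of the JRR structure, not the SAS macro itself: the names `jrr` and `weighted_mean`, the row layout, and the toy data are all invented, and it assumes exactly two SECUs (coded 1 and 2) per stratum.

```python
import math


def weighted_mean(values, weights):
    """Example statistic: a weighted mean."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def jrr(rows, n_strata, statistic):
    """Return (full-sample estimate, JRR standard error) for any statistic
    that accepts a list of values and a list of weights."""
    values = [r["y"] for r in rows]
    base = statistic(values, [r["weight"] for r in rows])
    sumdiff = 0.0
    for h in range(1, n_strata + 1):
        rep_w = []
        for r in rows:
            if r["str"] == h:
                # Double SECU 1 and drop SECU 2 within the selected stratum.
                rep_w.append(r["weight"] * 2 if r["secu"] == 1 else 0.0)
            else:
                rep_w.append(r["weight"])
        sumdiff += (base - statistic(values, rep_w)) ** 2
    return base, math.sqrt(sumdiff)


# Toy data (invented): 2 strata, 2 SECUs each, unit weights.
rows = [
    {"str": 1, "secu": 1, "weight": 1.0, "y": 2.0},
    {"str": 1, "secu": 2, "weight": 1.0, "y": 4.0},
    {"str": 2, "secu": 1, "weight": 1.0, "y": 6.0},
    {"str": 2, "secu": 2, "weight": 1.0, "y": 8.0},
]
est, se = jrr(rows, n_strata=2, statistic=weighted_mean)
print(est, round(se, 4))  # 5.0 0.7071
```

Swapping `weighted_mean` for another function of values and weights changes the statistic without touching the replication loop, which mirrors how the logistic and OLS macros differ only in the procedure they call.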