X11 X12 X13
X21 X22 X23
X31 X32 X33
Research Question
    • Are nursing homes dangerous for seniors?
    • Does admission to a nursing home increase the risk of death in adults over 65 years of age when controlling for age, gender, race, and number of emergency room visits?
Propensity Score Matching, or: Do nursing homes kill you?
    ANNMARIA DE MARS, PH.D. & CHELSEA HEAVEN
    THE JULIA GROUP
WHY YOU NEED IT
    TWO NON-EQUIVALENT GROUPS
    • Patients in specialized units
    • People who attend a fundraising event
Any time you can ask the question: Is there a difference in OUTCOME between levels of “treatment” A, controlling for X, Y, and Z?
1. Make sure there are pre-existing differences (Thank you, Captain Obvious)
2a. Decide on covariates
    • Are the differences pre-existing, or could they be due to the different “treatment” levels?
    • Race and gender are good choices for covariates. If more students at private vs. public schools are black or female, the schooling probably didn’t cause that.
    • Differences in grade 10 math scores may be a result of the type of school, so they are a questionable covariate.
2b. Decide on covariates
    Don’t use your outcome variable as one of your covariates.
3. Run logistic regression to generate propensity scores
    PROC LOGISTIC DATA = datasetname ;
      CLASS categorical-variables ;
      MODEL dependent = list-of-covariates ;
      OUTPUT OUT = newdataset PREDICTED = propensity-score ;
    RUN ;
4. Select matching method
    • Quintiles
    • Nearest neighbors
    • Calipers
    ALL OF THE ABOVE CAN BE DONE EITHER WITH OR WITHOUT REPLACEMENT
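As a point of comparison, readers on a recent SAS release could sketch the nearest-neighbor and caliper options with PROC PSMATCH (SAS/STAT 14.2 or later). This is not the approach these slides take, which build the match by hand. The sketch assumes the example data and variables used later in the deck (saslib.old, athome, race, sex, age_comp, vissum1). The caliper value and the output data set name work.matched are illustrative assumptions.

```sas
/* Sketch only: greedy 1:1 nearest-neighbor matching within a caliper,
   without replacement. Requires PROC PSMATCH (SAS/STAT 14.2 or later). */
proc psmatch data=saslib.old region=allobs;
   class athome race sex;
   /* treated group = admitted to a nursing home (athome = 0) */
   psmodel athome(treated='0') = race sex age_comp vissum1;
   /* k=1 nearest neighbor on the logit of the propensity score;
      caliper=0.25 is an assumed value, not one taken from the slides */
   match method=greedy(k=1) stat=lps caliper=0.25;
   /* keep matched observations only; _MatchID identifies each matched set */
   output out(obs=match)=work.matched matchid=_MatchID;
run;
```

Swapping the MATCH method changes the strategy, so the same skeleton can be reused to compare methods on one data set.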
5. Run matching program & test its effectiveness
6. Run your analysis using the matched data set
An actual example: Do nursing homes kill you?
Our data
    Kaiser Permanente Study of the Oldest Old, 1971-1979 and 1980-1988: [California]
    DEPENDENT VARIABLE: Dthflag
      = 1 if died during study period
      = 0 if alive at end of study period
Our data
    TREATMENT VARIABLE: athome
      = 1 if lived at home continuously
      = 0 if admitted to nursing home any time during study period
Covariates *
    • AGE
    • RACE
    • GENDER
    • TOTAL Emergency Room VISITS **
    * Three out of four were DEFINITELY pre-existing differences
    ** Proxy for health
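Create propensity scores: PROC LOGISTIC
    PROC LOGISTIC DATA = saslib.old ;
      CLASS athome race sex ;
      MODEL athome = race sex age_comp vissum1 ;
      OUTPUT OUT = study.allpropen PREDICTED = prob ;
    RUN ;
    NOTE: No DESCENDING option, so PROC LOGISTIC models the probability of the lower-sorted response value, athome = 0 (admitted to a nursing home).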
QUINTILE MATCHING EXAMPLE ONE
The part on creating quintiles is blatantly copied (almost) from http://www.pauldickman.com/teaching/sas/quintiles.php
Calculate Quintile Cutpoints
    Remember the dataset we created with the predicted probabilities saved in it?
    PROC UNIVARIATE DATA = study.allpropen ;
      VAR prob ;
      OUTPUT OUT = quintile PCTLPTS = 20 40 60 80 PCTLPRE = pct ;
    RUN ;
PROC UNIVARIATE
    VAR prob ;    /* predicted probability as the analysis variable */
    OUTPUT OUT = quintile PCTLPTS = 20 40 60 80 PCTLPRE = pct ;
    /* output to a dataset named quintile;
       create four variables at these percentiles, with the prefix pct */
/* write the quintiles to macro variables */
    data _null_ ;
      set quintile ;
      call symput('q1', pct20) ;
      call symput('q2', pct40) ;
      call symput('q3', pct60) ;
      call symput('q4', pct80) ;
    run ;
    Just because I am too lazy to write down the percentiles
Create quintiles
    data STUDY.AllPropen ;
      set STUDY.AllPropen ;
      if prob = . then quintile = . ;
      else if prob le &q1 then quintile = 1 ;
      else if prob le &q2 then quintile = 2 ;
      else if prob le &q3 then quintile = 3 ;
      else if prob le &q4 then quintile = 4 ;
      else quintile = 5 ;
    run ;
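A shorter route to the same kind of grouping, offered only as a sketch, is PROC RANK with GROUPS=5. It is not what the slides use, and ties at the cutpoints may be assigned slightly differently from the percentile-based approach. The names work.ranked, qrank, and quintile_rank are hypothetical.

```sas
/* Sketch only: assign quintile groups directly with PROC RANK.
   GROUPS=5 yields group numbers 0-4, so 1 is added to match the 1-5 coding. */
proc rank data=study.allpropen groups=5 out=work.ranked;
   var prob;
   ranks qrank;                 /* hypothetical name for the 0-4 group variable */
run;

data work.ranked;
   set work.ranked;
   if qrank ne . then quintile_rank = qrank + 1;   /* hypothetical variable, coded 1-5 */
run;

/* Cross-tabulate against the cutpoint-based quintile variable to compare */
proc freq data=work.ranked;
   tables quintile * quintile_rank / missing;
run;
```

If the two codings agree, nearly all observations should fall on the diagonal of the cross-tabulation.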
The matching part
    Try to control your excitement
Create case & control data sets
    DATA small large ;
      SET study.allpropen ;
      IF athome = 0 THEN OUTPUT small ;
      ELSE IF athome = 1 THEN OUTPUT large ;
    RUN ;
Create data set of sampling percentages
    PROC FREQ DATA = small ;
      TABLES quintile / OUT = samp_pct ;
    RUN ;
Create sampling data set
    DATA samp_pct ;
      SET samp_pct ;
      _NSIZE_ = 1 ;                 /* just here to make it easy to modify */
      _NSIZE_ = _NSIZE_ * COUNT ;
      DROP PERCENT ;
    RUN ;
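To illustrate what "easy to modify" could look like, here is a hedged variant of the step above that moves the multiplier into a macro variable. It is meant to replace that step, not follow it. The macro variable name ratio and the value 2 are assumptions for illustration.

```sas
/* Sketch only: sample &ratio controls per case within each quintile. */
%let ratio = 2 ;     /* hypothetical value: 2 controls for every case */

data samp_pct ;
  set samp_pct ;
  /* COUNT is the number of cases in the quintile, from PROC FREQ OUT= */
  _NSIZE_ = &ratio * COUNT ;
  drop PERCENT ;
run ;
```

Each quintile of the larger group needs at least _NSIZE_ observations for sampling without replacement, so larger ratios are worth checking against the quintile counts first.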
PROC SURVEYSELECT
    • The SAMPSIZE= input data set can provide stratum sample sizes in the _NSIZE_ variable
    • STRATA groups should appear in the same order in the secondary data set as in the DATA= data set
SELECT RANDOM SAMPLE
    PROC SORT DATA = large ;
      BY quintile ;
    RUN ;
    PROC SURVEYSELECT DATA = large SAMPSIZE = samp_pct OUT = largesamp ;
      STRATA quintile ;
    RUN ;
Concatenate data sets
    DATA study.psm_sample ;
      SET largesamp small ;
    RUN ;
Did it work?
    [Results table not preserved in this transcript. Legend: ** P < .01, **** P < .0001]
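One way to check whether the matching worked is to compare the covariates across the two groups in the matched data set. This is a sketch of that idea, not necessarily the analysis the presenters ran. It assumes age_comp and vissum1 are numeric and race and sex are categorical.

```sas
/* Sketch only: covariate balance checks on the matched sample. */

/* Categorical covariates and quintile: chi-square tests by group */
proc freq data=study.psm_sample ;
  tables athome * (race sex quintile) / chisq ;
run ;

/* Continuous covariates: two-sample t tests by group */
proc ttest data=study.psm_sample ;
  class athome ;
  var age_comp vissum1 ;
run ;
```

Running the same two steps on study.allpropen gives the pre-matching comparison.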
Odds ratio before matching: 6.5 : 1
Odds ratio after matching: 3.7 : 1
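The slides do not show how these odds ratios were computed. One possible approach, sketched here, uses the variable names from the earlier slides (dthflag, athome) and the matched data set study.psm_sample.

```sas
/* Sketch only: odds of death for nursing home vs. at-home residents. */

/* 2x2 table; RELRISK prints the odds ratio with confidence limits.
   The direction of the ratio depends on the row and column ordering. */
proc freq data=study.psm_sample ;
  tables athome * dthflag / relrisk ;
run ;

/* Logistic regression with athome = 1 (lived at home) as the reference,
   so the reported odds ratio compares athome = 0 against athome = 1 */
proc logistic data=study.psm_sample ;
  class athome (ref='1') / param=ref ;
  model dthflag (event='1') = athome ;
run ;
```

Pointing both steps at study.allpropen instead would give the corresponding unmatched estimate.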