1 / 44

Being Productive with Stata and VA Data

Being Productive with Stata and VA Data. Give me six hours to chop down a tree and I will spend the first four sharpening the axe. --Abraham Lincoln Todd Wagner August 2008. Outline. Database manipulation in Stata Data Analysis in Stata. Working Interactively and .do files.

anemone
Download Presentation

Being Productive with Stata and VA Data

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Being Productive with Stataand VA Data Give me six hours to chop down a tree and I will spend the first four sharpening the axe. --Abraham Lincoln Todd Wagner August 2008

  2. Outline • Database manipulation in Stata • Data Analysis in Stata

  3. Working Interactively and .do files • You can issue commands directly into the command line. • Unless you save your commands into a batch file (a .do file), you’ll lose your code once you close Stata. • I often work interactively and then save the “right” commands in a do file.

  4. Editing a .do file in Stata • Any ASCII text editor will work • Stata has a built in text editor, but it is limited. • I recommend exploring your options http://fmwww.bc.edu/repec/bocode/t/textEditors.html

  5. How Stata Thinks about Data

  6. Handling Data • SAS processes one record at a time • Stata processes all the records at the same time • Loops are commonly used in SAS • Loops are very rarely used in Stata

  7. Loading Data into Memory • Stata reads the data into memory • set mem 100m (before you load the data) • You must have enough memory for your dataset • With large datasets: • drop unnecessary variables • Use the compress command (but don’t compress SCRSSN)

  8. Stata Abbreviations • Stata commands can be abbreviated with the first three letters • regression income education female could be written • reg income education female

  9. Stata Help • Stata’s built in help is great • Help <command> • Stata manuals are great because they review theory

  10. Stata and the Web • Stata is “web aware” • Check for updates periodically • update all • You can search for user-written programs • findit output • findit outreg (click to install)

  11. Stata in Windows • Page up scrolls through the previous commands • There is a graphical user interface (menus) if you forget a command • In Unix, you can all Stata’s functionality if you use x-windows (e.g., Cygwin).

  12. Sysdir, ls and cd • Stata recognizes some unix commands, such as ls and cd • Sysdir provides a listing of Stata’s working directories sysdir STATA: C:\Program Files\Stata9\ UPDATES: C:\ProgramFiles\Stata9\ado\updates\ BASE: C:\Program Files\Stata9\ado\base\ SITE: C:\Program Files\Stata9\ado\site\ PLUS: c:\ado\stbplus\ PERSONAL: c:\ado\personal\ OLDPLACE: c:\ado\ • Store your data on a VA server– not on your PC or laptop!

  13. Delimiters • SAS recognizes “;” as a delimiter • Stata recognizes the carriage return • Always add a carriage return after your last command • You can change delimiters to ; #delimit ;

  14. Missing Data • Stata and SAS both use “.” as missing • Stata implicitly values a missing as a very large number • SAS implicitly values a missing as a very small number

  15. Generating and Recoding Variables • In SAS you type quality=0; If VA=1 then quality=1; • In Stata you type gen quality=0 recode quality 0=1 if VA==1 or replace quality=1 if VA==1

  16. Boolean Logic • Stata is picky about Boolean logic gen y=x if a==b (must use two ==) gen y=x if a>b & b>10 (must use &) gen y=x if a<=b (< or > must be before =)

  17. Creating Dummy Variables • Goal: create dummy variable for gender gen male=sex==“M” tab sex, gen(sex_) • This second command automatically creates 2 dummy variables • Be careful about missing data– missing data are assigned to 0, unless you use “if” or “recode”

  18. Drop • Drop <varnames> (drops variables) • Drop if X==1 (drop cases where value is 1)

  19. egen Commands • You want to generate total costs for a medical center • In SAS this is done by proc summary • In Stata, you can type collapse (sum) costs, by (stan3)or sort sta3n by sta3n: egen sumcost=total(cost)

  20. ICD-9 Codes • Stata has capabilities to handle ICD-9 diagnosis and procedure codes • You can • check to see if codes are valid • generate identifiers based on codes or ranges of codes

  21. Dates • Same date functions as SAS

  22. Combining Data • Merge • this automatically creates a variable called _merge • merge==1 obs. from master data • merge==2 obs. from only one using dataset • merge==3 obs. from at least two datasets, master or using merge scrssn admitday disday using data_y • Append (stacking data)

  23. Explicit Subscripting • Identify the most recent encounter in an encounter database gsort id -date by id : gen n=_n by id : gen N=_N gen select=n==1 Ascending sort by ID and reverse by date Record counter from 1 to N per person Total number of records per person

  24. Using Stata

  25. Set, Clear and More • Set: sets system parameters • Need to set memory size to open a database set mem 100m • Clear erases data from memory • When output is >1 page, you are asked to continue (set more off)

  26. Summarizing Data • Sum < >, d provides more details on each variable • Tabstat provides summary info, including totals . sum gender age educ Variable | Obs Mean Std. Dev. Min Max -------------+-------------------------------------------------------- gender | 4085 1.496206 .5000468 1 2 age | 4085 64.5601 9.451724 50 94 educ | 4085 4.398286 1.662883 1 9

  27. Tabulating Data . tab gender gender | Freq. Percent Cum. ------------+----------------------------------- 1 | 2,058 50.38 50.38 2 | 2,027 49.62 100.00 ------------+----------------------------------- Total | 4,085 100.00 . table gender ---------------------- gender | Freq. ----------+----------- 1 | 2,058 2 | 2,027 ----------------------

  28. Tabulating Data tab gender age too many values r(134); tab age gender | gender age | 1 2 | Total -----------+----------------------+---------- 50 | 49 69 | 118 51 | 72 71 | 143 … 94 | 1 0 | 1 -----------+----------------------+---------- Total | 2,058 2,027 | 4,085

  29. . tabstat age, by (gender) gender | mean ---------+---------- 1 | 64.77454 2 | 64.34238 ---------+---------- Total | 64.5601 -------------------- . table gender, c(mean age) ----------------------- gender | mean(age) ----------+------------ 1 | 64.77454 2 | 64.34238 ----------------------- Tabstat

  30. Graphing • Diagnostic graphics • Presenting results

  31. Basic Analytical Functions • OLS (reg) • Logistic, probit, count data (e.g., CLAD) • Multinomials • GLM/HLM • Duration models • Semi and non-parametric models

  32. Creating Publishable Tables • Outreg command • Outputs data to a delimited file • Delimited file can be read into Excel • Very flexible • Creates publishable tables easily

  33. Example with VA data

  34. Becaplermin • June 2006, FDA issued a Boxed Warning for becaplerim (a treatment for lower extremity diabetic ulcers) • Warning raised potential risk of cancer related mortality

  35. Analytical Goal • Case-control study for becaplermin • Sample is all patients with a diabetic ulcer of the lower extremity • Exposure is quantity of becaplermin prescriptions • Multivariate analysis, stratifying for patients with prior history of cancer

  36. Pulling VA Data • VA utilization data extracts reside in SAS. I extract my sample using SAS and then moved the data into Stata. • VA Data: • Sample: All encounters with a diabetic ulcer principal diagnosis in NPCD and PTF (FY02-07) • Exposure: All prescriptions from DSS pharmacy FY02-07 for Becaplermin feeder code • Outcome: All encounters with a neoplasm principal diagnosis (FY97-07)

  37. Transferring Data • Stattransfer or DBMS copy work • Stattransfer often seeks to optimize the Stata dataset by default • If transferring data with SCRSSN, FORCE Stattransfer to transfer SCRSSN as double precision • http://www.stata.com/support/faqs/data/prec.html

  38. CLICK ON DOUBLE Stattransfer

  39. Diabetic Ulcer Sample • Goal: turn encounter level data into person level data cd R:\twagner\customer\becap use ulcer, clear sort scrssn by scrssn: gen n=_n tab n keep if n==1 keep scrssn sort scrssn gen ulcer=1 save finder, replace

  40. Alternative Code sort scrssn by scrssn: gen n=_n by scrssn: gen num_ulcervisits=_N sort scrssn by scrssn: gen newepisode=vizday[_n]-vizday[_n-1]>60 recode newepisode .=1 if n==1 by scrssn: egen episodes=sum(newepisode)

  41. Step 2: Merge Ulcer Sample and Cancer Cases use neo, clear gen cancer=1 sort scrssn merge scrssn using finder drop if _m==1 Merge command creates a new variable: _m=1 data only in master data _m=2 data only in using data _m=3 data merged in both

  42. Step 2: continued sort scrssn admitday disday sta3n adtime by scrssn: egen firstcancer=min(admitday) if cancer==1 gen diedihcan=disto==-2 & cancer==1 gen dod_can=disday if diedihcan==1 gen cancerstays=1 if cancer==1 recode cancerstays .=0 collapse (min) firstcancer (sum) cancerstays (max) diedihcan dod_can cancer, by (scrssn) sort scrssn drop _m save diabcancer, replace

  43. Merge in Exposure data use becap, clear gen numrx=1 sort scrssn svc_dte by scrssn: egen firstbecap=min(svc_dte) by scrssn: egen lastbecap=max(svc_dte) collapse (min) firstbecap (max) lastbecap (sum) day_supply numrx , by (scrssn) gen becap=1 sort scrssn save becapsum, replace use diabcancer merge scrssn using becapsum

More Related