Stata: Getting Started and Being Productive with VA Data


  1. Stata: Getting Started and Being Productive with VA Data
      "Give me six hours to chop down a tree and I will spend the first four sharpening the axe." --Abraham Lincoln
      Todd Wagner, June 2007

  2. Outline • Getting data into Stata • Editing in Stata • How does Stata handle data • Stata notation and help • Using Stata and Basic Stata commands

  3. Transferring Data • Stat/Transfer or DBMS/Copy work • Stat/Transfer often seeks to optimize the Stata dataset by default • If transferring data with SCRSSN, force Stat/Transfer to transfer SCRSSN as double precision (a float holds only about seven significant digits, so a long identifier can be rounded)
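Why the storage type matters can be seen in a small sketch. The nine-digit value below is invented for illustration and is not a real SCRSSN, and the variable names are hypothetical.

```stata
* Hypothetical nine-digit identifier stored two ways
clear
set obs 1
generate float  id_float  = 123456789
generate double id_double = 123456789

* Show all digits instead of scientific notation
format id_float id_double %12.0f
list id_float id_double
```

The double copy keeps every digit, while the float copy is rounded to the nearest value a float can hold (123456792 here), which is enough to break a merge on the identifier.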

  4. Stat/Transfer (screenshot): click on DOUBLE

  5. Editing in Stata • Any ASCII text editor will work • Stata has a built-in text editor, but it is limited • I recommend using another text editor: http://fmwww.bc.edu/repec/bocode/t/textEditors.html

  6. Handling Data • SAS processes one record at a time • Stata processes all the records at the same time • Loops are commonly used in SAS • Loops are very rarely used in Stata
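A minimal sketch of what processing all the records at the same time looks like in practice. The variables cost and days and their values are made up for illustration.

```stata
* Toy data: three stays with invented costs and lengths of stay
clear
input cost days
1000 2
3000 3
500 1
end

* One statement computes cost per day for every observation, no loop needed
generate cost_per_day = cost / days

* A conditional replace is likewise applied to all matching observations
generate byte long_stay = 0
replace long_stay = 1 if days >= 3
list
```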

  7. Loading Data into Memory • Stata reads the data into memory • set mem 100m (before you load the data) • You must have enough memory for your dataset • With large datasets: • drop unnecessary variables • Use the compress command (but don’t compress SCRSSN)
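One way to follow this advice is sketched below. The toy data stand in for a large extract; every variable name other than scrssn is hypothetical, and the values are invented. The sketch uses ds with the not option to list all variables except scrssn and passes that list to compress.

```stata
* Toy data standing in for a large extract
clear
set obs 5
generate double scrssn = 100000000 + _n
generate double age    = 60 + _n
generate double unused = 0

* Drop variables that are not needed
drop unused

* Compress every variable except scrssn
ds scrssn, not
compress `r(varlist)'
describe
```

After this runs, age is stored in a smaller type while scrssn is left as double.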

  8. Stata Abbreviations • Many Stata commands can be abbreviated, often to the first three letters • regress income education female could be written • reg income education female • Can also abbreviate variables if uniquely defined • reg inc educ fem

  9. Stata Help • Stata’s built-in help is great • help <command> • Stata manuals are great because they review theory

  10. Stata and the Web • Stata is “web aware” • Check for updates periodically • update all • You can search for user-written programs • findit output • findit outreg (click to install)

  11. Stata in Windows • Page Up scrolls through the previous commands • There is a graphical user interface (menus) if you forget a command • We have Stata on rocky and tasha: no graphical capabilities, no menus, and loss of some shortcuts

  12. Using Stata • Create batch files called “.do” files • I work interactively • Run Stata and create the do file as I go • I can then use the do file as needed • Debugging code and exploratory data analysis are very fast in Stata

  13. Sysdir, ls and cd
      • Stata recognizes some Unix commands, such as ls and cd
      • sysdir provides a listing of Stata’s system directories

        sysdir
           STATA:  C:\Program Files\Stata9\
         UPDATES:  C:\Program Files\Stata9\ado\updates\
            BASE:  C:\Program Files\Stata9\ado\base\
            SITE:  C:\Program Files\Stata9\ado\site\
            PLUS:  c:\ado\stbplus\
        PERSONAL:  c:\ado\personal\
        OLDPLACE:  c:\ado\

  14. Delimiters
      • SAS recognizes “;” as a delimiter
      • Stata recognizes the carriage return
      • Always add a carriage return after your last command
      • You can change the delimiter to ;

        #delimit ;
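A short sketch of switching delimiters inside a do-file, assuming the auto example dataset that ships with Stata. Note that #delimit works in do-files, not at the interactive prompt, and that #delimit cr switches back to the carriage return.

```stata
* Save as a do-file; #delimit is not available at the interactive prompt
sysuse auto, clear

#delimit ;
* With ; as the delimiter, one command can span several lines ;
regress price
    mpg
    weight ;
#delimit cr

* Back to the carriage return as the delimiter
summarize price
```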

  15. Missing Data • Stata and SAS both use “.” as missing • Stata implicitly values a missing as a very large number • SAS implicitly values a missing as a very small number
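Because Stata treats a missing value as larger than any number, a condition such as x > 10 is also true for missing observations. The sketch below uses a made-up variable x to show the effect and one common guard, the missing() function.

```stata
* Toy data with one missing value
clear
input x
1
20
.
end

* Counts the missing observation as well as the value 20
count if x > 10

* Excluding missing values explicitly counts only the value 20
count if x > 10 & !missing(x)
```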

  16. Generating and Recoding Variables
      • In SAS you type

        quality=0;
        if VA=1 then quality=1;

      • In Stata you type

        gen quality=0
        recode quality 0=1 if VA==1

        or

        replace quality=1 if VA==1

  17. Boolean Logic
      • Stata is picky about Boolean logic

        gen y=x if a==b          (must use == to test equality)
        gen y=x if a>b & b>10    (must use &)
        gen y=x if a<=b          (< or > must come before =)

  18. Creating Dummy Variables
      • Goal: create a dummy variable for each DRG

        gen drgnum1=drg==1

        or

        tab drg, gen(drgnum)

      • The second command automatically creates a dummy variable for each value of drg
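Both approaches can be tried on toy data, as in the sketch below. The DRG values are invented, and the hand-made indicator is given the hypothetical name is_drg1 so that it does not clash with the drgnum1 variable that tabulate creates.

```stata
* Toy data with three distinct DRG values
clear
input drg
1
2
2
3
end

* One indicator by hand: the logical expression evaluates to 1 or 0
generate byte is_drg1 = drg == 1

* One indicator per distinct value: drgnum1, drgnum2, drgnum3
tabulate drg, generate(drgnum)
list
```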

  19. Drop • drop <varnames> (drops variables) • drop if X==1 (drops cases where X is 1)

  20. egen Commands
      • You want to generate total costs for a medical center
      • In SAS this is done by proc summary
      • In Stata, you can type

        collapse (sum) cost, by(sta3n)

        or

        sort sta3n
        by sta3n: egen sumcost=total(cost)
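The difference between the two approaches shows up on toy data: egen adds the station total to every record, while collapse replaces the data with one record per station. The station numbers and costs below are invented.

```stata
* Toy data: two stations with invented costs
clear
input sta3n cost
1 100
1 50
2 30
end

* egen keeps all records and adds the station total to each one
bysort sta3n: egen sumcost = total(cost)
list

* collapse replaces the data with one record per station
collapse (sum) cost, by(sta3n)
list
```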

  21. ICD-9 Codes • Stata has capabilities to handle ICD-9 diagnosis and procedure codes • You can • check to see if codes are valid • generate identifiers based on codes or ranges of codes

  22. Dates • Date functions are similar to those in SAS • Both store dates as the number of days since January 1, 1960

  23. Combining Data
      • Merge
        • this automatically creates a variable called _merge
        • _merge==1 obs. from master data
        • _merge==2 obs. from only one using dataset
        • _merge==3 obs. from at least two datasets, master or using

        merge scrssn admitday disday using data_y

      • Append (stacking data)
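The merge line on the slide uses the syntax current in 2007. The sketch below assumes Stata 11 or later, where the match type (here 1:1) is stated explicitly; the datasets and the key variable id are invented for illustration.

```stata
* Using dataset: ids 1 and 3, saved to a temporary file
clear
input id y
1 10
3 30
end
tempfile using_data
save `using_data'

* Master dataset: ids 1 and 2
clear
input id x
1 1
2 2
end

* One-to-one merge on id; _merge records where each observation came from
merge 1:1 id using `using_data'
tabulate _merge
list
```

Here id 2 appears only in the master data, id 3 only in the using data, and id 1 in both.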

  24. Explicit Subscripting
      • Identify the most recent encounter in an encounter database

        gsort id -date        (ascending sort by ID and reverse by date)
        by id : gen n=_n      (record counter from 1 to N per person)
        by id : gen N=_N      (total number of records per person)
        gen select=n==1
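A variant of the same idea, sketched on toy data: instead of a descending sort, bysort orders each person's records by ascending date, so the most recent encounter is the last record (n equals N). The ids and dates are invented, and this is an alternative to the slide's gsort version rather than a copy of it.

```stata
* Toy encounter data: person 1 has two encounters, person 2 has one
clear
input id date
1 100
1 200
2 150
end

* Sort by id, then by date within id, and run each command by id
bysort id (date): generate n = _n      // record counter within person
bysort id (date): generate N = _N      // total records per person
generate byte select = n == N          // 1 for the most recent record
list
```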

  25. Using Stata

  26. Stata Interface in Windows

  27. Set, Clear and More • Set: sets system parameters • Need to set memory size to open a database set mem 100m • Clear erases data from memory • When output is >1 page, you are asked to continue (set more off)

  28. Summarizing Data
      • sum <varlist>, d provides more details on each variable
      • tabstat provides summary info, including totals

      . sum gender age educ

          Variable |       Obs        Mean    Std. Dev.       Min        Max
      -------------+--------------------------------------------------------
            gender |      4085    1.496206    .5000468          1          2
               age |      4085     64.5601    9.451724         50         94
              educ |      4085    4.398286    1.662883          1          9

  29. Tabulating Data

      . tab gender

           gender |      Freq.     Percent        Cum.
      ------------+-----------------------------------
                1 |      2,058       50.38       50.38
                2 |      2,027       49.62      100.00
      ------------+-----------------------------------
            Total |      4,085      100.00

      . table gender

      ----------------------
         gender |      Freq.
      ----------+-----------
              1 |      2,058
              2 |      2,027
      ----------------------

  30. Tabulating Data

      . tab gender age
      too many values
      r(134);

      . tab age gender

                 |        gender
             age |         1          2 |     Total
      -----------+----------------------+----------
              50 |        49         69 |       118
              51 |        72         71 |       143
               … |
              94 |         1          0 |         1
      -----------+----------------------+----------
           Total |     2,058      2,027 |     4,085

  31. Tabstat

      . tabstat age, by(gender)

        gender |      mean
      ---------+----------
             1 |  64.77454
             2 |  64.34238
      ---------+----------
         Total |   64.5601
      --------------------

      . table gender, c(mean age)

      -----------------------
         gender |   mean(age)
      ----------+------------
              1 |   64.77454
              2 |   64.34238
      -----------------------

  32. Graphing • Diagnostic graphics • Presenting results

  33. Basic Analytical Functions • OLS (reg) • Logistic, probit, count data (e.g., CLAD) • Multinomials • GLM/HLM • Duration models • Semi and non-parametric models

  34. Output

      Linear regression                      Number of obs =    1306
                                             F( 21,  1284) =   10.88
                                             Prob > F      =  0.0000
                                             R-squared     =  0.1398
                                             Root MSE      =  90.367

                   |               Robust
               wtp |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
             ethn1 |   1.990048   8.742036     0.23   0.820    -15.16019    19.14029
             ethn2 |  -25.74654   11.69993    -2.20   0.028    -48.69961   -2.793467
             ethn3 |  -35.59552   11.98309    -2.97   0.003     -59.1041   -12.08694
             ethn4 |  -3.244168   11.16836    -0.29   0.771    -25.15441    18.66607
           english |  -11.44402   9.699576    -1.18   0.238    -30.47277    7.584741
            lifeus |   37.34419   13.86037     2.69   0.007     10.15274    64.53564
           age1999 |  -.6272524   .3097408    -2.03   0.043    -1.234906   -.0195987
            income |   .8068256   .1714309     4.71   0.000     .4705102    1.143141
            incmis |   14.07434   9.404149     1.50   0.135    -4.374848    32.52352
             _cons |   111.3607   24.13083     4.61   0.000     64.02051    158.7009

  35. Outreg • Outputs data to a delimited file • Delimited file can be read into Excel • Very flexible • Creates publishable tables
