210 likes | 313 Views
Mostly Dates… and a f ew o ther (random but) useful STATA commands. Jen Cocohoba, Pharm.D ., MAS Health Sciences Associate Clinical Professor UCSF School of Pharmacy. In this portion of the lecture. How to manipulate dates in STATA Performing loops Basic “how to” merge datasets
E N D
Mostly Dates…and a few other (random but) useful STATA commands Jen Cocohoba, Pharm.D., MAS Health Sciences Associate Clinical Professor UCSF School of Pharmacy
In this portion of the lecture • How to manipulate dates in STATA • Performing loops • Basic “how to” merge datasets • Follow along • No lab exercises
Dealing with Dates in STATA • Dates in your research • STATA can help you manipulate dates • Add, subtract, calculate time between dates • Comparing dates (e.g. before 1999, after 1999) • Extract components of dates (year, day of week)
How STATA thinks about dates 1/1/1960 1/3/1960 12/31/1960 • “Counts” date as the # of days from a specific reference • January 1, 1960 = 0 • January 2, 1960 = 1 • January 3, 1960 = 2 • December 31, 1960 = 364 • This makes it “easy” for STATA to manipulate mathematically • We will come back to this when formatting dates 0 1 2 364
Cleaning strings to STATA dates • STATA 11 reads dates as string • Do the “usual” • Open my Excel spreadsheet • Copy data • Paste into STATA data editor • Note color of variable
Cleaning STATA dates • STATA sees dates=string even if typed into editor • Need to convert to date recognized by STATA • Generate a new date variable using date function • Tell it which old variable contains the date you want to convert • Give it a format (most common is month, day, year) • Compare old and new results
generate dob = date(birthdate, “MDY”) New variable name Date function Old variable name How the date is arranged *NOTE: your original date variable can be “date-like” (e.g. 8/10/1970) or can be in a true string format (August 10, 1970) --- STATA 11 can figure it out.
Number nonsense • Emerges as the date in STATA speak • Can format (mask) the numerical date so that it is easier for you to understand Command: format dob %td dob -2372 -4366 -3839 150 -4862 -3626 -2788 -3562 -1868 -5946 -5984 -1962 -4694 -6018 -4407 0 * NOTE: Other formats aside from td – in STATA help
Dates: A series of two commands • Most will be like this 2-command example • Generate dob = date(birthdate, “MDY”) • Format dob %td • NOTE: STATA has issues with dates with 2-digit years • Visitdate has 2 digit years • Should get a “missing values” generated • Two ways to fix this • Format dates to 4 digit years in Excel, then copy to STATA • OR add a “topyear” to the STATA command which helps form a cutoff year. Anything beyond the topyear is interpreted as the previous century.
Top year = if the year is “10” this is interpreted as 2010. If beyond the topyear (“11”) then it is interpreted as 1911 Generate vdate = date(visitdate, “MDY”, 2010) New variable name Date function Old variable name How the date is arranged
Other ways dates commonly stored • Date components in separate variables • STATA can concatenate these for you using an mdy command
Generate olddate = mdy(birth_m, birth_d, birth_y) New variable name mdy date function Name of month, day, and year variables Don’t forget to format the new date variable using format olddate %td
Date is now formatted – what can you do with it? • Extract components of the date into new variables (columns) • gen nameofdayvariable = day(datevariable) • gen weekdayvariable = dow(datevariable) • Lists as 0(Sunday) - 6(Saturday) • gen monthvariable = month(datevariable) • gen yearvariable = year(datevariable)
What else can you do with dates • Find time elapsed between dates • Suppose you wanted to find participants’ age at the date of their study visit. • Generate new variable called ageatvisit gen ageatvisit = vdate - dob • Note this gives you their age in number of DAYS • Can do this more efficiently by gen ageatvisit =(vdate – dob)/365.25 *gen agevisityears = int(ageatvisit)*
Comparing dates • Suppose you wanted to categorize patients by their visit dates • Those who had a visit before 12/31/07 = earlyvisit • Using a literal dates • Formatted as day month year (01jan1960) • Must be denoted by parenthesis • Must use pseudocommandtd • Example: td(01jan1960) • Example • gen earlyvisit= 0 • replace earlyvisit= 1 if vdate<= td(31dec2007) • replace earlyvisit =. if vdate==. * NOTE: in STATA 10 the pseudocommand is just “d”, not “td”
Programming loops • Analyses & data cleaning are often repetitive • Example • Test whether age at visit, number of side effects, and average severity of side effects differ by gender (sex) • Could do this… • ttest ageatvisit, by(sex) • ttest numsidefx, by(sex) • ttest severity, by(sex)
Simple Loop Syntax Command begin foreach var in ageatvisit numsidefx severity { ttest `var’, by(gender) } List of variables Perform this command, replacing the `var’ with the variables in the list. NOTE the special apostrophe marks (the first one lies below the ~ on the keyboard, the other is a normal apostrophe) Command end • * NOTES • Open brace must appear on the same line as the foreach command. • Nothing may follow the open brace (except for comments) • The first command must be on a separate line • The close brace must be on its own line
A little on merging datasets • Merge versus append • Merge = add new variables from 2nd dataset to existing observations (across) • Append = add new observations to existing variables (down under) • Merging requires datasets to have a common variable (ID) • Nomenclature for the datasets • One dataset is defined as the “master” (in memory) dataset • The other dataset is called the “using” dataset • Many merge types • One to one: master file w/demographics, using data has labs (merge 1:1) • One to many: master file w/demographics, using file with multiple visits (merge 1:m) • Many to one: Master file with multiple visits, using with demographics (merge m:1) • Many to many: master with multiple visits + using file with multiple visits (merge m:m)
How to merge • Need to make sure they are sorted AND saved • STATA 11 may do this automatically for you! • sort idvariable • Steps • Load the master dataset into memory • Sort (just to be safe) • Command merge type commonvariable using “name of 2nd dataset.dta” Example: merge 1:1 wihsid using “socdem.dta” • Check to make sure it makes sense See appearance of a “merge” variable which tells you where the observations came from (dataset 1, dataset 2, etc.)
Last little (random) commands… • Numbering, if multiple observations • sort id timevariable • By id: egen newnumbervariable=seq() • Sending data back to Excel • outsheet list of all variables using name_of_file_you_want
The wonders of STATA on the Web • Many things STATA can help you do • To figure out how… • STATA help is one place to start • I’ve had luck with Google searches • UCLA has a helpful STATA site • Other discussion strings • Good luck with your final projects