Working sideways in Stata
Jakob Hjort, Data Manager, MPH
Department of Cardiology, Aarhus University Hospital, DK-8200 Aarhus, Denmark
2014 Nordic and Baltic Stata Users Group Meeting
The rectangular dataset: Statistics, results
"It is not the data we want, it's the essence of data"
The rectangular dataset: Data management
The rectangular dataset: Data management, Statistics
The rectangular dataset: transpose? (Data management, Statistics)
The rectangular dataset: subset in a matrix using Mata?

    use "family.dta", clear
    * Dataset with: fam_name, inc_mother & inc_father
    mata
    st_view(x=0,.,("inc_mother","inc_father"))
    income=colsum(x')'
    st_addvar("long","inc_household")
    st_store(.,"inc_household",income)
    end
    list fam_name inc_mother inc_father inc_household
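A minimal, self-contained sketch of the same idea on made-up data, so it can be run without family.dta. The two families and their incomes are invented for illustration, and Mata's rowsum() is swapped in for colsum(x')'; both treat missing values as 0.

```stata
clear
input str10 fam_name inc_mother inc_father
"Hansen" 300 250
"Jensen" 410 .
end

mata:
// copy the two income columns into a Mata matrix
X = st_data(., ("inc_mother", "inc_father"))
// create the target variable; (void) discards the returned variable index
(void) st_addvar("double", "inc_household")
// rowsum() adds across the columns of each row, treating missing as 0
st_store(., "inc_household", rowsum(X))
end

list fam_name inc_mother inc_father inc_household
```

Using st_data() makes a copy of the data rather than a view, which is fine for a small example; st_view() as on the slide avoids the copy on large datasets.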
The direct approach: generate

    generate [type] newvar=exp [if] [in]

Ex.:

    generate BMI=Weight/Height^2

Data management: Weight and Height give BMI.
The direct approach: egen

    egen [type] newvar=fcn(arguments) [if] [in] [,options]

Functions: rowtotal, rowmin, rowmax, rowfirst, rowlast, rowmean, rowmedian, rowmiss, rownonmiss, rowpctile, rowsd, concat, anycount, anymatch, anyvalue, count, diff, fill, group, iqr, kurt, max, mdev, mean, median, min, mode, mtr, pc, pctile, rank, sd, seq, skew, std, tag, total

Ex.:

    egen income=rowtotal(inc*)

Data management: incJan, incFeb, incMar, incApr, incMay, incJun, incJul, … give income.
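A small sketch of how the two direct approaches differ when a value is missing. The data and the names sum_gen and sum_egen are made up for illustration: a sum built with generate becomes missing as soon as one term is missing, whereas egen's rowtotal() treats missing values as 0.

```stata
clear
input incJan incFeb incMar
100 200 300
150 .   250
end

* generate: any missing term makes the whole sum missing
generate sum_gen = incJan + incFeb + incMar

* egen rowtotal(): missing values are treated as 0
egen sum_egen = rowtotal(inc*)

list
```

Which behaviour is wanted depends on whether a missing month should count as zero income or as an unknown total.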
Looking under the hood, just for inspiration

    viewsource _growmin.ado

the rowmin() function of egen:

    program define _growmin
        version 6, missing
        gettoken type 0 : 0
        gettoken g 0 : 0
        gettoken eqs 0 : 0
        syntax varlist [if] [in] [, BY(string)]
        if `"`by'"' != "" {
            _egennoby rowmin() `"`by'"'
        }
        tempvar touse
        mark `touse' `if' `in'
        quietly {
            gen `type' `g' = .                          // (1)
            tokenize `varlist'                          // (2)
            while "`1'"!="" {                           // (3)
                replace `g' = cond(`1' < `g',`1',`g')   // (4)
                mac shift
            }
        }
    end

1. Initialize target variable
2. Prepare the variable-list
3. Looping
4. In-the-loop commands
Prepare the variable-list

    . local vars incJan incFeb incMar incApr incMay incJun ///
                 incJul incAug incSep incOct incNov incDec

Full specification of each and every variable: OK with 12, but what about hundreds? The list is stored in `vars'.

    . unab vars: inc*
    . unab vars: incJan-incDec

Variables can be specified with wildcards or ranges. The expanded list is stored in `vars'. (unab means unabbreviate; however, the command itself can't be un-abbreviated.)

    . ds inc*
    . ds incJan-incDec
    incJan incFeb incMar incApr incMay incJun incJul incAug incSep incOct incNov incDec

Variables can be specified with wildcards or ranges. The list is stored in `r(varlist)'. Nice feature: the expanded list is shown for inspection.
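The three ways of preparing the list can be tried on a tiny made-up dataset. This is only a sketch: the variable other and the macro names range and fromds are invented here to show that the pattern inc* leaves non-matching variables out and that r(varlist) can be copied into a local macro for later use.

```stata
clear
input incJan incFeb incMar other
1 2 3 9
end

* unab expands the pattern and stores the result in the local macro vars
unab vars : inc*
display "`vars'"

* a range picks up every variable stored between the two endpoints
unab range : incJan-incMar
display "`range'"

* ds shows the expanded list and leaves it in r(varlist);
* copy it right away, since later commands may overwrite r()
ds inc*
local fromds `r(varlist)'
display "`fromds'"
```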
Looping

"foreach" is the quickest and the most transparent loop command

    foreach lvar in incJan incFeb {
        // do stuff with "`lvar'"
    }

    unab lvar: inc*
    foreach lvar in `lvar' {
        // do stuff with "`lvar'"
    }

    ds inc*
    foreach lvar in `r(varlist)' {
        // do stuff with "`lvar'"
    }

Typing the macro quotes:
Left single-quote (`): hold Alt and press 0 9 6 on the numeric keypad
Right single-quote ('): hold Alt and press 0 3 9 on the numeric keypad
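As a possible alternative that is not shown on the slides, foreach also has an "of varlist" form that expands wildcards and ranges itself, so no separate unab or ds step is needed. A minimal sketch on made-up data (the variable other and the counter n are illustrative only):

```stata
clear
input incJan incFeb incMar other
1 2 3 9
end

local n = 0
foreach lvar of varlist inc* {
    * `lvar' holds one variable name per pass through the loop
    display "`lvar'"
    local ++n
}
display "looped over `n' variables"
```

The "in" form used on the slides takes any list of words as typed, while "of varlist" checks that each element is an existing variable.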
In the loop

    generate minimum=.
    unab vars: inc*
    foreach lvar in `vars' {
        replace minimum = cond(`lvar' < minimum,`lvar',minimum)
    }

    generate minimum=.
    unab vars: inc*
    foreach lvar in `vars' {
        replace minimum = `lvar' if `lvar'<minimum
    }

    generate minimum=.
    unab vars: inc*
    foreach lvar in `vars' {
        if `lvar'<minimum {
            replace minimum = `lvar'
        }
    }

! Warning on the last version: the if command evaluates its expression once, for the first observation only, so it does not compare row by row.
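The four steps can be put together and checked against egen's rowmin(). The sketch below uses made-up data, and the names check and wrong are invented for the comparison. It also runs the version built on the if command, to show how its result can differ from the row-wise minimum.

```stata
clear
input incJan incFeb incMar
300 200 100
150 .   250
 50  75  60
end

* 1. initialize the target variable
generate minimum = .

* 2. prepare the variable list
unab vars : inc*

* 3. and 4. loop over the variables and update row by row
foreach lvar in `vars' {
    replace minimum = `lvar' if `lvar' < minimum
}

* cross-check against the built-in egen function
egen check = rowmin(inc*)

* pitfall: the if command is evaluated once, using the first observation,
* so each replace either hits every row or none
generate wrong = .
foreach lvar in `vars' {
    if `lvar' < wrong {
        replace wrong = `lvar'
    }
}

list
```

In this example the first row happens to decrease from month to month, so every replace in the last loop fires for all rows and wrong ends up as a copy of incMar.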
Some of the Danish participants who might know "the DREAM database" will probably be able to see how these approaches can be useful when working with this fantastic but difficult construction.