Introduction to Stata
Max Perez Leon Quinoso
Brian Fried
StatLab
Exporting a Dataset
Create a folder named IntroStata on the desktop.
• Let's put all files in that folder.
• Very simple. We can use Stat/Transfer (a separate program, often installed alongside Stata) or export data directly using Stata.
Working Directory
• Working space
    cd
• Changing working space to the IntroStata folder (the exact path will be different for each user)
    cd "C:\Users\MPLQ\Desktop\IntroStata"
• Stata always has a default working directory
Calling dataset
• Different ways to call a dataset
    use "C:\Users\MPLQ\Desktop\IntroStata\example1.dta"
• If we have defined the working directory, we do not need to specify the full path. Notice we are also using the command "clear".
    clear
    use example1.dta
    clear
    insheet using example1.csv
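In Stata 13 and later, `import delimited` supersedes `insheet` (which still runs). The sketch below assumes Stata 13 or newer. The data values and the file name toy_example.csv are hypothetical, chosen only so the example runs without the workshop files.

```stata
* Toy data (hypothetical values), written out as CSV in the working directory
clear
input patient age
1 18
2 45
end
export delimited using "toy_example.csv", replace

* Read the CSV back in; variable names are taken from the header row
import delimited using "toy_example.csv", clear
describe
```

The `clear` option plays the same role as the separate `clear` command on the slide: it drops the data in memory before loading the new file.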
Examining data
• Browsing/Describing data
    browse
    br
    br patient department
    edit
    list
    list patient age
    desc
    desc age
    desc score*
    codebook
    tab patient
    tab age
    tab department
    tab depart
Examining data
    tab age survey
    sum score2000
    sum score2000, detail
    sum score2000, det
    sum score*
    br *t
    sort score2000
    gsort survey
    gsort - survey
    gsort -survey score2000
Qualifiers
• Using "in" and "if"
    browse in 1/4
    browse if age<=20
    br if age<20 | age>=40
    br if (age<15 | age>=40) & department == "FES"
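One detail worth checking when using "if" with ">" or ">=": Stata treats a numeric missing value (.) as larger than any number, so a condition such as age>=40 also selects observations whose age is missing. A minimal sketch on toy data (the values are hypothetical, not the workshop's example1.dta):

```stata
* Toy data: patient 4 has a missing age
clear
input patient age str3 department
1 18 "FES"
2 45 "FES"
3 30 "MED"
4 .  "FES"
end

* "in" selects by observation number, "if" by a logical condition
list in 1/2
list if age < 20 | age >= 40          // also lists patient 4 (missing age)

* Excluding missing ages explicitly
list if (age < 20 | age >= 40) & !missing(age)
count if age >= 40                    // counts 2: patient 2 and the missing age
count if age >= 40 & !missing(age)    // counts 1: patient 2 only
```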
Do files
• It is not good practice to rely on the command window for our research. We should keep a file that stores all our commands and lets us run our procedures efficiently.
• Important shortcuts:
    "Ctrl+D" (visible: output is displayed)
    "Ctrl+R" (invisible: output is suppressed)
• If we select some lines, the shortcuts run only the commands in those lines. If we do not select any lines, they run the whole do-file.
• Comments start with an asterisk.
How do I usually start a Do file?
    clear
    set mem 100m
    set more off
    cd "C:\Users\MPLQ\Desktop\IntroStata"
    use example1
Creating/Modifying variables
    generate doubscore = score2000 * 2
    gen av_score = (score2000 + score2001 + score2002)/3
    gen ones = 1
    gen indicator = score2000 < .5
• Missing values. Be aware that missing values are different from zero:
    gen small = av_score if av_score < 0.4
    tab small
    tab small, m
Creating/Modifying variables
• Operations with missing values give missing values
    gen small_modify = small * 10
    replace small_modify = 0 if small_modify == .
    replace av_score = 1000 if av_score < 0.5
• Renaming variables
    rename doubscore score_double
• Creating dummy variables
    tab survey, gen(Dsurvey)
    br
    br *survey*
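The missing-value behaviour described above can be traced on a four-row toy dataset. This is a sketch under the assumption that av_score takes the hypothetical values below; the variable names follow the slides.

```stata
* Toy data (hypothetical values)
clear
input av_score
0.2
0.3
0.7
0.9
end

gen small = av_score if av_score < 0.4   // rows with 0.7 and 0.9 get missing (.)
tab small                                // tabulates only the 2 nonmissing values
tab small, m                             // option m adds a row for the missing values

gen small_modify = small * 10            // missing * 10 is still missing
count if small_modify == .               // counts 2
replace small_modify = 0 if small_modify == .
count if small_modify == 0               // counts 2
```

Recoding missing to zero, as in the last step, changes the meaning of the variable, so it is only appropriate when zero is the intended value.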
Creating/Modifying variables
• Egen
    egen mscore2000 = max(score2000)
    br *score2000
    egen Dscore2000 = max(score2000), by(department)
    br score2000 department Dscore2000
• Bysort
    bysort department: tab score2000
    bysort survey: sum score2001
    gen index = _n
    br
    bysort department: gen dep_index = _n
• Collapse
    collapse (count) patient (mean) score2000 (sum) score2001 (sd) score2002, by(department)
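To see what egen, bysort and collapse each produce, here is a small sketch on toy data (hypothetical values; the names max_all, max_dep, n_patients and mean_score are made up for illustration). It uses bysort department (patient): so that the order within each department is fixed; this is an addition to the slide's command, which leaves the within-group order unspecified.

```stata
* Toy data (hypothetical values)
clear
input patient str3 department score2000
1 "FES" 0.2
2 "FES" 0.8
3 "MED" 0.5
4 "MED" 0.6
5 "MED" 0.1
end

egen max_all = max(score2000)                   // overall maximum, repeated in every row
egen max_dep = max(score2000), by(department)   // maximum within each department
bysort department (patient): gen dep_index = _n // 1, 2, ... restarting in each department
list, sepby(department)

* collapse replaces the data in memory with one row per department
collapse (count) n_patients = patient (mean) mean_score = score2000, by(department)
list
```

Because collapse discards the original observations, it is usually run on a copy of the data or after saving.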
Labeling
• Label a variable to convey more information
    label var survey "Patients Survey in 2010"
    desc
• Label values of categorical variables
    label define ex_label 1 low 2 medium 3 high
    label values survey ex_label
    desc
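A short sketch of how value labels affect output, using toy data with hypothetical values. The labels are quoted here, which is required when a label contains spaces and harmless otherwise.

```stata
* Toy data (hypothetical values)
clear
input patient survey
1 1
2 3
3 2
end

label var survey "Patients Survey in 2010"
label define ex_label 1 "low" 2 "medium" 3 "high"
label values survey ex_label

tab survey               // rows are shown as low / medium / high
tab survey, nolabel      // rows are shown as 1 / 2 / 3
list if survey == 3      // conditions still use the underlying numeric codes
```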
Numeric/Strings
• Notice variable department is a string variable
    desc
• Sometimes we would like to store a string variable as a numeric variable without losing the information contained in the strings.
    encode department, gen(dep_num)
    desc
• We can convert the numeric variable back to a string
    decode dep_num, gen(department2)
    br department dep_num department2
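The round trip between string and numeric can be checked on a toy dataset (hypothetical values). encode assigns codes in alphabetical order of the strings and attaches them as value labels.

```stata
* Toy data (hypothetical values)
clear
input str3 department
"MED"
"FES"
"MED"
end

encode department, gen(dep_num)     // FES becomes 1, MED becomes 2
list department dep_num             // dep_num displays its value labels
list department dep_num, nolabel    // underlying numeric codes
decode dep_num, gen(department2)
assert department == department2    // the original strings are recovered
```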
Reshape
• Long and wide format. Our original data is in wide format
    reshape long score, i(patient) j(year)
    reshape wide score, i(patient) j(year)
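A sketch of the same two reshape commands on a two-patient toy dataset (hypothetical values), to show how the rows and columns change:

```stata
* Toy data in wide format: one row per patient, one column per year
clear
input patient score2000 score2001 score2002
1 0.2 0.3 0.4
2 0.8 0.7 0.9
end

reshape long score, i(patient) j(year)   // 6 rows: one per patient-year, columns patient year score
list, sepby(patient)

reshape wide score, i(patient) j(year)   // back to 2 rows with score2000 score2001 score2002
list
```

The stub name score is what the wide variables share; j(year) names the new variable that holds the suffixes 2000, 2001 and 2002.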
Append/Merge
• Open file example2
    use example2, clear
    use example1, clear
    append using example2
• Open file example3. Watch out: we are saving and replacing example3, but sorted by patient identification number.
    use example3, clear
    sort patient
    save, replace
    use example1, clear
    merge patient using C:\Users\MPLQ\Desktop\IntroStata\example3.dta
Append/Merge
    use example1, clear
    append using example2
    sort patient
    merge patient using C:\Users\MPLQ\Desktop\IntroStata\example3.dta
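The merge commands above use the older syntax, which expects both files to be sorted by the key. Since Stata 11 the documented form states the match type explicitly (1:1, m:1, 1:m) and does not require sorting beforehand. A minimal sketch, assuming Stata 11 or later; the two toy datasets and the variable visits are hypothetical and stored in temporary files:

```stata
* Master data (hypothetical values)
clear
input patient age
1 18
2 45
3 30
end
tempfile master
save `master'

* Using data (hypothetical values)
clear
input patient visits
2 4
3 1
4 7
end
tempfile extra
save `extra'

use `master', clear
merge 1:1 patient using `extra'
tab _merge     // 1 = master only, 2 = using only, 3 = matched
list
```

Here patient 1 appears only in the master data, patient 4 only in the using data, and patients 2 and 3 are matched. Because tempfile names are local macros, the block needs to be run in one go from a do-file.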
Log file: Printing results in .txt
    clear
    set mem 100m
    set more off
    cd "C:\Users\MPLQ\Desktop\IntroStata"
    use example1
    capture log close
    log using history, replace text
    tab survey
    log close
    log using history, text append
    tab department
    log close
Saving
• Be very careful when saving data. You could overwrite your original data and lose months of hard work. Always keep a copy of your original data in a separate folder.
    save final_database
    save final_database, replace
    use final_database, clear
    save, replace
Simple Statistics
• Mean
    sum score2000, det
    return list
• Correlations
    corr score2000 score2001 score2002
    corr score*
    corr score*, covar
• Regression
    reg score2000 score2001 score2002
    reg score2000 score2001 score2002, noconstant
    reg score2000 score2001 score2002, robust
    ereturn list
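The point of return list and ereturn list is that the stored results can be reused in later commands. A sketch on toy data (hypothetical values; the variable centered is made up for illustration):

```stata
* Toy data (hypothetical values)
clear
input score2000 score2001
0.2 0.3
0.4 0.5
0.6 0.4
0.8 0.9
end

summarize score2000, detail
return list                           // r(N), r(mean), r(sd), r(p50), ...
display r(mean)
gen centered = score2000 - r(mean)    // reuse the stored mean

regress score2000 score2001
ereturn list                          // e(N), e(r2), e(b), ...
display e(r2)
display _b[score2001]                 // estimated coefficient on score2001
```

Results in r() are overwritten by the next r-class command, so they should be used or copied into a macro right away.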
Graphs
• It is very useful to use the Stata menus to obtain the command lines.
    scatter score2000 score2001
    graph matrix score*
    graph bar (count) patient, over(survey)
Help file
• Text within brackets [] indicates optional restrictions or options.
• Underlined sections indicate acceptable abbreviations
    help tab
    help help
    help gen
Things to look up
• Local-macro variables
• Foreach command (loops)
• Regular expressions (very useful if working with strings)
• Commands
    #delimit
    return list
    ereturn list
    macro list
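As a starting point for the first two items, here is a minimal sketch of a local macro used inside a foreach loop. The data values, the macro name scale and the _x10 suffix are hypothetical.

```stata
* Toy data (hypothetical values)
clear
input score2000 score2001 score2002
0.2 0.3 0.4
0.8 0.7 0.9
end

local scale = 10
foreach v of varlist score2000 score2001 score2002 {
    * `v' expands to the current variable name, `scale' to 10
    gen `v'_x10 = `v' * `scale'
    label var `v'_x10 "`v' multiplied by `scale'"
}
describe
macro list       // shows the macros currently defined, including _scale
```

Local macros exist only while the do-file (or the selected lines) that defines them is running, so the local line and the loop need to be executed together.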
WebPages
• Stata’s YouTube channel: http://www.youtube.com/user/statacorp/featured
• http://survey-design.com.au/tips.html
• http://www.ats.ucla.edu/stat/stata/
• http://data.princeton.edu/stata/
• http://dss.princeton.edu/online_help/stats_packages/stata/