CCPR Computing Services Workshop: Introduction to Stata, June 2006
Outline
• Stata
• Command Syntax
• Basic Commands
• Abbreviations
• Missing Values
• Combining Data
• Using do-files
• Basic programming
• Special Topics
• Getting Help
• Updating Stata
Stata Syntax
• Basic command syntax:
    [by varlist:] command [varlist] [= exp] [if exp] [in range] [weighttype=weight] [, options]
• Brackets = optional portions
• Italics = user specified
Stata Syntax, cont.
• Complete syntax:
    [by varlist:] command [varlist] [= exp] [if exp] [in range] [weighttype=weight] [, options]
• Example 1 (webuse union)
• Stata command:
    . bysort black: summarize age if year >= 80, detail
• Result: summarizes age separately for each value of black, including only observations for which year >= 80, and reports extra detail.
Stata Syntax, cont.
• Complete syntax:
    [by varlist:] command [varlist] [= exp] [if exp] [in range] [weighttype=weight] [, options]
• Example 2 (webuse union)
• Stata commands:
    . generate agelt30 = age
    . replace agelt30 = 1 if age < 30
    . replace agelt30 = 0 if age >= 30 & age < .
• Result: variable agelt30 is set equal to 1, 0, or missing
• Generally [= exp] is used with the commands generate and replace
Basic Commands – Load “auto” data and look at some vars
• Load data from Stata’s website
    webuse auto.dta
• Look at dataset
    describe
• Summarize some variables
    codebook make headroom, header
    inspect weight length
Basic Commands – Load “auto” data and look at some vars
• Look at first and last observation
    list make price mpg rep78 if _n==1
    list make price mpg rep78 if _n==_N
• Summarize a variable in a table
    table foreign
    table foreign, c(mean mpg sd mpg)
Keep/Save a Subset of the Data
• “Keep” a subset of the variables in memory
    keep make headroom trunk weight length
• List variables in current dataset
    ds
• List string variables in current dataset
    ds, has(type string)
• Save current dataset
    save tempdata/myauto
Generating New Variables
• Create new variable = headroom squared
    generate headroom2 = headroom^2
• Generate numeric from string variable
    encode make, generate(makeNum)
    list make makeNum in 1/5
• The listing does not show that makeNum is numeric, but the “storage type” column in describe does:
    describe make makeNum
Generating New Variables, cont.
• Create categorical variable from continuous variable
• “price” is integer-valued with minimum 3291 and maximum 15906
• Generate categorical version, Method 1:
    generate priceCat = 0
    replace priceCat = 1 if price < 5000
    replace priceCat = 2 if price >= 5000 & price < 10000
    replace priceCat = 3 if price >= 10000 & price < .
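Generating New Variables, cont.
• Generate categorical version of numerical variable, Method 2:
    generate priceCat2 = price
    recode priceCat2 (min/5000 = 1) (5000/10000=2) (10000/max=3)
• The ranges overlap at 5000 and 10000; recode applies the first rule that matches, so a price of exactly 5000 is coded 1 here but 2 in Method 1
• Compare price, priceCat, and priceCat2
    table price priceCat
    table priceCat priceCat2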
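The two methods can also be checked against each other directly. The following is a minimal sketch, not the workshop's own code: the non-overlapping ranges and the generate() option of recode are choices made for this illustration, and assert is used to confirm that both codings agree on the auto data.

```stata
* Sketch: build both categorical codings and verify that they match.
webuse auto, clear

* Method 1: generate/replace with explicit conditions
generate priceCat = 1 if price < 5000
replace  priceCat = 2 if price >= 5000 & price < 10000
replace  priceCat = 3 if price >= 10000 & price < .

* Method 2: recode into a new variable with non-overlapping ranges
* (price is integer-valued, so 4999/5000 splits cleanly)
recode price (min/4999 = 1) (5000/9999 = 2) (10000/max = 3), generate(priceCat2)

* assert stops with an error if any observation disagrees
assert priceCat == priceCat2
```

If assert returns silently, the two variables are identical in every observation.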
Variable Labels and Value Labels
• Create a description for a variable:
    label variable priceCat "Categorical price"
• Create labels to represent variable values:
    label define priceCatLabels 1 "cheap" 2 "mid-range" 3 "expensive"
    label values priceCat priceCatLabels
• View results:
    describe
    list price priceCat in 1/10
Reshape
• Wide -> Long:
    reshape long uniqueschool author, i(year session order) j(count)
• Long -> Wide:
    reshape wide author, i(year session order) j(count)
[Figure: example data in wide format and in long format]
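Because the dataset behind the commands above is not included, here is a hedged sketch of the same round trip on reshape1, one of Stata's example datasets. The variable names (id, inc80 to inc82, ue80 to ue82) belong to that dataset, not to the workshop data.

```stata
* Sketch: round trip between wide and long format.
webuse reshape1, clear
list                                  // wide: one row per id, one column per year

* i() identifies the unit; j() names the new variable holding the year suffix
reshape long inc ue, i(id) j(year)
list in 1/6                           // long: one row per id-year pair

* the same i() and j() take the data back to wide format
reshape wide inc ue, i(id) j(year)
```

The stub names (inc, ue) are the parts of the variable names that stay fixed; the numeric suffix becomes the values of the j() variable.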
A few other commands
• compress - saves data more efficiently
• sort / gsort
• order
• rename
• more
Abbreviations in Stata
• Abbreviating command, option, and variable names
  • The shortest uniquely identifying name is sufficient
• Example:
  • Assume three variables are in use: make, price, mpg
  • Unabbreviated Stata command:
      . summarize make price
  • Abbreviated Stata command:
      . su ma p
• Exceptions
  • describe (d), list (l), and some others
  • Commands that change/delete
  • Functions implemented by ado-files
Missing Values in Stata 8 and 9
• Stata 8 and later versions
  • 27 representations of numerical “missing”
  • ., .a, .b, … , .z
• Relational comparisons
  • Biggest number < . < .a < .b < … < .z
• Mathematical functions
  • missing + nonmissing = missing
• String missing
  • Empty quote: ""
Missing Values in Stata - Pitfalls
• Pitfall #1
  • Missing values changed after Stata 7
• Pitfall #2
  • Do NOT (missing is larger than any number, so this also sets missing weights to 0):
      . replace weightlt200 = 0 if weight >= 200
  • INSTEAD:
      . replace weightlt200 = 0 if weight >= 200 & weight < .
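A quick way to see Pitfall #2 in practice is to count observations with and without the extra condition. This is a small sketch on the auto data, where rep78 has some missing values; it is an illustration added here, not part of the workshop files.

```stata
* Sketch: missing values compare as larger than any number.
webuse auto, clear

count if rep78 > 4                  // also counts observations where rep78 is missing
count if rep78 > 4 & rep78 < .      // nonmissing values only
count if missing(rep78)             // accounts for the difference between the two counts
```

The first count exceeds the second by exactly the number of missing values reported by the third.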
Combining Data
• Append vs. Merge
  • Append – two datasets with same variables, different observations
  • Merge – two datasets with same or related observations, different variables
• Appending data in Stata
  • Example: append.do
Combining Data – merge and joinby
• Demonstrate with two sample datasets: Neighborhood and County samples
• One-to-one merge
  • onetoone.do
• One-to-many merge – use match merge
  • onetomany.do
• Many-to-many merge – use joinby
  • manytomany.do
Combining Data
• Variable _merge (generated by merge and joinby)
• Pitfalls
  • pitfall_merge1.do: merging unsorted data
  • pitfall_merge2.do: many-to-many using merge instead of joinby
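The workshop's .do files are not reproduced here, so the following is only a sketch of a many-to-one match merge on two tiny made-up datasets (the variables county, pop, and hood are hypothetical). It assumes Stata 11 or later, where the merge type is written as 1:1, m:1, or 1:m and the data need not be sorted beforehand; in the Stata 9 syntax current at the time of the workshop, the equivalent was merge county using filename on data sorted by county.

```stata
* Sketch: many-to-one merge of neighborhoods onto counties (made-up data).
clear
input county pop
1 100
2 250
end
tempfile counties
save `counties'

clear
input hood county
11 1
12 1
21 2
end

* each neighborhood picks up the variables of its county
merge m:1 county using `counties'

* _merge == 3 marks observations found in both datasets
tabulate _merge
list
```

Checking _merge after every merge is a cheap way to catch keys that failed to match.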
Do-files
• What is a do-file?
  • Stata commands can be executed interactively or via a do-file
  • A do-file is a text file containing commands that can be read by Stata
• Running a do-file within Stata
    . do dofilename.do
Do-files
• Why use a do-file?
  • Documentation
  • Communication
  • Reproduce interactive session?
• Interactive vs. do-files
  • Record EVERYTHING to recreate results in your do-file!
Do-files > Header, Version Control
• Header
  • Include in do-files: name, project, project location, date, purpose, inputs, outputs, special instructions
• Version Control
  • Include version at top of do-file
  • Why?
  • Example: under version 7, . == .a == .b == … == .z
Do-files > Comments
• Comments
  • Lines beginning with * will be ignored
  • Words between // and end of line will be ignored
• Spanning commands over two lines:
  • Words between /* and */ will be ignored, including end of line character
  • Words between /// and beginning of next line will be ignored
Do-files > End of Line Character
• Commands requiring multiple lines
  • #delimit ;
    • This command tells Stata to read semicolons as the end-of-line character instead of the carriage return
  • Comment out the carriage return with /* at the end of the line and */ at the beginning of the next
  • Comment out the carriage return with ///
Do-files > Examples
    webuse auto, clear
    *this is a comment
    #delimit ;
    summarize price mpg rep78
        headroom trunk weight;
    #delimit cr
    summarize price mpg rep78 headroom trunk weight //this is a comment
    summarize price mpg rep78 ///
        headroom trunk weight
    summarize price mpg rep78 /*
        */ headroom trunk weight
Saving Output
• Work in do-files and log your sessions!
  • log using filename
    • options: replace, append
  • log close
• Output choices:
  • *.log file - ASCII file
  • *.smcl file - nicer format for viewing and printing in Stata
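Putting the header, version, and logging advice together, a do-file might be laid out roughly as follows. This is a sketch under stated assumptions: the header fields follow the list given earlier, while analysis.log and the commands in the middle are placeholders invented for illustration.

```stata
* Name:     analysis.do (hypothetical)
* Purpose:  sketch of a logged, versioned do-file
* Inputs:   auto.dta (Stata example data)
* Outputs:  analysis.log

version 9                          // version of Stata the do-file was written for
capture log close                  // close any log left open; ignore the error if there is none
log using analysis.log, replace    // the .log extension requests a plain-text log

webuse auto, clear
summarize price mpg

log close
```

Using capture log close before log using lets the do-file be rerun after an earlier run stopped with an error and left its log open.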
Saving Output, cont.
• Graphs are not saved in log files
• Use the “saving” option of graph commands
  • saving(graph.ext)
• Export current graph:
  • graph export graph.ext
  • Ex: graph export graph.eps
• Supported formats:
  • .ps, .eps, .wmf, .emf, .pict
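As a brief sketch of both routes, the commands below draw a scatterplot from the auto data, save it in Stata's own graph format, and then export it. The file name mpg_weight is a placeholder chosen for this example.

```stata
* Sketch: save a graph in Stata format and export it as EPS.
webuse auto, clear
scatter mpg weight, saving(mpg_weight, replace)   // writes mpg_weight.gph
graph export mpg_weight.eps, replace              // exports the current graph
```

The extension given to graph export determines the output format.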
Example using local macro
    . local mypath "C:\Documents and Settings\MyStata"
    . display `mypath'
    C:\Documents invalid name
    r(198);
    . display C:\Documents and Settings\MyStata
    C:\Documents invalid name
    r(198);
    . display "`mypath'"
    C:\Documents and Settings\MyStata
Example – foreach, return, display
    *see samplePrograms.do, runLoop
    foreach var of varlist tenure-ln_wage {
        quietly summarize `var'
        local varmean = r(mean)
        display "Variable `var' has mean `varmean'"
    }
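The r(mean) used in the loop is one of several results that summarize leaves behind. As a hedged sketch, return list shows everything currently stored in r(). The nlswork example dataset is assumed here only because it contains tenure and ln_wage; the data used in samplePrograms.do is not shown in these slides.

```stata
* Sketch: inspect the results that summarize stores in r().
webuse nlswork, clear
quietly summarize tenure
return list                                  // lists r(N), r(mean), r(sd), and others
display "tenure has mean " %6.2f r(mean)     // a display format controls the rounding
```

Results in r() are replaced by the next command that stores results, which is why the loop copies r(mean) into a local macro straight away.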
Example using forvalues, display
    *see samplePrograms.do, runCount
    forvalues counter = 1/10 {
        display `counter'
    }
    forvalues counter = 0(2)10 {
        display `counter'
    }
Example: forvalues, generating random variables
    *see samplePrograms.do, runRandomGen
    forvalues j = 1/3 {
        generate x`j' = uniform()
        generate y`j' = invnormal(uniform())
    }
    foreach x of varlist x1-x3 y1-y3 {
        summarize `x'
    }
Example – if/else
    *see samplePrograms.do, runIfElse
    foreach var of varlist tenure-ln_wage {
        quietly summarize `var'
        local varmean = r(mean)
        if `varmean' > 10 {
            display "`var' has mean greater than 10"
        }
        else {
            display "`var' has mean less than or equal to 10"
        }
    }
Special Topic: regular expressions
• webuse auto
• List all values of make starting with a capital and containing an additional capital:
    list make if regexm(make, "^[A-Z].+[A-Z].+")
• AND ending in a number:
    list make if regexm(make, "^[A-Z].+[A-Z].+[0-9]+$")
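Beyond testing for a match, the matched text can be captured. The sketch below goes one step past the slide: it uses regexs(), which returns a parenthesized subexpression from the preceding regexm() call, to pull out the trailing number of make. The variable name modelnum is made up for this example.

```stata
* Sketch: extract the trailing digits of make into a new string variable.
webuse auto, clear

* regexs(1) returns the text matched by the first (...) group in regexm()
generate modelnum = regexs(1) if regexm(make, "([0-9]+)$")

list make modelnum if modelnum != ""
```

Observations whose make does not end in a digit are left with an empty string.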
Special Topic: accessing data in another database
    odbc list
    odbc query testStata
    odbc desc "Summary2006$"
    odbc load year type session order author1 author2, table("Summary2006$") dsn("testStata")
Special Topic: Exporting results using outreg
• User-written program called outreg
• From within Stata, type findit outreg
• Very simple!!
• Basically add one line of code after each regression to export results
• For an example of code, see http://www.ats.ucla.edu/stat/stata/faq/outreg.htm
Getting Help in Stata
• help command_name
  • abbreviated version of manual
• search
  • search keywords, local
  • search keywords, net
  • search keywords, all
• findit keywords
  • same as search keywords, all
• Search Stata Listserver and Stata FAQ
Stata Resources
• www.stata.com > Resources and Support
  • Search Stata Listserver
  • Search Stata (FAQ)
• Stata Journal (SJ)
  • articles for subscribers
  • programs free
• Stata Technical Bulletin (STB)
  • replaced with the Stata Journal
  • articles available for purchase, programs free
• Courses (for fee)
Updating Stata
• help update
• update all