Research Methods

Research Methods Lecture 2 The dummies’ guide to STATA Wiji Arulampalam 18/10/2006

Econometrics Software
• You can use any software that does what you need
• See Timberlake for details of what does what well [www.timberlake.co.uk]
• PC Give is hard to beat for time series analysis
• Microfit and EViews are good alternatives
• STATA does (just about) everything
• STATA (and everything else) is available as a delivered application on the network

WHY STATA
• You need to know how to use STATA for
    (i) Econometrics A [next term]
    (ii) Econometrics B [this term]
    (iii) Panel Data Econometrics [next term]
• An EViews demo will be given by the Econometrics tutors!
• These two packages (STATA and EViews) should be sufficient

STATA
• Hopefully you will have access by next week
• So there will be a full demo next week
• The Stata command file wages.do and data file wages.dta are on the module web page for you to practise with

STATA
• Use STATA for
    • large survey datasets (and merging them)
    • complex nonlinear models (e.g. LDVs)
        • but see also LimDep
    • nonparametric and evaluation methods
• Use STATA if
    • you want to
        • continue studying economics
        • be a professional economist
        • learn something new
    • you hate PC Give

Some useful websites
• Stata’s own resources for learning STATA
    • Stata website, Stata journal, Stata library, Statalist archive
    • http://www.stata.com/links/resources1.html
• Michigan’s web-based guide to STATA (for SA)
• UCLA resources to help you learn and use STATA
    • http://www.ats.ucla.edu/stat/stata
    • including movies and “web-books”

Accessing STATA
• Available from your ‘Delivered Applications’
• Double click on the icon!

Buttons/Menu

Enter commands here

OR use the do editor to create a .do file

Results window Better to save the output – more later

Click for extensive help, OR type help in the command line

Type help xxx in the command line for help on the command xxx

Exit, clear

Click and point in v9 Menu/tabs Exit, clear

Important features (1)
• NOTE
    • STATA is case sensitive, so always use lowercase
    • Otherwise you can get very confused
• More
    • --more-- in your output window means there is more output to come [press the spacebar and the next page appears]
    • The command set more off turns this off
• Not enough memory [so reset!]
    . set mem XXXm (allocates XXX MB of memory for data)
    . set matsize XXX (maximum matrix size XXX square)

Important features (2)
• To Break
    • To stop anything, hit “break” (the menu button with the red cross, or press Ctrl and Break together in Windows; Ctrl and C applies to console STATA on Unix)

Using data on disk (1)
• Opening a dataset
    • datasets need to be rectangular [variables in columns; observations in rows]
    • Stata datasets have a .dta extension
    • STATA will read Excel (csv) or text files
    • Otherwise use Stat/Transfer to convert files in other formats to Stata files

Using data on disk (2)
• There are several ways of getting data into STATA, e.g. for wages.dta:
    . use wages (or click File/Open on the menu bar)
    . use lwage ed exp if fem==1 in 1/1000 using wages (loads only these variables and observations; note the using)
    . insheet using wages.csv (or .txt) (imports a csv file saved from Excel, or a “text” file)
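The subset form of use is easy to get wrong, so here is a minimal sketch to try it on. It assumes the auto dataset that ships with STATA rather than wages.dta, and myauto is a hypothetical file name written to the working directory.

```stata
* Sketch only: auto.dta ships with Stata; myauto is a made-up file name
sysuse auto, clear
save myauto, replace

* Load three variables, restricted to cars with mpg above 20
* among the first 50 observations of the file
use price mpg foreign if mpg>20 in 1/50 using myauto, clear

* Check which variables and how many observations were loaded
describe
```

The variable list, if and in all come before using; without using, STATA treats the first word as the file name.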

Opens the file List of variables

Basic data reporting (1)
• . describe (or press the F3 key)
    • Lists the variable names and labels
• . describe using wages
    • Lists the variable names etc. WITHOUT loading the data into memory (useful if the data is too big to fit)
• . codebook
    • Tells you about the means, labels, missing values etc.

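Basic data reporting (2)
• sort and count
• . sort personid
    • sorts the data by personid
• . count if personid~=personid[_n-1]
    • counts how many unique personids there are: after sorting, each observation that starts a new personid is counted once (~= is “not equal to”)
    • using == instead counts the repeat observations, i.e. those with the same personid as the observation before
    • [_n-1] refers to the previous observation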
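A minimal sketch of the sort-then-count idea, assuming the auto dataset that ships with STATA, with foreign standing in for personid. After sorting, comparing each observation with the one before it separates repeats from new values.

```stata
* Sketch only: foreign (0/1) plays the role of personid
sysuse auto, clear
sort foreign

* Observations with the same value as the previous observation (repeats)
count if foreign==foreign[_n-1]

* Observations that start a new value: the number of distinct values
* (foreign takes two values, so this reports 2)
count if foreign~=foreign[_n-1]
```

The first observation is compared with a non-existent observation 0, which STATA treats as missing, so it always counts as the start of a new value.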

First look at the data (1)
• . list lwage ed exp in 1/10 if fem>=0
    • Lists lwage, ed and exp for those of the first 10 observations with fem≥0
• . tab fem union (or tabulate) [variables should be integers]
    • gives a crosstab of fem vs union

First look at the data (2)
• . summ fem union (or summarize or sum)
    • means, std devs etc. for fem and union
• . corr ed exp in 1/100 if fem<1 (add ,cov for covariances)
    • correlation coeffs (or covariances) for the selected data
• . pwcorr ed exp lwage
    • does all pairwise correlation coeffs
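The same first-look commands can be practised before wages.dta is available. This is a sketch on the auto dataset that ships with STATA; the variable names below belong to that dataset, not to the wages data.

```stata
* Sketch only: auto.dta stands in for wages.dta
sysuse auto, clear

* Among the first 10 observations, list those that are domestic cars
list make price mpg in 1/10 if foreign==0

* Means, standard deviations, minima and maxima
summ price mpg

* Correlations on a subsample, then covariances on the full sample
corr price mpg weight if foreign==0
corr price mpg weight, cov

* Pairwise correlations, each using all observations available for that pair
pwcorr price mpg rep78, obs
```

corr drops any observation with a missing value in any listed variable, whereas pwcorr works pair by pair, which matters here because rep78 has missing values.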

Tabulating (1)
• . tab x1 x2 if x4==0, sum(x3)
    • gives the means of x3 for each cell of the x1 vs x2 crosstabulation, for observations where x4=0
• . tab x1 x2, missing
    • Includes the missing values
• . tab x1 x2, nolabel
    • Uses numeric codes instead of labels
    • E.g. “1” instead of “NorthWest” etc.

Tabulating (2)
• . tab x1 x2, col
    • Gives % of column as well as the count
    • Can get row percentages by using row instead
    • Or both by using row col
• . table educ ethnic, c(mean wage) row col
    • Customises the table so it includes the mean (or median or max or count or sd ….) of wage by cells
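A short sketch of the tabulate options from the two slides above, assuming the auto dataset that ships with STATA (rep78 and foreign play the roles of x1 and x2, price the role of x3).

```stata
* Sketch only: auto.dta variables stand in for x1, x2 and x3
sysuse auto, clear

* Plain crosstab, then the same including missing values of rep78
tab rep78 foreign
tab rep78 foreign, missing

* Numeric codes instead of value labels
tab rep78 foreign, nolabel

* Column percentages, then row and column percentages together
tab rep78 foreign, col
tab rep78 foreign, row col

* Mean of price within each cell, restricted to cars with mpg above 15
tab rep78 foreign if mpg>15, sum(price)
```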

Labelling
• Always have your data comprehensively labelled
    . label data "This is pooled GHS 90-99"
    . label variable region "region"
    . lab define reglab 0 "North" 1 "South" 2 "Middle"
    . lab values region reglab
• Tedious to do for lots of variables
    • but then your output will be intelligibly labelled
    • other people will be able to understand it in future
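The three kinds of label (dataset, variable, value) can be seen together in a small sketch. It assumes the auto dataset that ships with STATA; heavy and heavylab are hypothetical names created only for this illustration.

```stata
* Sketch only: heavy and heavylab are made-up names
sysuse auto, clear
label data "1978 automobile data, relabelled for practice"
label variable mpg "Mileage (miles per gallon)"

* Build a 0/1 variable, then attach a variable label and value labels
gen byte heavy = (weight > 3000)
label variable heavy "Weight above 3,000 lbs"
label define heavylab 0 "Light" 1 "Heavy"
label values heavy heavylab

* With value labels, then with the underlying numeric codes
tab heavy
tab heavy, nolabel
```

Note that label define only creates the set of labels; nothing changes in the output until label values attaches it to a variable.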

Data manipulation (1)
• Data can be renamed, recoded, and transformed
• Command generate (or gen for short)
    . gen logrw=log((earn/hours)/rpi)
    . gen agesq=age^2 (squares)
    . gen region1=(region==1) (1 if true, 0 if not)
    . gen ylagged=y[_n-1] (_n is the obs # in STATA)

Data manipulation (2)
• Command recode
    . recode x1 (.=0) (1/5=1) (. is the missing value (mv); each rule goes in its own brackets, with no commas)
• Command replace
    . replace rate=rate/100
    . replace age=25 if age==250
• Command egen
    . egen meaninc=mean(income), by(region) (see help egen for details)
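The generate, recode, replace and egen commands from the two slides above can be tried in one pass. This is a sketch on the auto dataset that ships with STATA; all new variable names (lprice, mpgsq, isforeign, prevprice, repgrp, meanprice) are hypothetical.

```stata
* Sketch only: new variable names are made up for the illustration
sysuse auto, clear

* generate: transformations, a 0/1 dummy, and the previous observation's value
gen lprice = log(price)
gen mpgsq = mpg^2
gen byte isforeign = (foreign==1)
gen prevprice = price[_n-1]

* recode into a new variable, leaving rep78 itself unchanged
recode rep78 (.=0) (1/3=1) (4/5=2), gen(repgrp)

* replace: overwrite an existing variable (here rescaled to thousands of lbs)
replace weight = weight/1000

* egen: group mean of price repeated on every observation in the group
egen meanprice = mean(price), by(foreign)

summ lprice mpgsq repgrp meanprice
```

Using the gen() option on recode is a design choice: it keeps the original variable available in case the recoding rules turn out to be wrong.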

Data selection (1)
• You can also organise your dataset with various commands
    . keep if _n<=1000 (_n is the observation number)
    . drop region
    . drop if ethnic~=1
• These keep only the first 1000 observations, drop the variable region, and drop all the observations where the variable ethnic≠1 (~= is “not equal to”)

Data selection (2)
• Then save the smaller file for subsequent analysis
    . save newfile
    . save, replace (take care: it overwrites the existing file)
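A minimal sketch of selecting observations and variables and then saving the result, assuming the auto dataset that ships with STATA; autosmall is a hypothetical file name.

```stata
* Sketch only: autosmall is a made-up file name
sysuse auto, clear

* Keep the first 50 observations, drop two variables,
* then drop every observation that is not a domestic car
keep if _n<=50
drop headroom trunk
drop if foreign~=0

* How many observations are left?
count

* Save under a new name so the original data are untouched
save autosmall, replace
```

The order of the commands matters: _n is evaluated when keep runs, so "the first 50" refers to the data as they were at that point.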

Functions
• Lots of functions are possible
• See . help functions
• Obvious ones like
    • log(), abs(), int(), round(), sqrt(), min(), max(), sum()
• And many very specialised ones
    • Statistical functions
        • distributions
    • String functions
        • Converting strings to numbers and vice versa
    • Date functions
        • Converting dates to numbers and vice versa
    • And lots more

Command files
• Stata command files have a .do extension
• It is ALWAYS good practice to use a .do file
    • you will know exactly what you have done
    • it makes it easy to develop ideas
    • and to correct mistakes
• . do wages.do, nostop
    • (echoes to screen, and keeps going after an error is encountered)
• Or . run wages.do (executes “silently”)

Keeping track of output (1)
• You can scroll back your screen (up to a point)
• But it is better to open a log file at the beginning of your session, and close it at the end
• Click on File, Log, Begin, or type
    . log using myoutput
    . Commands……………………
    . log close
• [The log using command allows the replace and append options.]

Keeping track of output (2)
• The default is a .smcl file extension (a format that STATA can read)
• A .log extension gives an ASCII file that anything can edit
• ALWAYS LOG your output
• A log is a good way of developing a .do file, since it saves the commands as well as the output
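Putting the command-file and logging advice together, a do-file might be laid out as follows. This is a sketch only: practice.do and practice.log are hypothetical file names, and the auto dataset that ships with STATA stands in for wages.dta.

```stata
* practice.do (made-up name): a do-file that logs its own output

* Close any log left open by an earlier run; capture hides the error if none is open
capture log close

* Do not pause at --more--
set more off

* The .log extension gives a plain text log; replace overwrites an old one
log using practice.log, replace

sysuse auto, clear
describe
summ price mpg

log close
```

Running it with . do practice.do echoes commands and output to the screen and writes both to practice.log.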

Next Lecture Monday 23rd October F107 11:00-12:00 STATA demo
