
Research Methods Lecture 3 More STATA

Ian Walker, Room S2.109, i.walker@warwick.ac.uk, 02475 23054. Slides available at: http://www2.warwick.ac.uk/fac/soc/economics/pg/modules/rm/notes/iw_lectures/

Presentation Transcript


  1. Research Methods, Lecture 3: More STATA
     Ian Walker, Room S2.109
     i.walker@warwick.ac.uk, 02475 23054
     Slides available at: http://www2.warwick.ac.uk/fac/soc/economics/pg/modules/rm/notes/iw_lectures/

  2. Housekeeping announcement

  3. Stat-Transfer
     • Use Stat-Transfer to convert data between formats.
     • Stat-Transfer is “point and click”:
       • Click on the Stat-Transfer icon.
       • Tell it the input file name and format,
       • and the format you want the output in.
       • Click “Transfer”.

  4. Stat-Transfer options
     • Useful options for creating a manageable dataset from a large one:
       • Keep or drop variables
       • Change variable format
         • E.g. float to integer
       • Select observations
         • E.g. “where (income + benefits)/famsize < 4500”
     • Can be used for reading a large STATA dataset and writing a smaller one
       • Avoids doing this in STATA itself

  5. Customising STATA
     • profile.do runs automatically when STATA starts
       • Edit it to include commands you want to invoke every time:
         . set mem 200m
         . log using justincase.log, replace
     • Define preferences for STATA’s look and feel
       • Click on Prefs in the menu
       • Colours, graph scheme, etc.
       • Save window positioning

  6. Merging data - 1
     • file1 has id x1 x2 x3; file2 has id x4 x5 x6.
     • You can merge using a “key” variable that is in BOTH files (id)
     • But you need to sort both files by the key first.
       . use file1
       . sort id                /* sorts file1 according to id variable */
       . save, replace
       . use file2
       . sort id                /* sorts file2 according to id variable */
       . merge id using file1   /* match-merges on id; without id, rows are paired by position */
       . drop if _merge~=3      /* drops obs not found in both files */
       . save file3

  7. Merging data - 2
     • For each row (id), all vars in file1 are added to the corresponding row of file2 (if there is one).
     • merge creates a new variable, _merge
       • which =1 for obs only in the file in memory (file2), =2 for obs only in the using file (file1), and =3 for obs in both.
     • So the syntax above drops those obs that don’t have data in both files
       • and saves the result, containing id and x1-x6, in file3
     • Use append to add more obs on the same vars.
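The merge steps can also be tried end to end on toy data. The sketch below is illustrative only: the data values are made up, and it assumes Stata 11 or later, where the key is given as merge 1:1 id and no prior sorting is needed. Run it as a do-file, since it relies on a temporary file held in a local macro.

```stata
* Toy file1: id x1 x2 x3 (values invented for illustration)
clear
input id x1 x2 x3
1 10 11 12
2 20 21 22
3 30 31 32
end
tempfile file1
save `file1'

* Toy file2: id x4 x5 x6, left in memory as the master data
clear
input id x4 x5 x6
2 40 41 42
3 50 51 52
4 60 61 62
end

* Match-merge on the key; file1 is the "using" file
merge 1:1 id using `file1'
tabulate _merge          // 1 = only in memory, 2 = only in using, 3 = both

* Keep only ids present in both files (here ids 2 and 3)
keep if _merge == 3
drop _merge
list
```

Tabulating _merge before dropping anything is a cheap check that the unmatched observations are the ones you expect.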

  8. Collapsing data (use with care)
     • collapse replaces the data in memory with a dataset of means (or sums, medians, etc.)
     • This is useful when you want to provide summary info at a higher level of aggregation
     • For example, suppose a dataset contains data on individuals, say their region and whether they are unemployed (u/e)
     • To find the average u/e rate in each region:
       . collapse unemp, by(region)
       leaves 1 obs for each region, holding the mean u/e rate.
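Because collapse throws away the individual-level data in memory, one common safeguard is to wrap it in preserve and restore. A minimal sketch, assuming the auto.dta example dataset that ships with STATA (the variables price and foreign belong to that dataset, not to this lecture's data):

```stata
sysuse auto, clear

preserve                      // take a snapshot of the full dataset
collapse (mean) mean_price = price (count) n_cars = price, by(foreign)
list                          // one row per value of foreign
restore                       // bring back the original car-level data

describe, short               // back to the original observations
```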

  9. Reshaping files
     • Data may be “long” but thin
       • E.g. each record is a household member
       • But there are few vars, say wage and hours
     • Data may be “wide” but short
       • each record is a household and has lots of vars
       • (e.g. w1 w2 w3 hours1 hours2 hours3)
       . reshape long inc ue, i(id) j(year)   /* wide to long */
       . reshape wide inc ue, i(id) j(year)   /* back to wide */
     • Handy for merging data together and for panel data
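To see what the two reshape commands do, here is a small sketch on invented data. It assumes the wide variables are named with a common stub plus the year (inc2001, ue2001, ...), which is the naming pattern reshape expects.

```stata
* Wide data: one row per id, one column per variable-year (values invented)
clear
input id inc2001 inc2002 ue2001 ue2002
1 100 110 0 0
2 200 190 0 1
end

* Wide to long: one row per id-year, with variables inc, ue and year
reshape long inc ue, i(id) j(year)
list, sepby(id)

* Back to wide: one row per id again
reshape wide inc ue, i(id) j(year)
list
```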

  10. Syntax to remember
      >=   means “greater than or equal to”
      &    means “and”
      |    means “or” (& is evaluated before |)
      =    means “set equal to”
      ==   means “is it equal to?”
      ~=   means “not equal to” (or use != )
      .    means missing value
      For example
      . keep if x1>=1 & x1<=3 | x1==7 & x1~=.
      . gen x = log(y)
      . reg y x if z == 1 & y != .
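Two details of these operators are easy to trip over: & is evaluated before |, and a missing value counts as larger than any number. The sketch below illustrates both using rep78 from the auto.dta example dataset (an assumption made for illustration; rep78 has some missing values).

```stata
sysuse auto, clear

* Brackets make the intended grouping explicit
count if (rep78 >= 1 & rep78 <= 3) | rep78 == 5

* Missing is treated as larger than any number,
* so this count includes cars with missing rep78 ...
count if rep78 > 4

* ... and this one does not
count if rep78 > 4 & rep78 != .

* Number of missing values, for reference
count if missing(rep78)
```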

  11. Using STATA as a calculator
      • display command (can be shortened to dis, disp or di)
        . dis 22/7
        . disp log(250)
        . di exp(3.6)
        . di chiprob(2,6.45)   /* i.e. 2 df, deviance 6.45 */
      • returns 0.0398 (i.e. it’s significant at the 5% level)
        . display _N
      • Returns the sample size
        • (_N is the number of the last obs)
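Later versions of STATA document chi2tail() for the upper-tail chi-squared probability, so the following sketch uses that function as an assumed stand-in for chiprob(). With 2 degrees of freedom the upper tail has the closed form exp(-x/2), which gives a quick cross-check.

```stata
* Upper-tail probability of a chi-squared(2) at 6.45
display chi2tail(2, 6.45)

* Closed form for 2 df: should print the same number
display exp(-6.45/2)

* Critical value at the 5% level for 2 df, for comparison with 6.45
display invchi2tail(2, 0.05)
```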

  12. Using the data editor
      • Open a datafile (e.g. auto.dta)
      • Click on the Data Editor icon
        • Or type . edit
      • You can edit datapoints!
      • Or just browse the datafile (type . browse)

  13. STATA’s editor
      • STATA has an editor that allows you to create do files
      • Enter commands, one per line
      • Save the commands in a “do” file
      • Highlight commands and click the button showing a page (or page with text) and a down arrow to “run” (or “do”) them.

  14. Saving output
      • Output scrolls past in the Results window, so it is best to open a “log file” (and close it when you finish).
      • Click on File, Log, Begin.
      • Or type
        . log using myoutput
        then type some commands, and then
        . log close
      • The log command allows the replace and append options
      • The default file extension is .smcl (use “view” to read it)
      • A log doesn’t save graphs
        • Copy graphs (use cut and paste or use the menus)

  15. Saving commands
      • You might prefer your own extension, say, .log
        • then you get an ASCII file that any editor can open
        • you can translate files to and from smcl format
        • click on File, Log, Translate and fill in the dialog box
      • Logging your output is a good way of developing a .do file
        • since it saves the commands as well as the output
      • Or you can just log the commands
        • type . cmdlog using xxx
      • You can turn logging off and back on
        • . log off then . log on when ready to resume
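Put together, a logged session might look like the sketch below. The file name myoutput.log and the commands being logged are placeholders, and auto.dta is assumed only as convenient example data. Run it as a do-file so the // comments are accepted.

```stata
capture log close                      // close any log left open; no error if none is
log using myoutput.log, text replace   // plain-text log, overwritten on each run

sysuse auto, clear
summarize price mpg

log off                                // pause logging
describe                               // this output is not recorded
log on                                 // resume logging

regress price mpg
log close
```

Starting with capture log close avoids the "log file already open" error when a previous run stopped before reaching log close.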

  16. Useful tips for .do files
      . #delimit ;                     /* makes ; the end-of-line character */
      . set more off ;
      . set mem 200m ;                 /* must be set before any data are loaded */
      . set matsize 200 ;
      . use mydata, clear ;
      . log using xxxx.log, replace ;
      . ...
      . log close ;
      . exit, clear ;
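The point of #delimit ; is that a command can then run over several lines. A short sketch of that, next to the /// continuation marker that does the same job without changing the delimiter (the regression itself is only a placeholder on the auto.dta example data; both forms work in do-files, not at the command prompt):

```stata
sysuse auto, clear

* Option 1: switch the delimiter, so a command ends only at ;
#delimit ;
regress price mpg weight
    length foreign ;
#delimit cr

* Option 2: keep the normal delimiter and continue lines with ///
regress price mpg weight ///
    length foreign
```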

  17. Handling string variables (encode)
      • Use encode when the original var is a character (string) var (e.g. gender is "m" or "f")
      • encode does not produce dummy variables; it just assigns a number to each group defined by the character variable.
      • In this example, gender is the original character var and sex is the new numeric var:
        . encode gender, gen(sex)
      • decode does the opposite
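A small sketch of encode and decode on invented data. Codes are assigned in alphabetical order of the strings, so here "f" becomes 1 and "m" becomes 2; the variable names gender2 and female are illustrative.

```stata
clear
input str1 gender
"m"
"f"
"f"
end

encode gender, gen(sex)        // numeric sex with value labels f and m
label list sex                 // shows the code given to each string
tabulate sex, nolabel          // the underlying numbers

decode sex, gen(gender2)       // back to a string variable

* If a 0/1 dummy is what you need, generate it directly
generate female = (gender == "f")
list
```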

  18. Extended generate (egen)
      • Useful when you need a new variable that is the mean, median, etc. of another variable
        • for all observations or for groups of observations.
      • Also useful when you simply need to number groups of observations based on some classification variables.
      • Great when you have panel data

  19. egen examples
      . egen sumvar1 = sum(var1)
        creates sumvar1 as the sum of all values of var1
      . egen meanvar1 = mean(var1), by(var3)
        creates meanvar1 as the mean of var1 within each group of var3
      . egen counter = count(id), by(company)
        creates counter as the number of obs with nonmissing id in each company
      . egen groupid = group(month year)
        assigns a number to each month/year group
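The same four patterns can be run on the auto.dta example dataset, as sketched below. The variable names are illustrative, and total() is used because later versions of STATA document it as the egen function for sums.

```stata
sysuse auto, clear

egen total_price = total(price)              // same value on every row
egen mean_price  = mean(price), by(foreign)  // group mean, repeated within group
egen n_rep       = count(rep78), by(foreign) // nonmissing rep78 per group
egen grp         = group(foreign rep78)      // 1, 2, 3, ... for each combination

list foreign rep78 price mean_price n_rep grp in 1/5
```

Unlike collapse, egen keeps every observation and simply attaches the summary to each row.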

  20. Saving typing - 1: macros
      • For defining lists of vars (globally or locally).
        . local macvar x1 x2 x3 x4 x5 x6
        . reg y1 `macvar'
        . reg y2 `macvar'
        . reg y3 `macvar'
        . reg y4 `macvar'
      • `macvar' expands to the string "x1 x2 x3 x4 x5 x6"
      • Pay careful attention to the two different quotation marks: a left quote (`) opens and an apostrophe (') closes
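A runnable sketch of a local and a global macro holding the same variable list, using auto.dta as stand-in data (the macro names rhs and RHS are arbitrary). Run it in one go as a do-file: a local macro disappears when the do-file, or the highlighted selection, finishes.

```stata
sysuse auto, clear

* Local macro: referenced with a left quote and an apostrophe
local rhs weight length foreign
regress price `rhs'
regress mpg   `rhs'

* Global macro: referenced with a dollar sign, survives across do-files
global RHS weight length foreign
regress price $RHS

* Show what the macros expand to
display "`rhs'"
display "$RHS"
```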

  21. Saving typing - 2: for
      • Performs the same command on several vars.
      • It can use several types of variable lists
        . for var1-var25: replace @=. if @==99
          replaces 99 by missing in var1 to var25
        . for var*: replace @=. if @==99
          replaces 99 by missing in every var whose name starts with var
        . for 1-3, ltype(numeric): gen q@=0
          creates q1=0, q2=0, q3=0
        . for a b c, ltype(any): gen str2 @="x"
          creates a="x", b="x", c="x"
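In current versions of STATA the same loops are usually written with foreach and forvalues rather than for. The sketch below is one way to express the four examples; the data are invented, and var1 to var3 stand in for var1 to var25.

```stata
clear
set obs 5

* Numeric list: creates q1=0, q2=0, q3=0
forvalues i = 1/3 {
    generate q`i' = 0
}

* Invented variables holding 99 as a missing-value code in some rows
forvalues i = 1/3 {
    generate var`i' = cond(_n == `i', 99, _n)
}

* Variable list: replace 99 by missing in var1 to var3
foreach v of varlist var1-var3 {
    replace `v' = . if `v' == 99
}

* Arbitrary list of names: creates a="x", b="x", c="x"
foreach name in a b c {
    generate str2 `name' = "x"
}

list
```

For the specific job of recoding a numeric code to missing, mvdecode var1-var3, mv(99) does the same thing in one line.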

  22. Regression models - I
      • Linear regression and related models, when the outcome variable is continuous
        • OLS, 2SLS, 3SLS, IV, quantile reg, Box-Cox …
      • Binary outcome data
        • the outcome variable is 0 or 1 (or y/n)
        • probit, logit, nested logit ...
      • Multiple outcome data
        • the outcome variable is 1, 2, ...
        • conditional logit, ordered probit

  23. Regression models - II
      • Count data
        • the outcome variable is 0, 1, 2, ... occurrences
        • Poisson regression, negative binomial
      • Choice models
        • multinomial choice
        • A, B or C
        • Multinomial logit, random utility model, unordered probit, nested logit, etc.
      • Selection models
        • Truncated, censored
        • Tobit, Heckman selection models
        • linear regression or probit with selection

  24. Regression models - III
      • STATA supports several special data types.
      • Once the type is defined, special commands work
      • Time series
        • Estimate ARIMA and ARCH models
        • Estimators for autocorrelation and heteroscedasticity
        • Estimate MA and other smoothers
        • Tests for autocorrelation, heteroscedasticity and unit roots: h, d, LM, Q, ADF, P-P …
        • TS graphs

  25. Special data types: survey
      • With non-random sampling, unweighted OLS is unreliable: estimates can be biased and standard errors wrong
      • STATA can handle non-random survey data
      • see the “svy***” commands
      • Example (stratified sample of medical cases):
        . webuse nhanes2f, clear
        . svyset psuid [pweight=finalwgt], strata(stratid)
        . svy: reg zinc age age2 weight female black orace rural
        . reg zinc age age2 weight female black orace rural   /* unweighted OLS, for comparison */

  26. Special data types: duration
      • Survival time data
      • See the “st***” commands
        . stset failtime   /* sets the var that defines duration */
      • Estimates a wide variety of models to explain duration

  27. ST regression
      • Supports Weibull, Cox PH and other options
        . streg load bearings, distribution(weibull)
      • After streg you can plot the estimated cumulative hazard with
        . stcurve, cumhaz
      • STATA allows functions to be plotted by specifying the function, e.g. the Weibull “hazard” model (Weibull example ….)
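The duration commands above can be strung together as in the sketch below. It assumes the variables failtime, load and bearings come from the kva example dataset in STATA's survival-analysis documentation, which webuse downloads (so internet access is needed); the slides do not name the dataset, so treat that as a guess.

```stata
webuse kva, clear                 // assumed source of failtime, load, bearings

stset failtime                    // declare the duration variable
stdescribe                        // summary of the survival-time setup

* Parametric Weibull model, then plots based on the fitted model
streg load bearings, distribution(weibull)
stcurve, hazard                   // estimated hazard function
stcurve, cumhaz                   // estimated cumulative hazard

* Cox proportional hazards model on the same covariates
stcox load bearings
```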

  28. Special data types: panel data
      • STATA can handle “panel” data easily
      • see the “xt***” commands
      • Common commands are
        . xtdes     Describe pattern of xt data
        . xtsum     Summarize xt data
        . xttab     Tabulate xt data
        . xtline    Line plots with xt data
        . xtreg     Fixed and random effects

  29. Panel data
      • An xt dataset looks like this:

        pid   yr_visit  fev   age  sex  height  smokes
        ----------------------------------------------
        1071  1991      1.21  25   1    69      0
        1071  1992      1.52  26   1    69      0
        1071  1993      1.32  28   1    68      0
        1072  1991      1.33  18   1    71      1
        1072  1992      1.18  20   1    71      1
        1072  1993      1.19  21   1    71      0

      • xt*** commands need variables that identify the person and the “wave”:
        . iis pid
        . tis yr_visit
      • Or use the tsset command
        . tsset pid yr_visit, yearly

  30. Panel regression
      • Once STATA has been told how to read the data it can perform regressions quite quickly:
        . xtreg y x, fe   /* fixed effects */
        . xtreg y x, re   /* random effects */
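As a worked illustration, the six rows from the panel example can be typed in and declared as a panel. The sketch assumes a recent version of STATA, where xtset does the job of iis, tis and tsset in a single step; with only two people the regression is purely a syntax demonstration.

```stata
clear
input pid yr_visit fev age sex height smokes
1071 1991 1.21 25 1 69 0
1071 1992 1.52 26 1 69 0
1071 1993 1.32 28 1 68 0
1072 1991 1.33 18 1 71 1
1072 1992 1.18 20 1 71 1
1072 1993 1.19 21 1 71 0
end

xtset pid yr_visit, yearly     // person identifier, then wave
xtdescribe                     // pattern of participation
xtsum fev age                  // between and within variation

xtreg fev age, fe              // fixed effects
```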

  31. Further advice
      • See Stephen Jenkins’ excellent course on duration modelling in STATA
      • Steve Pudney’s excellent panel course
        • Beware: his example dataset is 30MB+
      • To get up and running
        • Just have a go; you won’t break it!
        • Try some of the commands in this lecture
      • To start to get proficient
        • Sign up for netcourses
