1 / 60

STATA: An Introduction Into the Basics

STATA: An Introduction Into the Basics. Prof. Dr. Herbert Brücker University of Bamberg Seminar “Migration and the Labour Market” May 31, 2012. Repetition of last meeting (I/III) 1 The Borjas (QJE 2003) model The regression equation: y ijt = wage or unemployment rate

dinesh
Download Presentation

STATA: An Introduction Into the Basics

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. STATA: An Introduction Into the Basics Prof. Dr. Herbert Brücker University of Bamberg Seminar “Migration and the Labour Market” May 31, 2012

  2. Repetition of last meeting (I/III) • 1 The Borjas (QJE 2003) model • The regression equation: • yijt = wage or unemployment rate • pijt = migration share • si = vector of eduction dummies (i = 1, 2, 3, 4) • xj = vector of work experience dummies (j = 1, 2, …, 8) • πt= vector of time dummies (t = 1, 2, …, 25) • si × xj = vector of time-experience dummies • si × πt = vector of education-time dummies • xi × πt = vector of experience-time dummies

  3. Repetition of last meeting (II/III) • The linear regression model • The univariate model • The multivariate model • Econometric basics • endogenous (LHS) and exogenous (RHS) variables • regression coefficients/parameters • error (disturbance) term • assumptions of linear regression model • ordinary least squares estimator • Interpretation of regression output • coefficient • standard deviation • t-statistics • p-value

  4. Repetition of last meeting (III/III) • Regression diagnostics • R-squared and adjusted R-squared • F-statistics • Degrees of freedom

  5. Contents of Today’s Meeting • The STATA Software package • The Structure of STATA: Three files • Getting started • The STATA Menues • The General Structure of STATA • Working with DO FILES • Describe your data • Running regressions

  6. 1 STATA SOFTWARE PACKAGE • Image of STATA DVD in the Campus Net under: • “\\software\campliz” • or: • “\\software.uni-bamberg.de\campliz” • Then: Start -> Ausführen • Them: INSERT your licence: • Serial number: …. • Code: …. • Authorisation key: ….

  7. Structure of STATA: Three files • The DATA file (.dta) where you have your data. • You can watch you data with the DATA BROWSER and edit your data with the DATA EDITOR • The DO file (.do) where you run and save your commands of any session. Very useful (i) to organise your data set, (ii) to see what you have done in the last session, (iii) to replicate what you have done in last session, (iv) to exchange work with your collaborators. • You write and run your commands with the DO FILE EDITOR

  8. Structure of STATA: Three files • The LOG file (.log) which automatically reports all things which you have done during your session. Is automatically saved after your session. Not often used, but useful if something goes wrong.

  9. 3 Getting started: the STATA empty window

  10. 3 Getting started: The STATA empty window The reviewwindowreportsyourpreviouscommands The mainwindow: showscommands, outputandmessageswhicharriveduringyoursession The variables window: Shows variables ofyourdataset The commandwindow: hereyoucan type yourcommands

  11. 3 Getting started: the windows after data loading Reports commands(one in thiscase) Reports resultofcommands List of variables

  12. 3 Getting started • In principle, you can start your STATA session by (i) loading your data set and (ii) typing your commands in the command window. • It is however recommended to use the DO FILE EDITOR right from the beginning. • But let’s look at the STATA menues first.

  13. 4 The STATA Menues • For watching your data and changing your data by hand you need the DATA BROWSER and the DATA EDITOR. • For starting and running your DO files you need the DO FILE EDITOR. • The other menues are not relevant for the beginning. The datapath The do fileeditor The dataeditor The databrowser The variables manager The helpmenue

  14. 4 The STATA Menues: The DATA EDITOR/BROWSER The difference between the data browser and the data editor is that you can manipulate data in the editor and only watch them in the browser.

  15. 4 The STATA Menues: The DATA EDITOR/BROWSER STRING variable NUMERICAL variable Youhavetwotypesof variables: NUMERICAL variables (black) and so-called STRING variables (blue) (e.g. text). STATA canidentify STRING variables, but youcannot do numericaloperationswiththem.

  16. 4 The STATA Menues: The DATA EDITOR/BROWSER HINT: Youcantransferdata e.g. from an EXCEL fileinto a STATA filebycopyandpaste (STRG C + STRG V) andviceversa in thedataeditor. But youhavetobecarefulthatyou EXCEL isrun in English, otherwiseyourdatamightbereadas STRING variables by STATA. Ofcoursetherearemanyotherwaystotransferdatafrom Excel to STATA.

  17. 5The Grammar of STATA General Structure of STATA commands [prefix :] command [varlist] [if] [in] [weight] [, options]

  18. 5 General structure of STATA We will concentrate on: [prefix :] command [varlist] [if][in] [weight] [, options]

  19. 5 General structure of STATA We will concentrate on: [prefix :] command [varlist] [if][in] [weight] [, options] What you want to do?

  20. 5 General structure of STATA • There are two types of variables (data): • numerical variables, e.g.: 0, 1, 501, 0.5, -12 etc. • string variables, e.g.: no voc train , male, female etc. • How to deal with the data types: • Numerical variables: you can do all mathematical operations, e.g. var1 + var2, var1/var2, var1*var2 etc. • String variables: You have to use quotation marks for identifcation, e.g. • var1 = 1 if sex == “female”

  21. 6 Working with DO FILES • The standard approach is to start your work with a DO FILE • Click on the DO FILE editor button after starting STATA • Load an existing DO FILE or start a new one • Start the DO FILE with a command to load your data, e.g. • use “path\data.dta”, clear • or, more specifically, with • use “C:\Users\Herbert\Documents\STATA\Wagecurve\DE.dta", clear

  22. Open your DO FILE editor • After starting STATA click on the DO FILE editor button The do fileeditor

  23. How does a DO FILE look like Descriptionsofwhatyouhavedone in stars * Commands

  24. The DO FILE menue Clickingthisbuttonrunstheentire DO FILE (not recommended) Clickingthisbuttonruns a selectionofmarkedcommands (recommended) Note: STATA stopsthe DO File execution after thefirstmistake in yourcommands. Thatmakesitadvisabletoproceedstepbystep.

  25. 6 Step 1: Loading your data • use “C:\Users\Herbert\Documents\STATA\Wagecurve\DE.dta", clear • The use command loads the data • the “path\DE.dta” provides STATA the information on the path where to find the data and the name of the data file (e.g. DE.dta) • the clear command after the comma clears the memory, which is needed if you have used other data sets before • Push the “Execute Selection (DO)” button to run the selected command(s) • You can also run the entire DO File by pushing the “Execute Selection Quietly (RUN)” button

  26. Loading your data (I/II) • Write the command use „path\XXX.dta“, clear • Mark the line and run the command by clicking the execution button

  27. Loading your data (II/II)

  28. 6 Step 2: Manipulating your data (I/VI) • It is useful to save only a basic data set and generate the variables you need at the beginning of each session. That saves storage space (recommended in case of large data sets) • Generating DUMMY variables • Use the gen command, e.g. • gen D_ed1 = 0 • This creates a variable consisting only of zeros • Then use the replace command, e.g. • replace D_ed1 = 1 if ed1 == 1 • This replaces the zeros with 1 if the variables ed1 has a values of 1.

  29. Generating Dummy Variables: DO FILE commands

  30. Generating Dummy Variables: STATA main window

  31. 6 Step 2: Manipulating your data (II/VI) • Another example for generating dummy variables: • Use the gen command, e.g. • gen year_1 = 0 • This creates a variable consisting only of zeros • Then use the replace command, e.g. • Year_1 = 1 if year == 1991 • This replaces the zeros with 1 if the year variable has a values of 1991 • Note: The STATA syntax requires that you have to use after an if command always a double == for the definition of the value

  32. 6 Step 2: Manipulating your data (III/VI) • Creating series of dummy variables if it is too cumbersome to create them individually, e.g. in case of interaction dummies • Syntax: • forvaluesi = 1/3 {forvalues j = 1/4{ gen D_ed`i’*D_ex`j’ } } • i.e. for each value I = 1,2,3 and each value j = 1,2,3,4 you generate an interaction dummy by multiplying the dummy variables for education and experience. Take care of the {}!

  33. Generating Dummy Variables: Advanced techniques

  34. Generating Dummy Variables: Advanced techniques

  35. Generating Dummy Variables: Advanced techniques

  36. 6 Step 2: Manipulating your data (IV/VI) • Transforming variables into log variables • Syntax: • gen ln_wijt = lnwijt • By using again the gen command you can transform the wage variable wijt into the natural logarithm of the wage by applying the ln operator

  37. Transforming data

  38. 6 Step 2: Manipulating your data (V/VI) • Useful operators in STATA: • + add • - subtract • * multiply • / divide • ln transform into natural log • exp transform into exponential value

  39. 6 Step 2: Manipulating your data (VI/VI) • Control what you have done • Check you variables for mistakes in the browse modus of the data set • You can delete wrong variable by using the drop command, e.g. • drop ln_wijt • Which simply drops your variable from the data set. Then you can create the correct one.

  40. 6 Step 3 Organize your data with globals • It is not convenient if you have to work with too many variables, e.g. 200 dummy variables (that is cumbersome to type some by hand) • You can define globals, which comprise many variables • Syntax: • glo [name of global [list of variables] • gloD_i Ded_1 Ded_2 D_ed3 • i.e the global D_i consists of the variables Ded_1 Ded_2 and Ded_3 • If you want to use the global later you have to type • $[globalname], i.e. $D_i

  41. Creating globals

  42. 7 Describe your data (I/II) • Any econometric analysis requires in the first step that you provide descriptive statistics to the reader. This helps to understand what’s going on • This can be easily done with the sum command • sum [variable name(s)] • sum LHijtLFijtwijtln_wijt • The sum command creates a table with the complete descriptive statistics, i.e. observations, mean, standard deviation, minimum, maximum

  43. Summary statistics

  44. Summary statistics

  45. 7 Describe your data (II/II) • Present your data graphically • It is usually helpful if you present the main information /vairables in your data set graphically • There are many graphical commands, use the Graphicsmenue • the simplest way is to show the development of your variable(s) over time • Syntax: • graph twoway line [variable1] [variable2] if … • graph twoway line wqjt year if ed==1 & ex == 1 • This produces a two-dimensional variable with the wage on the vertical and the year on the horizontal axis for education group 1 and experience group 1

  46. Making a graph

  47. Graph of mean wage in education 1 and experience 1 group

  48. Graph of migration rate in edu 1 and exp 1 group

  49. 8 Running regressions • The standard OLS regression command in STATA is • Syntax • regress depvar [list of indepvar ] [if], [options] • regress ln_wijtm_ijtD_iD_jD_t

  50. 8 Running Regressions Recall: What is a linear regression model The general econometric model: γi indicates the dependent (or: endogenous) variable x1i,ki exogenous variable, explaining the independent variable β0 constantorthe y-axisintercept (if x = 0) β1,2,k regressioncoefficientorparameterofregression εi residual, disturbanceterm

More Related