
WORKSHOP ON ECONOMIC ANALYSIS OF CLIMATE CHANGE PRACTICAL LESSONS ON STATA 11



  1. WORKSHOP ON ECONOMIC ANALYSIS OF CLIMATE CHANGE PRACTICAL LESSONS ON STATA 11

  2. INTERACTIVE USE OF STATA • Interactive use means that STATA commands are initiated within STATA. • A graphical user interface (GUI) for STATA is available. It enables almost all STATA commands to be accessed through drop-down menus. • STATA also allows users to type commands directly to execute a particular task. • The standard procedure in STATA, however, is to collect the commands needed into one file, called a do-file, that can be run with or without interactive use.

  3. BASICS IN STATA • Like most software packages, STATA ships with example data sets that beginners can use as a starting point for learning STATA. • An example of such a data set is auto.dta. • To access the example data: • Click File/Example Datasets/… Example datasets installed with Stata • Select the data set auto.dta • Interactive users can instead type the command: • sysuse auto

  4. DATA MANAGEMENT • To describe the variables in the data set, type: • describe or des • To describe specific variables, add the variable names to the command. • Eg: des mpg • NB: STATA is case sensitive; commands must be typed in lower case. • To see the summary statistics of the variables, type: • summarize, detail • sum, detail • su, detail • su, d • You can drop the detail option if you only want the basic summary statistics. • You can summarize specific variables: • sum varlist, detail • Eg: sum mpg, detail • sum mpg • su mpg
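
The commands on slides 3 and 4 can be strung together as in the following minimal sketch. It assumes only the auto.dta example data that ships with STATA; the final display line is an addition showing that summarize leaves its results in r().

```stata
* Load the example data shipped with Stata and inspect it
sysuse auto, clear
describe
describe mpg price
* Basic and detailed summary statistics for one variable
summarize mpg
summarize mpg, detail
* summarize stores its results in r(); r(mean) is the mean of mpg
display r(mean)
```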

  5. DATA MANAGEMENT • If you are only interested in a subset of your data, you can inspect it using filters. E.g. if you are only interested in cars within a particular price or mileage range, you can type: • sum if price>=3000 & price<=4400 • sum if mpg>=16 & mpg<=23 • And then you can contrast these with: • sum if price>=3000 | price<=4400 • sum if mpg>=16 | mpg<=23 • With | (or) a car is included if either condition holds, so these two commands include every car with a nonmissing value. • Interpretation of relational and logical operators in STATA: >= greater than or equal to; <= less than or equal to; == equal to; & and; | or; != or ~= not equal to; > greater than; < less than; . missing (treated as larger than any number)
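
A short sketch of how the two logical operators and missing values behave, using count on auto.dta. The use of count (rather than sum) and of rep78 as the variable with missing values are choices made for this illustration.

```stata
sysuse auto, clear
* Both conditions must hold: cars priced between 3000 and 4400
count if price>=3000 & price<=4400
* Either condition may hold: true for every car with a nonmissing price
count if price>=3000 | price<=4400
* Missing values (.) count as larger than any number,
* so exclude them explicitly when using > or >=
count if rep78>2
count if rep78>2 & rep78<.
```

The last two counts differ by the number of cars whose rep78 is missing.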

  6. DATA MANAGEMENT • The usual arithmetic operators (+, -, *, /) are available in STATA. • STATA allows users to tabulate a variable to see its distribution: • tabulate mpg • tab mpg

  7. DATA MANAGEMENT • Some variables have been coded with value labels already assigned to the values. If the user wants to see the actual values used, type: • tab varlist, nolabel • Eg: tab foreign, nolabel • GENERATING NEW VARIABLES • You can create a new variable by combining existing variables or by performing some arithmetic operations. [gen, egen, recode] • To create a ratio of two variables: • gen mpgratio=mpg/weight • sum mpgratio

  8. DATA MANAGEMENT • The same procedure can be applied to obtain traditional transformations such as: • Square: gen mpg2=mpg^2 • Cubic: gen mpg3=mpg^3 • Square root: gen mpgsqrt=sqrt(mpg) • Exponential: gen expmpg=exp(mpg) • Natural log: gen lnmpg=ln(mpg) or gen logmpg=log(mpg) • Base 10 log: gen l10mpg=log10(mpg)
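
As a quick check on the log transformations, the sketch below confirms that ln() and log() give the same result in STATA and shows what happens with the log of zero. It assumes auto.dta; the assert and display lines are additions for illustration.

```stata
sysuse auto, clear
generate lnmpg  = ln(mpg)
* log() is a synonym for ln(), the natural logarithm
generate logmpg = log(mpg)
generate l10mpg = log10(mpg)
* assert stops with an error if the two variables ever differ
assert lnmpg == logmpg
* The log of zero is undefined, so Stata returns missing (.)
display log(0)
```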

  9. DATA MANAGEMENT • Eg: gen lprice=log(price+1) • Why +1? The log of zero is undefined, so STATA would return a missing value for any observation with a price of zero. Adding 1 avoids this. It does not help with missing numbers: the log of a missing value is still missing. • Sometimes the user may want to generate a new variable only for observations within a particular range. Use a new variable name each time, because gen will not overwrite an existing variable: • gen lprice2=log(price) if mpg==. • gen llprice=log(price) if mpg>15 • The generate command can also be used to create new (binary) variables. • Eg: from the auto.dta data set we are using, we may be interested in finding out how many cars were repaired more than two times in 1978. Thus we create a new variable repair that equals 1 if the vehicle was repaired more than twice and 0 otherwise.

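  10. DATA MANAGEMENT • Use the commands: • gen repair=1 if rep78>2 & rep78<. • replace repair=0 if rep78<=2 • The condition rep78<. is needed because STATA treats missing values as larger than any number, so rep78>2 alone would also code cars with a missing rep78 as 1. Using replace repair=0 if repair==. in place of the second command would code those cars as 0 instead of leaving them missing. • You can also create categorical variables from a continuous variable: • tab mpg • gen mpgcat=1 if mpg<16 • replace mpgcat=2 if mpg>=16 & mpg<26 • replace mpgcat=3 if mpg>=26 & mpg<=35 • replace mpgcat=4 if mpg>35 & mpg<. • tab mpgcat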
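
The binary variable can also be built in one step. This is a sketch of an alternative, not the slide's own method: a logical expression in STATA evaluates to 1 when true and 0 when false, and the if qualifier keeps cars with a missing rep78 out of both groups.

```stata
sysuse auto, clear
* rep78>2 evaluates to 1 (true) or 0 (false); the if qualifier
* leaves repair missing wherever rep78 itself is missing
generate repair = rep78>2 if rep78<.
tabulate repair, missing
```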

  11. DATA MANAGEMENT • tabulate ..., generate • This command is useful for creating a set of dummy variables (variables with a value of 0 or 1) depending on the value of an existing categorical variable. The syntax is: • tab oldvar, gen(newvar) • Eg: tab rep78, gen(repair) • tab foreign, gen(origin) • The old variable is categorical. The new variables will take the form: newvar1, newvar2, newvar3, ...

  12. DATA MANAGEMENT • EGEN • This is an extended version of generate that creates a new variable by aggregating the existing data. The syntax is: • egen newvar = fcn(argument) [if exp] [in range], by(var) • where newvar is the new variable to be created; fcn is one of numerous functions such as count( ), max( ), min( ), mean( ), median( ), rank( ), sd( ), total( ) (called sum( ) in older versions of STATA); argument is normally just a variable; var in the by() option must be a categorical variable. • Eg: • egen avg=mean(mpg) : creates a variable holding the average mpg over the entire sample • egen avg2=median(weight), by(foreign) : creates a variable holding the median weight of cars for each origin • egen totalrepairs=total(rep78), by(foreign) : generates the total of rep78 for vehicles from each origin • egen prodwgt=total(weight*price), by(make) : generates the sum of weight times price for each make
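
A minimal sketch of egen on auto.dta. The variable names avgmpg, medwgt and totrep are chosen for this illustration; the closing tabstat line is an addition that prints one row per origin so the group-level values can be read off.

```stata
sysuse auto, clear
* One value for the whole sample
egen avgmpg = mean(mpg)
* One value per group defined by foreign
egen medwgt = median(weight), by(foreign)
egen totrep = total(rep78), by(foreign)
* Each egen variable is constant within its group
tabstat avgmpg medwgt totrep, by(foreign) stat(mean)
```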

  13. DATA MANAGEMENT • RECODE • This command changes the values of a categorical variable according to the rules specified. The syntax is: • recode varname (oldvalue=newvalue) (oldvalue=newvalue) … [if exp] [in range] • recode foreign (0=1) (1=2) • recode rep78 (.=9) (*=7) • In the last example . stands for missing and * for all values not covered by an earlier rule.

  14. DATA MANAGEMENT • recode is also an extension of replace: it recodes categorical variables and generates a new variable if the generate() option is used. • recode rep78 (1/2=1) (3=2) (4/5=3), gen(repcat) • This creates a new variable that takes on the value 1, 2 or 3. Values of rep78 that are not covered by any rule are carried over unchanged; here only missing values are uncovered, so repcat is missing wherever rep78 is missing.
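
The recode example can be checked with a cross-tabulation of the old and new variables. This sketch assumes auto.dta; the tabulate line is an addition for verification.

```stata
sysuse auto, clear
* Collapse the five rep78 categories into three
recode rep78 (1/2=1) (3=2) (4/5=3), generate(repcat)
* Cross-tabulate old against new, showing missing values too
tabulate rep78 repcat, missing
```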

  15. XTILE • This command creates a new variable that indicates which category a record falls into when the sample is sorted by an existing variable and divided into n groups of equal size. • The syntax is: • xtile newvar = variable [if exp] [in range], nq(#) • newvar is the new categorical variable created; variable is the existing variable used to create the quantiles; # is the number of categories. • Eg: xtile mpg1quint = mpg, nq(5) • xtile weight1dec = weight, nq(10)
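
A sketch of xtile on auto.dta, assuming quintiles of mpg and deciles of weight are wanted. The variable names mpgquint and wgtdec are hypothetical. Because tied values always fall into the same category, the groups are only approximately equal in size.

```stata
sysuse auto, clear
* Five groups by mpg, ten groups by weight
xtile mpgquint = mpg, nq(5)
xtile wgtdec = weight, nq(10)
tabulate mpgquint
tabulate wgtdec
```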

  16. LIST • The most detailed of the commonly used descriptive commands is list. list displays the values of variables by observation. If varlist is not specified, the output will contain the values of every variable. • list varlist or l varlist • Eg: list mpg • XI: INDICATOR VARIABLES • A complete set of mutually exclusive indicator (dummy) variables for a categorical variable can be created in several ways. A simple method is the xi command: • xi i.rep78, noomit • The noomit option is added because the default setting is to omit the lowest category. • INSPECT • inspect variable [if exp] [in range] • Gives a small histogram and the number of values that are: unique; positive, zero, negative; integer and non-integer; missing.

  17. LABEL VARIABLE • This command is used to attach labels to variables in order to make the output easier to understand. For example, we know that maritalstat indicates the marital status of the head of household, but other people using the tables may not know this. So we may want to label the variables as follows: • label variable region "Region of country" • label variable maritalstat "marital status" • LABEL VALUES • This command attaches a named set of value labels to a categorical variable. The syntax is: • label values varname lblname • where varname is the categorical variable which will get the labels and lblname is a set of labels that has already been defined by label define. • Here are some examples of labeling in Stata: • label variable yield "Yield (tons/hectare)" : gives a label to the variable yield • label define yesno 0 no 1 yes : defines a set of labels called yesno • label values electricity yesno : attaches the labels to the variable electricity • label define yesno 3 "perhaps", add : adds a new value label to the existing set • label define yesno 3 "maybe", modify : modifies an existing value label • label define reglbl 1 West 2 Center 3 East : defines regional labels • label values region reglbl : attaches the regional labels to region • label define reglbl 2 Central, modify : modifies the regional labels
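
The labeling steps above can be tried on auto.dta. The slide's variables (region, maritalstat, electricity, yield) are not in that data set, so this sketch creates a hypothetical 0/1 variable called cheap to carry the labels.

```stata
sysuse auto, clear
* cheap is a hypothetical indicator made up for this example
generate byte cheap = price < 5000
label variable cheap "Price below 5,000"
* Define a set of value labels, then attach it to the variable
label define yesno 0 "no" 1 "yes"
label values cheap yesno
* With labels, then with the underlying codes
tabulate cheap
tabulate cheap, nolabel
```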

  18. TABULATE … SUMMARIZE • This command creates one- and two-way tables that summarize continuous variables. The command tabulate by itself gives frequencies and percentages in each cell (cross-tabulations). With the summarize option, we can show means and other statistics of a continuous variable. • The syntax is: tabulate varname1 varname2 [if exp] [in range], summarize(varname3) options • where • varname1 is a categorical row variable • varname2 is a categorical column variable (optional) • varname3 is the continuous variable summarized in each cell • options can be used to tell Stata which statistics you want • tab make, sum(mpg) gives the mean, standard deviation, and frequency of mpg for each car model. • tab make, sum(price) means gives the mean price for each car • tab foreign rep78, sum(price) gives a two-way table of price statistics; both row and column variables should be categorical, so a continuous variable such as weight is not suitable here.

  19. TABSTAT • This command gives summary statistics for a set of continuous variables for each value of a categorical variable. The syntax is: • tabstat varlist [if exp] [in range], stat(statname [...]) by(varname) • where varlist is a list of continuous variables, statname is a type of statistic, and varname is a categorical variable.
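
The example from this slide did not survive in the transcript. As a stand-in, here is a minimal sketch on auto.dta; the choice of variables and statistics is an assumption, not the author's original example.

```stata
sysuse auto, clear
* Mean, standard deviation, minimum, maximum and count of three
* continuous variables, separately for domestic and foreign cars
tabstat mpg price weight, stat(mean sd min max n) by(foreign)
```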

  20. TABLE • This command can create many types of tables. It is probably the most flexible and useful of all the table commands in Stata. The syntax is: • table rowvar colvar [if exp] [in range], c(clist) [row col] • where rowvar is the categorical row variable; colvar is the categorical column variable; clist is a list of statistics and variables; row is an option to include a summary row; col is an option to include a summary column. • Examples: • table foreign, c(mean rep78 sd rep78 median rep78) : table of rep78 statistics by origin • table foreign rep78, c(mean mpg) : table of average mpg by foreign and rep78 • table foreign, c(mean rep78 mean mpg) : table of average rep78 and mpg by foreign

  21. MODIFYING DATA FILES • This section describes a number of commands that are used to modify and combine data files in Stata: rename, drop, keep. • rename • This command renames variables. Syntax: rename oldname newname • Eg: rename mpg mile_per_gallon • drop • This command deletes records or variables. • drop if price>=4000 • drop if foreign==1 • keep • This command deletes everything but the specified observations or variables. • keep if price<=3000 • Observations and variables cannot be selected in the same keep command, so use two steps: • keep if foreign==1 • keep mpg rep78 headroom trunk
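
Because drop and keep change the data in memory, it can help to wrap them in preserve and restore while experimenting. The sketch below assumes auto.dta; preserve and restore are not mentioned on the slide and are added here as a safety net.

```stata
sysuse auto, clear
rename mpg miles_per_gallon
* preserve takes a snapshot of the data currently in memory
preserve
* First keep the foreign cars, then keep a few variables
keep if foreign==1
keep miles_per_gallon rep78 headroom trunk
describe
* restore brings back the snapshot taken by preserve
restore
```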

  22. PRESENTING DATA WITH GRAPHS • In Stata, graphs are primarily made with the graph command, followed by numerous subcommands for controlling the type and format of graph. In addition to graph, there are many other commands that draw graphs. • Graph types, commands and options covered here: twoway, bar, pie, matrix, histogram, scatter, connect( ), msymbol( ) • For pie charts see http://www.stata.com/support/faqs/graphics/piechart.html

  23. PRESENTING DATA WITH GRAPHS graph This command generates numerous types of graphs and diagrams. The syntax is: graph graphtype [varlist] [if exp] [in range] [, options] where graphtype is the type of graph varlist is the list of variables to graph if is used to limit observations that are included based on the exp condition in is used to limit observations that are included based on the case number options are commands to control the look of the graph

  24. graph bar income, over(sexhead) over( locality)

  25. HISTOGRAMS • histogram income, by(sexhead) normal bin(20) • histogram income, by(locality) normal bin(20) • histogram mpg, by(foreign) normal bin(20) • NB: bin() sets the number of bins (bars) in the histogram.

  26. SCATTER PLOTS • scatter mpg price • scatter mpg price, by(foreign)

  27. PIE CHARTS • In Stata, pie charts are drawn using the sum of the variables specified, and bar charts can be made to do the same with the (sum) statistic. Therefore, any zero values will not appear in the chart, as they sum to zero and make no difference to the sum of any other values. If you have a categorical variable that contains labeled integers (for example, 0 or 1, or 1 upwards), and you want a pie or bar chart, you presumably want to show counts or frequencies of those integer values. To create pie charts, first run the variable through tabulate to produce a set of indicator variables: • Eg: tab foreign, gen(f) • graph pie f1 f2 • Try: tabulate rep78, generate(r) • graph pie r1 r2 r3 r4 r5 • graph bar (sum) r1 r2 r3 r4 r5
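
A sketch of the pie and bar chart steps on auto.dta. The over() form of graph pie is not on the slide; it is shown as a shortcut that gives the same chart of counts without creating indicator variables first.

```stata
sysuse auto, clear
* Indicator variables f1 (domestic) and f2 (foreign)
tabulate foreign, generate(f)
graph pie f1 f2
* Same chart of counts without the indicator variables
graph pie, over(foreign)
* Bar chart of counts: (sum) replaces the default statistic, the mean
graph bar (sum) f1 f2
```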

  28. Do-file Editor A Do-file is a file that stores a Stata program (a set of commands) so that you can edit it and run it later. The Do-file Editor is like a simplified word processor for writing Stata programs. Why use the Do-file Editor rather than the Command window or the menu system? • It makes it easier to check and fix errors, • it allows you to run the commands later, • it lets you show others how you got your result, and • it allows you to collaborate with others on the analysis. • In general, any time you are running more than 5-10 commands to get a result, it is easier and safer to use a Do-file to store the commands.

  29. LOG FILES • You can click on File/Log to begin or close a log file (Suspend and Resume are to temporarily turn off and on the log). • You can use “log” commands in the Command window • You can use “log” commands in a Do-file.
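
A do-file that opens and closes its own log might look like the sketch below. The file names analysis.do and analysis.log are hypothetical, and the log is written to the current working directory.

```stata
* analysis.do (hypothetical file name)
* Close any log that was left open; capture ignores the error if none is
capture log close
* Start a plain-text log, overwriting an earlier one of the same name
log using analysis.log, replace text
sysuse auto, clear
summarize mpg price
log close
```

Running the file with do analysis.do then reproduces both the results and the log.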

  30. OPENING FILES • STATA FILES (.dta) • To open a STATA file: • use filename, clear • Eg: use "G:\fenergydata.dta", clear • use varlist using filename, clear [for a subset of the data file] • Alternatively you can use the drop-down menu bar to open the data: • File/open/………………….. (select the data) • IMPORTING EXCEL DATA • To import data from Excel, first save the worksheet as a CSV (comma-separated) or tab-delimited text file. For such non-STATA files, the command for importing data is insheet using: • insheet using filename, clear • Eg: insheet using "C:\Users\myjumens\Desktop\fenergydata.csv" • Alternatively you can use the drop-down menu bar to import the data: • File/import/ASCII data created by spreadsheet/ …… (select the data)
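
The following round trip is a self-contained sketch of use and insheet. Since the slide's own files are not available, it first writes a small .dta file and a .csv file from auto.dta; the file name autosmall and the use of outsheet to create the CSV are assumptions made for this illustration.

```stata
sysuse auto, clear
keep make price mpg
* Write a Stata file and a comma-separated file to the working directory
save autosmall.dta, replace
outsheet using autosmall.csv, comma replace
* Open the whole Stata file, then only a subset of its variables
use autosmall.dta, clear
use price mpg using autosmall.dta, clear
* Read the comma-separated file back in
insheet using autosmall.csv, comma clear
describe
```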

  31. CODING QUESTIONNAIRES INTO STATA • Coding data into STATA can be done in the DATA VIEW • Generate new variables. Eg: gen q1=. gen q2=. • Click Data Editor on the menu bar • Click on Variables Manager

  32. • Type the variable name • Type the variable label • Click on Manage to display a new dialog box • Click Apply to add your changes to the system

  33. • Click on Create Label • Creating Value Labels • Type the value label name here • Type in the value. Eg: 1 • Type in the label corresponding to the value assigned • Click on Add

  34. • Note that you can create the value labels for all the questions before exiting the Manage Value Labels dialog box • Assign the value labels you have entered to their corresponding questions, or variables, in the Variables Manager. • Exit the Variables Manager dialog box and go back to the Data Editor. • You can now type in the coded responses.

  35. MICROECONOMETRIC REGRESSION ANALYSIS • Ordinary Least Squares • Probit Models • Logit Models • Ordered Probit/Logit Models • Multinomial Logit Models • Tobit Models

  36. ORDINARY LEAST SQUARES • Like most statistical packages, STATA allows users to run basic regressions such as OLS. The syntax is: • regress depvar indepvars • where depvar is the dependent variable and indepvars is the list of independent variables. • Eg: regress gpa tuce psi • reg gpa tuce psi
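
The workshop data containing gpa, tuce and psi is not distributed with STATA, so the sketch below runs the same command on auto.dta instead. The model (price on mpg and weight) and the variable name phat are chosen only for illustration.

```stata
sysuse auto, clear
regress price mpg weight
* Coefficients remain available as _b[varname] after estimation
display _b[weight]
* predict with no options stores the fitted values
predict phat
summarize price phat
```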

  37. LOGIT AND PROBIT MODELS • Probit and logit models are among the most widely used members of the family of generalized linear models in the case of binary dependent variables. • This group of models allows researchers to analyse data on issues where the dependent variable is binary (0, 1). • Eg: yes/no; married or not married; foreign or domestic

  38. PROBIT MODEL • Let us examine whether a new method of teaching economics, PSI, significantly influences performance in later economics courses using the probit model. The dependent variable used is GRADE, which indicates whether a student's grade in an intermediate macroeconomics course was higher than that in the principles course. The probit model is specified as:
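
The equation from this slide did not survive in the transcript. A standard probit specification consistent with the estimation command used on the next slide (probit grade gpa psi tuce) would be the following; the notation is an assumption, not the author's original.

```latex
\Pr(\mathit{GRADE}_i = 1 \mid \mathit{GPA}_i, \mathit{PSI}_i, \mathit{TUCE}_i)
  = \Phi\left(\beta_0 + \beta_1 \mathit{GPA}_i + \beta_2 \mathit{PSI}_i + \beta_3 \mathit{TUCE}_i\right)
```

Here \Phi is the standard normal cumulative distribution function, and the expression inside it is the probit index discussed on slide 40.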

  39. Estimation of Probit Model probit grade gpa psi tuce

  40. • The basic probit command reports coefficient estimates and their standard errors. These coefficients are index coefficients: all we can read from them is the direction of the effect and the partial effect on the probit index/score. They do not correspond to the average partial effects. • Let's try to interpret the results: • Tuce: a one-unit increase in tuce increases the probit index by 0.05 standard deviations. • But are we concerned with a probit index? No. • In analysing binary choice models the parameters of interest are not the index coefficients, but rather the marginal/partial effects.

  41. MARGINAL EFFECTS • A marginal effect gives the derivative of the probability that the dependent variable equals one with respect to a particular conditioning variable. In STATA these marginal effects can be computed using two methods: • dprobit • mfx compute
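
Both methods can be tried on auto.dta, since the workshop data is not distributed with STATA. The model below (foreign on mpg and weight) is an assumption chosen for illustration. The margins command is not on the slide; it was introduced in STATA 11 and is shown as a third route to the same marginal effects at the means.

```stata
sysuse auto, clear
* Index coefficients
probit foreign mpg weight
* Method 2 on the slide: marginal effects evaluated at the means
mfx compute
* Added for comparison: the same quantities from margins
margins, dydx(*) atmeans
* Method 1 on the slide: refits the probit and reports dF/dx directly
dprobit foreign mpg weight
```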

  42. INTERPRETATION • For a one-unit increase in an independent variable from the baseline, the probability of the event is expected to increase or decrease by the marginal effect. • For instance, with a one-unit increase in GPA from the baseline (3.11), the probability of grade improvement increases by 53.3 percentage points. • NB: The interpretation for dummy variables differs: the reported effects are discrete changes, not marginal effects. • The interpretation of PSI is that a student exposed to PSI has a probability of grade improvement that is 0.46 higher than that of a student who is not exposed to the method.

  43. LOGIT MODEL • The logit model yields results similar to those of the probit model.
