
ECON686: Panel Data Analysis
Term II, 2018-2019
zlyang@smu.edu.sg
http://www.mysmu.edu/faculty/zlyang/

Chapter 1: Introduction
This chapter presents some basics for panel data analysis, and basics for using the popular software Stata, including:
• Concept of panel data or longitudinal data
• Popular sources and examples of panel data
• Benefits and limitations of panel data
• Basic panel data models, and related important concepts
• Basics for Stata
• An overview of the course

What is Panel Data?
Panel data refers to observations made on N units (individuals, households, firms, countries, etc.) over T points in time.
• In economics and the social sciences, this can be achieved by surveying a number of units and following them over time.
• If observations are made on N units at a fixed time point, we obtain cross-section data; if observations are made on one unit over T time periods, we obtain time series data.
• Panel data combine the two, and panel data analysis represents a marriage of regression and time series analysis.
• Panel data are usually observed at regular time intervals (monthly, yearly, etc.), and are balanced (all units are observed at all periods).
• Panel data could be a short panel (many units and few time periods), a long panel (many time periods and few units), or a large panel (many units and many time periods).
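To make the layout concrete, here is a minimal Stata sketch of a balanced panel with N = 2 units and T = 3 periods. The variable names id, t and y are hypothetical, and y holds simulated placeholder values rather than course data. It is meant to be run from a do-file.

```stata
* Hypothetical toy panel: N = 2 units, T = 3 periods (id, t, y are made-up names)
clear
set seed 1
set obs 6
generate id = ceil(_n/3)        // unit identifier: 1,1,1,2,2,2
bysort id: generate t = _n      // time index within each unit: 1,2,3
generate y = rnormal()          // placeholder outcome
xtset id t                      // declare the panel structure
xtdescribe                      // participation pattern of the panel
list, sepby(id)                 // long format: one row per (unit, period)
```

Each row is one unit observed in one period, which is the "long" layout that the xt commands expect.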

Sources of Panel Data
In economics, short panels are synonymous with micro panels, and panels with small to moderate N are called macro panels. There are many open sources for constructing panel data sets.
• The well-known sources for micro panel data include:
  • Panel Study of Income Dynamics (PSID), by the Institute for Social Research at the University of Michigan, http://psidonline.isr.umich.edu.
  • National Longitudinal Survey (NLS), a set of surveys sponsored by the Bureau of Labor Statistics, http://www.bls.gov/nls/home.htm.
  • Current Population Survey (CPS), by the Bureau of the Census for the Bureau of Labor Statistics, http://www.census.gov/cps.
  • Living Standards Measurement Study (LSMS), by the World Bank, http://www.worldbank.org/LSMS.
  • German Socio-Economic Panel, http://www.diw.de/soep.
  • Canadian Survey of Labour and Income Dynamics (SLID), collected by Statistics Canada, www.statcan.gc.ca.

Sources of Panel Data
  • Japanese Panel Survey of Consumers (JPSC), www.kakeiken.or.jp.
  • Russian Longitudinal Monitoring Survey, 1992, by the Carolina Population Center, U. of North Carolina, http://www.cpc.unc.edu/projects/.
  • Korea Labor and Income Panel Study, http://www.kli.re.kr/klips/en/about/introduce.jsp.
• The well-known macro panels include:
  • Penn World Table (PWT), www.nber.org, 188 countries, 1950-2004.
  • World Bank, http://data.worldbank.org.
  • International Monetary Fund (IMF), www.imf.org.
  • United Nations, http://unstats.un.org/unsd/economic_main.htm.
  • European Central Bank, www.ecb.int.
See Sec. 1.1 of Baltagi (2013) for details on sources of panel data. There are also many ready-to-use panel data sets. For example,

Example 1: Statewide Capital Productivity
The data, from Munnell (1990), give indicators related to public capital productivity for 48 US states observed over 17 years (1970-1986). They can be downloaded from the link below:
http://people.stern.nyu.edu/wgreene/Econometrics/PanelDataEconometrics.htm
by choosing “Panel Data Sets”. The data set has been used extensively to illustrate the regular panel data models and, more recently, spatial panel data models.

Example 1: Statewide Capital Productivity
• Variables in the data file (productivity.csv) are:
  • STATE = state name
  • ST_ABB = state abbreviation
  • YR = year, 1970,...,1986
  • P_CAP = public capital
  • HWY = highway capital
  • WATER = water utility capital
  • UTIL = utility capital
  • PC = private capital
  • GSP = gross state product
  • EMP = employment
  • UNEMP = unemployment rate
• See Baltagi (2005, p. 25) for the analysis of these data. The article on which the analysis is based is Munnell, A., "Why has Productivity Declined? Productivity and Public Investment," New England Economic Review, 1990, pp. 3-22. The data can also be downloaded from the website for Baltagi's text: https://www.wiley.com/legacy/wileychi/baltagi3e/
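A minimal sketch of how this file might be read into Stata and declared as a panel is given below. It assumes that productivity.csv sits in the current working directory, that its first row holds the variable names listed above, and that STATE is stored as text. The names state_id and lgsp are hypothetical.

```stata
* Assumes productivity.csv is in the working directory with a header row
import delimited "productivity.csv", varnames(1) clear
* import delimited lowercases variable names by default (STATE -> state, YR -> yr)
encode state, generate(state_id)     // numeric panel identifier built from the state name
xtset state_id yr                    // panel variable and time variable
xtdescribe                           // check the panel pattern
generate lgsp = ln(gsp)              // log of gross state product
xtsum lgsp                           // overall, between, and within variation
```

If the file turns out to be laid out differently, the encode and xtset lines are the ones to adjust.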

Example 2: Cigarette Demand
This is another well-known panel data set that has been used under various panel data model frameworks: non-spatial or spatial, fixed effects or random effects, static or dynamic. In particular, demand equations for cigarettes in the United States were estimated from a panel of 46 states over 30 time periods (1963-1992), available on the Wiley website for Baltagi (2005): https://www.wiley.com/legacy/wileychi/baltagi3e/.
Variables in the data file Cigar.txt are:
(1) STATE = State abbreviation.
(2) YR = Year.
(3) Price per pack of cigarettes.
(4) Population.
(5) Population above the age of 16.
(6) CPI = Consumer price index (1983=100).
(7) NDI = Per capita disposable income.
(8) C = Cigarette sales in packs per capita.
(9) PIMIN = Minimum price in adjoining states per pack of cigarettes.
Several time dummies corresponding to the major policy interventions in 1965, 1968 and 1971 can be added to the model.

Example 3: Returns to Schooling Data
The Returns to Schooling data, with 595 individuals and 7 years, were analysed in Cornwell, C. and Rupert, P. (1988), "Efficient Estimation with Panel Data: An Empirical Comparison of Instrumental Variable Estimators," Journal of Applied Econometrics, 3, pp. 149-155. See Baltagi (2005, Sec. 7.5) for further analysis. The data were downloaded from the same websites.
Variables in the file cornwell&rupert.csv are:
EXP = work experience
WKS = weeks worked
OCC = occupation, 1 if blue collar
IND = 1 if manufacturing industry
SOUTH = 1 if resides in south
SMSA = 1 if resides in a city (SMSA)
MS = 1 if married
FEM = 1 if female
UNION = 1 if wage set by union contract
ED = years of education
BLK = 1 if individual is black
LWAGE = log of wage

Why Should We Use Panel Data?
• There are numerous benefits of using panel data:
  • Panel data enable us to control for individual heterogeneity;
  • Panels give more informative data, more variability, less collinearity among the variables, more degrees of freedom, and more efficiency;
  • With panel data, one is better able to study the dynamics of adjustment;
  • They are more suitable for identifying and measuring effects that are not detectable in pure cross-section and pure time-series data;
  • They allow us to construct and test more complicated behavioural models than do pure cross-section and pure time-series data;
  • Micro panel data gathered on individuals, firms, and households can be measured more accurately than similar variables measured at the macro level. Biases resulting from aggregation over time or individuals may be reduced or eliminated;
  • Macro panel data, on the other hand, have longer time series, and panel unit root tests have standard asymptotic distributions.

Why Should We Use Panel Data?
• Limitations of panel data include:
  • Design and data collection problems;
  • Distortions from measurement errors;
  • Selectivity problems: self-selection, nonresponse, and attrition;
  • Short time series dimension;
  • Cross-section dependence: macro panels on countries or regions with long time series that do not account for cross-country dependence may lead to misleading inference.
See Sec. 1.2 of Baltagi (2013) for details on the limitations of panel data.

Basic Panel Data Models
A panel data regression differs from a regular cross-section or time series regression in that its variables carry a double subscript, i.e.,
$y_{it} = \alpha + X_{it}'\beta + u_{it}$, $i = 1, \ldots, N$; $t = 1, \ldots, T$, (1.1)
where $i$ represents individuals, households, firms, countries, etc., and $t$ represents time; $\alpha$ is a scalar parameter, $\beta$ is a $K \times 1$ vector of parameters, and $X_{it}$ is the $it$-th observation on $K$ explanatory variables.
• If the disturbances $\{u_{it}\}$ are independent and identically distributed (iid), then Model (1.1) is no different from a regular multiple linear regression model.
• If $u_{it} = \mu_i + v_{it}$, where $\mu_i$ denotes the unobservable individual-specific effects and $v_{it}$ are the remainder disturbances (idiosyncratic errors), which are iid across $i$ and $t$, then Model (1.1) is called the one-way error component model.

Basic Panel Data Models
• If further $u_{it} = \mu_i + \lambda_t + v_{it}$, where $\lambda_t$ denotes the unobservable time-specific effects, then Model (1.1) is called the two-way error component model.
• The $\mu_i$ is time-invariant and could be the individual’s unobserved ability (innate ability); the $\lambda_t$ is individual-invariant and could be the unobserved macroeconomic shock at time $t$.
• The individual and time effects $\mu_i$ and $\lambda_t$ could be correlated with the time-varying regressors $X_{it}$ in an arbitrary manner. If this is the case, $\mu_i$ and $\lambda_t$ have to be treated as unknown parameters, giving rise to a panel data model called the fixed effects model;
• Otherwise, if $\mu_i$ and $\lambda_t$ are uncorrelated with $X_{it}$, then they are treated as iid random variables, giving the random effects model.
• Thus, we could have (i) the one-way fixed effects model, (ii) the one-way random effects model, (iii) the two-way fixed effects model, and (iv) the two-way random effects model. The fixed vs random effects specification is an important issue in panel data modelling.
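The distinction between the fixed and random effects specifications can be illustrated with a small simulation. The sketch below is illustrative only: the data-generating process, the seed, and the names id, t, mu, x, y are all assumptions introduced here. It draws a one-way error component model in which the regressor is correlated with the individual effect, then fits both specifications with xtreg.

```stata
* Illustrative simulation (all names and parameter values are assumptions)
clear
set seed 12345
set obs 50                          // N = 50 units
generate id = _n
generate mu = rnormal()             // individual-specific effect
expand 5                            // T = 5 periods per unit
bysort id: generate t = _n
generate x = 0.5*mu + rnormal()     // regressor correlated with the individual effect
generate y = 1 + 2*x + mu + rnormal()   // true slope on x is 2
xtset id t
xtreg y x, fe                       // one-way fixed effects (within) estimator
xtreg y x, re                       // one-way random effects (GLS) estimator
```

Because x is built to be correlated with the individual effect, the fixed effects slope should stay close to 2, while the random effects estimate is expected to drift away from it.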

STATA Basics
We are using Stata/SE 15 for Windows for this course. Other software such as R and Matlab can be used, but on your own.
• For new Stata users, we suggest entering Stata by clicking on the Stata icon, opening one of the Stata example data sets, and doing some basic statistical analysis. To use the menus:
  • Select File > Example datasets... .
  • Click on Example datasets installed with Stata.
  • Click on describe for auto.dta (for descriptions of variables).
  • Click on use for auto.dta (to read the dataset into Stata).
• We can get a quick glimpse at the data by browsing them in the Data Editor.
  • This can be done by clicking on the Data Editor (Browse) button,
  • or by selecting Data > Data Editor > Data Editor (Browse) from the menus,
  • or by simply typing the command browse in the command window.
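The menu steps above have command equivalents. The short sketch below is intended for a do-file; it assumes the example data sets shipped with Stata are installed, which is the default.

```stata
sysdescribe auto.dta    // describe the shipped dataset without loading it
sysuse auto, clear      // read auto.dta into memory
browse                  // open the Data Editor in browse mode
```

Typing commands rather than clicking makes the session easy to reproduce later.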

STATA Basics sysdescribeauto.dta Contains data 1978 Automobile Data obs: 74 13 Apr 2016 17:45 vars: 12 size: 3,478 ----------------------------------------------------------------------- storage display value variable name type format label variable label ----------------------------------------------------------------------- make str18 %-18s Make and Model price int %8.0gc Price mpg int %8.0g Mileage (mpg) rep78 int %8.0g Repair Record 1978 headroom float %6.1f Headroom (in.) trunk int %8.0g Trunk space (cu. ft.) weight int %8.0gc Weight (lbs.) length int %8.0g Length (in.) turn int %8.0g Turn Circle (ft.) displacement int %8.0g Displacement (cu. in.) gear_ratio float %6.2f Gear Ratio foreign byte %8.0g origin Car type ----------------------------------------------------------------------- Sorted by: foreign


STATA Basics
• The auto.dta file is a cross-section data set, and the key Stata commands for analyzing cross-section data are
  • browse: viewing the data,
  • describe: describing the data,
  • summarize: summarizing the cross-section data,
  • regress: performing linear regression of a response variable on a set of explanatory variables.
• For more information, see the Stata manual gsw.pdf:
  • click Help > PDF documentation > [GS] Getting started > [GSW] Getting started with Stata for Windows,
  • or go to the folder Program Files (x86) in the C: drive,
  • locate the folder Stata15 > docs, and find the file gsw.pdf.
• The most useful manual is [U] User’s Guide, or the file u.pdf.

STATA Basics
. describe
Contains data from C:\Program Files (x86)\Stata15\ado\base/a/auto.dta
  obs:            74                          1978 Automobile Data
 vars:            12                          13 Apr 2016 17:45
 size:         3,182                          (_dta has notes)
-------------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
-------------------------------------------------------------------------------
make            str18   %-18s                 Make and Model
price           int     %8.0gc                Price
mpg             int     %8.0g                 Mileage (mpg)
rep78           int     %8.0g                 Repair Record 1978
headroom        float   %6.1f                 Headroom (in.)
trunk           int     %8.0g                 Trunk space (cu. ft.)
weight          int     %8.0gc                Weight (lbs.)
length          int     %8.0g                 Length (in.)
turn            int     %8.0g                 Turn Circle (ft.)
displacement    int     %8.0g                 Displacement (cu. in.)
gear_ratio      float   %6.2f                 Gear Ratio
foreign         byte    %8.0g      origin     Car type
-------------------------------------------------------------------------------
Sorted by: foreign

STATA Basics
. summarize price mpg headroom trunk weight length turn gear_ratio
    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
       price |         74    6165.257    2949.496       3291      15906
         mpg |         74     21.2973    5.785503         12         41
    headroom |         74    2.993243    .8459948        1.5          5
       trunk |         74    13.75676    4.277404          5         23
      weight |         74    3019.459    777.1936       1760       4840
-------------+---------------------------------------------------------
      length |         74    187.9324    22.26634        142        233
        turn |         74    39.64865    4.399354         31         51
  gear_ratio |         74    3.014865    .4562871       2.19       3.89

STATA Basics regress price mpg headroom trunk weight length displacement foreign Source | SS df MS Number of obs = 74 -------------+---------------------------------- F(7, 66) = 13.25 Model | 371020030 7 53002861.4 Prob > F = 0.0000 Residual | 264045367 66 4000687.37 R-squared = 0.5842 -------------+---------------------------------- Adj R-squared = 0.5401 Total | 635065396 73 8699525.97 Root MSE = 2000.2 ------------------------------------------------------------------------------ price | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- mpg | -15.34676 70.80743 -0.22 0.829 -156.7183 126.0248 headroom | -673.1991 372.6962 -1.81 0.075 -1417.311 70.91289 trunk | 60.50331 91.91919 0.66 0.513 -123.0193 244.0259 weight | 4.596442 1.197555 3.84 0.000 2.205446 6.987438 length | -83.39048 35.7935 -2.33 0.023 -154.8545 -11.92645 displacement | 10.0928 5.946751 1.70 0.094 -1.780274 21.96587 foreign | 3764.15 664.6912 5.66 0.000 2437.051 5091.249 _cons | 6357.472 5315.07 1.20 0.236 -4254.406 16969.35 ------------------------------------------------------------------------------

STATA Basics
• For analyzing panel data,
  • click Help > PDF documentation > [XT] Longitudinal Data/Panel Data,
  • or go to the folder Program Files (x86) in the C: drive, locate the folder Stata15 > docs, and find the file xt.pdf.
• All Stata manuals can be found in Help > PDF documentation. Or go directly to the folder where Stata is installed, typically C: > Program Files (x86) > Stata15 > docs.
• The manual xt.pdf documents the xt commands and is referred to as [XT] in cross-references. If you are new to xt commands, we recommend that you read the following sections first:
  • [XT] xt: Introduction to xt commands
  • [XT] xtset: Declare a dataset to be panel data
  • [XT] xtreg: Fixed-, between-, and random-effects, and population-averaged linear models.

STATA Basics
Setup
  xtset        Declare data to be panel data
Data management and exploration tools
  xtdescribe   Describe pattern of xt data
  xtsum        Summarize xt data
  xttab        Tabulate xt data
  xtdata       Faster specification searches with xt data
  xtline       Panel-data line plots
Linear panel regression estimators
  xtreg        Fixed-, between-, and random-effects, and population-averaged linear models
  xtregar      Fixed- and random-effects models with an AR(1) disturbance
  xtgls        Fit panel-data models by using GLS
  xtpcse       Linear regression with panel-corrected standard errors
  xthtaylor    Hausman–Taylor estimator for error-components models
  xtivreg      Instrumental variables and two-stage least squares
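A typical sequence of these commands can be sketched on nlswork, the example data set used throughout the [XT] manual. This is an illustration rather than part of the course material. It assumes internet access, since webuse downloads the file from the Stata Press site, and the regressors chosen here are arbitrary.

```stata
* Assumes internet access: webuse fetches nlswork.dta from the Stata Press site
webuse nlswork, clear
xtset idcode year               // panel variable idcode, time variable year
xtdescribe                      // participation pattern (this panel is unbalanced)
xtsum ln_wage                   // overall, between, and within variation
xttab union                     // tabulate a binary variable across and within panels
xtreg ln_wage age tenure, fe    // fixed-effects (within) estimator
xtreg ln_wage age tenure, re    // random-effects (GLS) estimator
```

Running xtset first matters, because the other xt commands rely on the panel and time variables it declares.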

STATA Basics
• Other Useful Commands:
  • help: finds information for a Stata command, e.g., at the command window, type “help regress”, “help function”;
  • search: does a keyword search, and is useful if the Stata command is not exactly known, e.g., “search ols”;
  • findit: provides the broadest possible keyword search, e.g., “findit weak instr”;
  • hsearch: unlike the findit command, it uses a whole-word search, e.g., “hsearch weak instrument”.
• Arithmetic, relational, and logical operators:
  • The arithmetic operators in Stata are: + (addition), - (subtraction), * (multiplication), / (division), ^ (raised to a power), and - (negation). For example, to compute and display $-2\left(\frac{9}{8+2-7}\right)^2$, we type in the command window: display -2*(9/(8+2-7))^2, resulting in -18:
. display -2*(9/(8+2-7))^2
-18
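The slide shows only an arithmetic example. The sketch below adds a few relational and logical expressions for comparison. It is meant for a do-file, since // comments are not accepted in the interactive command window. In Stata a true expression evaluates to 1 and a false one to 0.

```stata
display 7/2                     // division: 3.5
display 2^3                     // power: 8
display 3 > 2                   // relational, true: 1
display 5 != 5                  // not equal, false: 0
display (3 > 2) & (1 == 2)      // logical and: 0
display (3 > 2) | (1 == 2)      // logical or: 1
display !(1 == 2)               // logical not: 1
```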

STATA Basics
• Matrix and Matrix Calculations:
  • The Stata manual [P] matrix, or p.pdf, summarizes the matrix commands.
  • matrix define: defining a matrix, e.g., “matrix define A = (1,2,3 \ 4,5,6)”
  • matrix list: showing the content of a matrix, e.g., “matrix list A”
  • scalar c = A[2,3]: assigning the (2,3)-element of A to a scalar c.
• Matrix Monadic Operators:
  • -B   negation
  • B'   transpose
• Matrix Dyadic Operators:
  • B \ C   add rows of C below rows of B (row join)
  • B , C   add columns of C to the right of B (column join)
  • B + C   addition
  • B - C   subtraction
  • B * C   multiplication (including mult. by scalar)
  • B / z   division by scalar
  • B # C   Kronecker product
Type help matrix in the command window to get more information on matrix manipulations.
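The operators above can be tried out on the 2 x 3 matrix A from the slide. The sketch below is a do-file fragment. The matrix names B, C, D, E and K are introduced here purely for illustration.

```stata
matrix define A = (1,2,3 \ 4,5,6)   // 2 x 3 matrix
matrix list A
scalar c = A[2,3]                   // (2,3) element of A, which is 6
display c
matrix B = A'                       // transpose, 3 x 2
matrix C = A*B                      // 2 x 2 product A*A'
matrix list C
matrix D = (A \ A)                  // row join, 4 x 3
matrix E = (A , A)                  // column join, 2 x 6
matrix K = I(2) # A                 // Kronecker product with the 2 x 2 identity, 4 x 6
matrix list K
```

Here C works out to (14, 32 \ 32, 77), a quick check that the multiplication and the transpose behave as expected.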

Course Overview
In this course, we focus on the common panel data models, and their implementations using Stata. We also provide many real data applications. Major topics include:
• Panel Data Models with One-way Effects
• Panel Data Models with Two-way Effects
• Test Hypotheses with Panel Data
• Heteroskedasticity and Serial Correlation
• Dynamic Panel Data Models
• Spatial Panel Data Models
Course Outline
