
Follow-up from Last Time

easter

  1. Follow-up from Last Time
  • Getting data on the same line (Elena’s problem)
    – Pull the variables (oldvar1 & oldvar2) out of the database
    – Create a separate dataset for each one, sorted by id, and drop the other variable
    – Merge them by id
    – Generate a third variable:
        generate newvar = oldvar1 if source==source1
        replace newvar = oldvar2 if newvar==.
  • Loops in do files
    – forvalues (ranges such as 1/20, or 10(10)1000 for 10 to 1000 in steps of 10):
        forvalues i = 1/20 {
            generate x`i' = `i'*10
        }
    – foreach over a varlist (also: of numlist, of local, or in <list>):
        foreach x of varlist <varlist> {
            <commands>
        }
    – while:
        local i = 1
        while `i' < 20 {
            <commands>
            local i = `i' + 1
        }
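The merge-then-fill logic of Elena’s problem can be sketched outside Stata. The Python below is a minimal illustration, assuming two toy datasets keyed by id (the values are hypothetical) and using None in place of Stata's missing value. It drops the `source==source1` condition and simply prefers oldvar1, falling back to oldvar2.

```python
# Toy data (hypothetical): each dataset is keyed by id and holds one variable.
# None stands in for Stata's missing value (.).
ds1 = {1: 10.0, 2: None, 3: 30.0}   # id -> oldvar1
ds2 = {1: 11.0, 2: 22.0, 4: 44.0}   # id -> oldvar2


def merge_by_id(left, right):
    """Outer merge on id: one row per id, with both variables on the same line."""
    ids = sorted(set(left) | set(right))
    return [{"id": i, "oldvar1": left.get(i), "oldvar2": right.get(i)} for i in ids]


def fill_newvar(rows):
    """newvar = oldvar1, then oldvar2 wherever newvar is still missing."""
    for row in rows:
        row["newvar"] = row["oldvar1"]          # generate newvar = oldvar1
        if row["newvar"] is None:               # replace newvar = oldvar2 if newvar==.
            row["newvar"] = row["oldvar2"]
    return rows


merged = fill_newvar(merge_by_id(ds1, ds2))
for row in merged:
    print(row)
```

Ids present in only one dataset (3 and 4 here) still get a row, with the other variable missing, which is why the fill step is needed.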

  2. Reviewing Commands
  • sort
  • describe
  • summarize
  • merge
  • collapse
  • reshape
  • correlate
  • generate, replace
  • regress
  • graph twoway
  • predict
  • test
  • mkcorr
  • outreg2
  • Other commands: set more off

  3. Debriefing the database: Means

  4. Debriefing the Database
  • What went wrong along the way?
    – Population file missing
    – Code mismatch on polcon
    – Do file won’t run
    – Creating the operator count variable
    – Missing data
    – Source: WDI, polcon, operator db
    – Other?

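  5. Linear Regression
  • Model: $Y = X\beta + \varepsilon$
  • OLS estimator: $\hat{\beta}_{OLS} = (X'X)^{-1}X'Y$
  • Derivation: premultiply the model by $X'$
    – $X'Y = X'X\beta + X'\varepsilon$
    – $X'\varepsilon = 0$ by assumption $\Rightarrow \hat{\beta}_{OLS} = (X'X)^{-1}X'Y$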
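As a check on the algebra, here is a minimal Python sketch that computes $(X'X)^{-1}X'Y$ directly on a toy dataset. The helpers `transpose`, `matmul`, and `inverse` are illustrative hand-written functions, not a library API, and the data are hypothetical. The last line shows that the fitted OLS residuals satisfy the sample analogue of $X'\varepsilon = 0$ by construction.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    # Row-by-column products; B's columns come from zip(*B).
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def inverse(A):
    """Gauss-Jordan inversion with partial pivoting (fine for small, well-conditioned X'X)."""
    n = len(A)
    M = [list(map(float, row)) + [float(i == j) for j in range(n)] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [v - f * w for v, w in zip(M[r], M[c])]
    return [row[n:] for row in M]


def ols(X, Y):
    """beta_hat = (X'X)^{-1} X'Y; X includes a column of ones for the constant."""
    Xt = transpose(X)
    return matmul(inverse(matmul(Xt, X)), matmul(Xt, Y))


x = [1, 2, 3, 4, 5]
y = [3.1, 4.9, 7.0, 9.1, 10.9]
X = [[1.0, xi] for xi in x]          # constant + one regressor
Y = [[yi] for yi in y]               # column vector
beta = ols(X, Y)                     # [[b0], [b1]]
resid = [[yi - (beta[0][0] + beta[1][0] * xi)] for xi, yi in zip(x, y)]
print(beta)                          # approximately [[1.06], [1.98]]
print(matmul(transpose(X), resid))   # X'e, approximately [[0.0], [0.0]]
```

Explicit inversion is used only to mirror the formula; numerical software generally solves the normal equations (or uses a QR decomposition) instead.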

  6. Why linear regression?
  • Good foundation for thinking about all analysis
  • Criteria for estimators:
    – unbiased: $E(\beta^*) = \beta$
    – efficient: $\sigma^2(\beta^*) \le \sigma^2(\tilde{\beta})$ for any other unbiased estimator $\tilde{\beta}$
    – asymptotic properties (consistency): $\operatorname{plim} \beta^* = \beta$
    – Monte Carlo studies for small-sample properties
  • Maximum likelihood estimation
    – Given a population distribution, which parameters of the distribution best match the observed data?
    – For a normal error term, $\hat{\beta}_{MLE} = \hat{\beta}_{OLS}$
  • R²
  • Error term
    – Many of the problems we discuss in regression are found in the assumptions concerning the error term: probability distribution, variance, correlation
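A Monte Carlo study makes the unbiasedness criterion concrete: draw many samples from a known model, estimate each time, and check whether the estimates center on the true β. The Python sketch below is an illustrative setup, assuming y = 1 + βx + e with standard normal errors, regressors held fixed across replications, n = 20, and 5000 replications (all hypothetical choices).

```python
import random
import statistics


def ols_slope(x, y):
    """Slope from a simple regression of y on x (with a constant)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def monte_carlo(beta=2.0, n=20, reps=5000, seed=42):
    """Draw many samples from y = 1 + beta*x + e, e ~ N(0, 1); return the slope estimates."""
    rng = random.Random(seed)
    x = [rng.uniform(0, 10) for _ in range(n)]   # regressors fixed across replications
    estimates = []
    for _ in range(reps):
        y = [1.0 + beta * xi + rng.gauss(0.0, 1.0) for xi in x]
        estimates.append(ols_slope(x, y))
    return estimates


est = monte_carlo()
# The mean of the estimates should sit close to the true beta (2.0);
# their standard deviation approximates the estimator's sampling spread.
print(statistics.fmean(est), statistics.stdev(est))
```

Changing the error distribution or shrinking n in this setup is how one would probe small-sample properties that the asymptotic results do not cover.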

  7. Assumptions of the Classic Linear Model

  8. More general frameworks build from the linear model
  • (Feasible) Generalized Least Squares: GLS or FGLS
    – Weighted least squares, weighting by the inverse of the error variance/covariance matrix; FGLS uses an estimate of that matrix
    – Stata: reg3 or xtgls
  • Generalized Linear Model: GLM
    – $g\{E(y)\} = x\beta$, $y \sim F$
    – $g\{\}$ is the link function; $F$ is the distribution family
    – Classical model with normal errors: $g\{\}$ is the identity and $y \sim$ Normal
    – Alternatives:
        g{}: logarithmic, logit, probit, complementary log-log, negative binomial
        F: normal, binomial, Poisson, negative binomial, gamma
    – Stata: glm or xtgee
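To make the link/family idea concrete, here is a minimal Python sketch of one GLM case, a log link with a Poisson family, fitted by iteratively reweighted least squares (IRLS), one standard fitting method for GLMs. The function name and the data are hypothetical; in Stata the same model would be requested with `glm y x, family(poisson) link(log)`.

```python
import math


def poisson_glm(x, y, iters=25):
    """Fit log E(y) = b0 + b1*x with a Poisson family by IRLS (one regressor plus constant)."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0     # start at the intercept-only fit
    for _ in range(iters):
        eta = [b0 + b1 * xi for xi in x]          # linear predictor x*beta
        mu = [math.exp(e) for e in eta]           # inverse link: mu = exp(eta)
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]  # working response
        w = mu                                    # IRLS weights for Poisson with log link
        # Weighted least squares of z on (1, x): solve the 2x2 normal equations.
        s0 = sum(w)
        s1 = sum(wi * xi for wi, xi in zip(w, x))
        s2 = sum(wi * xi * xi for wi, xi in zip(w, x))
        t0 = sum(wi * zi for wi, zi in zip(w, z))
        t1 = sum(wi * xi * zi for wi, xi, zi in zip(w, x, z))
        det = s0 * s2 - s1 * s1
        b0, b1 = (s2 * t0 - s1 * t1) / det, (s0 * t1 - s1 * t0) / det
    return b0, b1


x = [0, 1, 2, 3, 4, 5]
y = [1, 2, 2, 4, 6, 9]          # hypothetical counts
b0, b1 = poisson_glm(x, y)
print(b0, b1)
```

Swapping the identity link and constant weights into the same loop reduces it to ordinary least squares, which is the sense in which the classical model is a special case.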

  9. Rest of class: homework
  • Discuss missing data: how might it affect your analysis? What do you know about the differences between the known values and the missing values?
  • Create a categorical variable for polcon
    – polcon_hi = 1 if polcon is greater than the median, 0 otherwise
  • Scatter plot of mobile_subs against polcon_hi
    – Add a regression line to the scatter plot
  • Scatter plot of mobile_subs against GNI per capita (gnipercap)
    – Add a quadratic line
    – Add a confidence interval to the quadratic line
  • Create a lagged variable for mobile subs
  • Build a regression model for mobile_subs
    – Start with one variable and build to the full model
    – How does the output change? In the final analysis, which variable would you want to start with? End with?
    – Are there any variables that should not be included?
    – Which variables have a meaningful effect?
    – Which variable seems to increase the R² the most?
    – Which variable would make the most sense to include with a nonlinear effect?
  • Diagnostics
    – Graph residuals
    – Test for equal variance
    – Graph the marginal effect of each variable
    – Graph predicted y for the range of population
    – Choose two coefficients and test that they are different from one another
  • Create a correlation table and a regression table with your results
  • Hand in: correlation & regression tables, graphs of marginal effects, written answers to the questions above

  10. Missing Data
  • summarize
  • Compare: pick the most incomplete variable
    – Take a relatively complete descriptive variable, such as pop or GDP
    – Test whether its mean differs between observations where the incomplete variable is defined and where it is missing
  • sort & browse
    – Examine observations for differences where the variable is missing
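The comparison in the second bullet amounts to a two-sample test on the complete variable, split by whether the incomplete variable is missing. A minimal Python sketch, assuming toy values for pop and a hypothetical incomplete variable, with a pooled-variance t statistic. In Stata, something like `gen miss = missing(var)` followed by `ttest pop, by(miss)` would do this, where `miss` and `var` are placeholder names.

```python
import math
import statistics

# Hypothetical rows: pop is complete; the incomplete variable uses None for missing.
pop        = [5.0, 12.0, 7.5, 30.0, 2.0, 45.0, 8.0, 60.0]
incomplete = [1.2, None, 0.8, None, 1.5, None, 1.1, None]


def compare_by_missingness(complete, incomplete):
    """Mean of `complete` where `incomplete` is defined vs missing, plus a pooled two-sample t."""
    defined = [c for c, v in zip(complete, incomplete) if v is not None]
    missing = [c for c, v in zip(complete, incomplete) if v is None]
    n1, n2 = len(defined), len(missing)
    m1, m2 = statistics.fmean(defined), statistics.fmean(missing)
    v1, v2 = statistics.variance(defined), statistics.variance(missing)
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)   # equal-variance assumption
    t = (m1 - m2) / math.sqrt(pooled * (1 / n1 + 1 / n2))
    return m1, m2, t, n1 + n2 - 2


m1, m2, t, df = compare_by_missingness(pop, incomplete)
print(m1, m2, t, df)   # 5.625 36.75, t is about -3.0, df = 6
```

A large gap like this one in the toy data would suggest the missing observations are not a random subset (here, they are the high-population ones).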

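  11. Categorical Variable
  • Where is the median stored?
    – summarize polcon, detail
    – r(p50) gives the median (other stored results include r(N), r(mean), r(max), r(Var)); plain summarize without detail does not store r(p50)
    – gen polcon_hi = 0 if !missing(polcon)
    – replace polcon_hi = 1 if polcon > r(p50) & !missing(polcon)
    – The !missing() guard matters: Stata treats missing (.) as larger than any number, so polcon > r(p50) is true for missing polcon
  • scatter mobile_subs polcon_hi
    – Why doesn’t this look great?
    – jitter: scatter mobile_subs polcon_hi, jitter(5)
  • Add two lines:
    – scatter mobile_subs polcon_hi || lfit mobile_subs polcon_hi
    – scatter mobile_subs polcon_hi || lfitci mobile_subs polcon_hi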
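The missing-value guard can be illustrated with a small Python sketch: compute the median over non-missing values only and leave missing observations missing rather than coding them 1. The polcon values here are hypothetical, and None plays the role of Stata's missing value.

```python
import statistics

polcon = [0.10, None, 0.45, 0.30, 0.72, None, 0.05]   # hypothetical values; None = missing


def median_split(values):
    """1 if above the median of the non-missing values, 0 otherwise, None if missing."""
    med = statistics.median([v for v in values if v is not None])
    return [None if v is None else int(v > med) for v in values]


polcon_hi = median_split(polcon)
print(polcon_hi)   # [0, None, 1, 0, 1, None, 0]
```

Observations exactly at the median (0.30 here) fall in the 0 group, matching the strict "greater than median" rule on the slide.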

  12. Graph quadratic fit & confidence intervals
  • scatter mobile_subs gnipercap
  • Add a quadratic line:
    – || qfit mobile_subs gnipercap
    – || qfitci mobile_subs gnipercap (quadratic fit with confidence interval)
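What `qfit` draws is the OLS fit of y on a constant, x, and x². A minimal Python sketch of that fit via the normal equations; the `solve` and `quadratic_fit` helpers and the toy data are hypothetical, and the confidence band from `qfitci` is not reproduced.

```python
def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [float(bi)] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            M[r] = [v - f * w for v, w in zip(M[r], M[c])]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):          # back substitution
        z[r] = (M[r][n] - sum(M[r][k] * z[k] for k in range(r + 1, n))) / M[r][r]
    return z


def quadratic_fit(x, y):
    """OLS of y on (1, x, x^2): returns [a, b, c] for y = a + b*x + c*x^2."""
    rows = [[1.0, xi, xi * xi] for xi in x]
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    Xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
    return solve(XtX, Xty)


x = [0, 1, 2, 3, 4, 5, 6]
y = [1 + 0.5 * xi + 0.2 * xi ** 2 for xi in x]   # exact quadratic, for checking
coef = quadratic_fit(x, y)
print(coef)   # approximately [1.0, 0.5, 0.2]
```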

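  13. Lagged variable
  • Start with wdi_mobile
  • Easy lag: rename the Y2001 variable to mobile_lag
  • Hard lag (often necessary):
    – reshape long
    – sort id year
    – by id: gen mobilesubs_lag = mobilesubs[_n-1] (without by id:, the first year of each country picks up the previous country's last value; this also assumes consecutive years within each id)
    – keep if year==2002
    – keep id mobilesubs_lag
    – merge into the database by id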
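The panel-boundary issue can be seen in a small Python sketch of a within-id lag. It assumes a hypothetical long-format list of (id, year, mobilesubs) tuples and, like Stata's `L.` operator after `xtset id year`, also returns missing when the previous year is absent (a gap), which plain `[_n-1]` would not do.

```python
# Hypothetical long-format panel: (id, year, mobilesubs), deliberately unsorted.
panel = [(2, 2002, 40.0), (1, 2001, 10.0), (2, 2001, 35.0), (1, 2002, 15.0), (3, 2002, 5.0)]


def lag_within_id(rows):
    """Previous year's value for the same id; None at each id's first year or across a gap."""
    out = []
    prev = None
    for id_, year, value in sorted(rows):                 # sort id year
        same_panel = prev is not None and prev[0] == id_ and prev[1] == year - 1
        out.append((id_, year, value, prev[2] if same_panel else None))
        prev = (id_, year, value)
    return out


lagged = lag_within_id(panel)
# keep if year==2002, then keep id and the lag
lag_2002 = {id_: lag for id_, year, _, lag in lagged if year == 2002}
print(lag_2002)   # {1: 10.0, 2: 35.0, 3: None}
```

Without the same-id check, id 3 would inherit id 2's 2002 value (40.0) as its "lag".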

  14. Regression
  • regress mobile_subs gdp pop gnipercap tel polcon ops
  • Graph residuals
    – rvfplot (residuals vs. fitted values), rvpplot <varname> (residuals vs. a predictor)
  • Test for equal variance
    – estat hettest
  • Test for omitted variables
    – estat ovtest
  • Robust estimation:
    – “White-Huber heteroskedasticity-consistent estimator”, “sandwich estimator”, “White-washing the data”
    – regress <outcome variable> <explanatory variables>, vce(robust)
  • Graph the added effect of each variable
    – avplots (added-variable plots)
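The sandwich idea behind `vce(robust)` can be sketched for a one-regressor model: the slope's variance is $\sum (x_i-\bar{x})^2 e_i^2 / S_{xx}^2$ (HC0), scaled by $n/(n-k)$ (HC1), which is the small-sample adjustment Stata applies for `regress, vce(robust)`. A minimal Python sketch with hypothetical data, compared against the classical standard error:

```python
import math
import statistics


def slope_ses(x, y):
    """Slope of y on (1, x) with classical and HC1 ('robust') standard errors."""
    n = len(x)
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    e = [yi - a - b * xi for xi, yi in zip(x, y)]            # OLS residuals
    s2 = sum(ei ** 2 for ei in e) / (n - 2)                  # classical error variance
    se_classical = math.sqrt(s2 / sxx)
    # Sandwich (HC0) variance of the slope, then the n/(n-k) scaling (HC1) with k = 2.
    hc0 = sum(((xi - mx) * ei) ** 2 for xi, ei in zip(x, e)) / sxx ** 2
    se_robust = math.sqrt(hc0 * n / (n - 2))
    return b, se_classical, se_robust


x = [1, 2, 3, 4, 5]
y = [5, 4, 5, 8, 13]      # hypothetical: residuals are larger at the extremes of x
b, se_c, se_r = slope_ses(x, y)
print(b, se_c, se_r)      # 2.0, about 0.683 (classical), about 0.753 (robust)
```

With residuals largest where x is far from its mean, the robust standard error exceeds the classical one; if every residual had the same magnitude, the two would coincide.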

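  15. Post-estimation
  • predict
    – predict yhat (fitted values after regress)
  • estimates
    – estimates store <name>: store output for later analysis, e.g., for a Hausman test (hausman)
  • test
    – simple and composite Wald tests
  • lrtest (likelihood-ratio tests)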
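For the homework item on testing whether two coefficients differ, the Wald statistic for $H_0: \beta_1 = \beta_2$ is $(b_1-b_2)^2 / [\mathrm{Var}(b_1) + \mathrm{Var}(b_2) - 2\,\mathrm{Cov}(b_1,b_2)]$. A minimal Python sketch using hypothetical estimates and (co)variances. It uses the large-sample $\chi^2(1)$ p-value, whereas Stata's `test x1 = x2` after `regress` reports the same statistic as an F(1, residual df).

```python
import math


def wald_equal(b1, b2, v11, v22, v12):
    """Wald statistic and chi2(1) p-value for H0: b1 == b2, given coefficient (co)variances."""
    diff = b1 - b2
    var_diff = v11 + v22 - 2.0 * v12          # Var(b1 - b2)
    w = diff ** 2 / var_diff
    p = math.erfc(math.sqrt(w / 2.0))         # upper tail of chi-square with 1 df
    return w, p


# Hypothetical estimates: b1 = 0.8, b2 = 0.3, Var(b1) = 0.04, Var(b2) = 0.01, Cov = 0.005
w, p = wald_equal(0.8, 0.3, 0.04, 0.01, 0.005)
print(w, p)   # about 6.25 and 0.0124
```

Ignoring the covariance term would overstate Var(b1 - b2) here (0.05 instead of 0.04) and make the test too conservative.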

  16. Making tables
  • Correlation table: mkcorr
  • Regression table: outreg2
  • Both are user-written commands; install with ssc install mkcorr and ssc install outreg2
