Censoring

ECON 6002 Econometrics
Memorial University of Newfoundland
Limited Dependent Variable Models: Censoring
Adapted from Vera Tabakova’s notes

Censoring, Truncation, sample selection and related models
• We now consider two closely related models:
  • regression when the dependent variable of interest is incompletely observed (due to censoring or truncation)
  • regression when the dependent variable is completely observed but is observed in a selected sample that is not representative of the population
Principles of Econometrics, 3rd Edition, Slide 16-2

Censoring, Truncation, sample selection and related models
OLS regression yields inconsistent estimates because the sample is not representative of the population.
The first-generation estimation methods require strong distributional assumptions, and even seemingly minor departures from those assumptions, such as heteroskedasticity, can lead to inconsistency.

16.7 Limited Dependent Variables
• 16.7.1 Censored Data
Figure 16.3 Histogram of Wife’s Hours of Work in 1975

16.7.1 Censored Data
Having censored data means that a substantial fraction of the observations on the dependent variable take a limit value. The regression function is no longer given by (16.30). The least squares estimators of the regression parameters obtained by running a regression of y on x are biased and inconsistent: least squares estimation fails.

Censoring versus Truncation
Censoring occurs when some of the observations of the dependent variable have been recorded as having reached a limit value, regardless of what their actual value might be.
For instance, anyone earning $1 million or more per year might be recorded in your dataset at the upper limit of $1 million.

Censoring versus Truncation
With truncation, we only observe the values of the regressors when the dependent variable takes a certain value (usually a positive one instead of zero).
With censoring, we observe in principle the values of the regressors for everyone, but not the value of the dependent variable for those whose dependent variable lies beyond the limit.

16.7.2 A Monte Carlo Experiment
We give the parameters of the latent variable model specific values and assume that the errors ei are normally distributed with mean 0 and variance 16.

16.7.2 A Monte Carlo Experiment
• Create N = 200 random values of xi that are spread evenly (or uniformly) over the interval [0, 20]. These we will keep fixed in further simulations.
• Obtain N = 200 random values ei from a normal distribution with mean 0 and variance 16.
• Create N = 200 values of the latent variable yi*.
• Obtain N = 200 values of the observed yi by censoring the latent variable at zero: yi = 0 if yi* ≤ 0, and yi = yi* if yi* > 0.
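The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the parameter values β1 = -9 and β2 = 1 are assumed here (the specific values used on the slide are not shown), the censoring point is taken to be zero, and the helper names `simulate_censored_sample` and `ols` are made up for this example. For simplicity the sketch draws one sample only, rather than holding x fixed across many replications.

```python
import random

def simulate_censored_sample(n=200, beta1=-9.0, beta2=1.0, sigma=4.0, seed=12345):
    """Return (x, y_star, y) for one artificial sample censored from below at zero.

    beta1 and beta2 are assumed illustrative values; sigma = 4 gives variance 16.
    """
    rng = random.Random(seed)
    x = [rng.uniform(0.0, 20.0) for _ in range(n)]       # regressor, uniform on [0, 20]
    e = [rng.gauss(0.0, sigma) for _ in range(n)]        # normal errors, mean 0, variance 16
    y_star = [beta1 + beta2 * xi + ei for xi, ei in zip(x, e)]  # latent variable
    y = [max(0.0, ys) for ys in y_star]                  # observed variable, censored at zero
    return x, y_star, y

def ols(x, y):
    """Intercept and slope from a simple least squares regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

x, y_star, y = simulate_censored_sample()
positive = [(xi, yi) for xi, yi in zip(x, y) if yi > 0]

print("latent data           :", ols(x, y_star))
print("censored, zeros kept  :", ols(x, y))
print("positive values only  :", ols([p[0] for p in positive], [p[1] for p in positive]))
```

Comparing the three fitted lines shows how far least squares on the censored data drifts from the line that generated the latent data.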

16.7.2 A Monte Carlo Experiment
Figure 16.4 Uncensored Sample Data and Regression Function

16.7.2 A Monte Carlo Experiment
Figure 16.5 Censored Sample Data, and Latent Regression Function and Least Squares Fitted Line

16.7.2 A Monte Carlo Experiment

16.7.3 Maximum Likelihood Estimation
The maximum likelihood procedure is called Tobit in honor of James Tobin, winner of the 1981 Nobel Prize in Economics, who first studied this model.
The probit probability that yi = 0 is:
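For reference, the probability and likelihood can be written out in the standard textbook Tobit form. This is a reconstruction under the assumptions of a latent model y_i^* = \beta_1 + \beta_2 x_i + e_i with normal errors of variance \sigma^2 and censoring from below at zero; the slide's own notation may differ.

```latex
P(y_i = 0) = P(y_i^* \le 0) = 1 - \Phi\!\left(\frac{\beta_1 + \beta_2 x_i}{\sigma}\right)

L(\beta_1, \beta_2, \sigma) =
  \prod_{y_i = 0} \left[ 1 - \Phi\!\left(\frac{\beta_1 + \beta_2 x_i}{\sigma}\right) \right]
  \times
  \prod_{y_i > 0} \frac{1}{\sigma}\,\phi\!\left(\frac{y_i - \beta_1 - \beta_2 x_i}{\sigma}\right)
```

Here \Phi and \phi are the standard normal cdf and pdf. The first product covers the limit observations and the second covers the observations with positive y.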

16.7.3 Maximum Likelihood Estimation
The maximum likelihood estimator is consistent and asymptotically normal, with a known covariance matrix.
Using the artificial data the fitted values are:
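As a rough illustration of what the maximum likelihood procedure maximises, the sketch below evaluates a Tobit log-likelihood on artificial data. It assumes one regressor, censoring from below at zero, and illustrative parameter values β1 = -9, β2 = 1, σ = 4 (the first two are assumptions, not values taken from the slides). The function name `tobit_loglik` is hypothetical, and no optimiser is included: any numerical maximiser could be applied to this function.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution

def tobit_loglik(beta1, beta2, sigma, x, y):
    """Log-likelihood of a one-regressor Tobit model censored from below at zero."""
    total = 0.0
    for xi, yi in zip(x, y):
        index = beta1 + beta2 * xi
        if yi <= 0.0:
            # limit observation: P(y = 0) = 1 - Phi(index / sigma) = Phi(-index / sigma)
            prob_zero = STD_NORMAL.cdf(-index / sigma)
            total += math.log(max(prob_zero, 1e-300))  # guard against log(0)
        else:
            # non-limit observation: log of the normal density of y around the index
            z = (yi - index) / sigma
            total += -0.5 * z * z - 0.5 * math.log(2.0 * math.pi) - math.log(sigma)
    return total

# Artificial data generated with the assumed values beta1 = -9, beta2 = 1, sigma = 4
rng = random.Random(2024)
x = [rng.uniform(0.0, 20.0) for _ in range(200)]
y = [max(0.0, -9.0 + xi + rng.gauss(0.0, 4.0)) for xi in x]

ll_true = tobit_loglik(-9.0, 1.0, 4.0, x, y)   # at the values that generated the data
ll_flat = tobit_loglik(-2.0, 0.5, 4.0, x, y)   # at a flatter, attenuated line
print(ll_true, ll_flat)
```

The log-likelihood is higher at the generating values than at the flatter line, which is the sense in which maximum likelihood recovers the latent regression where least squares does not.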

16.7.3 Maximum Likelihood Estimation

16.7.4 Tobit Model Interpretation
Because the cdf values are positive, the sign of the coefficient does tell the direction of the marginal effect, just not its magnitude.
If β2 > 0, as x increases the cdf function approaches 1, and the slope of the regression function approaches that of the latent variable model.
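The reasoning above follows from the standard expression for the mean of a variable censored from below at zero. As a sketch, assuming normal errors and writing z = (\beta_1 + \beta_2 x)/\sigma:

```latex
E(y \mid x) = \Phi(z)\,(\beta_1 + \beta_2 x) + \sigma\,\phi(z)

\frac{\partial E(y \mid x)}{\partial x}
  = \beta_2\,\Phi(z) + \frac{\beta_2}{\sigma}\,\phi(z)\,(\beta_1 + \beta_2 x) - \beta_2\, z\,\phi(z)
  = \beta_2\,\Phi(z)
```

The last two terms cancel because (\beta_1 + \beta_2 x)/\sigma = z, leaving the coefficient scaled by a cdf value between 0 and 1. With \beta_2 > 0, z grows with x, so \Phi(z) approaches 1 and the slope approaches \beta_2.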

16.7.4 Tobit Model Interpretation
Figure 16.6 Censored Sample Data, and Regression Functions for Observed and Positive y values (curves shown: uncensored mean, truncated mean, censored mean)
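The three regression functions named in the figure can be computed directly under normality. The sketch below uses the standard formulas for a model censored from below at zero; the parameter values (β1 = -9, β2 = 1, σ = 4) are assumed for illustration, and the function name `tobit_means` is hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution

def tobit_means(x, beta1, beta2, sigma):
    """Return the uncensored, truncated and censored means at a given x."""
    index = beta1 + beta2 * x          # mean of the latent variable y*
    z = index / sigma
    cdf = STD_NORMAL.cdf(z)            # probability that y is positive
    pdf = STD_NORMAL.pdf(z)
    uncensored = index                        # E(y* | x)
    truncated = index + sigma * pdf / cdf     # E(y | x, y > 0); needs cdf > 0
    censored = cdf * index + sigma * pdf      # E(y | x), zeros included
    return uncensored, truncated, censored

# Assumed illustrative values: beta1 = -9, beta2 = 1, sigma = 4
for x in (0.0, 5.0, 10.0, 15.0, 20.0):
    u, t, c = tobit_means(x, -9.0, 1.0, 4.0)
    print(f"x = {x:4.1f}  uncensored = {u:7.3f}  truncated = {t:6.3f}  censored = {c:6.3f}")
```

At every x the censored mean lies between the uncensored and truncated means, and all three converge as x grows and censoring becomes rare.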

16.7.5 An Example
26.66 is the marginal effect on the observed hours, while 73.29 is the effect on the underlying “unconditional” hours*
*NB: in all cases the expectation is conditional on the values of the regressors, so do not get confused by the terminology here
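If the marginal effect on observed hours is the Tobit coefficient scaled by a cdf value, as the interpretation slides describe, then the two numbers above imply the scale factor. A quick check (the variable names are made up for this illustration):

```python
# Both figures are taken from the slide; the ratio is the implied scale factor.
tobit_coefficient = 73.29   # effect on the underlying (latent) hours
observed_effect = 26.66     # effect on the observed hours, zeros included

implied_scale = observed_effect / tobit_coefficient
print(f"implied cdf scale factor: {implied_scale:.4f}")
```

A scale factor of roughly 0.36 means that only about a third of the latent effect carries over to observed hours.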

16.7.5 An Example

Postestimation and interpretation
• Estimating the model by OLS with the zero observations in the model would reduce all of the slope coefficients substantially
• Eliminating the zero observations, as in the OLS regression just shown, even reverses the sign of the effect of years of schooling (though it is a non-significant effect)
• For only women in the labor force, more schooling has no effect on hours worked
• If you consider the entire population of women, however, more schooling does increase hours, but we can now see that it is likely by encouraging more women into the labor force, not by encouraging those already in the market to work more hours

For Stata commands that help with the complex marginal effects calculations in this chapter, see:
• http://www.stata.com/support/faqs/statistics/mfx-after-ologit/#intreg
• There are several marginal effects of potential interest after -tobit-:
  • the marginal effect on the expected value of the latent dependent variable (on E(y*), simply given by the Tobit estimate)
  • the marginal effect on the expected value of the dependent variable conditional on its being greater than the lower limit (on E(y|x, y>0) = E(y*|x, y>0))
  • the marginal effect on the expected value of the observed (that is, zeros included) dependent variable (on E(y|x), given by Expression 16.35)
  • the marginal effect on the probability of the dependent variable exceeding the lower limit

For Stata commands that help with the complex marginal effects calculations in this chapter, see:
http://www.stata.com/support/faqs/statistics/mfx-after-ologit/#intreg
By default, Stata reports the marginal effects on the latent variable, which are exactly the same as the coefficients estimated by -tobit-.
You will have to specify the -predict()- option in -mfx- to get the other marginal effects.
See -help mfx- and -help tobit postestimation-

For Stata commands that help with the complex marginal effects calculations in this chapter, see:
• http://www.stata.com/support/faqs/statistics/mfx-after-ologit/#intreg
• the marginal effect on the expected value of the latent dependent variable (on E(y*), simply given by the Tobit estimate)
• the marginal effect on the expected value of the dependent variable conditional* on its being uncensored, that is, greater than the lower limit (on E(y|x, y>0) = E(y*|x, y>0))
  • mfx compute, predict(e(0,.))
  • mfx compute, predict(e(a,b))
*NB: in all cases the expectation is conditional on the values of the regressors, so do not get confused by the terminology here

For Stata commands that help with the complex marginal effects calculations in this chapter, see:
• http://www.stata.com/support/faqs/statistics/mfx-after-ologit/#intreg
• the marginal effect on the expected value of the observed (that is, zeros included) dependent variable (on E(y|x), given by Expression 16.35)
  • mfx compute, predict(ys(0,.))
  • mfx compute, predict(ys(a,b))
• the marginal effect on the probability of the dependent variable exceeding the lower limit
  • mfx compute, predict(pr(0,.))
  • mfx compute, predict(pr(a,b))
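To see how the four marginal effects listed in these slides relate to one another, the sketch below computes each of them from the standard normal-theory formulas for a model censored from below at zero. It is a Python illustration, not a reproduction of what Stata does internally; the function name `tobit_marginal_effects` and the input values are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution

def tobit_marginal_effects(index, beta, sigma):
    """Marginal effects of one regressor, evaluated at a given index value.

    index is the fitted latent mean at the chosen regressor values,
    beta is that regressor's Tobit coefficient, sigma the error std. deviation.
    """
    z = index / sigma
    cdf = STD_NORMAL.cdf(z)
    pdf = STD_NORMAL.pdf(z)
    lam = pdf / cdf  # inverse Mills ratio
    return {
        "latent": beta,                                # effect on E(y*)
        "conditional": beta * (1.0 - lam * (z + lam)), # effect on E(y | x, y > 0)
        "observed": beta * cdf,                        # effect on E(y | x), zeros included
        "probability": beta * pdf / sigma,             # effect on P(y > 0)
    }

# Hypothetical inputs chosen only for illustration
effects = tobit_marginal_effects(index=1.0, beta=1.0, sigma=4.0)
for name, value in effects.items():
    print(f"{name:12s} {value:.4f}")
```

Both the conditional and the observed effects are smaller in magnitude than the latent effect, which is why the Tobit coefficient alone overstates the change in observed outcomes.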

Interval regression
• Interval data are data recorded in intervals rather than as a continuous variable
• Survey data are often collected in this way to make it easier for the respondent and to provide greater anonymity in responses to more personal questions, such as those on income and age
• Income is often reported in intervals of $10,000 and then top-coded at a figure like $100,000 or $130,000
• In contingent valuation studies, questions that elicit willingness to pay sometimes ask respondents to choose an interval
Such data are then censored at multiple points, with the observed data y being only the particular interval in which the unobserved y* lies

Interval regression
• Interval data are data recorded in intervals rather than as a continuous variable
• In these cases you have a multi-censored dependent variable

Interval regression
• Interval data are data recorded in intervals rather than as a continuous variable
• Stata’s intreg will help with this model

Interval regression
• Interval data are data recorded in intervals rather than as a continuous variable
• In contingent valuation studies, double-bounded dichotomous-choice questions are sometimes used to elicit willingness to pay
• In these cases you have a doubly-censored dependent variable with two variable limits
• Stata’s intreg will help with this model

Interval regression
• Interval data are data recorded in intervals rather than as a continuous variable
• You are probably guessing that another (less flexible) way to model these cases is by using an ordered regression model
• The ordered probit in particular would be quite close to the interval regression model

Interval regression
• Interval data are data recorded in intervals rather than as a continuous variable
• Stata’s intreg will help with this model
• Example: http://www.ats.ucla.edu/stat/stata/dae/intreg.htm

Interval regression
• Stata’s intreg will help with this model
• intreg depvar1 depvar2 [indepvars] [if] [in] [weight] [, options]
• By choosing depvar1 and depvar2 appropriately, you can also fit other models:
Type of data                       depvar1  depvar2
----------------------------------------------
point data             a = [a,a]   a        a
interval data            [a,b]     a        b
left-censored data     (-inf,b]    .        b
right-censored data    [a,inf)     a        .
----------------------------------------------
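The table above maps naturally onto the likelihood contribution of each observation. The Python sketch below is one way to write that mapping, assuming normally distributed errors. It is not Stata's code; the function name `interval_loglik_contribution` is hypothetical, and `None` plays the role of Stata's missing value (.).

```python
import math
from statistics import NormalDist

def interval_loglik_contribution(lower, upper, mu, sigma):
    """Log-likelihood contribution of one observation with limits (lower, upper).

    lower / upper are the two dependent variables; None marks a missing limit.
    mu is the fitted mean for the observation, sigma the error std. deviation.
    """
    dist = NormalDist(mu, sigma)
    if lower is not None and upper is not None:
        if lower == upper:
            return math.log(dist.pdf(lower))                    # point data: a = [a, a]
        return math.log(dist.cdf(upper) - dist.cdf(lower))      # interval data: [a, b]
    if upper is not None:
        return math.log(dist.cdf(upper))                        # left-censored: (-inf, b]
    if lower is not None:
        return math.log(1.0 - dist.cdf(lower))                  # right-censored: [a, inf)
    raise ValueError("at least one of the two limits must be observed")

# Hypothetical observations with fitted mean 50 and sigma 10
print(interval_loglik_contribution(40.0, 60.0, 50.0, 10.0))   # interval
print(interval_loglik_contribution(None, 50.0, 50.0, 10.0))   # left-censored
print(interval_loglik_contribution(100.0, None, 50.0, 10.0))  # right-censored (top-coded)
print(interval_loglik_contribution(55.0, 55.0, 50.0, 10.0))   # point
```

Summing these contributions over all observations gives a log-likelihood that covers point, interval, and censored data in one model. A Tobit model censored at zero is the special case with point data for positive values and left-censoring at 0 for the rest.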

Keywords
• binary choice models
• censored data
• latent variables
• likelihood function
• limited dependent variables
• log-likelihood function
• marginal effect
• maximum likelihood estimation
• multinomial choice models
• ordered choice models
• ordered probit
• ordinal variables
• probit
• tobit model
• truncated data

Further models
• Survival analysis (time-to-event data analysis)

References
• Hoffmann, 2004, for all topics
• Long, S. and J. Freese, for all topics
• Agresti, A. (2001) Categorical Data Analysis (2nd ed). New York: Wiley.
