IPSS Ch 2. Selection Problem
• 2.1. The Nature of the Problem
Censored data arise in many ways: non-response, being dropped from a census, sample attrition in a longitudinal survey.
We (social scientists) are interested in treatment effects, e.g., what is the effect of a Treatment on an outcome Y?
Treatment → Outcome Y
Schooling → Market wages
Welfare → Labor supply
Sentencing policy → Crime commission
New drug → AIDS patients
Surgery → Life span
Chemotherapy → Life span
We can never observe both the treated and untreated outcome for the same individual, so we cannot observe the differences directly.
IPSS Ch 2. Selection Problem
Selection Problem
Example: market wage depends on schooling, work experience, and demographic background (covariates).
Note: the selection problem is logically separate from the extrapolation problem; it poses a new challenge.
Extrapolation problem: arises because random sampling does not yield observations of y off the support of x.
Selection problem: arises when a censored random-sampling process does not fully reveal the behavior of y on the support of x.
IPSS Ch 2. Selection Problem
• Binary selection indicator z
Each member of the population has values (y, z, x), with z = 1 if y is observed and z = 0 if not. We observe y only when z = 1.
Example:
y: market wage
x: education, work experience, race, sex, … (covariates)
z: observation indicator (z = 1 observed, z = 0 not observed)
IPSS Ch 2. Selection Problem
By the law of total probability,
(2.1) P(y | x) = P(y | x, z = 1) P(z = 1 | x) + P(y | x, z = 0) P(z = 0 | x)
Selection probability: P(z = 1 | x)
Censoring probability: P(z = 0 | x)
Is the conditional probability P(y | x) identified? No, because P(y | x, z = 0) is unobservable. Writing γ for the unknown censored-outcome distribution P(y | x, z = 0),
(2.2) P(y | x) = P(y | x, z = 1) P(z = 1 | x) + γ P(z = 0 | x)
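To make the non-identification concrete, here is a minimal numeric sketch in Python (all numbers hypothetical, not from the text): sweeping the unknown γ over [0, 1] traces out every value of P(y = 1 | x) that is consistent with the same observed data.

```python
# Decomposition (2.2): P(y=1|x) = P(y=1|x,z=1) P(z=1|x) + gamma * P(z=0|x).
# Hypothetical identifiable quantities:
p_y1_given_z1 = 0.30   # P(y=1 | x, z=1), estimable from the censored sample
p_z1 = 0.60            # P(z=1 | x), the selection probability
p_z0 = 1 - p_z1        # P(z=0 | x), the censoring probability

# gamma = P(y=1 | x, z=0) is unobservable; any value in [0, 1] fits the data.
for gamma in (0.0, 0.25, 0.5, 0.75, 1.0):
    p_y1 = p_y1_given_z1 * p_z1 + gamma * p_z0
    print(f"gamma = {gamma:.2f}  ->  P(y=1|x) = {p_y1:.2f}")
# P(y=1|x) ranges over [0.18, 0.58]: the censored data alone cannot pin it down.
```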
IPSS Ch 2. Selection Problem
• Outline of Chapter 2
2.2 The worst-case scenario: no information on γ
2.3 An empirical illustration
2.4 The identifying power of prior information
2.5–2.8 Problems of identifying treatment effects
IPSS Ch 2. Selection Problem
• 2.2. Identification from Censored Samples Alone
• Two Negative Facts
• Fact 1. Conditional probability.
Assume exogenous (ignorable) selection:
(2.3) P(y | x, z = 0) = P(y | x, z = 1)
Then P(y | x) = P(y | x, z = 1).
Can we refute the validity of (2.3)? No! The data reveal nothing about P(y | x, z = 0), so assumption (2.3) is necessarily consistent with the empirical evidence.
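A short sketch of why (2.3) is unrefutable, continuing the hypothetical numbers from above: choosing the unknown γ equal to the observed distribution reproduces the ignorable-selection answer, and no observable quantity constrains that choice.

```python
# Observable quantities (hypothetical, as in the earlier sketch):
p_y1_given_z1 = 0.30   # P(y=1 | x, z=1)
p_z1 = 0.60            # P(z=1 | x)

# Assumption (2.3) amounts to setting gamma = P(y=1 | x, z=1) for the
# unobserved P(y=1 | x, z=0). Plugging this into decomposition (2.2):
gamma = p_y1_given_z1
p_y1 = p_y1_given_z1 * p_z1 + gamma * (1 - p_z1)
assert abs(p_y1 - p_y1_given_z1) < 1e-12  # (2.3) implies P(y|x) = P(y|x, z=1)
# Since the data never reveal gamma, this choice can never be contradicted.
```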
IPSS Ch 2. Selection Problem
Fact 2. Conditional expectation.
(2.4) E(y | x) = E(y | x, z = 1) P(z = 1 | x) + E(y | x, z = 0) P(z = 0 | x)
E(y | x, z = 1), P(z = 1 | x), and P(z = 0 | x) are identifiable; E(y | x, z = 0) is not, so E(y | x) is not identified.
IPSS Ch 2. Selection Problem
Bounds on conditional probabilities
The selection problem is not fatal even in the absence of prior information: we can still obtain informative, interpretable bounds.
Let B be a set of outcomes (e.g., "success"). Then
(2.5) P(y ∈ B | x) = P(y ∈ B | x, z = 1) P(z = 1 | x) + P(y ∈ B | x, z = 0) P(z = 0 | x).
P(y ∈ B | x, z = 1), P(z = 1 | x), and P(z = 0 | x) are identifiable, but the data carry no information on P(y ∈ B | x, z = 0).
IPSS Ch 2. Selection Problem
Can we say anything about P(y ∈ B | x)? Yes! We can bound it. The lower bound sets the unknown probability γ(B) = P(y ∈ B | x, z = 0) to 0, and the upper bound sets it to 1:
(2.6) P(y ∈ B | x, z = 1) P(z = 1 | x) ≤ P(y ∈ B | x) ≤ P(y ∈ B | x, z = 1) P(z = 1 | x) + P(z = 0 | x)
For the event B = {y ≤ t}, the bound on the distribution function is
(2.7) P(y ≤ t | x, z = 1) P(z = 1 | x) ≤ P(y ≤ t | x) ≤ P(y ≤ t | x, z = 1) P(z = 1 | x) + P(z = 0 | x).
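A minimal sketch of the worst-case bound (2.6), assuming the identifiable quantities are already known. Note that the width of the bound is exactly the censoring probability P(z = 0 | x), so censoring alone determines how informative the data are.

```python
def worst_case_bound(p_B_given_z1, p_z1):
    """Worst-case bound (2.6) on P(y in B | x).

    p_B_given_z1 : P(y in B | x, z=1), identifiable from the censored sample
    p_z1         : P(z=1 | x), the selection probability
    The lower bound sets the censored-outcome probability to 0, the upper to 1.
    """
    lower = p_B_given_z1 * p_z1
    upper = lower + (1 - p_z1)   # width of the bound equals P(z=0 | x)
    return lower, upper

print(worst_case_bound(0.30, 0.60))  # -> (0.18, 0.58), width 0.40
```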
IPSS Ch 2. Selection Problem
Statistical inference
• The selection problem is a failure of identification. The bounds are functions of P(y | x, z = 1) and P(z | x); we can estimate the features of these distributions and thereby obtain estimates of the bounds.
Example: to estimate the bound (2.6) on P(y ∈ B | x), estimate P(y ∈ B | x, z = 1) and P(z = 1 | x) as in Section 1.3. The precision of the estimated bound can be measured by a confidence interval around the estimate.
IPSS Ch 2. Selection Problem
Distinction between the bound and the confidence interval around its estimate:
• The bound on P(y ∈ B | x) is a population concept: what could be learned about P(y ∈ B | x) if one knew P(y ∈ B | x, z = 1) and P(z | x) exactly.
• The confidence interval is a sampling concept: the precision with which the bound is estimated when P(y ∈ B | x, z = 1) and P(z | x) are estimated from a sample of fixed size.
The confidence interval is typically wider than the bound but narrows to the bound as the sample size increases.
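One simple way to attach a confidence interval to an estimated bound, for a single covariate cell, is a percentile bootstrap over resampled (y, z) observations. The sketch below uses simulated data; the function names and all numbers are illustrative choices, not from the text, and more refined inference methods for partially identified parameters exist.

```python
import numpy as np

rng = np.random.default_rng(0)

def estimate_bound(y, z):
    """Plug-in estimate of bound (2.6) for the event y = 1: replace
    P(y=1 | z=1) and P(z=1) with sample frequencies."""
    p_z1 = z.mean()
    p_y1_z1 = y[z == 1].mean()
    lower = p_y1_z1 * p_z1
    return lower, lower + (1 - p_z1)

def bootstrap_ci(y, z, n_boot=2000, alpha=0.05):
    """Percentile bootstrap for the two endpoints of the estimated bound."""
    n = len(z)
    lo, up = np.empty(n_boot), np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, n)          # resample (y, z) pairs
        lo[b], up[b] = estimate_bound(y[idx], z[idx])
    return np.quantile(lo, alpha / 2), np.quantile(up, 1 - alpha / 2)

# Simulated data: respond with probability 0.6, P(y=1) = 0.4 (hypothetical).
n = 500
z = (rng.random(n) < 0.6).astype(int)
y = (rng.random(n) < 0.4).astype(float)
y[z == 0] = np.nan                           # censored outcomes are never seen
print("estimated bound:    ", estimate_bound(y, z))
print("95% CI around bound:", bootstrap_ci(y, z))
```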
IPSS Ch 2. Selection Problem
2.3. Bounding the Probability of Exiting Homelessness
Population: people who are homeless at time t0
Outcome y: y = 1 if housed at t1, y = 0 if still homeless
Background x: race, sex, education, etc.
Selection z: z = 1 if interviewed at t1, z = 0 if not interviewed
Conditioning variable: sex
Males: sample size at t0 is 106; 64 are interviewed at t1, of whom 21 have exited homelessness.
P(y = 1 | male, z = 1) = 21/64, P(z = 1 | male) = 64/106
Bound on P(y = 1 | male): [21/106, 63/106] = [0.20, 0.59]
IPSS Ch 2. Selection Problem
Females: sample size at t0 is 31; 14 are interviewed at t1, of whom 3 have exited homelessness.
Bound on P(y = 1 | female): [3/31, 20/31] = [0.10, 0.65]
Point: without any restrictions on the attrition process, we have obtained meaningful bounds (see the check below).
Continuous case
Conditioning variables: sex and income
Income measure: "What was the best job you ever had?" ($/week)
Sample sizes: male 89, female 22
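The arithmetic simplifies nicely: the lower endpoint (21/64)·(64/106) collapses to 21/106, and the upper endpoint adds the attrition probability. A few lines of Python reproduce both intervals from the counts reported above:

```python
def exit_bound(n_exited, n_interviewed, n_initial):
    """Bound (2.6) on P(y=1) from the three reported counts.
    Lower: (exited/interviewed) * (interviewed/initial) = exited/initial.
    Upper: lower + (initial - interviewed)/initial, the attrition probability."""
    lower = n_exited / n_initial
    upper = (n_exited + n_initial - n_interviewed) / n_initial
    return round(lower, 2), round(upper, 2)

print(exit_bound(21, 64, 106))  # male:   (0.2, 0.59), i.e. [21/106, 63/106]
print(exit_bound(3, 14, 31))    # female: (0.1, 0.65), i.e. [3/31, 20/31]
```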
IPSS Ch 2. Selection Problem
Fig. 2.1: Attrition probabilities P(z = 0 | x) as a function of income (figure not reproduced).
IPSS Ch 2. Selection Problem
Fig. 2.2: Estimated bounds on P(y = 1 | x) (figure not reproduced).
Upper bound: P(y = 1 | x, z = 1) P(z = 1 | x) + P(z = 0 | x)
Lower bound: P(y = 1 | x, z = 1) P(z = 1 | x)
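Curves like those in Fig. 2.2 come from estimating P(y = 1 | x, z = 1) and P(z = 1 | x) at each income level and applying the two formulas above. Below is a sketch with simulated placeholder data and crude income bins (an actual analysis might instead smooth with kernel estimators); none of the numbers come from the text.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated placeholders: weekly income x, interview indicator z (with
# attrition rising in income, as in Fig. 2.1), exit outcome y (NaN if censored).
n = 2000
x = rng.uniform(50, 600, n)
z = (rng.random(n) < 1 - 0.3 * x / 600).astype(int)
y = (rng.random(n) < 0.4).astype(float)
y[z == 0] = np.nan

edges = np.linspace(50, 600, 12)
for a, b in zip(edges[:-1], edges[1:]):
    m = (x >= a) & (x < b)
    p_z1 = z[m].mean()                   # estimate of P(z=1 | x in bin)
    p_y1_z1 = y[m][z[m] == 1].mean()     # estimate of P(y=1 | x in bin, z=1)
    lower = p_y1_z1 * p_z1
    upper = lower + (1 - p_z1)           # width again equals P(z=0 | x in bin)
    print(f"income [{a:3.0f}, {b:3.0f}): bound [{lower:.2f}, {upper:.2f}]")
```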
IPSS Ch 2. Selection Problem
• The estimated bound is tightest at the low end of the income domain and widens as income increases: the interval is [.24, .55] at income $50 and [.23, .66] at income $600.
• This widening reflects the fact that the estimated probability of attrition increases with income.
Is the cup part empty or part full?
P(male exits homelessness) = P(y = 1 | male) lies in [.20, .59], an improvement over the trivial bound [0, 1].
Can we narrow the interval? Can we pin down P(y = 1 | male)?