Measurement Error: The difference between an observed variable and the variable that belongs in a multivariate regression equation. (Wooldridge, 866).
The Centrality of Measurement • Measuring abstract concepts is central to testing political science, economics, and policy theories • Statistical analyses are of little use if not connected to broader concepts and theories • Theories and concepts are of little use if they cannot be measured and tested
The Goals of Measurement • Validity: The extent to which operational measures reflect the concepts they are intended to represent • Reliability: The extent to which operational measures can be replicated • In practice, we may need to trade off between validity and reliability
4 Types of Measurement Error • Systematic Measurement Errors: 1) Constant Bias 2) Variable Bias • Stochastic Measurement Errors: 3) Errors in the Dependent Variable 4) Errors in the Independent Variables
1. Systematic but Constant Error • All values of x or y (or both) are biased by the same amount • Descriptive statistics are obviously wrong • In regression analyses, the estimated constant (intercept) will be biased • Coefficients for the variables (and therefore causal inferences) will NOT be biased
Example: Reported Tolerance • Imagine everyone exaggerates their level of tolerance • Intercept and descriptive stats are inflated by bias • Slope coefficients do not change
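A minimal simulation of the constant-bias scenario sketched above (the variable names, seed, and the size of the exaggeration are illustrative assumptions, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

education = rng.normal(12, 3, n)                      # years of schooling
true_tolerance = 2.0 + 0.5 * education + rng.normal(0, 1, n)
reported = true_tolerance + 1.5                       # everyone exaggerates by the same constant

def ols(x, y):
    """Simple-regression OLS: return (intercept, slope)."""
    slope = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
    return y.mean() - slope * x.mean(), slope

print(ols(education, true_tolerance))  # intercept ~ 2.0, slope ~ 0.5
print(ols(education, reported))        # intercept ~ 3.5 (biased), slope still ~ 0.5
```

The constant shift is absorbed entirely by the intercept, so the slope (the causal quantity of interest) is unaffected.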
2. Systematic Variable Bias • The measurement error is associated with changes in the independent variable. • This causes SEVERE problems (not surprisingly) • Coefficients are biased • Direction depends upon the nature and cause of the bias • All inferences – both descriptive and causal – are biased • Our job is to think hard about the direction of the bias.
Example: (Mis)Reported Tolerance • Imagine educated people are more aware of or swayed by “socially correct” answers • That means the more educated the respondent, the larger the exaggeration, and the worse our measure performs • Descriptive inferences are wrong • Causal inferences are wrong
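A sketch of the same simulation, but with exaggeration that grows with education (again, all numbers are illustrative assumptions), which biases the slope rather than just the intercept:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

education = rng.normal(12, 3, n)
true_tolerance = 2.0 + 0.5 * education + rng.normal(0, 1, n)

# Measurement error correlated with the regressor:
# more-educated respondents exaggerate more.
reported = true_tolerance + 0.2 * education

def ols_slope(x, y):
    return np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)

print(ols_slope(education, true_tolerance))  # ~ 0.5 (true effect)
print(ols_slope(education, reported))        # ~ 0.7 (biased upward by the systematic error)
```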
3. Stochastic Error in Measuring the Dependent Variable • y*=y + ε (measurement error) • This scenario creates few problems • Dependent variable already HAS a stochastic component • Measurement error simply increases its size • All coefficients and descriptive inferences are unbiased • Coefficient estimates are less efficient
Example: Capturing Real Tolerance • “Tolerance” is a complex concept that is difficult to measure with survey responses • All inferences are unbiased • The data are “noisy” and inferences are less certain
Showing Inferences are Unbiased • Thus we can substitute our error-filled measure of y and get unbiased estimates IF: • E(u)=0 The expected value of the regression error is equal to 0, AND • E(ε)=0 The expected value of the measurement error is equal to 0 • In essence, the errors on either side of the regression line cancel each other out. Were the errors all in one direction, we would have bias.
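A sketch of the substitution behind this claim, using the slides' notation (y* is the observed, error-ridden measure of y; the true model is assumed to be a simple bivariate regression):

```latex
\begin{align*}
  y     &= \beta_0 + \beta_1 x + u,  &  y^{*} &= y + \varepsilon \\
  y^{*} &= \beta_0 + \beta_1 x + (u + \varepsilon)
\end{align*}
% If E(u) = 0, E(\varepsilon) = 0, and \varepsilon is uncorrelated with x,
% the composite error (u + \varepsilon) satisfies the usual OLS assumptions,
% so the slope and intercept estimates remain unbiased; the error variance
% simply grows from \sigma_u^2 to \sigma_u^2 + \sigma_\varepsilon^2.
```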
Showing the Estimates are Inefficient • Var(βhat) = (σ²u + σ²ε) / Σ(xi − x̄)² • Numerator: σ²u + σ²ε (error variance, inflated by the measurement error) • Denominator: Σ(xi − x̄)² (variation in x, unchanged)
Showing the Estimates are Inefficient • The variance of the measurement error appears in the numerator, not the denominator, of the sampling variance of the estimator • Thus the overall variance of βhat will be increased by the variance of the measurement error divided by the variation in x
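A small Monte Carlo sketch of this variance inflation (all parameter values are hypothetical): across repeated samples the slope estimates stay centered on the truth, but their spread roughly doubles when the error variance of y doubles, consistent with (σ²u + σ²ε) / Σ(xi − x̄)².

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 200, 2_000
beta1 = 0.5

def ols_slope(x, y):
    return np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)

clean, noisy = [], []
for _ in range(reps):
    x = rng.normal(0, 1, n)
    y = 2.0 + beta1 * x + rng.normal(0, 1, n)      # var(u) = 1
    y_star = y + rng.normal(0, 1, n)               # var(eps) = 1
    clean.append(ols_slope(x, y))
    noisy.append(ols_slope(x, y_star))

print(np.mean(clean), np.mean(noisy))  # both ~ 0.5: unbiased either way
print(np.var(clean), np.var(noisy))    # noisy spread ~ twice the clean spread
```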
4. Stochastic Measurement Error in the Independent Variables • x*=x+ε (measurement error) • As you recall, to derive OLS we assumed x’s were fixed and thus measured without error (!) • Not surprisingly, if x is measured with error, then OLS is biased • BUT what do we know about the direction of the bias?
Stochastic Measurement Error in the Independent Variables • It turns out that if x is measured with stochastic error, the coefficients are biased toward zero. • In other words, the coefficients are always underestimated in absolute value, never overestimated • This is critical for hypothesis testing
Stochastic Measurement Error in the Independent Variables • Thus a coefficient may be insignificant due to random measurement error • But if we observe a small coefficient despite random measurement error, then we know that the true coefficient would be of larger absolute value
Example: Tolerance as an Independent Variable • Again, “tolerance” is difficult to measure with survey response, but now it’s the independent variable (x) not the dependent variable (y) • Now errors “stretch out” the data • Bias estimated coefficient toward 0
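A minimal simulation of this attenuation (illustrative numbers only): adding noise to x, rather than to y, pulls the estimated slope toward zero.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10_000
beta1 = 0.5

tolerance = rng.normal(0, 2, n)                       # true x, variance = 4
y = 1.0 + beta1 * tolerance + rng.normal(0, 1, n)
tolerance_star = tolerance + rng.normal(0, 1, n)      # measured x, error variance = 1

def ols_slope(x, y):
    return np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)

print(ols_slope(tolerance, y))        # ~ 0.5 (true slope)
print(ols_slope(tolerance_star, y))   # ~ 0.5 * 4 / (4 + 1) = 0.4 (attenuated)
```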
Showing Bias is Toward Zero • plim βhat = β · σ²x / (σ²x + σ²ε) • Numerator: σ²x (variance of the true regressor) • Denominator: σ²x + σ²ε (inflated by the measurement error variance)
Showing Bias is Toward Zero • Thus the variance of the measurement error appears in the denominator but not the numerator • Therefore, the larger the measurement error, the closer our estimator βhat will be to 0. This underestimation is known as attenuation bias (Wooldridge, 323). • As a result the coefficient is biased toward zero, but it cannot pass through zero and flip signs.
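A worked example with hypothetical variances: if the true regressor has variance 4 and the measurement error has variance 1 (as in the simulation above), the slope is attenuated by a factor of 0.8.

```latex
\operatorname{plim}\,\hat{\beta}_1
  = \beta_1 \,\frac{\sigma_x^{2}}{\sigma_x^{2} + \sigma_\varepsilon^{2}}
  = \beta_1 \,\frac{4}{4 + 1}
  = 0.8\,\beta_1
```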
Coping With Systematic Measurement Error • Be VERY aware of measurement errors that could be correlated with your independent variables. • If feasible, distance yourself from the measurement of variables you collect • Have others code data for you • Use multiple coders • Code cases in a “blind” fashion. (No peeking at the y-values).
Coping with Stochastic Measurement Error • Build scales based on multiple indicators (e.g., hard and soft data from the Provincial Competitiveness Index) • [Figure: an SOE Bias scale (bias toward the state sector) built from Firm Perceptions indicators (attitude toward the private sector, attitude improving over time, attitude independent of revenue contribution, rating of provincial equitization policy) and Published Data indicators (percentage of loans to SOEs from state bank branches, change in the number of local SOEs since peak, SOE share of provincial industrial output)]
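A minimal sketch of why multi-indicator scales help (hypothetical data; the indicator structure is a placeholder, not the actual PCI items): averaging several noisy indicators of the same latent trait shrinks the error variance of the combined measure.

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 5_000, 4                                      # respondents, indicators per scale

latent = rng.normal(0, 1, n)                         # unobserved trait (e.g., SOE bias)
indicators = latent[:, None] + rng.normal(0, 1, (n, k))   # each indicator = trait + noise
scale = indicators.mean(axis=1)                      # simple averaged scale

# Error variance of one indicator is 1; averaging k of them cuts it to 1/k.
print(np.var(indicators[:, 0] - latent))   # ~ 1.0
print(np.var(scale - latent))              # ~ 0.25
```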
Coping with Stochastic Measurement Error • Above all else, be aware of the implications of measurement error and predict the direction and importance of the bias • eivreg is one technique within Stata, but it is very demanding: we have to feel incredibly confident that we know the typical measurement error (reliability) of a particular research instrument; then we can correct our estimates accordingly • Use an “instrumental variables” approach to purge measures of their error • LISREL and other latent-variable estimators simultaneously model errors in multiple variables
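A sketch of the kind of correction an errors-in-variables approach performs, assuming we somehow know the reliability ratio λ = σ²x / (σ²x + σ²ε) of our instrument (the value 0.8 below is purely illustrative, and in practice knowing λ this precisely is the hard part):

```python
import numpy as np

rng = np.random.default_rng(5)
n, beta1 = 10_000, 0.5

x = rng.normal(0, 2, n)                          # var(x) = 4
y = 1.0 + beta1 * x + rng.normal(0, 1, n)
x_star = x + rng.normal(0, 1, n)                 # error variance 1 -> reliability 4/5 = 0.8

naive = np.cov(x_star, y, ddof=1)[0, 1] / np.var(x_star, ddof=1)

reliability = 0.8                                # assumed known from prior validation work
corrected = naive / reliability

print(naive)      # ~ 0.4 (attenuated)
print(corrected)  # ~ 0.5 (recovers the true slope, if the reliability is right)
```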
Instrumental Variable Regression • Modified form of the regression equation • Use a variable z that is uncorrelated with the error term of the regression of y on x, and highly correlated with x • If so, we can identify the parameter φ [path diagram: z → x → y] • Famous examples of this technique involve using rainfall to correct errors in measuring turnout.
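A minimal sketch of the IV idea with a single instrument (all variable names and numbers are illustrative assumptions); this is the simple IV/Wald estimator, cov(z, y) / cov(z, x*):

```python
import numpy as np

rng = np.random.default_rng(6)
n, beta1 = 10_000, 0.5

z = rng.normal(0, 1, n)                          # instrument, e.g. a second independent measure
x = 1.5 * z + rng.normal(0, 1, n)                # true regressor, correlated with z
y = 1.0 + beta1 * x + rng.normal(0, 1, n)
x_star = x + rng.normal(0, 1, n)                 # observed regressor with random error

ols = np.cov(x_star, y, ddof=1)[0, 1] / np.var(x_star, ddof=1)
iv = np.cov(z, y, ddof=1)[0, 1] / np.cov(z, x_star, ddof=1)[0, 1]

print(ols)  # attenuated toward zero
print(iv)   # ~ 0.5: the instrument is uncorrelated with the measurement error
```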