An Introduction to Quantile Regression: Modeling Group Disparity by Comparing Entire Distributions
Xiaozhou Wang
Graduate Fellow, Quantitative Social Science Initiative
Penn State University (xzw108@psu.edu)
Motivation
• Group disparity
  • Gender
  • Race
  • Class
  • Immigration…
• How to model group disparity
  • Descriptive statistics
  • Regression model
Critiques of the mean-comparison approach
“It is difficult to understand why statisticians commonly limit their inquiries to Averages, and do not revel in more comprehensive views. Their souls seem as dull to the charm of variety as that of the native of one of our flat English counties, whose retrospect of Switzerland was that, if its mountains could be thrown into its lakes, two nuisances would be got rid of at once.”
Sir Francis Galton, Natural Inheritance (1889): 62-3.
Are distributions identical when they have equal means and variances?
[Figure panels: Normal distribution; Gamma distribution]
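The question on this slide can be checked numerically. The sketch below is not from the slides: it picks an illustrative mean and variance (4 and 8, both assumptions), solves for the gamma shape and scale that match them, and compares simulated draws from the two distributions. The helper `matched_gamma_params` is a hypothetical name introduced here.

```python
import random
import statistics

def matched_gamma_params(mean, variance):
    """Shape and scale of the gamma distribution with this mean and variance."""
    shape = mean ** 2 / variance   # because mean = shape * scale
    scale = variance / mean        # because variance = shape * scale ** 2
    return shape, scale

rng = random.Random(0)             # fixed seed so the run is repeatable
mean, variance = 4.0, 8.0          # illustrative values, not from the slides
shape, scale = matched_gamma_params(mean, variance)

n = 20_000
normal_draws = [rng.gauss(mean, variance ** 0.5) for _ in range(n)]
gamma_draws = [rng.gammavariate(shape, scale) for _ in range(n)]

# Same first two moments, different shape: compare the medians.
for name, draws in (("normal", normal_draws), ("gamma", gamma_draws)):
    print(
        name,
        "mean=%.2f" % statistics.fmean(draws),
        "var=%.2f" % statistics.pvariance(draws),
        "median=%.2f" % statistics.median(draws),
    )
```

The simulated means and variances should agree closely, while the gamma median falls below the normal median because the gamma distribution is right-skewed. Two distributions can therefore share a mean and a variance and still differ at every quantile.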
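Kernel density estimation
The basic idea (Silverman 1986): given a sample $X_1, \dots, X_n$, estimate the density at a point $x$ by
$$\hat{f}(x) = \frac{1}{nh} \sum_{i=1}^{n} K\!\left(\frac{x - X_i}{h}\right)$$
Or, equivalently,
$$\hat{f}(x) = \frac{1}{n} \sum_{i=1}^{n} K_h(x - X_i), \qquad K_h(u) = \frac{1}{h} K\!\left(\frac{u}{h}\right),$$
where
• $K(x)$ is the kernel function, which satisfies $\int_{-\infty}^{\infty} K(x)\,dx = 1$.
• Very often, $K$ will be a symmetric probability density function.
• $h$ is the smoothing parameter (bandwidth).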
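A minimal sketch of the kernel estimator described by Silverman (1986), f_hat(x) = (1/(n h)) * sum_i K((x - X_i) / h), using a Gaussian kernel. The sample values, the bandwidth, and the function names `gaussian_kernel` and `kde` are all assumptions made for illustration; they do not come from the earnings data in the slides.

```python
import math

def gaussian_kernel(u):
    """Standard normal density: symmetric and integrates to 1."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kde(x, sample, h, kernel=gaussian_kernel):
    """Kernel density estimate at x with bandwidth h."""
    n = len(sample)
    return sum(kernel((x - xi) / h) for xi in sample) / (n * h)

sample = [1.0, 1.5, 2.0, 4.0, 4.5]   # illustrative data
h = 0.5                              # illustrative bandwidth

# The estimate is itself a density: a Riemann sum over a wide grid is close to 1.
step = 0.01
grid = [i * step for i in range(-500, 1101)]   # from -5.00 to 11.00
area = sum(kde(x, sample, h) for x in grid) * step

print("density near the cluster at 1.5:", round(kde(1.5, sample, h), 3))
print("density in the gap at 3.0:", round(kde(3.0, sample, h), 3))
print("approximate area under the curve:", round(area, 3))
```

A larger `h` smooths the two clusters into one bump, while a smaller `h` produces a spikier estimate, which is why the bandwidth is called the smoothing parameter.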
Let’s compare the distributions.
• Both earnings distributions changed from 1996 to 2006.
• For native whites, a growing share of the population earns about 80,000 or more.
• Hispanic immigrants also made some progress: fewer people are found at the lower end of their earnings distribution.
Regression model
The OLS model: $E(Y \mid X) = X\beta$
• The conditional mean of the outcome variable is expressed as a product of the characteristics ($X$) and the "prices" of these characteristics ($\beta$).
• The estimated model provides “a grand summary for the averages of the distributions corresponding to the set of Xs” (Mosteller and Tukey 1977).
Application with earnings data from CPS
• Let’s model the log earnings of foreign-born Hispanic immigrants as a linear combination of the following covariates:
  • Experience
  • Experience^2
  • Years of education
  • Living in the south
  • Living in a non-metropolitan area
  • Being married
• Data: CPS March supplements in 2006
  • A monthly survey of about 50,000 households
  • The primary source of detailed information on income and work experience in the United States
  • The best source so far for studying the labor market outcomes of immigrants (Farley and Alba 2002)
Data
• Sample of analysis
  • Non-institutional population
  • Men of Hispanic origin
  • Aged 25-64 at the time of the survey interview
  • Worked for at least one week during the last calendar year
Can we generalize our findings to the entire distribution? • “Just as the mean gives an incomplete picture of a single distribution, so the regression curve gives a correspondingly incomplete picture for a set of distributions” (Mosteller and Tukey 1977).
The solution: Quantile Regression
• Now we model the $\theta$th quantile of the outcome variable (logged earnings in this case) as a product of the characteristics and the premiums to the characteristics. That is, $Q_\theta(Y \mid X) = X\beta(\theta)$.
  • $\beta(\theta)$ is a vector of coefficients (or premiums)
  • $X$ is a vector of the covariates (characteristics)
• For any given $\theta$ in $(0,1)$, $\beta(\theta)$ can be estimated by minimizing in $\beta$ (Koenker and Bassett 1978)
$$\sum_{i=1}^{n} \rho_\theta(y_i - x_i\beta),$$
where $\rho_\theta(u) = u\,(\theta - I(u < 0))$.
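The Koenker and Bassett (1978) objective is easiest to see in the intercept-only case, where minimizing the check loss rho_theta(u) = u * (theta - 1{u < 0}) over a single constant yields a sample quantile. The sketch below is a toy illustration with made-up data, not the regression fit from the slides; `check_loss`, `total_loss`, and `best_constant` are hypothetical helper names.

```python
def check_loss(u, theta):
    """rho_theta(u) = u * (theta - 1{u < 0})."""
    return u * (theta - (1.0 if u < 0 else 0.0))

def total_loss(q, ys, theta):
    """Sum of check losses when every observation is predicted by the constant q."""
    return sum(check_loss(y - q, theta) for y in ys)

def best_constant(ys, theta):
    """Minimize the total check loss over constants.

    The objective is convex and piecewise linear with kinks at the data
    points, so a minimizer can always be found among the observations.
    """
    return min(sorted(ys), key=lambda q: total_loss(q, ys, theta))

ys = [1.0, 2.0, 3.0, 4.0, 100.0]        # illustrative data with one outlier

print("mean:", sum(ys) / len(ys))        # pulled up by the outlier
print("theta = 0.50:", best_constant(ys, 0.5))
print("theta = 0.25:", best_constant(ys, 0.25))
```

With covariates, the same objective is minimized over a coefficient vector instead of a constant. That problem is usually solved with linear programming routines such as those in the `quantreg` package named on the Software slide, rather than by enumeration as in this toy case.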
Quantile regression (QR): application
[Figure panels: Foreign-born Hispanics; Native whites]
The observed group disparity
[Figure: PDF]
• To what extent is the observed group disparity due to:
  • A. difference in covariates?
  • B. difference in coefficients?
Use QR to estimate the conditional distribution given the covariates
Bassett and Koenker (1982; 1986) establish that, under regularity conditions, the estimated conditional quantile function is a strongly consistent estimator of the population quantile function.
Probability Integral Transformation Theorem: if $U$ is a uniform random variable on $[0,1]$, then $F^{-1}(U)$ has distribution $F$.
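The Probability Integral Transformation Theorem can be illustrated by pushing uniform draws through a known quantile function. In this sketch the target distribution F is a normal with mean 10 and standard deviation 2, an arbitrary choice made only for illustration; `draw_uniform` is a hypothetical helper.

```python
import random
from statistics import NormalDist, fmean, pstdev

def draw_uniform(rng):
    """Draw from the open interval (0, 1); inv_cdf is undefined at exactly 0."""
    u = rng.random()
    while u == 0.0:
        u = rng.random()
    return u

rng = random.Random(42)                   # fixed seed so the run is repeatable
F = NormalDist(mu=10.0, sigma=2.0)        # illustrative target distribution

# F^{-1}(U) with U uniform on (0, 1) is a draw from F.
draws = [F.inv_cdf(draw_uniform(rng)) for _ in range(20_000)]

print("sample mean:", round(fmean(draws), 2))
print("sample standard deviation:", round(pstdev(draws), 2))
```

The same logic is what lets a fitted conditional quantile function act as a sampler: evaluating it at uniformly drawn quantile levels gives draws from the estimated conditional distribution.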
Use QR to estimate the conditional distribution given the covariates
• We estimate the $\theta_j$th quantile of the earnings distribution given the observed characteristics of the respondents, $Q_{\theta_j}(Y \mid X) = X\hat{\beta}(\theta_j)$, where $\theta_1, \dots, \theta_m$ are drawn from a uniform (0, 1) distribution.
• By the Probability Integral Transformation Theorem, $\{X\hat{\beta}(\theta_j)\}_{j=1}^{m}$ constitute a random sample from $F(Y \mid X)$, the (estimated) conditional distribution of earnings given $X$ (Albrecht et al 2009: 384).
From F(Y|X) to F(Y)
• According to Machado & Mata (2005), to ‘integrate X out’ and get a sample from the marginal earnings distribution, we can generate a random sample of size $m$ with replacement from the rows of $X$, denoted by $\{x_j^*\}_{j=1}^{m}$. Now $\{y_j^* = x_j^*\hat{\beta}(\theta_j)\}_{j=1}^{m}$ form a random sample of size $m$ from the marginal earnings distribution for the respondents.
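The Machado and Mata (2005) resampling steps can be sketched as follows. This is not the author's code: the function `beta_hat` is a HYPOTHETICAL stand-in for coefficients that would come from a fitted quantile regression, and the covariate rows are invented (an intercept plus years of education). Only the sampling logic is meant to carry over.

```python
import random
from statistics import NormalDist, fmean

def beta_hat(theta):
    """HYPOTHETICAL stand-in for fitted QR coefficients at quantile theta.

    Mimics log earnings = 9 + 0.08 * education + 0.5 * e, with e ~ N(0, 1),
    so only the intercept varies with theta.
    """
    z = NormalDist().inv_cdf(theta)
    return (9.0 + 0.5 * z, 0.08)

def draw_theta(rng):
    """Uniform draw on the open interval (0, 1)."""
    u = rng.random()
    while u == 0.0:
        u = rng.random()
    return u

def marginal_sample(X, coef_fn, m, rng):
    """Draw m values from the marginal distribution implied by X and coef_fn."""
    sample = []
    for _ in range(m):
        theta = draw_theta(rng)          # 1. draw a quantile level
        coef = coef_fn(theta)            # 2. coefficients at that quantile
        x_star = rng.choice(X)           # 3. resample a row of X with replacement
        sample.append(sum(x * b for x, b in zip(x_star, coef)))  # 4. y* = x* beta(theta)
    return sample

# Illustrative covariate rows: (intercept, years of education).
X = [(1.0, 8.0), (1.0, 10.0), (1.0, 12.0), (1.0, 12.0), (1.0, 16.0)]

rng = random.Random(1)
y_star = marginal_sample(X, beta_hat, 20_000, rng)
print("mean of simulated log earnings:", round(fmean(y_star), 3))
```

In a real application, step 2 would refit or look up the quantile regression at each drawn `theta`, which is the computationally expensive part of the procedure.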
Marginal distributions
[Figure panels: Foreign-born Hispanics; Native whites]
Comparing the marginal distributions
[Figure panels: PDF; Quantile function]
The questions yet to answer
• To what extent is the observed group disparity due to:
  • A. difference in covariates?
  • B. difference in coefficients?
• Counterfactual analysis
  • A. when group A has the same coefficients as group B
  • B. when group A has the same covariates as group B
The counterfactual analysis
• Once we have the marginal earnings distributions for each group, we can obtain the counterfactual densities by weighting the marginal density of wages with appropriate weights.
• Write $H$ for the Hispanic immigrants and $W$ for the native whites. If the returns to characteristics were identical to the “prices” for native whites, the counterfactual earnings distribution for the Hispanic immigrants would be fully characterized by $\{x_j^{*H}\hat{\beta}^{W}(\theta_j)\}_{j=1}^{m}$.
• If the Hispanic immigrants shared the same characteristics as the natives in the same year, their counterfactual earnings distribution would be characterized by $\{x_j^{*W}\hat{\beta}^{H}(\theta_j)\}_{j=1}^{m}$.
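The two counterfactuals amount to swapping one ingredient of the simulation at a time: keep one group's covariates and plug in the other group's coefficients, or the reverse. The sketch below uses two invented groups, A and B. The coefficient functions `coef_a` and `coef_b` are HYPOTHETICAL stand-ins for fitted quantile regression coefficients, and the covariate rows are made up, so the numbers carry no empirical meaning.

```python
import random
from statistics import NormalDist, fmean

def coef_a(theta):
    """HYPOTHETICAL quantile-specific coefficients for group A."""
    return (9.0 + 0.5 * NormalDist().inv_cdf(theta), 0.05)

def coef_b(theta):
    """HYPOTHETICAL quantile-specific coefficients for group B."""
    return (9.2 + 0.6 * NormalDist().inv_cdf(theta), 0.10)

def simulate(X, coef_fn, m, rng):
    """Sample from the distribution implied by covariates X and coefficients coef_fn."""
    out = []
    for _ in range(m):
        theta = rng.random()
        while theta == 0.0:              # keep theta inside (0, 1)
            theta = rng.random()
        x_star = rng.choice(X)           # resample a covariate row with replacement
        out.append(sum(x * b for x, b in zip(x_star, coef_fn(theta))))
    return out

# Illustrative rows: (intercept, years of education).
X_a = [(1.0, e) for e in (6.0, 8.0, 10.0, 12.0, 12.0)]
X_b = [(1.0, e) for e in (12.0, 12.0, 14.0, 16.0, 16.0)]

rng = random.Random(7)
m = 20_000
own_a = simulate(X_a, coef_a, m, rng)            # group A as observed
a_with_b_coefs = simulate(X_a, coef_b, m, rng)   # A's covariates, B's coefficients
a_with_b_covs = simulate(X_b, coef_a, m, rng)    # B's covariates, A's coefficients
own_b = simulate(X_b, coef_b, m, rng)            # group B as observed

for name, s in (("A", own_a), ("A, B coefficients", a_with_b_coefs),
                ("A, B covariates", a_with_b_covs), ("B", own_b)):
    print(name, round(fmean(s), 3))
```

Comparing each counterfactual sample with the two observed samples, quantile by quantile rather than only at the mean, is what allows the gap to be attributed to covariates or to coefficients at different points of the distribution.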
The counterfactual earnings distribution for the foreign-born Hispanics
[Figure panels: With the covariates of the native whites; With the coefficients of the native whites]
Quantile functions for the counterfactuals (foreign-born Hispanics)
[Figure panels: With the covariates of the native whites; With the coefficients of the native whites]
Earnings assimilation of Hispanic immigrants in the U.S.
Pessimistic view
• Segmented assimilation model (López and Stanton-Salazar 2001; Portes and Rumbaut 2001; Zhou 1997)
• Racial disadvantage model (Glazer and Moynihan 1963)
• Declining quality of immigrants (Borjas 1995, 1999; Borjas and Friedberg 2007)
Optimistic view
• Classical assimilation theory (Robert Park (1926); Irving Child (1943); Oscar Handlin (1951); Milton Gordon (1964))
• Institutional perspective (Alba and Nee; Alba and Farley)
Perspectives about economic assimilation of the Hispanic immigrants
• Pessimistic view
  • Segmented assimilation model (López and Stanton-Salazar 2001; Portes and Rumbaut 2001; Zhou 1997)
    • Declining employment in manufacturing
    • Racial discrimination against ethnic minorities
    • Deteriorated urban infrastructure: in particular, the public education system may not provide advanced training for immigrant students to move upward
    • Transnationalism: it is easier for immigrants to maintain strong ties to their own community
    • A less-educated second generation may look down on the jobs their parents hold and then join the “underclass”, that is, follow a “downward assimilation” trajectory
Perspectives about economic assimilation of the Hispanic immigrants
• Pessimistic view
  • Racial disadvantage model (Glazer and Moynihan 1963)
    • The host society is highly stratified by race and ethnicity, so the earnings differentials associated with nativity simply reflect ethnic inequality in earnings
    • Discrimination may undermine immigrants’ motivation to move upward (Bean & Stevens: 99)
  • Declining quality of immigrants (Borjas 1995, 1999; Borjas and Friedberg 2007)
    • The education gap between immigrants and native whites had widened by the 1990s
    • Declining quality of immigrant cohorts
Perspectives about economic assimilation of the Hispanic immigrants
• Optimistic view
  • Classical assimilation theory (Robert Park (1926); Irving Child (1943); Oscar Handlin (1951); Milton Gordon (1964))
    • Immigrants experience steady improvement in socioeconomic outcomes over time, within and across generations
  • Institutional perspective (Alba and Nee; Alba and Farley)
    • Legal, social and economic changes may now guarantee the second generation protections and opportunities
    • Labor markets were extremely tight throughout the 1990s, which “should have benefited the bottom of the labor queue” (Alba and Farley: 677)
Earnings assimilation of Hispanic immigrants in the U.S. • Assimilation as a product of assimilation in characteristics and assimilation in returns to characteristics
Quantile functions for the counterfactuals (foreign-born Hispanics)
[Figure panels: With the covariates of the native whites; With the coefficients of the native whites]
Quantile functions for the counterfactuals (2nd gen. Hispanics)
[Figure panels: With the covariates of the native whites; With the coefficients of the native whites]
Summary of findings
Foreign-born Hispanics / 2nd gen. Hispanics
• Assimilation in covariates will increase the earnings of immigrants across the whole distribution.
• Assimilation in coefficients will decrease the earnings of those at the bottom of the distribution, while benefiting those who have higher earnings.
• Assimilation in covariates and in coefficients will increase the earnings of the immigrants. However, it is the assimilation in covariates that plays the more important role in reducing the earnings gap across the whole distribution.
Conclusion
Foreign-born Hispanics / 2nd gen. Hispanics
• The racial disadvantage model and the declining quality of immigrants argument do not hold for the immigrants at the lower end of the earnings distribution.
• Downward assimilation does not hold for the immigrants, since their difference from the native whites is mainly due to differences in covariates.
Some References on Quantile Regression
• Albrecht, J., A. Van Vuuren, et al. (2009). "Counterfactual distributions with sample selection adjustments: Econometric theory and an application to the Netherlands." Labour Economics 16(4): 383-396.
• Bassett, G. W. and R. W. Koenker (1986). "Strong consistency of regression quantiles and related empirical processes." Econometric Theory 2(2): 191-201.
• Bassett, G., Jr. and R. Koenker (1982). "An empirical quantile function for linear models with iid errors." Journal of the American Statistical Association 77(378): 407-415.
• Machado, J. and J. Mata (2005). "Counterfactual decomposition of changes in wage distributions using quantile regression." Journal of Applied Econometrics 20(4): 445-466.
• Mosteller, F. and J. W. Tukey (1977). "Data analysis and regression: A second course in statistics."
• Silverman, B. (1986). "Density estimation for statistics and data analysis." Monographs on Statistics and Applied Probability. Chapman and Hall, New York.
Software • R: Package ‘quantreg’ by Roger Koenker