Advanced Topics in Regression BUSI 6220: Optional Material • Quantile Regression • Analysis of Causality • Mediation Analysis • Hierarchical Linear Modeling Compiled by Nick Evangelopoulos, 2013
Motivation for Quantile Regression • Problem: ANOVA and regression provide information only about the conditional mean. More knowledge about the distribution of the statistic may be important. The covariates may shift not only the location or scale of the distribution; they may affect the shape as well. • Solution: Quantile regression models the relationship between X and the conditional quantiles of Y given X = x
Quantile Definition • Definition: Given p ∈ [0, 1], a pth quantile of a random variable Z is any number ζp such that Pr(Z < ζp) ≤ p ≤ Pr(Z ≤ ζp). A solution always exists, but need not be unique. • Example: Suppose Z takes each of the values {3, 4, 7, 9, 9, 11, 17, 21} with equal probability and p = 0.5. Then Pr(Z < 9) = 3/8 ≤ 1/2 ≤ Pr(Z ≤ 9) = 5/8, so the 50th percentile is equal to 9
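The definition can be checked mechanically. The following is a minimal sketch, assuming the sample values are equally likely; the helper name is_pth_quantile is hypothetical and not from the slides.

```python
from fractions import Fraction

def is_pth_quantile(sample, zeta, p):
    """True if Pr(Z < zeta) <= p <= Pr(Z <= zeta) under the empirical
    distribution of `sample` (each value equally likely: an assumption)."""
    n = len(sample)
    below = Fraction(sum(1 for z in sample if z < zeta), n)         # Pr(Z < zeta)
    at_or_below = Fraction(sum(1 for z in sample if z <= zeta), n)  # Pr(Z <= zeta)
    return below <= p <= at_or_below

half = Fraction(1, 2)

# The slide's example: 9 satisfies the definition, 7 does not.
Z = [3, 4, 7, 9, 9, 11, 17, 21]
median_ok = is_pth_quantile(Z, 9, half)    # 3/8 <= 1/2 <= 5/8
not_median = is_pth_quantile(Z, 7, half)   # Pr(Z <= 7) = 3/8 < 1/2

# Non-uniqueness: for an evenly split sample, every number between the
# two middle values is a median.
W = [1, 2, 3, 4]
w_ok = [is_pth_quantile(W, c, half) for c in (2, Fraction(5, 2), 3)]

print(median_ok, not_median, w_ok)
```

Exact fractions are used so that the two inequalities are not disturbed by floating-point rounding.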
Quantile Regression • A family of conditional quantiles of Y given X=x. • The median regression line is the least absolute deviations (LAD) regression line, whereas the OLS line estimates the conditional mean. The other quantile functions are likewise solutions to linear programming problems
Quantile Regression A scatter of daily high temperature in Sydney. The red line is the 45-degree line
Quantile Regression Quantiles at .9, .75, .5, .25, and .10. Given yesterday’s temperature, today’s temperature has an expected distribution which is non-symmetrical
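Quantile Regression Estimation • The quantile regression coefficients are the solution to $\hat{\beta}_\Theta = \arg\min_{\beta} \sum_{i=1}^{n} \rho_\Theta(y_i - x_i'\beta)$, where $\rho_\Theta(u) = u\,(\Theta - 1[u < 0])$ is the check function • The k first order conditions are $\sum_{i=1}^{n} x_{ij}\,(\Theta - 1[y_i - x_i'\beta < 0]) = 0$ for j = 1, ..., k, which hold only approximately in finite samples because the objective is not differentiable everywhere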
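To make the estimation idea concrete, here is a minimal sketch of the intercept-only case, assuming the standard check (pinball) loss rho(u) = u * (theta - 1[u < 0]). With no covariates, the minimizer of the summed check loss is a sample quantile. The function names are hypothetical, and a full quantile regression would instead solve the corresponding linear program over all coefficients.

```python
def check_loss(u, theta):
    """Check function: theta * u for u >= 0, (theta - 1) * u for u < 0."""
    return u * (theta - (1.0 if u < 0 else 0.0))

def total_loss(sample, candidate, theta):
    """Summed check loss of the residuals y - candidate."""
    return sum(check_loss(y - candidate, theta) for y in sample)

def best_constant(sample, theta):
    # The objective is piecewise linear and convex in the candidate, so a
    # minimizer can always be found among the observed values.
    return min(sorted(set(sample)), key=lambda c: total_loss(sample, c, theta))

Z = [3, 4, 7, 9, 9, 11, 17, 21]
q50 = best_constant(Z, 0.5)   # median: minimizes the sum of absolute deviations
q90 = best_constant(Z, 0.9)   # upper quantile: under-prediction costs 9x more

print(q50, q90)
```

For theta = 0.5 the loss is half the absolute deviation, which is why median regression and least absolute deviations regression coincide.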
Quantile Regression Coefficient Interpretation • Each coefficient is the marginal change in the Θth conditional quantile due to a marginal change in the jth element of x. There is no guarantee that the ith person will remain in the same quantile after her x is changed.
Quantile Regression Bibliography • Koenker and Hallock (2001), “Quantile Regression,” Journal of Economic Perspectives, Vol. 15, pp. 143-156. • Buchinsky (1998), “Recent Advances in Quantile Regression Models,” Journal of Human Resources, Vol. 33, pp. 88-126. • www.econ.uiuc.edu/~roger • http://Lib.stat.cmu.edu/R/CRAN
Quantile Regression in SAS Optional Reading: Colin (Lin) Chen, An Introduction to Quantile Regression and the QUANTREG Procedure, SUGI30, Paper 213-30
Part 2: Analysis of Causality • For more information: BUSI 6280 • The material presented here is based on a paper by Josef Brüderl (University of Mannheim, Germany)
Panel Data • Methods for analysis of causality exploit a data structure of multi-dimensional longitudinal data, which is typically described in the statistics and econometrics literature as Panel Data • Panel data is defined as a combination of cross-section data, where data on one or more variables are collected at the same point in time, and time-series data, where data are collected at regular time intervals. • Analysis of panel data will be performed using the TSCSREG procedure in the statistical package SAS (Allison 2005; Mohd Nor & Maarof 2007) and the xtreg command in the statistical package Stata (Brüderl 2005).
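The data structure can be pictured as a long-format table with one record per unit and period. The sketch below uses made-up firms, years, and sales figures purely for illustration; the helper names are hypothetical.

```python
# Hypothetical long-format panel: one record per (firm, year) pair.
panel = [
    {"firm": "A", "year": 2010, "sales": 10.0},
    {"firm": "A", "year": 2011, "sales": 11.5},
    {"firm": "A", "year": 2012, "sales": 12.1},
    {"firm": "B", "year": 2010, "sales": 7.2},
    {"firm": "B", "year": 2011, "sales": 7.9},
    {"firm": "B", "year": 2012, "sales": 8.4},
    {"firm": "C", "year": 2010, "sales": 15.3},
    {"firm": "C", "year": 2011, "sales": 14.8},
    {"firm": "C", "year": 2012, "sales": 16.0},
]

def cross_section(rows, year):
    """All units observed at one point in time."""
    return [r for r in rows if r["year"] == year]

def time_series(rows, firm):
    """One unit followed over time, in chronological order."""
    return sorted((r for r in rows if r["firm"] == firm), key=lambda r: r["year"])

def is_balanced(rows):
    """True if every unit is observed in every period."""
    firms = {r["firm"] for r in rows}
    years = {r["year"] for r in rows}
    observed = {(r["firm"], r["year"]) for r in rows}
    return len(observed) == len(firms) * len(years)

print(len(cross_section(panel, 2011)), [r["year"] for r in time_series(panel, "B")])
print(is_balanced(panel))
```

Slicing by year recovers a cross-section and slicing by firm recovers a time series, which is the sense in which panel data combines the two.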
References • Allison, P.D. (2005). Fixed Effects Regression Methods for Longitudinal Data Using SAS. SAS Press. • Brüderl, J. (2005). Panel Data Analysis. University of Mannheim, http://www2.sowi.uni-mannheim.de/lsssm/veranst/Panelanalyse.pdf (accessed October 15, 2012) • Mohd Nor, A. H. S., & Maarof, F. (2007). “Panel Data Analysis Using SAS”. Proceedings of the 21st Annual SAS Malaysia Forum, 5th September 2007, Kuala Lumpur. • Halaby, C. (2004). Panel Models in Sociological Research. Annual Review of Sociology, 30: 507-544. • Wooldridge, J. (2002). Econometric Analysis of Cross Section and Panel Data. MIT Press. • Wooldridge, J. (2003). Introductory Econometrics: A Modern Approach. Thomson. Chapters 13, 14. • Baron and Kenny (1986)
Part 3: Mediation Analysis • For more information: BUSI 6280, EPSY 6270 • The material presented here is based on Wikipedia
Mediation Models • Mediation is a hypothesized causal chain in which one variable affects a second variable that, in turn, affects a third variable. The intervening variable, M, is the mediator. It “mediates” the relationship between a predictor, X, and an outcome Y. • a and b: direct effects of X on M and M on Y, resp. • c’: direct effect of X on Y after accounting for M (Diagram: X → M → Y with paths a and b, plus a direct path c’ from X to Y)
Baron and Kenny steps • The Baron and Kenny (1986) approach is not the best, but many researchers are still using it • STEP 1: Conduct a simple regression analysis with X predicting Y to test for path c alone • c is the total effect of X on Y, without taking into account M. This is not the same as c’ on the previous slide! (Diagram: path c from X to Y)
Baron and Kenny steps • STEP 2: Conduct a simple regression analysis with X predicting M to test the significance of path a alone (Diagram: path a from X to M)
Baron and Kenny steps • STEP 3: Conduct a simple regression analysis with M predicting Y to test the significance of path b alone • The purpose of Steps 1-3 is to establish that zero-order relationships among the variables exist. If one or more of these relationships are non-significant, researchers usually conclude that mediation is not possible or likely • Assuming there are significant relationships from Steps 1 through 3, proceed to Step 4. (Diagram: path b from M to Y)
Baron and Kenny steps • STEP 4: Conduct a multiple regression analysis with X and M predicting Y • In Step 4, some form of mediation is supported if the effect of M (path b) remains significant after controlling for X. If X is no longer significant when M is controlled, the finding supports full mediation. If X is still significant, the finding supports partial mediation. (Diagram: path c’ from X to Y and path b from M to Y)
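The four regressions can be sketched in a few lines. This is only an illustration under stated assumptions: the data are made up, the helper names are hypothetical, and only the slope estimates are computed, not the significance tests that each step calls for. It also shows a standard property of OLS with these three variables: the Step 1 slope c equals c’ + a·b.

```python
def cross(u, v):
    """Sum of cross-products of two variables after centering."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((p - mu) * (q - mv) for p, q in zip(u, v))

def simple_slope(x, y):
    """Slope from a simple regression of y on x (with intercept)."""
    return cross(x, y) / cross(x, x)

def two_predictor_slopes(x, m, y):
    """Slopes of x and m in a regression of y on both (with intercept),
    from the 2x2 normal equations on centered data."""
    sxx, smm, sxm = cross(x, x), cross(m, m), cross(x, m)
    sxy, smy = cross(x, y), cross(m, y)
    det = sxx * smm - sxm ** 2
    return (smm * sxy - sxm * smy) / det, (sxx * smy - sxm * sxy) / det

# Made-up data for illustration only.
x = [1, 2, 3, 4, 5, 6, 7, 8]
m = [2.1, 2.9, 4.2, 4.8, 6.1, 7.2, 7.9, 9.1]
y = [3.0, 4.1, 6.2, 6.9, 9.1, 10.3, 11.8, 13.2]

c = simple_slope(x, y)                      # Step 1: X -> Y
a = simple_slope(x, m)                      # Step 2: X -> M
b_zero_order = simple_slope(m, y)           # Step 3: M -> Y alone
c_prime, b = two_predictor_slopes(x, m, y)  # Step 4: X and M -> Y

print(c, a, b_zero_order, c_prime, b)
print(abs(c - (c_prime + a * b)) < 1e-9)    # decomposition of the Step 1 slope
```

In practice the judgment about full or partial mediation rests on the tests attached to these coefficients, which a statistics package would report alongside the estimates.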
Sobel steps • STEP 1: Conduct a multiple regression analysis with X and M predicting Y: Y = b0 + b1X + b2M + e • STEP 2: Conduct a simple regression analysis with X predicting M: M = b3 + b4X + u • STEP 3: Compute the indirect effect as b_indirect = (b2)(b4) • Significance is best determined using bootstrapping (Diagrams: Step 1 estimates paths c’ and b; Step 2 estimates path a)
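A minimal sketch of these steps with a percentile bootstrap follows. The data are simulated with made-up coefficients, the seed, sample size, and number of resamples are arbitrary choices, and the helper names are hypothetical; other bootstrap intervals (for example bias-corrected ones) are also used in practice.

```python
import random

def cross(u, v):
    """Sum of cross-products of two variables after centering."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((p - mu) * (q - mv) for p, q in zip(u, v))

def indirect_effect(x, m, y):
    """b2 * b4, where b2 is the slope of M in Y = b0 + b1 X + b2 M + e
    and b4 is the slope of X in M = b3 + b4 X + u."""
    sxx, smm, sxm = cross(x, x), cross(m, m), cross(x, m)
    sxy, smy = cross(x, y), cross(m, y)
    det = sxx * smm - sxm ** 2
    b2 = (sxx * smy - sxm * sxy) / det
    b4 = sxm / sxx
    return b2 * b4

# Simulated data (assumed coefficients, for illustration only).
rng = random.Random(0)
n = 200
x = [rng.gauss(0, 1) for _ in range(n)]
m = [0.8 * xi + rng.gauss(0, 0.5) for xi in x]
y = [0.3 * xi + 0.7 * mi + rng.gauss(0, 0.5) for xi, mi in zip(x, m)]

point = indirect_effect(x, m, y)

# Percentile bootstrap: resample cases with replacement, re-estimate b2 * b4.
B = 1000
reps = []
for _ in range(B):
    idx = [rng.randrange(n) for _ in range(n)]
    reps.append(indirect_effect([x[i] for i in idx],
                                [m[i] for i in idx],
                                [y[i] for i in idx]))
reps.sort()
lo, hi = reps[int(0.025 * B)], reps[int(0.975 * B) - 1]

print(point, lo, hi)
```

An interval that excludes zero is read as evidence of an indirect effect. Resampling whole cases keeps each X, M, Y triple together, which preserves the dependence among the three variables.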
SEM approach • The Structural Equation Modeling (SEM) approach is considered the best for testing mediation effects. In SEM, a single mediation model is tested. • Full mediation and partial mediation models can be compared by fitting both as alternative models. The model with the better fit statistics is the more appropriate (Diagrams: Full mediation, X → M → Y with paths a and b; Partial mediation, the same paths plus a direct path c’ from X to Y)
References • Baron, R.M. & Kenny, D.A. (1986). The Moderator-Mediator variable distinction in Social Psychological research: Conceptual, strategic, and statistical considerations. Journal of Personality and Social Psychology, 51, 1173-1182. • MacKinnon, D.P. (2008). Introduction to statistical mediation analysis. Mahwah, NJ: Erlbaum. • Sobel, M. E. (1982). Asymptotic confidence intervals for indirect effects in structural equation models. In S. Leinhardt (Ed.), Sociological Methodology (pp. 290-312). Washington DC: American Sociological Association.
Part 4: Hierarchical Linear Modeling • For more information: BUSI 6480, EPSY 6230 (EPSY offered at the UNT College of Education)
Multilevel Models • Multilevel models are particularly appropriate for research designs where the data for participants are organized at more than one level • Analysis of Covariance (ANCOVA) include nested designs • Individuals nested within groups • Companies nested within industries