PART 8 Two Stage & Joint Models BIO656--Multilevel Models
SEERMED Data
Motivation: End-of-Life Colorectal Cancer Costs
[Figure: Expenditure, axis from $0 to $500,000]
Data
[Diagram labels: Professional Health-Care Services; HMO; Hospice; FFS; Medicare; Private Ins.; Rejected; Allowed; Co-Pay; Deductibles; Patient – Physician; Cancer Diagnosis; Claims; Terminal-Phase Costs (12 mos); Medicare Payments; Death]
Factors: Need-based, Enabling, Predisposing
Data
[Diagram labels: Patient – Physician; Cancer Diagnosis; Medicare Payments; Terminal-Phase Costs (12 mos, 3 mos); Death]
A “Normal” Distribution
[Figure: density of Y]
A Complex Distribution
[Figure: density of Y]
Complex Distributions
• Mixtures of Simple Distributions
• Mixtures-of-Experts Models (MEM): Jacobs, Jordan (1991), Neural Comp
• Finite Mixture Models (FMM): McLachlan, Peel (2001)
[Figure: density of Y]
A simple, two-part mixture
[Figure: costs split into $0 and positive ($+) amounts]
1. P(Y > 0)
2. E(Y | Y > 0) = E(Y+)
A Two-Part Model: (Intensity & Size)
IS – logit/lognormal
1. \( \operatorname{logit}\{\Pr(Y_i > 0)\} = x_i'\alpha \)
2. i.) \( \log_{10}(Y_i^{+}) = x_i'\beta + \varepsilon_i \)
   ii.) \( \varepsilon_i \sim N(0, \sigma^2) \)
Related models:
0. “Tobit” model: Tobin (1958)
1. Selection (hurdle) models: (Amemiya 1984; Heckman 1976)
2. Zero-inflated models (Lambert 1992; Greene 1994)
3. Two-part models (Manning 1981; Mullahy 1998)
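The two parts combine into an overall mean through E(Y) = Pr(Y > 0) · E(Y | Y > 0). The Python sketch below illustrates this for the logit/lognormal form, assuming base-10 logs as on the slide. The function names and numeric inputs are illustrative and do not come from the course materials. The retransformation step uses the standard lognormal mean, E(Y+) = 10^eta · exp{(sigma · ln 10)^2 / 2}.

```python
import math

def expit(eta):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

def lognormal10_mean(eta, sigma):
    """E(Y+) when log10(Y+) ~ N(eta, sigma^2)."""
    # ln(Y+) = ln(10) * log10(Y+), so ln(Y+) ~ N(eta*ln10, (sigma*ln10)^2)
    ln10 = math.log(10.0)
    return 10.0 ** eta * math.exp(0.5 * (sigma * ln10) ** 2)

def two_part_mean(eta_intensity, eta_size, sigma):
    """Overall mean E(Y) = Pr(Y > 0) * E(Y | Y > 0)."""
    return expit(eta_intensity) * lognormal10_mean(eta_size, sigma)

# Hypothetical linear predictors, chosen only for illustration
mean_cost = two_part_mean(eta_intensity=0.0, eta_size=3.0, sigma=0.5)
print(round(mean_cost, 2))
```

Note that simply back-transforming the size predictor (10^eta) gives the median of Y+, not its mean; the extra exponential factor is what the retransformation discussion in Mullahy (1998) is concerned with.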
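Another Two-Part Model: (Intensity & Size)
IS – Probit/log-Gamma
1. \( \Phi^{-1}\{\Pr(Y_i > 0)\} = x_i'\alpha \)
2. i.) \( \log_{10}\{E(Y_i^{+})\} = x_i'\beta \)
   ii.) \( Y_i^{+} \sim \mathrm{Gamma} \), a two-parameter Gamma distribution with mean given by (i)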
A Two-Part Model: The Intensity-Size GLM
IS – GLM
• \( h_1 \): link function for the binary data
• \( h_2 \): link function for the continuous data
• \( f \): exponential family with dispersion
Multiple Levels
[Figure labels: 1; 0; +]
Monthly SEERMED Data
[Figure: panels for Month 10, Month 11, and Month 12]
Multiple Levels
[Figure labels: 2; 0; +; Time; HMREM1; Month 10 (f10), Month 11 (f11), Month 12 (f12); g1; g2; a; b]
A 2-Part Model
• Intensity: \( \operatorname{logit}(\pi_i) = x_i'\alpha \)
• Size:
  • \( \mu_i = x_i'\beta \)
  • \( Y_i^{+} \sim f(\mu_i, \phi) \)
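A Longitudinal 2-Part Model
• Intensity: \( \operatorname{logit}(\pi_{ic}) = x_{ic}'\alpha + z_{ic}'a_i \)
• Size:
  • \( \mu_{ic} = x_{ic}'\beta + z_{ic}'b_i \)
  • \( Y_{ic}^{+} \sim f(\mu_{ic}, \phi) \)
• Random Effects: \( u_i = (a_i, b_i)' \sim N(0, \Sigma) \), with \( \Sigma = \begin{pmatrix} \tau_{aa} & \tau_{ba} \\ \tau_{ba} & \tau_{bb} \end{pmatrix} \)
1. Olsen, Schafer (2001)
2. Tooze, Grunwald, Jones (2002)
3. Yau, Lee, Ng (2002)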
Data Analysis: 3 General Steps
• Exploration
• Model Fitting and Estimation
• Diagnostics
and the greatest of these is…
Uncooked Spaghetti Plot
Month 10 & Month 11 log10(Costs)
Figure 5: Seermed log10 month 1 & 2
[Figure: density over Expenditure 10 and Expenditure 11 (axes 0 to 5), showing a Bivariate Point Mass, Univariate Continuous Distbs., and a Bivariate Continuous Distb.]
PRISM plot: Month 10 & 11 SEERMED Costs
PRISM = Paired Response Intensity Size Mixture plot
[Figure: panels annotated with the random-effect covariance components (subscripts aa, ba, bb)]
PRISM Matrix: Months 10-12
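SEERMED MREM
Intensity: Probit, Logistic. Size: Lognormal, Gamma.
• Intensity: \( h_1(\pi_{ic}) = \alpha_0 + \alpha_1\,\mathrm{Obs} + \alpha_2\,\mathrm{Male} + \alpha_3\,\mathrm{Obs}\times\mathrm{Male} + a_i \)
• Size:
  • \( h_2(\mu_{ic}) = \beta_0 + \beta_1\,\mathrm{Obs} + \beta_2\,\mathrm{Male} + b_i \)
  • \( Y_{ic}^{+} \sim f(\mu_{ic}, \phi) \)
• Random Effects: \( u_i = (a_i, b_i)' \sim N(0, \Sigma) \), with \( \Sigma = \begin{pmatrix} \tau_a^2 & \tau_{ab} \\ \tau_{ab} & \tau_b^2 \end{pmatrix} \)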
Estimation
Likelihood: \( L_i(\theta) \)
Whoa. But: Non-Linear Mixed Model (NLMM)
• PQL, MCEM, MCMC, …
• Adaptive Quadrature – Newton-Raphson
Zeger, Karim (1991); Davidian, Giltinan (1993); Pinheiro, Bates (1995); McCulloch (1997); Booth et al. (2001); Rabe-Hesketh et al. (2004)
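The subject-level likelihood integrates the conditional likelihood over the random effects, and that integral has no closed form for these models. As a rough illustration of the quadrature idea, the Python sketch below applies ordinary (non-adaptive) 3-point Gauss-Hermite quadrature to a single random intercept a ~ N(0, tau^2) in a logistic intensity model. It is a deliberate simplification: the slide lists adaptive quadrature, the SEERMED model has two correlated random effects, and the function names and inputs here are hypothetical.

```python
import math

# 3-point Gauss-Hermite rule for integrals of exp(-x^2) * g(x) over the real line
GH_NODES = (-math.sqrt(1.5), 0.0, math.sqrt(1.5))
GH_WEIGHTS = (math.sqrt(math.pi) / 6.0,
              2.0 * math.sqrt(math.pi) / 3.0,
              math.sqrt(math.pi) / 6.0)

def normal_expectation(g, tau):
    """Approximate E[g(a)] for a ~ N(0, tau^2), using a = sqrt(2) * tau * x."""
    total = sum(w * g(math.sqrt(2.0) * tau * x)
                for x, w in zip(GH_NODES, GH_WEIGHTS))
    return total / math.sqrt(math.pi)

def marginal_likelihood(positives, eta, tau):
    """Marginal likelihood of one subject's 0/1 'any cost' indicators.

    positives: list of 0/1 flags (1 if Y > 0 in that month)
    eta:       fixed-effect linear predictor (held constant across months here)
    tau:       standard deviation of the random intercept
    """
    def conditional(a):
        # Likelihood of all months given the random intercept a
        p = 1.0 / (1.0 + math.exp(-(eta + a)))
        lik = 1.0
        for flag in positives:
            lik *= p if flag == 1 else (1.0 - p)
        return lik
    return normal_expectation(conditional, tau)

# Hypothetical subject: positive costs in two of three months
print(marginal_likelihood([1, 0, 1], eta=0.3, tau=1.0))
```

Three nodes are far too few for real work; they keep the example short. Adaptive quadrature recentres and rescales the nodes around each subject's conditional mode, which usually gives better accuracy for the same number of points.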
Estimation: SAS
proc nlmixed data=SEERMED;
  parms / data=parms_start;
  *- 1) logistic: logit{Pr( Y>0 | a )} = Xalpha + a = "eta0" -*;
  eta0 = alpha0_c + alpha1_c*obs + alpha2_c*male + alpha3_c*obsmale + a;
  pi_c = exp(eta0) / (1+exp(eta0));
  *- 2) log-normal: E( log10(Y) | Y>0, b ) = XB + b = "eta1" -*;
  eta1 = beta0_c + beta1_c*obs + beta2_c*male + b;
  *- log-likelihood; Gpos (1 if y>0, else 0) and log10y are assumed to be in the data set -*;
  pi = CONSTANT('PI');
  if y=0 then ll1 = 0;
  else ll1 = -.5*log(2*pi*sigma**2) - .5*((log10y-eta1)/sigma)**2;
  ll = (1-Gpos)*log(1-pi_c) + Gpos*log(pi_c) + Gpos*(ll1);
  model y ~ GENERAL(ll);
  RANDOM a b ~ NORMAL([0,0],[tau_aa, tau_ba, tau_bb]) SUBJECT=id;
run;
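For readers who do not use SAS, the `ll` expression in the program above can be read as the following per-observation contribution, conditional on the random effects. This is a Python transcription under the same logit/lognormal assumptions. The function name and the example values are illustrative.

```python
import math

def two_part_loglik(y, eta0, eta1, sigma):
    """Conditional log-likelihood of one observation in the logit/lognormal model.

    y:     observed cost (zero or positive)
    eta0:  intensity linear predictor, logit{Pr(Y > 0 | a)}
    eta1:  size linear predictor, E(log10 Y | Y > 0, b)
    sigma: standard deviation of log10(Y) given Y > 0 and b
    """
    pi_c = 1.0 / (1.0 + math.exp(-eta0))      # Pr(Y > 0 | a)
    if y == 0:
        return math.log(1.0 - pi_c)           # zero part only
    log10y = math.log10(y)
    # Normal log-density of log10(y), as in the SAS variable ll1
    ll1 = (-0.5 * math.log(2.0 * math.pi * sigma ** 2)
           - 0.5 * ((log10y - eta1) / sigma) ** 2)
    return math.log(pi_c) + ll1

print(two_part_loglik(0.0, eta0=0.0, eta1=3.0, sigma=0.5))
print(two_part_loglik(1000.0, eta0=0.0, eta1=3.0, sigma=0.5))
```

A zero contributes only through the intensity part, while a positive cost contributes through both parts. This is why the two parts would separate into independent fits if there were no shared or correlated random effects.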
Estimation: SAS (better)
proc nlmixed data=sanfran qpoints=10;
  parms / data=parms_start;
  *-logit-*;
  eta0 = alpha0_c + alpha1_c*obs + alpha2_c*male + alpha3_c*obsmale + a;
  expeta = exp(eta0);
  pi_c = expeta / (1+expeta);
  tau_aa = exp(logtau_a)**2;
  *-lognormal-*;
  eta1 = beta0_c + beta1_c*obs + beta2_c*male + b;
  phi = 10**(log10phi); *std dev of log10(Y+1)|b;
  tau_bb = (10**(log10tau_b))**2; *- RE Var -*;
  rho_ba = (exp(2*zrho_ba) - 1) / (exp(2*zrho_ba) + 1);
  tau_ba = rho_ba*(tau_aa*tau_bb)**.5;
  *- log-likelihood -*;
  pi = CONSTANT('PI');
  if y=0 then ll1 = 0;
  else ll1 = -.5*log(2*pi*phi**2) - .5*((log10y-eta1)/phi)**2;
  ll = (1-Gpos)*log(1-pi_c) + Gpos*log(pi_c) + Gpos*(ll1);
  model y ~ GENERAL(ll);
  RANDOM a b ~ NORMAL([0,0],[tau_aa, tau_ba, tau_bb]) SUBJECT=id;
  ods output ParameterEstimates = parms_new;
run;
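The second program estimates unconstrained parameters (`logtau_a`, `log10tau_b`, `log10phi`, `zrho_ba`) and maps them back to variances and a correlation, which keeps the optimizer inside the valid parameter space. The Python sketch below mirrors the SAS assignments for the covariance terms. The function name and the input values are hypothetical. The expression for `rho_ba` is the hyperbolic tangent of `zrho_ba`, that is, the inverse of Fisher's z-transform.

```python
import math

def covariance_from_unconstrained(logtau_a, log10tau_b, zrho_ba):
    """Map unconstrained parameters to (tau_aa, tau_ba, tau_bb)."""
    tau_aa = math.exp(logtau_a) ** 2        # variance of a, always positive
    tau_bb = (10.0 ** log10tau_b) ** 2      # variance of b, always positive
    # Correlation between a and b, strictly inside (-1, 1) for moderate zrho_ba
    rho_ba = (math.exp(2 * zrho_ba) - 1) / (math.exp(2 * zrho_ba) + 1)
    tau_ba = rho_ba * math.sqrt(tau_aa * tau_bb)
    return tau_aa, tau_ba, tau_bb

# Hypothetical starting values, for illustration only
tau_aa, tau_ba, tau_bb = covariance_from_unconstrained(0.2, -0.5, 0.8)
# A positive determinant confirms the implied 2x2 covariance matrix is valid
print(tau_aa * tau_bb - tau_ba ** 2 > 0)
```

Working on the transformed scale means Newton-Raphson steps cannot propose a negative variance or a correlation outside (-1, 1), which is one plausible reason the slide labels this version "better".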
SEERMED MREM Results 1
MREM Profile Likelihood Plots for \( \alpha_3 \)
[Figure: scaled profile likelihood, Profile ll (alpha3), for the intensity-model Obs*Male interaction term (\( \alpha_3 \)); curves for Probit*-Lognormal, Probit*-Gamma, Logit-Lognormal, and Logit-Gamma; annotation: LR 6]
SEERMED MREM Results 2
SEERMED MREM Results 2
But do these models fit?…
Data vs. MREM Models
[Figure legend: Obs: ,Y; Exp: P, L, G]
Diagnostic PRISM Matrix: lognormal IS-GLMM
[Figure panels: Residuals, Expected, Observed]
Review & Related Work
Models: MEM, MREM, HMREM, HMMMM
Ideas
1. Simple Combinations of Simple Models (0 and +)
2. Complex (Multi-Level) Data: Many Models & Many Pictures
Data vs. HMREM Models
Data vs. HMMMM Models
Review & Related Work
• These ideas are not just for Zero-Inflated Data
• Latent Variables are useful for “connecting” things
Opportunistic Infection & IDU
[Figure: each line represents 1 subject’s time in the study; x-axis: Day in Study, starting 6 months prior to 1st interview; markers: Opportunistic Infection, Interview: Reported Drug Use, Interview: Reported No Drug Use; groups: Always Users, Intermittent Users, Never Users]
But what about Possible Informative Missingness?
[Diagram labels: Drug Use; OI; Death / Dropout]
Jointly Analyze Survival & OIs
1) Logistic Model: \( \operatorname{logit}\{\Pr(OI_{ij} \mid a_i)\} = \alpha_0 + \alpha_1 SU_{ij} + \alpha_2 SU_{ij} \times HCuse_{ij} + \alpha_3 AU_{ij} + \alpha_4 Period_j + a_i \)
2) Survival Model: \( \log\{\lambda(t)\} = \beta_0 + \beta_1 SU_{ij} + \beta_2 AU_{ij} + a_i \)
3) Latent Effects: \( a_i \sim N(0, \tau^2) \)
Guo & Carlin (2004)
Warning!
• But “Buyer Beware”
  -- Model Assumptions
  -- Identifiability
  -- Model Fit
  -- Marginalize & Check whenever possible
• MLMs require even more due diligence than usual
References
• Mixture Models:
  • McLachlan, G. J. and Peel, D. (2001), Finite Mixture Models, John Wiley & Sons.
  • Jacobs, R. A., Jordan, M. I., Nowlan, S. J., and Hinton, G. E. (1991), “Adaptive mixtures of local experts,” Neural Computation, 3, 79–87.
• Two-Part Models:
  • Tobin, J. (1958), “Estimation of relationships for limited dependent variables,” Econometrica, 26, 24–36.
  • Amemiya, T. (1984), “Tobit models: A survey,” Journal of Econometrics, 24, 3–61.
  • Heckman, J. (1976), “The common structure of statistical models of truncation, sample selection, and limited dependent variables, and a simple estimator for such models,” Annals of Economic and Social Measurement, 5, 475–492.
  • Lambert, D. (1992), “Zero-inflated Poisson regression, with an application to defects in manufacturing,” Technometrics, 34, 1–14.
  • Greene, W. (1994), “Accounting for excess zeros and sample selection in Poisson and negative binomial regression models,” Working Paper EC-94-10, Department of Economics, New York University.
  • Manning, W., Newhouse, J., Orr, L., Duan, N., Keeler, E., Leibowitz, A., Marquis, M., and Phelps, C. (1981), “A two-part model of the demand for medical care: Preliminary results from the health insurance experiment,” in Health, Economics, and Health Economics, eds. van der Gaag, J. and Perlman, M., pp. 103–104.
  • Mullahy, J. (1998), “Much ado about two: Reconsidering retransformation and the two-part model in health economics,” Journal of Health Economics, 17, 247–281.
References
• Longitudinal 2-Part Models:
  • Olsen, M. K. and Schafer, J. L. (2001), “A two-part random-effects model for semicontinuous longitudinal data,” Journal of the American Statistical Association, 96, 730–745.
  • Tooze, J. A., Grunwald, G. K., and Jones, R. H. (2002), “Analysis of repeated measures data with clumping at zero,” Statistical Methods in Medical Research, 11, 341–355.
  • Yau, K. K. W., Lee, A. H., and Ng, A. S. K. (2002), “A zero-augmented gamma mixed model for longitudinal data with many zeros,” The Australian and New Zealand Journal of Statistics, 44, 177–183.
• Estimation:
  • Zeger, S. L. and Karim, M. R. (1991), “Generalized linear models with random effects: A Gibbs sampling approach,” Journal of the American Statistical Association, 86, 79–86.
  • Davidian, M. and Giltinan, D. M. (1993), “Some general estimation methods for nonlinear mixed-effects models,” Journal of Biopharmaceutical Statistics, 3, 23–55.
  • Pinheiro, J. C. and Bates, D. M. (1995), “Approximations to the log-likelihood function in the nonlinear mixed-effects model,” Journal of Computational and Graphical Statistics, 4, 12–35.
  • McCulloch, C. E. (1997), “Maximum likelihood algorithms for generalized linear mixed models,” Journal of the American Statistical Association, 92, 162–170.
  • Booth, J. G., Hobert, J. P., and Jank, W. (2001), “A survey of Monte Carlo algorithms for maximizing the likelihood of a two-stage hierarchical model,” Statistical Modelling: An International Journal, 1, 333–349.
  • Rabe-Hesketh, S., Skrondal, A., and Pickles, A. (2004), “Maximum likelihood estimation of limited and discrete variable models with nested random effects,” Journal of Econometrics, in press.
• Other:
  • Guo, X. and Carlin, B. P. (2004), “Separate and joint modeling of longitudinal and event time data using standard computer packages,” The American Statistician, 58, 16–24.