A Semiparametric Approach to Forecasting US Mortality Age Patterns
Presenter: Rong Wei (1)
Coauthors: Guanhua Lu (2), Benjamin Kedem (2) and Paul D. Williams (1)
(1) National Center for Health Statistics (NCHS)
(2) Math Dept., University of Maryland, College Park
NCHS July 11, 2006
Outline
• Background
• Project tasks
• Model Introduction
• New Approach: Semiparametric model
• Mortality forecasting: US, small states
• Comparison with Lee-Carter Model
• Conclusion
Background
• NCHS publishes race-gender specific life tables for each of the 50 states plus DC decennially;
• Out of 300+ tables, about 1/5 could not be published due to small numbers of deaths in a short time period;
• Mortality data have been well documented at NCHS for every year, state, and race-gender population since 1968.
An example of life tables
Mortality age patterns: data from US and large states
Mortality in small states: one year data vs. 30 years historical data
Another view of the data: time series at each age
The tasks
• To solve the insufficient data problem, data from 30+ years are used to model the age-specific death pattern for small areas;
• Select a time series model that gives better control of time effects and random error in multiple time series with short-term prediction;
• Project mortality curves (one-year-ahead vs. multi-year prediction) in small areas with historical data and robust statistical methodology.
Introduction to mortality forecasting models:
• US mortality forecasting model by Lee and Carter (1992):
  $\ln(m_{x,t}) = a_x + b_x k_t + e_{x,t}$
  $k_t = k_{t-1} + c + e_t$
• The LC model is based on principal components. It searches for the first principal component (PC) in n-dimensional time series data and solves for the age and time parameters by singular value decomposition.
• The LC model explains 60–93% of the total variance (Girosi and King). For some populations, the first PC may be insufficient to explain the variance in high-dimensional data.
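The Lee-Carter equations above can be illustrated with a small sketch. This is not the presenters' code: the function names `fit_lee_carter` and `forecast_log_rates` are hypothetical, a power iteration stands in for a full singular value decomposition (it recovers only the first component), and the usual normalization that the b_x sum to 1 is assumed. The drift c is estimated as the average yearly change in k_t.

```python
import math


def fit_lee_carter(log_m, iters=100):
    """Rank-1 fit of ln m[x][t] ~ a[x] + b[x] * k[t].

    log_m holds one row per age and one column per year (assumed layout).
    """
    n_age, n_year = len(log_m), len(log_m[0])
    # a_x: average log death rate at each age over the years.
    a = [sum(row) / n_year for row in log_m]
    # Centered matrix z[x][t] = ln m[x][t] - a[x].
    z = [[log_m[x][t] - a[x] for t in range(n_year)] for x in range(n_age)]

    # Power iteration for the leading left singular vector u of z
    # (assumption: used here in place of a library SVD routine).
    u = [1.0 / math.sqrt(n_age)] * n_age
    for _ in range(iters):
        v = [sum(z[x][t] * u[x] for x in range(n_age)) for t in range(n_year)]
        w = [sum(z[x][t] * v[t] for t in range(n_year)) for x in range(n_age)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:
            break
        u = [c / norm for c in w]

    # Normalize so that the b_x sum to 1 (assumes sum(u) is not zero).
    scale = sum(u)
    b = [c / scale for c in u]
    k = [scale * sum(z[x][t] * u[x] for x in range(n_age)) for t in range(n_year)]
    return a, b, k


def forecast_log_rates(a, b, k, steps=1):
    """Project k_t as a random walk with drift and return forecast log rates."""
    drift = (k[-1] - k[0]) / (len(k) - 1)  # mean of the yearly differences
    k_future = k[-1] + drift * steps
    return [a_x + b_x * k_future for a_x, b_x in zip(a, b)]
```

With real data, rows would be the age-specific log death rates and columns the calendar years; when the first component explains too little variance, this rank-1 fit inherits the limitation noted in the last bullet.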
New Approach: Semiparametric model
• Semiparametric approach
• Short mortality time series, from 1968 to 1998, used for consistency of data collection
• Combining more information from the age neighborhood
• Centered death rates
• Emphasis on predictions for upcoming years
Semiparametric model
Semiparametric model in Time Series
Parameter estimation from pooled sample
Maximum likelihood function
Reference sample distribution
Application to US mortality forecasting
Data:
• Mortality data from death certificates filed in state vital statistics offices and reported to NCHS from 1968 to 2002;
• Population data from the decennial census, interpolated between two adjacent decennial censuses;
• Age-specific mortality rates were calculated for each race-gender demographic population.
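As a rough illustration of the data preparation described above, the sketch below interpolates a population between two census years and divides deaths by that population. The slides do not say which interpolation was used, so linear interpolation is an assumption here, and the function names and the numbers in the usage lines are made up for illustration.

```python
def interpolate_population(year, census_year0, pop0, census_year1, pop1):
    """Linearly interpolate a population count between two census years.

    Linear interpolation is an assumption; the slides only say the
    population was interpolated between adjacent censuses.
    """
    frac = (year - census_year0) / (census_year1 - census_year0)
    return pop0 + frac * (pop1 - pop0)


def death_rate(deaths, population):
    """Age-specific death rate: deaths divided by population at risk."""
    return deaths / population


# Illustrative numbers only, not NCHS data.
pop_1975 = interpolate_population(1975, 1970, 1000.0, 1980, 2000.0)
rate_1975 = death_rate(15, pop_1975)
```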
Cont’d
• 85 age-specific time series for ages 1, …, 85, where the age category 85+ includes age 85 and above;
• For each age, the time series runs from 1970 to 2001; 2002 data are available for comparison with the prediction result;
• All 85 time series are categorized into 5-year age groups 1-5, 6-10, ..., 81-85+, a total of 17 groups;
• Death rates at each age are rescaled by centering on the age-specific averages over years;
• Residuals from the time series “in the middle” of each group are taken as the reference.
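The grouping and centering steps listed above might be sketched as follows. The helper names are hypothetical, and centering is assumed to mean subtracting the age-specific average over years, which the slides do not spell out. The "middle" series of a group is taken to be the one at the central age (for example, age 3 in group 1-5).

```python
def age_groups(first_age=1, last_age=85, width=5):
    """Split single ages into consecutive groups of the given width."""
    return [
        list(range(start, min(start + width, last_age + 1)))
        for start in range(first_age, last_age + 1, width)
    ]


def middle_age(group):
    """Central age of a group, used to pick the reference series."""
    return group[len(group) // 2]


def center(series):
    """Subtract the average over years (assumed meaning of 'centered')."""
    mean = sum(series) / len(series)
    return [value - mean for value in series]


groups = age_groups()                      # 17 groups: 1-5, 6-10, ..., 81-85
reference_ages = [middle_age(g) for g in groups]
```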
Mortality age-patterns across four decades: 1970 – 2000: US National Vital Statistics
Age-specific time series for log-death rates
Log-death rates centered by rescaling from age-specific averages over years
Centered age-specific time series for log-death rates
Mortality forecasting procedure
Procedure cont’d
Fit of TS & histogram of residuals
Comparison for single age
Comparison of age groups 32-34 & 31-35: combining more information improves the fit of density curves
Empirical (solid) and estimated (dot) CDF
One-year-ahead predictive distribution
Predicted mortality curves from LC & SP models in 2002
Predicted mortality curves for age group 1-30
Predicted mortality curves for age group 31-50
Predicted mortality curves for age group 51-70
Predicted mortality curves for age group 71-85
Mean Square Error of prediction from Semiparametric model (SP) & Lee-Carter (LC)
• MSE for total population
• MSE for Female
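For reference, the mean square error used to compare the two models can be computed as in the sketch below. The function name is hypothetical, and the slides do not state whether the error was measured on log rates or on raw rates, so the inputs are simply paired predicted and observed values across ages.

```python
def mean_square_error(predicted, observed):
    """Average squared difference between predicted and observed values."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(predicted)


# Illustrative values only, not results from the talk.
example_mse = mean_square_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

Computing this once per model on the same 2002 observations gives directly comparable numbers for SP and LC.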
Semiparametric Time Series Estimate: Mortalities in Small Populations
Semiparametric Time Series Estimate: Mortalities in Small Populations
Conclusion
• Historical data fitted by the time series semiparametric model can help when estimating mortality rates in small areas with insufficient observations;
• Compared to the LC model, the semiparametric method reduces the overall MSE appreciably by better modeling the predictive probabilities with conditional distributions;
• This is a non-Bayesian method. A Bayesian method would result in relatively large prediction intervals, so predictions further than one year ahead could apply.
Alternative ways to solve the problem of estimating mortalities for small areas
In addition to borrowing strength from historical data, other alternatives include:
• Borrow strength from national mortality data;
• Borrow strength from geographic neighborhood data;
• Borrow strength from data for other areas with similar causes of death.
Small area estimation by Bayesian method: borrow national data strength
[Figure: Mortality curve, black male, IA. x-axis: Age (0 to 90); y-axis: Log(q) (-10 to 0); series: Estimated, State observation, National observation.]