
An Application of Empirical Bayes Estimation Technique on Disease Data Sharmistha Banerjee


Presentation Transcript


    1. An Application of Empirical Bayes Estimation Technique on Disease Data Sharmistha Banerjee

    2. Objective The objective of this technique is to improve the reliability of disease maps by stabilizing the rates through statistical adjustment which recognizes that rates based on small populations are less reliable than rates based on large populations.

    3. What is the problem of small numbers in disease mapping? Disease mapping in areas with small populations may produce highly unstable estimates of incidence rates. Devine, Louis and Halloran (1994, p.622) note that, “These rates may be highly unstable in that the addition or deletion of one or two events can cause drastic changes in the observed value. Geographical patterns in the resulting maps are likely to reflect some unknown combination of true trends in underlying risk and random variation caused by the instability of the observed rate”.
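The instability described above can be checked with simple arithmetic. The following is a minimal sketch with hypothetical counts (they are not taken from the Des Moines data); the helper name `rate_per_1000` is introduced here only for illustration.

```python
def rate_per_1000(deaths: int, births: int) -> float:
    """Infant mortality rate expressed per 1,000 live births."""
    return 1000.0 * deaths / births

# Hypothetical counts: a small area (50 births) and a large area (5,000 births).
small_before = rate_per_1000(1, 50)
small_after = rate_per_1000(2, 50)       # one additional death
large_before = rate_per_1000(100, 5000)
large_after = rate_per_1000(101, 5000)   # one additional death

# Change in the observed rate caused by a single extra event.
small_change = small_after - small_before
large_change = large_after - large_before
print(f"small area: {small_before:.1f} -> {small_after:.1f} per 1,000")
print(f"large area: {large_before:.1f} -> {large_after:.1f} per 1,000")
```

Both areas start from the same observed rate, but a single extra death moves the small-area rate far more than the large-area rate.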

    4. The traditional approach to solving the small number problem In small area analysis with small populations, the tradition is to borrow strength from time and space. The rarer the event, the more time or space, or both, must be used to estimate a stable rate.

    5. In the next few slides, we show how the traditional approach works by borrowing strength from space and time

    6. We “borrow strength from space” by “spatially aggregating” data As the size of the area over which we spatially aggregate data increases, the “spatial scale” of the map changes. The following maps show how the spatial pattern of infant mortality rates is sensitive to spatial scale.

    7. IMR 1989-92 on 0.8 mile filter On this map, infant deaths and births were spatially aggregated for circles of 0.8 mile radius on grid points 0.4 mile apart

    8. IMR 1989-92 on 1.2 mile filter On this map, infant deaths and births were spatially aggregated for circles of 1.2 mile radius on grid points 0.4 mile apart

    9. Results As we increase the filter size from 0.8 to 1.2 miles, we include more data in the filter area of each grid point and thus borrow statistical strength from space. Increasing spatial aggregation smooths the map, which suppresses more local and detailed spatial variations.
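A spatial filter of this kind can be sketched as follows. This is only an illustration under stated assumptions: the event locations and counts are hypothetical, coordinates are taken to be planar and in miles, and the function name `filtered_rate` is not from the original presentation.

```python
import math

def filtered_rate(grid_point, events, radius):
    """Pool deaths and births that fall within `radius` of a grid point.

    `events` is a list of (x, y, deaths, births) tuples with coordinates
    in miles. Returns (deaths, births, rate per 1,000); the rate is None
    when no births fall inside the circle.
    """
    gx, gy = grid_point
    deaths = births = 0
    for x, y, d, b in events:
        if math.hypot(x - gx, y - gy) <= radius:
            deaths += d
            births += b
    rate = 1000.0 * deaths / births if births else None
    return deaths, births, rate

# Hypothetical locations with death and birth counts.
events = [
    (0.0, 0.0, 1, 40),
    (0.5, 0.0, 0, 60),
    (1.0, 0.0, 2, 100),
    (0.0, 1.1, 0, 50),
]

narrow = filtered_rate((0.0, 0.0), events, 0.8)  # 0.8 mile filter
wide = filtered_rate((0.0, 0.0), events, 1.2)    # 1.2 mile filter
print("0.8 mile filter:", narrow)
print("1.2 mile filter:", wide)
```

The wider filter pools more births into the denominator, which is what stabilizes the rate, at the cost of mixing in events from farther away.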

    10. Deficiencies of this approach We have compromised our interest in local geographic details for our interest in obtaining a reliable estimate. Also, when we increase the filter size to include more data, we assume that the true distribution of the rates does not change through space. This assumption is very weak, as in reality the distribution of the rates changes due to several factors (for example physical, environmental, economic and policy-related factors).

    11. We “borrow strength from time” by “temporally aggregating” data As the number of time intervals over which we temporally aggregate data increases, the “temporal scale” of the map changes. The following maps show how the spatial pattern of infant mortality rates is sensitive to temporal scale.

    12. In the next few slides we show maps that borrow strength from temporal aggregation.

    13. Infant mortality rate for 1990 on 0.8 mile filter Infant mortality rates for this map are calculated by using birth and death data for 1990.

    14. Infant mortality rate for 1991 on 0.8 mile filter Infant mortality rates for this map are calculated by using birth and death data for 1991.

    15. Infant mortality rate for 1989-92 based on 0.8 mile filter Infant mortality rates for this map are calculated from the aggregated birth and death data of 1989, 90, 91 & 92.

    16. Result As we add more years of data to the filter area of each grid point, we borrow more strength from time. Assuming that in reality there is no temporal change in IMR, the map should approach the map of the true distribution of IMR. This follows from the law of large numbers, which says that as we increase the sample size the sample mean converges to the true mean, and from the central limit theorem, which says that the sampling distribution of the mean becomes approximately normal and increasingly concentrated around the true mean. Increasing temporal aggregation smooths the map. It suppresses more detailed temporal variations.
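The gain from pooling years can be illustrated with the binomial standard error of a rate. This is a rough sketch that assumes a constant true rate and the same number of births each year; both figures are hypothetical, and `rate_standard_error` is a name introduced here.

```python
import math

def rate_standard_error(rate: float, births: int) -> float:
    """Binomial standard error of an observed rate (a proportion)."""
    return math.sqrt(rate * (1.0 - rate) / births)

true_rate = 0.010       # hypothetical: 10 deaths per 1,000 births
births_per_year = 200   # hypothetical births inside one filter area

# Standard error (per 1,000 births) when 1, 2 or 4 years are pooled.
se_by_years = {
    years: 1000.0 * rate_standard_error(true_rate, births_per_year * years)
    for years in (1, 2, 4)
}
for years, se in se_by_years.items():
    print(f"{years} year(s) pooled: standard error = {se:.2f} per 1,000")
```

Under these assumptions, quadrupling the number of pooled years halves the standard error, because the standard error scales with the inverse square root of the number of births.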

    17. Deficiency of this traditional approach We have compromised our interest in temporal details for our interest in obtaining a reliable estimate. Also, the assumption that the true distributions do not change through time is very weak, as in reality the distribution of the rates changes due to endogenous and exogenous factors.

    18. Bayesian adjustment At this point, it seems Bayesian adjustment is a better technique to approach the small number problem. The Bayesian approach permits us to keep our interest in deriving a local, geographic, estimate, by using a statistical technique which, as Kennedy-Kalafatis (1995, p.1274) noted, “seeks to estimate a rate that has been adjusted to reflect the differing contributions of ‘true’ variation and the component of overall variation due to random chance.”

    19. A different tradition: Bayesian approach The Bayesian approach to the problem of estimation takes into account prior knowledge of the process that generates the parameters of interest. Suppose Θ represents the underlying and unobservable distribution of a random variable across an area. The probability distribution function of Θ, p(θ), is called the prior, where θ denotes the numerical value of Θ. Now suppose X is the actual observation across the area, conditional upon θ. After the observation X has been made, the conditional p.d.f., c(θ | x), is called the posterior p.d.f. of Θ.

    20. Bayes estimators The estimator of θ is then found by selecting the function of the observed rate X, d(x), such that the conditional expectation of the squared difference between d(x) and Θ is minimized. The Bayes solution then gives the function d(x) = E(Θ | x), the mean of the conditional distribution of Θ given X = x (Hogg and Craig, 1995, p.365).

    21. Application of Bayesian approach to find a stable set of IMR In our particular case, we assume the θj are the unknown, underlying infant mortality rates (IMRs), normally distributed across k areas, where j = 1, 2, ..., k. We also assume that θj has a known prior distribution, p(θ) ~ N(θ0, σ0²). Now let us assume that the observed IMR Xj is drawn from a distribution N(θ, σ²), where θ is an unknown parameter and σ² is known.

    22. Total variations in the observed rates The observed rates Xj are spread across grid points 0.4 miles apart, in circles of 0.8 mile radius, in the Des Moines area for the period 1989-94, and the Xj are assumed to have two components of variation. These two components are well defined in Devine, Louis and Halloran (1994, p.623), and in this case they can be interpreted as follows. Spatial component of variation: the rate varies from one filtered area to another; this is the variation that we want to illustrate on the map. Temporal component of variation: the rate for the same filtered area varies from time to time about the underlying rate; this is the variation that we wish to reduce.

    23. The implicit assumption is therefore that, if the rate within an area varies about the expected rate, this variation can be attributed to sampling variation, which may also be interpreted as the temporal variation. Here we draw k random samples of the observed rates Xi, i = 1, 2, 3, ..., n, where n is the number of years, or sample size.

    25. Bayes estimator θb is the Bayes estimator and is the weighted average of the mean of the observed rates and the mean of the prior. The weights are determined by the ratio of the prior variance to the conditional variance.
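The formula for this estimator does not appear in the transcript. For the normal model described in slide 21, the standard textbook result (the normal prior, normal likelihood case treated in Hogg and Craig) would take the form below. Here the mean of the n yearly observed rates is written as \bar{x}, and the symbol w for the weight is a label introduced only for this sketch.

```latex
\theta_b = E(\Theta \mid x_1, \dots, x_n)
         = w\,\bar{x} + (1 - w)\,\theta_0,
\qquad
w = \frac{\sigma_0^2}{\sigma_0^2 + \sigma^2 / n}
```

When the conditional variance \sigma^2 / n is small relative to the prior variance \sigma_0^2, w is close to 1 and the estimator stays near the observed mean; when it is large, w is close to 0 and the estimator moves towards the prior mean \theta_0.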

    26. The Empirical Bayes estimator However, in the Bayesian approach the parameters of the prior, that is θ0 and σ0², may be unknown. In the empirical Bayes approach, we estimate θ0 and σ0² from the parameters of empirical observations. In our case these parameters are given by:

    27. The empirical Bayes estimators

    28. The empirical Bayes estimator is thus the weighted average of the observed rate and an estimate of the prior mean. The weight is an estimated ratio of the conditional and prior variances. If an observed rate is stable, that is, if the area-specific population size is large, the estimator is close to the observed rate. If, on the other hand, the area-specific population size is small, the empirical Bayes estimator shrinks towards the estimated prior mean (Devine, Louis and Halloran, 1994, p.625).
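The shrinkage behaviour described here can be sketched in a few lines. This is an illustration only: the prior mean and prior variance are hypothetical values rather than estimates from the Des Moines data, and approximating the conditional variance by a binomial variance evaluated at the prior mean is an assumption of this sketch, not necessarily the presentation's method.

```python
def shrunken_rate(observed_rate, births, prior_mean, prior_var):
    """Weighted average of an observed rate and a prior mean.

    ASSUMPTION: the conditional (sampling) variance of the observed rate
    is approximated by the binomial variance at the prior mean.
    Returns (estimate, weight given to the observed rate).
    """
    sampling_var = prior_mean * (1.0 - prior_mean) / births
    weight = prior_var / (prior_var + sampling_var)
    estimate = weight * observed_rate + (1.0 - weight) * prior_mean
    return estimate, weight

prior_mean = 0.010   # hypothetical: 10 deaths per 1,000 births
prior_var = 0.00001  # hypothetical variance of the true rates across areas

# Same observed rate (40 per 1,000) in a small and in a large area.
small_est, small_w = shrunken_rate(0.040, 50, prior_mean, prior_var)
large_est, large_w = shrunken_rate(0.040, 5000, prior_mean, prior_var)
print(f"small area: weight {small_w:.3f}, estimate {1000 * small_est:.1f} per 1,000")
print(f"large area: weight {large_w:.3f}, estimate {1000 * large_est:.1f} per 1,000")
```

With identical observed rates, the small area receives a much lower weight and is pulled most of the way towards the prior mean, while the large area stays close to what was observed.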

    29. The Empirical Bayes Adjustment has long been used in disease mapping. So far the approach has been used only on tables which show disease rates for small areas, which when mapped, are shown as choropleth maps. But the same method can be used to adjust the rate for any area and if the areas chosen are overlapping spatial filters, then the adjusted values can be mapped as a continuous value when computed for a fine grid spread across the region. The advantage of this approach is that the method can be used to show variations on a very fine scale and thus pinpoint the areas where the incidence rate is significant.

    30. The following slides show choropleth maps of both Bayes-adjusted and unadjusted rates.

    31. Unadjusted IMR for census tracts in Des Moines

    32. Bayes adjusted IMR for census tracts in Des Moines

    33. The next two slides show Bayes-adjusted and unadjusted maps of spatially filtered disease data.

    34. Unadjusted IMR for 1989-92 on 0.8 mile filter

    35. Bayes adjusted IMR for 1989-92 on 0.8 mile filter

    36. In the next slide, a small section of the spreadsheet that can be used to compute the empirical Bayes estimator is shown. The column named ID gives the ID of the grid points for which the estimator is computed. The next 4 columns show the IMR for 4 years. The 6th column is the sum of the IMR for the years 1989-92. The 7th column gives the temporal variation for each ID. The 8th column gives the estimate of the prior variance. The 9th column calculates the weight from the equation of the empirical Bayes estimator. The 10th column gives the empirical Bayes estimator for the IMR aggregated over the period 1989-92.
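The spreadsheet itself is not reproduced in the transcript, so the column layout above can only be sketched. In the version below the yearly rates are hypothetical, the function and key names are introduced for illustration, and the prior mean and prior variance are obtained by a simple method-of-moments estimate, which is an assumption of this sketch and may differ from the formulas used in the presentation.

```python
from statistics import mean, variance

def empirical_bayes_table(yearly_rates):
    """Build one row per grid-point ID from its yearly IMRs.

    `yearly_rates` maps ID -> list of yearly rates (same length for all).
    """
    n = len(next(iter(yearly_rates.values())))
    area_mean = {i: mean(r) for i, r in yearly_rates.items()}
    # Temporal variation: sample variance of the yearly rates of one ID.
    temporal_var = {i: variance(r) for i, r in yearly_rates.items()}

    prior_mean = mean(list(area_mean.values()))
    # ASSUMPTION (moment estimate): variance between area means, minus the
    # average sampling variance of an area mean, floored at zero.
    prior_var = max(
        0.0,
        variance(list(area_mean.values()))
        - mean(list(temporal_var.values())) / n,
    )

    rows = {}
    for i, rates in yearly_rates.items():
        sampling_var = temporal_var[i] / n
        total = prior_var + sampling_var
        weight = prior_var / total if total > 0 else 1.0
        rows[i] = {
            "sum": sum(rates),
            "mean": area_mean[i],
            "temporal_var": temporal_var[i],
            "prior_var": prior_var,
            "weight": weight,
            "eb": weight * area_mean[i] + (1.0 - weight) * prior_mean,
        }
    return prior_mean, rows

# Hypothetical yearly IMRs (per 1,000) for three grid points, 4 years each.
yearly_rates = {
    "A": [4, 6, 5, 5],       # stable, low
    "B": [0, 120, 0, 40],    # highly unstable
    "C": [29, 31, 30, 30],   # stable, high
}
prior_mean, table = empirical_bayes_table(yearly_rates)
for grid_id, row in table.items():
    print(grid_id, {k: round(v, 2) for k, v in row.items()})
```

In this sketch the unstable grid point B receives a small weight and its estimate lies near the overall mean, while the stable points A and C remain close to their own observed means.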

    38. Deficiencies of the empirical Bayes adjustment technique and some efforts to overcome them To derive the estimator, we make some assumptions about the prior and conditional distributions, and the estimates are very sensitive to these assumptions. If we check the validity of the distributional assumptions, then it is possible to produce more reliable estimates. Empirical Bayes estimators are often criticized for being over-smoothed, with most of the values falling within a narrow band close to the estimate of the prior mean. In the constrained empirical Bayes estimation technique, the expected squared error of the estimator d(x) is minimized subject to some constraint that counteracts the over-smoothing of the rates (Ghosh, 1992).

    39. References Devine, O., Louis, T. and Halloran, M. ‘Empirical Bayes methods for stabilizing incidence rates before mapping’, Epidemiology, 5, 622-630 (1994). Ghosh, M. ‘Constrained Bayes estimation with applications’, Journal of the American Statistical Association, 87, 533-540 (1992).

    40. References Hogg, R. and Craig, A. Introduction to Mathematical Statistics, Prentice-Hall, Inc., New Jersey, 1995, Chapter 8, pp.363-372. Kennedy-Kalafatis, S. ‘Reliability-adjusted disease maps’, Social Science and Medicine, 41, 1273-1287 (1995). Rushton, G. and Lolonis, P. ‘Exploratory spatial analysis of birth defect rates in an urban population’, Statistics in Medicine, 15, 717-726 (1996).
