Learn about synthetic estimators for small area studies in Ireland to improve precision & accuracy of prevalence estimates. Explore model-based estimators, issues, limits, confidentiality, validation, and future possibilities.
Anthony Staines DCU Synthetic estimators in Ireland
What are synthetic estimators? • Estimates of something you haven't got • Typically estimates for a small area of something • Making maximum use of what you have
Example • Lung cancer risk • Smoking is a key explanation • Suppose you want to study the geography of lung cancer
What you have • Smoking data from a national survey by age and sex • Small area level data on population and cancer incidence by age and sex
What you can do at once • Estimate prevalence for small areas included in the study • Using the sample in the study
What's wrong with this? • The areas you need may not be included • The estimates will be very imprecise
You can do better • In some obvious ways • And some not so obvious
What you assume • National age and sex specific rates apply in each small area
And so • From these you calculate small area specific prevalence estimates • This is indirect standardisation • Can be done smarter • requiring aggregation properties to hold • Adding in area level covariates (urban/rural etc.)
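The calculation on this slide can be sketched in a few lines. This is a minimal illustration only: the rates, population counts and area names are invented, and `synthetic_prevalence` is a hypothetical helper, not code from the talk.

```python
# National smoking prevalence by (age band, sex), as a national survey might give it.
# All figures are invented for illustration.
national_rate = {
    ("18-44", "F"): 0.30,
    ("18-44", "M"): 0.40,
    ("45+", "F"): 0.20,
    ("45+", "M"): 0.25,
}

# Small-area population counts for the same age-sex strata (hypothetical areas).
area_population = {
    "Area A": {("18-44", "F"): 100, ("18-44", "M"): 100,
               ("45+", "F"): 50, ("45+", "M"): 50},
    "Area B": {("18-44", "F"): 20, ("18-44", "M"): 20,
               ("45+", "F"): 80, ("45+", "M"): 80},
}


def synthetic_prevalence(population, rates):
    """Prevalence expected if the national stratum rates apply in this area."""
    expected_smokers = sum(count * rates[stratum]
                           for stratum, count in population.items())
    total_population = sum(population.values())
    return expected_smokers / total_population


estimates = {area: synthetic_prevalence(pop, national_rate)
             for area, pop in area_population.items()}

for area, value in estimates.items():
    print(f"{area}: {value:.3f}")
```

The two areas differ only because their age and sex structure differs. That is all this simplest estimator can pick up.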
Can you do better? • Yes
Model based estimators • These have a long history • Many diverse applications • Combine survey data and some kind of 'census data' • 'Census data' is that available for every area of interest
Roughly • Use the survey data to estimate relationships • at the relevant level • between survey covariates • and the census data
Then • Assume the same relationship applies in the other areas
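A rough sketch of the idea, assuming a single area-level covariate and a straight-line relationship. The data, the area names and the helper `fit_line` are all made up for illustration; a real application would more likely use a logistic or multi-level model and take account of the survey design.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Areas covered by the survey: (census covariate, direct survey prevalence).
# The covariate might be, say, the proportion of the population living in towns.
surveyed = {
    "S1": (0.2, 0.22),
    "S2": (0.5, 0.25),
    "S3": (0.8, 0.28),
}

# Areas with census data only (no survey respondents).
census_only = {"U1": 0.6, "U2": 0.1}

xs = [x for x, _ in surveyed.values()]
ys = [y for _, y in surveyed.values()]
intercept, slope = fit_line(xs, ys)

# Assume the same relationship holds in the unsurveyed areas.
predicted = {area: intercept + slope * x for area, x in census_only.items()}

for area, value in predicted.items():
    print(f"{area}: {value:.3f}")
```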
Issues • Modelling can be hard • Remember these are predictive models, not explanatory models • Data not easy to get at the right small area level
Models • models using individual level covariates only • models using area level covariates only • models combining individual and area-level covariates
Limits • Available data • Confidentiality • Complexity of methods, esp. multi-level methods • Validation
Spatial data limits • Have to be able to link survey and census to the same set of small areas • Given the primitive systems in the UK and the nearly non-existent systems in the Republic, this is a lot of work • Errors here will lead to biased estimates
Confidentiality • Need to respect confidentiality of survey respondents • May limit the data available for these purposes • May need to design survey and survey consent process carefully to get good estimates
Modelling • Can become very complex • Clustered survey designs • Survey weights • Variable selection • Model diagnostics
What and where to model • Data may exist at many different geographies • Multi-level models with individual, household, local and regional effects can be considered • GIS might be very useful here for data handling • Not advisable to aggregate covariates at different spatial levels • This is just making a bad embedded synthetic estimator
Validation • Not easy to do, but essential • How do you validate your synthetic estimates? • Cross-validation? • Another survey? • ?
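One possible form of cross-validation is to leave out each surveyed area in turn, refit the model on the rest, and compare the prediction with that area's direct survey estimate. The sketch below assumes a simple straight-line model and invented data. It is meant only to show the mechanics, and the direct estimates are themselves noisy, so the resulting error mixes model error with sampling error.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical surveyed areas: (census covariate, direct survey prevalence).
areas = [(0.1, 0.21), (0.4, 0.24), (0.6, 0.27), (0.9, 0.29)]

errors = []
for i, (x_out, y_out) in enumerate(areas):
    # Fit on every area except the one held out.
    training = areas[:i] + areas[i + 1:]
    intercept, slope = fit_line([x for x, _ in training],
                                [y for _, y in training])
    # Synthetic estimate for the held-out area minus its direct estimate.
    errors.append(intercept + slope * x_out - y_out)

rmse = math.sqrt(sum(e ** 2 for e in errors) / len(errors))
print(f"Leave-one-area-out RMSE: {rmse:.4f}")
```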
Options • How about • Health Atlas Ireland? • This is a system built for the HSE (led by Howard Johnson) to plan health services • It already has • Maps • Census • HIPE • Mortality data
Census output options • Recently they have developed a very flexible census output system • Uses census data at ED level • Locations of houses • Assumes that all the houses in an ED are exchangeable
Census output options • Allocates census data to any given area • Directly weighted by using the number of households and the ED composition of the desired area
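As a rough reading of this allocation rule, and not the actual Health Atlas Ireland implementation, each ED contributes to the desired area in proportion to the share of its households that fall inside it. The ED names, counts and the helper `allocate` below are hypothetical.

```python
# Census totals per ED (invented figures).
ed_census = {
    "ED1": {"population": 1000, "households": 400},
    "ED2": {"population": 600, "households": 200},
}

# Number of each ED's households located inside the desired area,
# e.g. obtained from house locations in a GIS (invented figures).
households_in_target = {"ED1": 100, "ED2": 200}


def allocate(ed_census, households_in_target, variable):
    """Allocate an ED-level census variable to the desired area.

    Assumes houses within an ED are exchangeable, so the share of an ED's
    households inside the area is also the share of the variable allocated.
    """
    total = 0.0
    for ed, n_inside in households_in_target.items():
        share = n_inside / ed_census[ed]["households"]
        total += share * ed_census[ed][variable]
    return total


allocated_population = allocate(ed_census, households_in_target, "population")
print(f"Allocated population: {allocated_population:.0f}")
```

Here a quarter of ED1's households and all of ED2's lie inside the desired area, so it receives a quarter of ED1's population plus all of ED2's.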
Futures? • Modern design of surveys • Could readily be extended to do SA from almost any survey data where the necessary geographical data have been collected • Greatly improves value for money of large-scale surveys