MoSeS Starts for the Promised Land Andy Turner Outline • Introduction • Population Modelling Progress • Next Steps • Feedback Alternative Futures – ASAP Research Cluster Seminar 16th November 2005
Introduction
A religious story? • Lost in the Desert? • Our heading? • The Promised Land • SIM-UK • GeoSIM
Modelling and Simulation for e-Social Science (MoSeS) Mark Birkin, Martin Clarke, Phil Rees, Andy Turner, Belinda Wu (School of Geography) Haibo Chen (Institute for Transport Studies) Justin Keen (Institute for Health Sciences) John Hodrien, Paul Townend, Jie Xu (School of Computing)
MoSeS is a Node of the National Centre for e-Social Science http://www.ncess.ac.uk • NCeSS aims to investigate, promote and support the use of eScience in social science research
eScience • Based on Grid Computing and collaboration • What is Grid Computing? • Many definitions… • A move towards ubiquitous computing • A service/protocol for sharing Information Technology (IT) resources over the Internet • Computer scientists are building the next generation of computational infrastructure • ‘[The Grid] intends to make access to computing power, scientific data repositories and experimental facilities as easy as the Web makes access to information.’ (Tony Blair, 2002)
eScience • Grid Computing Environments and The Grid • Enhance capabilities for IT resource sharing for research • Are about providing easy and secure access to massive computational resources, software and data, promoting collaborative working of virtual organisations • e-Social Science is eScience targeted and geared for applications more specific to social science, including a major part of geography
MoSeS Aims and Objectives • Raise awareness of eScience and eResearch • Develop practical geographical e-Social Science applications demonstrating the potential of Grid Computing • Model the UK human population at individual and higher organisational levels • households, communities, regions • disparate and/or geographically diffuse organisations and society • service orientated government • Develop and package a suite of modelling tools which allows specific research and policy questions to be addressed with demonstrator applications for: • Health • Business • Transport
MoSeS Initial Tasks • Develop methods to generate individual human population data for the UK from 2001 UK human population census data • Develop a Toy Model • Dynamic agent based microsimulation modelling toolkit and apply it to simulate change in the UK • Develop applications for • Health • Business • Transport
MoSeS Challenges • Grid enabling the data and tools • Visualisation • Google Earth • Computer Games • Collaboration • Retaining a problem focus • Design and Development
MoSeS Current Parallel Developments • Belinda Wu is working on the applications, beginning with a Toy Model for Leeds • Paul Townend is working on Grid Enabling • Andy Turner is focussing on the population modelling • The MoSeS team is meeting regularly and plans a launch some time next year, when we hope to have something impressive to show off to NCeSS colleagues and invited guests from the eScience community, government and business
MoSeS Human Population Model • Current focus on the contemporary situation looking forwards over the next 25 years • Data are wanted primarily for individuals grouped into households • Need to develop a method to synthesise and enrich data since available census and social survey data are not sufficient in coverage and detail • A method was outlined in the proposal • This is being implemented and results are being tested
Population Modelling Method • To select a fitting set of individual records from the 2001 UK Population Census 3% Individual Sample of Anonymised Records (ISAR) to represent the individuals for regions given by 2001 UK Population Census Area Statistics (CAS) • Initial focus is for regions called Output Areas • Smallest Census Output Areas • Typically about 300 people, 100 households • Begin with Leeds and scale up to the UK
Combination • Given the population (p) of an Output Area we want to select a sub-sample of this size from the n = 1843525 records in the ISAR • The general formula for finding the number of permutations of size p taken from n objects, $^{n}P_{p}$, is: $^{n}P_{p} = \frac{n!}{(n-p)!}$ • Approximately $n^{p}$ (since p is much smaller than n)
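To get a feel for the size of this search space, the following is a minimal sketch that evaluates the standard permutation count n!/(n-p)! on a log scale and compares it with the approximation n^p. The value p = 300 is an assumption taken from the "typically about 300 people" figure for an Output Area; the function name is illustrative only.

```python
import math

def log10_permutations(n: int, p: int) -> float:
    """Return log10 of n! / (n - p)!, the number of ordered selections of p from n."""
    # math.perm (Python 3.8+) gives the exact integer; log10 accepts big integers.
    return math.log10(math.perm(n, p))

n = 1843525  # number of ISAR records, as stated on the slide
p = 300      # assumed population of a typical Output Area

exact = log10_permutations(n, p)
approx = p * math.log10(n)  # log10 of the approximation n**p

print(f"log10 of exact permutation count: {exact:.2f}")
print(f"log10 of n**p approximation:      {approx:.2f}")
```

Because p is tiny relative to n, the two logarithms differ only in the second decimal place, which is why n^p is a reasonable shorthand here.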
Computation • Is the number of potential solutions too great to find the best fitting solution by a brute force search? • Probably, yes, even using all the computational power of The Grid • Interestingly the number of potential solutions is even greater for larger regions than Output Areas (although there are fewer of them) • Fortunately we are only interested in specific types of solution and can constrain our search • For some criteria hard constraints are appropriate and for other variables optimisation is the key within these constraints
Constraints • What can we constrain to? • There are limits • The more detailed the constraint criteria the less likely they can be met • The ISAR is only a 3% sample • Specific CAS tabulations • The aggregations of variables are bespoke • Beware of errors, especially those systematically introduced by disclosure control measures • Census data are estimates and contain an unknown level of error • What is most important to ensure is right? • Age/Gender profile • Number of Household Reference People • Household Composition • Social Class • Health status etc…
Getting to Grips with ISAR and CAS data • 2001 UK Census data is unusual (like most census data) • Details are lost by aggregation and accuracy is deliberately worsened via the application of disclosure control measures • This is done for confidentiality reasons and as users we are forced to appreciate this • On the one hand this generates jobs, on the other hand, it renders census data almost useless for supporting certain applications • Details on UK Census data including ISAR and CAS are available via • http://www.statistics.gov.uk/census/ • Usefully 2001 CAS tables that do not currently exist can be commissioned • There is an application procedure for gaining access to Controlled Access Microdata Sample (CAMS) records from the 2001 Census • The data is supposedly better • It will be hard for us to use due to the way it is controlled
CAS • Themed Tables: 6 cross tabulations, e.g. CT001, Theme Table On All Dependent Children, 348 cells • Univariate Tables: 43 tabulations, e.g. UV003, Sex, 3 cells • Key Statistics Tables: 31 tabulations, e.g. KS001, Usually Resident Population, 6 cells • Standard offerings: 53 cross tabulations, e.g. CS001, Age/Sex/Resident Type, 250 cells
Constraint and Optimisation using Key Statistics • As a first step we have constrained by age and ensured that we have the correct number of household reference people • Makes it easier to construct households for Toy Model • Our fitness function is a simple Sum of Squared Errors (SSE) for a number of aggregate variables • Measure of the difference between aggregate counts from the ISAR records and the published and aggregated CAS Key Statistics • Initial focus on health and household composition
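A Sum of Squared Errors fitness of the kind described above can be sketched as follows. This is only an illustration: the function name and the counts are invented for the example, and the variable names are borrowed from the health variables listed in the talk rather than from any MoSeS code.

```python
from typing import Mapping

def sum_of_squared_errors(synthetic: Mapping[str, int],
                          published: Mapping[str, int]) -> int:
    """Sum the squared differences between synthetic and published counts."""
    # Only variables present in the published statistics contribute to the score.
    return sum((synthetic[name] - target) ** 2
               for name, target in published.items())

# Illustrative (made-up) Key Statistics counts for one Output Area.
published = {
    "peopleWhoseGeneralHealthWasGood": 200,
    "peopleWhoseGeneralHealthWasFairlyGood": 70,
    "peopleWhoseGeneralHealthWasNotGood": 30,
}
# Illustrative aggregate counts from one candidate set of ISAR records.
synthetic = {
    "peopleWhoseGeneralHealthWasGood": 205,
    "peopleWhoseGeneralHealthWasFairlyGood": 68,
    "peopleWhoseGeneralHealthWasNotGood": 27,
}

sse = sum_of_squared_errors(synthetic, published)
print(sse)  # 5**2 + 2**2 + 3**2 = 38
```

A lower score means the selected records reproduce the published aggregates more closely; a score of zero would be an exact match on every variable considered.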
Optimisation Variables • Health variables • peopleWhoseGeneralHealthWasGood • peopleWhoseGeneralHealthWasFairlyGood • peopleWhoseGeneralHealthWasNotGood • peopleWithLimitingLongTermIllness • peopleWithoutLimitingLongTermIllness (Derived) • Household Composition variables • oneFamilyAndNoChildren (Derived) • marriedOrCohabitingCoupleWithChildren (Derived) • loneParentHouseholdsWithChildren (Derived) • (Derived) means calculated from other variables
Optimisation and Goodness of Fit • Initially for each Output Area in Leeds we generated 10000 possibly different solutions and picked the best one • Now we are using a genetic algorithm to assist in finding a better solution • More strategic • Constraints form genes • Effectively each genetic bit string is an ordered boolean array for the ISAR • AGE0 and HRP order • Currently the genetic algorithm works by breeding, mutation and survival of the fittest
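The breeding, mutation and survival-of-the-fittest loop can be illustrated with a toy sketch. It is not the MoSeS implementation: the record pool is random stand-in data rather than the ISAR, selections are stored as lists of record indices per age band instead of a boolean array, and all function names, targets and parameters are hypothetical. The age-band counts play the role of the hard constraint and are preserved by every operator.

```python
import random

def make_pool(size, rng):
    """Toy stand-in for ISAR records: (age_band, good_health, has_llti)."""
    return [(rng.randrange(3), rng.random() < 0.7, rng.random() < 0.2)
            for _ in range(size)]

def fitness(selection, pool, targets):
    """SSE between aggregate counts of the selected records and the targets."""
    good = sum(pool[i][1] for band in selection for i in band)
    llti = sum(pool[i][2] for band in selection for i in band)
    return (good - targets["good"]) ** 2 + (llti - targets["llti"]) ** 2

def random_selection(by_band, band_counts, rng):
    """Pick the required number of records per age band (the hard constraint)."""
    return [rng.sample(by_band[b], k) for b, k in enumerate(band_counts)]

def breed(a, b, rng):
    """Per band, the child samples the same number of records from both parents."""
    return [rng.sample(sorted(set(x) | set(y)), len(x)) for x, y in zip(a, b)]

def mutate(selection, by_band, rng):
    """Swap one selected record for an unselected record in the same band."""
    child = [list(band) for band in selection]
    b = rng.randrange(len(child))
    unused = [i for i in by_band[b] if i not in child[b]]
    if child[b] and unused:
        child[b][rng.randrange(len(child[b]))] = rng.choice(unused)
    return child

def evolve(pool, band_counts, targets, generations=50, size=30, seed=0):
    rng = random.Random(seed)
    by_band = [[i for i, r in enumerate(pool) if r[0] == b]
               for b in range(len(band_counts))]
    score = lambda s: fitness(s, pool, targets)
    population = [random_selection(by_band, band_counts, rng) for _ in range(size)]
    start_best = min(map(score, population))  # best of the purely random draws
    for _ in range(generations):
        population.sort(key=score)
        survivors = population[: size // 2]   # survival of the fittest
        children = [mutate(breed(*rng.sample(survivors, 2), rng), by_band, rng)
                    for _ in range(size - len(survivors))]
        population = survivors + children
    best = min(population, key=score)
    return best, score(best), start_best

pool = make_pool(2000, random.Random(1))
band_counts = [100, 150, 50]            # assumed people required per age band
targets = {"good": 210, "llti": 60}     # assumed published aggregate counts
best, best_sse, start_sse = evolve(pool, band_counts, targets)
print("best random SSE:", start_sse, "best evolved SSE:", best_sse)
```

Because the fitter half of each generation is carried over unchanged, the best score can never get worse than the best of the initial random draws, which corresponds to the earlier "generate many solutions and pick the best" approach.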
Next Steps 1 • Constraints • Additional constraint by gender • Should improve household formation • Need to use Standard CAS cross tabulations • Problems due to confidentiality • Perhaps need to consider larger regions than Output Areas • Begin investigating what other constraints are possible • Leeds • UK • Identify problem Output Areas • Optimisation • Use more optimisation variables • Experiment with the genetic algorithm
Next Steps 2 • Testing • Examine results • Mapping • Optimised variables • Exogenous variables • Grid Enabling • Data • Provenance • Toy Model • Publication
MoSeS Recap • We are developing a dynamic geographic microsimulation of the UK • A model comprising individual people who occupy the UK environment and move about it through time, interacting in numerous ways • Each individual will have family, household and social networks and reasonably complex characteristics and behaviour • The idea is to build a platform for simulating change in the UK for ASAP
Thank you! • Any feedback or questions? • Please email A.G.D.Turner@leeds.ac.uk • http://www.ncess.ac.uk
Acknowledgements • Thanks to all involved in the production of the maps that I grabbed off the internet for the start of this presentation