MethodBox: From open-data to open-insight. MethodBox Team, Jul 2011
Presentation • Problem: Data tsunami + puddles of insight • Solution: Collective efficient science • Deployment: Sense-making networks on open-data
Quote “…you call it Epidemiology and we call it quantitative Social Science” A leading researcher, Jul 2011 • Open data • Common methods • Potentially complementary expertise
Obesity Example Fragmented understanding of public health problems such as obesity... data, methods/models and expertise split across disciplines (e.g. social vs. biomedical) and settings (e.g. academia vs. healthcare)
Puddles of research around the organising principle … but policies need the big picture
Data Example • Time series data from Health Visitors in Wirral • Data deposited with UKDA but not used for 16 years • Children measured at the time the obesity epidemic took hold…
Material deprivation affecting children (households with children: % on benefits in 2001-3), Wirral (0.3M), UK [Map legend: fifths of IDAC 2004, from light red (most deprived) through dark red, purple and dark blue to light blue (most affluent)]
BMI of 3 yr olds in two-year periods: 1988-1989, 1990-1991, 1992-1993, 1994-1995, 1996-1997, 1998-1999, 2000-2001, 2002-2003 [Series of maps, one per period. Legend: fifths of BMI SDS, from light red (fattest) through dark red, purple and dark blue to light blue (thinnest)]
Child Obesity: Action 6 years after signal in the data [Chart: Body Mass Index (BMI) trend in Wirral 3-year-olds from 1988 to 2003. Y-axis: three-monthly rolling average BMI SDS (-0.4 to 0.5). X-axis: month of measurement by Health Visitor (Mar-88 to Aug-04). Annotations: "Clues" and "Actions"] SDS = standard deviation score from 1990 British Growth Reference charts; it adjusts for age and sex of the child
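The chart above depends on two calculations: turning each BMI measurement into an SDS, and smoothing the series with a three-monthly rolling average. The following is a minimal sketch of both, not the team's actual processing. It assumes the growth reference is supplied as LMS parameters (L, M, S) for the child's age and sex, and that the rolling average is a trailing window over monthly means; the slide states neither. The parameter values and the monthly series are made up for illustration.

```python
import math


def lms_sds(value, l, m, s):
    """Standard deviation score of a measurement, given LMS parameters.

    l = Box-Cox power, m = median, s = coefficient of variation,
    all looked up for the child's age and sex.
    """
    if l == 0:
        return math.log(value / m) / s
    return ((value / m) ** l - 1) / (l * s)


def rolling_mean(values, window=3):
    """Trailing rolling mean; yields one value per complete window."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


# Hypothetical LMS parameters for one age/sex group (NOT real reference values).
L_PARAM, M_PARAM, S_PARAM = -1.5, 16.0, 0.08

# Hypothetical BMI measurements for four children in that group.
bmis = [15.2, 16.0, 17.1, 18.4]
scores = [lms_sds(b, L_PARAM, M_PARAM, S_PARAM) for b in bmis]

# Hypothetical monthly mean SDS values, smoothed over three months.
monthly_mean_sds = [-0.10, -0.05, 0.00, 0.10, 0.15, 0.20]
smoothed = rolling_mean(monthly_mean_sds, window=3)
```

A child whose BMI equals the reference median scores 0 by construction, which is why the SDS series can be compared across ages and sexes.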
Similar Data in 2011 • National Child Measurement Programme • Anonymised national database • Could be opened (like the National Pupil Database) and extended to other policy-relevant, timely research
Data Already in UK Data Archive • Example: Health Surveys for England (annual) • Analyses feed national policies • Does evidence need to be localised?...
Women, and not men, from low-income households are fatter in England [Chart: BMI (25 to 27.5) by income fifth (1 to 5, low to high), for women and men] Data from Health Survey for England
Women from low-income households and men from high-income households are fatter in Greater Manchester [Chart: BMI (25 to 27.5) by income fifth (1 to 5, low to high), for women and men] Data from Health Survey for England
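The national and Greater Manchester charts are the same summary (mean BMI by sex and income fifth) computed with and without a regional filter. The sketch below shows that comparison in plain Python. The field names (`region`, `sex`, `income_fifth`, `bmi`) and all records are hypothetical; they are not the Health Survey for England variable names or values.

```python
from collections import defaultdict


def mean_bmi_by_group(records, region=None):
    """Mean BMI keyed by (sex, income_fifth).

    If region is given, only records from that region are included.
    """
    totals = defaultdict(lambda: [0.0, 0])  # key -> [sum of BMI, count]
    for r in records:
        if region is not None and r["region"] != region:
            continue
        key = (r["sex"], r["income_fifth"])
        totals[key][0] += r["bmi"]
        totals[key][1] += 1
    return {key: total / n for key, (total, n) in totals.items()}


# Made-up records for illustration only.
records = [
    {"region": "A", "sex": "F", "income_fifth": 1, "bmi": 27.0},
    {"region": "A", "sex": "F", "income_fifth": 1, "bmi": 26.0},
    {"region": "B", "sex": "F", "income_fifth": 1, "bmi": 25.0},
    {"region": "B", "sex": "M", "income_fifth": 5, "bmi": 27.0},
]

national = mean_bmi_by_group(records)          # all regions pooled
local = mean_bmi_by_group(records, region="A")  # one region only
```

Comparing `national` and `local` key by key is the simplest way to see where a local gradient departs from the national one, which is the question the two charts raise.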
Linked-data ≠ Linked: data, methods & investigators • Social Research: data, methods & investigators • Biomedical Research: data, methods & investigators • Previous slides show social-biomedical signals about obesity from under-used datasets
MethodBox Aim …to increase the sharing and reuse of data sources & extracts and data processing methods in one in-silico environment (‘e-Lab’) shared by social and health researchers
e-Lab Research Object [Diagram. Contents: research protocol, data-sources, data-preparation scripts, working datasets, statistical analysis scripts, analysis-logs & notes, figures/graphics, manuscripts, references, slides. Actions: Find, Share, Reuse] Socially-stimulating science, in-silico
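One way to picture the Research Object is as a single record that bundles every artefact of a study, with find, share and reuse as operations on it. The sketch below is an illustration under that assumption only; the class, field and function names are hypothetical and do not describe MethodBox's actual data model.

```python
import copy
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class ResearchObject:
    """Hypothetical bundle of the artefacts listed on the slide."""
    title: str
    owner: str
    protocol: str = ""
    data_sources: List[str] = field(default_factory=list)
    data_preparation_scripts: List[str] = field(default_factory=list)
    working_datasets: List[str] = field(default_factory=list)
    analysis_scripts: List[str] = field(default_factory=list)
    logs_and_notes: List[str] = field(default_factory=list)
    figures: List[str] = field(default_factory=list)
    manuscripts: List[str] = field(default_factory=list)
    references: List[str] = field(default_factory=list)
    slides: List[str] = field(default_factory=list)
    shared_with: Set[str] = field(default_factory=set)

    def share(self, user):
        """Make the object visible to another user."""
        self.shared_with.add(user)

    def reuse(self, new_owner):
        """Return an independent copy owned by someone else."""
        clone = copy.deepcopy(self)
        clone.owner = new_owner
        clone.shared_with = set()
        return clone


def find(objects, keyword):
    """Case-insensitive title search over a collection of objects."""
    return [o for o in objects if keyword.lower() in o.title.lower()]


# Illustrative usage with made-up names.
ro = ResearchObject(title="Obesity and deprivation", owner="researcher_a",
                    analysis_scripts=["model.do"])
ro.share("researcher_b")
derived = ro.reuse("researcher_b")
derived.analysis_scripts.append("sensitivity.do")
```

Because `reuse` makes a deep copy, edits to the derived object leave the original untouched, which is what lets one researcher build on another's work without overwriting it.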
National Dataset Example • Health Surveys for England • Large-scale (participants × variables) • Annual since early 90s • Under-used by NHS who fund it • Key barrier: extracting a research-ready subset of data • Data archive playground = e-Lab
Supporting and Developing Interdisciplinary Understanding • Sharing resources – tools, methods, data • Sharing expertise – discussions and reuse around shared resources • Developing interdisciplinary understanding – language, tacit assumptions, methods • Promoting interdisciplinary working. First step: sharing of resources. Shared resources provide the basis for discussion. Discussions lead to deeper interdisciplinary understanding. Understanding of other domains promotes more effective interdisciplinary working.
Facilitating a social network of data archive users… …toward a reward environment for sharing data, methods, and expertise
Browsing for data extracts made by a social network of data archive users…
Shopping for variables from across different years of survey collections…
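Picking variables across survey years amounts to selecting the same named columns from each year's table and stacking the results with a year label. A minimal sketch of that step follows, assuming each survey year is available as a list of row dictionaries; the function name, variable names and values are all hypothetical, and a variable that was not collected in a given year is left as `None` rather than guessed.

```python
def build_extract(surveys, wanted):
    """Stack selected variables from several survey years into one table.

    surveys: {year: [row dicts]}; wanted: list of variable names.
    Variables absent from a year's rows are filled with None.
    """
    rows = []
    for year in sorted(surveys):
        for row in surveys[year]:
            out = {"year": year}
            for var in wanted:
                out[var] = row.get(var)
            rows.append(out)
    return rows


# Made-up survey tables: "waist" was not collected in the earlier year.
surveys = {
    2002: [{"id": 2, "bmi": 24.8, "waist": 88.0}],
    1998: [{"id": 1, "bmi": 26.1}],
}

extract = build_extract(surveys, wanted=["bmi", "waist"])
```

Keeping the gaps explicit makes it clear to anyone reusing the extract which variables are comparable across years.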
Sharing and visibility • Linking a data extract with a script for deriving variables… • Making the data extract visible…
Enabling user-visibility for data extraction or derivation contributions…
Current MethodBox Video link
Training Course Apr '10 • Trained a mixture of NHS, academic and industry users of HSE in the use of MethodBox • Course run in conjunction with CCSR • Feedback forms completed by 15 of 16 attendees, who were asked to rate MethodBox from 1 (negative) to 7 (positive) on the following scales • I thought MethodBox was: • Terrible - Wonderful: mean = 5.57 • Difficult to understand - Easy to understand: mean = 5.57 • Frustrating to use - Satisfying to use: mean = 5.79 • Dull - Stimulating: mean = 5.29 • Rigid - Flexible: mean = 5.71 • Difficult to navigate - Easy to navigate: mean = 6
MethodBox Evolution • Amazon-like user-prompting for other variables that may be relevant to the set being extracted • More surveys/datasets incorporated • User-contributed & community-curated datasets • …. • Feature request list exceeds resources
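The slide does not say how the Amazon-like prompting would work. One simple stand-in, shown here purely as an assumption, is co-occurrence counting: suggest the variables that other users most often extracted together with the ones currently selected. The function name and the variable names are hypothetical.

```python
from collections import Counter


def suggest_variables(past_extracts, current, top_n=3):
    """Suggest variables that co-occurred with the current selection.

    past_extracts: list of variable-name lists from earlier extracts.
    current: variable names already chosen by the user.
    Ties are broken alphabetically so the result is deterministic.
    """
    current = set(current)
    counts = Counter()
    for extract in past_extracts:
        variables = set(extract)
        if variables & current:  # extract shares at least one variable
            counts.update(variables - current)
    ranked = sorted(counts, key=lambda v: (-counts[v], v))
    return ranked[:top_n]


# Made-up extract history for illustration.
past_extracts = [
    ["bmi", "age", "sex"],
    ["bmi", "income", "age"],
    ["smoking", "alcohol"],
]

suggestions = suggest_variables(past_extracts, current=["bmi"])
```

A production recommender would likely weight by recency or user similarity; this version only illustrates the idea of learning from the extracts the community has already made.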
Building on Successful E-Science • Most widely used scientific workflow sharing systems: myGrid, Taverna, myExperiment • Over a decade of programme funding has sustained world-leading e-infrastructure R&D • Ready to leverage more outputs from open-linked data
Toward Open Insight • Researcher A is expert in deprivation • Researcher B is expert in obesity • Both use a common data archive but don’t usually meet • MethodBox shares the expertise of A and B to create a more complete model of deprivation in obesity
Conclusion • Open-data alone is not enough • Social e-infrastructure for science is needed • Sharing insights and methods is key, and can be achieved through systems like MethodBox + ESDS