Shared Genomics: Extended Reasoning for Epidemiology. Iain Buchan, University of Manchester. Microsoft Research Visit, Manchester, 17th June 2008. Today. Identify the need to extend epidemiology and introduce the two central studies
Shared Genomics: Extended Reasoning for Epidemiology. Iain Buchan, University of Manchester. Microsoft Research Visit, Manchester, 17th June 2008
Today • Identify the need to extend epidemiology and introduce the two central studies • Introduce the software engineering approach to building the shared genomics platform • Explore some extensions to statistical genetics and consider how to incorporate them 09:15 10:30 11:30
Thanks • Microsoft • Mark (Project Management) • Gareth (HPC Software Engineering) • Peter/Melandra Ltd (Annotation Bases) • David (Bioinformatics & Novel Analytics) • George (Bioinformatics) • Carole (Social Computing & myGrid) • Adnan, Angela & Fernando (MAAS Study) • John & Martin (Salford NHS)
def. Epidemiology “the study of the distribution and determinants of disease and health-related states in populations” JM Last, 2000
Epidemiology 1600-1860 Imagination Summarisation Knowledge Observation
Epidemiology 1860-2000 Imagination Summarisation & Statistical Modelling Knowledge Observation ± Experimentation
Exhausted Epidemiology Platform Problem 1: Dwindling hits from tools to detect independent “causes”. Problem 2: Knowledge can’t be managed by reading papers any more. The big public health problems, e.g. Type 2 Diabetes, have “complex webs of causes”. The “data-set” and structure extend beyond the study’s observations
Diagram labels: GP (×4); Hosp.; FIREWALL; Outputs; Person-identifiable and sensitive information removed; Data Repository in PCT; Anonymised Data Repository in PCT; 24-hourly updates; Real-time link on NHS number; Trusted person poses question(s); Optometrist; Eye screening; Community nurses; Podiatry; Biomics Data; Deaths, Demographics etc.
Exposure (simple): Food intake & physical activity Modifying factors (e.g. sex) Exposure (compound): Sustained +ve energy balance Intermediate outcome: Overweight Intermediate outcome: Central Obesity Outcome (state): Type 2 diabetes Outcome (function): Early death Confounding factors (e.g. transport)
Obesity Graphs (emerging) Foresight
Gene Association Studies: Which genetic variation is responsible for disease variation? …ATTTAGGACCAATAAGTCT… …AATTAGGATCAATAAGTCT… ? ? Single nucleotide polymorphisms (SNPs). Human genome = 3 billion bases; 3 million sites of variation. Current cohorts ≈ 5,000 individuals vs. 500,000 SNPs
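A toy illustration of what "sites of variation" means for the two fragments on this slide. This is only a sketch: the function name `variant_sites` is hypothetical, and real SNP detection works from genotyping arrays or aligned reads across many individuals, not a character-by-character comparison of two strings.

```python
def variant_sites(seq_a, seq_b):
    """Return 0-based positions where two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [i for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


# The two fragments shown on the slide.
seq_1 = "ATTTAGGACCAATAAGTCT"
seq_2 = "AATTAGGATCAATAAGTCT"
sites = variant_sites(seq_1, seq_2)

# Size of the genotype matrix for the cohort quoted on the slide:
# 5,000 individuals by 500,000 SNPs.
genotype_calls = 5_000 * 500_000
```

The two fragments differ at two positions, matching the two "?" markers on the slide, and the cohort quoted gives a matrix of 2.5 billion genotype calls.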
Crude Pan-Genome Scans (data matrix: Patients × SNPs)
for (i = 1 to #random permutations) {
  for (j = 1 to #SNPs) {
    for (k = 1 to #patients) {
      disease status vs. locus status → χ²
    }
  }
}
Given a typical 5k patients, 0.5m SNPs and 10k permutations: 20k χ² calcs per sec on a modern single core; ≈70 hrs for single SNPs; ≈1,980 years for [n*(n-1)]/2 SNP pairs
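The scan and its cost estimate can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the project's implementation: genotypes are assumed to be coded 0/1/2 and disease status 0/1, the names `chi2_2x3`, `permutation_scan` and `scan_seconds` are hypothetical, and the add-one empirical p-value is one common choice among several.

```python
import random


def chi2_2x3(disease, genotype):
    """Pearson chi-squared statistic for disease (0/1) vs genotype (0/1/2)."""
    n = len(disease)
    counts = [[0, 0, 0], [0, 0, 0]]
    for d, g in zip(disease, genotype):      # loop k: over patients
        counts[d][g] += 1
    row = [sum(r) for r in counts]
    col = [counts[0][g] + counts[1][g] for g in range(3)]
    stat = 0.0
    for d in range(2):
        for g in range(3):
            expected = row[d] * col[g] / n
            if expected > 0:                 # skip empty rows or columns
                stat += (counts[d][g] - expected) ** 2 / expected
    return stat


def permutation_scan(disease, genotypes, n_perm, seed=0):
    """Return one empirical p-value per SNP from label permutations."""
    rng = random.Random(seed)
    observed = [chi2_2x3(disease, snp) for snp in genotypes]
    exceed = [0] * len(genotypes)
    shuffled = list(disease)
    for _ in range(n_perm):                  # loop i: random permutations
        rng.shuffle(shuffled)
        for j, snp in enumerate(genotypes):  # loop j: SNPs
            if chi2_2x3(shuffled, snp) >= observed[j]:
                exceed[j] += 1
    return [(e + 1) / (n_perm + 1) for e in exceed]


def scan_seconds(n_tests, n_perm, calcs_per_sec):
    """Wall-clock seconds for n_tests chi-squared tests per permutation."""
    return n_tests * n_perm / calcs_per_sec


n_snps, n_perm, rate = 500_000, 10_000, 20_000
single_hours = scan_seconds(n_snps, n_perm, rate) / 3600
n_pairs = n_snps * (n_snps - 1) // 2
pair_years = scan_seconds(n_pairs, n_perm, rate) / (3600 * 24 * 365.25)
```

With the slide's figures, `single_hours` comes to roughly 69 hours and `pair_years` to roughly 1,980 years, in line with the estimates quoted above. Each χ² calculation is counted as one pass over all patients, which is why the patient count does not appear in the arithmetic.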
Shared Views of Structure Pathway/expert 1 Pathway/expert 2 Causal pathway Modifiers Outcome Exposure Confounders
Evidence & Theory Data Configuration Algorithms Knowledge Management Visualisation Insight Abstract thinking Signal
Simple Algorithms: Computational free-thinking, for insights from richly-observed health & environments [Figure: three panels, 1) to 3), each with nodes z, G and P]
Shared Genomics Platform • Apr-Jun 08: Statistical genetics algorithms for Win cluster; Annotation-base prototype • Jul-Dec 08: Basic statistical genetics platform; Model genome-wide analyses with MSR & PIs • Jan 09 – Mar 10: Epidemiologists driving genome-wide analyses; Integrated modelling and annotation
Wider Opportunities • Text-mining for causal inference • Prototyping planned with NaCTeM • Stress-testing annotation workflow systems • Proposal to OMII ENGAGE • Novel visualisation of annotations • Novel statistical algorithms • Graph-based causality workbench • Potential grant applications
Published papers; unpublished papers; slides; abstracts; blogs; experts; workflows; statistical scripts; signposts to other relevant data... Catalogues, ontologies, search engines, text-mining, analytical services, social networks etc. Causality Workbench Factors a, b and c are not in my study, but they cluster with it in various ways: Factor b is a potentially important measured confounder – I will add it... Modifiers Causality? Exposures Confounders Errors Outcome Structure