Shared Genomics: Extended Reasoning for Epidemiology

Iain Buchan, University of Manchester. Microsoft Research Visit, Manchester, 17th June 2008.


Presentation Transcript


  1. Shared Genomics: Extended Reasoning for Epidemiology. Iain Buchan, University of Manchester. Microsoft Research Visit, Manchester, 17th June 2008

  2. Today • Identify the need to extend epidemiology and introduce the two central studies • Introduce the software engineering approach to building the shared genomics platform • Explore some extensions to statistical genetics and consider how to incorporate them (09:15, 10:30, 11:30)

  3. Thanks • Microsoft • Mark (Project Management) • Gareth (HPC Software Engineering) • Peter/Melandra Ltd (Annotation Bases) • David (Bioinformatics & Novel Analytics) • George (Bioinformatics) • Carole (Social Computing & myGrid) • Adnan, Angela & Fernando (MAAS Study) • John & Martin (Salford NHS)

  4. Introductions

  5. def. Epidemiology: “the study of the distribution and determinants of disease and health-related states in populations” (JM Last, 2000)

  6. Epidemiology 1600-1860: Imagination; Summarisation; Knowledge; Observation

  7. Epidemiology 1860-2000: Imagination; Summarisation & Statistical Modelling; Knowledge; Observation ± Experimentation

  8. Exhausted Epidemiology Platform. Problem 1: Dwindling hits from tools to detect independent “causes”. Problem 2: Knowledge can’t be managed by reading papers any more. The big public health problems, e.g. Type 2 Diabetes, have “complex webs of causes”. The “data-set” and structure extend beyond the study’s observations.

  9. [Diagram labels] Data sources: GPs; Hospital; Optometrist; Eye screening; Community nurses; Podiatry; Biomics data; Deaths, demographics etc. Real-time link on NHS number. Data repository in PCT. Firewall. Person-identifiable and sensitive information removed. Anonymised data repository in PCT; 24-hourly updates. Trusted person poses question(s). Outputs.

  10. [Diagram labels] Exposure (simple): Food intake & physical activity. Exposure (compound): Sustained +ve energy balance. Intermediate outcome: Overweight. Intermediate outcome: Central obesity. Outcome (state): Type 2 diabetes. Outcome (function): Early death. Modifying factors (e.g. sex). Confounding factors (e.g. transport).
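One way to make a pathway like the one on slide 10 machine-readable is a small directed graph. The sketch below is illustrative only: the slide text lists the nodes but not the arrows, so the edges (a simple chain from exposure to outcome, plus a confounder pointing at both exposure and outcome) are assumptions, as are the names `CAUSAL_EDGES` and `downstream`. Effect modifiers such as sex are left out because they do not map onto a plain edge.

```python
from collections import deque

# Assumed edges: the slide names these factors, but the arrows are our guess.
CAUSAL_EDGES = {
    "food intake & physical activity": ["sustained +ve energy balance"],
    "sustained +ve energy balance": ["overweight"],
    "overweight": ["central obesity"],
    "central obesity": ["type 2 diabetes"],
    "type 2 diabetes": ["early death"],
    # A confounder influences both the exposure and the outcome.
    "transport": ["food intake & physical activity", "type 2 diabetes"],
}


def downstream(graph: dict[str, list[str]], start: str) -> set[str]:
    """Return every node reachable from `start` by following edges."""
    seen: set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


effects = downstream(CAUSAL_EDGES, "food intake & physical activity")
print(sorted(effects))
```

Once the structure is held as data, questions such as "which outcomes sit downstream of this exposure?" become simple graph traversals.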

  11. Obesity Graphs (emerging) Foresight

  12. Gene Association Studies: Which genetic variation is responsible for disease variation? Single nucleotide polymorphisms (SNPs), e.g. …ATTTAGGACCAATAAGTCT… vs. …AATTAGGATCAATAAGTCT… Human genome = 3 billion bases → 3 million sites of variation. Current cohorts ≈ 5,000 individuals vs. 500,000 SNPs.
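As a small illustration of what a "site of variation" means, the sketch below compares the two fragments shown on slide 12 position by position. It assumes the fragments are already aligned and of equal length; the function name `variant_sites` is ours, not part of the talk.

```python
def variant_sites(seq_a: str, seq_b: str) -> list[tuple[int, str, str]]:
    """Return (0-based position, base in seq_a, base in seq_b) for each mismatch."""
    if len(seq_a) != len(seq_b):
        raise ValueError("fragments must be aligned and of equal length")
    return [(i, a, b) for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


fragment_1 = "ATTTAGGACCAATAAGTCT"
fragment_2 = "AATTAGGATCAATAAGTCT"

sites = variant_sites(fragment_1, fragment_2)
print(sites)  # [(1, 'T', 'A'), (8, 'C', 'T')]
```

The two fragments differ at two positions, which is the kind of single-base difference a SNP records.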

  13. Crude Pan-Genome Scans. Data matrix axes: ← Patients →, ↑ SNPs ↓. for (i = 1 to #random permutations) { for (j = 1 to #SNPs) { for (k = 1 to #patients) { disease status vs. locus status χ² } } } Given a typical 5k patients, 0.5m SNPs and 10k permutations: 20k χ² calcs per sec on modern single core → ≈70 hrs for single SNPs; → ≈1,980 years for [n*(n-1)]/2 SNP pairs
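The run-time figures on slide 13 can be reproduced with a few lines of arithmetic. This sketch assumes that one "χ² calc" means one SNP (or SNP pair) tested in one permutation, with the loop over patients counted inside that single calculation; the slide's numbers are consistent with that reading. The function and variable names are ours.

```python
SECONDS_PER_HOUR = 3600
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def scan_seconds(n_tests: int, n_permutations: int, tests_per_second: float) -> float:
    """Seconds needed to run every test once per permutation on a single core."""
    return n_tests * n_permutations / tests_per_second


n_snps = 500_000          # 0.5m SNPs
n_permutations = 10_000   # 10k random permutations
rate = 20_000             # 20k chi-squared calculations per second

# Single-SNP scan: one test per SNP per permutation.
single_hours = scan_seconds(n_snps, n_permutations, rate) / SECONDS_PER_HOUR

# Pairwise scan: n*(n-1)/2 SNP pairs per permutation.
n_pairs = n_snps * (n_snps - 1) // 2
pair_years = scan_seconds(n_pairs, n_permutations, rate) / SECONDS_PER_YEAR

print(f"single SNPs: {single_hours:.1f} hours")
print(f"SNP pairs:   {pair_years:.1f} years")
```

The jump from hours to millennia comes entirely from the quadratic growth in the number of SNP pairs, which is the case the slide makes for high-performance computing.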

  14. Shared Views of Structure. [Diagram labels] Pathway/expert 1; Pathway/expert 2; Causal pathway; Exposure; Outcome; Modifiers; Confounders

  15. Evidence & Theory. [Diagram labels] Data; Configuration; Algorithms; Knowledge Management; Visualisation; Insight; Abstract thinking; Signal

  16. Simple Algorithms. [Three diagrams, 1) to 3), each with nodes labelled z, G and P] Computational free-thinking, for insights from richly-observed health & environments

  17. Shared Genomics Platform • Apr-Jun 08: Statistical genetics algorithms for Win cluster; Annotation-base prototype • Jul-Dec 08: Basic statistical genetics platform; Model genome-wide analyses with MSR & PIs • Jan 09 – Mar 10: Epidemiologists driving genome-wide analyses; Integrated modelling and annotation

  18. Wider Opportunities • Text-mining for causal inference: prototyping planned with NaCTeM • Stress-testing annotation workflow systems: proposal to OMII ENGAGE • Novel visualisation of annotations • Novel statistical algorithms • Graph-based causality workbench • Potential grant applications

  19. Causality Workbench. Inputs: published papers; unpublished papers; slides; abstracts; blogs; experts; workflows; statistical scripts; signposts to other relevant data... Services: catalogues, ontologies, search engines, text-mining, analytical services, social networks etc. Example: “Factors a, b and c are not in my study, but they cluster with it in various ways: Factor b is a potentially important measured confounder – I will add it...” [Diagram labels] Structure; Causality?; Exposures; Outcome; Modifiers; Confounders; Errors
