
A flexible, scalable genomics framework for integrating heterogeneous vector sequence data

Scott Emrich, Assistant Professor, Computer Science and Engineering; Scientific Manager, VectorBase; University of Notre Dame.


Presentation Transcript


  1. A flexible, scalable genomics framework for integrating heterogeneous vector sequence data. Scott Emrich, Assistant Professor, Computer Science and Engineering; Scientific Manager, VectorBase; University of Notre Dame

  2. Assembly required…

  3. VectorBase is here to help (esp. with -omics data). Please see me and/or Dan Lawson (EBI) anytime during this meeting

  4. Anopheles gambiae M & S. Lawniczak, Emrich et al. (2010, Science)

  5. Some genomic regions display a footprint of strong, recent selection. Lawniczak, Emrich et al. (2010, Science)

  6. FlexReseq tool for integrating diverse sequence data. [Figure: reads from two samples aligned against a reference] Reference: ACGTCGT TACTGC. Sample_1 reads: ACGTC GATACTGC; ACGTCGATAT TGC; ACGTCGATAT TGC; AC GTCGAT ACTGC; ACGTCGAT ACTGC. Sample_2 reads: ACG TCGT TAT TGC; ACGTCGT TAT TGC; ACGTCGT TAT TGC; ACGTCGT TAT TGC; ACGTC GT TAT TGC

  7. FlexReseq implementation. Genome Analysis Toolkit (GATK): Map-Reduce framework that allows efficient access to large resequencing data sets (McKenna et al., Genome Research, 2010). FlexReseq, a module for GATK: configurable interface allows easy data exploration; modular implementation of rules allows for easy extension of software; saves you from lots of scripting (Perl) code!
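The map-reduce and modular-rule ideas on this slide can be illustrated with a minimal Python sketch. It is not the GATK API and not FlexReseq's actual code: the pileup layout, the rule names (`min_depth`, `non_reference_majority`), and the thresholds are all hypothetical, chosen only to show how per-locus rules can be swapped in and out without rewriting the traversal.

```python
from collections import Counter
from functools import reduce

# Hypothetical toy pileup: (contig, position) -> sample name -> observed bases.
PILEUP = {
    ("chr1", 5): {"Sample_1": ["G", "G", "G", "G"], "Sample_2": ["T", "T", "T"]},
    ("chr1", 9): {"Sample_1": ["A", "A"], "Sample_2": ["A", "A", "A"]},
}
# Hypothetical reference base at each locus.
REFERENCE = {("chr1", 5): "T", ("chr1", 9): "A"}


def min_depth(min_reads):
    """Rule factory (assumed name): require at least min_reads observed bases."""
    def rule(ref_base, bases):
        return len(bases) >= min_reads
    return rule


def non_reference_majority(min_fraction):
    """Rule factory (assumed name): the most common base differs from the
    reference and makes up at least min_fraction of the observations."""
    def rule(ref_base, bases):
        if not bases:
            return False
        base, count = Counter(bases).most_common(1)[0]
        return base != ref_base and count / len(bases) >= min_fraction
    return rule


def map_locus(locus, samples, rules):
    """Map step: list the samples that satisfy every rule at one locus."""
    ref_base = REFERENCE[locus]
    passing = sorted(
        name for name, bases in samples.items()
        if all(rule(ref_base, bases) for rule in rules)
    )
    return locus, passing


def reduce_calls(acc, mapped):
    """Reduce step: keep only loci where at least one sample passed."""
    locus, passing = mapped
    if passing:
        acc[locus] = passing
    return acc


def run(pileup, rules):
    """Traverse loci in order, applying map then reduce."""
    mapped = (map_locus(locus, samples, rules)
              for locus, samples in sorted(pileup.items()))
    return reduce(reduce_calls, mapped, {})


if __name__ == "__main__":
    print(run(PILEUP, [min_depth(3), non_reference_majority(0.8)]))
```

Changing the analysis means changing the list of rules passed to `run`, which is the kind of extension point the slide describes; the traversal itself stays untouched.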

  8. A malaria use-case for FlexReseq. How did drug resistance evolve? Why are some parasites drug-resistant? Goal: we want to connect genotype (genome) to phenotype (drug response). Samarakoon, Regier, et al., BMC Genomics, 2011

  9. [Figure: sequencing and mapping workflow] Steps: (1) whole genome shotgun sequencing; (2) reference genome mapping. Parents: HB3, Dd2 (parental genomes, shotgun libraries). Genetic cross: Wellems et al. 1990 [24]. Progeny recombinants: SC05, 7C126 (progeny genomes, shotgun libraries). Shotgun libraries: GS-FLX technology, 454/Roche. NCBI Trace Archive [28]. Mapped: SSAHA2 (http://www.sanger.ac.uk). Reference genome (3D7): PlasmoDB (v5.4) [27]

  10. A more detailed map of P. falciparum. [Figure: panels (A) 7C126 and (B) SC05; axes: chromosome (1 to 14) and chromosome position; legend: Dd2, HB3]
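A map like this one, where each progeny chromosome is coloured by parent of origin, can be approximated with a short Python sketch. This is an illustration under stated assumptions, not the method used in the talk: the function names and the marker tuple layout `(position, dd2_allele, hb3_allele, progeny_allele)` are hypothetical, and real data would need handling of missing calls and sequencing errors.

```python
from itertools import groupby


def assign_parent(dd2_allele, hb3_allele, progeny_allele):
    """Return which parent the progeny allele matches, or None when the
    marker is uninformative (parents agree) or matches neither parent."""
    if dd2_allele == hb3_allele:
        return None
    if progeny_allele == dd2_allele:
        return "Dd2"
    if progeny_allele == hb3_allele:
        return "HB3"
    return None


def inheritance_blocks(markers):
    """Collapse per-marker parent calls into contiguous blocks.

    markers: list of (position, dd2_allele, hb3_allele, progeny_allele)
    tuples for one chromosome, sorted by position (assumed layout).
    Returns a list of (parent, first_position, last_position).
    """
    calls = [(pos, assign_parent(dd2, hb3, prog))
             for pos, dd2, hb3, prog in markers]
    informative = [(pos, parent) for pos, parent in calls if parent is not None]
    blocks = []
    for parent, group in groupby(informative, key=lambda call: call[1]):
        positions = [pos for pos, _ in group]
        blocks.append((parent, positions[0], positions[-1]))
    return blocks


if __name__ == "__main__":
    # Made-up markers for illustration only.
    toy_markers = [
        (100, "A", "G", "A"),
        (200, "C", "C", "C"),  # uninformative: parents share the allele
        (300, "T", "C", "T"),
        (400, "G", "A", "A"),
        (500, "C", "T", "T"),
    ]
    print(inheritance_blocks(toy_markers))
```

A switch from one parent to the other between adjacent blocks marks a candidate recombination breakpoint in the progeny.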

  11. Association of 2La with clines of aridity in Nigeria… 24,000 mosquitoes; 194 sampling localities. Modified from Coluzzi et al. (1979)

  12. High-throughput sequencing • Data from Besansky lab • Illumina Genome Analyzer • 4 population pools (S-form) • SHRiMP alignment • BWA also works. C. Cheng et al., unpublished

  13. Differential mapping biases do exist

  14. Population haplotyping

  15. In situ error isolation: has been shown to be important in ancient DNA-based ecology

  16. Thanks to… Notre Dame Bioinformatics Lab, Summer 2010 • VectorBase (NIH/NIAID) • Dr. Nora Besansky (ND) • Dr. Frank Collins (ND) • Rory Carmichael, Andrew Shehan, Nate Konopinski, Dave Campbell (ND), others… • Anopheles genome cluster group • i5K Arthropod Genomics Consortium steering committee
