Bioinformatics at Molecular Epidemiology - new tools for identifying indels in sequencing data

Kai Ye, k.ye@lumc.nl. Data collection for osteoarthritis, cardiovascular disease and longevity: serum parameters, cellular characteristics (biobank), skin ageing, glycosylation, metabonomic

Presentation Transcript


  1. Bioinformatics at Molecular Epidemiology - new tools for identifying indels in sequencing data • Kai Ye • k.ye@lumc.nl

  2. Data collection for osteoarthritis, cardiovascular disease and longevity • Serum parameters • Cellular characteristics (biobank) • Skin ageing • Glycosylation • Metabonomic • Transcriptomic • Genetic (GWAS/sequence) • Epigenetic • Data integration

  3. Glycosylation • Cell responses • Joost Kok • Erik vd Akker • Kai Ye • Statistical analysis • Metabonomic analysis • Genetic & epigenetic analyses • Biochem analyses • Expression analysis

  4. About me • 1995 – 2003 B.S. and M.S. in biology and pharmaceutical science • 2004 – 2008 PhD cum laude at Leiden University. Thesis title: Novel algorithms for protein sequence analysis • 2008 – 2009 Postdoc at the European Bioinformatics Institute, collaborating with scientists at the Sanger Institute • Currently assistant professor at MolEpi

  5. A Pindel approach for identifying indels in Next-Gen sequencing data • Paired-end reads in Next-gen sequencing • Indel detection algorithms • Pindel • Cancer genome project • 1000 genomes project

  6. Paired-end reads in Next Generation sequencing ~ insert size

  7. Mapping paired-end reads SNP • CNVs: copy number variations; • INDELs: insertions and deletions; • SVs: Structural variations

  8. Gapped alignment for small indels
     ATCCGTATCACGGTCA-CAGATCAGTCCAGT
     ATCCGTATCACGGTCAGCAGATCAGTCCAGT
     (the gap column marks the indel)
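To make the gapped alignment on this slide concrete, here is a minimal Python sketch that scans an already-aligned pair of sequences and reports each gap column. The function name `find_gaps` and the labels it returns are illustrative assumptions, not part of any tool named in the talk. The slide does not say which row is the read and which is the reference, so the sketch stays neutral about that.

```python
def find_gaps(first, second):
    """Report gap columns in a pairwise alignment.

    Both strings must already be aligned (equal length, '-' marks a gap).
    Returns a list of (column, label, base) tuples, 0-based columns.
    """
    if len(first) != len(second):
        raise ValueError("aligned sequences must have equal length")
    events = []
    for col, (a, b) in enumerate(zip(first, second)):
        if a == "-" and b != "-":
            # The first row lacks a base that the second row has.
            events.append((col, "missing_in_first", b))
        elif b == "-" and a != "-":
            # The second row lacks a base that the first row has.
            events.append((col, "missing_in_second", a))
    return events


# The two rows shown on the slide.
row_1 = "ATCCGTATCACGGTCA-CAGATCAGTCCAGT"
row_2 = "ATCCGTATCACGGTCAGCAGATCAGTCCAGT"
gaps = find_gaps(row_1, row_2)
print(gaps)
```

For the slide's example this reports a single one-base gap at column 16, where the second row carries a G that the first row lacks.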

  9. Read-depth for CNVs
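The read-depth idea can be sketched as follows: average the per-base coverage in fixed windows and flag windows whose depth is far from the typical depth. This is only a toy illustration. The window size, the thresholds `low` and `high`, and the function name `flag_windows` are assumptions chosen for the example, not values from the talk.

```python
from statistics import median


def flag_windows(depths, window=5, low=0.5, high=1.5):
    """Flag windows whose mean depth deviates from the median window depth.

    depths: per-base read depth along the reference.
    Returns a list of (window_start, call) with call in {"loss", "gain"}.
    """
    means = [
        sum(depths[i:i + window]) / window
        for i in range(0, len(depths) - window + 1, window)
    ]
    if not means:
        return []
    baseline = median(means)
    if baseline == 0:
        raise ValueError("median depth is zero; cannot compute ratios")
    calls = []
    for idx, mean_depth in enumerate(means):
        ratio = mean_depth / baseline
        if ratio <= low:
            calls.append((idx * window, "loss"))   # fewer reads than expected
        elif ratio >= high:
            calls.append((idx * window, "gain"))   # more reads than expected
    return calls


# Toy coverage track: ~30x baseline, one half-depth and one double-depth region.
coverage = [30] * 10 + [15] * 5 + [30] * 5 + [60] * 5 + [30] * 10
cnv_calls = flag_windows(coverage)
print(cnv_calls)
```

Real read-depth callers also correct for GC content and mappability, which this sketch ignores.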

  10. Read-pair approach for SVs • No indel (sample vs. reference) • Deletion (sample vs. reference) • Insertion (sample vs. reference)
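The three panels of this slide can be summarised in a few lines of Python: a read pair that maps further apart on the reference than the library's insert size suggests a deletion in the sample, and a pair that maps closer together suggests an insertion. The cutoff of three standard deviations and the name `classify_pair` are assumptions made for this sketch.

```python
def classify_pair(mapped_distance, mean_insert, sd_insert, n_sd=3.0):
    """Classify one read pair by its mapped distance on the reference.

    mapped_distance: distance between the two mapped ends on the reference.
    mean_insert, sd_insert: insert-size statistics of the sequencing library.
    n_sd: how many standard deviations count as abnormal (assumed value).
    """
    if mapped_distance > mean_insert + n_sd * sd_insert:
        # Ends map too far apart: the sample lacks sequence present in the reference.
        return "deletion"
    if mapped_distance < mean_insert - n_sd * sd_insert:
        # Ends map too close together: the sample carries extra sequence.
        return "insertion"
    return "no indel"


for distance in (205, 600, 120):
    print(distance, classify_pair(distance, mean_insert=200, sd_insert=10))
```

Because the test depends on the spread of the insert-size distribution, this approach mainly finds larger events and cannot place breakpoints at base-pair resolution.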

  11. Mapping paired-end reads SNP or small indel • read-pairs • read-depth

  12. Mapping paired-end reads SNP or small indel • read-pairs • read-depth

  13. Pindel: Deletions (figure labels: test, ref; 1 base - 1 million bases)

  14. Pindel: Deletions (figure labels: anchor, ref)

  15. Pindel: Deletions (figure labels: anchor, 2 x average distance, ref)

  16. Pindel: Deletions (figure labels: anchor, 2 x average distance, expected maximum deletion size + read length (36), ref)

  17. Pindel: Deletions (figure labels: sample, reference)
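The deletion slides above suggest a two-step search: use the mapped read of a pair as an anchor, look for one end of its unmapped mate within 2 x the average distance from the anchor, then look for the remaining part of the mate within the expected maximum deletion size plus one read length. The Python sketch below follows that outline under strong simplifying assumptions: it uses plain exact substring search, handles only the forward strand, and does not check that matches are unique. It is not the Pindel implementation, and all names and parameters (`find_deletion`, `min_piece`, and so on) are hypothetical.

```python
def find_deletion(reference, read, anchor_end, avg_distance, max_deletion,
                  min_piece=3):
    """Try to explain an unmapped read as spanning one deletion.

    reference: reference sequence.
    read: the unmapped mate of an anchored read pair.
    anchor_end: reference index just past the mapped (anchor) read.
    Returns (deletion_start, deletion_length) or None.
    """
    near_hi = min(len(reference), anchor_end + 2 * avg_distance)
    # Try the longest left part first, so the breakpoint is pushed rightwards.
    for split in range(len(read) - min_piece, min_piece - 1, -1):
        left, right = read[:split], read[split:]
        # Step 1: left part must map within 2 x average distance of the anchor.
        start = reference.find(left, anchor_end, near_hi)
        if start == -1:
            continue
        left_end = start + split
        # Step 2: right part must map within max deletion size + read length,
        # at least one base beyond the left part (otherwise nothing is deleted).
        far_hi = min(len(reference), left_end + max_deletion + len(read))
        pos = reference.find(right, left_end + 1, far_hi)
        if pos != -1:
            return left_end, pos - left_end
    return None


# Toy reference: anchor region, flanking sequence, 5 deleted bases, flank, tail.
ref = "ACGTTGCA" + "GGATCCTA" + "TTTTT" + "CAGCTGAC" + "GTAC"
# Sample read that spans the deletion: end of left flank + start of right flank.
split_read = "CCTA" + "CAGC"
result = find_deletion(ref, split_read, anchor_end=8, avg_distance=10,
                       max_deletion=10)
print(result)
```

In the toy example the read is split after `CCTA`, and the sketch reports a 5-base deletion starting at reference index 16. Restricting both searches to small windows near the anchor is what keeps this kind of split-read search tractable on a whole genome.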

  18. African male: NA18507 • Bentley et al., Nature 2008 • 135 Gb of sequence • ~4 billion paired 35-base reads • After preprocessing: 56,161,333 pairs of one-end mapped reads • Pindel • 142,908 insertions of 1-16 bp • 162,068 deletions of 1 bp - 10 kb

  19. Deletion size distribution

  20. Applications • Cancer genome project • 1000 genomes project

  21. Cancer genome • COLO-829 cells • Normal ~30x paired-end 100bp reads • Tumor ~40x paired-end 100bp reads • Search for somatic (tumor specific) indels

  22. 1000 genomes project • Pilot 1: 180 people from 3 major geographic groups (YRI, CEU, CHB and JPT) at low coverage (~4x) • Pilot 2: the genomes of two families (CEU and YRI, both parents and an adult child) at deep coverage (20x per genome) • Pilot 3: sequencing the coding regions (exons) of 1,000 genes in 1,000 people at deep coverage (20x)

  23. www.ebi.ac.uk/~kye/pindel k.ye@lumc.nl
