Bioinformatics at Molecular Epidemiology: new tools for identifying indels in sequencing data • Kai Ye • k.ye@lumc.nl
Data collection for osteoarthritis, cardiovascular disease and longevity • Serum parameters • Cellular characteristics (biobank) • Skin ageing • Glycosylation • Metabonomic • Transcriptomic • Genetic (GWAS/sequence) • Epigenetic • Data integration
Team and analyses (diagram) • Joost Kok, Erik vd Akker, Kai Ye • Statistical analysis • Metabonomic analysis • Genetic & epigenetic analyses • Biochem analyses • Expression analysis • Glycosylation • Cell responses
About me • 1995 – 2003 B.S. and M.S. in biology and pharmaceutical science • 2004 – 2008 PhD cum laude at Leiden University. Thesis title: Novel algorithms for protein sequence analysis • 2008 – 2009 Postdoc at the European Bioinformatics Institute, collaborating with scientists at the Sanger Institute • Currently assistant professor at MolEpi
A Pindel approach for identifying indels in Next-Gen sequencing data • Paired-end reads in Next-gen sequencing • Indel detection algorithms • Pindel • Cancer genome project • 1000 genomes project
Paired-end reads in next-generation sequencing • Two reads from the ends of one fragment, separated by approximately the insert size
Mapping paired-end reads • SNPs • CNVs: copy number variations • INDELs: insertions and deletions • SVs: structural variations
Gapped alignment for small indels
ATCCGTATCACGGTCA-CAGATCAGTCCAGT
ATCCGTATCACGGTCAGCAGATCAGTCCAGT
(the gap at column 17 marks the indel)
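The gapped alignment on this slide can be read off programmatically. The following is a minimal sketch, not part of the original slides: the function name `find_gaps` is hypothetical, and it assumes the first row is the sequenced read and the second row is the reference (the slide does not label the rows).

```python
def find_gaps(read_aln, ref_aln):
    """Report gap runs in a pairwise alignment as (kind, 0-based column, length).

    Assumption: read_aln is the sequenced read, ref_aln is the reference.
    A gap in the read is a deletion relative to the reference;
    a gap in the reference is an insertion.
    """
    if len(read_aln) != len(ref_aln):
        raise ValueError("aligned rows must have equal length")
    events = []
    col = 0
    while col < len(read_aln):
        if read_aln[col] == "-":
            kind, row = "deletion", read_aln
        elif ref_aln[col] == "-":
            kind, row = "insertion", ref_aln
        else:
            col += 1
            continue
        end = col
        while end < len(row) and row[end] == "-":
            end += 1  # extend over the whole gap run
        events.append((kind, col, end - col))
        col = end
    return events


# The two rows shown on the slide.
print(find_gaps("ATCCGTATCACGGTCA-CAGATCAGTCCAGT",
                "ATCCGTATCACGGTCAGCAGATCAGTCCAGT"))
# prints [('deletion', 16, 1)]
```

If the rows are actually in the opposite order, the same gap would be reported as a 1-base insertion instead.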
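Read-pair approach for SVs • No indel: the read pair maps to the reference at the expected distance • Deletion in sample: the read pair maps farther apart on the reference • Insertion in sample: the read pair maps closer together on the reference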
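The read-pair idea can be sketched as a simple threshold test on the mapped distance. This is an illustrative sketch only: the function name `classify_pair`, the use of a mean and standard deviation for the insert size, and the cutoff of 3 standard deviations are all assumptions, not taken from the slides.

```python
def classify_pair(mapped_distance, mean_insert, sd_insert, n_sd=3.0):
    """Classify one read pair by how far apart its reads map on the reference.

    Assumption: insert sizes are roughly normally distributed, and pairs
    outside mean +/- n_sd * sd are treated as evidence for an indel.
    """
    upper = mean_insert + n_sd * sd_insert
    lower = mean_insert - n_sd * sd_insert
    if mapped_distance > upper:
        return "deletion"   # sample lacks sequence the reference has
    if mapped_distance < lower:
        return "insertion"  # sample has extra sequence between the reads
    return "no indel"


for distance in (200, 500, 100):
    print(distance, classify_pair(distance, mean_insert=200, sd_insert=10))
```

A single discordant pair is weak evidence; callers of this kind typically require several pairs supporting the same event.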
Mapping paired-end reads • SNP or small indel • Read-pairs • Read-depth
Pindel: Deletions • Test vs. ref • Deletion size: 1 base to 1 million bases
Pindel: Deletions • Anchor read mapped to ref
Pindel: Deletions • Anchor • First search range on ref: 2 x average distance
Pindel: Deletions • Anchor • First search range: 2 x average distance • Second search range: expected maximum deletion size + read length (36)
Pindel: Deletions • Sample vs. reference
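The search ranges on the slides above suggest a split-read procedure: the mapped read of a pair anchors the search, one end of the unmapped read is looked for within 2 x average distance of the anchor, and the other end within the expected maximum deletion size plus the read length. Below is a deliberately simplified sketch of that idea using exact string matching. It is not Pindel's actual implementation; the names `find_all` and `detect_deletion`, the `min_frag` parameter, and the toy sequences are all hypothetical.

```python
def find_all(text, pattern, start, end):
    """All start positions of pattern lying fully inside text[start:end]."""
    hits = []
    pos = text.find(pattern, start, end)
    while pos != -1:
        hits.append(pos)
        pos = text.find(pattern, pos + 1, end)
    return hits


def detect_deletion(ref, read, anchor_pos, avg_distance, max_del, min_frag=4):
    """Try to split an unmapped read around a deletion.

    Returns (deletion_start, deletion_length) on the reference, or None.
    Assumptions: the anchor read maps upstream of the unmapped read, both
    fragments must match the reference exactly and uniquely, and each
    fragment is at least min_frag bases long.
    """
    read_len = len(read)
    win_end = min(len(ref), anchor_pos + 2 * avg_distance)
    for k in range(read_len - min_frag, min_frag - 1, -1):
        left, right = read[:k], read[k:]
        # First range: within 2 x average distance of the anchor.
        left_hits = find_all(ref, left, anchor_pos, win_end)
        if len(left_hits) != 1:
            continue
        left_end = left_hits[0] + k
        # Second range: maximum deletion size + read length past the left fragment.
        far_end = min(len(ref), left_end + max_del + read_len)
        right_hits = find_all(ref, right, left_end + 1, far_end)
        if len(right_hits) == 1:
            return left_end, right_hits[0] - left_end
    return None


# Toy reference; the sample lacks the ten G's at positions 18..27.
ref = "T" * 10 + "ACGTACCA" + "G" * 10 + "CTAGTCAT" + "A" * 10
read = "ACGTACCA" + "CTAGTCAT"  # read spanning the deletion breakpoint
print(detect_deletion(ref, read, anchor_pos=0, avg_distance=15, max_del=20))
# prints (18, 10)
```

Restricting both searches to small windows near the anchor is what keeps this tractable: short fragments that would be ambiguous genome-wide are often unique within a few hundred bases.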
African male: NA18507 • Bentley et al., Nature 2008 • 135 Gb of sequence • ~4 billion paired 35-base reads • After preprocessing: 56,161,333 pairs of one-end mapped reads • Pindel results: 142,908 insertions of 1-16 bp • 162,068 deletions of 1 bp-10 kb
Applications • Cancer genome project • 1000 genomes project
Cancer genome • COLO-829 cells • Normal ~30x paired-end 100bp reads • Tumor ~40x paired-end 100bp reads • Search for somatic (tumor specific) indels
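In its simplest form, the search for somatic indels amounts to keeping calls seen in the tumor but not in the matched normal. The sketch below illustrates only that set difference; the function name `somatic_indels`, the tuple layout `(chrom, pos, kind, length)`, and the example calls are invented for illustration and are not results from the slides.

```python
def somatic_indels(tumor_calls, normal_calls):
    """Keep indel calls present in the tumor but absent from the normal.

    Assumption: each call is a hashable tuple (chrom, pos, kind, length)
    and identical events are reported with identical coordinates.
    """
    normal = set(normal_calls)
    return [call for call in tumor_calls if call not in normal]


# Made-up calls, for illustration only.
tumor = [("chr1", 1000, "DEL", 5), ("chr2", 2500, "INS", 2), ("chr7", 4200, "DEL", 30)]
normal = [("chr1", 1000, "DEL", 5)]
print(somatic_indels(tumor, normal))
```

A real comparison would also need to account for differing coverage (~30x normal versus ~40x tumor here), since a germline indel can be missed in the normal simply because too few reads span it.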
1000 Genomes Project • Pilot 1: 180 people from 3 major geographic groups (YRI, CEU, and CHB plus JPT) at low coverage (~4x) • Pilot 2: the genomes of two families (CEU and YRI; both parents and an adult child in each) at deep coverage (20x per genome) • Pilot 3: sequencing the coding regions (exons) of 1,000 genes in 1,000 people at deep coverage (20x)
www.ebi.ac.uk/~kye/pindel k.ye@lumc.nl