
Taxon diversity analysis for bulk insect samples using Illumina Hi-seq platform

Taxon diversity analysis for bulk insect samples using Illumina Hi-seq platform. Xin ZHOU, Shanlin LIU, Yiyuan LI, Qing YANG, and Xu SU. Department of Science and Technology, Environmental Genomics Research Group, BGI, China. Adelaide, Australia, 3 December 2011.


Presentation Transcript


  1. Taxon diversity analysis for bulk insect samples using Illumina Hi-seq platform. Xin ZHOU, Shanlin LIU, Yiyuan LI, Qing YANG, and Xu SU. Department of Science and Technology, Environmental Genomics Research Group, BGI, China. Adelaide, Australia, 3 December 2011

  2. Problem. Solutions? • Opt. 1: ......zzzzZZZZZ • Opt. 2: morph sorting → indiv. ID → … → Opt. 1 • Opt. 3: morph sorting → indiv. barcoding → … → Opt. 1 • Opt. 4: grinding up → NGS → CLUSTERING/BLAST → DIVERSITY! Zhou et al. 2011, 4th International Barcode of Life Conference

  3. Environmental barcoding of bulk insects • Aquatic insects: mini-barcode (130 bp), 454 • Bat diet (insects): COI fragment, 157 bp, 454 • Malaise trap (insects): COI fragment, ~400 bp, 454 (Biodiversity soup: metabarcoding of arthropods for rapid biodiversity assessment and biomonitoring, Yu D.W. et al., in review)

  4. Major NGS platforms applicable in environmental barcoding. Illumina Hi-Seq: • higher throughput • less $ / bp • increasing read length • variety of bioinformatics tools available from genomic pipelines

  5. Sequencing capacity at BGI • 28 Illumina GAIIx • 137 Illumina Hi-Seq 2000 • 25 Life Tech SOLiD 4 • 16 ABI 3730XL • 110 MegaBACEs • 2 Illumina iScan • 1 Roche 454 • 1 Ion Torrent • 1 Illumina Mi-Seq • Data production: 100 Gb / day (2009); >5 Tb / day (end of 2010); >1500X human genome / day

  6. What I am NOT going to talk about: • Primer optimization • Systematic comparisons of NGS platforms • Quantitative diversity analysis What I AM going to talk about: • Can Illumina NGS be used in diversity analysis?

  7. Can Illumina NGS be used in diversity analysis? • Sequencing error rate • Read length

  8. Sequencing error rate • No indel issue in homopolymers • Sequencing quality keeps increasing • Rare nucleotide errors can be easily corrected by: increasing sequencing depth; paired-end (PE) sequencing; setting stringent matching criteria in the overlapping fragment by allowing only >99% identity • Recent improvement in sequencing quality using Illumina's V3 chemistry (even at 100 bp, only about 10% of the base calls have an error rate >1%) • PE sequencing enables forming sequence contigs [Figure: two 150 bp reads overlapping within an insert of 250 nt]
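
The overlap check on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pipeline used in the talk: the insert size is assumed to be known exactly (250 nt, so two 150 bp reads overlap by 50 bp), and the names `reverse_complement` and `merge_pair` are hypothetical. Real read mergers search for the overlap instead of assuming it.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq))


def merge_pair(fwd: str, rev: str, insert_size: int, min_identity: float = 0.99):
    """Merge a read pair into one contig, or return None if the overlap disagrees.

    Assumption: insert_size is known, so the overlap length is
    len(fwd) + len(rev) - insert_size (150 + 150 - 250 = 50 on the slide).
    """
    rev_rc = reverse_complement(rev)  # bring the reverse read onto the forward strand
    overlap = len(fwd) + len(rev_rc) - insert_size
    if overlap <= 0 or overlap > min(len(fwd), len(rev_rc)):
        return None  # reads do not overlap, or insert is shorter than a read
    fwd_tail = fwd[-overlap:]
    rev_head = rev_rc[:overlap]
    matches = sum(a == b for a, b in zip(fwd_tail, rev_head))
    # The slide allows only >99% identity in the overlapping fragment.
    if matches / overlap <= min_identity:
        return None
    return fwd + rev_rc[overlap:]
```

With a 50 bp overlap, a threshold of >99% identity means a single mismatch (98%) already rejects the pair, so the criterion effectively demands a perfect overlap at this read length.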

  9. Read length • Read length keeps increasing • Shotgun reads can be further assembled into longer fragments ("shotgun" assembly strategy used in genome sequencing projects) • 150 PE enables a contig read of 250 bp [Figure: two 150 bp reads, insert size 250 nt] • Option of scaffold assembly

  10. Illumina environmental barcoding (Illumina e-barcoding) • PCR-based: Lib1 (658 bp, 150 PE), full-length COI barcode PE sequencing; Lib2 (200 bp, 150 PE), COI amplicons shotgun PE sequencing; yields full-length COI • PCR-free: mitochondrial shotgun PE sequencing; yields full-length COI without PCR bias

  11. Approach #1: PCR-based. Sample information

  12. Approach #1: PCR-based. Pre-analysis data filtering

  13. OTU cluster (98%). OTU filtering workflow: • Unique reads (abundance > 1) • Compared to reads of Lib 2 • Remove chimeras • Alignment
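
As a rough illustration of two of these steps (keeping unique reads with abundance > 1, then clustering at 98%), here is a minimal greedy sketch in Python. It assumes the reads are already aligned and of equal length, so identity can be computed position by position; the slides do not name the clustering program used, and the function names here are hypothetical. Chimera removal and the comparison against Lib 2 are not covered.

```python
from collections import Counter


def identity(a: str, b: str) -> float:
    """Fraction of identical positions; assumes pre-aligned, equal-length reads."""
    if len(a) != len(b):
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cluster_otus(reads, min_abundance=2, threshold=0.98):
    """Dereplicate reads, drop singletons, and cluster greedily by abundance.

    Returns a list of (seed_sequence, total_read_count) tuples.
    """
    counts = Counter(reads)
    # Keep unique reads seen at least min_abundance times (abundance > 1).
    uniques = [(s, n) for s, n in counts.most_common() if n >= min_abundance]
    otus = []  # each entry is [seed_sequence, total_read_count]
    for seq, n in uniques:
        for otu in otus:
            if identity(seq, otu[0]) >= threshold:
                otu[1] += n  # join the first OTU whose seed is similar enough
                break
        else:
            otus.append([seq, n])  # no match: this read seeds a new OTU
    return [(seed, total) for seed, total in otus]
```

Processing the most abundant unique reads first means rare variants, which are more likely to carry errors, tend to be absorbed into an existing OTU instead of seeding a new one.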

  14. Results: BLAST at 100% identity [Figure: Venn diagrams of Sanger reference vs. NGS OTUs for the Mock and XSBN samples, LepF1/R1 and customized primers; counts shown: 32, 4, 198, 8, 197, 36]

  15. Mock: Sanger reference vs. NGS OTUs [Venn diagram counts: 4, 8, 36] • "False positive"? 31 can be found in our total sample, from which our mock samples were assembled; 5 likely to be PCR errors • False negative: not found in raw data (likely due to primer failure)

  16. XSBN: Sanger reference vs. NGS OTUs [Venn diagram counts: 32, 198, 197; chart labels: Mean + SE, group1, group2] • Cross-sample contamination? • 17 not found in raw data (primer failure) • 15 were lost in data filtering

  17. Sanger reference vs. NGS OTUs [Venn diagram counts: 49, 32, 181, 198, 197, 84] • Significantly fewer false positives after removal of sequences with abundance <10 • Slight drop in true positives

  18. Approach #1: PCR-based. What's next? Illumina e-barcoding: • Obtaining full-length barcodes via shotgun read assembly (new program in development, "SOAPbarcode") • New algorithm to filter out false positive OTUs

  19. Approach #2: PCR-free method • Individual barcoding • Total MT isolation & DNA extraction • Shotgun sequencing • Reference-based method • Reference-independent method

  20. Building reference library: individual barcoding. 89 individuals; 84 reference barcodes; 39 OTUs (2%)

  21. Total MT isolation & DNA extraction

  22. Shotgun sequencing • Insert size: 200 bp • Read length: 100 bp PE

  23. Pre-analysis • Data filtering: adaptor contamination removal • Quality control: in each read, only <10 bp with a sequencing error rate >1% are allowed
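
The quality-control rule translates directly into a Phred-score test: an error rate above 1% corresponds to a Phred quality below 20, since Q = -10 log10(p). The sketch below assumes FASTQ quality strings in Phred+33 encoding; the slides do not state the encoding, and older Illumina pipelines used Phred+64, so `offset` is left as a parameter. The function names are hypothetical.

```python
def low_quality_count(qual: str, offset: int = 33, min_q: int = 20) -> int:
    """Count bases whose Phred score is below min_q (error rate > 1% for Q20).

    Assumption: qual is a FASTQ quality string; offset=33 is the Phred+33
    encoding (use offset=64 for older Illumina output).
    """
    return sum((ord(ch) - offset) < min_q for ch in qual)


def passes_qc(qual: str, max_low: int = 10, offset: int = 33) -> bool:
    """Keep a read only if fewer than max_low bases have error rate > 1%."""
    return low_quality_count(qual, offset) < max_low
```

A read with nine low-quality bases is kept and a read with ten is discarded, matching the "<10 bp" wording on the slide.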

  24. Approach #2: PCR-free method. Method 1: Reference-based. BLAST reads to reference barcodes; a confident identification is made only when: • Best BLAST hit >98% identity • Reference coverage >90% [Figure: Reference 1, coverage 100%, correct mapping; Reference 2, coverage 30%, incorrect mapping]
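
The two criteria can be sketched as a filter over BLAST results. This is a hedged illustration only: it assumes each read's best hit has already been selected and is given as a tuple of (reference ID, percent identity, subject start, subject end) with 1-based inclusive coordinates, as in BLAST tabular output. The function names and the input layout are hypothetical.

```python
from collections import defaultdict


def covered_fraction(intervals, ref_len: int) -> float:
    """Fraction of a reference covered by the union of 1-based inclusive intervals."""
    covered = 0
    last_end = 0
    for start, end in sorted(intervals):
        start = max(start, last_end + 1)  # skip positions already counted
        if end >= start:
            covered += end - start + 1
            last_end = end
    return covered / ref_len


def detect_taxa(best_hits, ref_lengths, min_identity=98.0, min_coverage=0.90):
    """Return reference IDs whose best hits exceed 98% identity and 90% coverage.

    best_hits: iterable of (ref_id, percent_identity, sstart, send), one per read.
    ref_lengths: dict mapping ref_id to reference barcode length.
    """
    by_ref = defaultdict(list)
    for ref_id, pident, sstart, send in best_hits:
        if pident > min_identity:
            # Minus-strand hits report sstart > send, so normalise the order.
            by_ref[ref_id].append((min(sstart, send), max(sstart, send)))
    return sorted(
        ref for ref, intervals in by_ref.items()
        if covered_fraction(intervals, ref_lengths[ref]) > min_coverage
    )
```

Requiring high coverage as well as high identity guards against the situation in the slide's figure, where reads match only a short, conserved stretch (30%) of the wrong reference.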

  25. Potential sources of failure in detecting taxa: taxon-specific, or biomass (size & number)?

  26. Failures in taxon detection: taxon bias?

  27. Failures in taxon detection: OR biomass (body size, # individuals)? • Readily detected: average length > 5 mm • Missing: average length < 5 mm

  28. Approach #2: PCR-free method. Method 2: Reference-independent (Will we be able to identify diversity without reference MT genomes for the targeted species?) Workflow: • Assembly of COI gene using genome assembly program (SOAPdenovo) • Annotation using ~240 MT genomes downloaded from GenBank

  29. PCR-free reference-independent: results

  30. Reference-independent • Number of individuals we collected: 89 individuals • Barcode references: 39 OTUs (84 individuals); 5 individuals failed in Sanger sequencing • Reference-based: 26 OTUs • Reference-independent: 23 OTUs • 3 OTUs not detected in the reference-independent method because: (1) sequencing depth is too low (<10X) to allow for reliable assembly; (2) relatively small body size

  31. PCR-free method

  32. PCR-free method: barcode region

  33. Approach #2: PCR-free method. What's next? Currently: • MT DNA 5-10% after isolation • Non-target DNA affects MT assembly (e.g., bacterial & genomic DNA) • Taxonomic/biomass bias Potential solutions: • Wet-lab protocol optimization • Pre-sorting insects by body size • Alternative MT isolation methods • Increase sequencing depth

  34. Conclusions • Illumina Hi-Seq delivers performance comparable to other NGS platforms in analyzing bulk insect samples, with potential advantages in achieving higher sensitivity at lower cost • Deep sequencing capacity enables a novel PCR-free approach, which may eventually solve biases caused by DNA amplification • It shares issues with other NGS platforms (non-quantitative, inflation of OTUs, etc.) • Methodology optimization is much needed in many details of the pipeline • Collaborative and synergistic efforts by the community would greatly advance progress

  35. Acknowledgements • Funder: • Collaborators: Douglas W. Yu (Kunming Institute of Zoology, Chinese Academy of Sciences); Mehrdad Hajibabaei, Shadi Shokralla (University of Guelph); Owain Edwards (CSIRO Ecosystem Sciences) • LU Jianliang, WU Qiong, AN Sainan, ZHOU Yizhuang, ZHAO Jing

  36. Thanks for your attention!

  37. Environmental barcoding

  38. Recovering biodiversity patterns in ecological studies
