Taxon diversity analysis for bulk insect samples using Illumina Hi-Seq platform
Xin ZHOU, Shanlin LIU, Yiyuan LI, Qing YANG, and Xu SU
Department of Science and Technology, Environmental Genomics Research Group, BGI, China
Adelaide, Australia, 3 December 2011

Problem
Solutions?
• Opt.1: ......zzzzZZZZZ
• Opt.2: morphological sorting, individual ID … Opt.1
• Opt.3: morphological sorting, individual barcoding … Opt.1
• Opt.4: grinding up, NGS, CLUSTERING/BLAST, DIVERSITY!
Zhou et al. 2011, 4th International Barcode of Life Conference

Environmental barcoding of bulk insects
• Aquatic insects: mini-barcode (130 bp), 454
• Bat diet (insects): COI fragment, 157 bp, 454
• Malaise trap (insects): COI fragment, ~400 bp, 454
Biodiversity soup: metabarcoding of arthropods for rapid biodiversity assessment and biomonitoring, Yu D.W. et al., in review

Major NGS platforms applicable in environmental barcoding
Illumina Hi-Seq
• higher throughput
• lower cost per bp
• increasing read length
• variety of bioinformatics tools available from genomic pipelines

Sequencing capacity at BGI
• 28 Illumina GAIIx
• 137 Illumina Hi-Seq 2000
• 25 Life Tech SOLiD 4
• 16 ABI 3730XL
• 110 MegaBACEs
• 2 Illumina iScan
• 1 Roche 454
• 1 Ion Torrent
• 1 Illumina Mi-Seq
Data production:
• 100 Gb / day (2009)
• >5 Tb / day (end of 2010)
• >1500X human genome / day

What I am NOT going to talk about:
• Primer optimization
• Systematic comparisons of NGS platforms
• Quantitative diversity analysis
What I AM going to talk about:
• Can Illumina NGS be used in diversity analysis?

Can Illumina NGS be used in diversity analysis?
• Sequencing error rate
• Read length

Sequencing error rate
• No indel issue in homopolymers
• Sequencing quality keeps increasing
• Rare nucleotide errors can be easily corrected by:
  • increasing sequencing depth
  • paired-end (PE) sequencing
  • setting stringent matching criteria in the overlapping fragment by allowing only >99% identity
• Recent improvement in sequencing quality using Illumina's v3 chemistry (even at 100 bp, only about 10% of base calls have an error rate >1%)
[Figure: two 150 bp paired-end reads spanning a 250 nt insert; PE sequencing enables forming sequence contigs]

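The overlap check described on this slide can be sketched roughly as follows. This is a minimal illustration, not the pipeline used in the talk: it assumes the insert size is known exactly (250 nt here), so the overlap is simply 150 + 150 - 250 = 50 bp, whereas real read mergers search for the best overlap. The function names `reverse_complement` and `merge_pair` are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq))


def merge_pair(fwd: str, rev: str, insert_size: int, min_identity: float = 0.99):
    """Merge a read pair into one contig, assuming a known insert size.

    Returns None when the reads do not overlap or when identity in the
    overlapping fragment is not above min_identity (the slide's >99% rule).
    """
    rev_rc = reverse_complement(rev)  # bring read 2 onto the forward strand
    overlap = len(fwd) + len(rev_rc) - insert_size
    if overlap <= 0:
        return None
    matches = sum(a == b for a, b in zip(fwd[-overlap:], rev_rc[:overlap]))
    if matches / overlap <= min_identity:
        return None  # too many disagreements between the two reads
    return fwd + rev_rc[overlap:]
```

With a 50 bp overlap, a strict >99% criterion tolerates no mismatch at all (49/50 is 98%), so any disagreement between the two reads discards the pair.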
Read length
• Read length keeps increasing
• Shotgun reads can be further assembled into longer fragments ("shotgun" assembly strategy used in genome sequencing projects)
• 150 PE enables a contig read of 250 bp
• Option of scaffold assembly
[Figure: two 150 bp paired-end reads spanning a 250 nt insert]

Illumina environmental barcoding (e-barcoding)
• PCR based: full-length COI
  • Lib1 (658 bp, 150 PE): full-length COI barcode PE sequencing
  • Lib2 (200 bp, 150 PE): COI amplicons shotgun PE sequencing
• PCR free: mitochondrial shotgun PE sequencing
  • Full-length COI without PCR bias

Approach #1: PCR-based: Sample information
Approach #1: PCR-based: Pre-analysis data filtering
OTU cluster (98%)
OTU filtering workflow
• Unique reads (abundance > 1)
• Compared to reads of Lib 2
• Remove chimeras
• Alignment

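Two of the steps on this slide, dropping singleton reads and clustering at 98% identity, can be illustrated with a short sketch. It uses a simple greedy, abundance-sorted centroid scheme, which is an assumption and not necessarily the program used in the talk. It also assumes the reads are already aligned to equal length, and it leaves out the Lib 2 comparison and chimera removal. The names `identity` and `cluster_otus` are hypothetical.

```python
from collections import Counter


def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length aligned reads."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cluster_otus(reads, threshold=0.98, min_abundance=2):
    """Greedy OTU clustering of dereplicated reads.

    Unique reads seen fewer than min_abundance times are dropped (the slide's
    "abundance > 1"); the rest are visited from most to least abundant and
    joined to the first centroid they match at >= threshold identity.
    Returns a list of (centroid_sequence, total_abundance) tuples.
    """
    counts = Counter(reads)
    otus = []  # each entry is [centroid, abundance]
    for seq, n in counts.most_common():
        if n < min_abundance:
            continue
        for otu in otus:
            if identity(seq, otu[0]) >= threshold:
                otu[1] += n
                break
        else:
            otus.append([seq, n])  # no centroid close enough: new OTU
    return [(centroid, n) for centroid, n in otus]
```

Visiting reads in order of abundance means the most common sequence in each cluster becomes its centroid, on the assumption that rare variants are more likely to be sequencing errors.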
Results: Sanger reference vs. NGS OTUs
Blast at 100% identity
Primers: LepF1/R1; customized primers
Sample   Sanger reference only   Shared   NGS OTUs only
Mock     4                       8        36
XSBN     32                      198      197

Sanger reference vs. NGS OTUs: Mock
• Sanger reference only: 4 (false negatives; not found in raw data, likely due to primer failure)
• Shared: 8
• NGS OTUs only: 36 ("false positives"?)
  • 31 can be found in our total sample, from which our mock samples were assembled
  • 5 likely to be PCR errors

Sanger reference vs. NGS OTUs: XSBN
• Sanger reference only: 32
  • 17 not found in raw data (primer failure)
  • 15 were lost in data filtering
• Shared: 198
• NGS OTUs only: 197 (cross-sample contamination?)
[Plot: mean + SE, group 1 vs. group 2]

Sanger reference vs. NGS OTUs: after removal of sequences with abundance <10
• Significantly fewer false positives: NGS OTUs only drop from 197 to 84
• Slight drop in true positives: shared OTUs drop from 198 to 181 (Sanger reference only rises from 32 to 49)

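The trade-off on this slide can be explored with a small sketch that applies an abundance cutoff and recounts the three Venn regions. The input layout is an assumption: each OTU is given as a pair of its abundance and the reference barcode it matched (or `None`). The function name `compare_to_reference` is hypothetical.

```python
def compare_to_reference(reference_ids, otus, min_abundance=1):
    """Count Venn regions between a Sanger reference set and NGS OTUs.

    otus is a list of (abundance, matched_reference_id_or_None) pairs.
    OTUs with abundance below min_abundance are discarded before counting.
    """
    references = set(reference_ids)
    kept = [(n, ref) for n, ref in otus if n >= min_abundance]
    detected = {ref for _, ref in kept if ref is not None} & references
    return {
        "reference_only": len(references - detected),  # false negatives
        "shared": len(detected),                        # true positives
        "ngs_only": sum(1 for _, ref in kept if ref is None),  # putative false positives
    }
```

Calling it once with `min_abundance=1` and once with `min_abundance=10` shows how many unmatched OTUs a cutoff removes and how many reference taxa are lost along with them.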
Approach #1: PCR-based
What's next? Illumina e-barcoding
• Obtaining full-length barcodes via shotgun read assembly (new program in development: "SOAPbarcode")
• New algorithm to filter out false positive OTUs

Approach #2: PCR-free method
• Individual barcoding
• Total MT isolation & DNA extraction
• Shotgun sequencing
• Reference-based method
• Reference-independent method

Building reference library: individual barcoding
• 89 individuals
• 84 reference barcodes
• 39 OTUs (2%)

Total MT isolation & DNA extraction
Shotgun sequencing
• Insert size: 200 bp
• Read length: 100 bp PE

Pre-analysis
• Data filtering: adaptor contamination removal
• Quality control: each read may contain fewer than 10 bp with a sequencing error rate >1%

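The quality-control rule can be written as a short check. An error rate above 1% corresponds to a Phred score below 20, since Q = -10 log10(p). The sketch below assumes Phred+33 quality strings; older Illumina pipelines used an offset of 64, so the `offset` parameter should be set to match the data. The function name `passes_quality` is hypothetical.

```python
def passes_quality(quality_string: str, max_low_quality_bases: int = 10,
                   min_phred: int = 20, offset: int = 33) -> bool:
    """Keep a read only if fewer than max_low_quality_bases bases fall below min_phred.

    min_phred=20 corresponds to a per-base error rate of 1%; offset=33 is an
    assumed FASTQ encoding and must be changed for Phred+64 data.
    """
    low_quality = sum(1 for ch in quality_string if ord(ch) - offset < min_phred)
    return low_quality < max_low_quality_bases
```

For example, a 100 bp read with nine bases below Q20 is kept, while one with ten such bases is discarded.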
Approach #2: PCR-free method
Method 1: Reference based
BLAST reads against reference barcodes; a confident identification is made only when:
• Best BLAST hit >98% identity
• Reference coverage >90%
[Figure: Reference 1, coverage 100%: correct mapping; Reference 2, coverage 30%: incorrect mapping]

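The two criteria on this slide can be combined in a short sketch. The input layout is an assumption: each tuple holds one read's best BLAST hit as (reference ID, percent identity, start on reference, end on reference), with 1-based inclusive coordinates. The names `covered_fraction` and `detect_taxa` are hypothetical.

```python
def covered_fraction(intervals, ref_length: int) -> float:
    """Fraction of a reference covered by the union of 1-based inclusive intervals."""
    covered = 0
    current_end = 0  # rightmost reference position counted so far
    for start, end in sorted(intervals):
        start = max(start, current_end + 1)  # skip the part already counted
        if end >= start:
            covered += end - start + 1
            current_end = end
    return covered / ref_length


def detect_taxa(best_hits, ref_lengths, min_identity=98.0, min_coverage=0.90):
    """Return reference IDs supported by hits above both thresholds.

    best_hits: iterable of (ref_id, pct_identity, ref_start, ref_end).
    ref_lengths: dict mapping ref_id to the reference barcode length.
    """
    by_ref = {}
    for ref_id, pct_identity, start, end in best_hits:
        if pct_identity > min_identity:
            by_ref.setdefault(ref_id, []).append((min(start, end), max(start, end)))
    return {
        ref_id for ref_id, intervals in by_ref.items()
        if covered_fraction(intervals, ref_lengths[ref_id]) > min_coverage
    }
```

Requiring coverage across most of the reference, and not just one high-identity hit, is what separates the correct mapping (100% coverage) from the incorrect one (30% coverage) in the slide's figure.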
Potential sources of failure in detecting taxa: taxon specific, or biomass (size & number)?
Failures in taxon detection: taxon bias?
Failures in taxon detection: or biomass (body size, number of individuals)?
• Readily detected: average length >5 mm
• Missing: average length <5 mm

Approach #2: PCR-free method
Method 2: Reference independent
(Will we be able to identify diversity without reference MT genomes for the targeted species?)
Workflow:
• Assembly of the COI gene using a genome assembly program (SOAPdenovo)
• Annotation using ~240 MT genomes downloaded from GenBank

PCR-free reference-independent: results
Reference independent
• Number of individuals we collected: 89 individuals
  • 5 individuals failed in Sanger sequencing
• Barcode references: 39 OTUs (84 individuals)
• Reference based: 26 OTUs
• Reference independent: 23 OTUs
3 OTUs not detected in the reference-independent method because:
(1) sequencing depth is too low (<10X) to allow for reliable assembly
(2) relatively small body size

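The depth criterion can be made concrete with a back-of-the-envelope sketch. It assumes mean depth is estimated as total read bases divided by target length; the 100 bp read length and 658 bp barcode length come from earlier slides, while the read counts in the usage note are purely illustrative. The function names are hypothetical.

```python
def mean_depth(read_count: int, read_length: int, target_length: int) -> float:
    """Approximate mean sequencing depth over a target region."""
    return read_count * read_length / target_length


def deep_enough(read_count: int, read_length: int, target_length: int,
                min_depth: float = 10.0) -> bool:
    """True when the estimated mean depth reaches the 10X level named on the slide."""
    return mean_depth(read_count, read_length, target_length) >= min_depth
```

Under these assumptions, 50 reads of 100 bp over a 658 bp barcode give about 7.6X, below the threshold, while 100 reads give about 15.2X.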
PCR-free method
PCR-free method: barcode region
Approach #2: PCR-free method
What's next?
Currently:
• MT DNA 5-10% after isolation
• Non-target DNA affects MT assembly (e.g., bacterial & genomic DNA)
• Taxonomic/biomass bias
Potential solutions:
• Wet-lab protocol optimization
• Pre-sorting insects by body size
• Alternative MT isolation methods
• Increase sequencing depth

Conclusions
• Illumina Hi-Seq delivers performance comparable to other NGS platforms in analyzing bulk insect samples, with potential advantages in achieving higher sensitivity at lower cost;
• Deep sequencing capacity enables a novel PCR-free approach, which may eventually solve biases caused by DNA amplification;
• It shares issues with other NGS platforms (non-quantitative, inflation of OTUs, etc.);
• Methodology optimization is much needed in many details of the pipeline;
• Collaborative and synergistic efforts by the community would greatly advance progress.

Acknowledgements
Funder:
Collaborators:
• Douglas W. Yu, Kunming Institute of Zoology, Chinese Academy of Sciences
• Mehrdad Hajibabaei, Shadi Shokralla, University of Guelph
• Owain Edwards, CSIRO Ecosystem Sciences
LU Jianliang, WU Qiong, AN Sainan, ZHOU Yizhuang, ZHAO Jing

Thanks for your attention!
Recovering biodiversity patterns in ecological studies