A Hybrid Assembly System in Zebrafish Pooled Clones
Zemin Ning
The Wellcome Trust Sanger Institute
[Diagram: read pair and assembly pipeline]
Read-pair labels: Insert ~300 bp; reads 30-75 bp at each end; extended long reads of 1-2 Kb
Pipeline labels: Solexa Reads; FuzzyPath; Solexa assembly; WGS Reads 5X; Fishing WGS Reads; Combined Reads; Phusion or Phrap; Phusion; Genome/Chromosome Assembly
Kmer Extension & Repeat Junctions
[Figure: pileup of other reads (454, Sanger, etc.) at a repeat junction; labels: A1, A2, Consensus]
Means to handle repeats:
- Base quality
- Read pair
- Fuzzy kmers
- Closely related reference
- 454 or Sanger reads
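The idea of extending a k-mer until it hits a repeat junction can be illustrated with a minimal sketch. This is not FuzzyPath's actual algorithm, which the slides do not describe in detail; the function names `build_successors` and `extend_right` are hypothetical, the reads are assumed error-free, and none of the listed repeat-handling measures (base quality, read pairs, fuzzy kmers) is modeled. The sketch only shows where a naive extension has to stop.

```python
from collections import defaultdict


def build_successors(reads, k):
    """Map each k-mer to the set of bases seen immediately after it in any read."""
    succ = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k):
            succ[read[i:i + k]].add(read[i + k])
    return succ


def extend_right(seed, succ, k, max_len=10_000):
    """Extend `seed` one base at a time while the last k-mer has a unique successor.

    Returns (contig, reason), where reason is one of:
      "junction"  - the last k-mer has several possible next bases (repeat boundary)
      "dead_end"  - no read continues past the last k-mer
      "cycle"     - the extension revisited a k-mer it had already used
      "max_len"   - the length limit was reached
    """
    contig = seed
    seen = {seed[-k:]}
    while len(contig) < max_len:
        candidates = succ.get(contig[-k:], set())
        if len(candidates) > 1:
            return contig, "junction"
        if not candidates:
            return contig, "dead_end"
        contig += next(iter(candidates))
        kmer = contig[-k:]
        if kmer in seen:
            return contig, "cycle"
        seen.add(kmer)
    return contig, "max_len"


# Two toy reads share the prefix ACGTTGCA and then diverge, as at a repeat junction.
reads = ["ACGTTGCAGGA", "ACGTTGCATTC"]
succ = build_successors(reads, k=4)
contig, reason = extend_right("ACGT", succ, k=4)
print(contig, reason)
```

At the junction the extension has two candidate bases and stops; resolving which branch to follow is where the extra evidence listed on the slide would come in.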
Boundary of Solexa Contigs
[Figure labels: Solexa contigs; WGS DH reads and contigs]
Zfish and “Pig” Clone Assemblies
Solexa reads:
  Number of reads: 4.3 million
  Estimated size of covered region: 1.72 Mbp
  Read length: 2x36 bp
  Estimated read coverage: ~180X
  Insert size: 260/50-400 bp
Zfish DH reads: 12,539
Assembly features - contig stats:
               Solexa      Hybrid_Ctg   Hybrid_Super
  N contigs:   496         152          95
  Bases:       1.25 Mbp    1.68 Mbp     1.69 Mbp
  N50 size:    4,975       25,817       74,598
  Largest:     23,906      79,730       144,808
  Averaged:    2,513       11,072       17,815
  Coverage:    ~72.6%      ~73%         ~73%
  Errors:      ?           ?            ?
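The estimated read coverage on this slide can be reproduced with simple arithmetic, shown here as a short sketch. It assumes that "Number of reads: 4.3 million" counts read pairs, so each entry contributes two 36 bp reads; that interpretation is an assumption made because it matches the ~180X on the slide, and the function name `read_coverage` is hypothetical.

```python
def read_coverage(n_pairs, read_len_bp, region_bp):
    """Average depth: total sequenced bases divided by the size of the covered region."""
    total_bases = n_pairs * 2 * read_len_bp  # two reads per pair (assumed)
    return total_bases / region_bp


# Figures from the slide: 4.3 million pairs of 2x36 bp reads over an estimated 1.72 Mbp.
depth = read_coverage(4_300_000, 36, 1_720_000)
print(f"~{depth:.0f}X")
```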
Second Set with 50 Zfish Clones
Solexa reads:
  Number of reads: 17.5 million
  Estimated size of covered region: ~9.0 Mbp
  Read length: 2x54 bp
  Estimated read coverage: ~190X
  Insert size: 260/50-400 bp
Zfish DH capillary reads: 112,583
Assembly features - contig stats:
               Solexa      Hybrid_Ctg   Hybrid_Super
  N contigs:   3,143       688          359
  Bases:       4.01 Mbp    8.39 Mbp     8.43 Mbp
  N50 size:    3,189       24,448       70,703
  Largest:     23,018      108,090      274,224
  Averaged:    1,275       12,194       23,493
  Coverage:    ~50%        ~93%         ~94%
  Errors:      ?           ?            ?
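The N50 and averaged sizes in these tables are standard contig statistics. The sketch below uses the conventional N50 definition (the length of the contig at which the running total of lengths, sorted longest first, reaches half of all assembled bases). The slides do not say which tool produced the figures, so this is an illustration of the definition, not of the author's script; the function names and the toy contig lengths are hypothetical.

```python
def n50(contig_lengths):
    """Length L such that contigs of length >= L hold at least half of all bases."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer-safe test for "at least half"
            return length
    raise ValueError("contig_lengths must not be empty")


def mean_length(contig_lengths):
    """Average contig size, i.e. total bases divided by number of contigs."""
    return sum(contig_lengths) / len(contig_lengths)


# Toy example: 32 bases in 8 contigs; the two 8-base contigs already hold half.
lengths = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(lengths), mean_length(lengths))
```

Because N50 weights contigs by the bases they contain, joining contigs into supercontigs raises it sharply even when the total number of bases barely changes, which is the pattern in the Hybrid_Ctg and Hybrid_Super columns.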
[Two figure slides, each with panels labeled maq and ssaha2]
[Figure labels: Contig of Zv8; Contig of hybrid assembly]
Acknowledgements:
• Yong Gu
• James Bonfield
• Hannes Ponstingl
• Helen Beasley
• Siobhan Whitehead
• Michael Quail
• Tony Cox