Genome Sequencing and Assembly Progress
İnanç Birol, Shaun Jackman, Tony Raymond
Forest Health Project, 14 November 2011
Sequencing Progress
- Sequenced 8 lanes of 4 paired-end libraries using the Illumina HiSeq for 13x coverage
- Sequenced one lane of a mate-pair library using the Illumina HiSeq for 2x coverage
Assembly Progress
Completed assembly stages (Tony Raymond):
- De Bruijn graph assembly: 429 million sequences using 384 processors on 32 machines
- Find overlapping sequences: 627 million overlaps
- Remove redundant sequences: 128 million sequences
- Identify and remove variant sequences: 3.1 million
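As a toy illustration of the first stage, a de Bruijn graph can be sketched in a few lines. This is a minimal single-process sketch, not the distributed assembler used for the project: the function names `build_de_bruijn` and `extend_contig`, the example reads, and `k=4` are all hypothetical, and reverse complements and sequencing errors are ignored.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some k-mer."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def extend_contig(graph, start):
    """Walk forward from `start` while the path is unambiguous."""
    # Count incoming edges so the walk stops where two paths merge.
    in_degree = defaultdict(int)
    for successors in graph.values():
        for succ in successors:
            in_degree[succ] += 1

    contig, node, seen = start, start, {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if in_degree[nxt] != 1 or nxt in seen:
            break  # branch point or cycle: end the contig here
        contig += nxt[-1]
        seen.add(nxt)
        node = nxt
    return contig


# Hypothetical overlapping reads drawn from one short sequence.
reads = ["ATGGCG", "GGCGTA", "CGTACA"]
graph = build_de_bruijn(reads, k=4)
contig = extend_contig(graph, "ATG")
print(contig)
```

The walk stops at any branch or merge, which is where the later stages on this slide (overlap finding, redundancy and variant removal) would come into play.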
Map the Reads to the Assembly
- Mapped 44% of the reads to contigs larger than 200 bp (1,178 million of 2,659 million reads)
- 13.2 Gbp assembled / 30 Gbp estimated = 44%
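The two ratios on this slide can be checked with a short calculation. The variable names are illustrative; the figures are taken directly from the slide.

```python
# Figures from the slide.
mapped_reads = 1_178_000_000
total_reads = 2_659_000_000
assembled_gbp = 13.2
estimated_genome_gbp = 30.0

# Fraction of reads that map to contigs larger than 200 bp.
mapped_fraction = mapped_reads / total_reads

# Fraction of the estimated genome represented in the assembly.
assembled_fraction = assembled_gbp / estimated_genome_gbp

print(f"Reads mapped:     {mapped_fraction:.0%}")
print(f"Genome assembled: {assembled_fraction:.0%}")
```

Both values round to 44%, so the share of reads that map is about the same as the share of the estimated genome that has been assembled.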
Ongoing Assembly Work
- Map the reads to the initial assembly
  - Mapping to a 13 Gbp assembly is challenging
  - Initially mapping to contigs at least 200 bp
  - Developing methods to index a larger assembly
  - Use a large-memory machine or solid-state drive
- Estimate distances between contigs
- Merge initial contigs to build larger contigs
- Use mate-pair reads to order and orient contigs into scaffolds
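One simple way to picture the distance-estimation step is sketched below. It is an assumption-laden simplification, not the project's method: it assumes each read pair has already been normalized so that one read maps forward on contig A and its mate maps in reverse on contig B, that the library's mean fragment size is known, and that a median over pairs is an adequate estimator. The function `estimate_gap` and all numbers are hypothetical.

```python
from statistics import median


def estimate_gap(pairs, len_a, mean_fragment_size):
    """Estimate the gap between contig A and contig B from spanning read pairs.

    pairs: list of (a_start, b_end) tuples, where a_start is the start of the
           forward read on contig A and b_end is the end of its mate on
           contig B (both in each contig's own forward coordinates).
    """
    estimates = []
    for a_start, b_end in pairs:
        # fragment = (bases to the end of A) + gap + (bases from the start of B)
        bases_on_a = len_a - a_start
        estimates.append(mean_fragment_size - bases_on_a - b_end)
    # The median is less sensitive than the mean to a few mis-mapped pairs.
    return median(estimates)


# Hypothetical pairs spanning a 1,000 bp contig A and its neighbour B.
pairs = [(900, 150), (850, 95), (950, 210)]
gap = estimate_gap(pairs, len_a=1000, mean_fragment_size=400)
print(gap)
```

A negative estimate would suggest that the two contigs overlap, which makes them candidates for merging rather than scaffolding across a gap.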