Quality of assemblies: mouse
Terminology: N50 contig length. If we sort contigs from largest to smallest and start covering the genome in that order, N50 is the length of the contig that just covers the 50th percentile.
7.7X sequence coverage
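The N50 definition above can be sketched in a few lines of Python. This is a minimal illustration, not the method used for these assemblies: the function name `n50` and the optional `genome_size` parameter are assumptions, and when no genome size is given the total assembled length stands in for it.

```python
def n50(contig_lengths, genome_size=None):
    """Length of the contig that just covers 50% of the genome.

    Assumption: if genome_size is not given, the sum of all contig
    lengths is used as a stand-in for the genome size.
    """
    lengths = sorted(contig_lengths, reverse=True)  # largest first
    if genome_size is None:
        genome_size = sum(lengths)
    half = genome_size / 2
    covered = 0
    for length in lengths:
        covered += length
        if covered >= half:  # this contig crosses the 50% mark
            return length
    raise ValueError("contigs do not cover half of the genome")


# Toy contig lengths (hypothetical, not taken from any real assembly).
print(n50([80, 70, 50, 40, 30, 20, 10]))  # 70
```

In the toy example the contigs total 300, so the 50% mark is 150; the first two contigs (80 + 70) reach it, and N50 is 70.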
Quality of assemblies: dog
7.5X sequence coverage
Quality of assemblies: chimp
3.6X sequence coverage
Assisted assembly
History of WGA
• 1982: λ-virus, 48,502 bp
• 1995: H. influenzae, 1 Mbp
• 2000: fly, 100 Mbp
• 2001 – present: human (3 Gbp), mouse (2.5 Gbp), rat*, chicken, dog, chimpanzee, several fungal genomes
1997:
• Gene Myers: "Let's sequence the human genome with the shotgun strategy"
• Phil Green: "That is impossible, and a bad idea anyway"
• $985: deCODEme (November 2007)
• $399: Personal Genome Service (November 2007)
• $2,500: Health Compass service (April 2008)
• $350,000: Whole-genome sequencing (November 2007)
• Genetic Information Nondiscrimination Act (May 2008)
Applications
• Whole-genome sequencing
• Comparative genomics
• Genome resequencing
• Structural variation analysis
• Polymorphism discovery
• Metagenomics
• Environmental sequencing
• Gene expression profiling
• Genotyping
• Population genetics
• Migration studies
• Ancestry inference
• Relationship inference
• Genetic screening
• Drug targeting
• Forensics
Sequencing applications
[Cycle diagram: new sequencing applications, increase in sequencing data output, demand for more sequencing, sequencing technology improvement]
Sequencing technology: Sanger sequencing (Fred Sanger)
[Plot: cost per finished bp, with axis ticks $10.00, $1.00, $0.10, $0.01 over the years 1975, 1980, 1990, 2000, 2008]
• Read length: 15 – 200 bp, then 500 – 1,000 bp
• Throughput: "grad-student years", then 2 × 10^6 bp/day
Sequencing technology: Sanger sequencing
• Genome: 3 × 10^9 bp at 1x coverage
• Time: (10x coverage × 3 × 10^9 bp) / (2 × 10^6 bp/day) = 15,000 days ≈ 40 years
• Cost: 10x coverage × 3 × 10^9 bp × $0.001/bp = $30 million
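The back-of-the-envelope numbers on this slide can be reproduced with a short script. It is only a sketch: the function name `sequencing_estimate` is made up for illustration, and it assumes a single instrument running every day at the quoted throughput, which is what the slide's division implies.

```python
GENOME_BP = 3_000_000_000   # 3 x 10^9 bp, from the slide
COVERAGE = 10               # 10x coverage, from the slide
BP_PER_DAY = 2_000_000      # 2 x 10^6 bp/day, from the slide
USD_PER_BP = 0.001          # $0.001 per bp, from the slide


def sequencing_estimate(genome_bp, coverage, bp_per_day, usd_per_bp):
    """Return (total bases, days, years, cost in USD) for one genome.

    Assumption: one machine running continuously at bp_per_day.
    """
    total_bp = genome_bp * coverage
    days = total_bp / bp_per_day
    years = days / 365
    cost = total_bp * usd_per_bp
    return total_bp, days, years, cost


total_bp, days, years, cost = sequencing_estimate(
    GENOME_BP, COVERAGE, BP_PER_DAY, USD_PER_BP
)
print(f"{total_bp:.1e} bp, {days:,.0f} days, {years:.1f} years, ${cost:,.0f}")
```

The exact result is 15,000 days, or about 41 years, which the slide rounds to 40; the cost comes out at about $30 million.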
Pyrosequencing on a chip
• Mostafa Ronaghi, Stanford Genome Technology Center
• 454 Life Sciences
Sequencing technology: Next-generation sequencing
Genome Sequencer / FLX ("short reads")
• Read length: 250 bp
• Throughput: 300 Mb/day
• Cost: ~10,000 bp/$
• De novo: yes
Sequencing technology: Next-generation sequencing
Genome Analyzer, SOLiD Analyzer ("microreads")
• Read length: ~35 bp
• Throughput: 300 – 500 Mb/day
• Cost: ~100,000 bp/$
• De novo: yes
Sequencing technology: Next-generation sequencing
Genome Analyzer, SOLiD Analyzer (reads)
• Read length: ~50 – 150 bp
• Throughput: 3 Gb/day
• Cost: ~3,000,000 bp/$
• De novo: yes
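To put the bp/$ figures from the three next-generation slides on the same footing as the Sanger estimate, the sketch below converts each into a raw sequencing cost for one genome. The 3 × 10^9 bp genome and 10x coverage are carried over from the Sanger slide as an assumption; the function name `genome_cost_usd` and the dictionary labels are made up for illustration, and real projects with short reads would typically need different coverage.

```python
GENOME_BP = 3_000_000_000  # assumed: 3 x 10^9 bp, as on the Sanger slide
COVERAGE = 10              # assumed: 10x, as on the Sanger slide

# Approximate bp per dollar, as quoted on the slides above.
BP_PER_USD = {
    "Genome Sequencer / FLX (short reads)": 10_000,
    "Genome Analyzer / SOLiD (microreads)": 100_000,
    "Genome Analyzer / SOLiD (~50-150 bp reads)": 3_000_000,
}


def genome_cost_usd(bp_per_usd, genome_bp=GENOME_BP, coverage=COVERAGE):
    """Raw sequencing cost in USD: total bases divided by bases per dollar."""
    return genome_bp * coverage / bp_per_usd


for platform, rate in BP_PER_USD.items():
    print(f"{platform}: ${genome_cost_usd(rate):,.0f}")
```

Under these assumptions the three rates correspond to roughly $3 million, $300,000, and $10,000 per genome.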
Complete Genomics
• $5,000 this summer
• Quality?...
• 1,000 genomes in 2009
• 20,000 genomes in 2010
So, how fast is cost going down?
• 2006: $10 million
• 2008: $100,000
• 2009: $10,000
• ?: $1,000
• ???: $100
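One way to answer the question on this slide is to compute the average fold decrease per year between the dated price points. The sketch below uses only the three dated figures from the slide; the function name `annual_fold_decrease` is made up for illustration, and the undated "?" entries are left out rather than guessed.

```python
# Dated price points from the slide (USD per genome).
COST_BY_YEAR = {2006: 10_000_000, 2008: 100_000, 2009: 10_000}


def annual_fold_decrease(cost_by_year):
    """For each pair of consecutive data points, return
    (start year, end year, average fold decrease per year)."""
    years = sorted(cost_by_year)
    result = []
    for start, end in zip(years, years[1:]):
        total_fold = cost_by_year[start] / cost_by_year[end]
        # Geometric average over the number of years elapsed.
        per_year = total_fold ** (1 / (end - start))
        result.append((start, end, per_year))
    return result


for start, end, fold in annual_fold_decrease(COST_BY_YEAR):
    print(f"{start} -> {end}: about {fold:.0f}x cheaper per year")
```

Both intervals work out to roughly a tenfold drop per year: 100x over the two years from 2006 to 2008, and 10x from 2008 to 2009.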
Sequencing technology: Next-generation sequencing
Infinium Assay, GeneChip Array ("SNP chips", genotypes)
• Read length: 1 bp
• Throughput: 1 – 2 Mb/day
• Cost: 5,000 bp/$
• De novo: no
Nanopore Sequencing http://www.mcb.harvard.edu/branton/index.htm
Sequencing technology Next-generation sequencing
Evolution at the DNA level
SEQUENCE EDITS: deletion, mutation
…ACGGTGCAGTTACCA…
…AC----CAGTCCACCA…
REARRANGEMENTS: inversion, translocation, duplication
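The sequence edits on this slide can be read off an alignment column by column. The sketch below counts matches, mutations (substitutions), and deleted bases between two aligned strings. The function name `classify_columns` is made up for illustration. The example pair is hypothetical: it is modeled on the slide's sequences, but the second string is one character shorter than the transcribed version so that both rows have the same length.

```python
from collections import Counter


def classify_columns(original, edited):
    """Count matches, mutations, and deletions in a gapped alignment.

    Both strings must have equal length; '-' in `edited` marks a base
    that was deleted relative to `original`.
    """
    if len(original) != len(edited):
        raise ValueError("aligned sequences must have equal length")
    counts = Counter()
    for a, b in zip(original, edited):
        if b == "-":
            counts["deletion"] += 1
        elif a == b:
            counts["match"] += 1
        else:
            counts["mutation"] += 1
    return counts


# Hypothetical aligned pair, adapted from the slide (see note above).
original = "ACGGTGCAGTTACCA"
edited = "AC----CAGTCACCA"
print(dict(classify_columns(original, edited)))
```

For this pair the counts are 10 matches, 4 deleted bases, and 1 mutation (T to C). Rearrangements such as inversions, translocations, and duplications are not captured by a column-by-column comparison like this one.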
Evolutionary Rates
[Figure: changes passed on to the next generation, labeled OK, OK, OK, X, X, "Still OK?"]