Today

Presentation Transcript


  1. Today • Please read… Science 291: 1304-1315

  2. Human Genome Project Dissenters: My Brush with Greatness? • 1992: Two years into the HGP, two of the project's biggest critics were… • Sydney Brenner: believed that the HGP should focus on human EST collections and sequence the genome of a simple vertebrate (Fugu). • Craig Venter: believed that the clone-by-clone approach was not the most efficient way to proceed, and suggested that shotgun approaches, even a whole-genome approach, were feasible. …they were both right.

  3. Sydney Brenner • 2002 Nobel Prize (Physiology or Medicine): Sydney Brenner and John E. Sulston, Britain; H. Robert Horvitz, United States • for discoveries concerning how genes regulate organ development and a process of programmed cell death.

  4. Brenner was right… Expressed Sequence Tags (ESTs): end-sequenced cDNAs (complementary DNA) • cDNA: synthetic DNA transcribed from an mRNA template, • through the action of an RNA-dependent DNA polymerase called reverse transcriptase. Online Primer: est.html

  5. Still Sequencing cDNAs, • first and easiest look into any genome, • useful in understanding genomic sequence (gene finding), • helps determine splice site variants, • shorter than genomic clones, fits in plasmids, • etc.

  6. Used for microarrays… …an array of DNA that can be hybridized with probes to study patterns of gene expression. …tissue-specific ESTs are very useful.

  7. Venter was right… J. Craig Venter: Whole Genome Assembly • 1995: 1.8 Mbp Haemophilus influenzae genome sequenced, • 1996 on: Mycoplasma, E. coli and others*, • 1999: Chromosome 2 of Arabidopsis, • 2000: Drosophila (120 Mbp) genome, …Human, Mosquito, etc… • Lots of genomes, several applications... *WGA of bacterial, viral populations...

  8. 1 year, 120 megabases, • Assembly algorithms could generate accurate genomic sequences, • Interim assemblies (or mapping) were not necessary. Science, Vol. 287, 24 March 2000

  9. Big Biology

  10. Think About This… …the plasmid library construction is the first critical step in WGA sequencing, • “if the DNA libraries are not uniform in size, non-chimeric, and do not randomly represent the genome, then the subsequent steps cannot accurately reconstruct the genome sequence.” • “We used automated high-throughput DNA sequencing and the computational infrastructure to enable efficient tracking of enormous amounts of sequence information (27.3 million sequence reads; 14.9 billion bp of sequence).”

  11. Whose DNA? • 21 enrolled donors, • age, sex, ethnographic group, • one African-American, • one Asian-Chinese, • one Hispanic-Mexican, • two Caucasians*.

  12. Whose, Mostly? J. Craig Venter

  13. …back to humans… What to know? Individuals, Libraries, Sequence coverage, Clone coverage, Other? • 543 bp average sequence read • 8 September 1999 - 25 June 2000
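As a rough check on these numbers, the average read length and the fold sequence coverage can be worked out from the totals quoted on slide 10 (27.3 million reads, 14.9 billion bp). This is only an illustrative sketch: the function names are made up here, and the genome size is left as a parameter because the slides do not state one.

```python
# Totals quoted on slide 10 of this deck.
TOTAL_READS = 27.3e6   # sequence reads
TOTAL_BP = 14.9e9      # base pairs of sequence


def mean_read_length(total_bp, total_reads):
    """Average number of bases per read."""
    return total_bp / total_reads


def fold_coverage(total_bp, genome_size_bp):
    """Sequence coverage: total sequenced bases divided by genome size.

    genome_size_bp is supplied by the caller; the slides do not give a value.
    """
    return total_bp / genome_size_bp


# About 546 bp, close to the 543 bp average read quoted on the slide.
print(round(mean_read_length(TOTAL_BP, TOTAL_READS)))
```

Clone coverage would be computed the same way, using the insert sizes of the mate-paired clones in place of the read lengths.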

  14. Online Primer: snps.html WGA Outline

  15. DNA in sized libraries… DNA sequence in mate-pairs… [Cartoons: a vector carrying an insert, with sequencing primers at both ends; the two sequenced ends (~543 bp each) flank an unsequenced stretch of the insert, whose total size is approximately known. Example end reads: 5’- actgtacgtgtagctgaca… - 3’ and 5’- tagcgtagttattttgc… - 3’.]

  16. …back to humans… What to know? Individuals, Libraries, Sequence coverage, Clone coverage, Other? • 543 bp average sequence read • 8 September 1999 - 25 June 2000

  17. What does Shredder Do? Why? Whole Genome Assembly 1. Screener 2. Overlapper 3. Unitigger/Discriminator, 4. Scaffolder, 5. Repeat Resolver.

  18. Screener ...finds and “masks” microsatellite repeats, known repeated regions and ribosomal DNA, • “masked” regions not used to make contigs, • “marks” the rest for overlapping. [Figure: one example read shown three times, labeled read, masked and marked; the highlighting that distinguished the three is lost in this transcript. The read: atgacttacttactgcatatttatttatttatttatttatttatttatttatttatttatttatttatttatttatttgacgtgtacgtgtacgtgtagctgtacgtgtacgtgacgggccgcattatcgtgatgctacgtgtacgttatatctgatcgtgcatgtga]
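The masking step can be illustrated with a short sketch. This is not the Screener itself: it only handles microsatellite-style tandem repeats (not known repeat families or ribosomal DNA), and the function name, the 1-6 bp unit size, the four-copy threshold and the use of `n` as the mask character are all assumptions chosen for the example.

```python
import re

# Example read from the slide (lowercase a/c/g/t).
READ = (
    "atgacttacttactgcatatttatttatttatttatttatttatttatttatttatttatttatttattt"
    "atttatttgacgtgtacgtgtacgtgtagctgtacgtgtacgtgacgggccgcattatcgtgatgctacg"
    "tgtacgttatatctgatcgtgcatgtga"
)


def mask_microsatellites(read, max_unit=6, min_copies=4, mask_char="n"):
    """Replace tandem repeats of a short unit with mask_char.

    A repeat is a unit of 1..max_unit bases occurring at least min_copies
    times in a row. Thresholds are illustrative, not the Screener's.
    """
    pattern = re.compile(r"([acgt]{1,%d}?)\1{%d,}" % (max_unit, min_copies - 1))
    return pattern.sub(lambda m: mask_char * len(m.group(0)), read)


masked = mask_microsatellites(READ)
print(masked)  # the long tatt... run is replaced by n's; the rest is kept
```

Because masking replaces bases one for one, the masked read keeps its length and coordinates, so the unmasked ("marked") part can still be passed on for overlapping.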

  19. Overlapper ...looks for end-to-end overlaps of at least 40 bp with no more than 6% differences in match. Example (> 40 bp, < 6% mismatch): <--tactgtacgtagctgtgatgttcctcggatatagcgggcatatttattacgctattgtacgtgt-3’ overlapping 5’- gttcctcggatatagcgggcatatttattacgctattgtacgtgtaaagtatcgt--> What’s the significance? ...a one in 10^17 event, …given perfect randomness.
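The overlap rule can be sketched as a brute-force suffix/prefix comparison. This is a minimal illustration using the two thresholds from the slide (at least 40 bp, at most 6% mismatches); the function name is hypothetical, only substitutions are counted (no insertions or deletions), and a real overlapper would use indexing rather than trying every length.

```python
def end_overlap(a, b, min_len=40, max_mismatch=0.06):
    """Length of the longest suffix of `a` that matches a prefix of `b`
    with at most max_mismatch fraction of mismatched bases; 0 if none
    reaches min_len. Substitutions only, no indels (a simplification)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        mismatches = sum(x != y for x, y in zip(a[-n:], b[:n]))
        if mismatches <= max_mismatch * n:
            return n
    return 0


# The two reads from the slide share this 45 bp stretch.
SHARED = "gttcctcggatatagcgggcatatttattacgctattgtacgtgt"
read_1 = "tactgtacgtagctgtgat" + SHARED
read_2 = SHARED + "aaagtatcgt"

print(end_overlap(read_1, read_2))
```

For the slide's example the shared end is 45 bp with no mismatches, so it clears both thresholds.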

  20. Good News ... uniquely assembled contigs (unitigs) are readily identifiable, • all of the assembled sequences match over all of the known sequence, - and - ...are consistent with an 8x sequence coverage.

  21. What does Shredder Do? Why? Whole Genome Assembly 1. Screener 2. Overlapper 3. Unitigger/Discriminator, 4. Scaffolder, 5. Repeat Resolver.

  22. Unitigs ...contig cluster is consistent with expected size (+8), ...no dissimilar sequences between any members. But(t): • ...the Screener doesn’t include all of the “low frequency” level repeats, • ...so, a majority of the Overlapper outputs turned out to be bogus.

  23. What Now? • “over-collapsed” assemblies are identified and broken down into unitigs when possible... • …these “too-large” contig sets are sent to the Unitigger/Discriminator.

  24. Unitigger ...differentiates between a true overlap and an overlap that includes more than one locus. ...in a world where real data matches expected data, each locus would have 8X coverage, ...if there are genomic repeats, then sequences would be “over-represented”: on average, 8 more per repeat, per contig. ...over-collapsed.
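The coverage argument can be turned into a simple depth check. This is a deliberately crude sketch, not the statistic the assembler actually used: the function name, the 543 bp read length default, and the 1.5x cutoff are assumptions made for illustration.

```python
def classify_contig(n_reads, contig_len, read_len=543,
                    expected_cov=8.0, ratio=1.5):
    """Flag a contig as over-collapsed when its read depth is well above
    the expected coverage. The 1.5x cutoff is an arbitrary choice."""
    depth = n_reads * read_len / contig_len
    label = "over-collapsed" if depth > ratio * expected_cov else "unitig-like"
    return label, depth


# A 10 kb contig built from 150 reads sits near 8X; one built from
# 300 reads sits near 16X, as expected if two copies of a repeat
# were stacked on top of each other.
print(classify_contig(150, 10_000))
print(classify_contig(300, 10_000))
```

A two-copy repeat collapsed into one contig would show roughly twice the expected depth, which is the "8 more per repeat" on the slide.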

  25. Discriminator ...parses the “over-collapsed” contig by using sequence outside of the overlap region.

  26. Discriminator ...may yield u-unitigs. Unitigger/Discriminator Output: correctly assembled contigs covering 73.6% of the genome.

  27. Scaffolder ...contigs the contigs, • uses mate-pair information, • two or more consistent mate-pair matches yield 1 in 10^10 odds of being chance.
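The "two or more mate pairs" rule can be sketched as a link count between contigs. This is a minimal illustration with hypothetical names and toy data; it checks only that both mates land on different contigs, whereas "consistent" on the slide would also involve read orientation and the known insert size.

```python
from collections import Counter


def scaffold_links(mate_pairs, placement, min_links=2):
    """Count mate pairs whose two reads sit on different contigs and keep
    contig pairs supported by at least min_links mate pairs.

    mate_pairs: iterable of (read_id, read_id) tuples
    placement:  dict mapping read_id -> contig_id
    Orientation and insert-size consistency are NOT checked here.
    """
    counts = Counter()
    for left, right in mate_pairs:
        c1, c2 = placement.get(left), placement.get(right)
        if c1 is None or c2 is None or c1 == c2:
            continue  # unplaced read, or both mates on the same contig
        counts[tuple(sorted((c1, c2)))] += 1
    return {pair: n for pair, n in counts.items() if n >= min_links}


# Toy data: two mate pairs join contigs A and B, only one joins A and C.
pairs = [("r1", "r2"), ("r3", "r4"), ("r5", "r6")]
where = {"r1": "A", "r2": "B", "r3": "B", "r4": "A", "r5": "A", "r6": "C"}
print(scaffold_links(pairs, where))
```

Only the A-B join survives, since a single mate pair is not treated as enough evidence.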

  28. Repeat Resolver ...most of the remaining gaps were due to repeats. • “Rocks” • Use “low Discriminator Value” contig sets to fill gaps, • - find two or more mate pairs with unambiguous matches in the scaffold near the gap (2 kb, 10 kb or 50 kb), (1 in 10^7), • “Stones” • - find mate pair matches 2 kb, 10 kb, and 50 kb from gap, place the mate in the gap, check to see if it’s consistent with other “placed” sequences, • confirm matches.
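The "Stones" placement step can be pictured with a little coordinate arithmetic. This sketch assumes a simplified geometry that is not taken from the slides: the anchored read starts `anchor_dist` bp before the gap and points toward it, the insert size is measured between the outer ends of the two mates, and every read is 543 bp long. The function name is hypothetical.

```python
def place_mate_in_gap(anchor_dist, insert_size, gap_len, read_len=543):
    """Return (start, end) of the unplaced mate, measured from the gap
    start, if the whole mate would fall inside the gap; otherwise None.

    anchor_dist: distance from the anchored read's start to the gap start
    insert_size: library insert size (e.g. 2_000, 10_000 or 50_000)
    """
    mate_end = insert_size - anchor_dist   # far end of the insert
    mate_start = mate_end - read_len
    if mate_start >= 0 and mate_end <= gap_len:
        return (mate_start, mate_end)
    return None


# A 2 kb insert anchored 1 kb before a 1.5 kb gap puts the mate inside it.
print(place_mate_in_gap(1_000, 2_000, 1_500))
# The same anchor from a 10 kb library would overshoot the gap.
print(place_mate_in_gap(1_000, 10_000, 1_500))
```

The consistency check against other placed sequences is not covered; it would compare the bases of reads whose placed intervals overlap.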

  30. If that Doesn’t Work ...find a mate-pair that spans the gap, and sequence it, ...make sequencing primer from BES... Chromosome Walking

  31. Today/Friday • Questions about WGA, • CSA, • Comparisons, • Quality Control, etc.
