
ASHG Redux 2008


sancha

Presentation Transcript


  1. Session -- Using DNA sequence to detect variation related to disease • Richard Wilson – WashU – deep sequencing of cancer tumors (AML) identified variations in 8 genes • Richard Gibbs – Baylor College of Medicine – "Complete Genomics" – genome for < $5,000 • Accurate sequencing by hybridization for DNA diagnostics and individual genomics, Drmanac, et al., Nature Biotechnology ASHG Redux 2008

  2. Session -- Using DNA sequence to detect variation related to disease • Michael Stratton – Wellcome Trust Sanger Institute – genomic sequencing of breast cancer cell lines • Copy number variations ("structural variants") • "genomic shards" – 305 rearrangements in a breast cancer cell line • Difficult to assemble with short-read technology

  3. Session – Genomics I • Sharp – whole-genome screen for novel imprinted genes • Bisulphite treatment – converts all unmethylated C's to U (uracil); after sequencing, the C's that remain mark the methylated sites • Drawback – harsh, fragments DNA • High-density HapMap of humans, dogs, and cattle • Genotyped 900 dogs with Affy 2.0 array at 61,344 SNPs • Dogs have a very uniform phylogenetic tree with breed-specific recombination rates
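As a rough illustration of the bisulphite read-out on slide 3, the sketch below converts unmethylated C's (read as T after amplification and sequencing) and then recovers the methylated positions by comparing the treated read with the reference. The function names and the toy sequence are hypothetical; real data would also need read alignment and handling of the opposite strand.

```python
def bisulfite_convert(seq, methylated_positions):
    """Return the sequence as read after bisulphite treatment.

    Unmethylated C -> U, which is read as T after amplification;
    methylated C is protected and stays C.
    """
    methylated = set(methylated_positions)
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq)
    )


def call_methylated(reference, converted):
    """Positions where the reference has C and the treated read still has C."""
    return [
        i for i, (ref_base, read_base) in enumerate(zip(reference, converted))
        if ref_base == "C" and read_base == "C"
    ]


reference = "ACGTCCGA"  # toy sequence (hypothetical)
converted = bisulfite_convert(reference, methylated_positions=[1, 5])
print(converted)                              # ACGTTCGA
print(call_methylated(reference, converted))  # [1, 5]
```

Only the C at index 4 is converted here, because the C's at indices 1 and 5 are marked as methylated.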

  4. Session – Genomics I • Biesecker – ClinSeq – effort to map phenotypic features to genotypes for atherosclerosis • 1000 subjects • [Figure labels: Rare Mendelian Variants; Common Mendelian Variants; Clinical data; Penetrance; Desired Data; Unknown Territory; Subjects; Common SNPs; SNP Freq (0.5); Genome]

  5. Session – Genomics II • BGI (Beijing Genomics Institute) • First Asian genome sequenced • 100 bioinformaticians (-> 300) • 18 Solexas • 5 454's • 4 SOLiDs (?) • Altshuler (1000 Genomes Project) – effort to sequence 1000 genomes to catalogue variation in the genome • www.1000genomes.org • Doubled the amount of sequence in GenBank in Sept. • Again in October • Data release – Jan 2009

  6. Reference: "Discovering Genomics, Proteomics, and Bioinformatics," 2nd ed. Campbell and Heyer. 2007. ISBN: 0-8053-8219-4. Chapter 2: Genome Sequence

  7. Genomics • reduction -- for a very long time, molecular methods were primarily tools to dissect cells and understand how parts work in isolation • expansion -- genomics, in theory, enables science to begin piecing together how parts work together as a system (systems biology?)

  8. Overview • What is Genomics? • How to sequence a genome? • Annotating (annotation) • Protein function • Gene Ontology

  9. Genomics • "involves large data sets" • human genome -- 3 billion nucleotides • hundreds of genomes have been finished • "high-throughput methods" • sequencing • measuring the expression of all genes • genotyping (1,000,000 SNPs on 1 chip) • other -omes • proteome, transcriptome, metabolome, variome?, exome • http://cancergenome.nih.gov/media/process_textonly.asp

  10. How do we sequence a genome? • preliminary sequencing • finishing (not always performed -- coverage) • annotating • The "dideoxy method" • Need (for DNA replication): • DNA, DNA polymerase, primers, deoxyribonucleotide triphosphates (dNTPs: G, T, A, C; one with radioactive atoms), dideoxyribonucleotide triphosphates (ddNTPs)

  11. Dideoxy Method Obsolete? • Next-generation sequencing technology • Cost per nucleotide down by a factor of 100-1000 • Cost per run is still very high • Expen$ive for validation on an individual basis • Dideoxy method is very mature, very well understood

  12. Dideoxy method • Under normal DNA polymerization, dNTPs are added to the end of the elongating strand of DNA. • If a ddNTP is incorporated, elongation terminates -- the fragment also carries a "label" -- radioactive isotope or fluorescent dye • This is performed in 4 different containers (test tubes), each containing one of ddATP, ddGTP, ddCTP, or ddTTP. • Therefore, every fragment in a given tube terminates with the same ddNTP • Run these out on a gel; the smallest fragments migrate fastest. • Expose to x-ray film (or scan with laser), read gel
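The four-tube logic on slide 12 can be sketched as a small, idealized simulation: each tube collects the lengths of fragments that ended in its ddNTP, and reading the gel from shortest to longest fragment spells out the newly synthesized strand. Function names and the toy template are hypothetical, and the model assumes every possible termination product is present, which a real reaction only approximates.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def run_tubes(template):
    """Return {ddNTP base: fragment lengths} for the four reactions.

    `template` is given in the order the polymerase reads it.
    Idealized: every possible termination product is present.
    """
    # Strand built by the polymerase, base by base from the primer.
    new_strand = "".join(COMPLEMENT[base] for base in template)
    tubes = {base: [] for base in "ACGT"}
    for length, base in enumerate(new_strand, start=1):
        tubes[base].append(length)  # fragment ends where the ddNTP went in
    return tubes


def read_gel(tubes):
    """Read bands from shortest (fastest) to longest across the four lanes."""
    bands = sorted(
        (length, base) for base, lengths in tubes.items() for length in lengths
    )
    return "".join(base for _, base in bands)


template = "TACGGTCA"   # toy template (hypothetical)
tubes = run_tubes(template)
print(tubes)            # {'A': [1, 6], 'C': [4, 5], 'G': [3, 7], 'T': [2, 8]}
print(read_gel(tubes))  # ATGCCAGT
```

The sequence read off the gel is the complement of the template, since the labeled fragments are copies made by the polymerase.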

  13. Figure 2.1

  14. Figure 2.2

  15. Comment • Note -- this is pretty awful work • The gel material is toxic • Working with radioactive molecules • Slow and tedious • reading bands on glass • capturing/entering data • 500 bases took 24 hours (16,438 years to do the human genome with this method)
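The 16,438-year figure on slide 15 follows from simple arithmetic, sketched below. It assumes a 3-billion-base genome (slide 9), a single pass over every base, and nonstop work at 500 bases per day; the function name is hypothetical.

```python
GENOME_SIZE = 3_000_000_000  # bases in the human genome (slide 9)


def years_to_sequence(bases_per_day, genome_size=GENOME_SIZE):
    """Years of nonstop work to read every base once at the given throughput."""
    days = genome_size / bases_per_day
    return days / 365


manual = years_to_sequence(bases_per_day=500)  # one gel: 500 bases per 24 h
print(f"{manual:,.0f} years")                  # 16,438 years
```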

  16. Automated sequencing • Leroy Hood -- developed nonradioactive dideoxy method • Each ddNTP is "labeled" with a different fluorescent dye • 1 lane could be used instead of 4 (why?) • A laser excites the dye so the band can be "read", indicating which ddNTP terminated the sequence • The intensities of these bands are captured and graphed in what is called a chromatogram • Lane in a gel is replaced with a capillary • Can run 96 or 384 capillaries at a time (Applied Biosystems) • A run is approximately 1 hour • 500 bases * 384 capillaries per hour ==> about 651 days (roughly 1.8 years) for the human genome
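To make the single-lane, four-dye idea on slide 16 concrete, here is a minimal sketch that calls one base per chromatogram peak by taking the strongest of the four dye channels. The intensities are made up, the function name is hypothetical, and the strongest-to-second-strongest ratio is only a crude stand-in for the quality measures real base callers compute.

```python
def call_bases(trace):
    """Call one base per peak from four dye-channel intensities.

    `trace` is a list of dicts mapping base -> intensity at each peak.
    Returns (sequence, ratios), where ratio = strongest / second strongest.
    """
    sequence, ratios = [], []
    for peak in trace:
        ranked = sorted(peak.items(), key=lambda item: item[1], reverse=True)
        (best_base, best), (_, second) = ranked[0], ranked[1]
        sequence.append(best_base)
        ratios.append(best / second if second else float("inf"))
    return "".join(sequence), ratios


# Made-up intensities for four peaks (hypothetical data)
trace = [
    {"A": 910, "C": 40, "G": 35, "T": 20},
    {"A": 30, "C": 25, "G": 870, "T": 45},
    {"A": 50, "C": 400, "G": 60, "T": 380},  # ambiguous: C and T nearly tie
    {"A": 15, "C": 20, "G": 30, "T": 950},
]
seq, ratios = call_bases(trace)
print(seq)  # AGCT
print([round(r, 2) for r in ratios])
```

Because each dye identifies its own terminator, all four reactions can share one lane or capillary; the third peak shows why a confidence measure is needed alongside the call.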

  17. Box 2.1 Table

  18. Choosing genomes • Big 7 • human, mouse, yeast, E. coli, fly, worm, Arabidopsis • medical applications • Pseudomonas aeruginosa (CF infection), mosquito, trypanosomes, HIV • evolutionary significance • microbes, archaea, chimp, gorilla, fugu fish • environmental impact • microbes • food production • wheat, rice, bovine, pig, yeast

  19. Figure 2.3

  20. Figure 2.3 (detail)

  21. Automated Reads • Automated sequencing almost requires automated base-calling • PHRED • reads chromatograms • quality assessment (for re-sequencing) • peak height and spacing • assemble multiple reads (PHRAP) into a "contig" • What about mutations, variations, SNPs? • Gaps • require human intervention -- techniques to try to span specific DNA regions • e.g., chromosome walking
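The quality assessment mentioned on slide 21 is usually expressed as a Phred quality score, Q = -10 * log10(P), where P is the estimated probability that a base call is wrong. The sketch below shows the conversion in both directions and flags low-quality calls as candidates for re-sequencing. The threshold of 20 (1 error in 100) and the function names are assumptions for illustration, not values taken from the slides.

```python
import math


def phred_quality(error_probability):
    """Phred score Q = -10 * log10(P_error)."""
    return -10 * math.log10(error_probability)


def error_probability(quality):
    """Inverse relation: P_error = 10 ** (-Q / 10)."""
    return 10 ** (-quality / 10)


def low_quality_positions(qualities, threshold=20):
    """Indices of base calls below the (assumed) quality threshold."""
    return [i for i, q in enumerate(qualities) if q < threshold]


print(round(phred_quality(0.001)))                  # 30
print(error_probability(20))                        # about 0.01
print(low_quality_positions([35, 12, 40, 19, 28]))  # [1, 3]
```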

  22. Gaps • 2001 draft sequence published • 147,821 gaps • pressure to publish a sequence because of Celera and Craig Venter • 2004 • 341 gaps • Usually repeats (but may be epigenetic) • Very expensive to completely finish • many genomes never "finished"

  23. Figure 2.4

  24. Figure MM2.1 Show BL2SEQ example

  25. Annotation • "functionally" important sections of a genome • exons, introns, promoters, enhancers, splice sites, UTRs, pseudogenes, SNPs, markers, repeats, Alus, gene duplications, gene families, micro-RNAs, methylation, phosphorylation, tissue-specific alternative splicing, copy number variations (CNVs, also called "structural variations"), differential expression, gene function, ????

  26. Gene Identification • Gene prediction (ORF finding) • was a hot topic • cooled when it became clear that EST sequencing was far superior • EST sequencing in human (and some model organisms -- rat, mouse, others) was very extensive -- millions of sequencing reads • The most effective approach to gene finding was overlaying EST sequences onto genomic sequence (but note you need both). • Gene prediction accuracy was 40-60% at best • Gene prediction has made a bit of a resurgence because of the cost savings of "in silico" gene finding
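For readers who have not seen ORF finding, the sketch below shows the simplest possible version: scan the three forward reading frames for an ATG followed in-frame by a stop codon. The function name, the `min_codons` cutoff, and the toy sequence are hypothetical. The sketch ignores the reverse strand and introns, which is one reason naive ORF scanning performs poorly on eukaryotic genomes.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=3):
    """Return (start, end, sequence) for ATG...stop ORFs on the forward strand.

    Scans all three forward frames; `end` is exclusive and includes the stop
    codon. `min_codons` counts the start and stop codons.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, dna[start:i + 3]))
                start = None  # look for the next ORF in this frame
    return sorted(orfs)


toy = "CCATGAAATTTTAGGATGCCCTGA"  # toy sequence (hypothetical)
for orf in find_orfs(toy):
    print(orf)
# (2, 14, 'ATGAAATTTTAG')
# (15, 24, 'ATGCCCTGA')
```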

  27. Pseudogenes • text -- the mammalian genome contains approximately 225 bp of pseudogenes per kb • What are pseudogenes?
