Genome 351, May 5th, 2014. II. Human Genetics: The Individual and Society. Human Genome(s), Human Molecular Evolution, Population Genetics, Genetic Traits (Complex vs. Simple), Mechanisms and Models of Disease, Therapy, Cancer. Evan Eichler, Foege S413C, eee@gs.washington.edu.
Genome 351, May 5th, 2014 II. Human Genetics: The Individual and Society • Human Genome(s) • Human Molecular Evolution • Population Genetics • Genetic Traits—Complex vs. Simple • Mechanisms and Models of Disease • Therapy • Cancer Evan Eichler, Foege S413C, eee@gs.washington.edu
Reminders • Midterm exam (125 points) • Final exam (125 points) – focused on Eichler lectures, but…. Monday, June 9, 2014, 8:30-10:20, GNOM S060 • Two remaining problem sets (20 total points) – handed out on Fridays, due the following Friday. • 2-3 Discussion section debates (40 points) – you must attend quiz section to earn these points.
Lecture 1: Human Genome Project Objectives • What was it and how was it done? • What did we learn? • How will it be applied? • Future?
Version 1: Chromosome Landscape: 1970s–1990s [Figure: banded chromosome ideogram; band numbers and region labels omitted] • Centromere: site of primary constriction on a chromosome • Heterochromatin: condensed chromatin • Euchromatin: more diffuse chromatin; corresponds to regions where most genes and functional elements reside
Version 2: Sequence of the Human Genome, 2001 [Figure: a page of raw sequence between the p and q ends of a chromosome, beginning CCATCCAGCTTTGTTCCATTGCTCGCAAGG...] • 1 page = 3,000 characters • 1 chapter = 50 pages • 20,000 chapters • 1,000,000 pages
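The book analogy on this slide can be checked with a little arithmetic. The sketch below uses only the three figures given on the slide; the variable names are illustrative.

```python
# Book analogy from the slide: pages -> chapters -> whole genome.
CHARS_PER_PAGE = 3_000     # 1 page = 3000 characters (one character per base)
PAGES_PER_CHAPTER = 50     # 1 chapter = 50 pages
CHAPTERS = 20_000          # 20,000 chapters

total_pages = PAGES_PER_CHAPTER * CHAPTERS
total_bases = CHARS_PER_PAGE * total_pages

print(f"{total_pages:,} pages")
print(f"{total_bases:,} bases")
```

The totals agree with the 1,000,000 pages on the slide and with the roughly 3 billion bases quoted for the genome elsewhere in the lecture.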
Human Genome Project Goal Understand the hereditary instructions of humankind by • Identifying all human genes • Reading the entire script of our sequence (all 3 billion bits of information) in the correct order by 2005. Purpose of HGP: To revolutionize medicine AND biology.
Timeline of HGP • 1985-1988 first proposals • 1990 Officially launched • 1998 Private competing project launched by Celera • 2000 Two versions of the human genome completed (public and private) • 2001 Publications • 2004 Publication of “finishing” version
Public Hierarchical Shotgun Sequencing [Figure: clone library → map and organize → select clone]
Cloning human DNA in bacteria [Figure: fragment human DNA; cut vector DNA; join DNAs; add to bacteria (E. coli)]
Cloning human DNA in bacteria [Figure labels: with antibiotic; RIP; clone]
DNA sequencing • Methods developed in the ’70s by Gilbert and Sanger independently. • Both methods start with a cloned fragment and can determine the order (sequence) of a few hundred bases. • The challenge becomes putting the pieces together to get long sequences. [Photos: Fred Sanger, Wally Gilbert]
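Sequence and assembly: putting the pieces together [Figure: six numbered reads, unordered: AGTAGG, GATTCG, TTCGAC, CGACAG, ACAGTA, AGGATT]
[Figure: the same reads being placed by overlap, with AGGATT now next to AGTAGG; remaining reads: GATTCG, TTCGAC, CGACAG, ACAGTA]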
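The puzzle on these two slides can be sketched as a greedy overlap assembly: repeatedly join the two pieces that share the longest suffix-to-prefix overlap. This is only an illustration of the idea, not the assembler used by the Human Genome Project; the function names and the minimum overlap of 3 bases are assumptions made for this example.

```python
def overlap(a: str, b: str, min_len: int = 1) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(reads)
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:  # nothing overlaps by at least min_len bases
            break
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# The six reads shown on the slide.
reads = ["ACAGTA", "AGTAGG", "AGGATT", "GATTCG", "TTCGAC", "CGACAG"]
contigs = greedy_assemble(reads)
print(contigs)
```

These six reads overlap in a closed loop (CGACAG also overlaps ACAGTA), so where the assembled sequence starts depends on the order in which ties are broken. Every read is still contained in the single sequence that results.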
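The challenge is made more complex by repeats of the same sequence [Figure: nine reads, with CGACAG present twice: AGCACG, AGTAGG, GATTCG, TTCGAC, CGACAG, CGACAG, ACAGTA, AGGATT, ACAGCA]
[Figure: the same reads rearranged: AGCACG, AGTAGG, AGGATT, GATTCG, TTCGAC, CGACAG, CGACAG, ACAGTA, ACAGCA]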
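Why repeats make assembly harder can be seen by asking, for each read, which other reads could come next. The sketch below uses the nine reads from the slide; the minimum overlap of 4 bases and the helper names are assumptions chosen for illustration.

```python
def overlap(a: str, b: str, min_len: int = 1) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


# Reads from the slide; CGACAG occurs twice because it comes from a repeat.
reads = ["AGCACG", "AGTAGG", "GATTCG", "TTCGAC", "CGACAG",
         "CGACAG", "ACAGTA", "AGGATT", "ACAGCA"]
MIN_OVERLAP = 4  # assumed threshold for calling two reads neighbours

unique_reads = sorted(set(reads))
ambiguous = {}
for a in unique_reads:
    # Every distinct read whose prefix matches the end of `a`.
    followers = [b for b in unique_reads
                 if b != a and overlap(a, b, MIN_OVERLAP) > 0]
    if len(followers) > 1:
        ambiguous[a] = followers

print(ambiguous)
```

Only the repeated read CGACAG has two possible followers (ACAGTA and ACAGCA), so overlaps alone cannot say which copy of the repeat leads where. This is the ambiguity that mapped clones or paired reads help to resolve.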
The Size of the Human Genome • Human genome contains ~3,000,000,000 bases (letters) • If each base (A, G, C, T) is represented as a one mm wide letter, the human genome would stretch from St. Louis to Los Angeles. • It is about 1,000 times larger than bacterial genomes and 30 times larger than worm and fly genomes. • Sequencing of the genome required about 60 million sequencing experiments.
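The size comparisons above follow from simple unit conversions. The sketch below uses only the numbers on the slide; the genome sizes it derives for bacteria and for worm and fly are rough figures implied by the quoted ratios, not measured values.

```python
GENOME_BASES = 3_000_000_000   # ~3 billion bases, from the slide
MM_PER_BASE = 1                # one base drawn as a 1 mm wide letter

# 1 km = 1,000,000 mm
length_km = GENOME_BASES * MM_PER_BASE / 1_000_000

# Sizes implied by "1,000 times larger" and "30 times larger".
bacterial_bases = GENOME_BASES / 1_000
worm_fly_bases = GENOME_BASES / 30

print(f"Printed genome length: {length_km:,.0f} km")
print(f"Implied bacterial genome: {bacterial_bases:,.0f} bases")
print(f"Implied worm/fly genome: {worm_fly_bases:,.0f} bases")
```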
Magnitude of the Task • 15 months: time in which 90% of the first human genome sequence was generated • 1000 bp/second (24 hours per day, 7 days a week) • ~3000 individuals working in 16 major centers to generate data (249 listed on the publication) • ~48 individuals working for 8 months to analyze data • An international effort: a decade of planning, organization and coordination
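A back-of-the-envelope check shows how these throughput figures relate to the size of the genome. Two inputs are assumptions made for this sketch and are not stated on the slides: a month is taken as 30 days, and each sequencing experiment is taken to yield 500 bases (the lecture says only "a few hundred").

```python
GENOME_BASES = 3_000_000_000
RATE_BP_PER_SECOND = 1_000      # from the slide, running around the clock
MONTHS = 15                     # from the slide
DAYS_PER_MONTH = 30             # assumption: round months
EXPERIMENTS = 60_000_000        # "about 60 million sequencing experiments"
BASES_PER_EXPERIMENT = 500      # assumption: "a few hundred bases" per read

seconds = MONTHS * DAYS_PER_MONTH * 24 * 3600
bases_by_rate = RATE_BP_PER_SECOND * seconds
bases_by_experiments = EXPERIMENTS * BASES_PER_EXPERIMENT

# Raw bases divided by genome size = average number of times each base is read.
coverage_by_rate = bases_by_rate / GENOME_BASES
coverage_by_experiments = bases_by_experiments / GENOME_BASES

print(f"From rate: {bases_by_rate:,} bases, about {coverage_by_rate:.1f}x the genome")
print(f"From experiments: {bases_by_experiments:,} bases, "
      f"about {coverage_by_experiments:.1f}x the genome")
```

Under these assumptions both routes give raw data on the order of ten times the genome length, which is consistent with each base being read several times over rather than once.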
Computational Capacity ~In 2000 • 32 processors P3 (450 MHz) • 0.5 Terabyte Storage ~In 2014 • ~2000 processors • >1 Petabyte Storage
Private Whole-Genome Shotgun Sequencing [Figure labels: 1 kb, 10 kb, 150 kb; End; Scaffold; Assembly]
Public Hierarchical Shotgun Sequencing [Figure: clone library → map and organize → select clone]
Milestone in genetics ushers in new era of discovery, responsibility June 26, 2000 Web posted at: 12:09 p.m. EDT (1609 GMT) Declaring a new era of medical discovery, U.S. President Bill Clinton and British Prime Minister Tony Blair on Monday praised the efforts of an international team of scientists to decode the genetic makeup of humans.
NY Times Article, June 26, 2000: Scientists Complete Rough Draft of Human Genome • What was the hype? • What was different between the two efforts? • Why was there concern? NY Times Article, April 27, 2005: Celera to Quit Selling Genome Information
Data Release • Bermuda Standard: ‘Primary Genomic Sequence Should be in the Public Domain’ → daily data release [Logo: BCM-HGSC]
I. Human Genome Organization: Where are the genes? [Figure: two panels plotting genes against base pairs]
How much encodes proteins? Protein-coding genes account for only about 45,000,000 base pairs (1.5%). Regulatory elements and other functional sequences take up another 90,000,000 (3%). But this still leaves 95% unaccounted for!
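The percentages on this slide can be reproduced from the base-pair counts. The sketch assumes the genome size of about 3 billion bases quoted earlier in the lecture; variable names are illustrative.

```python
GENOME_BASES = 3_000_000_000
CODING_BASES = 45_000_000       # protein-coding sequence
REGULATORY_BASES = 90_000_000   # regulatory and other functional sequence

coding_pct = 100 * CODING_BASES / GENOME_BASES
regulatory_pct = 100 * REGULATORY_BASES / GENOME_BASES
unaccounted_pct = 100 - coding_pct - regulatory_pct

print(f"Coding: {coding_pct:.1f}%")
print(f"Regulatory and other functional: {regulatory_pct:.1f}%")
print(f"Unaccounted for: {unaccounted_pct:.1f}%")
```

The remainder works out to 95.5%, which the slide rounds to 95%.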
Selfish DNA - Transposons We can recognize about 45% of human DNA as so-called selfish DNA. These are short sequences that proliferate in genomes, adding their DNA to the host’s. ~900,000 LINE elements up to 6,000 base pairs long account for >15% of human sequence!! Another group, SINE elements, has more than 1,500,000 copies of about 100-300 bases and takes up >10% of the genome.
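The copy numbers and lengths quoted for SINEs and LINEs can be turned into rough genome fractions. This is a sketch using only the slide's figures and an assumed genome size of 3 billion bases; the "implied average" for LINEs is simply what the quoted numbers require, not a measured value.

```python
GENOME_BASES = 3_000_000_000

# SINEs: more than 1,500,000 copies of about 100-300 bases each.
SINE_COPIES = 1_500_000
sine_low_pct = 100 * SINE_COPIES * 100 / GENOME_BASES
sine_high_pct = 100 * SINE_COPIES * 300 / GENOME_BASES

# LINEs: ~900,000 copies, up to 6,000 bases long, >15% of the genome.
LINE_COPIES = 900_000
LINE_MAX_LEN = 6_000
line_bases_if_full_length = LINE_COPIES * LINE_MAX_LEN
line_bases_at_15_pct = 15 * GENOME_BASES // 100
implied_avg_line_len = line_bases_at_15_pct // LINE_COPIES

print(f"SINEs: {sine_low_pct:.0f}% to {sine_high_pct:.0f}% of the genome")
print(f"LINEs if all were full length: {line_bases_if_full_length:,} bases")
print(f"Average LINE length implied by 15%: {implied_avg_line_len} bases")
```

The SINE range of 5% to 15% brackets the ">10%" on the slide. If every LINE copy were 6,000 bases long the total would exceed the whole genome, so most copies must be much shorter than the maximum.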
A typical human gene [Figure: tracks for GENE, LINES, SINES, RPTS and GC content (0.5)] • ~1.5% of the genome encodes genes • ~45%-50% repeat DNA
II. Gene Density (1 per 100,000 bp) • Yeast: 5,767 genes • C. elegans: 18,266 genes • D. melanogaster: ~14,000 genes • Homo sapiens: ~22,000 genes, >20,000 pseudogenes *Human has fewer genes than anticipated (60,000-120,000 genes)
III. Cross-species homology *>95% of our genes share homology with other species.
IV. Genome Organization: Complex Pattern of Duplication • ~5% high-identity duplications • >1 kb in length, >90% sequence identity • Organized into 350 regions [Figure: duplications drawn across chr1-chr22, chrX and chrY]
Impact of the Sequence • Platform for understanding human variation • Details our evolutionary history • Speeds discovery of genes associated with genetic diseases • Almost 2,000 genes by 2005 (~1200 due to HGP) • 2000: Screening at birth for 32 or more such diseases • 2013: 500-2000 genes diagnostically screened • At the research level, allows view of entire genome • Microarray and gene expression analyses • Routine CNV diagnostic for children with intellectual disability
Implications: Ethical, Legal and Social • Pre-implantation testing, e.g., selection of embryos without genetic disease • Disease variation vs. normal, e.g., dwarfism and embryo selection • Genetic testing and elitism—genetic tests are expensive; who can afford? • Genetic information and discrimination—who will have access and how it will be used? • Genetic determinism—concept of genetic “races” • Religion vs. Science—genetic definition of relationship of species
Implications: Biology • An “unbiased” view of the diversity of life • Conservation vs. rapid evolution • Mechanisms of selection and genome evolution Between the years 2000-2005 • Human genome complete* • Mouse, rat, chimp and dog genome drafts • C. elegans, Fugu, Tetraodon • Chicken and Xenopus • Drosophila, mosquito, honeybee • S. cerevisiae and 16 yeast genomes • >100 bacterial genomes • Arabidopsis, rice
Biology: Genomics Revolution • ~3,800 organisms have their genomes “completed” as of March 2010 according to deposition in ENSEMBL nucleotide archive • ~1,000 bacterial genomes completed as of April 2010 • ~10,000 bacterial genomes drafted or near completion as of May 2013 • Complex genomes (vertebrates and plants) are no longer difficult to sequence but are difficult to assemble
2007: Next-Generation Sequencing • 2006-2007: fundamental shift in sequencing technology • One of the major bottlenecks for sequencing complex genomes (the cost and difficulty of generating large amounts of sequence data) was lifted • NGS technology: one of several genome technologies that allow massive parallelization of sequencing reactions (e.g., sequencing-by-synthesis), producing orders of magnitude more sequence data than capillary-based sequencing
Human Genome Speed Reading Genomes can be sequenced 50,000 X faster and 5,000 X cheaper in 2010 than in 2000 SOURCE: M. STRATTON/WELLCOME TRUST SANGER INST.
Personalized Genomes • By 2010: eleven human genomes (including anonymous and known donors) had been sequenced with at least 20-fold coverage and published since 2008 • By 2013: another 5,000 genomes at 4-fold coverage and another 2,000 genomes at deeper coverage • BUT sequencing is NOT at a level of quality commensurate with the first human genome
[Figure: sampled populations: CEU, FIN, GBR, CHB, AJM, TSI, JPT, IBS, CDX, PUR, MXL, CHS, YRI, ASW, GWD, ACB, KHV, LWK, CLM, GHN, MAB, PEL] • 2,500 genomes at 4X sequence coverage http://www.1000genomes.org/page.php?page=about (R. Durbin, 1000 Genomes)
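"Fold coverage" is the total number of bases sequenced divided by the genome size. The sketch below shows what 4X and 20X mean in raw data. It also estimates the fraction of bases left unread under the simplifying assumption that reads land uniformly at random (a Poisson model); that model and the function names are assumptions for illustration, not something stated in the lecture.

```python
import math

GENOME_BASES = 3_000_000_000


def bases_needed(coverage: float, genomes: int = 1) -> float:
    """Total raw bases to sequence `genomes` genomes at the given fold coverage."""
    return coverage * GENOME_BASES * genomes


def fraction_uncovered(coverage: float) -> float:
    """Expected fraction of bases never read, assuming uniformly random reads."""
    return math.exp(-coverage)


project_bases = bases_needed(4, genomes=2_500)
print(f"2,500 genomes at 4X: {project_bases:.1e} raw bases")

for c in (4, 20):
    print(f"{c}X: about {fraction_uncovered(c):.1e} of bases expected to be unread")
```

Under this model roughly 2% of a genome is never read at 4X, while at 20X the expected gap is negligible. That is one reason low-coverage genomes are not of the same quality as a deeply sequenced one.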
Why Sequence More Human Genomes? • ….advances in sequencing technology • Understand genetic variation at an individual, family & population level • Discovery—most of the heritability of human disease is not understood—individual rare variation now thought to be more important • Advancing medical treatment
Health Benefits vs. Life Insurance • Mike Snyder sequenced his genome, along with RNA-seq, over a 14-month period • Discovered two rare but impactful variants, one related to diabetes and the other to aplastic anemia • At day 300 of his monitoring, he suffered a viral infection that led to an increase in interferon expression levels and other immune responses • A glucose imbalance followed, but knowing his genetic predisposition to diabetes allowed him to act quickly (lifestyle/diet changes) to get it in check • Life insurance costs became prohibitive (Chen et al., Cell. 2012 Mar 16;148(6):1293-307.)
GINA: Genetic Information Nondiscrimination Act of 2008 • The law enables people to take part in research studies without fear that their DNA information might be used against them in health insurance or the workplace. • Protects people from discrimination by health insurers and employers on the basis of DNA information. • The law does not cover life insurance, disability insurance and long-term care insurance.
The Beery Twins • Twins Alexis and Noah were originally diagnosed with cerebral palsy, but at age five were diagnosed with dopa-responsive dystonia (DRD), which causes abnormal movements • At age 13, Alexis developed severe breathing problems (baby monitor, daily shots of adrenaline) not usually associated with DRD [Photo: Alexis and Noah Beery]
A Win for Genome Sequencing! “Sequencing has brought her life back” (Retta Beery) • Sequencing of their genomes found a mutation in SPR (sepiapterin reductase), which causes a specific type of dopa-responsive dystonia • Sepiapterin reductase is important in the formation of neurotransmitters; the mutation suggested that taking a precursor of serotonin might help • One month after starting the treatment, Alexis’ breathing problem had disappeared, Noah’s handwriting had improved and he was able to concentrate in school (Bainbridge et al., Sci Transl Med 2011 Jun 15;3(87):87re3.)