ACGTTTGACTGAGGAGTTTACGGGAGCAAAGCGGCGTCATTGCTATTCGTATCTGTTTAG 010101100010010100001010101010011011100110001100101000100101 Welcome to CS374! A survey of computer science in genomics today
CS374 – Course Goals • Survey of current research in computational genomics • Practice giving a stellar presentation • Practice reading literature
CS374 – Course Requirements • Presentation • Critique of one topic • Summaries of two topics • Class attendance
Introduction: DNA sequencing
DNA – what is a genome? • DNA, ~3×10^9 base pairs long in humans • Contains ~22,000 genes [Figure: DNA and messenger RNA (bases A G C G A C U G); labels: transcription, RNA folding, translation, folding]
Human Genome Project • 1990: Start • 2000: Bill Clinton: “most important scientific discovery in the 20th century” • 2001: Draft • 2003: Finished • 3 billion basepairs, $3 billion • Now what?
There is never “enough” sequencing • 7 billion individuals • 100 million species • Somatic mutations (e.g., HIV, cancer) • Sequencing is a functional assay
Sequencing Growth Cost of one human genome • 2004: $30,000,000 • 2008: $100,000 • 2010: $10,000 • 2011: $4,000 (today) • 2012-13: $1,000 • ???: $300 How much would you pay for a smartphone?
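To put the price list above in perspective, here is a rough back-of-the-envelope sketch in Python. It uses only the figures already listed; the function names and the assumption of a constant exponential decline are illustrative and not part of the lecture.

```python
import math

# Cost of one human genome in USD, copied from the list above.
cost_by_year = {2004: 30_000_000, 2008: 100_000, 2010: 10_000, 2011: 4_000}

def fold_reduction(start_year, end_year):
    """How many times cheaper a genome is in end_year than in start_year."""
    return cost_by_year[start_year] / cost_by_year[end_year]

def halving_time_years(start_year, end_year):
    """Average time for the cost to halve, ASSUMING a constant exponential decline."""
    halvings = math.log2(fold_reduction(start_year, end_year))
    return (end_year - start_year) / halvings

print(f"2004 -> 2011: {fold_reduction(2004, 2011):,.0f}-fold cheaper")
print(f"average halving time: {halving_time_years(2004, 2011):.2f} years")
```

Under that assumption the cost halved roughly every six to seven months between 2004 and 2011.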
DNA Sequencing – Gel Electrophoresis • “Ancient” method, used for the human genome • Start at primer (restriction site) • Grow DNA chain • Include dideoxynucleosides (modified A, C, G, T) • Stops reaction at all possible points • Separate products by length, using gel electrophoresis
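The chain-termination idea above can be sketched as a toy simulation. This is a deliberate simplification, not a model of the chemistry: it assumes the reaction yields exactly one fragment per possible stop point and that the terminating base of each fragment can be read off. The function names are made up for illustration.

```python
import random

def chain_termination_fragments(synthesized):
    """Return one fragment per possible stop point along the growing chain.

    Simplification: every prefix of the synthesized strand occurs exactly once,
    and each fragment ends in a labelled terminator base.
    """
    return [synthesized[:i] for i in range(1, len(synthesized) + 1)]

def read_gel(fragments):
    """Sort fragments by length (the job of the gel) and read each one's last base."""
    ordered = sorted(fragments, key=len)
    return "".join(fragment[-1] for fragment in ordered)

synthesized = "ACGTTTGACTGAGG"
fragments = chain_termination_fragments(synthesized)
random.shuffle(fragments)      # fragments sit in the tube in no particular order
print(read_gel(fragments))     # sorting by length recovers the strand
```

Because every fragment has a distinct length, sorting by length alone is enough to reconstruct the order of bases, which is why the separation step matters.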
Uses of Genomes • Medicine • Mendelian diseases • Cancer • Drug dosage (e.g., warfarin) • Disease risk • Diagnosis of infections • … • Ancestry • Genealogy • Nutrition? • Psychology? • Baby Engineering???...
Ethical Issues • GINA (Genetic Information Nondiscrimination Act): • Genetic information cannot be used by health insurers & employers • Covers relatives up to 4th degree • Excludes life & disability insurance • Overdiagnosis • Bad news you’d rather not find out • Paternity testing • Genetic engineering of babies? • …
How soon will we all be sequenced? • Cost • Killer apps • Roadblocks? [Figure: applications and cost plotted against time; markers at 2013? and 2018?]
Introduction: Human Population Genomics
Human population migrations • Out of Africa, Replacement • Single mother of all humans (Eve) ~150,000 years ago • Single father of all humans (Adam) ~70,000 years ago • Humans out of Africa ~50,000 years ago replaced others (e.g., Neanderthals) • Multiregional Evolution • Generally debunked; however, ~5% of the genome in Europeans and Asians is Neanderthal or Denisovan
Coalescence Y-chromosome coalescence
Why humans are so similar • A small population that interbred reduced the genetic variation • Out of Africa ~50,000 years ago
Migration of Humans http://info.med.yale.edu/genetics/kkidd/point.html
Some Key Definitions
Mary: AGCCCGTACG  G/G
John: AGCCCGTACG  G/G
Josh: AGCCCGTACG  G/T
Kate: AGCCCGTACG  G/G
Pete: AGCCCGTACG  G/G
Anne: AGCCCGTACG  G/G
Mimi: AGCCCGTACG  G/G
Mike: AGCCCTTACG  T/T
Olga: AGCCCTTACG  T/G
Tony: AGCCCTTACG  T/G
• Alleles: G, T • Major Allele: G • Minor Allele: T
• Recombinations (between the Mom and Dad chromosomes): • At least 1/chromosome • On average ~1/100 Mb
• Heterozygosity: • Prob[2 alleles picked at random with replacement are different] • 2*.75*.25 = .375 • H = 4Nu/(1+4Nu)
• Linkage Disequilibrium: The degree of correlation between two SNP locations
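The allele frequencies and the .375 heterozygosity above can be reproduced from the ten genotypes with a short Python sketch. The slide does not define N and u in H = 4Nu/(1+4Nu); the code assumes the usual reading (effective population size and per-generation mutation rate), and the function names are illustrative.

```python
from collections import Counter

# Genotypes at the G/T site for the ten individuals listed above.
genotypes = ["G/G", "G/G", "G/T", "G/G", "G/G", "G/G", "G/G", "T/T", "T/G", "T/G"]

def allele_frequencies(genotypes):
    """Count both alleles of every genotype and convert the counts to frequencies."""
    counts = Counter(allele for g in genotypes for allele in g.split("/"))
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}

def heterozygosity(freqs):
    """Probability that two alleles drawn with replacement differ: 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs.values())

def equilibrium_heterozygosity(N, u):
    """H = 4Nu / (1 + 4Nu); N and u are ASSUMED to be population size and mutation rate."""
    theta = 4 * N * u
    return theta / (1 + theta)

freqs = allele_frequencies(genotypes)
print("major allele:", max(freqs, key=freqs.get))
print("minor allele:", min(freqs, key=freqs.get))
print("heterozygosity:", heterozygosity(freqs))
```

For two alleles, 1 - p² - q² equals 2pq, so this matches the 2*.75*.25 calculation on the slide.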
Human Genome Variation • SNP • Novel Sequence • Mobile Element or Pseudogene Insertion • Inversion • Translocation • Tandem Duplication • Microdeletion • Transposition • Novel Sequence at Breakpoint • Large Deletion [Figure: example alignments TGCTGAGA / TGCCGAGA, TGCTCGGAGA / TGC---GAGA, TGC--AGA / TGCCGAGA]
The Fall in Heterozygosity $F_{ST} = \frac{H - H_{POP}}{H}$
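A minimal numeric sketch of the F_ST ratio above, in Python. The slide does not spell out H and H_POP; the code assumes H is the heterozygosity of the pooled population and H_POP is the mean heterozygosity within subpopulations, for a biallelic site and equally sized subpopulations. The function names and example frequencies are hypothetical.

```python
def het(p):
    """Heterozygosity of a biallelic site with allele frequency p."""
    return 2 * p * (1 - p)

def fst(subpop_freqs):
    """F_ST = (H - H_POP) / H, assuming equally sized subpopulations.

    H     : heterozygosity of the pooled population (assumed reading)
    H_POP : mean heterozygosity within the subpopulations (assumed reading)
    """
    p_total = sum(subpop_freqs) / len(subpop_freqs)
    h_total = het(p_total)
    if h_total == 0:
        raise ValueError("site is not variable in the pooled population")
    h_pop = sum(het(p) for p in subpop_freqs) / len(subpop_freqs)
    return (h_total - h_pop) / h_total

print(fst([0.5, 0.5]))   # identical subpopulations: no differentiation
print(fst([0.9, 0.1]))   # strongly differentiated subpopulations
```

The more the subpopulations differ in allele frequency, the lower their internal heterozygosity is relative to the pooled one, and the closer F_ST gets to 1.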
The HapMap Project
Code  Population                                 Samples
ASW   African ancestry in Southwest USA          90
CEU   Northern and Western Europeans (Utah)      180
CHB   Han Chinese in Beijing, China              90
CHD   Chinese in Metropolitan Denver             100
GIH   Gujarati Indians in Houston, Texas         100
JPT   Japanese in Tokyo, Japan                   91
LWK   Luhya in Webuye, Kenya                     100
MXL   Mexican ancestry in Los Angeles            90
MKK   Maasai in Kinyawa, Kenya                   180
TSI   Toscani in Italia                          100
YRI   Yoruba in Ibadan, Nigeria                  100
Genotyping: Probe a limited number (~1M) of known highly variable positions of the human genome
Linkage Disequilibrium & Haplotype Blocks • Minor alleles at two SNPs: A and G, with frequencies $p_A$ and $p_G$ • Linkage Disequilibrium (LD): $D = P(A \text{ and } G) - p_A p_G$
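The definition of D above can be tried out on a handful of phased haplotypes. The sketch below is in Python; the haplotype lists are invented for illustration, and the function name is not from the lecture.

```python
def linkage_disequilibrium(haplotypes, allele1="A", allele2="G"):
    """D = P(allele1 and allele2) - p1 * p2.

    Each haplotype is a (site1_allele, site2_allele) tuple.
    """
    n = len(haplotypes)
    p1 = sum(h[0] == allele1 for h in haplotypes) / n
    p2 = sum(h[1] == allele2 for h in haplotypes) / n
    p12 = sum(h == (allele1, allele2) for h in haplotypes) / n
    return p12 - p1 * p2

# Hypothetical data: the minor alleles A and G always travel together.
linked = [("A", "G")] * 3 + [("C", "T")] * 7
# Hypothetical data: every combination of alleles is equally common.
independent = [("A", "G"), ("A", "T"), ("C", "G"), ("C", "T")]

print(linkage_disequilibrium(linked))
print(linkage_disequilibrium(independent))
```

When the two sites are independent, the joint frequency equals the product of the marginals and D is zero; when the minor alleles co-occur more often than chance, D is positive.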
Population Sequencing – 1000 Genomes Project The 1000 Genomes Project Consortium. Nature 467, 1061-1073 (2010). doi:10.1038/nature09534
Association Studies Control Disease
Fixation, Positive & Negative Selection • How can we detect negative selection? • How can we detect positive selection? [Figure panels: Negative Selection, Neutral Drift, Positive Selection]
Conservation and Human SNPs • CNSs have fewer SNPs • SNPs in CNSs have shifted allele frequency spectra [Figure panels: Neutral, CNS]
How can we detect positive selection? • Ka/Ks ratio: Ratio of nonsynonymous to synonymous substitutions • Detects very old, persistent, strong positive selection for a protein that keeps adapting • Examples: immune response, spermatogenesis
Long Haplotypes – iHS test • Less time: • Fewer mutations • Fewer recombinations
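To make the nonsynonymous/synonymous distinction concrete, here is a simplified Python sketch that classifies codon differences between two aligned coding sequences using the standard genetic code. It returns raw counts only: a proper Ka/Ks estimate also normalizes by the number of nonsynonymous and synonymous sites, which is omitted here. Codons differing at more than one position are skipped, and the example sequences are invented.

```python
# Standard genetic code, built from the usual TCAG ordering of bases.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def count_substitutions(seq1, seq2):
    """Return (nonsynonymous, synonymous) counts of single-base codon differences.

    Simplification: codons that differ at two or three positions are ignored,
    and no per-site normalization is done, so this is not a full Ka/Ks estimate.
    """
    if len(seq1) != len(seq2) or len(seq1) % 3 != 0:
        raise ValueError("sequences must be aligned, equal length, whole codons")
    nonsyn = syn = 0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        if sum(x != y for x, y in zip(c1, c2)) != 1:
            continue
        if CODON_TABLE[c1] == CODON_TABLE[c2]:
            syn += 1       # same amino acid: synonymous change
        else:
            nonsyn += 1    # different amino acid: nonsynonymous change
    return nonsyn, syn

# Hypothetical aligned sequences: GCT->GCC keeps Ala, AAA->AGA changes Lys to Arg.
print(count_substitutions("ATGGCTAAA", "ATGGCCAGA"))
```

An excess of nonsynonymous over synonymous changes, after normalization, is the signal the Ka/Ks test looks for.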