
010101100010010100001010101010011011100110001100101000100101

ACGTTTGACTGAGGAGTTTACGGGAGCAAAGCGGCGTCATTGCTATTCGTATCTGTTTAG. 010101100010010100001010101010011011100110001100101000100101. Welcome to CS374! A survey of computer science in genomics today. CS374 – Course Goals. Survey of current research in computational genomics

sasson

Presentation Transcript


  1. ACGTTTGACTGAGGAGTTTACGGGAGCAAAGCGGCGTCATTGCTATTCGTATCTGTTTAG 010101100010010100001010101010011011100110001100101000100101 Welcome to CS374! A survey of computer science in genomics today

  2. CS374 – Course Goals • Survey of current research in computational genomics • Practice giving a stellar presentation • Practice reading literature

  3. CS374 – Course Requirements • Presentation • Critique of one topic • Summaries of two topics • Class attendance

  4. ACGTTTGACTGAGGAGTTTACGGGAGCAAAGCGGCGTCATTGCTATTCGTATCTGTTTAG 010101100010010100001010101010011011100110001100101000100101 Introduction: DNA sequencing

  5. DNA – what is a genome? • DNA, ~3×10^9 base pairs long in humans • Contains ~22,000 genes • Diagram labels: transcription, messenger RNA (A G C G A C U G), RNA folding, translation, folding

  6. Human Genome Project • 1990: Start • 2000: Bill Clinton: “most important scientific discovery in the 20th century” • 2001: Draft • 2003: Finished • 3 billion basepairs, $3 billion • Now what?

  7. There is never “enough” sequencing • Somatic mutations (e.g., HIV, cancer) • 100 million species • Sequencing is a functional assay • 7 billion individuals

  8. Sequencing Growth Cost of one human genome • 2004: $30,000,000 • 2008: $100,000 • 2010: $10,000 • 2011: $4,000 (today) • 2012-13: $1,000 • ???: $300 How much would you pay for a smartphone?

  9. DNA Sequencing – Gel Electrophoresis • “Ancient” method, used for the human genome • Start at primer (restriction site) • Grow DNA chain • Include dideoxynucleosides (modified A, C, G, T) • Stops the reaction at all possible points • Separate the products by length, using gel electrophoresis
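The chain-termination idea on slide 9 can be sketched as a toy simulation. This is a minimal sketch, not the laboratory protocol: the function names, the per-base stop probability `p_stop`, and the molecule count are assumptions for illustration, and the fragments are written in the template's own letters rather than as the complementary strand.

```python
import random


def sanger_fragments(template, n_molecules=2000, p_stop=0.1, seed=0):
    """Simulate chain termination on many copies of one template.

    Each growing chain may incorporate a terminating base at any position.
    Returns a dict mapping fragment length -> terminal base.
    """
    rng = random.Random(seed)
    fragments = {}
    for _ in range(n_molecules):
        for i, base in enumerate(template):
            if rng.random() < p_stop:
                # Chain stops here; the fragment is i + 1 bases long.
                fragments[i + 1] = base
                break
    return fragments


def read_sequence(fragments):
    """Sort fragments by length (the gel step) and read terminal bases."""
    return "".join(fragments[length] for length in sorted(fragments))


template = "ACGTTTGACT"
recovered = read_sequence(sanger_fragments(template))
```

With enough molecules every fragment length is represented, so sorting by length and reading the last base of each fragment reproduces the template.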

  10. DNA Sequencing - Illumina

  11. Uses of Genomes • Medicine • Mendelian diseases • Cancer • Drug dosage (e.g., warfarin) • Disease risk • Diagnosis of infections • … • Ancestry • Genealogy • Nutrition? • Psychology? • Baby engineering???

  12. Ethical Issues GINA: • Genetic information cannot be used by insurance & employers • Covers relatives up to 4th degree • Excludes life & disability insurance • Overdiagnosis • Bad news you’d rather not find out • Paternity testing • Genetic engineering of babies? • …

  13. How soon will we all be sequenced? • Cost • Killer apps • Roadblocks? • Figure labels: Applications, Cost, Time; 2013? 2018?

  14. ACGTTTGACTGAGGAGTTTACGGGAGCAAAGCGGCGTCATTGCTATTCGTATCTGTTTAG 010101100010010100001010101010011011100110001100101000100101 Introduction: Human Population Genomics

  15. The Hominid Lineage

  16. Human population migrations • Out of Africa, Replacement • Single mother of all humans (Eve) ~150,000 years ago • Single father of all humans (Adam) ~70,000 years ago • Humans out of Africa ~50,000 years ago replaced others (e.g., Neanderthals) • Multiregional Evolution • Generally debunked; however, ~5% of the human genome in Europeans and Asians is of Neanderthal or Denisovan origin

  17. Coalescence Y-chromosome coalescence

  18. Why humans are so similar • A small population that interbred reduced the genetic variation • Out of Africa ~50,000 years ago

  19. Migration of Humans

  20. Migration of Humans http://info.med.yale.edu/genetics/kkidd/point.html

  21. Migration of Humans http://info.med.yale.edu/genetics/kkidd/point.html

  22. Some Key Definitions
      Name: sequence, genotype (Mom/Dad)
      Mary: AGCCCGTACG  G/G
      John: AGCCCGTACG  G/G
      Josh: AGCCCGTACG  G/T
      Kate: AGCCCGTACG  G/G
      Pete: AGCCCGTACG  G/G
      Anne: AGCCCGTACG  G/G
      Mimi: AGCCCGTACG  G/G
      Mike: AGCCCTTACG  T/T
      Olga: AGCCCTTACG  T/G
      Tony: AGCCCTTACG  T/G
      • Alleles: G, T • Major allele: G • Minor allele: T
      • Recombinations: at least 1 per chromosome; on average ~1 per 100 Mb
      • Heterozygosity: Prob[2 alleles picked at random with replacement are different] • 2 × 0.75 × 0.25 = 0.375 • H = 4Nu/(1+4Nu)
      • Linkage Disequilibrium: the degree of correlation between two SNP locations
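The heterozygosity arithmetic on slide 22 can be checked with a short sketch. The genotype list is taken from the slide; the function names are hypothetical, and reading N as population size and u as per-site mutation rate in H = 4Nu/(1+4Nu) is an assumption, since the slide does not define them.

```python
from collections import Counter

# Genotypes for the ten individuals listed on the slide.
genotypes = ["G/G", "G/G", "G/T", "G/G", "G/G",
             "G/G", "G/G", "T/T", "T/G", "T/G"]


def allele_frequencies(genotypes):
    """Count every allele across all genotypes and normalize."""
    counts = Counter(a for g in genotypes for a in g.split("/"))
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def heterozygosity(freqs):
    """Probability that two alleles drawn with replacement differ."""
    return 1.0 - sum(p * p for p in freqs.values())


def neutral_heterozygosity(n, u):
    """Slide formula H = 4Nu / (1 + 4Nu); n and u are assumed meanings."""
    theta = 4 * n * u
    return theta / (1 + theta)


freqs = allele_frequencies(genotypes)   # G: 15 of 20 alleles, T: 5 of 20
H = heterozygosity(freqs)               # 1 - (0.75**2 + 0.25**2)
```

For two alleles, `1 - sum(p**2)` equals `2 * p * q`, which is the 2 × 0.75 × 0.25 product shown on the slide.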

  23. Human Genome Variation • SNP • Novel Sequence • Mobile Element or Pseudogene Insertion • Inversion • Translocation • Tandem Duplication • Microdeletion • Transposition • Novel Sequence at Breakpoint • Large Deletion • Example sequences in figure: TGCTGAGA, TGCCGAGA, TGCTCGGAGA, TGC---GAGA, TGC--AGA, TGCCGAGA

  24. The Fall in Heterozygosity • $F_{ST} = \frac{H - H_{POP}}{H}$
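The FST ratio on slide 24 can be illustrated for a single biallelic site. This is a sketch under stated assumptions: H is taken to be the heterozygosity of the pooled population, H_POP the average heterozygosity within subpopulations, and the subpopulations are assumed to be equally sized. The slide itself does not spell these out, and the function names are hypothetical.

```python
def het(p):
    """Heterozygosity 2p(1-p) at a biallelic site with allele frequency p."""
    return 2 * p * (1 - p)


def fst(sub_freqs):
    """F_ST = (H - H_POP) / H for equally sized subpopulations.

    sub_freqs: allele frequency of the same allele in each subpopulation.
    """
    p_bar = sum(sub_freqs) / len(sub_freqs)
    h_total = het(p_bar)                                       # H: pooled
    h_pop = sum(het(p) for p in sub_freqs) / len(sub_freqs)    # H_POP
    if h_total == 0:
        return 0.0  # site is not variable at all; no differentiation
    return (h_total - h_pop) / h_total


same = fst([0.5, 0.5])        # identical subpopulations
fixed = fst([1.0, 0.0])       # each subpopulation fixed for a different allele
partial = fst([0.75, 0.25])   # intermediate differentiation
```

Identical subpopulations give 0 and subpopulations fixed for different alleles give 1, which matches the reading of FST as the fraction of heterozygosity lost to population structure.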

  25. The HapMap Project
      Code  Population                                   Samples
      ASW   African ancestry in Southwest USA            90
      CEU   Northern and Western Europeans (Utah)        180
      CHB   Han Chinese in Beijing, China                90
      CHD   Chinese in Metropolitan Denver               100
      GIH   Gujarati Indians in Houston, Texas           100
      JPT   Japanese in Tokyo, Japan                     91
      LWK   Luhya in Webuye, Kenya                       100
      MXL   Mexican ancestry in Los Angeles              90
      MKK   Maasai in Kinyawa, Kenya                     180
      TSI   Toscani in Italia                            100
      YRI   Yoruba in Ibadan, Nigeria                    100
      Genotyping: probe a limited number (~1M) of known highly variable positions of the human genome

  26. Linkage Disequilibrium & Haplotype Blocks • Minor alleles: A, G, with frequencies $p_A$, $p_G$ • Linkage Disequilibrium (LD): $D = P(A \text{ and } G) - p_A p_G$
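The LD coefficient on slide 26, D = P(A and G) - pA·pG, can be computed directly from a list of two-site haplotypes. This is a minimal sketch: the haplotype strings and the alternative alleles C and T are made-up illustration data, and the function name is hypothetical.

```python
def linkage_disequilibrium(haplotypes, allele1="A", allele2="G"):
    """D = P(allele1 and allele2) - p(allele1) * p(allele2).

    haplotypes: two-character strings; character 0 is the allele at the
    first SNP and character 1 is the allele at the second SNP.
    """
    n = len(haplotypes)
    p_a = sum(h[0] == allele1 for h in haplotypes) / n
    p_g = sum(h[1] == allele2 for h in haplotypes) / n
    p_ag = sum(h[0] == allele1 and h[1] == allele2 for h in haplotypes) / n
    return p_ag - p_a * p_g


# A always travels with G: the two sites are perfectly correlated.
linked = linkage_disequilibrium(["AG", "AG", "CT", "CT"])

# Every combination is equally common: the sites are independent.
unlinked = linkage_disequilibrium(["AG", "AT", "CG", "CT"])
```

D is zero when the two sites are statistically independent and moves away from zero as the alleles co-occur more (or less) often than their frequencies predict.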

  27. Population Sequencing – 1000 Genomes Project • The 1000 Genomes Project Consortium et al. Nature 467, 1061-1073 (2010) doi:10.1038/nature09534

  28. The Cancer Genomes Atlas – TCGA

  29. Association Studies Control Disease

  30. Global Ancestry Inference

  31. Fixation, Positive & Negative Selection • How can we detect negative selection? • How can we detect positive selection? • Figure labels: Negative Selection, Neutral Drift, Positive Selection

  32. Conservation and Human SNPs • CNSs have fewer SNPs • SNPs have shifted allele frequency spectra • Figure labels: Neutral, CNS

  33. How can we detect positive selection? • Ka/Ks ratio: ratio of nonsynonymous to synonymous substitutions • Very old, persistent, strong positive selection for a protein that keeps adapting • Examples: immune response, spermatogenesis
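The Ka/Ks ratio on slide 33 can be sketched from pre-counted substitutions. This is a simplified illustration, not a full estimator: it assumes the substitution and site counts are already known, normalizes each count per site (the usual convention, though the slide does not state it), and applies no correction for multiple substitutions at the same site. The function name and the example counts are hypothetical.

```python
def ka_ks(nonsyn_subs, nonsyn_sites, syn_subs, syn_sites):
    """Ratio of nonsynonymous to synonymous substitution rates.

    Ka = nonsynonymous substitutions per nonsynonymous site
    Ks = synonymous substitutions per synonymous site
    """
    if syn_subs == 0:
        raise ValueError("Ks is zero; the ratio is undefined")
    ka = nonsyn_subs / nonsyn_sites
    ks = syn_subs / syn_sites
    return ka / ks


# Made-up counts for illustration only.
ratio = ka_ks(nonsyn_subs=30, nonsyn_sites=300, syn_subs=5, syn_sites=100)
```

A ratio above 1 means amino-acid-changing substitutions accumulate faster than silent ones, which is the signature of positive selection the slide refers to; a ratio well below 1 points to negative selection.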

  34. How can we detect positive selection?

  35. Long Haplotypes – iHS test • Less time: • Fewer mutations • Fewer recombinations
