
Genomics: Looking at Life in New Ways

Genomics: Looking at Life in New Ways. Mark D. Adams Department of Genetics Center for Computational Genomics Center for Human Genetics. Genome publications Feb/2001 ~30,000 genes, 3 million SNPs. Computing the Genome - Assembly. Mask heterochromatin and ribo-DNA,


Presentation Transcript


  1. Genomics: Looking at Life in New Ways Mark D. Adams Department of Genetics Center for Computational Genomics Center for Human Genetics

  2. Genome publications Feb/2001 • ~30,000 genes, 3 million SNPs

  3. Computing the Genome - Assembly
     • Screener (8:37): Mask heterochromatin and ribo-DNA; tag known interspersed repeats.
     • Overlapper (86:25): Find all overlaps ≥ 40 bp allowing 6% mismatch. (1000X Blast)
     • ASSEMBLER CORE:
       • Unitiger (38:29): Compute all consistent sub-assemblies = unitigs. Identify those that cover unique DNA = U-unitigs.
       • Scaffolder (4:12): Scaffold U-unitigs with confirmed shorts & longs, then with BAC ends.
       • Repeat Rez I, II, III (5:44+4:21+19:53): Fill repeat gaps with:
         I. Doubly anchored mates
         II. O-path confirmed singly-anchored mates
         III. Greedy path completion using QVs
     • Consensus (~25): Bayesian “SNP” consensus using quality values. Occurs throughout assembler core.
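
The Overlapper criterion on this slide (overlaps of at least 40 bp with up to 6% mismatch) can be illustrated with a small sketch. This is not the assembler's actual algorithm: it is a brute-force suffix-prefix comparison that counts substitutions only and ignores insertions and deletions. The function name `suffix_prefix_overlap` and the simulated reads are assumptions made for illustration.

```python
import random


def suffix_prefix_overlap(a, b, min_len=40, max_mismatch=0.06):
    """Length of the longest suffix of `a` that matches a prefix of `b`.

    A candidate overlap is accepted when it is at least `min_len` bases long
    and its fraction of mismatching positions is at most `max_mismatch`.
    Returns 0 when no candidate qualifies. Substitutions only (no indels).
    """
    longest = min(len(a), len(b))
    # Try the longest candidate first so the first hit is the longest overlap.
    for length in range(longest, min_len - 1, -1):
        suffix = a[-length:]
        prefix = b[:length]
        mismatches = sum(1 for x, y in zip(suffix, prefix) if x != y)
        if mismatches <= max_mismatch * length:
            return length
    return 0


# Hypothetical data: two 100 bp reads sampled 50 bp apart from a simulated sequence.
rng = random.Random(0)
genome = "".join(rng.choice("ACGT") for _ in range(200))
read_a, read_b = genome[:100], genome[50:150]
overlap = suffix_prefix_overlap(read_a, read_b)
print(overlap)
```

A production overlapper would use seeding or indexing rather than comparing every read pair at every offset; the sketch only makes the acceptance rule concrete.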

  4. Computing the Genome - Analysis • Gene Prediction • Repeat Elements • Large-scale structure • Did a genome-wide duplication occur in the evolution of the human genome?

  5. After the genome…. • Define ‘finished’…. • Challenging regions • Centromeres • Annotation of genes (protein-coding and non-protein-coding) • Annotation of non-genic functional elements • More Genomes! • Identification of functionally important regions through analysis of conservation through evolution • ‘Comprehensive’ parts list • High-throughput mentality

  6. Protein Structure Prediction/Comparison
     F28C12.5  ------------MSQLTAEELDSQKCASEGLT-SVLTSITMKFNFLFITTVILLSYC-FT
     F28C12.7  ------------MNKTAEDLLDSLKCASKDLS-SALTSVTIKFNCIFISTIVLISYC-FI
     T06G6.1   ------------MNKTAEELLDSLKCASDGLA-SALTSVTLKFNCAFISTIVLISYC-FS
     F28C12.2  ------------MNKTAEELD-SRNCASESLT-NALISITMKFNFIFIITVVLISYC-FT
     F28C12.3  ------------MNKTAEELLDSRKCASEGLT-NALTSFMMKMNFSFIVT----------
     F28C12.4  ------------MNKTAEELVESLRCASEGLT-NALTSITVKVSFVFLATVILLSYY-FA
     T06G6.2   ------------MNKTAEEIVESRRCASEGLT-NALTSITVKMSSVLVVTVILLSYY-FA
     F28C12.1  --------------MNQTELLESLKCASEGMV-KAMTSTTMKLNFVFIATVIFLSFY-FA
     T26E3.9   ----------------MNELIDGPKCASEGIV-NAMTSIPVKISFLIIATVIFLSFY-FA
     F18C5.6   ---------------------MSSECARSDVH-NVLTSDSMKFNHCFIISIIIISFF-TT
     F18C5.8   ------------------MENLNPACASEDVK-NALTSPIMMLSHGFILMIIVVSFI-TT
     AH6.7     --------------------MSSQKCASHLEI-ARLESLNFKISQLIYFVLIITTLF-FT
     AH6.11    --------------------MSAPNCARKYDI-ARLSSLNFQISQYVYLSLISLTFI-FS
     AH6.8     --------------------MSLTKCASKLEI-DRLISLNFRINQIIVLIPVFITFI-FT
     AH6.14    --------------------MATIACASIIEQ-QRLRSSNFVIAQYIDLLCIVITFV-TT
     1B0B 1O1O

  7. Systems Biology • DNA • Protein • Pathway/Partners • Cell • Organ/Tissue • Organism • Measurement • Variation/Stimulus

  8. Systems Biology • Causality • Complexity • Coordination • Robustness • Resilience • Systems Theory: “The study of organization and behavior per se” (Wolkenhauer, Brief. Bioinform. 2:258, 2001)

  9. Outline • Functional variation in the human genome • Extent of common protein variation • Genes that have evolved faster in human lineage • Mouse models of complex disease • Use of natural variation to infer a model of normal heart function

  10. Aren’t there enough SNPs already? Yes! No! Yes! No! • Depends on disease mapping strategy [Diagram labels: infer; direct; Disease-causing allele; Genetic Marker] • Deficiency of missense SNPs (Risch 2000. Nature 405:847.)

  11. Identifying Common Sites of Variation March, 2001 <6,500 missense SNPs in 3,500 of 10,000 RefSeq genes

  12. Identifying Common Sites of Variation SNP Discovery in: • 20 Female Caucasians • 19 Female African-Americans • 1 Male chimpanzee

  13. Why 39 people?

  14. Re-sequencing Workflow
     • Primer design
       • Unique primers are designed around coding exons and human-mouse conserved segments in the 1 kbp upstream of the transcript
       • Splice sites should be sequenced most of the time
       • [Diagram labels: 5’ UTR; Conserved Regions with TF binding sites; coding exons]
     • Amplification & Sequencing
       • Re-arrayed primer and DNA plates are mixed to generate PCR and sequencing plates
       • Both strands are sequenced using the M13 tails on the primers
     • SNP detection
       • Polyphred analysis → SNP scoring by expert system → Manual QA
     • SNP annotation
       • SNPs are mapped to the Celera reference genome and annotated with regard to gene location, mutation type, allele frequency, genotypes…

  15. Data Source: Human and Chimp 25K genes 23K genes 20K genes

  16. Summary of SNPs found • >18 million lanes run (compare to 36 million for shotgun sequencing human genome) • 23,363 genes assayed from 30,115 in the genome • 265,978 Total SNPs • ~75% are novel • 36,900 missense SNPs • Doubled the number that were previously known

  17. Why are we different from chimpanzees? Proteins are 97-100% identical (King and Wilson, Science 188:107-116 (1975)) • The differing 1-3% is important • The important differences are in gene regulation • A small number of genes of divergent function with a disproportionate impact

  18. Goal • Identify genes that have shaped a particular species • Identify human genes that may be more likely to be involved in human disease [Tree figure: Random drift; taxa: mouse, human, chimp; divergence times: 4.6 – 6.2 MY, 112 MY]

  19. Goal • Identify genes that have shaped a particular species • Identify human genes that may be more likely to be involved in human disease [Tree figure: Natural Selection; taxa: mouse, human, chimp; divergence times: 4.6 – 6.2 MY, 112 MY]

  20. Metric
     • dN – Non-synonymous substitution rate
       • Nucleotide differences that CHANGE the amino acid sequence in orthologous proteins: CGC (Arg) → GGC (Gly)
     • dS – Synonymous substitution rate
       • Nucleotide changes that do not change the amino acid: CGC (Arg) → CGG (Arg)
     • dN/dS Ratio
       • dN = dS indicates neutral change
       • dN/dS < 1 indicates constraint/negative selection
       • dN/dS > 1 indicates possible positive selection
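
The two codon examples on this slide can be checked mechanically. The sketch below builds the standard genetic code (NCBI translation table 1, with bases in TCAG order) and labels a single-codon change as synonymous or non-synonymous. The function name `classify_substitution` is an assumption for illustration. Counting codon changes this way is only the first ingredient of dN and dS: real estimates also normalize by the number of synonymous and non-synonymous sites and correct for multiple hits.

```python
from itertools import product

# Standard genetic code, codons enumerated with the third base varying fastest
# over the base order T, C, A, G. "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(codon_a, codon_b):
    """Label the change between two codons by its effect on the amino acid."""
    if codon_a == codon_b:
        return "identical"
    if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
        return "synonymous"
    return "non-synonymous"


print(classify_substitution("CGC", "GGC"))  # Arg to Gly
print(classify_substitution("CGC", "CGG"))  # Arg to Arg
```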

  21. Caveats • Low dS causes problems • Divide by ~0 problem • Must match true orthologs • Paralogous genes are subject to differing evolutionary pressures • Annotation and alignment must be correct
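
The "divide by ~0" caveat can be made concrete with a guarded ratio. In this sketch the cutoff `min_ds` is a hypothetical threshold, not a value taken from the talk, and the helper names are assumptions. The interpretation strings follow the wording of the Metric slide.

```python
def dn_ds_ratio(dn, ds, min_ds=0.01):
    """Return dN/dS, or None when dS is too small to give a stable ratio.

    `min_ds` is an illustrative cutoff chosen for this sketch.
    """
    if ds < min_ds:
        return None
    return dn / ds


def interpret(ratio, tolerance=1e-9):
    """Map a dN/dS ratio onto the categories used on the Metric slide."""
    if ratio is None:
        return "undefined (dS too low)"
    if abs(ratio - 1.0) <= tolerance:
        return "neutral change"
    if ratio > 1.0:
        return "possible positive selection"
    return "constraint/negative selection"


# Hypothetical rates, for illustration only.
for dn, ds in [(0.01, 0.05), (0.06, 0.03), (0.002, 0.0)]:
    print(dn, ds, interpret(dn_ds_ratio(dn, ds)))
```

Returning `None` instead of a very large number keeps genes with almost no synonymous change from dominating a ranked list of high-ratio genes.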

  22. [Flowchart labels] List of human genes; Human gene; Chimp traces; Determine mouse ortholog; Determine coding sequence; Build chimp transcript; Build mouse transcript; Determine what was “covered”; Align to human; Align to human; QC alignment; QC alignment; Chimp Gene Passes; Mouse Gene Passes; Alignment files (2 or 3 species); Analysis

  23. Data Set HUMAN CHIMP 7,645 coding sequence alignments MOUSE ORTHOLOG

  24. Evidence Distribution 7645 MH Orthologs Evidence • Tblastx (+/-) • Syntenic anchor (+/-) • Syntenic block (+) • Shared protein family (+/-/0)

  25. Selected human chromosomes and their mouse orthologs

  26. Nonsynonymous and synonymous divergence: human-chimp

  27. Nonsynonymous and synonymous divergence: human-mouse

  28. Nonsynonymous and synonymous divergence

  29. Correlation between dN and dS

  30. Method • Generate three-species (human-chimp-mouse) coding sequence alignments • Apply models of sequence divergence • Identify genes that violate null hypothesis [Tree figures: “Null hypothesis” and “Gene with accelerated evolution on the human branch”; taxa: mouse, human, chimp; divergence times: 4.6 – 6.2 MY, 112 MY]

  31. Yang and Nielsen Evolutionary Model • Allows variation in the dN/dS ratio among lineages and among sites at the same time • Tests what is more likely: • all sites are either neutral (dN/dS =1) or evolve under negative selection (dN/dS < 1) • some sites are evolving under positive selection in the human (or chimp) lineage only Adapted from Mol. Biol. Evol. 19:908, 2002

  32. Evolutionary Model

  33. List of top 22 human accelerated genes (model 1)

  34. α-Tectorin and hearing [Figure labels: Tectorial membrane; Hair cells] • Protein plays a vital role in the tectorial membrane of the inner ear • Single amino acid polymorphisms are associated with familial high-frequency hearing loss • Knockout mice are deaf

  35. FOXP2 • Molecular evolution of FOXP2, a gene involved in speech and language • Enard, et al. Nature, 418:869, 2002 • “The ability to develop articulate speech relies on capabilities, such as fine control of the larynx and mouth, that are absent in chimpanzees and other great apes” • “FOXP2 seem to be required for acquisition of normal spoken language” • “FOXP2 … has been the target of selection during recent human evolution”

  36. Enrichment of biological processes * = significant in 1 species; ** = significant in 2 species. Model: dN/dS > 1 and >1 non-synonymous substitution, binomial test
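
The slide names a binomial test without giving details, so the following is only a generic sketch of how a one-sided binomial enrichment test works, not the analysis actually run. All counts and the background fraction are hypothetical, and `binomial_upper_tail` is an illustrative helper name.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), computed by direct summation."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical example: 40 genes pass the dN/dS filter, 8 of them belong to a
# category that makes up 5% of all genes tested.
n_selected = 40
k_in_category = 8
background_fraction = 0.05
p_value = binomial_upper_tail(k_in_category, n_selected, background_fraction)
print(p_value)
```

A small tail probability means the category is over-represented among the filtered genes relative to its background share. With many categories tested, some correction for multiple testing would normally be applied.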

  37. Olfaction: human genes → pseudogenes? Blue = pseudogene; Red = gene. Pseudogene status from HORDE: http://bioinformatics.weizmann.ac.il/HORDE/

  38. Over-Representation of Certain Families

  39. Correlation between diversity and divergence

  40. Comparative Genomics

  41. Photo: 1997 Purina Mills Calendar

  42. C57BL/6J and A/J mice: Models for study of the metabolic syndrome
     On a high fat, high sucrose diet:
     Condition               C57BL/6J   A/J
     Obesity                 ✓          X
     Hypertension            ✓          X
     Hyperglycemia           ✓          X
     Hypertriglyceridemia    ✓          X
     Low HDL Cholesterol     ✓          X
     ✓ Indicates that the strain develops the condition
     X Indicates that the strain does not develop the condition

  43. Functional Networks: Computational and genomic synthesis of complex systems from assays of component traits • Attributes • applicable to all kinds of biological traits • here - subtle, naturally-occurring, non-pathologic variation • quantitative and qualitative biological properties • monogenic and polygenic traits • additive and epistatic traits • uses results from all kinds of assays • healthy individuals to learn about normal biological functions • abnormal conditions to learn about disease processes

  44. Perturbation tests
     • Traditional approach
       • Single gene mutations (endogenous challenge)
       • Drug treatments (exogenous challenges)
       • Both establish causal relations
     • But
       • How do we interpret networks derived from perturbations that have dramatic effects?
     • Alternative: Factorial design (after Fisher)
       1. Segregating populations
       2. Reference network based on normal variation
       3. Use to evaluate single gene mutations, modifier genes and drug perturbations
     Nadeau, et al. Genome Research 13:2082, 2003

  45. Heart: Proof-of-concept study
     [Echocardiography figure; labels: transducer, CW, AWRV, RV, SW, Aorta, LV, LA, PW]
     Abbreviations: AWRV - anterior wall, right ventricle; CW - chest wall; LA - left atrium; LV - left ventricle; PW - posterior wall; RV - right ventricle; SW - septal wall

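  46. Echocardiography: Measures and calculations
     [Figure labels: CW, AWRV, RV Cavity, SW, SWTh, LV Cavity, ESD, EDD, PW, PWTh, Time]
     Abbreviations
     • EDD - end diastolic dimension
     • ESD - end systolic dimension
     • PWTh - posterior wall thickness
     • SWTh - septal wall thickness
     • HR = beats per min
     Calculations
     • FS (fractional shortening): $\mathrm{FS} = (\mathrm{EDD} - \mathrm{ESD}) / \mathrm{EDD}$
     • LV mass: $\mathrm{LV\ mass} = 1.06 \times [(\mathrm{EDD} + \mathrm{PWTh} + \mathrm{SWTh})^3 - \mathrm{EDD}^3]$
     • Relative wall thickness: $\mathrm{Th}/r = (\mathrm{PWTh} + \mathrm{SWTh}) / \mathrm{EDD}$
     • SV (stroke volume): $\mathrm{SV} = \mathrm{EDD}^3 - \mathrm{ESD}^3$
     • CO (cardiac output): $\mathrm{CO} = \mathrm{SV} \times \mathrm{HR}$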
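
The calculations on this slide translate directly into code. The sketch below implements them as written and evaluates them on the C57BL/6J mean values reported on the cardiovascular-traits slide (EDD 3.31 mm, ESD 2.01 mm, wall thicknesses 0.49 mm, echo heart rate 433 bpm). The function names are assumptions. Because the inputs are strain means rather than per-animal measurements, the results need not equal the reported trait means, and with dimensions in mm the cubed terms are in mm³.

```python
def fractional_shortening(edd, esd):
    """FS = (EDD - ESD) / EDD, as a fraction (multiply by 100 for percent)."""
    return (edd - esd) / edd


def lv_mass(edd, pwth, swth):
    """LV mass = 1.06 x [(EDD + PWTh + SWTh)^3 - EDD^3]."""
    return 1.06 * ((edd + pwth + swth) ** 3 - edd**3)


def relative_wall_thickness(edd, pwth, swth):
    """Th/r = (PWTh + SWTh) / EDD."""
    return (pwth + swth) / edd


def stroke_volume(edd, esd):
    """SV = EDD^3 - ESD^3."""
    return edd**3 - esd**3


def cardiac_output(edd, esd, hr):
    """CO = SV x HR."""
    return stroke_volume(edd, esd) * hr


# C57BL/6J strain means from the summary slide (mm and beats per minute).
edd, esd, pwth, swth, hr = 3.31, 2.01, 0.49, 0.49, 433

fs = fractional_shortening(edd, esd)
mass = lv_mass(edd, pwth, swth)
rwt = relative_wall_thickness(edd, pwth, swth)
sv = stroke_volume(edd, esd)
co = cardiac_output(edd, esd, hr)
print(f"FS={fs:.3f}  LV mass={mass:.2f}  Th/r={rwt:.3f}  SV={sv:.2f}  CO={co:.0f}")
```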

  47. Summary of cardiovascular traits
     Trait                      C57BL/6J       A/J
     LV mass (mg)               46.2 ± 14.1    32.7 ± 11.5   *
     LV EDD (mm)                3.31 ± 0.42    2.83 ± 0.31   *
     LV ESD (mm)                2.01 ± 0.32    1.49 ± 0.25   *
     Exercise time (min)        9.6 ± 3.4      4.4 ± 1.9     *
     LV frac. shortening (%)    39.1 ± 6.2     47.1 ± 6.9    *
     Vcf (s^-1)                 8.8 ± 1.9      11.7 ± 2.6    *
     SW Th (mm)                 0.49 ± 0.06    0.47 ± 0.07
     PW Th (mm)                 0.49 ± 0.05    0.45 ± 0.08
     LV mass / BW (mg/g)        1.96 ± 0.38    1.54 ± 0.43
     Rel wall thickness         0.30 ± 0.04    0.32 ± 0.04
     HR (echo; bpm)             433 ± 55       524 ± 45
     HR (tail cuff; bpm)        615 ± 79       694 ± 75
     Systolic BP (mm Hg)        122 ± 13       123 ± 20.8
     Cardiac output (ml/min)    0.58 ± 0.19    0.50 ± 0.17
     • These strains were not constructed to differ in CV functions
     • Perturbations: subtle, naturally-occurring, non-pathologic
     • B6: ‘athlete’s heart’, physiologic hypertrophy, exercise endurance
     • Alternative genetic solutions to the same cardiovascular problem

  48. Randomizing genomes in recombinant inbred strains
     [Figure: chromosomes Chr 1 through Chr 7 … Chr X shown for each strain]
     Strain        A/J    B6     AXB1   AXB2   AXB3   BXA30
     Ht rate:      680    590    691    585    597    666
     Exer time:    233    582    540    597    241    255
     Key features
     • Probability of coincidental match for 2 strains: 0.50 (50% chance of fixing A or B allele).
     • Probability of coincidental match for 30 strains: $(0.50)^{29} < 2 \times 10^{-9}$ !!!
     (These results apply to a single-gene trait; probabilities are lower for polygenic traits)
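
The coincidental-match figure on this slide is a one-line calculation. The sketch below mirrors the slide's arithmetic, in which each strain beyond the first has a 0.50 chance of matching, giving an exponent of one less than the number of strains. The function name is an assumption.

```python
def coincidental_match_probability(n_strains, p_match=0.50):
    """Chance that all strains match by coincidence, following the slide:
    p_match raised to the power (n_strains - 1)."""
    return p_match ** (n_strains - 1)


print(coincidental_match_probability(2))   # 0.5
print(coincidental_match_probability(30))  # below 2e-9
```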

  49. Methods: building functional networks
     1. Type traits: data matrix of strains with randomized genetics (S1, S2, S3, …, Sn) by traits (T1, T2, T3, …, Tn)
     2. Estimate cosegregation: trait-by-trait matrix of pairwise correlations (r12, r13, r23, r14, r24, r34, …, r1n)
     2a. Cluster analysis
     2b. Identify significant relations: matrix keeps signed entries (e.g. +r12, +r14, +r24, -r34) and shows -- elsewhere
     3. Identify networks: graph linking Trait 1, Trait 2, Trait 3, Trait 4, …, Trait n
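
Steps 1, 2 and 2b on this slide can be sketched as follows: start from a trait-by-strain table, compute Pearson correlations for every pair of traits, and keep the strong ones as network edges. The slide does not say how significance was assessed, so this sketch swaps in a simple cutoff on |r| (`min_abs_r`, a hypothetical value) in place of a statistical test. The trait values are made up for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def trait_network(traits, min_abs_r=0.8):
    """Return (trait_a, trait_b, r) edges whose |r| meets the cutoff.

    `traits` maps a trait name to its values across strains, in the same
    strain order for every trait.
    """
    edges = []
    for (name_a, xs), (name_b, ys) in combinations(traits.items(), 2):
        r = pearson_r(xs, ys)
        if abs(r) >= min_abs_r:
            edges.append((name_a, name_b, round(r, 2)))
    return edges


# Hypothetical trait values measured in four strains (S1..S4).
traits = {
    "T1": [1.0, 2.0, 3.0, 4.0],
    "T2": [2.0, 4.0, 6.0, 8.0],
    "T3": [4.0, 3.0, 2.0, 1.0],
    "T4": [1.0, -1.0, 1.0, -1.0],
}
edges = trait_network(traits)
for edge in edges:
    print(edge)
```

The sign of each retained correlation corresponds to the + and - entries in the slide's matrix, and the edge list is the input to the network-drawing step.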

  50. Segregation of CV traits in AXB / BXA RI strains • multigenic variation • positive cosegregation ($r = 0.88$, $r^2 = 0.77$) • transgressive variation (traits exceeding parental values)
