
Exploring Human Population Genomics: Insights on Evolutionary Forces and Chromosomal Duplications

Delve into the intricate world of human population genomics, covering topics such as population bottlenecks, allele dynamics, Y chromosome evolution, and segmental duplications. Discover the forces shaping genetic variation and methods for mapping genomic DNA, alongside insights into chromosomal disorders and cancer development associated with duplications.

edwardj




Presentation Transcript


  1. Human Population Genomics Man,Woman, Birth,  Death,  Infinity,Plus  Altruism,  Cheap Talks,  Bad Behavior,¥ Money, God and  Diversity on Steroids

  2. Jack Schwartz (1930 – 2009)

  3. Lord Jeffrey (misattributed; badly paraphrased) “Damn the Human Genomes. Small populations; Genes too distant; Pestered with duplications; Feeble contrivance; Could make a better one myself!”

  4. Small Populations • Non-equilibrium models • Population bottlenecks • Not well-mixed • Migration/colonization patterns • Catastrophic infections • Heterozygous advantages

  5. Wright-Fisher Process [Figure: N individuals per generation; a mutation turns an ancestral allele into a derived allele, which may be lost by drift (“Derived allele extinction!”). From Mickey's coalescent talk.]
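
The Wright-Fisher process can be sketched as binomial resampling of the derived-allele count each generation. This is a minimal simulation under stated assumptions (haploid population of fixed size N, neutral allele, no further mutation); the function name `wright_fisher` is hypothetical. Under these assumptions a single new derived allele fixes with probability 1/N and is lost otherwise.

```python
import random

def wright_fisher(n: int, initial_derived: int = 1, max_generations: int = 100_000,
                  seed: int = 0) -> tuple[int, int]:
    """Track one neutral derived allele under Wright-Fisher drift (hypothetical helper).

    Each generation all n haploid individuals are replaced; every offspring picks
    a parent uniformly at random, so the new derived count is Binomial(n, p).
    Returns (final_derived_count, generations_elapsed).
    """
    rng = random.Random(seed)
    count = initial_derived
    for generation in range(1, max_generations + 1):
        p = count / n
        # Binomial(n, p) drawn as n Bernoulli trials, using only the stdlib.
        count = sum(1 for _ in range(n) if rng.random() < p)
        if count == 0 or count == n:   # extinction or fixation
            return count, generation
    return count, max_generations
```

Repeating the run over many seeds and counting how often the final count equals N gives an empirical fixation probability close to 1/N.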

  6. Moran Process [Figure: death events along a time axis] • Overlapping generations • Distribution of time to replication
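
For contrast with Wright-Fisher, a discrete-step Moran process replaces one individual at a time, so generations overlap. In this sketch (assumed: neutral allele, fixed population size n; `moran` is a hypothetical name) one individual is chosen to reproduce and one to die at each step, so every individual dies with probability 1/n per step and lifetimes are geometrically distributed.

```python
import random

def moran(n: int, initial_derived: int = 1, max_steps: int = 10_000_000,
          seed: int = 0) -> tuple[int, int]:
    """Neutral Moran process (hypothetical helper).

    At each step one individual (uniform) reproduces and one individual
    (uniform, possibly the parent) dies, so the derived count moves by at most 1.
    Returns (final_derived_count, steps_elapsed); n steps ~ one generation.
    """
    rng = random.Random(seed)
    count = initial_derived
    for step in range(1, max_steps + 1):
        birth_derived = rng.random() < count / n   # the parent carries the derived allele
        death_derived = rng.random() < count / n   # the individual that dies carries it
        count += int(birth_derived) - int(death_derived)
        if count == 0 or count == n:
            return count, step
    return count, max_steps
```

The fixation probability of a single new neutral allele is again 1/n; only the time scale differs (roughly n Moran steps per Wright-Fisher generation).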

  7. Forces in Population Genetics • How to understand forces that produce and maintain inherited genetic variation • Forces • Mutation • Recombination • Natural Selection • Population Structure/Migration • Random birth/death (drift)

  8. Genes Too Distant • About 20,000 genes (1980s estimates: 120,000) • Occurring about every 150 kb • Many more functional ncRNAs (snoRNA, siRNA, piRNA, etc.) • Uncharacterized
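
The "every 150 kb" figure follows from dividing genome size by gene count, assuming a haploid human genome of roughly 3.1 × 10^9 bp; the same division with the older 120,000-gene estimate gives a much denser spacing:

```latex
\frac{3.1\times 10^{9}\ \text{bp}}{2\times 10^{4}\ \text{genes}} \approx 1.55\times 10^{5}\ \text{bp} \approx 150\ \text{kb per gene},
\qquad
\frac{3.1\times 10^{9}\ \text{bp}}{1.2\times 10^{5}\ \text{genes}} \approx 26\ \text{kb per gene}
```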

  9. Y • “From a gene’s point of view, reshuffling is a great restorative…” • “The Y, in its solitary state disapproves of such laxity. Apart from small parts near each tip which line up with a shared section of the X, it stands aloof from the great DNA swap. Its genes, such as they are, remain in purdah as the generations succeed. As a result, each Y is a genetic republic, insulated from the outside world. Like most closed societies it becomes both selfish and wasteful. Every lineage evolves an identity of its own which, quite often, collapses under the weight of its own inborn weaknesses.” • “Celibacy has ruined man’s chromosome.” • Steve Jones, Y: The Descent of Men, 2002.

  10. DAZ locus on Y Chromosome

  11. Optical Mapping • Capture and immobilize whole genomes as massive collections of single DNA molecules. • Cells are gently lysed to extract genomic DNA. • DNA is captured in parallel arrays of long single DNA molecules using a microfluidic device. • Genomic DNA is captured as single DNA molecules produced by random breakage of intact chromosomes.

  12. Overlapping single-molecule maps are aligned to produce a map assembly covering an entire chromosome.
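
One simple way to see how two overlapping single-molecule maps can be aligned is an "anchor-and-vote" search: pair each cut site of one map with each cut site of the other, shift, and count sites that agree within a sizing tolerance. This is a hypothetical heuristic for illustration only. It is not the likelihood-based alignment used by real optical-map assemblers, it ignores orientation, and the names `best_overlap` and `tol` are assumptions.

```python
from bisect import bisect_left

def matches(a: list[float], b_shifted: list[float], tol: float) -> int:
    """Count sites in b_shifted lying within tol of some site in sorted map a."""
    count = 0
    for x in b_shifted:
        i = bisect_left(a, x - tol)          # first site of a that is >= x - tol
        if i < len(a) and a[i] <= x + tol:
            count += 1
    return count

def best_overlap(a: list[float], b: list[float], tol: float = 0.5) -> tuple[float, int]:
    """Try every site-to-site anchoring of map b onto map a (positions in kb).

    Returns (offset, score): the shift applied to b and the number of b sites
    matching a site of a under that shift. Ties keep the first offset found.
    """
    best = (0.0, -1)
    for ai in a:
        for bj in b:
            offset = ai - bj
            score = matches(a, [x + offset for x in b], tol)
            if score > best[1]:
                best = (offset, score)
    return best
```

The search is quadratic in the number of sites per anchor, which is fine for a sketch; handling reversed molecules would mean also trying the mirrored map of `b`.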

  13. • Sizing error (Bernoulli labeling, absorption cross-section, PSF) • Partial digestion • False optical sites • Orientation • Spurious molecules, optical chimerism, calibration [Image of a restriction-enzyme-digested YAC clone: YAC clone 6H3, derived from human chromosome 11, digested with the restriction endonucleases EagI and MluI, stained with a fluorochrome and imaged by fluorescence microscopy.]
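
Several of these error sources can be combined in a small generative model of one molecule's observed fragments. The sketch below is hypothetical: it assumes each true cut is seen with probability `digestion_rate` (partial digestion), false optical sites arise as a uniform Poisson process, and sizing noise is Gaussian with variance proportional to fragment size. Orientation, chimerism and calibration errors are not modelled.

```python
import random

def simulate_optical_map(true_sites: list[float], length: float,
                         digestion_rate: float = 0.8, false_site_rate: float = 0.01,
                         sizing_sd: float = 0.05, seed: int = 0) -> list[float]:
    """Return noisy fragment sizes for one molecule (hypothetical error model).

    true_sites must lie in (0, length). Units are arbitrary (e.g. kb).
    """
    rng = random.Random(seed)
    # Partial digestion: each real cut site is observed independently.
    observed = [s for s in true_sites if rng.random() < digestion_rate]
    # False optical sites: Poisson process with exponential gaps.
    x = rng.expovariate(false_site_rate) if false_site_rate > 0 else length
    while x < length:
        observed.append(x)
        x += rng.expovariate(false_site_rate)
    cuts = [0.0] + sorted(observed) + [length]
    sizes = []
    for left, right in zip(cuts, cuts[1:]):
        size = right - left
        # Sizing error: sd grows with the square root of fragment size.
        noisy = rng.gauss(size, sizing_sd * size ** 0.5)
        sizes.append(max(noisy, 0.0))
    return sizes
```

With all error rates set to zero the function returns the exact fragment sizes, which makes it easy to check before turning the noise on.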

  14. Various combinations of error sources lead to NP-hard problems.

  15. Pestered with duplications • Complex genome structures • Segmental duplications • Many types of polymorphisms (SNPs, CNVs, SVs, etc.) • Models of genome dynamics • GOD (Genome Organizing Devices) • Models of coalescence

  16. Segmental Duplications • Segmental duplications have been found to be associated with genomic disorders. • Deletions: Williams-Beuren syndrome • Duplications: Charcot-Marie-Tooth disease type 1A • Inversions: Haemophilia A • Translocations: Derivative 22 [der(22)] syndrome. • Segmental duplications may be related to cancer development by causing copy number fluctuations • Duplication of myc in lung cancer, and ERBB2 in breast cancer.

  17. Recent Segmental Duplications: Human • 3.5%–5% of the human genome is found to contain segmental duplications with length > 5 kb or > 1 kb and identity > 90% (August 2001 assembly [Bailey et al. 2002]; April 2003 assembly [Cheung et al. 2003]). • These duplications are estimated to have emerged about 40 Mya under a neutral assumption. • The duplications are mostly interspersed (non-tandem) and occur both inter- and intra-chromosomally. (Figure from [Bailey et al. 2002].)
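
A genome-fraction figure like this comes from filtering pairwise alignments by length and identity and then measuring the union of the surviving intervals, so that overlapping duplications are not double-counted. A minimal sketch, assuming alignments are given as half-open `(start, end, identity)` intervals on a single coordinate axis; `sd_coverage` and its defaults are hypothetical.

```python
def sd_coverage(alignments: list[tuple[int, int, float]], genome_length: int,
                min_length: int = 1000, min_identity: float = 0.90) -> float:
    """Fraction of the genome covered by segmental duplications (hypothetical helper).

    Keeps alignments with length > min_length and identity > min_identity,
    merges overlapping intervals, and divides the covered bases by genome_length.
    """
    kept = sorted((s, e) for s, e, ident in alignments
                  if e - s > min_length and ident > min_identity)
    covered = 0
    cur_s = cur_e = None
    for s, e in kept:
        if cur_e is None or s > cur_e:       # disjoint from the current run
            if cur_e is not None:
                covered += cur_e - cur_s
            cur_s, cur_e = s, e
        else:                                # overlapping or touching: extend
            cur_e = max(cur_e, e)
    if cur_e is not None:
        covered += cur_e - cur_s
    return covered / genome_length
```

Passing `min_length=5000` reproduces the stricter length threshold also quoted on the slide.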

  18. Recent Segmental Duplications: Mouse • 1.2% of the mouse genome is found to contain segmental duplications with length > 5 kb and identity > 90% (February 2003 mouse assembly [Cheung et al. 2003]). • These duplications are estimated to have emerged about 25 Mya under a neutral assumption. • The duplications occur both inter- and intra-chromosomally. (Figure from [Cheung et al. 2003].)

  19. Duplication Flanking Sequences • What are the molecular mechanisms that caused the recent segmental duplications in the human and mouse genomes? • Thermodynamic instability in the DNA sequences; • Recombination between homologous repeat elements; • Other unknown mechanisms.

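  20. Thermodynamics [Figure: control and data sets; for each duplicated region, a 512 bp window upstream of the 5′ breakpoint (−512 bp) and a 512 bp window downstream of the 3′ breakpoint (+512 bp).]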
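
The breakpoint comparison starts by cutting out the ±512 bp flanks of each duplicated region. The sketch below extracts those windows and scores them with AT fraction, a crude stand-in for thermodynamic instability (AT pairs are less stable than GC pairs). This proxy is an assumption; a real ΔG calculation would use nearest-neighbor stacking parameters, which are not reproduced here. Both function names are hypothetical.

```python
def flanking_windows(chrom: str, dup_start: int, dup_end: int,
                     flank: int = 512) -> tuple[str, str]:
    """Return (5' window, 3' window) around a duplication at [dup_start, dup_end).

    Windows are clipped at the chromosome ends.
    """
    left = chrom[max(0, dup_start - flank):dup_start]
    right = chrom[dup_end:dup_end + flank]
    return left, right

def at_fraction(seq: str) -> float:
    """Fraction of A/T among unambiguous bases (N and other codes are ignored)."""
    acgt = [b for b in seq.upper() if b in "ACGT"]
    if not acgt:
        return float("nan")
    return sum(b in "AT" for b in acgt) / len(acgt)
```

A control set can be built the same way from randomly placed windows of equal size, and the two score distributions compared.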

  21. [Figure: frequencies of the repeat subfamilies, control set vs. data set.] SINE subfamilies (divergence): Alu-Jb 14%, Alu-Sc~Sx 8%, Alu-Y 5%, Alu-Ya~Yb >1%, MIR 30%, FLAM/FRAM 20%, Alu-Jo 14%. LINE subfamilies (divergence): L2 30%, L1M4 22%, L1M3 21%, L1M2 19%, L1M1 18%, L1P5 12%, L1P4 11%, L1P3 7%, L1P2 4%, L1P1 2%, L1Hs <1%.

  22. The Model [Figure: flanking states f−−, f+− and f++, linked by insertion (toward f++) and by deletion or mutation (back toward f−−); entry routes labelled "duplication by recombination between repeats" and "duplication by recombination between other repeats or other mechanisms"; mutations accumulate in the duplicated sequences over time.]

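  23. The Mathematical Model [Figure: Markov chain over the flanking states f−−, f+−, f++ (rows) and the divergence intervals 0 ≤ d < ε, ε ≤ d < 2ε, …, (k−1)ε ≤ d < kε (columns), against time after duplication; duplications enter the first interval with proportions h0−− + h1−− in f−−, h0+− in f+−, and h0++ + h1++ in f++, with the h0 and h1 components labelled H0 and H1.] At each step a duplication moves to the next divergence interval with probability α; otherwise its flanking state changes according to the within-interval transition matrix (rows and columns ordered f−−, f+−, f++; each row plus α sums to 1):
$$
P = \begin{pmatrix}
1-\alpha-2\beta & 2\beta & 0 \\
\gamma & 1-\alpha-\beta/2-\gamma & \beta/2 \\
0 & 2\gamma & 1-\alpha-2\gamma
\end{pmatrix}
$$
• h1: proportion of duplications by repeat recombination • h1++: proportion of duplications by recombination of the specific repeat • h1−−: proportion of duplications by recombination of other repeats • h0: proportion of duplications by other, repeat-unrelated mechanisms • h0++: proportion of h0 with a common specific repeat in the flanking regions • h0+−: proportion of h0 with no common specific repeat in the flanking regions • h0−−: proportion of h0 with no specific repeat in the flanking regions • α: mutation rate in duplicated sequences • β: insertion rate of the specific repeat • γ: mutation rate in the specific repeat • d: divergence level of duplications • ε: divergence interval of duplications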
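
The chain can be iterated directly to get the expected mass in each (divergence interval, flanking state) cell. This is a sketch under one reading of the figure: the within-interval transitions are the matrix P above, α moves mass one interval forward, and the last interval keeps whatever reaches it (an assumption). The function name and the parameter values in the usage note are hypothetical.

```python
def flank_state_distribution(init, alpha, beta, gamma, k, steps):
    """Propagate the flank-state Markov chain (hypothetical helper).

    init  -- (p_mm, p_pm, p_pp): mass entering interval 0 in f--, f+-, f++.
    alpha -- per-step probability of moving to the next divergence interval.
    beta  -- insertion rate of the specific repeat; gamma -- its mutation rate.
    k     -- number of divergence intervals; the last one absorbs further moves.
    Returns dist[b][s]: probability mass in interval b, flank state s.
    """
    if len(init) != 3:
        raise ValueError("init must give the three flank states f--, f+-, f++")
    # Within-interval flank transitions (row = from, column = to);
    # each row sums to 1 - alpha, the remaining alpha advances divergence.
    stay = [
        [1 - alpha - 2 * beta, 2 * beta, 0.0],
        [gamma, 1 - alpha - beta / 2 - gamma, beta / 2],
        [0.0, 2 * gamma, 1 - alpha - 2 * gamma],
    ]
    if any(p < 0 for row in stay for p in row):
        raise ValueError("rates too large: a transition probability is negative")
    dist = [[0.0] * 3 for _ in range(k)]
    dist[0] = [float(p) for p in init]
    for _ in range(steps):
        new = [[0.0] * 3 for _ in range(k)]
        for b in range(k):
            for s in range(3):
                mass = dist[b][s]
                for t in range(3):
                    new[b][t] += mass * stay[s][t]
                # Divergence grows by one interval (the last interval keeps it).
                new[min(b + 1, k - 1)][s] += mass * alpha
        dist = new
    return dist
```

For example, `flank_state_distribution((h0mm + h1mm, h0pm, h0pp + h1pp), alpha, beta, gamma, k, steps)` gives, per interval, the expected share of f++ as `dist[b][2] / sum(dist[b])`.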

  24. Model Fitting [Figure: proportions of f−−, f+− and f++ plotted against diversity, for Alu and for L1.] The model parameters (α_Alu, β_Alu, γ_Alu, α_L1, β_L1, γ_L1) are estimated from mutation and insertion rates reported in the literature. The relative strengths of the alternative hypotheses can then be estimated by fitting the model to the real data: h1++(Alu) ≈ 0.3; h1++(L1) ≈ 0.35.
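
Because the chain is linear in its initial distribution, and α is shared by all flanking states, the predicted f++ proportion per divergence bin under a mixture of two starting hypotheses is a linear mixture of the two pure predictions. A single mixture weight then has a closed-form least-squares estimate. This one-weight reduction is a simplification assumed for illustration, not necessarily how h1++ was fitted on the slide; `fit_mixture_weight` is hypothetical.

```python
def fit_mixture_weight(observed: list[float], profile_h1: list[float],
                       profile_h0: list[float]) -> float:
    """Least-squares weight h for observed ~ h * profile_h1 + (1 - h) * profile_h0.

    Each list holds one value per divergence bin (e.g. the f++ proportion).
    Minimizing sum((y - b - h*(a - b))**2) gives h = sum((y-b)(a-b)) / sum((a-b)**2),
    clipped to [0, 1] so it stays a valid proportion.
    """
    num = sum((y - b) * (a - b) for y, a, b in zip(observed, profile_h1, profile_h0))
    den = sum((a - b) * (a - b) for a, b in zip(profile_h1, profile_h0))
    if den == 0:
        raise ValueError("profiles are identical; the weight is not identifiable")
    return min(1.0, max(0.0, num / den))
```

The two profiles would come from running the chain once with all mass starting as repeat recombination (f++) and once under the alternative starting distribution.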

  25. [Figure: chromosome 1 annotation tracks: Ns, ATs, repeats (Reps), MER57A, L1P, CDs, ΔG, duplications (Dup), copy number (Copy#), and mer frequencies.]

  26. Copy Number Variation Data • HapMap data: China, 46 people; Japan, 45 people; Utah (European origin), 90 people; Yoruba, 89 people • Made available to us by Drs. Evan Eichler and Andy Sharp

  27. CNVs in Unique regions OR

  28. CNVs in Unique regions

  29. CNVs in SD regions AND

  30. CNV in SD regions Unique and SD regions show completely different behavior of CNVs!

  31. Distance-dependent recombination The chance of recombination depends on the distance between Allele A and its copy
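
A toy version of this simulation follows two copies of a duplicated region site by site: each copy mutates independently, and with a distance-dependent probability the allele at a site is copied from one copy onto the other, pulling the copies back together. The exponential decay of recombination with distance, the per-site conversion, and all names are assumptions; the default rates μ = 0.0001 and r0 = 0.001 are taken from the best-fit values reported on slide 33.

```python
import math
import random

def recombination_prob(distance_bp: float, r0: float = 1e-3,
                       scale_bp: float = 1e5) -> float:
    """Hypothetical distance dependence: r0 * exp(-distance / scale)."""
    return r0 * math.exp(-distance_bp / scale_bp)

def simulate_pair(n_sites: int = 400, generations: int = 2500, mu: float = 1e-4,
                  distance_bp: float = 0.0, seed: int = 0) -> float:
    """Return the final identity between allele A and its copy (binary sites)."""
    rng = random.Random(seed)
    a = [0] * n_sites                    # allele A
    b = [0] * n_sites                    # its duplicated copy, initially identical
    r = recombination_prob(distance_bp)
    for _ in range(generations):
        for i in range(n_sites):
            if rng.random() < mu:        # mutation in copy A
                a[i] ^= 1
            if rng.random() < mu:        # mutation in copy B
                b[i] ^= 1
            if rng.random() < r:         # conversion: B takes A's allele here
                b[i] = a[i]
    return sum(x == y for x, y in zip(a, b)) / n_sites
```

With the copies close together (high recombination) identity stays higher than for distant copies, which is the convergence-by-recombination effect the conclusions refer to.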

  32. Simulation (probabilistic model)

  33. Observations & Conclusions • A mutation rate of 0.0001 and a recombination rate of 0.001 in SD regions give the best fit to the observed real-life data. • Single mutations alone cannot explain the observed data, which can instead be explained by convergence via recombination. • Evolution-by-Duplication (EBD) appears to play a crucial role in evolution, molding the genetic circuitry in a rather constrained way before it is subject to selection pressure.

  34. Feeble Contrivance • GWAS (Genome-Wide Association Studies) • Common variants vs. rare variants • Haplotype phasing/linkage analysis • Poor experiment design • Reference sequences • Genotypic vs. haplotypic references • Weak technologies

  35. Common vs. Rare Disease Variants • From Ionita-Laza (2009) • There are two disease models: • CDCV: common disease, common variants • CDRV: common disease, rare variants • Current genome-wide association studies consider only common variants (frequency at least 5%). • Feasible with available resources • The common loci identified so far have small effects (ORs 1.1 to 1.5) and explain only a small percentage of the estimated heritability. • Rare susceptibility variants are expected to play an important role: • population genetics theory (Pritchard, 2001) • empirical evidence (BMI, blood pressure, autism, Mendelian diseases, etc.)

  36. Effect Size Distribution

  37. Capture-Recapture Model • Suppose we have sequence data on Nind individuals in a genomic region. • An individual shows variation at a position if the corresponding allele is different from the ancestral one. • A position is variable or is a variant if there is at least one individual in the dataset with a variation at that position. • Let xs be the number of individuals with variation at position s: xs > 0. • What is N: the total, unknown number of variants in the region.
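
These definitions translate directly into counts on an individuals × positions matrix. A minimal sketch, assuming the data are already coded as 1 where an individual differs from the ancestral allele and 0 otherwise; both function names are hypothetical. The number of observed variants is only a lower bound on N, and the frequency spectrum (how many variants are carried by exactly k individuals) is the usual input to estimators of the unseen remainder.

```python
from collections import Counter

def variation_counts(matrix: list[list[int]]) -> dict[int, int]:
    """Return {s: x_s} for every variable position s (x_s > 0).

    matrix[i][s] = 1 if individual i carries a non-ancestral allele at s.
    """
    n_positions = len(matrix[0]) if matrix else 0
    counts = {}
    for s in range(n_positions):
        x_s = sum(row[s] for row in matrix)
        if x_s > 0:
            counts[s] = x_s
    return counts

def frequency_spectrum(counts: dict[int, int]) -> dict[int, int]:
    """Map k -> number of variants carried by exactly k individuals."""
    return dict(Counter(counts.values()))
```

For example, `len(variation_counts(matrix))` is the number of variants seen so far in the region.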

  38. One can estimate the following: • Δ(t) = number of NEW variants expected to be found in a FUTURE dataset of size t · Nind. • t is a multiplier of the initial dataset size, Nind. • Δf(t) = number of new variants with frequency at least f . . .
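
As an illustration of how Δ(t) can be estimated from the frequency spectrum, here is the classic Good–Toulmin series, swapped in as a known technique; it is not necessarily the estimator used by Ionita-Laza et al. Applying this abundance-based formula to per-individual counts x_s is an approximation, and the series is only reliable for t ≤ 1 (larger extrapolations need a smoothed or different estimator). The function name is hypothetical.

```python
def good_toulmin(spectrum: dict[int, int], t: float) -> float:
    """Estimate Delta(t), the number of new variants in a future dataset of size t * Nind.

    spectrum[k] = number of variants carried by exactly k individuals (Phi_k).
    Good-Toulmin series: Delta(t) = sum_k (-1)^(k+1) * t^k * Phi_k.
    """
    if t < 0:
        raise ValueError("t must be non-negative")
    if t > 1:
        raise ValueError("the plain Good-Toulmin series is unreliable for t > 1")
    return sum((-1) ** (k + 1) * t ** k * phi for k, phi in spectrum.items())
```

With 10 singletons and 4 doubletons, doubling the sample (t = 1) is expected to reveal about 10 − 4 = 6 new variants.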

  39. ENCODE dataset • Ten 500 kb genomic regions were sequenced in several unrelated DNA samples: • 8 Yoruba (YRI) • 16 CEPH European (CEPH) • 7 Han Chinese (CHB) • 8 Japanese (JPT) • To make results comparable across the four populations (YRI, CEPH, CHB and JPT), they considered only 7 of the sequenced individuals for each dataset.

  40. ENCODE - Δf(t) • From Ionita-Laza et al. 2009

  41. How to Make a Better Human? • Debugging a human better • Sequencing a genome • Sequencing a population

  42. Single Molecule Approach to Sequencing-by-Hybridization: S★M★A★S★H

  43. S*M*A*S*H • Sequence a human-size genome of about 6 Gb, including both haplotypes. • Integrate: • Optical Mapping (Ordered Restriction Maps) • Hybridization (of short nucleobase probes [PNA or LNA oligomers] with dsDNA on a surface), and • Positional Sequencing by Hybridization (efficient polynomial-time algorithms to solve “localized versions” of the PSBH problems)
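
To show what "positional" adds to sequencing by hybridization, the sketch below reconstructs a sequence from k-mer probe hits that each carry an interval of allowed start positions; the positions disambiguate repeated k-mers that plain SBH cannot place. It is a plain depth-first search with backtracking, exponential in the worst case, and not the efficient polynomial-time algorithm the slide refers to. The probe format and function name are assumptions.

```python
from typing import List, Optional, Tuple

# (probe word, earliest allowed start, latest allowed start), 0-based positions
Probe = Tuple[str, int, int]

def psbh_reconstruct(probes: List[Probe], length: int) -> Optional[str]:
    """Find a sequence of the given length that uses every probe hit exactly once.

    Consecutive probes overlap by k-1 letters and each probe must start inside
    its allowed interval. Returns None if no such sequence exists.
    """
    k = len(probes[0][0])
    if k < 2 or any(len(w) != k for w, _, _ in probes):
        raise ValueError("all probes must have the same length k >= 2")
    used = [False] * len(probes)

    def extend(seq: str) -> Optional[str]:
        if len(seq) == length:
            return seq if all(used) else None
        pos = len(seq) - k + 1           # start position of the next k-mer
        suffix = seq[-(k - 1):]
        for i, (word, lo, hi) in enumerate(probes):
            if not used[i] and word[:-1] == suffix and lo <= pos <= hi:
                used[i] = True
                result = extend(seq + word[-1])
                if result is not None:
                    return result
                used[i] = False          # backtrack
        return None

    for i, (word, lo, hi) in enumerate(probes):
        if lo <= 0 <= hi:                # candidate first k-mer
            used[i] = True
            result = extend(word)
            if result is not None:
                return result
            used[i] = False
    return None
```

In the test data the word ACG occurs twice; its position intervals decide which occurrence comes first.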

  44. Fig 1 • Genomic DNA is carefully extracted.

  45. Fig 2 • LNA probes 6–8 nucleotides long are hybridized to dsDNA (double-stranded genomic DNA). • The modified DNA is stretched on a 1″ × 1″ chip.

  46. Fig 3 • DNA adheres to the surface along the channels and stretches out. • Molecules range from 0.3 to 3 million base pairs in length. • Bright emitters are attached to the probes and imaged (Fig 3).

  47. Fig 4 • A restriction enzyme breaks the DNA at specific sites. • The cut fragments of DNA relax like entropic springs, leaving small visible gaps.

  48. Fig 5 • The DNA is then stained with a fluorogen (Fig 5) and reimaged. • The two images are combined into a composite image, • revealing the locations of a specific short word (the probe) within the context of the pattern of restriction sites.

  49. Fig 6 • The integrated intensity measures the length of the DNA fragments. • The bright emitters on the probes provide a profile of the probe locations. • Restriction sites are represented by tall rectangles and probe sites by small circles.
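
Since integrated fluorescence intensity is taken as proportional to DNA length, ordered fragment intensities can be turned into fragment sizes and restriction-site coordinates once a calibration is fixed. This sketch assumes the simplest calibration, scaling by a known or separately estimated total molecule length `total_bp`; real pipelines typically calibrate against internal size standards. The function name is hypothetical.

```python
from itertools import accumulate

def fragments_from_intensity(intensities: list[float],
                             total_bp: float) -> tuple[list[float], list[float]]:
    """Convert ordered fragment intensities into sizes and cut-site positions.

    Assumes intensity is proportional to fragment length (hypothetical
    calibration by total_bp). Returns (sizes_bp, cut_sites_bp), where cut
    sites are the cumulative boundaries between consecutive fragments.
    """
    total = sum(intensities)
    if total <= 0:
        raise ValueError("total intensity must be positive")
    sizes = [total_bp * x / total for x in intensities]
    cut_sites = list(accumulate(sizes))[:-1]   # drop the molecule's far end
    return sizes, cut_sites
```

Probe hits detected along the same molecule can then be reported in the same base-pair coordinates as the restriction sites.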
