
Paralogs



Presentation Transcript


  1. Paralogs. Inbal Yanover, Reading Group in Computational Molecular Biology

  2. Homologs • Orthologs: homologous sequences are orthologous if they were separated by a speciation event • Paralogs: homologous sequences are paralogous if they were separated by a gene duplication event

  3. Genomic duplication. Can involve: • Individual genes • Genomic segments • Whole-genome duplication (WGD). Gene duplication plays a major role in evolution.

  4. Whole-genome duplication • Large-scale adaptation • Polyploidy -> instability • Back to stability: • gene loss • mutation • genomic rearrangements

  5. Fate of duplicated genes. Finding a specialized ‘niche’: • Localization • Temporal expression • Expression level. Another classification: • Subfunctionalization • Neofunctionalization (lowest probability) • Nonfunctionalization (70%)

  6. First article Proof and evolutionary analysis of ancient genome duplication in the yeast Saccharomyces cerevisiae Kellis M, Birren BW, Lander ES. Nature. Apr 2004.

  7. Main ideas • The S. cerevisiae genome arose from an ancient whole-genome duplication that occurred after its lineage diverged from K. waltii • Analyzing the post-duplication divergence of paralogs

  8. Expected signature of a genome duplication: • After duplication, one paralog is usually lost (random local deletions) • Both copies are retained only if they acquire distinct functions • Eventually: a few paralogous genes in the same order and the same orientation • These regions should be short, since chromosomal rearrangements disrupt gene order over time

  9. Model for WGD followed by massive gene loss [Figure: model diagram, starting from the common ancestor]
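To make the expected signature concrete, here is a toy simulation of a WGD followed by gene loss. It is a minimal sketch under stated assumptions, not the model used by Kellis et al.: genes are unique labels, every ancestral gene keeps at least one copy, both copies survive with a hypothetical probability `p_keep_both` (default 0.1, chosen only to echo the ~90% loss quoted in the summary), and losses are independent single-gene deletions with no rearrangements.

```python
import random
from typing import List, Optional, Tuple


def simulate_wgd_loss(
    ancestor: List[str],
    p_keep_both: float = 0.1,  # hypothetical retention rate, not from the paper
    rng: Optional[random.Random] = None,
) -> Tuple[List[str], List[str]]:
    """Duplicate an ordered gene list, then delete one copy of most genes.

    Returns the two sister regions. Each gene survives in at least one
    sister, and each sister inherits the ancestral gene order.
    """
    rng = rng or random.Random()
    sister1: List[str] = []
    sister2: List[str] = []
    for gene in ancestor:
        if rng.random() < p_keep_both:
            # both paralogs retained
            sister1.append(gene)
            sister2.append(gene)
        elif rng.random() < 0.5:
            # the copy in sister 2 is deleted
            sister1.append(gene)
        else:
            # the copy in sister 1 is deleted
            sister2.append(gene)
    return sister1, sister2


if __name__ == "__main__":
    ancestor = [f"g{i}" for i in range(1, 13)]
    s1, s2 = simulate_wgd_loss(ancestor, rng=random.Random(1))
    print("sister 1:", s1)
    print("sister 2:", s2)
```

Whatever the random draws, each sister is an ordered subsequence of the ancestor and the two sisters together cover every ancestral gene, which is the 1:2 pattern the proof strategy looks for.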

  10. Proving the existence of an ancient WGD • Look for a species (Y) in the lineage of S. cerevisiae (S) that diverged before the duplication. • Y and S should show a 1:2 mapping: • Nearly every region in Y would correspond to 2 regions in S (‘sister regions’). • Each sister region in S would contain an ordered subsequence of the genes in Y. • Each sister region in S would contain ~half of Y’s genes. • Together, the two sister regions account for nearly all of Y’s genes. • Every region of S would correspond to one region in Y.
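The ordering and coverage criteria above can be checked mechanically for one region of Y and a candidate pair of sister regions in S. This is a minimal sketch, assuming genes are already labeled by ortholog group and that each Y region has unique labels; `min_coverage` is a hypothetical threshold standing in for "nearly all", and the "~half of Y's genes" criterion is left out for brevity.

```python
from typing import Sequence


def is_ordered_subsequence(sub: Sequence[str], full: Sequence[str]) -> bool:
    """True if every gene of `sub` appears in `full` in the same relative order."""
    it = iter(full)
    # `gene in it` consumes the iterator up to the match, which enforces order
    return all(gene in it for gene in sub)


def check_sister_regions(
    y_region: Sequence[str],
    s_region1: Sequence[str],
    s_region2: Sequence[str],
    min_coverage: float = 0.9,  # hypothetical threshold for "nearly all"
) -> bool:
    """Test the 1:2 signature for one Y region and two candidate sister regions in S.

    Genes without an ortholog in the Y region are ignored. A sister region
    may be inverted relative to Y, so both orientations are tried.
    """
    y_set = set(y_region)
    sisters = [[g for g in s if g in y_set] for s in (s_region1, s_region2)]
    ordered = all(
        is_ordered_subsequence(s, y_region) or is_ordered_subsequence(s[::-1], y_region)
        for s in sisters
    )
    covered = set(sisters[0]) | set(sisters[1])
    coverage = len(covered) / len(y_region) if y_region else 0.0
    return ordered and coverage >= min_coverage
```

For example, with `y = ["g1", "g2", "g3", "g4", "g5", "g6"]`, the pair `["g1", "g3", "g4", "g6"]` and `["g2", "x9", "g4", "g5"]` passes (the S-specific `x9` is skipped), while swapping the second region for `["g5", "g2", "g4"]` fails because its genes are out of order in both orientations.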

  11. Y = K. waltii • Sequenced and assembled into 8 complete chromosomes (16 in S. cerevisiae). • 5,230 likely protein-coding genes (5,714 genes in S. cerevisiae). • 7% of its genes show no protein similarity to S. cerevisiae. • Identifying orthologous regions: • Matching genes (based on protein similarity) • Regions with numerous matching genes in the same order. • Most local regions in K. waltii mapped to two regions in S. cerevisiae. • Each of those regions matched a subset of the K. waltii genes.

  12. Quantifying the observations. DCS – Doubly Conserved Synteny: maximal regions in K. waltii that map across their entire length to two distinct regions in S. cerevisiae.
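A DCS scan can be illustrated with a greedy left-to-right segmentation. This is a simplified sketch, not the procedure of Kellis et al.: the input is a hypothetical list in K. waltii gene order where each entry is the set of S. cerevisiae region IDs that gene matches, a block grows while the matched regions stay within two, and `min_genes` is an assumed minimum block size. Because blocks restart at the gene that breaks the previous one, the result is maximal only in this greedy sense.

```python
from typing import List, Set, Tuple

Block = Tuple[int, int, Tuple[str, str]]


def greedy_dcs_blocks(hits: List[Set[str]], min_genes: int = 3) -> List[Block]:
    """Segment a K. waltii gene order into candidate DCS blocks.

    hits[i] is the set of S. cerevisiae region IDs matched by the i-th
    K. waltii gene (empty if it has no match). A block is reported only if
    it touches exactly two regions and spans at least `min_genes` genes.
    Returns (first_index, last_index, (region_a, region_b)) tuples.
    """
    blocks: List[Block] = []
    start = 0
    regions: Set[str] = set()
    for i, h in enumerate(hits):
        merged = regions | h
        if len(merged) <= 2:
            regions = merged
            continue
        # gene i would add a third region: close the current block
        if len(regions) == 2 and i - start >= min_genes:
            a, b = sorted(regions)
            blocks.append((start, i - 1, (a, b)))
        start, regions = i, set(h)
    # close the final block
    if len(regions) == 2 and len(hits) - start >= min_genes:
        a, b = sorted(regions)
        blocks.append((start, len(hits) - 1, (a, b)))
    return blocks


if __name__ == "__main__":
    hits = [{"A"}, {"B"}, {"A", "B"}, {"A"}, {"C"}, {"D"}, {"C"}, {"C", "D"}]
    print(greedy_dcs_blocks(hits))
    # → [(0, 3, ('A', 'B')), (4, 7, ('C', 'D'))]
```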

  13. Gene and region correspondence

  14. Results • 253 DCS blocks covering most of both genomes (75% of K. waltii genes and 81% of S. cerevisiae genes) • DCS blocks tile 85% of each K. waltii chromosome -> as expected for a WGD • Typical DCS block: • 27 genes. • Blocks are separated by small segments (~3 genes) that match only one conserved region in S. cerevisiae.

  15. Duplicate mapping of centromeres. Note: no paralogs here!

  16. Duplicated blocks in S. cerevisiae • Using the DCS blocks: define 253 sister regions in S. cerevisiae. • Many of those could not have been recognized without K. waltii as a reference.

  17. Duplicated blocks in S. cerevisiae

  18. Zooming in on one sister region

  19. Conclusion: a WGD event occurred in the Saccharomyces lineage after its divergence from K. waltii.

  20. Pattern of gene loss • The number of chromosomes was doubled. • Despite the WGD, the current S. cerevisiae genome has: • only a 13% larger size than the K. waltii genome • only 10% more genes. • Gene loss: • large segmental deletions <-> individual gene deletions • balanced between the two paralogs <-> acting primarily on one of them • Analysis of DCS blocks shows: • average size of a lost segment: 2 genes • average balance: 43%-57%.
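The two DCS statistics on this slide, average lost-segment size and balance between sisters, can be computed from a block once each K. waltii gene is annotated with the sister region(s) still carrying it. This is a minimal sketch under assumptions: the annotation uses hypothetical labels "1", "2" and "12", a lost segment is a maximal run of consecutive genes missing from one sister, and balance is expressed as the share of retained genes in the sparser sister. The paper's exact definitions may differ.

```python
from itertools import groupby
from statistics import mean
from typing import List, Tuple


def loss_runs(block: List[str], sister: str) -> List[int]:
    """Lengths of maximal runs of consecutive genes missing from `sister`.

    Each entry of `block` is "1", "2" or "12": the sister region(s)
    that still carry that ancestral gene.
    """
    return [
        len(list(run))
        for missing, run in groupby(block, key=lambda copies: sister not in copies)
        if missing
    ]


def loss_pattern(block: List[str]) -> Tuple[float, float]:
    """Return (mean lost-segment size, share of retained genes in the sparser sister)."""
    if not block:
        raise ValueError("empty block")
    runs = loss_runs(block, "1") + loss_runs(block, "2")
    n1 = sum("1" in copies for copies in block)  # genes still present in sister 1
    n2 = sum("2" in copies for copies in block)  # genes still present in sister 2
    avg_segment = mean(runs) if runs else 0.0
    balance = min(n1, n2) / (n1 + n2)
    return avg_segment, balance


if __name__ == "__main__":
    print(loss_pattern(["1", "1", "2", "2", "12", "1", "2"]))
    # → (1.5, 0.5)
```

A balance near 0.5 means losses hit both sisters about equally; values near 0 mean one sister absorbed most deletions.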

  21. Two models – what happens after a duplication event • One copy preserves the original function while the other is free to diverge (Ohno) • Both copies diverge more rapidly and acquire new functions

  22. Evolutionary analysis. Study the evolution of the 457 gene pairs that arose by WGD: • Use synteny to distinguish them from pairs that arose by local duplication events. • Compute their divergence rates using sequences of K. waltii, S. cerevisiae and S. bayanus (both amino acid and nucleotide).
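The slide does not say how divergence was measured. As a minimal illustration of the kind of quantity involved, the sketch below computes the amino acid p-distance (fraction of differing aligned sites, skipping gap columns) and its Poisson correction d = -ln(1 - p). These are textbook estimators chosen for illustration, not necessarily the ones Kellis et al. used, and the sequences are assumed to be pre-aligned.

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Fraction of differing sites between two aligned sequences, skipping gap columns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped columns")
    return sum(a != b for a, b in pairs) / len(pairs)


def poisson_distance(seq_a: str, seq_b: str) -> float:
    """Poisson-corrected distance d = -ln(1 - p), which accounts for multiple hits."""
    p = p_distance(seq_a, seq_b)
    if p >= 1:
        return math.inf
    return -math.log(1 - p)


if __name__ == "__main__":
    print(p_distance("MKV-LA", "MRVQLA"))  # → 0.2 (1 difference in 5 ungapped sites)
    print(poisson_distance("MKV-LA", "MRVQLA"))
```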

  23. Results • 17% of gene pairs (76 of 457) showed accelerated protein evolution relative to K. waltii. • In 95% of them, accelerated evolution was confined to only one paralog. • Supports Ohno’s model: one paralog retains the ancestral function, the other gains a derived function

  24. Ancestral <-> derived paralogs • 115 gene pairs in which one paralog has evolved >50% faster than the other. • Derived paralogs are often specialized in: • Cellular localization (Acc1 – Hfa1) • Temporal expression (Skt5 – Shc1)
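One way to decide which paralog "evolved >50% faster" is to split the paralog distance into two branches using K. waltii as the outgroup. The sketch below uses the classic three-point estimate (solving b1 + b2 = d12, b1 + bk = d1k, b2 + bk = d2k); it is a stand-in for illustration, not necessarily the rate test used in the paper. The distances are assumed to come from some pairwise estimator, and the 1.5 ratio encodes the slide's ">50% faster" rule.

```python
from typing import Optional, Tuple


def paralog_branch_lengths(d12: float, d1k: float, d2k: float) -> Tuple[float, float]:
    """Three-point estimate of the branch lengths leading to paralogs 1 and 2.

    d12: distance between the paralogs; d1k, d2k: distance from each paralog
    to the outgroup (here K. waltii). Noisy distances can yield a negative branch.
    """
    b1 = (d12 + d1k - d2k) / 2
    b2 = (d12 + d2k - d1k) / 2
    return b1, b2


def derived_paralog(d12: float, d1k: float, d2k: float, ratio: float = 1.5) -> Optional[int]:
    """Return 1 or 2 for the paralog whose branch is >50% longer, else None."""
    b1, b2 = paralog_branch_lengths(d12, d1k, d2k)
    if b1 > ratio * b2:
        return 1
    if b2 > ratio * b1:
        return 2
    return None


if __name__ == "__main__":
    # branch lengths come out near 0.4 and 0.1, so paralog 1 is the derived copy
    print(derived_paralog(0.5, 0.7, 0.4))  # → 1
```

Under Ohno's model the ancestral copy has the short branch and the derived copy the long one.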

  25. Ancestral <-> derived paralogs, cont. • Functional distinction confirmed with knockout experiments (in rich medium) on the 115 pairs: • Deletion of the ancestral paralog was lethal in 18% of cases. • Deletion of the derived paralog was never lethal. • Explanation: • The derived paralog is not essential under these conditions. • The ancestral paralog compensates (but not vice versa).

  26. More results • 60 of the 457 pairs (13%) showed decelerated protein evolution. • These include highly constrained proteins: • ribosomal proteins (25) • histone proteins (2) • translation factors (4) • In 90% of them, both paralogs were very similar (98% amino acid identity versus 55% for all pairs)

  27. However… • ~70% of the gene pairs (321/457) showed neither accelerated nor decelerated protein evolution • Possible explanations: • The criteria may be too strict. • Divergence in regulatory regions is not captured by this analysis. • Sometimes it is simply beneficial to have two copies.

  28. Summary • S. cerevisiae arose from an ancient WGD. • Massive loss of ~90% of duplicated genes through small deletions, • while preserving at least one copy of each ancestral gene. • Divergence of paralogs: • Accelerated evolution (17%) • Derived genes tend to be specialized in function, expression level and localization. • Derived genes tend to lose essential aspects of their ancestral function.

  29. Second article Transcription control reprogramming in genetic backup circuits. Kafri R, Bar-Even A, Pilpel Y. Nat Genet. Mar 2005.

  30. Introduction • Severe mutations often do not result in an abnormal phenotype • This is partly attributed to redundant paralogs that back each other up in case of mutation • Suggested mechanism: transcriptional reprogramming

  31. Definitions • Working on S. cerevisiae. • Paralog pairs defined by BLASTing their DNA sequences. • Dispensable genes = non-essential genes.

  32. Expression parameters • For each paralog pair: • Calculate 40 correlation coefficients of mRNA expression. • Define: mean expression similarity = the mean of these coefficients. • Define: partial co-regulation (PCoR) = the standard deviation of these coefficients.
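The two parameters reduce to simple statistics over the per-condition correlations. This minimal sketch assumes that each of the 40 coefficients is a Pearson correlation between the two paralogs' profiles within one expression dataset, and that PCoR uses the population standard deviation; both are assumptions about details the slide does not give.

```python
import math
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equally long expression profiles."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant profile has no defined correlation")
    return cov / math.sqrt(sxx * syy)


def expression_parameters(
    profiles_a: List[Sequence[float]], profiles_b: List[Sequence[float]]
) -> Tuple[float, float]:
    """Return (expression similarity, PCoR) for one paralog pair.

    profiles_a[k] and profiles_b[k] are the two paralogs' expression values
    in dataset k. Similarity is the mean of the per-dataset correlations;
    PCoR is their population standard deviation.
    """
    rs = [pearson(a, b) for a, b in zip(profiles_a, profiles_b)]
    return mean(rs), pstdev(rs)


if __name__ == "__main__":
    # two toy datasets: perfectly co-expressed in one, anti-correlated in the other
    sim, pcor = expression_parameters([[1, 2, 3], [1, 2, 3]], [[2, 4, 6], [3, 2, 1]])
    print(sim, pcor)  # → 0.0 1.0
```

A pair like the toy one has low mean similarity but high PCoR: co-regulated in some conditions and not in others, which is the "partial co-regulation" profile discussed on the following slides.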

  33. Summary of observations [Figure legend: + = backup enabled]

  34. Close paralogs • Backup increases with co-expression. • Similar sequences: • similar expression • enable backup

  35. Remote paralogs • Backup is strongest in pairs that are not co-expressed. • Co-expressed pairs (little backup): • interaction • subfunctionalization

  36. Suggested backup mechanism • A, B: genes that are expressed differently. • Upon mutation of A, the expression of gene B is reprogrammed. • Result: B takes on the wild-type expression profile of A.

  37. Experimental verification: reprogramming in Acs1/Acs2 [Figure: Acs1 and Acs2 expression in glucose, wild type vs. Δacs2]

  38. What is the mechanism enabling this change? • Suggestion: backup occurs among paralogs with partial co-regulation. • This enables switching from a different expression profile to a similar one. • Observation: PCoR predicts backup.

  39. Partial motif content overlap is optimal for backup • Motif content overlap: O = |m1 ∩ m2| / |m1 ∪ m2| (m1, m2: the motif sets of the two paralogs) [Figure: backup measure (proportion of dispensable genes) vs. motif content overlap O]
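The overlap O is the Jaccard index of the two paralogs' motif sets. In this minimal sketch the motif labels are hypothetical placeholders, and two empty sets are assigned O = 0 by convention (an assumption, since the formula is undefined there).

```python
from typing import Set


def motif_overlap(m1: Set[str], m2: Set[str]) -> float:
    """Motif content overlap O = |m1 ∩ m2| / |m1 ∪ m2| (0 when both sets are empty)."""
    union = m1 | m2
    if not union:
        return 0.0
    return len(m1 & m2) / len(union)


if __name__ == "__main__":
    # hypothetical motif labels: one shared motif out of four distinct ones
    print(motif_overlap({"motif_a", "motif_b", "motif_c"}, {"motif_a", "motif_d"}))  # → 0.25
```

O = 0 means no shared motifs and O = 1 identical motif content; the slide's claim is that backup is most frequent at intermediate values.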

  40. Suggestion • Unique motifs -> different expression levels. • Shared motifs -> enable responding to the same conditions. Hypothesis: PCoR underlies reprogramming and backup.

  41. In high-PCoR paralogs, one gene is upregulated when the other is deleted [Figure: fold change in expression (data from Hughes et al., Cell 2000) vs. partial co-regulation (predicted backup capacity), binned as <0.35, 0.35–0.45 and >0.45]

  42. What controls reprogramming? [Figure: kinetic model with components M1, E1, G1, T, E2, G2, M2] • Kinetic model: G1, G2 – paralog genes. E1, E2 – their products. T – a TF that is generated by M1 and has a binding site in both genes.

  43. Conclusions • In remote paralogs: genes that are expressed differently but share partial regulation tend to back each other up. • In close paralogs: backup increases with co-expression.

  44. Third article Gene regulatory network growth by duplication Teichmann SA, Babu MM. Nat Genet. May 2004.

  45. Main questions • What is the role of gene duplication in regulatory network evolution? • Determine the extent to which duplicated genes inherit interactions from their ancestors. • Describe possible mechanisms that lead to the formation of new interactions.

  46. Basic unit of gene regulation • Transcription factor • DNA binding site • Target gene (or transcription unit). Complex network: • One gene is regulated by a few transcription factors. • One transcription factor controls more than one gene. [Figure: transcription factor -> target gene]
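The TF-to-target relation described above maps naturally onto a directed bipartite graph. This minimal sketch stores it as two dictionaries, so both "how many TFs regulate this gene" and "how many genes does this TF control" are direct lookups; the gene and TF names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def index_network(
    interactions: Iterable[Tuple[str, str]],
) -> Tuple[Dict[str, Set[str]], Dict[str, Set[str]]]:
    """Build TF -> targets and target -> TFs maps from (TF, target) pairs."""
    targets_of: Dict[str, Set[str]] = defaultdict(set)
    regulators_of: Dict[str, Set[str]] = defaultdict(set)
    for tf, target in interactions:
        targets_of[tf].add(target)
        regulators_of[target].add(tf)
    return dict(targets_of), dict(regulators_of)


if __name__ == "__main__":
    edges = [("tfX", "geneA"), ("tfX", "geneB"), ("tfY", "geneB")]
    targets, regulators = index_network(edges)
    print(sorted(targets["tfX"]))        # → ['geneA', 'geneB']
    print(sorted(regulators["geneB"]))   # → ['tfX', 'tfY']
```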

  47. Research subjects: the known regulatory networks of E. coli and yeast, in which >100 transcription factors regulate several hundred genes. • Gene regulatory network in E. coli: 795 proteins (121 TFs + 674 TGs), 1423 interactions (Shen-Orr et al., Nat. Genet. 2002, and RegulonDB) • Gene regulatory network in yeast: 477 proteins (109 TFs + 368 TGs), 901 interactions (Guelzim et al., Nat. Genet. 2002)

  48. Duplication (reminder) • After a duplication event, a duplicate may: • inherit a regulatory interaction • lose a regulatory interaction • In addition, a new interaction may arise.
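The three outcomes (inherit, lose, gain) can be expressed as one operation on an edge set when a TF is duplicated. This is a toy sketch under assumptions, not Teichmann and Babu's analysis: each interaction of the original TF is copied to the duplicate with a hypothetical probability `p_inherit` (otherwise it is lost for the copy), the duplicate may gain edges to `new_targets`, and the original TF keeps all its interactions.

```python
import random
from typing import Iterable, Optional, Set, Tuple

Edge = Tuple[str, str]  # (transcription factor, target gene)


def duplicate_tf(
    edges: Set[Edge],
    tf: str,
    new_tf: str,
    p_inherit: float = 0.5,  # hypothetical inheritance probability
    new_targets: Iterable[str] = (),
    rng: Optional[random.Random] = None,
) -> Set[Edge]:
    """Return a new edge set after duplicating `tf` into `new_tf`.

    Each target of `tf` is inherited by the duplicate with probability
    `p_inherit`; otherwise that interaction is lost for the copy. The
    duplicate also gains interactions with every gene in `new_targets`.
    """
    rng = rng or random.Random()
    # iterate in sorted order so a seeded rng gives reproducible results
    inherited = {
        (new_tf, target)
        for (regulator, target) in sorted(edges)
        if regulator == tf and rng.random() < p_inherit
    }
    gained = {(new_tf, target) for target in new_targets}
    return edges | inherited | gained


if __name__ == "__main__":
    edges = {("tfX", "geneA"), ("tfX", "geneB"), ("tfY", "geneC")}
    print(sorted(duplicate_tf(edges, "tfX", "tfX2", rng=random.Random(0))))
```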

  49. Homology detection • Based on structural protein homology • Detects more distant relationships than sequence comparison • >65% of the genes are the result of gene duplication • Same domain architecture -> common ancestor.
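The rule "same domain architecture -> common ancestor" amounts to grouping proteins by their ordered list of domains. This minimal sketch assumes domain assignments are already available (for example from a structural domain classification) and uses hypothetical protein and domain labels; it treats architecture as the exact ordered domain tuple, so the same domains in a different order form a separate group.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def group_by_architecture(
    domains_of: Dict[str, Sequence[str]],
) -> Dict[Tuple[str, ...], List[str]]:
    """Group proteins whose ordered domain architecture is identical.

    Only groups with at least two members are returned; under the slide's
    rule of thumb, each group is a candidate set of duplicated genes.
    """
    groups: Dict[Tuple[str, ...], List[str]] = defaultdict(list)
    for protein, domains in domains_of.items():
        groups[tuple(domains)].append(protein)
    return {arch: sorted(members) for arch, members in groups.items() if len(members) > 1}


if __name__ == "__main__":
    proteins = {
        "P1": ["dom_A", "dom_B"],
        "P2": ["dom_A", "dom_B"],
        "P3": ["dom_B", "dom_A"],  # same domains, different order
        "P4": ["dom_C"],
    }
    print(group_by_architecture(proteins))
    # → {('dom_A', 'dom_B'): ['P1', 'P2']}
```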

  50. Duplication of a transcription factor [Figure: after duplication of a TF, its interactions with target genes are either inherited or subject to loss and gain]
