
Comparative genomics in flies and mammals

Manolis Kellis. Broad Institute of MIT and Harvard. MIT Computer Science & Artificial Intelligence Laboratory.


Presentation Transcript


  1. Comparative genomics in flies and mammals
     Manolis Kellis
     Broad Institute of MIT and Harvard
     MIT Computer Science & Artificial Intelligence Laboratory

  2. Resolving power in mammals, flies, fungi
     • 12 flies, 32 mammals, ~20 fungi
     • Many species lead to high resolving power at close distances
     [Figure: fungal phylogeny; labels: 9 yeasts (pre- and post-duplication), 8 Candida (haploid and diploid)]

  3. Comparative genomics and evolutionary signatures
     • Comparative genomics can reveal functional elements
       - For example: exons are deeply conserved to mouse, chicken, fish
       - Many other elements are also strongly conserved: exons / regulatory?
     • Can we also pinpoint specific functions of each region? Yes!
       - Patterns of change distinguish different types of functional elements
       - Specific function → selective pressures → patterns of mutation/insertion/deletion
     • Develop evolutionary signatures characteristic of each function

  4. 1. Evolutionary signature of protein-coding genes
     • Revise protein-coding gene catalogue

  5. Protein-coding evolution vs. nucleotide conservation
     • High conservation, but not protein-coding → evolutionary signatures highly specific
     • High protein-coding signal, low conservation → evolutionary signatures highly sensitive
     [Figure tracks: annotated FlyBase gene; existing cDNA data; new predicted exon; cDNA validation (iPCR)]

  6. 2. Evolutionary signatures of RNA genes
     • Typical substitutions
       - Compensatory changes
       - G:C → G:U … G:U → A:U
     • Prediction methodology
       - Jakob Pedersen: EvoFold with very stringent parameters
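The substitution pattern on this slide can be illustrated with a small sketch. It is not EvoFold; it only classifies how one base pair of a stem differs between two species, assuming Watson-Crick pairs plus the G:U wobble count as valid pairs. The function name `classify_pair_change` and the category labels are hypothetical.

```python
# Valid RNA base pairs: Watson-Crick pairs plus the G:U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def classify_pair_change(ref_pair, other_pair):
    """Classify how a stem base pair in another species differs from the reference.

    Both arguments are two-letter strings such as "GC" (5' base, 3' base).
    The labels are illustrative, not an established nomenclature.
    """
    ref_ok = tuple(ref_pair) in PAIRS
    other_ok = tuple(other_pair) in PAIRS
    n_changed = sum(a != b for a, b in zip(ref_pair, other_pair))
    if n_changed == 0:
        return "unchanged"
    if ref_ok and other_ok:
        # Two changes that keep pairing are compensatory; a single change
        # that keeps pairing (for example via G:U) is pair-preserving.
        return "compensatory" if n_changed == 2 else "pair-preserving"
    return "disruptive"


for other in ("GU", "AU", "AC"):
    print("GC ->", other, classify_pair_change("GC", other))
```

A stem whose columns show many compensatory and pair-preserving changes but few disruptive ones is the kind of evidence a structure-aware method would weigh in favour of a conserved RNA structure.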

  7. Reveal novel RNA genes and structures
     • Intronic: enriched in A-to-I editing, also novel ncRNAs
     • Coding: A-to-I editing, also translational regulation
     • 3’UTRs: enriched in regulators of mRNA localization
     • 5’UTRs: translational regulation, ribosomal proteins
       - 3’ & 5’UTR structures mostly on coding strand (75% & 80%)

  8. 3. Structural and evolutionary signatures of miRNAs
     • Recognize miRNA hairpin
       - Length of hairpin & length of arms
       - Fold stability, symmetric/asymmetric bulges
       - Conservation profile: high|low|high
     • Pinpoint mature miRNA 5’ end
       - Perfect 8mer conservation at start
       - Predominance of 5’U (78%)
       - Number of paired bases is bounded
       - Complementary to 3’UTR motifs
     • Discover novel miRNAs, revise existing miRNAs

  9. 4. Evolutionary signatures for regulatory motifs
     Known engrailed site (footprint):
     D.mel CAGCT--AGCC-AACTCTCTAATTAGCGACTAAGTC-CAAGTC
     D.sim CAGCT--AGCC-AACTCTCTAATTAGCGACTAAGTC-CAAGTC
     D.sec CAGCT--AGCC-AACTCTCTAATTAGCGACTAAGTC-CAAGTC
     D.yak CAGC--TAGCC-AACTCTCTAATTAGCGACTAAGTC-CAAGTC
     D.ere CAGCGGTCGCCAAACTCTCTAATTAGCGACCAAGTC-CAAGTC
     D.ana CACTAGTTCCTAGGCACTCTAATTAGCAAGTTAGTCTCTAGAG
           **       *    * *********** *   **** * **
     [Figure: species tree; labels: D.mel, D.ere, D.ana, D.pse]
     • Motifs discovered
       - Recover known regulators
       - Many novel motifs
     • Evidence for novel motifs
       - Tissue-specific enrichment
       - Functional enrichment
       - In promoters & enhancers
     • Surprises
       - Core promoter elements
       - miRNA motifs in coding exons
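The asterisk row under the engrailed alignment marks columns that are identical in all six species. A minimal sketch of that column test, using the sequences exactly as transcribed on the slide, is below; the helper names `conserved_columns` and `longest_conserved_run` are illustrative, and treating gap-containing columns as non-conserved is an assumption.

```python
# Alignment rows copied from the slide (known engrailed footprint).
ALIGNMENT = {
    "D.mel": "CAGCT--AGCC-AACTCTCTAATTAGCGACTAAGTC-CAAGTC",
    "D.sim": "CAGCT--AGCC-AACTCTCTAATTAGCGACTAAGTC-CAAGTC",
    "D.sec": "CAGCT--AGCC-AACTCTCTAATTAGCGACTAAGTC-CAAGTC",
    "D.yak": "CAGC--TAGCC-AACTCTCTAATTAGCGACTAAGTC-CAAGTC",
    "D.ere": "CAGCGGTCGCCAAACTCTCTAATTAGCGACCAAGTC-CAAGTC",
    "D.ana": "CACTAGTTCCTAGGCACTCTAATTAGCAAGTTAGTCTCTAGAG",
}


def conserved_columns(seqs):
    """One boolean per column: True when every row has the same non-gap base."""
    return [len(set(col)) == 1 and col[0] != "-" for col in zip(*seqs)]


def longest_conserved_run(seqs):
    """Return (start, end) of the longest run of conserved columns, end exclusive."""
    best = (0, 0)
    start = None
    flags = conserved_columns(seqs) + [False]  # sentinel closes a trailing run
    for i, ok in enumerate(flags):
        if ok and start is None:
            start = i
        elif not ok and start is not None:
            if i - start > best[1] - best[0]:
                best = (start, i)
            start = None
    return best


seqs = list(ALIGNMENT.values())
flags = conserved_columns(seqs)
start, end = longest_conserved_run(seqs)
core = seqs[0][start:end]
print("".join("*" if ok else " " for ok in flags))
print(core)
```

On these rows the longest fully conserved run contains the TAATTAG core of the footprint. Perfect identity is only the simplest possible signal; the motif-instance slides later in the deck replace it with a branch-length score that tolerates movement and loss.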

  10. Functions of discovered motifs
      • Positional biases
      • Tissue-specific enrichment and clustering
      • miRNA targeting in coding regions

  11. 5. Evolutionary signatures of motif instances
      • Allow for motif movements
        - Sequencing/alignment errors
        - Loss, movement, divergence
      • Measure branch-length score (BLS)
        - Sum evidence along branches
        - Close species contribute little
      • Example motif, Mef2: YTAWWWWTAR (instances shown with BLS 25% and BLS 83%)
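A rough sketch of a branch-length score follows. It assumes the BLS is the branch length of the smallest subtree connecting the species that contain the motif, divided by the total branch length of the tree; the slide does not spell out the formula, so treat this as one plausible reading. The tree topology, branch lengths, and sequences are made up for illustration, and only the forward strand is searched.

```python
import re

# IUPAC codes used by the Mef2 consensus YTAWWWWTAR, plus the plain bases.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "Y": "[CT]", "W": "[AT]", "R": "[AG]"}

# Hypothetical tree: child -> (parent, branch length in substitutions per site).
TREE = {
    "D.mel": ("n1", 0.1),
    "D.sim": ("n1", 0.1),
    "n1": ("n2", 0.2),
    "D.yak": ("n2", 0.3),
    "n2": ("root", 0.3),
    "D.ana": ("root", 1.0),
}


def has_motif(seq, consensus):
    """True if the consensus matches anywhere in seq (position may move)."""
    pattern = "".join(IUPAC[c] for c in consensus)
    return re.search(pattern, seq.replace("-", "")) is not None


def leaves_below(node, tree):
    """Set of leaf names in the subtree rooted at node."""
    children = [c for c, (parent, _) in tree.items() if parent == node]
    if not children:
        return {node}
    return set().union(*(leaves_below(c, tree) for c in children))


def branch_length_score(species_with_motif, tree):
    """Fraction of total branch length spanned by the species holding the motif."""
    present = set(species_with_motif)
    total = sum(length for _, length in tree.values())
    spanned = 0.0
    for child, (_, length) in tree.items():
        below = leaves_below(child, tree) & present
        # A branch connects motif-bearing species only if some lie on each side.
        if below and below != present:
            spanned += length
    return spanned / total


# Hypothetical orthologous regions; the motif has moved in D.yak and is lost in D.ana.
regions = {
    "D.mel": "GGCTAAAAATAGCC",
    "D.sim": "GGCTAAAAATAGCC",
    "D.yak": "CTATTTATAGGGCC",
    "D.ana": "GGCGAAAAATCGCC",
}
hits = [sp for sp, seq in regions.items() if has_motif(seq, "YTAWWWWTAR")]
print(hits, branch_length_score(hits, TREE))
```

Because the score sums branch length rather than counting species, adding a very close relative such as D.sim raises it only slightly, which matches the slide's point that close species contribute little.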

  12. Motif confidence selects functional instances
      • Transcription factor motifs
        - Increasing BLS → increasing confidence
        - Confidence selects functional regions
        - Confidence selects in vivo bound sites
        - High sensitivity
      • microRNA motifs
        - Increasing BLS → increasing confidence
        - Confidence selects functional regions
        - Confidence selects positive strand

  13. 6. Initial regulatory network for an animal genome
      • ChIP-grade quality
        - Similar functional enrichment
        - High sensitivity, high specificity
      • Systems-level
        - 81% of transcription factors
        - 86% of microRNAs
        - 8k + 2k targets
        - 46k connections
      • Lessons learned
        - Pre- and post-transcriptional regulation are correlated (high/high, low/low)
        - Regulators are heavily targeted, feedback loops

  14. Network captures literature-supported connections

  15. Network captures co-expression supported edges
      [Figure legend: Red = co-expressed; Grey = not co-expressed; Named = literature-supported; Bold = literature-supported]

  16. 7. ChIP vs. conservation: similar power / complementary
      • Together: best → complementary
      • Bound but not conserved: reduced enrichment → conservation selects functional sites
      • All-ChIP vs. all-conserved: similar enrichment → similar power
      • Conserved-only vs. ChIP-all: similar → additional sites

  17. Recovery of regulatory motif instances in mammals
      • Performance increases with branch length (requires closely-related species)
      • Measure number of recovered motif instances at a fixed confidence (80%) / FDR (20%)
      • Discovery power: 6-fold higher than HMRD (branch length also ~6-fold higher)
      • With 20 currently-aligned mammals:
        - Transcription factor motifs: 47 TFs | 16,000 instances | 340 targets on avg
        - microRNA motifs: 21 miRNAs | 11,000 instances | 523 targets on avg
      • An initial regulatory network for mammalian genomes (80% confidence)
      [Figure: miRNA motif instances recovered (80% confidence) vs. total branch length of informant species: HMRD (0.74), pl-mam (3.36), mamm. (4.33), H+non-mamm. (6.36), HMRD+non-mam (6.96), all vertebrates (9.66); ~6X gain, up to 11,000 instances]
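The "fixed confidence (80%) / FDR (20%)" criterion can be sketched as follows. The slide does not say how the false discovery rate is estimated; this sketch assumes it is the ratio of instances found for control (for example shuffled) motifs to instances found for the real motif at the same BLS cutoff, so that confidence = 1 - FDR. All counts are invented, and `instances_at_confidence` is a hypothetical helper.

```python
def confidence(n_real, n_control):
    """Confidence = 1 - estimated FDR, with FDR taken as n_control / n_real.

    Assumption: control motifs estimate how many real-motif instances
    would pass the same conservation cutoff by chance.
    """
    if n_real == 0:
        return 0.0
    return max(0.0, 1.0 - n_control / n_real)


def instances_at_confidence(counts, target=0.80):
    """Find the most permissive BLS cutoff that reaches the target confidence.

    counts: iterable of (bls_cutoff, n_real, n_control).
    Returns (bls_cutoff, n_real) or None if the target is never reached.
    """
    for cutoff, n_real, n_control in sorted(counts):
        if confidence(n_real, n_control) >= target:
            return cutoff, n_real
    return None


# Invented counts: stricter BLS cutoffs keep fewer instances but fewer false ones.
COUNTS = [
    (0.00, 1000, 900),
    (0.25, 600, 300),
    (0.50, 400, 60),
    (0.75, 200, 10),
]
print(instances_at_confidence(COUNTS))
```

Under this reading, adding informant species helps because more total branch length lets real and control motifs separate at a more permissive cutoff, so more instances survive at the same 80% confidence.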

  18. New insights into animal biology

  19. 1. Large-scale evidence of translational read-through
      • New mechanism of post-transcriptional control
        - Hundreds of fly genes, handful of human genes
        - Enriched in brain proteins, ion channels
        - Experiments show ADAR necessary & sufficient (Reenan Lab)
      • Many questions remain
        - A-to-I editing of stop codon TAG|TGA|TAA → TGG
        - Cryptic splice sites? RNA secondary structure?
      [Figure labels: protein-coding conservation; stop codon read-through; continued protein-coding conservation; 2nd stop codon; no more conservation]
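The editing hypothesis on this slide can be checked with simple arithmetic on codons. Inosine is read as guanosine during translation, so an A-to-I edit behaves like A→G, and TGG encodes tryptophan. The sketch below counts how many such edits each stop codon needs to become TGG; `min_edits_to_codon` is an illustrative name.

```python
from itertools import combinations

STOP_CODONS = ("TAG", "TGA", "TAA")


def min_edits_to_codon(codon, target="TGG"):
    """Smallest number of A->G changes that turn codon into target, else None."""
    a_positions = [i for i, base in enumerate(codon) if base == "A"]
    for k in range(len(a_positions) + 1):
        for subset in combinations(a_positions, k):
            edited = "".join(
                "G" if i in subset else base for i, base in enumerate(codon)
            )
            if edited == target:
                return k
    return None


for stop in STOP_CODONS:
    print(stop, "->", "TGG", "needs", min_edits_to_codon(stop), "A-to-I edit(s)")
```

TAG and TGA each need a single edit, while TAA needs both adenosines edited, which is one reason the slide still lists the mechanism among the open questions.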

  20. 2. Stop codon read-through in mammals
      • Four candidates found: GPX2, OPRK1, OPRL1, GRIA2, mostly neuronal
      • A look at FOXP2: possible 3’UTR function (not in fish, yes in frog)

  21. 3. New insights into miRNA regulation: miRNA* function
      • Both miRNA arms can be functional
      • High scores, abundant processing, conserved targets
      • Hox miRNAs miR-10 and miR-iab-4 as master Hox regulators

  22. 4. New insights into miRNA regulation: miR-AS function
      • A single miRNA locus transcribed from both strands
      • Both processed to mature miRNAs: miR-iab-4, miR-iab-4AS (anti-sense)
      • The two miRNAs show distinct expression domains (mutually exclusive)
      • The two show distinct Hox targets: another Hox master regulator

  23. 5. New insights into miRNA regulation: miR-AS function
      • Mis-expression of miR-iab-4S & AS: halteres → wings homeotic transformation
      • Stronger phenotype for AS miRNA
      • Sense/anti-sense pairs as general building blocks for miRNA regulation
      • 9 new anti-sense miRNAs in mouse
      [Figure panels: WT wing and haltere; sense and antisense mis-expression; wing with sensory bristles. Note: C, D, E same magnification]

  24. Summary of Contributions
      • Evolutionary signatures specific to each function
        - Protein-coding genes: revised catalogue affects 10% of genes
        - RNA: hundreds of new high-confidence structures discovered
        - miRNAs: ~double number of genes, families, targeting density
        - Motifs: ~double number of motifs, tissue & positional enrichment
        - Targets: ChIP-grade quality, global scale, experimental support
      • New insights on animal biology
        - Genes: abundant stop codon read-through in neuronal proteins
        - RNA: abundant structures in RNA editing, translational regulation
        - Motifs: coding regions show miRNA targeting
        - miRNAs: miR/miR* and sense/anti-sense pairs: building blocks
        - Networks: TF vs. miRNA targets redundancy and integration
      • Methods are general, applicable in any species

  25. Next steps: Drosophila and Human ENCODE
      • modENCODE: White / Ren / Kellis / Posakony
        - Hundreds of sequence-specific factors
        - Dozens of chromatin / histone modifications
        - Dozens of tissues / stages / conditions
      • humENCODE: Bernstein / Lander / Kellis / Broad
        - ChIP-seq for dozens of chromatin modifications
        - Follow differentiation lineages: activation/inactivation
        - Discover tissue-specific regulatory motifs
      • Many open questions remain
        - Dynamics of tissue-specific regulatory networks
        - Sequence determinants of chromatin establishment & maintenance
        - Global views of pre- & post-transcriptional regulation
      • Many open positions remain (postdoc/grad/ugrad)

  26. Acknowledgements
      • Alex Stark, Mike Lin, Pouya Kheradpour, Matt Rasmussen
      • Genes: FlyBase, BDGP, Bill Gelbart, Sue Celniker, Lynn Crosby
      • miRNAs: Leo Parts, Julius Brennecke, Greg Hannon, David Bartel
      • iab-4AS: Natascha Bushati, Steve Cohen, Julius, Greg Hannon
      • 12 flies: Andy Clark, Mike Eisen, Bill Gelbart, Doug Smith
      • 24 mammals: Sante Gnerre, Michele Clamp, Manuel Garber, Eric Lander
