Comparative genomics in flies and mammals
Manolis Kellis
Broad Institute of MIT and Harvard
MIT Computer Science & Artificial Intelligence Laboratory
Resolving power in mammals, flies, fungi
• 12 flies
• 32 mammals
• ~20 fungi
• Many species lead to high resolving power at close distances
[Figure: species trees. Fungi panel labels: 9 Yeasts (post-duplication, pre-dup); 8 Candida (diploid, haploid); P markers on several species]
Comparative genomics and evolutionary signatures
• Comparative genomics can reveal functional elements
• For example: exons are deeply conserved to mouse, chicken, fish
• Many other elements are also strongly conserved: exons / regulatory?
• Can we also pinpoint specific functions of each region? Yes!
• Patterns of change distinguish different types of functional elements
• Specific function → selective pressures → patterns of mutation/insertion/deletion
• Develop evolutionary signatures characteristic of each function
1. Evolutionary signature of protein-coding genes
• Revise protein-coding gene catalogue
Protein-coding evolution vs. nucleotide conservation
• High conservation, but not protein-coding
• Evolutionary signatures highly specific
• High protein-coding signal, low conservation
• Evolutionary signatures highly sensitive
[Figure tracks: annotated FlyBase gene; existing cDNA data; new predicted exon; cDNA validation (iPCR)]
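The contrast between a protein-coding signal and plain nucleotide conservation can be illustrated with a minimal sketch. The slides do not say how the signal is scored, so the measure used here (counting synonymous versus amino-acid-changing codon differences between two aligned sequences in a chosen reading frame) is an assumption made for illustration. The function name `codon_changes` and the example sequences are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codon_changes(ref, other, frame=0):
    """Count (synonymous, non-synonymous) codon differences in one frame.

    Codons containing gaps or unknown characters are skipped.
    """
    syn = nonsyn = 0
    for i in range(frame, min(len(ref), len(other)) - 2, 3):
        a, b = ref[i:i + 3], other[i:i + 3]
        if a == b or a not in CODON_TABLE or b not in CODON_TABLE:
            continue
        if CODON_TABLE[a] == CODON_TABLE[b]:
            syn += 1
        else:
            nonsyn += 1
    return syn, nonsyn


# Hypothetical aligned pair: every difference is silent in frame 0.
ref = "ATGGCTAAAGGT"
other = "ATGGCCAAGGGA"
for frame in range(3):
    print(frame, codon_changes(ref, other, frame))
```

In this toy pair the three differences are all synonymous when read in frame 0 but change the amino acid when read in frame 1, which is the kind of frame-specific pattern a coding signature can exploit even where overall conservation is low.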
2. Evolutionary signatures of RNA genes
• Typical substitutions
  - Compensatory changes
  - G:C → G:U … G:U → A:U
• Prediction methodology
  - Jakob Pedersen: EvoFold with very stringent parameters
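The pairing-preserving substitutions listed on this slide can be made concrete with a small classifier. This is only a sketch and is not EvoFold: the category names ("consistent" for a one-sided change that keeps the pair, "compensatory" for a two-sided change) follow common usage in RNA structure analysis, and `classify_pair` is a hypothetical helper.

```python
# Watson-Crick pairs plus the G:U wobble pair.
VALID_PAIRS = {
    ("G", "C"), ("C", "G"),
    ("A", "U"), ("U", "A"),
    ("G", "U"), ("U", "G"),
}


def classify_pair(ref_pair, other_pair):
    """Classify how a base pair in the reference changed in another species.

    Assumes ref_pair is itself a valid pair.
    """
    if ref_pair == other_pair:
        return "unchanged"
    if other_pair not in VALID_PAIRS:
        return "disruptive"
    changed = sum(a != b for a, b in zip(ref_pair, other_pair))
    return "compensatory" if changed == 2 else "consistent"


# The stepwise path from the slide: G:C -> G:U -> A:U
print(classify_pair(("G", "C"), ("G", "U")))
print(classify_pair(("G", "U"), ("A", "U")))
print(classify_pair(("G", "C"), ("A", "U")))
```

An excess of consistent and compensatory changes over disruptive ones at predicted paired positions is the kind of evidence a structure-aware method can weigh.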
Reveal novel RNA genes and structures
• Intronic: enriched in A-to-I editing, also novel ncRNAs
• Coding: A-to-I editing, also translational regulation
• 3’UTRs: enriched in regulators of mRNA localization
• 5’UTRs: translational regulation, ribosomal proteins
  - 3’ & 5’UTR structures mostly on coding strand (75% & 80%)
3. Structural and evolutionary signatures of miRNAs
• Recognize miRNA hairpin
  - Length of hairpin & length of arms
  - Fold stability, symmetric/asymmetric bulges
  - Conservation profile: high|low|high
• Pinpoint mature miRNA 5’ end
  - Perfect 8mer conservation at start
  - Predominance of 5’U (78%)
  - Number of paired bases is bounded
  - Complementary to 3’UTR motifs
• Discover novel miRNAs
• Revise existing miRNAs
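Two of the 5’-end criteria above (a 5’ U and a perfectly conserved 8mer at the start) are simple enough to sketch as a check on an alignment. This is an illustration under stated assumptions and not the method used in the talk: `start_features`, the default of 8 bases, and the aligned sequences are all hypothetical.

```python
def start_features(aligned, pos, k=8):
    """Return (starts_with_U, kmer_perfectly_conserved) for a candidate 5' end.

    aligned: list of equal-length aligned RNA strings, reference first.
    pos:     0-based column of the candidate mature miRNA start.
    """
    ref = aligned[0][pos:pos + k]
    conserved = (
        len(ref) == k
        and "-" not in ref
        and all(seq[pos:pos + k] == ref for seq in aligned)
    )
    five_prime_u = ref[:1] in ("U", "T")
    return five_prime_u, conserved


# Hypothetical three-species alignment of one hairpin arm.
aligned = [
    "GGUGAGGUAGUAGG",
    "GGUGAGGUAGUAGG",
    "GCUGAGGUAGUAGA",
]
for pos in range(3):
    print(pos, start_features(aligned, pos))
```

Scanning candidate start columns this way singles out position 2 in the toy alignment, the only start that both begins with U and has an identical 8mer in every species.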
4. Evolutionary signatures for regulatory motifs
Known engrailed site (footprint)
D.mel CAGCT--AGCC-AACTCTCTAATTAGCGACTAAGTC-CAAGTC
D.sim CAGCT--AGCC-AACTCTCTAATTAGCGACTAAGTC-CAAGTC
D.sec CAGCT--AGCC-AACTCTCTAATTAGCGACTAAGTC-CAAGTC
D.yak CAGC--TAGCC-AACTCTCTAATTAGCGACTAAGTC-CAAGTC
D.ere CAGCGGTCGCCAAACTCTCTAATTAGCGACCAAGTC-CAAGTC
D.ana CACTAGTTCCTAGGCACTCTAATTAGCAAGTTAGTCTCTAGAG
      **       *    * *********** *   **** * **
[Figure: species tree with D.mel, D.ere, D.ana, D.pse]
• Motifs discovered
  - Recover known regulators
  - Many novel motifs
• Evidence for novel motifs
  - Tissue-specific enrichment
  - Functional enrichment
  - In promoters & enhancers
• Surprises
  - Core promoter elements
  - miRNA motifs in coding exons
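The row of asterisks under the engrailed-site alignment marks columns that are identical in every species. A minimal sketch of how such a line can be derived from the six aligned rows on the slide follows; `conservation_line` is a hypothetical helper name, and treating a gap as non-conserved is an assumption.

```python
# The six aligned rows from the slide (43 columns each).
ALIGNMENT = {
    "D.mel": "CAGCT--AGCC-AACTCTCTAATTAGCGACTAAGTC-CAAGTC",
    "D.sim": "CAGCT--AGCC-AACTCTCTAATTAGCGACTAAGTC-CAAGTC",
    "D.sec": "CAGCT--AGCC-AACTCTCTAATTAGCGACTAAGTC-CAAGTC",
    "D.yak": "CAGC--TAGCC-AACTCTCTAATTAGCGACTAAGTC-CAAGTC",
    "D.ere": "CAGCGGTCGCCAAACTCTCTAATTAGCGACCAAGTC-CAAGTC",
    "D.ana": "CACTAGTTCCTAGGCACTCTAATTAGCAAGTTAGTCTCTAGAG",
}


def conservation_line(rows):
    """Return '*' for columns identical in all rows (gaps excluded), else ' '."""
    return "".join(
        "*" if len(set(col)) == 1 and col[0] != "-" else " "
        for col in zip(*rows)
    )


line = conservation_line(list(ALIGNMENT.values()))
for name, seq in ALIGNMENT.items():
    print(name, seq)
print(" " * 5, line)
```

The longest run of conserved columns covers `CTCTAATTAGC`, which contains the known engrailed footprint; the flanking sequence diverges once D. ananassae is included.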
Functions of discovered motifs
[Figure panels: Positional biases; Tissue-specific enrichment and clustering; miRNA targeting in coding regions]
5. Evolutionary signatures of motif instances
• Allow for motif movements
  - Sequencing/alignment errors
  - Loss, movement, divergence
• Measure branch-length score (BLS)
  - Sum evidence along branches
  - Close species contribute little
[Figure: Mef2 motif YTAWWWWTAR; example instances with BLS: 25% and BLS: 83%]
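A branch-length score of this kind can be sketched as the fraction of the tree's total branch length spanned by the species in which the motif is found. That definition is an assumption consistent with "sum evidence along branches", not a statement of the exact scoring used in the talk. The tree, its branch lengths, the sequences, and the helper names below are hypothetical; only the Mef2 consensus `YTAWWWWTAR` comes from the slide. Motif movement is tolerated by searching the whole (ungapped) window rather than a fixed column.

```python
import re

# IUPAC nucleotide codes as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

# Hypothetical tree: node -> (parent, branch length); "root" has no entry.
TREE = {
    "mel": ("anc", 0.1),
    "sim": ("anc", 0.1),
    "anc": ("root", 0.2),
    "pse": ("root", 0.6),
}


def motif_regex(consensus):
    """Compile an IUPAC consensus such as YTAWWWWTAR into a regex."""
    return re.compile("".join(IUPAC[c] for c in consensus))


def branch_length_score(tree, present):
    """Fraction of total branch length in the subtree connecting `present`."""
    present = set(present)
    below = {}  # node -> number of motif-bearing leaves underneath it
    for leaf in present:
        node = leaf
        while node in tree:
            below[node] = below.get(node, 0) + 1
            node = tree[node][0]
    total = sum(length for _, length in tree.values())
    # A branch connects the species only if some, but not all, lie below it.
    spanned = sum(tree[n][1] for n, c in below.items() if c < len(present))
    return spanned / total


def bls_for_motif(consensus, sequences, tree):
    """BLS of a motif, searching anywhere in each species' window."""
    pattern = motif_regex(consensus)
    present = [sp for sp, seq in sequences.items()
               if pattern.search(seq.replace("-", ""))]
    return branch_length_score(tree, present)
```

With this toy tree, a motif found only in the two close species `mel` and `sim` spans 0.2 of the total branch length, while one found in `mel` and the distant `pse` spans 0.9, which mirrors the point that close species contribute little.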
Motif confidence selects functional instances
Transcription factor motifs
• Confidence selects functional regions
• Confidence selects in vivo bound sites
• High sensitivity
microRNA motifs
• Confidence selects positive strand
• Confidence selects functional regions
[Plots: confidence vs. increasing BLS]
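One way to turn BLS values into a confidence is to compare real motif instances with instances of control motifs at the same BLS cutoff. The sketch below assumes confidence is 1 minus the ratio of the control fraction to the real fraction passing the cutoff, which matches the later pairing of 80% confidence with a 20% FDR; the exact estimator used in the talk is not given in the slides, and the function name and numbers are hypothetical.

```python
def confidence(real_bls, control_bls, threshold):
    """Estimate confidence at a BLS cutoff as 1 - (control rate / real rate).

    real_bls:    BLS values for instances of the real motif.
    control_bls: BLS values for instances of control (e.g. shuffled) motifs.
    """
    real_frac = sum(b >= threshold for b in real_bls) / len(real_bls)
    ctrl_frac = sum(b >= threshold for b in control_bls) / len(control_bls)
    if real_frac == 0:
        return 0.0
    return max(0.0, 1.0 - ctrl_frac / real_frac)


# Hypothetical BLS values for five real and five control instances.
real = [0.9, 0.8, 0.7, 0.2, 0.1]
control = [0.9, 0.3, 0.2, 0.1, 0.05]
for cutoff in (0.0, 0.25, 0.7):
    print(cutoff, round(confidence(real, control, cutoff), 3))
```

Raising the cutoff removes control instances faster than real ones in this toy data, so confidence rises with BLS, the trend the plots on this slide describe.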
6. Initial regulatory network for an animal genome
• ChIP-grade quality
  - Similar functional enrichment
  - High sensitivity, high specificity
• Systems-level
  - 81% of transcription factors
  - 86% of microRNAs
  - 8k + 2k targets
  - 46k connections
• Lessons learned
  - Pre- and post-transcriptional regulation are correlated (hi/hi, lo/lo)
  - Regulators are heavily targeted, feedback loops
Network captures co-expression supported edges
[Figure legend: Red = co-expressed; Grey = not co-expressed; Named = literature-supported; Bold = literature-supported]
7. ChIP vs. conservation: similar power / complementary
• Together: best → complementary
• Bound but not conserved: reduced enrichment → selects functional
• All-ChIP vs. All-cons: similar enrichment → similar power
• Cons-only vs. ChIP-all: similar → additional sites
Recovery of regulatory motif instances in mammals
• Performance increases with branch length (requires closely-related species)
• Measure number of recovered motif instances at a fixed confidence (80%) / FDR (20%)
• Discovery power: 6-fold higher than HMRD (branch length also ~6-fold higher)
• With 20 currently-aligned mammals:
  - Transcription factor motifs: 47 TFs | 16,000 instances | 340 targets on avg
  - microRNA motifs: 21 miRNAs | 11,000 instances | 523 targets on avg
• An initial regulatory network for mammalian genomes (80% confidence)
[Figure: miRNA motif instances recovered (80%) vs. total branch length of inf. species, rising ~6X to 11,000 instances. Species sets: HMRD (0.74), pl-mam (3.36), mamm. (4.33), H+non-mamm. (6.36), HMRD+non-mam (6.96), All vertebr. (9.66)]
1. Large-scale evidence of translational read-through
• New mechanism of post-transcriptional control.
• Hundreds of fly genes, handful of human genes.
• Enriched in brain proteins, ion channels.
• Experiments show ADAR necessary & sufficient (Reenan Lab).
• Many questions remain
  - A-to-I editing of stop codon: TAG|TGA|TAA → TGG
  - Cryptic splice sites? RNA secondary structure?
[Figure: protein-coding conservation up to the stop codon; stop codon read-through; continued protein-coding conservation up to the 2nd stop codon; no more conservation after it]
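The editing hypothesis raised on this slide can be checked by enumeration: inosine is read as G, so each A in a stop codon can independently become G. The sketch below lists the possible products for the three stop codons; the function names are hypothetical, and it says nothing about whether editing actually occurs at any given site.

```python
from itertools import product

STOP_CODONS = ("TAG", "TGA", "TAA")


def a_to_i_products(codon):
    """All codons reachable by reading one or more A's as G (A-to-I editing)."""
    choices = [("A", "G") if base == "A" else (base,) for base in codon]
    return {"".join(p) for p in product(*choices)} - {codon}


def sense_products(codon):
    """Edited forms that are no longer stop codons."""
    return {c for c in a_to_i_products(codon) if c not in STOP_CODONS}


for stop in STOP_CODONS:
    print(stop, sorted(a_to_i_products(stop)), sorted(sense_products(stop)))
```

Every stop codon can reach TGG (tryptophan) and no other sense codon; TAA needs both of its A's edited, passing through TAG or TGA if only one is.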
2. Stop codon read-through in mammals
• Four candidates found: GPX2, OPRK1, OPRL1, GRIA2, mostly neuronal
• A look at FOXP2 – possible 3’UTR function (not in fish, yes in frog)
3. New insights into miRNA regulation: miRNA* function
• Both miRNA arms can be functional
• High scores, abundant processing, conserved targets
• Hox miRNAs miR-10 and miR-iab-4 as master Hox regulators
4. New insights into miRNA regulation: miR-AS function
• A single miRNA locus transcribed from both strands
• Both processed to mature miRNAs: miR-iab-4, miR-iab-4AS (anti-sense)
• The two miRNAs show distinct expression domains (mutually exclusive)
• The two show distinct Hox targets – another Hox master regulator
5. New insights into miRNA regulation: miR-AS function
• Mis-expression of miR-iab-4S & AS: halteres → wings homeotic transformation
• Stronger phenotype for AS miRNA
• Sense/anti-sense pairs as general building blocks for miRNA regulation
• 9 new anti-sense miRNAs in mouse
[Figure panels: WT wing and haltere; sense and antisense mis-expression, haltere transformed toward wing with sensory bristles. Note: C, D, E same magnification]
Summary of Contributions
• Evolutionary signatures specific to each function
  - Protein-coding genes: Revised catalogue affects 10% of genes
  - RNA: hundreds of new high-confidence structures discovered
  - miRNAs: ~double number of genes, families, targeting density
  - Motifs: ~double number of motifs, tissue & positional enrichment
  - Targets: ChIP-grade quality, global scale, experimental support
• New insights on animal biology
  - Genes: Abundant stop codon read-through in neuronal proteins
  - RNA: Abundant structures in RNA editing, translational regulation
  - Motifs: Coding regions show miRNA targeting
  - miRNAs: miR/miR* and sense/anti-sense pairs: building blocks
  - Networks: TF vs. miRNA targets redundancy and integration
• Methods are general, applicable in any species
Next steps: Drosophila and Human ENCODE
• modENCODE: White / Ren / Kellis / Posakony
  - Hundreds of sequence-specific factors
  - Dozens of chromatin / histone modifications
  - Dozens of tissues / stages / conditions
• humENCODE: Bernstein / Lander / Kellis / Broad
  - ChIP-seq for dozens of chromatin modifications
  - Follow differentiation lineages – activation / inactivation
  - Discover tissue-specific regulatory motifs
• Many open questions remain
  - Dynamics of tissue-specific regulatory networks
  - Sequence determinants of chromatin establishment & maintenance
  - Global views of pre- & post-transcriptional regulation
• Many open positions remain (postdoc/grad/ugrad)
Acknowledgements
Alex Stark, Mike Lin, Pouya Kheradpour, Matt Rasmussen
Genes: FlyBase, BDGP, Bill Gelbart, Sue Celniker, Lynn Crosby
miRNAs: Leo Parts, Julius Brennecke, Greg Hannon, David Bartel
iab-4AS: Natascha Bushati, Steve Cohen, Julius, Greg Hannon
12 flies: Andy Clark, Mike Eisen, Bill Gelbart, Doug Smith
24 mammals: Sante Gnerre, Michele Clamp, Manuel Garber, Eric Lander