Integrative fly analysis: specific aims
• Aim 1: Comprehensive data collection
  • Data QC / data standards / consistent pipelines
• Aim 2: Integrative annotation
  • Systematically annotate functional elements based on combined experimental information
• Aim 3: Clusters of activity
  • Find genes / enhancers / chromatin regions / domains of coordinated activity across conditions
• Aim 4: Predictive models of gene expression
  • How do motifs -> binding -> chromatin -> expression/splicing, where ‘->’ = ‘predicts’
• Aim 5: Regulatory and functional networks
  • Regulatory network inference
  • Functional network validation
• Aim 6: Comparative / evolutionary analysis
  • Using conservation to assess function / coverage
1. Supervised learning for enhancer annotation
• Logistic regression classifier recovers known CRMs
• Combinations of features in each class outperform individual members of that class
• Combinations of features across classes are even stronger
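The claim that feature combinations outperform individual features can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the data are synthetic (three weakly informative, standardized "feature" signals per region), the function names are hypothetical, and the plain gradient-descent fit stands in for whatever classifier implementation the analysis actually used.

```python
import math
import random

def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.1, epochs=200):
    """Fit logistic regression weights (bias stored last) by batch gradient descent."""
    n_feat = len(X[0])
    w = [0.0] * (n_feat + 1)
    for _ in range(epochs):
        grad = [0.0] * (n_feat + 1)
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + w[-1])
            err = p - yi  # gradient of the log-loss w.r.t. the linear score
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w

def accuracy(w, X, y):
    """Fraction of regions whose predicted class (p >= 0.5) matches the label."""
    hits = 0
    for xi, yi in zip(X, y):
        p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + w[-1])
        hits += int((p >= 0.5) == bool(yi))
    return hits / len(X)

def make_toy_data(n=600, seed=0):
    """Synthetic regions: label 1 = known CRM, 0 = background (toy data only)."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        # Three noisy features, each shifted by +/-0.5 depending on the label.
        X.append([rng.gauss(label - 0.5, 1.0) for _ in range(3)])
        y.append(label)
    return X, y

def select(X, cols):
    # Keep only the chosen feature columns.
    return [[row[c] for c in cols] for row in X]

X, y = make_toy_data()
# Accuracy (on the training set) of each feature alone, then all combined.
acc_single = [accuracy(fit_logistic(select(X, [c]), y), select(X, [c]), y)
              for c in range(3)]
acc_combined = accuracy(fit_logistic(X, y), X, y)
print("single features:", acc_single)
print("combined:", acc_combined)
```

On this toy data the combined model should classify more regions correctly than any single feature, which mirrors the slide's point; a real evaluation would use held-out CRMs rather than training accuracy.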
2. Functions of 20 distinct chromatin states in fly
[Figure panels: Chromatin marks, DV enhancers, AP enhancers, General TFs, Insulators, Replication, Motifs]
3. Clusters of activity (e.g. CBP binding vs. TFs)
[Figure labels: Early regulators (Kr, cad, hb); Trx; Polycomb. Axis: Component parameters]
• Confirmed by distinct enrichments for
  • Chromatin mark combinations
  • Regulatory motifs
  • GO functional categories
  • Developmental anatomical terms
3. Clusters of TFs vs. chromatin states
• AP-state 60-fold enriched in enhancers
• Trx in enhancer states
• Polycomb states enriched for enhancers
• Ubiquitous genes enriched for multiple states
• BEAF/Chro in TSS for ubiquitous genes
• Strong Su(Hw) in Negative outside promoter states
4. Motif combinations for TF binding prediction
• Many motifs are enriched in the binding sites of the corresponding TF (diagonal)
• However, extensive cross-enrichment suggests cross-talk among the binding of different factors
• Indeed, predictive power for binding increases with motif combinations
• Both synergistic and antagonistic effects
[Figure: motif enrichment vs. transcription factor binding; color scale: fold enrichment, 2^-4 to 2^4]
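A minimal sketch of how fold enrichment for a motif, or a motif combination, in bound regions might be computed. The regions, motif names "A" and "B", and the function name are all hypothetical toy values; the definition used here (fraction bound among regions carrying the motifs, divided by the overall fraction bound) is one common choice and an assumption, not necessarily the exact statistic behind the slide.

```python
def fold_enrichment(regions, motifs):
    """Fold enrichment of binding among regions containing all given motifs.

    regions: list of (set_of_motif_names, is_bound) pairs (toy representation).
    motifs:  set of motif names that must all be present.
    Returns P(bound | motifs present) / P(bound), or None if undefined.
    """
    n_total = len(regions)
    n_bound = sum(1 for _, bound in regions if bound)
    with_motifs = [bound for present, bound in regions if motifs <= present]
    if not with_motifs or n_bound == 0:
        return None
    frac_bound_with_motifs = sum(with_motifs) / len(with_motifs)
    return frac_bound_with_motifs / (n_bound / n_total)

# Hand-built toy data: 20 regions, 8 of them bound by a hypothetical factor.
regions = (
    [({"A", "B"}, True)] * 4
    + [({"A"}, True)] * 2 + [({"A"}, False)] * 2
    + [({"B"}, True)] * 1 + [({"B"}, False)] * 3
    + [(set(), True)] * 1 + [(set(), False)] * 7
)

fe_a = fold_enrichment(regions, {"A"})
fe_b = fold_enrichment(regions, {"B"})
fe_ab = fold_enrichment(regions, {"A", "B"})
print("A alone:", fe_a)
print("B alone:", fe_b)
print("A and B together:", fe_ab)
```

In this toy set the combination is more enriched than either motif alone (a synergistic case); an antagonistic pair would show the opposite, with the combination falling below the individual motifs.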
5. Data integration for stage-specific regulators
• abd-A motif is enriched in new H3K27me3 regions at L2
• Coincides with a drop in the expression of abd-A
• Model: sites gain H3K27me3 as abd-A binding is lost
• Additional intriguing stories found, to be explored
[Figure: H3K27me3; axis: fold enrichment or over-expression]
6. Evolutionary signatures for diverse functions
• Protein-coding genes
  - Codon Substitution Frequencies
  - Reading Frame Conservation
• RNA structures
  - Compensatory changes
  - Silent G-U substitutions
• microRNAs
  - Shape of conservation profile
  - Structural features: loops, pairs
  - Relationship with 3’UTR motifs
• Regulatory motifs
  - Mutations preserve consensus
  - Increased Branch Length Score
  - Genome-wide conservation
Stark et al, Nature 2007; Clark et al, Nature 2007
Assessing fraction of conserved bases ‘explained’
[Figure (Fly): % of conserved bases, shown cumulatively and per element, with axis ticks at 40% and 80%. Element classes: +3’UTR, +new5’UTR, +new3’UTR, +CDS, +newCDS, +Pol2, +TF, +Marks, +ORC, +CNV]
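The per-element and cumulative fractions can be sketched as follows. This is an illustration under stated assumptions: the coordinates are made up, intervals are half-open (start, end) on a single toy sequence, the order in which classes are added is arbitrary, and the function names are hypothetical.

```python
def covered_positions(intervals):
    """Expand half-open (start, end) intervals into a set of base positions."""
    positions = set()
    for start, end in intervals:
        positions.update(range(start, end))
    return positions

def explained_fractions(conserved, annotations):
    """For each annotation class, in order, report
    (name, fraction of conserved bases it covers alone,
     fraction covered by this class plus all classes added before it)."""
    rows = []
    cumulative = set()
    for name, intervals in annotations:
        own = covered_positions(intervals) & conserved
        cumulative |= own  # overlapping bases are only counted once
        rows.append((name,
                     len(own) / len(conserved),
                     len(cumulative) / len(conserved)))
    return rows

# Toy example: 100 conserved bases and three annotation classes.
conserved = set(range(0, 100))
annotations = [
    ("CDS", [(0, 40)]),
    ("3'UTR", [(35, 50)]),           # overlaps the CDS by 5 bases
    ("TF", [(45, 60), (90, 110)]),   # extends past the conserved bases
]

rows = explained_fractions(conserved, annotations)
for name, per_element, cumulative in rows:
    print(f"+{name}: per element {per_element:.0%}, cumulative {cumulative:.0%}")
```

Because overlapping annotations explain the same bases, the cumulative curve grows more slowly than the sum of the per-element values, which is why both views are shown.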
The challenge ahead
• Binding sites of every developmental regulator
• Sequence motifs for every regulator
  [Figure labels: CTCF, check; GAF, check; Su(Hw), check; BEAF-32, variant; CP190, novel; Mod(mdg4), novel]
• Annotations & images for all expression patterns
• Expression domain primitives reveal underlying logic
  [Figure labels: Dorsal-Ventral; Anterior-Posterior]
• Understand regulatory logic specifying development
Fly AWG team
Sue Celniker, Brenton Graveley, Steve Brenner, Michael Brent, Gary Karpen, Sarah Elgin, Mitzi Kuroda, Vince Pirrotta, Peter Park, Peter Kharchenko, Michael Tolstorukov, Eric Bishop, Kevin White, Casey Brown, Nicolas Negre, Nick Bild, Bob Grossman, Eric Lai, Nicolas Robine, David MacAlpine, Matthew Eaton, Steve Henikoff, Peter Bickel, Ben Brown
Lincoln Stein Group
Suzanna Lewis, Gos Micklem, Nicole Washington, EO Stinson, Marc Perry, Peter Ruzanov
MIT CompBio Group
Chris Bristow, Pouya Kheradpour, Mike Lin, Rachel Sealfon, Rogerio Candeias
Fly modENCODE AWG
compbio.mit.edu