Motif Enrichment Analysis in Co-Expressed Gene Sets and High-Throughput Sequence Sets
Wyeth Wasserman
Jan. 18, 2012
opossum.cisreg.ca/oPOSSUM3
Welcome
• If you encounter any technical difficulties during the webinar
  • Type a report using the chat option
• Slide presentation ~20 min
• Questions will be compiled as they are submitted and answered during the final Q&A/discussion period
• During the discussion session, audience members will be able to speak
Webinar Format
• Introduction
• Walk-Through
• Summary
• Q&A
Overview
• Given co-expressed gene sets, what are the key mediators of co-expression?
  • Focus on TFs
• Web-based software system for motif enrichment analysis
  • Co-expressed genes or sequences
  • Multiple sets of analysis methods
  • Available for human, mouse, fly, worm, yeast
Motif Enrichment Analysis
• Finds over-represented TFBS in co-expressed gene sets
[Figure: motif occurrences in Target vs. Background sets, with example p-values p=0.66, p=0.55, p=0.04]
What do we need?
• Region selection
  • Where to look for enriched binding sites
  • Use conservation filter to restrict search space
• TFBS profiles to search for
  • Need a pool of validated profiles
• Scoring metrics for enrichment
  • How to measure motif over-representation
Conserved Region Selection
[Figure: phastCons score plotted against genomic position along a gene; stretches scoring above the threshold form conserved regions CR1 to CR4]
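The thresholding idea in the figure can be sketched in a few lines. This is a minimal illustration only, not oPOSSUM's actual procedure: the function name `conserved_regions`, the `min_length` parameter, and the toy score list are all hypothetical, and per-base phastCons scores are assumed to be available as a plain list.

```python
def conserved_regions(scores, threshold, min_length=1):
    """Return half-open (start, end) index ranges where scores stay >= threshold.

    `scores` is assumed to hold one conservation score per base;
    `min_length` (hypothetical) drops runs shorter than that many bases.
    """
    regions = []
    start = None  # start index of the run currently being extended, if any
    for i, score in enumerate(scores):
        if score >= threshold:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_length:
                regions.append((start, i))
            start = None
    # Close a run that extends to the end of the sequence.
    if start is not None and len(scores) - start >= min_length:
        regions.append((start, len(scores)))
    return regions


# Toy per-base scores (made up for illustration).
toy_scores = [0.1, 0.8, 0.9, 0.2, 0.7, 0.95, 0.9, 0.1]
print(conserved_regions(toy_scores, threshold=0.6))
```

Only the bases inside the returned ranges would then be scanned for binding sites, which is how the conservation filter restricts the search space.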
TFBS Profiles
• JASPAR 2010: Portales-Casamar et al. Nucleic Acids Research 2009.
• Expanded collection of TFBS profiles
  • 130 vertebrate profiles
  • 105 insect profiles
  • 5 nematode profiles
  • 177 yeast profiles
  • PBM (104), PBM_HOMEO (176), PBM_BHLH (19)
• Standardized 2-level TF classification (class, family)
Scoring Metrics
• Z scores
  • Based on the number of occurrences of the TFBS relative to background
  • Normalized for sequence length
  • Simple binomial distribution model
• Fisher scores
  • Fisher exact probability test
  • Fisher score = -log(Fisher p-value)
  • Based on the number of genes containing the TFBS relative to background
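The two metrics above can be sketched as follows. This is a simplified illustration under stated assumptions, not the oPOSSUM implementation: the function names and arguments are hypothetical, the Z score uses a plain binomial model with the background hit rate as the per-position probability (no continuity correction), the Fisher test is one-tailed with target and background treated as disjoint gene sets, and the logarithm is assumed to be natural.

```python
import math


def z_score(target_hits, target_length, background_hits, background_length):
    """Z score of the target hit count under a simple binomial model.

    The background hit rate per position is used as the success
    probability, which normalizes for sequence length.
    """
    p = background_hits / background_length
    expected = target_length * p
    std_dev = math.sqrt(target_length * p * (1.0 - p))
    return (target_hits - expected) / std_dev


def fisher_score(target_with, target_total, background_with, background_total):
    """-log of the one-tailed Fisher exact p-value for over-representation.

    Counts are numbers of genes; target and background sets are
    assumed to be disjoint (an assumption of this sketch).
    """
    n_all = target_total + background_total   # all genes
    k_all = target_with + background_with     # all genes containing the TFBS
    denom = math.comb(n_all, target_total)
    # Hypergeometric tail: P(at least `target_with` target genes have the TFBS).
    p_value = sum(
        math.comb(k_all, k) * math.comb(n_all - k_all, target_total - k)
        for k in range(target_with, min(k_all, target_total) + 1)
    ) / denom
    return -math.log(p_value)


# Made-up counts for illustration only.
print(z_score(target_hits=30, target_length=1000,
              background_hits=100, background_length=10000))
print(fisher_score(target_with=8, target_total=10,
                   background_with=20, background_total=100))
```

The Z score counts every site occurrence, while the Fisher score only asks whether a gene contains the site at all, so a profile with many hits concentrated in a few genes can rank high on one metric and low on the other.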
Additional Metric for Sequence-Based Analysis
• KS scores
  • Kolmogorov-Smirnov test
  • Compares the empirical distribution of the distances of the binding sites from the maximum point of confidence (MPC) with that of the background
  • Expect real binding sites to be centered around the MPC
  • KS score = -log(KS test p-value)
[Figure: Foreground vs. Background distributions of binding-site distance from the MPC]
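A rough sketch of the KS score follows, assuming the binding-site distances from the MPC are already available as two lists. The function names are hypothetical, the p-value uses the standard asymptotic Kolmogorov series rather than an exact small-sample computation, and the natural logarithm is an assumption; the tool's own calculation may differ.

```python
import math
from bisect import bisect_right


def ks_statistic(foreground, background):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    fg, bg = sorted(foreground), sorted(background)
    d = 0.0
    for x in fg + bg:
        cdf_fg = bisect_right(fg, x) / len(fg)
        cdf_bg = bisect_right(bg, x) / len(bg)
        d = max(d, abs(cdf_fg - cdf_bg))
    return d


def ks_p_value(d, n, m, terms=100):
    """Asymptotic two-sided p-value (Kolmogorov series); approximate for small samples."""
    lam = d * math.sqrt(n * m / (n + m))
    if lam < 0.3:
        # The series converges poorly here and the p-value is effectively 1.
        return 1.0
    total = sum((-1) ** (j - 1) * math.exp(-2.0 * j * j * lam * lam)
                for j in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))


def ks_score(foreground, background):
    """KS score = -log(KS test p-value), floored to avoid log(0)."""
    d = ks_statistic(foreground, background)
    p = ks_p_value(d, len(foreground), len(background))
    return -math.log(max(p, 1e-300))


# Made-up distances (in bp) from the MPC, for illustration only.
fg_distances = [2, 5, 8, 10, 12, 15, 18, 20]
bg_distances = [5, 40, 80, 120, 150, 190, 230, 260]
print(ks_score(fg_distances, bg_distances))
```

A foreground whose sites cluster near the MPC yields a large gap between the two distributions and therefore a high KS score; identical distributions give a score of zero.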
TFBS Cluster Analysis
[Figure: TFBS profiles grouped into a TFBS profile cluster]
TFBS Cluster Analysis (TCA)
• Overrepresentation analysis based on merged TFBS cluster hits
[Figure: TFBSs found in conserved regions CR1 to CR4 of a gene are merged into TFBS cluster hits]
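The merge step can be illustrated with a small interval-merging sketch. It assumes each hit has already been labeled with the cluster its TFBS profile belongs to and is given as a half-open `(start, end)` range; the function name, the tuple layout, and the rule that touching intervals are merged are all assumptions made for illustration.

```python
from collections import defaultdict


def merge_cluster_hits(hits):
    """Merge overlapping or touching hits that belong to the same TFBS cluster.

    `hits` is an iterable of (cluster_id, start, end) tuples with
    half-open coordinates. Returns {cluster_id: [(start, end), ...]}.
    """
    by_cluster = defaultdict(list)
    for cluster_id, start, end in hits:
        by_cluster[cluster_id].append((start, end))

    merged = {}
    for cluster_id, intervals in by_cluster.items():
        intervals.sort()
        result = [intervals[0]]
        for start, end in intervals[1:]:
            last_start, last_end = result[-1]
            if start <= last_end:
                # Extend the current cluster hit to cover this site.
                result[-1] = (last_start, max(last_end, end))
            else:
                result.append((start, end))
        merged[cluster_id] = result
    return merged


# Made-up hits for illustration only.
example_hits = [("C1", 10, 20), ("C1", 15, 25), ("C1", 40, 50), ("C2", 12, 22)]
print(merge_cluster_hits(example_hits))
```

Over-representation would then be scored on the merged cluster hits rather than on the individual profile hits, so that similar profiles matching the same site are not counted several times.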
oPOSSUM-3
• Web-based system for motif enrichment analysis in co-expressed gene sets and sequences from high-throughput experiments
• Important functionalities
  • Gene-based vs. Sequence-based
  • Single site vs. Anchored combination site
  • Individual vs. clusters of TFBS profiles
  • Human, mouse, fly, worm and yeast