Identifying Active Transcription Factors from Expression Data using Pathway Queries
Florian Sohler, Ralf Zimmer
Outline
• Pathway Queries
  • Query network information and functional annotations to find relevant contexts for experimental (expression) data
  • Query language structure
  • Matching algorithm
  • Visualization
• Application: Transcription factor activities
  • Scoring methods
  • Data sets
  • Results
Context information is necessary
• Basic analysis steps for gene expression data
  • Image analysis
  • Normalization
  • Calculation of gene-wise features:
    • Fold changes
    • P-values for differential expression
• Lists of these features need to be interpreted
  • What is the biological mechanism causing the regulation?
  • What is the effect of the observed regulation, e.g. on the metabolism?
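The gene-wise features named on this slide can be illustrated with a small sketch. The slides do not say which test was used for the p-values, so an exact permutation test on the difference of means stands in here as an assumption; the function names and the replicate values are hypothetical.

```python
from itertools import combinations
from math import log2
from statistics import mean


def log2_fold_change(treated, control):
    """Log2 ratio of mean treated intensity to mean control intensity."""
    return log2(mean(treated) / mean(control))


def permutation_p_value(treated, control):
    """Two-sided exact permutation p-value for the difference in means.

    Assumption: this test is only a stand-in; the source does not name
    the test behind its p-values.
    """
    pooled = list(treated) + list(control)
    n = len(treated)
    m = len(pooled) - n
    total = sum(pooled)
    observed = abs(mean(treated) - mean(control))
    hits = 0
    count = 0
    # Enumerate every way of relabelling n of the pooled values as "treated".
    for idx in combinations(range(len(pooled)), n):
        group_sum = sum(pooled[i] for i in idx)
        diff = abs(group_sum / n - (total - group_sum) / m)
        if diff >= observed - 1e-12:
            hits += 1
        count += 1
    return hits / count


# Hypothetical intensities for one gene (three replicates per condition).
fold = log2_fold_change([8.0, 8.0, 8.0], [2.0, 2.0, 2.0])
p_value = permutation_p_value([8, 9, 10], [1, 2, 3])
```

Exhaustive enumeration is only practical for a handful of replicates; with larger groups one would sample relabellings instead.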
Regulation of Transcription: Signaling Pathways
Metabolic Pathways
Biological Networks
Networks contain relevant information, but are hard to screen manually…
Biological networks
Pathway Queries
• Researchers often want to look at certain aspects of the data
  • Mechanisms explaining the data (hypotheses)
  • Effect on the metabolism or another biological process
  • Links to known disease-relevant genes
• A natural formulation for many of these aspects is in terms of network-like structures
  • Hypotheses of biological mechanisms
  • Network context
• Provide a language to formulate network templates and an algorithm that finds all instances
Pathway Query Language: Example
[Figure: query graph. Kinase, linked at max. distance 1 to TranscriptionFactor, linked at max. distance 1 to RegulatedGenes …]
Pathway Query Language
• Specifying genes/proteins:
  • Boolean expressions based on available annotations
  • Examples:
    • Gene appears differentially expressed (p-value < 0.01)
    • GO classification is Transcription Factor
• Specifying connections:
  • Network distance
  • Proteins and interactions on the path
• Multiplicity of nodes
  • Aggregate e.g. all regulated targets of the transcription factor
• Scoring:
  • Different scoring methods can be indicated in the query
• Visualization:
  • Visualization layers can be defined in the query
• Recursive structure:
  • Other queries can be used as building blocks
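A minimal sketch of how Boolean node specifications over annotations might be composed. This is not the ToPNet query syntax, which the slides do not show; the combinator names, the annotation keys `go_function` and `p_value`, and the annotation values are all hypothetical.

```python
def annotation_equals(key, value):
    """Predicate: the annotation `key` has exactly this value."""
    return lambda gene: gene.get(key) == value


def annotation_below(key, threshold):
    """Predicate: the numeric annotation `key` exists and is below threshold."""
    return lambda gene: key in gene and gene[key] < threshold


def all_of(*predicates):
    """Boolean AND of predicates."""
    return lambda gene: all(p(gene) for p in predicates)


def any_of(*predicates):
    """Boolean OR of predicates."""
    return lambda gene: any(p(gene) for p in predicates)


def negate(predicate):
    """Boolean NOT of a predicate."""
    return lambda gene: not predicate(gene)


# Made-up annotations, for illustration only (not measured data).
genes = {
    "STE12": {"go_function": "Transcription Factor", "p_value": 0.2},
    "FUS1": {"go_function": "Other", "p_value": 0.001},
    "BAS1": {"go_function": "Transcription Factor", "p_value": 0.004},
}

is_tf = annotation_equals("go_function", "Transcription Factor")
is_regulated = annotation_below("p_value", 0.01)

# "GO classification is Transcription Factor AND p-value < 0.01"
regulated_tf = all_of(is_tf, is_regulated)
matches = sorted(name for name, ann in genes.items() if regulated_tf(ann))
```

Because each predicate is an ordinary function, a composed expression can itself be reused inside a larger one, which mirrors the recursive structure listed on the slide.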
Pathway Queries
• Specification
  • Input:
    • Network, annotations (expression data, functional annotations)
    • Query description (network template, hypothesis of biological mechanisms)
  • Output:
    • All instances of the query in the network
    • Scores
    • Visualization
• Framework for implementation
  • ToPNet: A Toolbox for Protein Networks
  • Sanofi-Aventis
Matching Algorithm
[Figure panels: Query Graph; Instance Graph; Clique Search; Pathway Instances. Node labels: Kinase, TranscriptionFactor, RegulatedGenes … Edge labels in the instance graph: Verified connections; Fully connected (unrestricted)]
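One possible reading of this figure, sketched under assumptions: each candidate protein for a query node becomes a node of the instance graph, two candidates are joined when their connection constraint is verified in the network (or when the query places no constraint between them), and a pathway instance is a clique with one candidate per query node. The backtracking search below grows such cliques one query node at a time; it is not the authors' implementation, it treats RegulatedGene as a single node rather than an aggregate, and all names and the toy network are hypothetical.

```python
from collections import deque


def within_distance(network, source, target, max_dist):
    """Breadth-first search: is target reachable from source in <= max_dist steps?"""
    frontier = deque([(source, 0)])
    seen = {source}
    while frontier:
        node, dist = frontier.popleft()
        if node == target:
            return True
        if dist < max_dist:
            for nxt in network.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, dist + 1))
    return False


def find_instances(network, candidates, constraints):
    """candidates: {query_node: [proteins]}; constraints: {(qa, qb): max_dist}."""
    query_nodes = list(candidates)

    def compatible(qa, pa, qb, pb):
        # Edge test of the instance graph between two (query node, protein) pairs.
        if pa == pb:
            return False
        if (qa, qb) in constraints:
            return within_distance(network, pa, pb, constraints[(qa, qb)])
        if (qb, qa) in constraints:
            return within_distance(network, pb, pa, constraints[(qb, qa)])
        return True  # unrestricted pair: always connected

    def extend(partial):
        # partial is a clique covering the first len(partial) query nodes.
        if len(partial) == len(query_nodes):
            yield dict(partial)
            return
        q = query_nodes[len(partial)]
        for p in candidates[q]:
            if all(compatible(q, p, q2, p2) for q2, p2 in partial):
                yield from extend(partial + [(q, p)])

    return list(extend([]))


# Toy directed network and candidate sets (hypothetical identifiers).
network = {"K1": ["TF1"], "K2": ["TF2"], "TF1": ["G1", "G2"], "TF2": ["G3"]}
candidates = {
    "Kinase": ["K1", "K2"],
    "TranscriptionFactor": ["TF1", "TF2"],
    "RegulatedGene": ["G1", "G3"],
}
constraints = {
    ("Kinase", "TranscriptionFactor"): 1,
    ("TranscriptionFactor", "RegulatedGene"): 1,
}
instances = find_instances(network, candidates, constraints)
```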
Visualized Result
Application: Activity of Transcription Factors
[Figure labels: (Higher level) regulators; Activation/Inhibition; Transcription factors, Interactions; Expression]
• Why infer transcription factor activities?
  • Expression levels of genes can be measured using microarrays
  • Expression is directly mediated by transcription factors
  • Transcription factor activity is not determined by its expression level
Inference of transcription factor activity is a first step in a causal analysis of gene expression data
Transcription Factors
[Figure: query graph. Node with molecular function Transcription Factor, linked (max. distance: 1, regulation) to … Target genes]
Scoring a Transcription Factor
• Given a transcription factor T with M being the set of potential target genes of T
• Strategy 1:
  • Compare the set of genes M with some other (relevant) set of genes (overrepresentation analysis)
  • Fisher’s exact test
• Strategy 2:
  • Look at the expression data on M and test if the data have the same distribution on M as on the rest of the proteins
  • Wilcoxon rank test, t-test …
Other possibilities: Tian et al., PNAS 102(39), Sep. 2005
Scoring a Transcription Factor
[Figure labels: Transcription Factor Targets; Regulated Genes; All Genes]
Fisher’s exact test computes the significance of the intersection
Question: Which genes are ‘regulated’?
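The overrepresentation score can be sketched as the one-sided Fisher's exact test, that is, the upper tail of the hypergeometric distribution for the size of the intersection. The function name and the counts in the example are made up for illustration; how the 'regulated' set is chosen is left open, as on the slide.

```python
from math import comb


def fisher_overrepresentation(n_all, n_targets, n_regulated, n_overlap):
    """One-sided Fisher's exact test for overrepresentation.

    Returns P(overlap >= n_overlap) when n_regulated genes are drawn at
    random from n_all genes, of which n_targets are targets of the
    transcription factor.
    """
    upper = min(n_targets, n_regulated)
    denominator = comb(n_all, n_regulated)
    tail = sum(
        comb(n_targets, k) * comb(n_all - n_targets, n_regulated - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / denominator


# Hypothetical counts: 10 genes, 4 targets, 3 regulated, all 3 regulated are targets.
p_value = fisher_overrepresentation(10, 4, 3, 3)
```

With these counts the tail contains a single term, C(4,3)·C(6,0) / C(10,3) = 4/120. The result depends on the cutoff that defines the regulated set, which is the open question the slide raises.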
Scoring a Transcription Factor
[Figure labels: Transcription Factor Targets; All Genes]
Wilcoxon rank test to compute the significance of the target regulation
Distribution-free
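A rank-based score of this kind can be sketched with the Wilcoxon rank-sum statistic, which needs no cutoff for 'regulated' genes. The sketch assumes the normal approximation without a tie correction in the variance, which is reasonable only for reasonably large gene sets; the function names and the example values are hypothetical.

```python
from math import erfc, sqrt


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def rank_sum_test(target_values, other_values):
    """Wilcoxon rank-sum test; returns (z, two-sided p-value).

    Assumption: normal approximation, no tie correction in the variance.
    """
    n1, n2 = len(target_values), len(other_values)
    ranks = average_ranks(list(target_values) + list(other_values))
    rank_sum = sum(ranks[:n1])
    u = rank_sum - n1 * (n1 + 1) / 2  # Mann-Whitney U of the target set
    mean_u = n1 * n2 / 2
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    return z, erfc(abs(z) / sqrt(2))


# Hypothetical expression ratios: targets of one factor versus all other genes.
z_score, p_value = rank_sum_test([5, 6, 7], [1, 2, 3, 4])
```

The sign of z indicates whether the targets tend to rank above or below the remaining genes, so one number captures both direction and strength.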
Data Sets
• Gene expression data
  • Hughes et al. 2000 conducted ~300 yeast knockout experiments and measured RNA expression levels.
  • For each gene, p-values and ratios of differential expression were computed.
• Biological networks
  • Genome-wide location analysis of Lee et al. 2003 gives a network of ~100 transcription factors and putative regulatees.
  • For kinase activity: Database of Interacting Proteins (DIP) with ~4700 proteins and ~15000 interactions for yeast
Results: Activity Scores of TFs
Ste12 is an important regulator for mating functions
Bas1 is an important regulator for purine biosynthesis
Arg80 and Arg81 are regulators for arginine metabolism
Correlation of Activity Scores
Summary
• Rich sources of network data are available and contain relevant context information for gene expression data
• Pathway Queries provide a mechanism to exploit this context information
  • Interesting queries must be developed
  • Scoring methods must be applied
• Successful application of the method
  • Transcription factor activity
  • Other examples in the paper:
    • Co-operating transcription factors
    • Kinases