Functional Coherence in Domain Interaction Networks
Prof. Ananth Grama
Dept. of Computer Science, Purdue University

Outline
• Motivation
• Protein and Domain Interaction Networks
• Formal framework
• Properties for term- and set-similarity measures
• New Similarity Metric
• Results
• Comparison of measures
• Comparison of PPI, DDI networks

Motivation
• Extracting functional information from protein-protein interactions
• Noisy, incomplete, generic, static data from high-throughput experiments
• Typical proteins are composed of multiple domains
• Independent unit (function, evolution, folding)
• Behind protein-protein interactions there are protein domains interacting physically with one another
[Figure: proteins p1 and p2 and their domains d1 to d4, with a domain-domain interaction marked]

Motivation
• How does functional modularity manifest itself in a network of molecular interactions?
• Explore the relationship between functional similarity and network proximity
• The functional annotations available for domains and for proteins differ vastly
• Do current similarity measures work in an unbiased manner, given that annotation is incomplete?
• Are they statistically meaningful and biologically interpretable?

Formal Framework
• $C = \{ c_i \mid 0 \le i < N \}$ is a finite partially ordered set of concepts (the ontology), with root $c_0 = r$
• Concepts are related by a binary relationship, denoted by $\preceq$, e.g. $c_3 \preceq c_1$, $c_6 \preceq c_3$, $c_5 \preceq r$
• Set of ancestors: $A_i = \{ c_k \mid c_i \preceq c_k \}$
• Two concepts $(c_i, c_j)$ are comparable (~) if either $c_i \preceq c_j$ or $c_j \preceq c_i$
• Not all concepts in $A_i$ need be comparable, because the ontology is a DAG (as opposed to a tree)
[Figure: example ontology DAG with root $c_0 = r$ and concepts $c_1$ to $c_6$]
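
The ancestor sets and the comparability relation can be made concrete with a small sketch. The edges below form a hypothetical toy ontology, not the one drawn on the slide; only the three example relations (c3 below c1, c6 below c3, c5 below r) come from the slide. The names `PARENTS`, `ancestors`, and `comparable` are illustrative, and the sketch assumes that $A_i$ contains $c_i$ itself, as reflexivity of a partial order implies.

```python
# Hypothetical toy ontology: concept -> list of direct parents (a DAG rooted at "r").
PARENTS = {
    "r": [],
    "c1": ["r"],
    "c2": ["r"],
    "c3": ["c1", "c2"],  # two parents, so the ontology is a DAG and not a tree
    "c4": ["c2"],
    "c5": ["c2"],
    "c6": ["c3"],
}


def ancestors(concept, parents=PARENTS):
    """Return A_i: the concept itself plus everything reachable via parent links."""
    seen = {concept}
    stack = [concept]
    while stack:
        node = stack.pop()
        for parent in parents[node]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def comparable(ci, cj, parents=PARENTS):
    """Two concepts are comparable if one is an ancestor of the other."""
    return cj in ancestors(ci, parents) or ci in ancestors(cj, parents)
```

In this toy ontology, `c1` and `c2` both belong to the ancestor set of `c3`, yet `comparable("c1", "c2")` is false, which is the situation the last bullet describes.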

Properties for term-similarity
• Similarity ($\delta$) of two terms based on the underlying taxonomical relationship

Existing measures
• Distance based: count the number of edges between the nodes
  $\delta_E(c_i, c_j) = 2 \cdot \mathrm{MAX} - \min[\mathrm{len}(c_i, c_j)]$
• Fails property (4), as distance is uniform over all edges
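
A minimal sketch of the edge-counting measure follows. The slide does not define MAX or len, so two assumptions are made here: MAX is taken to be the longest root-to-concept path in the ontology, and len is the shortest path between two concepts when edges are treated as undirected. The ontology and all function names are hypothetical.

```python
from collections import deque

# Hypothetical toy ontology: concept -> list of direct parents.
PARENTS = {
    "r": [],
    "c1": ["r"],
    "c2": ["r"],
    "c3": ["c1", "c2"],
    "c4": ["c2"],
    "c5": ["c2"],
    "c6": ["c3"],
}


def max_depth(parents):
    """Assumed meaning of MAX: length of the longest path from the root."""
    memo = {}

    def depth(c):
        if c not in memo:
            memo[c] = 0 if not parents[c] else 1 + max(depth(p) for p in parents[c])
        return memo[c]

    return max(depth(c) for c in parents)


def shortest_len(ci, cj, parents):
    """Fewest edges between ci and cj, ignoring edge direction (breadth-first search)."""
    adj = {c: set(ps) for c, ps in parents.items()}
    for c, ps in parents.items():
        for p in ps:
            adj[p].add(c)
    dist = {ci: 0}
    queue = deque([ci])
    while queue:
        node = queue.popleft()
        if node == cj:
            return dist[node]
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    raise ValueError("concepts are not connected")


def delta_edge(ci, cj, parents=PARENTS):
    """Edge-based similarity: 2 * MAX - min[len(ci, cj)]."""
    return 2 * max_depth(parents) - shortest_len(ci, cj, parents)
```

Under these assumptions the sibling pairs (`c1`, `c2`) and (`c4`, `c5`) receive the same score even though they sit at different depths, because every edge counts equally.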

Existing metrics for term-similarity
• Information content: if $G_c$ is the set of molecules associated with concept $c$, then $IC(c) = -\log_2(|G_c| / |G_r|)$
• $\delta_R(c_i, c_j) = \max_{c \in A_i \cap A_j} IC(c)$, where $c$ ranges over the common ancestors of $c_i$ and $c_j$
• Normalization: $\delta_L(c_i, c_j) = 2\,\delta_R(c_i, c_j) / (IC(c_i) + IC(c_j))$
• Hybrid approach: $\delta_{JC}(c_i, c_j) = (1 - 2\,\delta_R(c_i, c_j) + IC(c_i) + IC(c_j))^{-1}$
• All three satisfy the term-similarity properties
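
The three information-content measures can be sketched as follows. The ontology and the annotation sets `G` are hypothetical toy data, chosen so that the logarithms come out as whole numbers; the function names are illustrative. The annotation sets are assumed to be already propagated upward, so each concept's set is contained in the sets of its ancestors.

```python
import math

# Hypothetical toy ontology: concept -> list of direct parents.
PARENTS = {
    "r": [], "c1": ["r"], "c2": ["r"], "c3": ["c1", "c2"],
    "c4": ["c2"], "c5": ["c2"], "c6": ["c3"],
}

# Hypothetical annotation sets G_c (molecules associated with each concept).
G = {
    "r": {"m1", "m2", "m3", "m4", "m5", "m6", "m7", "m8"},
    "c1": {"m1", "m2", "m3", "m4"},
    "c2": {"m1", "m2", "m5", "m6"},
    "c3": {"m1", "m2"},
    "c4": {"m5"},
    "c5": {"m6"},
    "c6": {"m1"},
}


def ancestors(concept):
    """The concept itself plus everything reachable via parent links."""
    seen, stack = {concept}, [concept]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def ic(c):
    """IC(c) = -log2(|G_c| / |G_r|), written as log2(|G_r| / |G_c|)."""
    return math.log2(len(G["r"]) / len(G[c]))


def delta_resnik(ci, cj):
    """Largest information content among the common ancestors."""
    return max(ic(c) for c in ancestors(ci) & ancestors(cj))


def delta_lin(ci, cj):
    """Normalized form; the zero-denominator convention is an assumption."""
    denom = ic(ci) + ic(cj)
    return 1.0 if denom == 0 else 2 * delta_resnik(ci, cj) / denom


def delta_jc(ci, cj):
    """Hybrid form: inverse of one plus the information-content distance."""
    return 1.0 / (1 - 2 * delta_resnik(ci, cj) + ic(ci) + ic(cj))
```

With these toy numbers, `c4` and `c5` share the common ancestor `c2` with information content 1, giving `delta_resnik` = 1.0, `delta_lin` = 1/3, and `delta_jc` = 0.2.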

Properties for set-similarity
• Let $S$ be a set of concepts; we want a measure $\rho(S_i, S_j)$ to assess the semantic similarity of two sets

Existing metrics for set-similarity
• Average: violates properties (ii), (iii) and (iv)
• Maximum: weakly satisfies (ii)
• Average of Maximums: fails properties (ii), (iii) and (iv)
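
The formulas for these three aggregations do not appear in the slide text, so the sketch below uses forms that are common in the semantic-similarity literature and should be read as an assumption about what the slide intended. The term-similarity table and all names are hypothetical.

```python
# Hypothetical symmetric term-similarity table, for illustration only.
_SIM = {
    frozenset(["a", "b"]): 0.5,
    frozenset(["a", "c"]): 0.2,
    frozenset(["b", "c"]): 0.4,
}


def term_sim(x, y):
    """Toy term similarity: 1.0 for identical terms, table lookup otherwise."""
    return 1.0 if x == y else _SIM[frozenset([x, y])]


def set_average(Si, Sj, sim=term_sim):
    """Mean term similarity over all cross pairs."""
    return sum(sim(x, y) for x in Si for y in Sj) / (len(Si) * len(Sj))


def set_maximum(Si, Sj, sim=term_sim):
    """Best term similarity over all cross pairs."""
    return max(sim(x, y) for x in Si for y in Sj)


def set_average_of_maximums(Si, Sj, sim=term_sim):
    """Each term is matched to its best partner in the other set, then averaged."""
    best_i = [max(sim(x, y) for y in Sj) for x in Si]
    best_j = [max(sim(x, y) for x in Si) for y in Sj]
    return (sum(best_i) + sum(best_j)) / (len(Si) + len(Sj))
```

One behavior worth noting in this toy setting: the plain average of the set {a, b} with itself is 0.75 rather than 1, because the cross pairs (a, b) and (b, a) pull the mean down.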

IC based set similarity
• Extend the notion of minimum common ancestor ($\lambda$) to sets of terms
• The information content of a set is defined in terms of the set of molecules associated with all terms in the MCA of $S_i$, $S_j$
• This satisfies all 4 properties and can be extended
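
The equations on this slide did not survive, so the following is only one plausible reading of the construction, offered as a sketch. It assumes that the MCA of two sets is the set of minimal elements among all pairwise minimum common ancestors, and that "associated with all terms" means the intersection of the annotation sets of those ancestors. The toy ontology, the annotation sets, and every function name are hypothetical.

```python
import math

# Hypothetical toy ontology and annotation sets (propagated upward).
PARENTS = {
    "r": [], "c1": ["r"], "c2": ["r"], "c3": ["c1", "c2"],
    "c4": ["c2"], "c5": ["c2"], "c6": ["c3"],
}
G = {
    "r": {"m1", "m2", "m3", "m4", "m5", "m6", "m7", "m8"},
    "c1": {"m1", "m2", "m3", "m4"},
    "c2": {"m1", "m2", "m5", "m6"},
    "c3": {"m1", "m2"},
    "c4": {"m5"},
    "c5": {"m6"},
    "c6": {"m1"},
}


def ancestors(concept):
    """The concept itself plus everything reachable via parent links."""
    seen, stack = {concept}, [concept]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def minimal(concepts):
    """Drop every concept that is a proper ancestor of another member."""
    return {c for c in concepts
            if not any(d != c and c in ancestors(d) for d in concepts)}


def mca_pair(ci, cj):
    """Minimum common ancestors of two terms (may be several in a DAG)."""
    return minimal(ancestors(ci) & ancestors(cj))


def mca_sets(Si, Sj):
    """Assumed extension to sets: minimal elements over all pairwise MCAs."""
    union = set()
    for ci in Si:
        for cj in Sj:
            union |= mca_pair(ci, cj)
    return minimal(union)


def rho_ic(Si, Sj):
    """Information content of the molecules shared by all terms in the MCA."""
    shared = set.intersection(*(G[c] for c in mca_sets(Si, Sj)))
    if not shared:
        return math.inf  # assumed convention when no molecule carries all terms
    return math.log2(len(G["r"]) / len(shared))
```

For example, `rho_ic({"c4"}, {"c5"})` evaluates the single minimum common ancestor `c2`, whose four annotated molecules out of eight give an information content of 1.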

Datasets
• Protein-protein interactions
  • Physical interactions extracted from the BioGRID database
  • Binary data (no reliability score)
• Domain-domain interactions
  • DOMINE database
  • Confidence score used to split the dataset
    • Struct: only structure-based interactions
    • HC+NA: High Confidence (HC) and structure-based interactions
    • HC+MC: High Confidence (HC) and Medium Confidence (MC) interactions
    • Comp-2: interactions predicted by at least two computational approaches
    • Comp-1: interactions predicted by at least one computational approach

Comparison of Semantic Similarity Measures
• Network distance and functional similarity are negatively related
• The proposed information-content-based measure ($\rho_{JC}$) shows the sharpest decline in semantic similarity for distance < 4
[Figure: C. elegans PPI network]
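
The kind of analysis behind this comparison, relating network distance to functional similarity, can be sketched as follows. This is an illustration under stated assumptions and not the procedure used to produce the results: the four-node network and its annotations are hypothetical, and Jaccard overlap stands in for the semantic similarity measure, which could be any function of two node names.

```python
from collections import defaultdict, deque

# Hypothetical interaction network (adjacency lists) and annotations.
NETWORK = {"p1": ["p2"], "p2": ["p1", "p3"], "p3": ["p2", "p4"], "p4": ["p3"]}
ANNOTATIONS = {
    "p1": {"A", "B"},
    "p2": {"A", "B"},
    "p3": {"B", "C"},
    "p4": {"C", "D"},
}


def jaccard(u, v):
    """Stand-in similarity: overlap of the two annotation sets."""
    a, b = ANNOTATIONS[u], ANNOTATIONS[v]
    return len(a & b) / len(a | b)


def bfs_distances(adj, source):
    """Shortest-path distance (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def mean_similarity_by_distance(adj, similarity):
    """Average similarity of all connected node pairs, grouped by distance."""
    buckets = defaultdict(list)
    nodes = sorted(adj)
    for i, u in enumerate(nodes):
        dist = bfs_distances(adj, u)
        for v in nodes[i + 1:]:
            if v in dist:
                buckets[dist[v]].append(similarity(u, v))
    return {d: sum(vals) / len(vals) for d, vals in sorted(buckets.items())}


profile = mean_similarity_by_distance(NETWORK, jaccard)
```

On this toy input, `profile` maps each distance (1, 2, 3) to a mean similarity that decreases as distance grows, which mirrors the negative relation stated above.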

Comparison of Semantic Similarity Measures
• The proposed metric ($\rho_{JC}$) gives a large similarity score to a larger fraction of pairs at close distances (1, 2), and a low similarity score to a large fraction of pairs at distance > 2
[Figure: Structural DDI network]

Comparison of PPI and DDI Networks
[Figure: relation between network proximity and semantic similarity with respect to molecular function]

Comparison of PPI and DDI Networks
[Figure: relation between network proximity and semantic similarity with respect to biological process]

Comparison of PPI and DDI Networks
• Immediate and indirect neighbors perform similar functions
• Functional similarity is stronger in the Struct DDI network
• After normalization, the relationship between functional similarity and network distance is stronger in computationally inferred DDI networks than in PPI networks
• Network proximity in DDI networks is likely to be a better indicator of functional modularity than network proximity in PPI networks

Summary
• We present necessary properties for any admissible metric for term- and set-similarity
• Current metrics are not admissible, so we develop a new metric for set-similarity
• The proposed metric provides a highly intuitive biological interpretation
• A comprehensive comparative analysis of PPIs and DDIs validates the role of DDIs in quantifying functional coherence