
Clustering of Interaction Network


ayersr

Presentation Transcript


  1. Clustering of Interaction Network • Definition • Process to detect densely connected sub-graphs • Determines protein complexes or functional modules • Difficulties • Noisy data (too many false positives or false negatives) • Cannot be solved by traditional clustering techniques • Difficult to define the pair-wise distance between proteins in the network • Protein complexes may overlap • Disparate sources of data • Different reliabilities: 17%~50% • Small overlaps: <17%

  2. Protein Interaction Network • Undirected, unweighted graph • Nodes represent proteins, edges represent interactions • Example: yeast protein interaction network • Importance • Provides a global view of cellular organization and biological functions • Applicable to systematic approaches for functional knowledge discovery • Problem • Large scale • Complex connectivity

  3. Structural Property • Small-world Phenomenon (Watts & Strogatz) • Networks that lie between regular and random networks • Higher average clustering coefficient than expected by random chance • Significantly small average shortest path length • Scale-free Distribution (Barabasi & Albert) • Network growth by preferential attachment • Power law degree distribution: a few high-degree nodes, many low-degree nodes • Clustering coefficient distribution independent of degree • [Figure labels: high modularity; hub existence]
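
The three quantities named on this slide (average clustering coefficient, average shortest path length, degree distribution) can be computed directly from an adjacency map. The following is a minimal sketch using only the standard library; the function names and the four-node `toy` graph are illustrative assumptions, not taken from the slides.

```python
from collections import deque

def clustering_coefficient(adj, v):
    """Fraction of pairs of v's neighbors that are themselves connected."""
    nbrs = list(adj[v])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if nbrs[j] in adj[nbrs[i]]
    )
    return 2.0 * links / (k * (k - 1))

def average_clustering(adj):
    """Mean of the per-node clustering coefficients."""
    return sum(clustering_coefficient(adj, v) for v in adj) / len(adj)

def bfs_distances(adj, s):
    """Hop distance from s to every node reachable from s."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def average_shortest_path_length(adj):
    """Mean hop distance over all ordered pairs of distinct, connected nodes."""
    total, pairs = 0, 0
    for s in adj:
        for t, d in bfs_distances(adj, s).items():
            if t != s:
                total += d
                pairs += 1
    return total / pairs if pairs else 0.0

def degree_distribution(adj):
    """Map each degree to the number of nodes having that degree."""
    counts = {}
    for v in adj:
        counts[len(adj[v])] = counts.get(len(adj[v]), 0) + 1
    return counts

# Hypothetical toy graph: a triangle A-B-C with a pendant node D attached to A.
toy = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}
```

A small-world network would show a high `average_clustering` together with a low `average_shortest_path_length`; a scale-free network would show a heavy-tailed `degree_distribution`.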

  4. Conventional Graph Clustering Approaches • Density-based Clustering • Finding densely connected sub-graphs (e.g. maximal clique algorithm) • Hierarchical Clustering • Top-down approach: iteratively partitioning a graph (e.g. minimum cut algorithm) • Bottom-up approach: iteratively merging nodes (e.g. node merging by common neighbors) • Problems • Computationally inefficient • Unable to detect overlapping clusters • Discards sparsely connected nodes

  5. Functional Influence Model • Functional Flow • Treats each protein with a known functional annotation as a ‘source’ of ‘functional flow’ for that function • Simulates the spread of this functional flow through the neighborhoods surrounding the sources with a random walk • ‘Functional score’: the amount of ‘flow’ that a protein has received for that function • [Figure: nodes u and v, labeled Func(a)]
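
One way to read the functional-flow idea is as the expected behavior of a random walk: at each step every node passes the flow it currently holds, in equal shares, to its neighbors, and a node's functional score is the total flow it has received. The sketch below implements that reading; the step count, the equal-share rule, and the `network` example are assumptions made for illustration, not details given on the slide.

```python
def functional_flow(adj, sources, steps=3):
    """Spread one unit of flow from each source; return total flow received per node.

    ASSUMPTION: flow moves like the expectation of an unbiased random walk,
    so a node splits whatever it holds equally among its neighbors.
    """
    current = {v: (1.0 if v in sources else 0.0) for v in adj}
    received = {v: 0.0 for v in adj}
    for _ in range(steps):
        nxt = {v: 0.0 for v in adj}
        for u, amount in current.items():
            if amount == 0.0 or not adj[u]:
                continue
            share = amount / len(adj[u])
            for w in adj[u]:
                nxt[w] += share
        for v in adj:
            received[v] += nxt[v]
        current = nxt
    return received

# Hypothetical network: s is annotated with the function of interest.
network = {
    "s": {"a", "b"},
    "a": {"s", "c"},
    "b": {"s"},
    "c": {"a"},
}
one_step = functional_flow(network, {"s"}, steps=1)
three_steps = functional_flow(network, {"s"}, steps=3)
```

After one step the two direct neighbors of `s` each hold half of the flow; with more steps the flow reaches `c` and also returns to `s`.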

  6. Functional Influence • Functional influence based on distance • Weibull distribution • Curve fitting • d is the distance between two nodes
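
The fitted formula from this slide is not preserved in the transcript. For reference, the standard two-parameter Weibull density, written here as a function of the distance d, is shown below. The shape parameter k and scale parameter λ are the usual textbook symbols and are assumptions here; the values the authors obtained by curve fitting are not given.

```latex
f(d; k, \lambda) = \frac{k}{\lambda}\left(\frac{d}{\lambda}\right)^{k-1}
\exp\!\left[-\left(\frac{d}{\lambda}\right)^{k}\right],
\qquad d \ge 0,\; k > 0,\; \lambda > 0
```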

  7. Functional Influence Model • Information Flow Simulation • Computation of the functional influence inf_s(x) of s on x ∈ V based on shortest paths • Input: a weighted interaction network and a source node s • Output: functional influence pattern of s • Measurements • PathRatio • PathRatio is the natural “aging” or “losing” of information as it propagates through the network • SPath(s,y) is the set of all shortest paths between node s and node y • PR(s,y) is the PathRatio between node s and node y • PathStrength • PS(P) measures the strength of path P using the weights on the edges along P

  8. Framework of functional influence simulation • Algorithm • 1. Initialize inf(s) • 2. Compute the initial flow I(s → y) by [equation] • 3. Update inf(y) by [equation] • 4. Repeat step 3 for every node in the network • Finally, the functional profile [equation] is generated for every node in the network • F(d) is the functional distribution model • d is the distance between node s and node y • PR(s,y) is the PathRatio between node s and node y • inf(s) is the initial functional influence from node s • inf_s(y) is the functional influence received by node y from node s
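
The equations for the initial flow and the update step are not preserved in the transcript, so the framework can only be sketched under assumptions. In the sketch below, `F` and `PR` are supplied by the caller rather than defined, the influence received by y is assumed to be the product inf(s) · F(d) · PR(s, y), and d is taken as an unweighted hop distance even though the slides mention a weighted network. All of these choices are illustrative, not the authors' formulas.

```python
from collections import deque

def hop_distances(adj, s):
    """Breadth-first hop distance from s to every reachable node."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def influence_pattern(adj, s, F, PR, inf_s=1.0):
    """Functional influence pattern of source s over all reachable nodes.

    ASSUMPTION: influence at y is inf_s * F(d) * PR(s, y); the slide's
    actual combination rule is not recoverable from the transcript.
    """
    pattern = {}
    for y, d in hop_distances(adj, s).items():
        pattern[y] = inf_s if y == s else inf_s * F(d) * PR(s, y)
    return pattern

def functional_profiles(adj, F, PR):
    """Run the simulation once per source node, as in the final step."""
    return {s: influence_pattern(adj, s, F, PR) for s in adj}

# Hypothetical inputs: a three-node path, a placeholder decay for F,
# and a PR that applies no extra loss.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
pattern_a = influence_pattern(path, "a", F=lambda d: 0.5 ** d, PR=lambda s, y: 1.0)
profiles = functional_profiles(path, F=lambda d: 0.5 ** d, PR=lambda s, y: 1.0)
```

Passing `F` and `PR` as parameters keeps the sketch usable once the real distribution model (for example a fitted Weibull curve) and the real PathRatio definition are known.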

  9. Functional Module Detection (FMD)

  10. FlowChart for functional module detection

  11. Functional Modularity Detection • Experimental Data • DIP (4935 proteins, 14162 interactions) • Evaluation • Functional categories and annotations from MIPS • Hyper-geometric p-value • Result
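
The hyper-geometric p-value used for evaluation is the probability that a module of n proteins contains at least k members of a functional category by chance, given that the category holds K of the N proteins in the network. A minimal sketch follows; the argument names and the small numbers in the example are assumptions, not values from the slides.

```python
from math import comb

def hypergeom_pvalue(N, K, n, k):
    """P(X >= k) when drawing n proteins without replacement from N,
    of which K belong to the functional category."""
    upper = min(n, K)
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favorable / comb(N, n)

# Hypothetical example: 10 proteins, 5 in the category, a module of 2,
# both of which carry the annotation.
p_small = hypergeom_pvalue(N=10, K=5, n=2, k=2)
```

A low p-value indicates that the module is enriched for the category beyond what random selection would explain.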

  12. Computational Epidemiology • Computational epidemiology is a multidisciplinary field that develops tools and models to aid epidemiologists in their study of the spread of diseases • 1. Developing a virus spread and containment response model • 2. Understanding virus spread and identifying critical properties • 3. Applying these findings to real infectious virus spread • 4. Analyzing results of the containment strategy (death toll vs. strategies)

  13. Virus Spread Network Model • What do nodes and edges represent in the virus spread network model? • Node • Person (community network) • Town or place (road network) • Edge • Interaction (community network) • Pathway (road network) • Weight of nodes and edges • Changes over time t according to the virus spread dynamics model • Node weight: health status (0 ~ 1) • Edge weight: strength status (0 ~ 1)

  14. Model Scheme • Spread Model • Spreading phase: edges within the spreading region are damaged • Defense Model • Signaling and propagation phase: nodes with a certain number of damaged edges send signals to neighbor nodes • Defense action phase: nodes that receive a certain level of signals from neighbor nodes remove all of their edges • [Figure panels: virus progression to neighbor nodes; signaling alarms to neighbor nodes from an infected neighbor node; culling nodes to prevent virus progression]

  15. Spread Model • Spreading Model • Simulating disease spreading • Damaging nodes and edges which are in a virus spread radius from center • Virus Spread by r(t)

  16. Defense Model • Defense Model • Simulating defense system of disease spreading and message spreading • Culling interactions from damaged nodes in order to stop spreading (Edge Culling in Green Circles)
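
The spreading, signaling, and defense action phases described on the preceding slides can be sketched as one round over a small graph. The thresholds, the rule that an edge is damaged when either endpoint lies within the spread radius, and the choice to send signals along every edge are all assumptions made for illustration; the slides do not specify them.

```python
import math

def spread_phase(positions, edges, center, radius):
    """Edges with at least one endpoint inside the spread radius become damaged.

    ASSUMPTION: 'in the region of spreading' means either endpoint is inside.
    """
    def inside(v):
        return math.dist(positions[v], center) <= radius
    return {e for e in edges if any(inside(v) for v in e)}

def defense_round(edges, damaged, signal_threshold=2, cull_threshold=1):
    """One signaling phase followed by one defense action phase.

    edges and damaged are sets of frozenset({u, v}); self-loops are not supported.
    """
    damaged_count = {}
    for e in damaged:
        for v in e:
            damaged_count[v] = damaged_count.get(v, 0) + 1
    # Signaling phase: nodes with enough damaged edges alert their neighbors.
    signalers = {v for v, c in damaged_count.items() if c >= signal_threshold}
    signals = {}
    for e in edges:
        u, v = tuple(e)
        if u in signalers:
            signals[v] = signals.get(v, 0) + 1
        if v in signalers:
            signals[u] = signals.get(u, 0) + 1
    # Defense action phase: nodes with enough signals remove all of their edges.
    culled = {v for v, c in signals.items() if c >= cull_threshold}
    remaining = {e for e in edges if not (e & culled)}
    return signalers, culled, remaining

# Hypothetical road network with planar coordinates.
positions = {"A": (0, 0), "B": (1, 0), "C": (0, 1), "D": (5, 5), "E": (6, 5)}
edges = {frozenset(p) for p in [("A", "B"), ("A", "C"), ("C", "D"), ("D", "E")]}
damaged = spread_phase(positions, edges, center=(0, 0), radius=0.5)
signalers, culled, remaining = defense_round(edges, damaged)
```

In this example the outbreak at A damages its two edges, A alerts B and C, and both cull their edges, which leaves only the D-E edge intact.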

  17. Problem / Solution Approach • Which element of the virus spread system has the greatest impact on a containment campaign? • Identifying the critical elements of the system by computational modeling and stochastic simulation • How can an effective containment campaign be planned to minimize the damage caused by virus spread? • Mining the best combination of critical parameters under given conditions • [Figure labels: Parameters; Simulation & Analysis; Critical parameter]

  18. Application • Virus spread simulation on the road network of the city of Oldenburg, Germany • Green edges: healthy edges • Red edges: edges damaged by the spread process • Blue edges: edges damaged by the defense process • [Figure panels: Uncontrolled = 0.02; Intermediate = 0.12; Controlled = 0.22]

  19. Osteoporosis • Definition: “a systemic skeletal disease characterized by low bone mass and micro-architectural deterioration of bone tissue leading to enhanced bone fragility and a consequent increase in fracture risk” • 25 million people in the United States are affected • $10 billion is spent on medical charges, including rehabilitation and treatment facilities • Research funding will be $200 billion by the year 2040 • [Figure labels: Normal; Osteoporosis]

  20. Challenges • Diagnosis of osteoporosis • The traditional method of evaluating bone strength is to assess bone mineral density (BMD) • Limitations of BMD • A major limitation of BMD is that it incompletely reflects variation in bone strength • Other factors, such as bone microarchitecture, contribute substantially to bone strength • Evaluating bone microstructure can improve the determination of bone quality and strength • Proposed direction: a computational model of bone microstructure

  21. Computational Model on Bone Microstructure • Questions • What is a better way to evaluate bone strength? • How can we identify fragile locations in the bone structure? • Why not approach this problem from a new direction? • Consider the problem from a structural point of view • Graph-based approach to bone microstructure • Bone microstructure contributes to bone strength • Rod-like mineral fibers are represented by edges in a graph • The approach supports quantitative assessment of bone mineral density and bone micro-architecture

  22. Model Approach • Bone is not a uniformly solid material, but rather has some spaces between its hard elements. • Designing a network approach model for the bone microstructure. • Quantitative assessment of bone mineral density could be successfully done with this approach.

  23. Bone Network Model • Creating the Bone Network • Femur images from patients with osteoporosis, obtained by DXA scan • By image profiling of the DXA scan image, we create a bone network based on bone density • What do nodes and edges represent in the bone network model? • Node: fiber binding point for bone cell movements and biochemical interactions • Edge: a group of mineralized fibers • Weight of nodes and edges • Node weight: average weight of directly connected edges • Edge weight: strength status of mineralized fibers

  24. Problem / Solution Approach • What alternatives to Bone Mineral Density (BMD) exist for determining bone strength? • Designing a computational model of bone microstructure • How can we identify fragile locations in the bone structure? • Creating algorithms for mining weak locations from a computational model of bone microstructure • [Figure labels: Human Bone; Bone Model]

  25. Identifying Critical Locations • Information Propagation Model • An algorithm to find critical edges in the bone network • Measures the quantity of stress energy in each edge • Cuts the most critical edge according to the Information Propagation Model • Runs iteratively to find the next critical edges • Stops when the first isolated sub-network appears
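
The iterative procedure on this slide can be sketched as a loop that removes one edge at a time until the network splits. The slides do not define the stress energy measure, so the sketch below substitutes a much simpler stand-in: the edge with the lowest strength weight is treated as the most critical. The function names, the tie-breaking rule, and the `bone` example are assumptions.

```python
def is_connected(nodes, edges):
    """True if every node can be reached from the first node using the given edges."""
    nodes = list(nodes)
    if not nodes:
        return True
    adj = {v: set() for v in nodes}
    for e in edges:
        u, v = tuple(e)
        adj[u].add(v)
        adj[v].add(u)
    seen = {nodes[0]}
    stack = [nodes[0]]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(nodes)

def critical_edges(nodes, weights):
    """Cut edges one by one until the network first becomes disconnected.

    weights maps frozenset({u, v}) to a strength in [0, 1].
    ASSUMPTION: the weakest remaining edge stands in for the edge with the
    highest stress energy, which the slides do not define.
    """
    remaining = dict(weights)
    cut = []
    while remaining and is_connected(nodes, remaining):
        # Ties on weight are broken by the sorted node names of the edge.
        edge = min(remaining, key=lambda e: (remaining[e], sorted(e)))
        cut.append(edge)
        del remaining[edge]
    return cut

# Hypothetical bone network: a triangle A-B-C with a strut from C to D.
bone = {
    frozenset(("A", "B")): 0.9,
    frozenset(("B", "C")): 0.2,
    frozenset(("C", "A")): 0.5,
    frozenset(("C", "D")): 0.8,
}
cuts = critical_edges(["A", "B", "C", "D"], bone)
```

Here the B-C edge is cut first without splitting the network, and cutting C-A then separates {A, B} from {C, D}, so the loop stops.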

  26. Conclusions • Various applications are generating data very rapidly and in great volume, demanding data mining approaches. • Network-based approaches look promising to solve complex problems. • This research requires close collaboration among multidisciplinary groups. • Semi-supervised approaches to integrate domain knowledge into data mining tools are important to the success of the research.
