Clustering of Interaction Network • Definition • Process to detect densely connected sub-graphs • Determines protein complexes or functional modules • Difficulties • Noisy data (too many false positives or false negatives) • Cannot be solved by traditional clustering techniques • Difficult to define the pair-wise distance between proteins in the network • Protein complexes may overlap • Disparate sources of data • Different reliabilities: 17%~50% • Small overlaps: <17%
Protein Interaction Network • Undirected, unweighted graph • Node represents protein, edge represents interaction • Example of Yeast protein interaction network • Importance • Provide a global view of cellular organizations and biological functions • Applicable to systematic approaches for functional knowledge discovery • Problem • Large scale • Complex connectivity
Structural Property • Small-world Phenomenon (Watts & Strogatz): high modularity • Networks that lie between regular and random networks • Higher average clustering coefficient than expected by random chance • Significantly small average shortest path length • Scale-free Distribution (Barabasi & Albert): hub existence • Network growth by preferential attachment • Power law degree distribution: a few high degree nodes, many low degree nodes • Clustering coefficient distribution independent of degree
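The clustering coefficient mentioned on this slide can be illustrated with a small sketch. It assumes an undirected, unweighted graph stored as a dictionary of neighbor sets; the toy graph and the function names are illustrative, not taken from the slides.

```python
from itertools import combinations

def clustering_coefficient(adj, node):
    """Fraction of neighbor pairs of `node` that are themselves connected."""
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        return 0.0  # fewer than two neighbors: no pairs to connect
    links = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))

def average_clustering(adj):
    """Mean of the per-node clustering coefficients."""
    return sum(clustering_coefficient(adj, n) for n in adj) / len(adj)

# Toy graph (hypothetical): triangle A-B-C with a pendant node D attached to C.
toy = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}

for name in sorted(toy):
    print(name, clustering_coefficient(toy, name))
print("average:", average_clustering(toy))
```

A small-world network would show an average value well above that of a random graph with the same number of nodes and edges.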
Conventional Graph Clustering Approaches • Density-based Clustering • Finding densely connected sub-graphs (e.g. maximal clique algorithm) • Hierarchical Clustering • Top-down approach: iteratively partitioning a graph (e.g. minimum cut algorithm) • Bottom-up approach: iteratively merging nodes (e.g. node merging by common neighbors) • Problems • Computationally inefficient • Unable to detect overlapping clusters • Discard sparsely connected nodes
Functional Influence Model • Functional Flow • Treat each protein with a known functional annotation as a ‘source’ of ‘functional flow’ for that function • Simulate the spread of this functional flow through the neighborhoods surrounding the sources with a random walk • ‘Functional score’: the amount of ‘flow’ that a protein has received for that function • [Figure: nodes u and v, Func(a)]
Functional Influence • Functional Influence based on Distance • Weibull Distribution • Curve Fitting • d is the distance between two nodes
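As a reminder of what a Weibull-shaped distance profile looks like, here is a minimal sketch of the standard two-parameter Weibull density evaluated at integer distances. The shape and scale values are placeholders chosen only for illustration; the fitted parameters are not given on the slide.

```python
import math

def weibull_pdf(d, shape, scale):
    """Standard two-parameter Weibull density at distance d > 0."""
    if d <= 0:
        raise ValueError("distance between two distinct nodes must be positive")
    z = d / scale
    return (shape / scale) * z ** (shape - 1) * math.exp(-(z ** shape))

# Hypothetical parameters, NOT the values fitted in the original work.
SHAPE, SCALE = 2.0, 2.0

for d in range(1, 6):
    print(d, round(weibull_pdf(d, SHAPE, SCALE), 4))
```

In a curve-fitting setting, shape and scale would be estimated from observed functional co-annotation rates at each distance rather than fixed by hand.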
Functional Influence Model • Information Flow Simulation • Computation of functional influence inf_s(x) of s on x ∈ V based on Shortest Path • Input: a weighted interaction network and a source node s • Output: functional influence pattern of s • Measurements • PathRatio • PathRatio is the natural “aging” or “losing” of information propagation in the network • SPath(s,y) is the set of all shortest paths between node s and node y • PR(s,y) is the PathRatio between node s and node y • PathStrength • PS(P) measures the strength of path P using the weights on the edges along the path P
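The two ingredients named above, the set SPath(s,y) and a strength PS(P) computed from edge weights, can be sketched as follows. Two assumptions are made here that the slide does not confirm: shortest paths are measured in hops, and path strength is taken as the product of the edge weights along the path. The PathRatio formula itself is not reproduced because it is not given in the text.

```python
from collections import deque

def all_shortest_paths(adj, s, y):
    """All minimum-hop paths from s to y, found with one BFS from s."""
    dist = {s: 0}
    preds = {s: []}          # predecessors on some shortest path
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                preds[v] = [u]
                queue.append(v)
            elif dist[v] == dist[u] + 1:
                preds[v].append(u)
    if y not in dist:
        return []
    paths = []

    def walk(node, suffix):
        # Walk predecessor links back to s, building each path in order.
        if node == s:
            paths.append([s] + suffix)
            return
        for p in preds[node]:
            walk(p, [node] + suffix)

    walk(y, [])
    return paths

def path_strength(path, weights):
    """Assumed definition: product of edge weights along the path."""
    strength = 1.0
    for u, v in zip(path, path[1:]):
        strength *= weights[frozenset((u, v))]
    return strength

# Toy diamond network (hypothetical): two shortest paths from s to y.
toy_adj = {"s": ["a", "b"], "a": ["s", "y"], "b": ["s", "y"], "y": ["a", "b"]}
toy_weights = {
    frozenset(("s", "a")): 0.9, frozenset(("a", "y")): 0.5,
    frozenset(("s", "b")): 0.4, frozenset(("b", "y")): 0.5,
}

for p in all_shortest_paths(toy_adj, "s", "y"):
    print(p, path_strength(p, toy_weights))
```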
Framework of functional influence simulation • Algorithm • 1. Initialize inf(s) • 2. Compute initial flow I(s → y) by [equation not shown] • 3. Update inf(y) by [equation not shown] • 4. Repeat step 3 for every node in the network • Finally, the functional profile [equation not shown] is generated for every node in the network • F(d) is the functional distribution model • d is the distance between node s and node y • PR(s,y) is the Path Resistance between node s and node y • inf(s) is the initial functional influence from node s • inf_s(y) is the functional influence received by node y from node s
Functional Modularity Detection • Experimental Data • DIP (4935 proteins, 14162 interactions) • Evaluation • Functional categories and annotations from MIPS • Hyper-geometric p-value • Result
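The hyper-geometric p-value used for evaluation can be computed directly from binomial coefficients. The sketch below uses the common enrichment form, the probability of seeing at least k annotated proteins in a cluster by chance; the argument names and the toy numbers are illustrative assumptions, not values from the experiment.

```python
from math import comb

def hypergeom_pvalue(N, M, n, k):
    """P(X >= k) when drawing n proteins from N, of which M carry the function.

    N: proteins in the network, M: proteins annotated with the function,
    n: cluster size, k: cluster members annotated with the function.
    """
    total = comb(N, n)
    tail = sum(comb(M, i) * comb(N - M, n - i)
               for i in range(k, min(M, n) + 1))
    return tail / total

# Toy numbers (hypothetical): 10 proteins, 4 annotated, cluster of 3 with 2 hits.
print(hypergeom_pvalue(10, 4, 3, 2))
```

A low p-value indicates that the cluster is enriched for the functional category beyond what random grouping would produce.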
Computational Epidemiology • Computational Epidemiology • A multidisciplinary field that uses computational techniques to develop tools and models to aid epidemiologists in their study of the spread of diseases • 1. Developing a virus spread and containment response model • 2. Understanding virus spread and identifying critical properties • 3. Applying these findings to real infectious virus spread • 4. Analyzing results of the containment strategy (death toll vs. strategies)
Virus Spread Network Model • What do nodes and edges represent in the virus spread network model? • Node • Person (community network) • Town or place (road network) • Edge • Interaction (community network) • Pathway (road network) • Weight of nodes and edges • Changes over time t according to the virus spread dynamics model • Node weight: status of health (0 ~ 1) • Edge weight: status of strength (0 ~ 1)
Model Scheme • Spread Model • Spreading phase: edges within the spreading region are damaged • Defense Model • Signaling and propagation phase: nodes with a certain number of damaged edges send signals to their neighbor nodes • Defense action phase: nodes that receive a certain level of signals from neighbor nodes remove all of their edges • [Figure panels: virus progression to neighbor nodes; signaling alarms to neighbor nodes from an infected neighbor node; culling nodes to prevent virus progression]
Spread Model • Spreading Model • Simulating disease spreading • Damaging nodes and edges that lie within the virus spread radius r(t) from the center • [Figure: virus spread by r(t)]
Defense Model • Defense Model • Simulating the defense system against disease spreading, including message spreading • Culling interactions from damaged nodes in order to stop the spreading • [Figure: edge culling shown in green circles]
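One step of the spread and defense phases described in the preceding slides could be sketched as follows. Several details are assumptions made for illustration only: an edge counts as inside the spreading region when both endpoints lie within the radius, signals travel along edges that have not been culled, only healthy edges are culled, and both thresholds are placeholder values.

```python
import math

def simulate_step(positions, edges, status, center, radius,
                  damage_threshold=2, signal_threshold=1):
    """Run spreading, signaling, and defense once; mutates and returns status.

    status[(u, v)] is "healthy", "damaged" (by spread), or "culled" (by defense).
    """
    def inside(n):
        return math.dist(positions[n], center) <= radius

    # Spreading phase: damage healthy edges inside the spread radius.
    for u, v in edges:
        if status[(u, v)] == "healthy" and inside(u) and inside(v):
            status[(u, v)] = "damaged"

    # Signaling phase: nodes with enough damaged edges alert their neighbors.
    damaged = {n: 0 for n in positions}
    for u, v in edges:
        if status[(u, v)] == "damaged":
            damaged[u] += 1
            damaged[v] += 1
    senders = {n for n, c in damaged.items() if c >= damage_threshold}
    signals = {n: 0 for n in positions}
    for u, v in edges:
        if status[(u, v)] == "culled":
            continue
        if u in senders:
            signals[v] += 1
        if v in senders:
            signals[u] += 1

    # Defense action phase: alerted nodes cull their remaining healthy edges.
    alerted = {n for n, c in signals.items() if c >= signal_threshold}
    for u, v in edges:
        if status[(u, v)] == "healthy" and (u in alerted or v in alerted):
            status[(u, v)] = "culled"
    return status

# Toy road network (hypothetical): five places on a line.
positions = {"A": (0, 0), "B": (1, 0), "C": (2, 0), "D": (3, 0), "E": (4, 0)}
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]
status = {e: "healthy" for e in edges}

simulate_step(positions, edges, status, center=(0, 0), radius=2)
print(status)
```

Varying the two thresholds over repeated runs is one way to explore which parameter has the greatest effect on containment, which is the question raised on the next slide.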
Problem / Solution Approach • Which element of the virus spread system has the greatest impact on a containment campaign? • Identifying the critical element of the system by computational modeling and stochastic simulation • How to plan an effective containment campaign that minimizes the damage caused by virus spread? • Mining the best combination of critical parameters under certain conditions • [Figure: Parameters → Simulation & Analysis → Critical parameter]
Application • Virus Spread Simulation on the road network of the city of Oldenburg, Germany • Green edges: healthy edges • Red edges: edges damaged by the spread process • Blue edges: edges damaged by the defense process • [Figure panels: Uncontrolled = 0.02; Intermediate = 0.12; Controlled = 0.22]
Osteoporosis • Osteoporosis • Definition: “a systemic skeletal disease characterized by low bone mass and micro-architectural deterioration of bone tissue leading to enhanced bone fragility and a consequent increase in fracture risk” • 25 million people in the United States suffer from it • $10 billion is spent on medical charges, including rehabilitation and treatment facilities • Research funding will be $200 billion by the year 2040 • [Figure: Normal vs. Osteoporosis]
Challenges • Diagnosis of Osteoporosis? • The traditional method of evaluating bone strength is assessing bone mineral density (BMD) • Limitations of BMD • A major limitation of BMD is that it incompletely reflects variation in bone strength • Other factors, such as bone microarchitecture, contribute substantially to bone strength • By evaluating bone microstructure we can improve the determination of bone quality and strength • Computational Model on Bone Microstructure
Computational Model on Bone Microstructure • Questions • What is a better way to evaluate bone strength? • How can we identify fragile locations in the bone structure? • Why not approach this problem from a new direction? • Consider the problem from a structural point of view • Graph-based approach to bone microstructure • Bone microstructure contributes to bone strength • Rod-like mineral fibers are represented by edges in a graph • The model supports quantitative assessment of bone mineral density and bone micro-architecture
Model Approach • Bone is not a uniformly solid material, but rather has some spaces between its hard elements. • Designing a network approach model for the bone microstructure. • Quantitative assessment of bone mineral density could be successfully done with this approach.
Bone Network Model • Creating Bone Network • A femur image from a patient with osteoporosis, obtained by DXA scan • By image profiling on the DXA scan image, we create a bone network based on the bone density • What do nodes and edges represent in the bone network model? • Node: fiber binding point for bone cell movements and biochemical interactions • Edge: a group of mineralized fibers • Weight of nodes and edges • Node weight: average weight of directly connected edges • Edge weight: strength status of mineralized fibers
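The node-weight rule above (average weight of the directly connected edges) is simple enough to show in a few lines. The edge weights in this sketch are made-up values on a 0 to 1 scale, not measurements from a DXA image.

```python
def node_weights(edge_weights):
    """Node weight = mean weight of the edges incident to that node."""
    totals, counts = {}, {}
    for (u, v), w in edge_weights.items():
        for n in (u, v):
            totals[n] = totals.get(n, 0.0) + w
            counts[n] = counts.get(n, 0) + 1
    return {n: totals[n] / counts[n] for n in totals}

# Toy bone network (hypothetical): three binding points, two fiber groups.
toy_edges = {("a", "b"): 0.8, ("b", "c"): 0.4}
print(node_weights(toy_edges))
```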
Problem / Solution Approach • What alternatives to Bone Mineral Density (BMD) are there for determining the strength of bone? Designing a computational model of bone microstructure. • How can we identify fragile locations in the bone structure? Creating algorithms for mining weak locations from a computational model of bone microstructure. • [Figure: Human Bone → Bone Model]
Identifying Critical Locations • Information Propagation Model • An algorithm to find critical edges in the bone network • Measure the quantity of stress energy in each edge • Cut the most critical edge according to the Information Propagation Model • Run iteratively to find the next critical edges • Stop when the first isolated network appears
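The iterative loop above (score every edge, cut the most critical one, repeat until the network first splits) can be sketched independently of the scoring model. The Information Propagation Model's stress measure is not specified on the slide, so this sketch swaps in a stand-in score, the inverse of the edge weight, and accepts any other scoring function as a parameter. Edge weights are assumed to be strictly positive.

```python
from collections import deque

def is_connected(nodes, edges):
    """True if every node is reachable from an arbitrary start node."""
    if not nodes:
        return True
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    start = next(iter(nodes))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) == len(nodes)

def find_critical_edges(nodes, edge_weights, stress=None):
    """Cut the highest-stress edge repeatedly until the network disconnects."""
    if stress is None:
        # Stand-in score (assumption): weaker fibers are more critical.
        stress = lambda edge, remaining: 1.0 / remaining[edge]
    remaining = dict(edge_weights)
    cut = []
    while remaining and is_connected(nodes, remaining):
        worst = max(remaining, key=lambda e: stress(e, remaining))
        cut.append(worst)
        del remaining[worst]
    return cut

# Toy bone network (hypothetical): a square of four binding points.
toy_nodes = {"a", "b", "c", "d"}
toy_weights = {("a", "b"): 0.9, ("b", "c"): 0.2,
               ("c", "d"): 0.5, ("d", "a"): 0.3}
print(find_critical_edges(toy_nodes, toy_weights))
```

Because the scoring function receives the current set of remaining edges, a stress measure that changes as edges are removed can be plugged in without altering the loop.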
Conclusions • Various applications are generating data very rapidly and in great volume, demanding data mining approaches. • Network-based approaches look promising to solve complex problems. • This research requires close collaboration among multidisciplinary groups. • Semi-supervised approaches to integrate domain knowledge into data mining tools are important to the success of the research.