Network Archaeology: Uncovering Ancient Networks from Present-Day Interactions. Saket Navlakha , Carl Kingsford. Presented by: Geli Fei. Overview. Motivation Network Reconstruction Algorithms Experiments. Importance of Knowing Network Growing Dynamics.
Network Archaeology: Uncovering Ancient Networks from Present-Day Interactions Saket Navlakha, Carl Kingsford Presented by: Geli Fei
Overview • Motivation • Network Reconstruction Algorithms • Experiments
Importance of Knowing Network Growing Dynamics • Many networks are the product of an evolutionary process that guided their growth. • Analyses of network growth dynamics are useful for understanding: • Existing network properties • How networks change in the future
Past Network Unavailable • In many cases, only a static snapshot of a network is available. • Biology domain • Social network domain • Lack of data makes understanding of a network difficult
Network Growth Models • Often, we know a general principle that governs the network’s forward growth. • Preferential attachment (PA) • Duplication-mutation with complementarity (DMC) • Forest fire (FF) • Can be used to understand global changes in a network.
Network Growth Models • However, a randomly grown network will generally not match a target network! • Instead of growing a random graph forward, we decompose the actual observed network backwards in time.
Network Reconstruction Algorithm • Goal is to find G1, G2, …, Gt-1 given Gt under model M
Network Reconstruction Algorithm • Computational issue • Heuristically set • Greedily reverse only a single step of the evolutionary model • First-order Markov model assumption
Network Reconstruction Algorithm • Model M is being run forward as intended! • The prior over Gt-1 can be used to guide the choice of Gt-1 • Use a uniform prior for simplicity
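The greedy, one-step-at-a-time reversal described on these slides can be sketched as a generic loop. This is a minimal illustration, not the authors' implementation: the names `reconstruct_history`, `candidates`, `score`, and `reverse_step` are hypothetical, and the toy model plugged in below (remove a lowest-degree node) only stands in for a real model likelihood.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Set

Graph = Dict[Hashable, Set[Hashable]]  # undirected graph as adjacency sets


def reconstruct_history(
    graph: Graph,
    candidates: Callable[[Graph], Iterable],
    score: Callable[[Graph, object], float],
    reverse_step: Callable[[Graph, object], Hashable],
    min_nodes: int = 2,
) -> List[Hashable]:
    """Greedily undo one model step at a time.

    Returns the removed nodes, most recently arrived first.
    """
    g = {u: set(nb) for u, nb in graph.items()}  # work on a copy
    removed = []
    while len(g) > min_nodes:
        # Pick the single most plausible reversal given only the current graph
        # (first-order Markov assumption; a uniform prior is implied).
        best = max(candidates(g), key=lambda c: score(g, c))
        removed.append(reverse_step(g, best))
    return removed


# Toy stand-in model (an assumption for illustration only).
def toy_candidates(g):
    return sorted(g)


def toy_score(g, node):
    return -len(g[node])  # lower degree scores higher


def toy_reverse(g, node):
    for x in g.pop(node):
        g[x].discard(node)
    return node


path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
order = reconstruct_history(path, toy_candidates, toy_score, toy_reverse)
print(order)
```

Swapping in model-specific `candidates`, `score`, and `reverse_step` functions is what would specialize this loop to DMC, FF, or PA.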
The Duplication-Mutation with Complementarity (DMC) Model • Based on the duplication-divergence principle • Start with a connected two-node graph
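The slide lists only the starting graph, so the forward step below follows the commonly stated form of DMC and should be read as an assumption: pick an anchor uniformly at random, duplicate it, then for each shared neighbor drop one of the two edges with probability `q_mod`, and finally link the node to its anchor with probability `q_con`. The function names are hypothetical.

```python
import random


def dmc_step(g, new_node, q_mod, q_con, rng):
    """Add new_node to g by one duplication-mutation step; return its anchor."""
    anchor = rng.choice(sorted(g))
    # Duplication: the new node copies every edge of its anchor.
    g[new_node] = set(g[anchor])
    for x in g[new_node]:
        g[x].add(new_node)
    # Mutation: for each shared neighbor, with probability q_mod remove
    # the edge from either the anchor or the duplicate (chosen at random).
    for x in sorted(g[anchor]):
        if rng.random() < q_mod:
            loser = anchor if rng.random() < 0.5 else new_node
            g[loser].discard(x)
            g[x].discard(loser)
    # Complementarity: connect duplicate and anchor with probability q_con.
    if rng.random() < q_con:
        g[anchor].add(new_node)
        g[new_node].add(anchor)
    return anchor


def grow_dmc(n, q_mod, q_con, seed=0):
    """Grow an n-node graph from a connected two-node seed graph."""
    rng = random.Random(seed)
    g = {0: {1}, 1: {0}}
    for v in range(2, n):
        dmc_step(g, v, q_mod, q_con, rng)
    return g


g = grow_dmc(20, q_mod=0.4, q_con=0.7, seed=1)
print(len(g), sum(len(nb) for nb in g.values()) // 2)
```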
To Reverse DMC Model • Given qmod, qcon, and Gt • Goal is to find a pair of nodes: • <node that most recently entered Gt, its anchor node in Gt-1>
To Reverse DMC Model • After (u,v) is found, Gt-1 is formed by: -> Removing either u or v -> Assume we remove v; u then gains edges to all nodes in N(u) ∪ N(v) • All pairs of nodes must be considered
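The merge that forms Gt-1 can be sketched as follows. This is a hedged illustration: `merge_pair` and `candidate_pairs` are hypothetical names, the union of the two neighborhoods is assumed, and the scoring of each pair under qmod and qcon is deliberately left out because the slide does not state it.

```python
from itertools import combinations


def merge_pair(g, u, v):
    """Return a copy of g with v removed and u adjacent to N(u) union N(v)."""
    h = {x: set(nb) for x, nb in g.items()}
    merged = (h[u] | h[v]) - {u, v}  # no self-loop, no edge to the removed node
    for x in h.pop(v):
        h[x].discard(v)
    for x in merged:
        h[x].add(u)
    h[u] = merged
    return h


def candidate_pairs(g):
    """Every unordered pair of nodes; there are n(n-1)/2 of them."""
    return list(combinations(sorted(g), 2))


g_t = {0: {1, 2}, 1: {0, 3}, 2: {0}, 3: {1}}
g_prev = merge_pair(g_t, 0, 1)
print(g_prev)
print(len(candidate_pairs(g_t)))
```

Because every pair is a candidate, a single reversal step costs a number of score evaluations that is quadratic in the number of nodes.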
The Forest Fire (FF) Model • Was proposed by Leskovec et al. to grow networks that mimic certain properties of social networks • Probabilistic process: • Fire starts at some node u • Probabilistically move forward to N(u) • Stops when the spreading ceases
FF Model • Start with a connected two-node graph, a burning probability p
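A simplified forward step for the forest fire model is sketched below. It is an assumption-laden illustration, not the exact process of Leskovec et al.: the graph is undirected, the anchor is chosen uniformly, and each unburned neighbor of a burning node catches fire independently with probability `p`. The new node links to every burned node.

```python
import random


def ff_step(g, new_node, p, rng):
    """Add new_node to g by one simplified forest fire step; return its anchor."""
    anchor = rng.choice(sorted(g))
    burned = {anchor}
    frontier = [anchor]
    while frontier:  # the fire stops once nothing new ignites
        u = frontier.pop()
        for x in sorted(g[u]):
            if x not in burned and rng.random() < p:
                burned.add(x)
                frontier.append(x)
    # The new node links to its anchor and to every node the fire reached.
    g[new_node] = set(burned)
    for x in burned:
        g[x].add(new_node)
    return anchor


def grow_ff(n, p, seed=0):
    """Grow an n-node graph from a connected two-node seed graph."""
    rng = random.Random(seed)
    g = {0: {1}, 1: {0}}
    for v in range(2, n):
        ff_step(g, v, p, rng)
    return g


for p in (0.0, 0.3, 1.0):
    g = grow_ff(15, p, seed=2)
    print(p, sum(len(nb) for nb in g.values()) // 2)
```

In this sketch p = 0 yields a tree and p = 1 yields a complete graph, which matches the intuition that node degrees grow with p.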
To Reverse FF Model • Given burning probability p and current network Gt • As for the DMC model, find <most recently entered node, its anchor node> • It is difficult to write down an analytic expression for the likelihood of Gt-1. • Simulation is used to compute the likelihood instead.
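One way to estimate such a likelihood by simulation is a plain Monte Carlo count, sketched below. The slide does not describe the actual simulation procedure, so everything here is an assumption: the simplified undirected burning rule, the function names `burn` and `estimate_likelihood`, and the choice to count exact matches of the burned set.

```python
import random


def burn(g, anchor, p, rng):
    """Simulate one fire from anchor; return the set of burned nodes."""
    burned, frontier = {anchor}, [anchor]
    while frontier:
        u = frontier.pop()
        for x in sorted(g[u]):
            if x not in burned and rng.random() < p:
                burned.add(x)
                frontier.append(x)
    return burned


def estimate_likelihood(g_prev, anchor, observed, p, trials=1000, seed=0):
    """Fraction of simulated fires whose burned set equals the observed
    neighbor set of the candidate newest node."""
    rng = random.Random(seed)
    target = set(observed)
    hits = sum(burn(g_prev, anchor, p, rng) == target for _ in range(trials))
    return hits / trials


g_prev = {0: {1}, 1: {0, 2}, 2: {1}}  # candidate ancestral graph
print(estimate_likelihood(g_prev, anchor=0, observed={0}, p=0.5))
print(estimate_likelihood(g_prev, anchor=0, observed={0, 2}, p=0.5))
```

The second call estimates an impossible event (node 2 cannot burn unless node 1 does), so no simulated fire can match it.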
The Preferential Attachment (PA) Model • Was proposed as a mechanism to emulate the growth of the Web. • New pages make popular pages more popular by linking to them preferentially. • Only consider linear version of the PA model.
PA Model • Start with a clique of k+1 nodes and a parameter k
PA Model • No anchor node as in DMC or FF • Most recently added node must be of minimum degree in Gt • Find a node to remove among the nodes in C, the set of minimum-degree nodes
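The minimum-degree property can be checked on a small simulated network. The sketch below assumes linear preferential attachment in which each new node links to k distinct existing nodes chosen with probability proportional to degree; `grow_pa` and `min_degree_nodes` are hypothetical names, and how a node is then picked from the candidate set is not shown.

```python
import random


def grow_pa(n, k, seed=0):
    """Grow an n-node linear PA graph from a clique of k+1 nodes (k >= 1)."""
    rng = random.Random(seed)
    g = {u: {w for w in range(k + 1) if w != u} for u in range(k + 1)}
    for v in range(k + 1, n):
        # Each node appears once per incident edge, so a uniform draw
        # from this list is a degree-proportional draw.
        stubs = [u for u in sorted(g) for _ in g[u]]
        targets = set()
        while len(targets) < k:
            targets.add(rng.choice(stubs))
        g[v] = targets
        for u in targets:
            g[u].add(v)
    return g


def min_degree_nodes(g):
    """Candidate set for removal: all nodes of minimum degree."""
    d = min(len(nb) for nb in g.values())
    return {u for u, nb in g.items() if len(nb) == d}


g = grow_pa(30, k=3, seed=1)
cands = min_degree_nodes(g)
print(29 in cands, len(cands))
```

Every node enters with degree k and degrees never decrease, so the newest node always belongs to the candidate set, usually alongside other nodes that have not yet gained edges.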
Algorithm Measures • Likelihood of node/node anchor pair • Spearman’s footrule and Kendall’s τ measures of arrival-time correlation
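Both rank measures compare a true arrival order with a predicted one. The sketch below uses the textbook definitions for rankings without ties; whether the original work normalizes the footrule distance is not stated on the slide, so the raw sum is returned here.

```python
from itertools import combinations


def footrule(true_order, pred_order):
    """Spearman's footrule: total displacement of items between two rankings."""
    pos = {x: i for i, x in enumerate(pred_order)}
    return sum(abs(i - pos[x]) for i, x in enumerate(true_order))


def kendall_tau(true_order, pred_order):
    """Kendall's tau: (concordant - discordant) / total pairs, no ties."""
    pos = {x: i for i, x in enumerate(pred_order)}
    ranks = [pos[x] for x in true_order]
    concordant = discordant = 0
    for a, b in combinations(ranks, 2):
        if a < b:
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)


true_order = [0, 1, 2, 3]
print(footrule(true_order, [1, 0, 2, 3]), kendall_tau(true_order, [1, 0, 2, 3]))
print(footrule(true_order, [3, 2, 1, 0]), kendall_tau(true_order, [3, 2, 1, 0]))
```

A footrule of 0 and a tau of 1 indicate a perfectly recovered arrival order; a fully reversed order gives a tau of -1.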
Reversibility of Models • In a situation where the evolutionary history is completely known. • For each model, grow a 100-node network forward, then use Gt=100 to reconstruct its history. • Repeat this process 1000 times for each model and average the results.
Reversibility of Models - DMC • DMC • Reversibility varies drastically depending on: • the DMC model parameters to grow the network forward • the match between parameters used to grow and reverse the network
Reversibility of Models - DMC • Performance under noise • Most sensitive to noise among three models
Reversibility of Models - FF • FF: as p increases, the degree of each node increases
Reversibility of Models - FF • Under noise
Reversibility of Models - PA • PA is the most easily reversible of the three models
Reversibility of Models - PA • Under noise • Most resilient to noise
Recovery of ancient protein interaction network • Use the PPI network for the yeast S. cerevisiae from the IntAct database • 2,599 proteins (nodes) and 8,275 physical interactions • Past PPI networks are unavailable, so true node arrival times are not known • Node arrival times inferred from additional information are used as ground truth
Recovery of ancient protein interaction network • Comparison between three models • Duplication-based model is a better fit for PPI network • For DMC, low-to-medium qmod and medium-to-high qcon give the best performance
Recovery of ancient protein interaction network • Actual likelihood values for DMC also indicate the plausibility of the reconstruction • The ratio of log-likelihoods between inferred history and a random reconstruction is > 5 • The likelihood of reconstruction (qmod=0.4, qcon=0.7) is 2.6 times higher than reconstruction (qmod=0.9, qcon=0.1)
Recovery of past social networks • Music social network data: nodes are users, edges indicate friendships • Predict user arrival order • The best-performing model here was the worst for the PPI network
Conclusion • A novel framework for uncovering past networks given only a growth model • Works in a principled way and provides likelihood estimates for ancestral graphs • Optimal parameters are chosen using the accuracy of history reconstruction as the optimization criterion