190 likes | 365 Views
Finding Modules in Networks with Non-modular Regions. Sharon Bruckner, Bastian Kayser, Tim Conrad Freie Uni. Berlin. What are networks with non-modular regions ?. Can every network be fully partitioned into dense clusters ? We introduce NCC networks . Modular region
E N D
Finding Modules in Networks with Non-modular Regions Sharon Bruckner, Bastian Kayser, Tim Conrad Freie Uni. Berlin
Whatarenetworkswith non-modular regions? Can everynetworkbefullypartitionedintodenseclusters? WeintroduceNCC networks. Modular region Transition region
Where do NCC networksoccur? • The networkisactuallyfullpartitionable, but containsnoise. • The networkstructureis not strictlymodular:Nodes • Overlaps • Outliers • Pathsandnodesconnectingmodules • Example: A protein-protein interactionnetwork
Formalizingthisnotionofmodularity Then: What‘swrongwithsimplytakingNewman Modularity? Answer: Treesandverysparsegraphshave high Newman Modularity Goal: A score thatquantifieshow modular an NCC networkis, analogoustoNewman‘sModularity. 0.75
Transition matricesandgaps • In a discreterandomwalk on a network, therandomwalkermovesateachsteptorandomlychosenneighbor. This isencoded in thetransitionmatrix. • Clustering algorithmsoftenrely on thegap in theeigenvaluespectrumofthetransitionmatrix. • The presenceofeigenvaluescloseto 1, followedby a gap, indicatethepresenceofmodules. • For NCC networks, thisis not enough. Thereforeweintroduce a transitionmatrix Pt, thatcomesfrom a continuousrandomwalk.
First try: thegap score • Compute . • Find thelargestdifferencebetweentwoconsecutiveeigenvalues. • Return as score. Exploittheconnectionbetweenthenumberofeigenvaluescloseto 1 ofthetransitionmatrixandthenumberofmodules.
Gap score: Analysis andresults • Sparse networks do not have a high gap score • Example: roadnetworks, with Newman modularity ~ 0.95 andgap score of ~ 0.002 • Drawback 1: Arbitrariness. Thereisnoone „true“ gap in thespectrum. • Drawback 2: cleargapexistenceofmodules, but: existenceofmodulescleargap Main Advantage: This is a global score, not dependent on a partition
Second try: metastability score • Motivated bytheconceptofmetastabilityofphysicalsystems. • A metastablepartitionof a networkinto a transitionregionTandmodules satisfies: So, the time spent inside the modules should be long, while the time spent inside the transition region should be short.
Metastability score: Analysis For a givennetworkand a givenpartition we can then define the metastability score as . Main Drawback: This is a score for a givenpartition, not global! • The score depends on numberofmodulesm. • Cannotbeoptimized: Foreverypartitionwherewe will have and therefore an optimal score. Main Advantage: Explicitlytakesintoaccountthetransitionregion, moresuitedfor NCC networks.
Experiment results on thetwoscores Fornetworkswithtwomodulesofsize 100 withincreasingdensityand a transitionregionof 1000 nodes.
Intermediate Conclusion • Bothscoreshavesomemajordisadvantages • Currentlydeveloping an improved score: • Takes thetransitionregionintoaccount • isprovably „good“, at least on some well-definednetworkclasses. • dependent on thepartition (so, not global), but thepartitioncanbeoptimized.
Algorithmsforidentifyingmodules in NCC networks Wecomparethebehaviorof 3 algorithms on a benchmarkdatasetof NCC networks: MSM: The Markov State Model clusteringalgorithmfirstidentifiesandremovesthetransitionregion, andthendeterministicallyclusterstheremainingnodes. SCAN: Clusters nodestogetherbased on neighborhoodsimilarityandreachability. Assignsnodestherolesofhubs, outliers. MCL: The Markov Clustering algorithmsimulatesrandomwalks on thenetworkandidentifiesmodulesasregionswheretherandomwalkerstaysfor a long time. Returns a fullpartition.
Adjustingthealgorithms • For SCAN, outliersandhubsareadditionallyassignedtothetransitionregion. Main Adjustment: Nodes in modulesunder a threshold 1% ofnodesareassignedtothetransitionregion.
Howcanweevaluatetheresults? Benchmark networks: • A parameterizedrandomgraphmodelwhere: • modules: ER graphshavingthe same sizeandconnectionprobability • transitionregionis an ER graphwith • The nodes in M andTarethenconnectedw.p • VaryratioofsizeofMtoT, densityofMtoT. • Networks with 1000 nodes, 5 modules.
Howcanweevaluatetheresults? Evaluation Scores: • Comparingthe „groundtruth“ partitionfromtheconstructionofthebenchmarksetwiththepartitionfoundbythealgorithm . • Construct 3 scoresbased on the well-known Rand Index • evaluateshowwellthealgorithms separates thetransitionregionfromthe modular region • measuresthequalityoftheclusteringwithinthe modular region • a combinedscore.
Experiment 1: varyingsizes Comparing score for SCAN, MCL and MSM on networkswithvaryingsizesofT MSM performsbest Metastability score behavessimiliarlytoalgorithm score
Experiment 2: varyingdensitiesofmodulesandtransitionregion Plotting for the 3 algorithms with different combinations of and , along with the gap score
Experiment 3: A PPI network Weidentifiedmodules in theyeast FYI network. Ourmodulescorrespondtoknownproteincomplexesfromthe CYC2008 database. Weareworking on assigningrolestothenodes in thetransitionregion.
Summary and Outlook • Clustering networks such that not all nodesareassignedtomodulesisuseful. • Wepresentedtwoscorestoquantifyhow modular a networkis, andshowedthatthereisroomforimprovement. • Wecomparedtheperformanceof 3 algorithms on thetaskofidentifyingmodules in NCC networks.