
Finding Modules in Networks with Non-modular Regions

Finding Modules in Networks with Non-modular Regions. Sharon Bruckner, Bastian Kayser, Tim Conrad, Freie Universität Berlin. What are networks with non-modular regions? Can every network be fully partitioned into dense clusters? We introduce NCC networks.





Presentation Transcript


  1. Finding Modules in Networks with Non-modular Regions. Sharon Bruckner, Bastian Kayser, Tim Conrad, Freie Universität Berlin

  2. What are networks with non-modular regions? Can every network be fully partitioned into dense clusters? We introduce NCC networks. (Figure labels: modular region, transition region.)

  3. Where do NCC networks occur?
     • The network is actually fully partitionable, but contains noise.
     • The network structure is not strictly modular, for example:
        - overlaps
        - outliers
        - paths and nodes connecting modules
     • Example: a protein-protein interaction network.

  4. Formalizing this notion of modularity. First question: what's wrong with simply taking Newman modularity? Answer: trees and very sparse graphs can have high Newman modularity. Goal: a score that quantifies how modular an NCC network is, analogous to Newman's modularity.
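To make the objection to Newman modularity concrete, here is a minimal sketch of the standard Newman modularity Q for an unweighted, undirected graph, applied to a path (a tree). The function name `newman_modularity` and the path example are illustrative choices, not taken from the slides.

```python
from collections import defaultdict

def newman_modularity(edges, community):
    """Newman modularity Q = sum_c [ L_c / m - (d_c / (2m))^2 ], where L_c is
    the number of edges inside community c and d_c the total degree of c."""
    m = len(edges)
    inside = defaultdict(int)   # L_c: edges with both ends in c
    degree = defaultdict(int)   # d_c: sum of node degrees in c
    for u, v in edges:
        degree[community[u]] += 1
        degree[community[v]] += 1
        if community[u] == community[v]:
            inside[community[u]] += 1
    return sum(inside[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)

# A path on 100 nodes (a tree), cut into 10 contiguous segments of 10 nodes.
path_edges = [(i, i + 1) for i in range(99)]
segments = {i: i // 10 for i in range(100)}
q = newman_modularity(path_edges, segments)
print(round(q, 3))  # → 0.809
```

Even though a path has no dense clusters at all, simply chopping it into segments yields Q of about 0.81, which is why a different score is needed for NCC networks.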

  5. Transition matrices and gaps
     • In a discrete random walk on a network, the random walker moves at each step to a randomly chosen neighbor. This is encoded in the transition matrix.
     • Clustering algorithms often rely on the gap in the eigenvalue spectrum of the transition matrix.
     • The presence of eigenvalues close to 1, followed by a gap, indicates the presence of modules.
     • For NCC networks, this is not enough. Therefore we introduce a transition matrix P_t that comes from a continuous random walk.
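The slide does not give the formula for P_t. The sketch below assumes the common continuous-time random walk P_t = exp(t(P - I)) with P = D^{-1}A; this choice, and the function names, are assumptions for illustration rather than the authors' definition. It computes the matrix exponential through the symmetric matrix D^{-1/2} A D^{-1/2}, which has the same spectrum as P.

```python
import numpy as np

def transition_matrix(A):
    """Discrete random walk: P[i, j] = A[i, j] / deg(i). Assumes no isolated nodes."""
    A = np.asarray(A, dtype=float)
    return A / A.sum(axis=1, keepdims=True)

def continuous_transition_matrix(A, t):
    """ASSUMPTION: P_t = exp(t (P - I)), i.e. a walker that jumps along P
    at rate 1 in continuous time. Assumes an undirected graph, no isolated nodes."""
    A = np.asarray(A, dtype=float)
    d = A.sum(axis=1)
    d_isqrt = 1.0 / np.sqrt(d)
    # S = D^{-1/2} A D^{-1/2} is symmetric and similar to P = D^{-1} A.
    S = d_isqrt[:, None] * A * d_isqrt[None, :]
    lam, V = np.linalg.eigh(S)
    # exp(t (S - I)) = V diag(exp(t (lam - 1))) V^T
    E = (V * np.exp(t * (lam - 1.0))) @ V.T
    # Undo the similarity: P_t = D^{-1/2} exp(t (S - I)) D^{1/2}
    return d_isqrt[:, None] * E * np.sqrt(d)[None, :]

# Two triangles {0,1,2} and {3,4,5} joined by the edge (2, 3).
A = np.zeros((6, 6))
for u, v in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[u, v] = A[v, u] = 1
P = transition_matrix(A)
Pt = continuous_transition_matrix(A, t=2.0)
print(np.round(Pt, 3))
```

Both matrices are row-stochastic; P_t at t = 0 is the identity, and larger t lets the walker spread further before being observed.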

  6. First try: the gap score
     • Compute the eigenvalues of P_t.
     • Find the largest difference between two consecutive eigenvalues.
     • Return this difference as the score.
     Idea: exploit the connection between the number of eigenvalues of the transition matrix close to 1 and the number of modules.
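The three steps above can be sketched as follows. As before, P_t = exp(t(P - I)) is an assumption, under which every eigenvalue λ of P = D^{-1}A maps to exp(t(λ - 1)); the helper names `gap_score`, `two_cliques` and `ring` are hypothetical.

```python
import numpy as np

def gap_score(A, t=1.0):
    """Largest gap between consecutive eigenvalues of P_t.
    ASSUMPTION: P_t = exp(t (P - I)); assumes an undirected graph, no isolated nodes."""
    A = np.asarray(A, dtype=float)
    d_isqrt = 1.0 / np.sqrt(A.sum(axis=1))
    S = d_isqrt[:, None] * A * d_isqrt[None, :]     # symmetric, same spectrum as P
    lam = np.linalg.eigvalsh(S)
    mu = np.sort(np.exp(t * (lam - 1.0)))[::-1]     # eigenvalues of P_t, descending
    return float(np.max(mu[:-1] - mu[1:]))

def two_cliques(k):
    """Two k-cliques joined by a single edge: two clear modules."""
    A = np.ones((2 * k, 2 * k)) - np.eye(2 * k)
    A[:k, k:] = 0
    A[k:, :k] = 0
    A[k - 1, k] = A[k, k - 1] = 1
    return A

def ring(n):
    """A cycle: sparse, no modules."""
    A = np.zeros((n, n))
    for i in range(n):
        A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1
    return A

print(gap_score(two_cliques(10)), gap_score(ring(20)))
```

The two-clique graph shows one large gap after its two eigenvalues near 1, while the sparse ring's spectrum decays gradually, matching the observation that sparse networks do not get a high gap score.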

  7. Gap score: analysis and results
     • Sparse networks do not have a high gap score.
     • Example: road networks, with Newman modularity of ~0.95 and a gap score of ~0.002.
     • Drawback 1: arbitrariness. There is no one "true" gap in the spectrum.
     • Drawback 2: a clear gap implies the existence of modules, but the existence of modules does not imply a clear gap.
     Main advantage: this is a global score, not dependent on a partition.

  8. Second try: the metastability score
     • Motivated by the concept of metastability of physical systems.
     • A metastable partition of a network into a transition region T and modules M_1, …, M_m satisfies conditions which say that the time spent inside the modules should be long, while the time spent inside the transition region should be short.

  9. Metastability score: analysis
     For a given network and a given partition, we can then define the metastability score.
     Main drawback: this is a score for a given partition, not a global one!
     • The score depends on the number of modules m.
     • It cannot be optimized: every partition satisfying a certain condition attains an optimal score.
     Main advantage: it explicitly takes the transition region into account, so it is better suited to NCC networks.
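The exact metastability formula is not preserved in this transcript, so the following is only a hypothetical sketch of the idea: measure how likely a walker is to stay inside a set (its holding probability under a row-stochastic matrix such as P_t), then reward long stays in the modules and penalize long stays in T. The names `holding_probability` and `metastability_sketch`, and the "mean module holding minus transition holding" combination, are assumptions, not the authors' score.

```python
import numpy as np

def holding_probability(P, pi, S):
    """Probability that a walker started in S (distributed as pi restricted to S)
    is still in S after one application of the row-stochastic matrix P."""
    S = np.asarray(S)
    w = pi[S] / pi[S].sum()
    return float(w @ P[np.ix_(S, S)].sum(axis=1))

def metastability_sketch(P, pi, modules, T):
    """HYPOTHETICAL score: mean holding probability of the modules minus the
    holding probability of the transition region T (0 if T is empty)."""
    stay_modules = np.mean([holding_probability(P, pi, M) for M in modules])
    stay_T = holding_probability(P, pi, T) if len(T) else 0.0
    return float(stay_modules - stay_T)

# Triangles {0,1,2} and {4,5,6}, linked through a single transition node 3.
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (5, 6), (4, 6)]
A = np.zeros((7, 7))
for u, v in edges:
    A[u, v] = A[v, u] = 1
deg = A.sum(axis=1)
P = A / deg[:, None]        # one discrete step; any row-stochastic P_t works here
pi = deg / deg.sum()        # stationary distribution of the walk on an undirected graph
print(metastability_sketch(P, pi, modules=[[0, 1, 2], [4, 5, 6]], T=[3]))
```

For brevity the usage passes the one-step matrix P instead of P_t. Note that the score is defined only relative to a chosen partition, which is exactly the drawback named on the slide.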

  10. Experimental results on the two scores, for networks with two modules of size 100 with increasing density and a transition region of 1000 nodes.

  11. Intermediate conclusion
     • Both scores have some major disadvantages.
     • We are currently developing an improved score that:
        - takes the transition region into account;
        - is provably "good", at least on some well-defined network classes;
        - is dependent on the partition (so, not global), but the partition can be optimized.

  12. Algorithms for identifying modules in NCC networks
     We compare the behavior of 3 algorithms on a benchmark dataset of NCC networks:
     • MSM: the Markov State Model clustering algorithm first identifies and removes the transition region, and then deterministically clusters the remaining nodes.
     • SCAN: clusters nodes together based on neighborhood similarity and reachability. Assigns nodes the roles of hubs and outliers.
     • MCL: the Markov Clustering algorithm simulates random walks on the network and identifies modules as regions where the random walker stays for a long time. Returns a full partition.
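Of the three algorithms, MCL is simple enough to sketch in a few lines. The version below is a minimal textbook-style iteration (expansion by a matrix power, inflation by an entrywise power plus renormalization), not the implementation used in the talk; the parameter defaults and the rule of grouping nodes by the support of their converged column are choices made for this sketch.

```python
import numpy as np

def mcl(A, expansion=2, inflation=2.0, max_iter=100, tol=1e-8):
    """Minimal Markov Clustering sketch. Returns a full partition (list of clusters)."""
    n = len(A)
    M = np.asarray(A, dtype=float) + np.eye(n)   # add self-loops, common MCL practice
    M = M / M.sum(axis=0, keepdims=True)          # column-stochastic
    for _ in range(max_iter):
        prev = M
        M = np.linalg.matrix_power(M, expansion)  # expansion: let the walk spread
        M = M ** inflation                        # inflation: favour strong flows
        M = M / M.sum(axis=0, keepdims=True)
        if np.allclose(M, prev, atol=tol):
            break
    # At convergence, nodes of one cluster send all their mass to the same
    # attractor rows, so group nodes by the support of their column.
    clusters = {}
    for node in range(n):
        support = tuple(np.flatnonzero(M[:, node] > 1e-6))
        clusters.setdefault(support, []).append(node)
    return sorted(clusters.values())

# Two triangles {0,1,2} and {3,4,5} joined by the edge (2, 3).
A = np.zeros((6, 6))
for u, v in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[u, v] = A[v, u] = 1
print(mcl(A))
```

Like the MCL described on the slide, this sketch assigns every node to some cluster, so it has no native notion of a transition region.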

  13. Adjusting the algorithms
     • For SCAN, outliers and hubs are additionally assigned to the transition region.
     Main adjustment: nodes in modules smaller than a threshold of 1% of the nodes are assigned to the transition region.
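The main adjustment is a simple post-processing step. Below is a sketch; the function name `apply_size_threshold` and the `flagged` parameter (for nodes an algorithm already marks as non-modular, such as SCAN's hubs and outliers) are hypothetical.

```python
def apply_size_threshold(clusters, n_nodes, threshold=0.01, flagged=()):
    """Dissolve every module with fewer than threshold * n_nodes nodes into the
    transition region. `flagged` nodes (e.g. SCAN hubs/outliers, which are not
    part of any cluster) are added to the transition region as well."""
    min_size = threshold * n_nodes
    modules, transition = [], set(flagged)
    for c in clusters:
        if len(c) < min_size:
            transition.update(c)        # too small: move to the transition region
        else:
            modules.append(sorted(c))
    return modules, sorted(transition)

# 1000 nodes, so the 1% threshold is 10 nodes: the 9-node cluster is dissolved.
clusters = [list(range(0, 500)), list(range(500, 980)),
            list(range(980, 989)), list(range(989, 1000))]
modules, transition = apply_size_threshold(clusters, n_nodes=1000)
print([len(m) for m in modules], transition)
```

Applied to MCL's output, this turns a full partition into modules plus a transition region, which is what allows the three algorithms to be compared on the same footing.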

  14. How can we evaluate the results? Benchmark networks:
     • A parameterized random graph model where:
        - the modules M are ER (Erdős–Rényi) graphs having the same size and connection probability;
        - the transition region T is an ER graph with its own connection probability;
        - the nodes in M and T are then connected with a given probability.
     • Vary the ratio of the size of M to T, and the ratio of the density of M to T.
     • Networks with 1000 nodes, 5 modules.
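A generator following this recipe might look like the sketch below. All parameter names and default values are hypothetical (the defaults only reproduce "1000 nodes, 5 modules"), and since the slide does not mention edges between different modules, the sketch assumes there are none.

```python
import random

def ncc_benchmark(n_modules=5, module_size=100, transition_size=500,
                  p_module=0.3, p_transition=0.01, p_between=0.005, seed=0):
    """HYPOTHETICAL NCC benchmark: ER modules, an ER transition region, and
    random module-transition edges. Returns (labels, edges); label -1 = T."""
    rng = random.Random(seed)
    labels = []
    for k in range(n_modules):
        labels += [k] * module_size
    labels += [-1] * transition_size
    n = len(labels)
    edges = set()
    for u in range(n):
        for v in range(u + 1, n):
            lu, lv = labels[u], labels[v]
            if lu == lv == -1:
                p = p_transition        # inside the transition region
            elif lu == lv:
                p = p_module            # inside one module
            elif lu == -1 or lv == -1:
                p = p_between           # module <-> transition region
            else:
                p = 0.0                 # ASSUMPTION: no module <-> module edges
            if rng.random() < p:
                edges.add((u, v))
    return labels, edges

labels, edges = ncc_benchmark()
print(len(labels), len(edges))
```

The labels double as the "ground truth" partition against which an algorithm's output is compared.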

  15. How can we evaluate the results? Evaluation scores:
     • Compare the "ground truth" partition from the construction of the benchmark set with the partition found by the algorithm.
     • Construct 3 scores based on the well-known Rand index:
        - one that evaluates how well the algorithm separates the transition region from the modular region;
        - one that measures the quality of the clustering within the modular region;
        - a combined score.
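The slide names the three scores without giving their formulas, so this is a hypothetical reconstruction of the idea on top of the plain Rand index: the separation score compares only the "module vs. transition" labels, the within-module score compares clusterings on nodes that both partitions call modular, and the combined score is taken here as their mean (an assumption).

```python
from itertools import combinations

TRANSITION = -1  # label used for the transition region

def rand_index(a, b):
    """Fraction of node pairs on which labelings a and b agree
    (same cluster in both, or different clusters in both)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)

def ncc_scores(truth, found):
    """HYPOTHETICAL (separation, within-module, combined) scores."""
    # 1) Separation: reduce both labelings to "in T" vs. "in a module".
    sep = rand_index([x == TRANSITION for x in truth],
                     [x == TRANSITION for x in found])
    # 2) Clustering quality on nodes that both partitions place in modules.
    idx = [i for i in range(len(truth))
           if truth[i] != TRANSITION and found[i] != TRANSITION]
    within = (rand_index([truth[i] for i in idx], [found[i] for i in idx])
              if len(idx) > 1 else 0.0)
    # 3) Combined score: here simply the mean (an assumption).
    return sep, within, (sep + within) / 2

truth = [0, 0, 1, 1, -1, -1]
print(ncc_scores(truth, [1, 1, 0, 0, -1, -1]))  # → (1.0, 1.0, 1.0)
print(ncc_scores(truth, [0, 0, 0, 1, -1, -1]))  # → (1.0, 0.5, 0.75)
```

The Rand index is invariant to renaming clusters, which is why the relabeled partition in the first example scores perfectly.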

  16. Experiment 1: varying sizes. Comparing the score for SCAN, MCL and MSM on networks with varying sizes of T: MSM performs best. The metastability score behaves similarly to the algorithm score.

  17. Experiment 2: varying densities of modules and transition region. Plotting the score for the 3 algorithms with different combinations of module density and transition-region density, along with the gap score.

  18. Experiment 3: a PPI network. We identified modules in the yeast FYI network. Our modules correspond to known protein complexes from the CYC2008 database. We are working on assigning roles to the nodes in the transition region.

  19. Summary and outlook
     • Clustering networks such that not all nodes are assigned to modules is useful.
     • We presented two scores to quantify how modular a network is, and showed that there is room for improvement.
     • We compared the performance of 3 algorithms on the task of identifying modules in NCC networks.
