1 / 38

V3 Matrix algorithms and graph partitioning

V3 Matrix algorithms and graph partitioning. - Dividing networks into clusters - Graph partitioning - The Kernighan -Lin algorithm - Spectral partitioning. Motivation: Cellular networks are modular!. Adam Arkin / UC Berkeley

ginger
Download Presentation

V3 Matrix algorithms and graph partitioning

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. V3 Matrix algorithmsandgraphpartitioning - Dividingnetworksintoclusters - Graph partitioning - The Kernighan-Lin algorithm - Spectralpartitioning Mathematics of Biological Networks

  2. Motivation: Cellularnetworksare modular! Adam Arkin / UC Berkeley Modularityis one way to reconcile the seemingly incompatible objectives of complexity and evolvability. Modularity has been shown to underlie biological function e.g. at the levels of transcription and embryonic development. Over half of all functional modules (in the form of transcriptional modules, protein complexes, and metabolic pathways) have coevolving components. Modularity is hierarchical. Singh … Arkin PNAS (2008) 105, 7500-7505 Mathematics of Biological Networks

  3. Evolutionarymodules in chemotaxis Orthologsof 61 B. subtilischemotaxis genes from 207 microbial species. The resulting gene content matrix was hierarchically clustered along both genes (rows) and species (columns). Genes were then colored according to which dynamic-control role they occupy in the network. The clustering reveals that genes group into 5statistically significant evolutionary modules (A–E). (i) flagellargenes (flg, fli, flh) are conserved among motile bacteria but not among motile Archaea; (ii) the full complement of signal transducers (mcp, tlp) and regulators (che) is absent in many intracellular pathogens. Singh … Arkin PNAS (2008) 105, 7500-7505 Mathematics of Biological Networks

  4. FBA-optimized network on glutamate-rich substrate High-fluxbackbonefor FBA-optimizedmetabolicnetworkofE. coli on a glutamate-richsubstrate. Metabolites (vertices) colouredbluehave at least oneneighbour in common in glutamate- andsuccinate-richsubstrates, andthosecolouredredhavenone. Reactions (lines) arecolouredblueiftheyareidentical in glutamate- andsuccinate-richsubstrates, greenif a different reactionconnectsthe same neighbour pair, andredifthisis a newneighbour pair. Black dottedlinesindicatewherethedisconnectedpathways, forexample, folatebiosynthesis, wouldconnecttotheclusterthrough a link thatis not partofthe HFB. Thus, therednodesand links highlightthepredictedchanges in the HFB whenshiftingE. colifromglutamate- tosuccinate-richmedia. Dashedlinesindicate links tothebiomassgrowthreaction. (1) Pentose Phospate (11) Respiration (2) Purine Biosynthesis (12) Glutamate Biosynthesis (20) Histidine Biosynthesis (3) Aromatic Amino Acids (13) NAD Biosynthesis (21) Pyrimidine Biosynthesis (4) Folate Biosynthesis (14) Threonine, Lysine and Methionine Biosynthesis (5) Serine Biosynthesis (15) Branched Chain Amino Acid Biosynthesis (6) Cysteine Biosynthesis (16) Spermidine Biosynthesis (22) Membrane Lipid Biosynthesis (7) Riboflavin Biosynthesis (17) Salvage Pathways (23) Arginine Biosynthesis (8) Vitamin B6 Biosynthesis (18) Murein Biosynthesis (24) Pyruvate Metabolism (9) Coenzyme A Biosynthesis (19) Cell Envelope Biosynthesis (25) Glycolysis (10) TCA Cycle Almaar et al., Nature 427, 839 (2004) Mathematics of Biological Networks

  5. RNA polymerases I, II and III Again: modular decompositon easier to comprehend than graph Gagneur et al. Genome Biology 5, R57 (2004) Mathematics of Biological Networks

  6. Dividingnetworksintoclusters We like todividetheverticesof a graph so thatvertices in onegrouphavemanyedgestootherverticesinsidethe same groupandonly a fewedgestovertices in othergroups. Network ofCo-authorships in a universitydepartment. Vertices arescientistsandedges link pairsofscientistswhohave co-authoredscientific publications. The networkhasclearclusters or „communities“ thatlikely reflectdivisionsofinterests andresearchgroups. Mathematics of Biological Networks

  7. Graph partitioning Graph partitioningandcommunitydetectionaredistinguishedfromoneanotherbywhetherthenumberandsizeofthegroupsisfixedbytheexperimenterorwhetheritisunspecified. Graph partitioningis a classic problemin computerscience, studiedsincethe 1960s. Itistheproblemofdividingtheverticesof a networkinto a givennumberof non-overlappinggroupsofgivensizes such thatthenumberofedgesbetweengroupsisminimized. Oneimportantapplicationisthe optimal distributionofverticesontocoresof a parallel computer in ordertomimimizetheamountofcommunicationrequiredwhensolvinge.g. a systemofcoupledequations. Mathematics of Biological Networks

  8. Community detection In communitydetection, thenumberandsizeofthegroupsintowhichthenetworkisdividedare not specifiedbytheexperimenter but bythenetworkitself. The goalofcommunitydetectionisto find thenatural fault linesalongwhich a network separates. Mathematics of Biological Networks

  9. Algorithms for graph partitioning Whyispartitioninghard? The simplestgraphpartitioningproblemisthedivisionof a networkinto just 2 parts. This issometimescalledgraphbisection. Ifwecandivide a networkinto2 parts, wecan also divide itfurtherbydividingoneorbothoftheseparts … The graphbisectionproblemistheproblemofdividingtheverticesof a networkinto2 non-overlappinggroupsofgivensizes such thatthenumberofedgesrunningbetweenvertices in different groupsisminimized. The numberofedgesbetweengroupsiscalledthecutsize. Thus, onecouldsimplylookthrough all possibledivisionsofthenetwork into 2 partsandchoosetheonewithsmallestcutsize. Mathematics of Biological Networks

  10. Algorithms for graph partitioning But this exhaustive searchisprohibitively expensive! Given a networkofn vertices. Thereare different ways ofdividingit into 2 groupsofn1andn2vertices. The amountof time tolookthrough all thesedivisions will gouproughlyexponentiallywiththesizeofthesystem. Onlyvaluesofupton = 30 arefeasiblewithcurrentcomputers. In computerscience, either an algorithmcanbe clever andrunquickly, but will failto find the optimal answer in some (andperhapsmost) cases, orit will always find the optimal answer, but takes an impracticallengthof time to do it. Mathematics of Biological Networks

  11. The Kernighan-Lin algorithm This algorithmproposedby Brian KernighanandShen Lin in 1970 isoneofthesimplestandbestknownheuristicalgorithmsforthegraphbisectionproblem. (Kernighanis also oneofthedevelopersofthe C language). (a) The algorithmstartswithanydivisionoftheverticesof a networkintotwogroups (shaded) andthensearchesforpairsofvertices, such asthe pair highlightedhere, whoseinterchangewouldreducethecutsizebetweenthegroups. (b) The same network after interchangeofthe 2 vertices. Mathematics of Biological Networks

  12. The Kernighan-Lin algorithm • Dividetheverticesof a givennetworkinto 2 groups (e.g. randomly) • Foreach pair (i,j) ofvertices, wherei belongstothefirstgroupandj tothesecondgroup, calculatehowmuchthecutsizebetweenthegroupswouldchangeifi andjwereinterchangedbetweenthegroups. • Find the pair thatreducesthecutsizebythelargestamount. Ifno pair reducesit, find the pair thatincreasesitbythesmallestamount. Repeat thisprocess, but withtheimportantrestrictionthateachvertex in thenetworkcanonlybemovedonce. Stopwhenthereisno pair ofverticesleftthatcanbeswapped. Mathematics of Biological Networks

  13. The Kernighan-Lin algorithm (II) • Go back througheverystatethatthenetworkpassedthroughduringtheswappingprocedureandchooseamongthemthestate in whichthecutsizetakesitssmallestvalue. • Performthisentireprocessrepeatedly, startingeach time withthebestdivisionofthenetworkfound in the last round. • Stopwhennoimprovement on thecutsizeoccurs. Note thatifthe initial assignmentofverticestogroupisdonerandomly, theKernighan-Lin algorithmmaygive different answerswhenitisruntwice on the same network. Mathematics of Biological Networks

  14. The Kernighan-Lin algorithm (II) (a) A meshnetworkof 547 verticesofthekindcommonlyused in finite elementanalysis. (b) The bestdivisionfoundbytheKernighan-Lin algorithmwhenthetaskistosplitthenetworkinto 2 groupsofalmostequalsize. This divisioninvolvescutting 40 edges in thismeshnetworkandgivespartsof 273 and 274 vertices. (c) The bestdivisionfoundbyspectralpartitioning (second half of V3). Mathematics of Biological Networks

  15. Runtime of the Kernighan-Lin algorithm The numberofswapsperformedduringoneroundofthealgorithmisequaltothesmallerofthesizesofthetwogroups [0, n / 2]. → in theworstcase, thereareO(n) swaps. Foreachswap, wehavetoexamine all pairsofvertices in different groupstodeterminehowthecutsizewouldbeaffectedifthe pair was swapped. In theworstcase, therearen / 2  n / 2 = n2 / 4 such pairs, whichisO(n2). Mathematics of Biological Networks

  16. RuntimeoftheKernighan-Lin algorithm (ii) When a vertexi movesfromonegrouptotheothergroup, anyedgesconnectingittovertices in itscurrentgroupbecomeedgesbetweengroups after theswap. Letussupposethatarekisame such edges. Similarly, anyedgesthati hastovertices in theothergroup, (saykiotherones) becomewithin-group edges after theswap. Thereisoneexception. Ifi isbeingswappedwithvertexj andtheyareconnectedby an edge, thentheedgeis still betweenthegroups after theswap → thechange in thecutsize due tothemovementofi iskiother - kisame – Aij A similarexpressionappliesforvertexj. → the total change in cutsize due totheswapiskiother - kisame +kjother - kjsame – 2Aij Mathematics of Biological Networks

  17. RuntimeoftheKernighan-Lin algorithm (iii) For a networkstored in adjacencylist form, theevaluationofthisexpressioninvolvesrunningthrough all theneighborsofi andj in turn, andhence takes time on theorderoftheaveragedegree in thenetwork, orO (m/n) withm edges in thenetwork. → the total running time isO ( n  n2  m/n ) = O(mn2) On a sparsenetworkwithm  n, thisisO(n3) On a densenetwork (with) , thisisO(n4) This time still needstobemultipliedbythenumberofroundsthealgorithmisrunbeforethecutsizestopsdecreasing. Fornetworksupto a few 1000 ofvertices, thisnumbermaybebetween 5 and 10. Mathematics of Biological Networks

  18. Spectral partitioning This method was presentedby Fiedler (1973) andmakesuseofthematrixpropertiesofthegraphLaplacian. Againwe will applythisalgorithmtothegraphbisectionproblem. Given a networkofn verticesandm edgesand a divsionintogroup 1 andgroup 2. Wecanwritethecutsizeforthedivisionas The factor ½ compensatesforcountingeachedgetwice in thesum. Mathematics of Biological Networks

  19. Spectral partitioning Let usdefine a setofquantitiessi , oneforeachvertexi, whichrepresentthedivisionofthenetworkthus: Then This allowstorewritethecutsizeas wherethesumnowrunsover all valuesofi andj. Mathematics of Biological Networks

  20. Spectral partitioning The firstterm in thesumis wherekiisthedegreeofvertexi , ijistheKroneckersymbol. Wehave also usedthefactthatandsi2 = 1 Substitutingthis back intotheaboveequationgives or in matrix form, wheresisthevectorwithelementssiand Lij = kiij – Aijistheij-thelementofthegraphLaplacianmatrix. Mathematics of Biological Networks

  21. Insert: graph Laplacian The graphLaplacianisdefined in analogytothediffusionprocess. Diffusion processesarenormallytreatedbythediffusionequation But onecan also considerdiffusionprocessesthattakeplace on networks. There, the rate at whichtheamountofsubstancei at vertexichangesis determinedbytheflowfromandtootherverticesjthatareconnectedtoi. This gives . Bysplittingthe 2 termswecanwrite Mathematics of Biological Networks

  22. Insert: graph Laplacian We canwritethisequation in matrix form where isthevectorwithcomponentsi , Aistheadjacencymatrixand D is a diagonal matrixwiththevertexdegrees on the diagonal. Itiscommontodefine a newmatrixL = D - A so thattheaboveequationtakes on the form oftheordinarydiffusionequation Here, theLaplacianoperatorofthesecondspatial derivatives isreplacedbyL. Mathematics of Biological Networks

  23. Insert: eigenvectorsofthegraphLaplacian Consider an undirectednetworkwithnverticesandmedges. Letusdesignateone end ofeachedgetobeend 1andtheothertobeend 2. Nowletusdefinethem nincidencematrix Bwithelementsasfollows Thus, eachrowofBhasexactlyone +1 andone -1 element. Mathematics of Biological Networks

  24. Insert: eigenvectorsofthegraphLaplacian Let usnowconsiderthesumthatrunsover all edgesforthe vertex pair i andj. Ifi  j , theonly non-zero terms in thissumoccurifbothBkiandBkjare non-zero. In thatcase, edgekconnectsverticesiandj, andtheproduct will be -1. For a simple network, thereis at mostoneedgebetweenany pair ofvertices. Thus, theentiresum will be -1 ifthereis an edgebetweeniandjand 0 otherwise. Ifi =j, thenthesumiswhichcontributes a valueof +1 foreveryedgeconnectedtovertexi, so thewholesumis just equaltodegreekiofvertexi. Mathematics of Biological Networks

  25. Insert: eigenvectorsofthegraphLaplacian Thus thesumispreciselyequalto an elementoftheLaplacian The diagonal elementsLiiareequaltothedegreeskiandthe off-diagonal terms Lijare -1 ifthereis an edge(i,j)andzerootherwiwse. In matrix form, wecanwriteL = BT B Nowletvibe an eigenvectorofLwitheigenvaluei. ThenviTBTBvi = vTi Lvi= ivTivi = i whereweassumedthattheeigenvectorviisnormalizedtothat itsinnerproductwithitselfis 1. Mathematics of Biological Networks

  26. Insert: eigenvectorsofthegraphLaplacian Thus anyeigenvalueioftheLaplacianisequalto (viTBT) (Bvi). This quantityis just theproductof a real vectorwithitself.Itisthesumofthesquaresofthe real elementsofthatvectorandhencecannotbe negative. Itfollowsthat all eigenvaluesofthegraphLaplacianare non-negative. The smallestpossibleeigenvalueis 0. Letusconsiderthevector1 = (1,1,1,…). IfwemultiplythisvectorbytheLaplacian, thei-thelementhasthevalue Thus thevector1isalways an eigenvectorofLwiththesmallestpossibleeigenvalue 0. Mathematics of Biological Networks

  27. Back tospectralpartitioning R was thecutsizeforthedivision, i.e., thenumberofedgesrunningbetweenthe 2 groups: wheresisthevectorwithelementssiandLij = kiij – Aijistheij-thelementofthegraphLaplacianmatrix. The matrixL specifiesthestructureofthenetworkandthevectorsdefines a divisionofthatnetworkintogroups. Ourgoalisto find thevectorsthatminimizesthecutsizeforgivenL. In general, thismimizationproblemis not easy tosolvebecausethevaluessiarerestrictedto +1 and -1. Mathematics of Biological Networks

  28. Relaxation method If thevaluessicouldtake on any real value, wecouldcomputethe derivative, setthistozero, and find theminimum. We will solvethisproblemwiththerelaxationmethodwherewe will relax theserestraints. This isoneofthestandardmethodsforfindingapproximatesolutionsofvectoroptimizationproblems. The allowedvaluesofsiareactuallysubjectto 2 constraints. First, each individual elementsi canonlyhavethevalues +1 and -1. Ifweregardsas a vector in Euclidianspacethenthisconstraintmeansthatthevectoralwayspointstooneofthe 2ncornersof an n-dimensional hypercubecentered on theorigin. Italwayshasthe same length. Mathematics of Biological Networks

  29. Back tospectralpartitioning Let usnow relax theconstraint on thevector‘s direction, so thatitcanpoint in anydirection in itsn-dimensional space. We will however still keepitslengththe same. So s will beallowedtotakeanyvalue, subjecttotheconstraintor The secondconstraint on thesiisthatthenumbersofthemthatareequalto +1 and -1 respectively must beequaltothedesiredsizesofthe 2 groups. Ifthese 2 sizesaren1andn2 , thissecondconstraintcanbewrittenas or in vectornotationwhere1T= (1,1,1, …) Wekeepthissecondconstraintunchanged. Mathematics of Biological Networks

  30. Minimizecutsizesubjecttoconstraints Our taskisthereforetomimizethecutsize subjecttothe 2 constraints WedifferentiateRwithrespecttotheelementssiandenforcetheconstraintsusingtwo Lagrange multiplierswhichwedenoteand 2 Mathematics of Biological Networks

  31. Back tospectralpartitioning Performingthe derivatives, wethen find that or in matrixnotation L s =  s +  1 Wecancalculatethevalueof  byrecallingthat1 is an eigenvectoroftheLaplacianwitheigenvalue 0 so thatL  1 = 1T L = 0. Mathematics of Biological Networks

  32. RelationshiptoeigenvectorsofLaplacian MultiplyingL s =  s +  1 fromtheleftby1Tgives with weget or Ifwedefinethenewvector then In otherwords, xis an eigenvectoroftheLaplacianwitheigenvalue. Mathematics of Biological Networks

  33. Whicheigenvectorshouldbechoose? We are still freetochoosewhicheigenvectoritis. WeshouldchoosetheonethatgivesthesmallestvalueofthecutsizeR. Noticethat Thus, xis orthogonal to1. Whilexshouldbe an eigenvectorofL , itcannotbetheeigenvector (1,1,1,…) thathaseigenvalue 0. Whicheigenvectorshouldwechooseinstead? Mathematics of Biological Networks

  34. Chooseeigenvectorwithsmallesteigenvalue We also have andhence Thus, thecutsizeis proportional totheeigenvalue. SinceourgoalistomimizeR, weshouldchoosextobetheeigenvectorcorrespondingtothesmallestallowedeigenvalueoftheLaplacian. Mathematics of Biological Networks

  35. Optimal networkdivision We havealreadyshownthatx must be orthogonal totheeigenvector1withthesmallesteigenvalue 0. The bestthingwecan do ischoosex proportional toeigenvectorv2correspondingtothesecondlowesteigenvalue. Finallywerecoverthebestvalueforthenetworkdivisions : orequivalently This givesusthe optimal relaxed valueofs. Mathematics of Biological Networks

  36. Network partitioningwithconstraints There is, however, an additional constraintthattheelementsofsshouldhavethevaluesof +1 and -1. Moreover, exactlyn1ofthemshouldbe +1 andn2shouldbe -1. Thus, thevaluesofscannotexactlytake on thevalues. Letusdo thebestwecanandchoosesascloseaspossibletothese ideal values subjecttoitscontraintsbymakingtheproduct as large aspossible. This isachievedbyassigningsi = +1 fortheverticeswiththe larges (i.emost positive) valuesofandthuslargestvaluesofxiandsi = -1 totheremainingones. Mathematics of Biological Networks

  37. Algorithmforspectralpartitioning Thereis a furthersubtlety. Itisarbitrarywhichgroupwecallgroup 1 andwhichwecallgroup 2. Thus, ifthesizesofthetwogroupsare different therearetwo different waysofmakingthesplit. Wesolvethis in a pragmaticway, seebelow. Thus our final algorithmisasfollows: • Calculatetheeigenvectorv2correspondingtothesecondsmallesteigenvalueofthegraphLaplacian. • Sorttheelementsoftheeigenvector in orderfromlargesttosmallest. • Puttheverticescorrespondingtothen1largestelements in group 1, therest in group 2 andcalculatethecutsize. • The puttheverticescorrespondingtothen1smallestelementsin group 1, therest in group 2 andcalculatethecutsize. • Betweenthese 2 divisionsofthenetwork, choosetheonethatgivesthesmallercutsize. Mathematics of Biological Networks

  38. Resultsandcomplexityofspectralpartitioning Forthisnetwork, thecut-size oftheKernighan-Lin algorithmis 40, whereasspectralpartitioningcuts 46 edges. Spectralpartitioningtendsto find divisionsof a networkthathave „therightgeneralshape“. Speed: The time-consumingpartofspectralpartitioningiscomputationoftheeigenvectorwhichtakes time O(mn)orO(n2) on a sparsenetwork. This is an orderofmagnitudebetterthantheKernighan-Lin algorithm (O(n3) ). Mathematics of Biological Networks

More Related