Parameter-free Hierarchical Co-Clustering by n-Ary Splits

University of Turin, Italy, Department of Computer Science. Parameter-free Hierarchical Co-Clustering by n-Ary Splits. Dino Ienco, Ruggero G. Pensa and Rosa Meo, {ienco,pensa,meo}@di.unito.it. ECML-PKDD 2009 – Bled (Slovenia).


Presentation Transcript


  1. University of Turin, Italy, Department of Computer Science. Parameter-free Hierarchical Co-Clustering by n-Ary Splits. Dino Ienco, Ruggero G. Pensa and Rosa Meo, {ienco,pensa,meo}@di.unito.it. ECML-PKDD 2009 – Bled (Slovenia)

  2. Outline
     - Motivations
     - Our Idea
     - Co-Clustering and Background
     - Hierarchical Co-Clustering
     - Results
     - Conclusions

  3. Motivations. Co-clustering:
     - is an effective approach that obtains interesting results
     - is commonly used with high-dimensional data
     - partitions rows and columns simultaneously

  4. Motivations. Many co-clustering algorithms exist:
     - Spectral approach (Dhillon et al., KDD01)
     - Information-theoretic approach (Dhillon et al., KDD03)
     - Minimum Sum-Squared Residue approach (Cho et al., SDM04)
     - Bayesian approach (Shan et al., ICDM08)

  5. Motivations. All previous techniques:
     - require the number of row/column clusters as a parameter
     - produce flat partitions, without any structural information

  6. Motivations. In general:
     - parameters are difficult to set
     - structured output (like hierarchies) helps the user understand the data
     Hierarchical structures are useful to:
     - index and visualize data
     - explore parent-child relationships
     - derive generalization/specialization concepts

  7. Our Idea. Co-clustering combined with a hierarchical approach allows building two hierarchies, one on each dimension, simultaneously.
     Proposed approach:
     - extend a previous flat co-clustering algorithm (Robardet02) to the hierarchical setting

  8. Co-Clustering: Background. τ-CoClust (Robardet02):
     - co-clustering for count or frequency data
     - no number of row/column clusters needed
     - maximizes a statistical measure, the Goodman and Kruskal τ, between the row and column partitions

  9. Co-Clustering. Goodman and Kruskal τ:
     - measures the proportional reduction in the prediction error of a dependent variable given an independent variable
     Example partitions: CO1 = {O1, O2}, CO2 = {O3, O4}; CF1 = {F2}, CF2 = {F1, F3}

  10. Co-Clustering. Goodman and Kruskal τ:
     - measures the proportional reduction in the prediction error of a dependent variable given an independent variable
     - E_CO: prediction error on CO without knowledge of the CF partition
     - E_CO|CF: prediction error on CO with knowledge of the CF partition
     - as a proportional reduction, τ_CO|CF = (E_CO − E_CO|CF) / E_CO
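
As a minimal sketch of the measure described on this slide, the function below computes τ for a co-cluster contingency table whose rows stand for row clusters (CO) and whose columns stand for column clusters (CF). It uses the standard Goodman and Kruskal form, with E_CO = 1 − Σ p_i.² and E_CO|CF = 1 − Σ p_ij² / p_.j. The function name and the toy tables are illustrative assumptions and do not come from the presentation.

```python
def goodman_kruskal_tau(table):
    """Proportional reduction in the error of predicting the row cluster
    once the column cluster is known (tau_CO|CF)."""
    total = float(sum(sum(r) for r in table))
    p = [[v / total for v in r] for r in table]        # joint proportions p_ij
    row = [sum(r) for r in p]                          # row marginals p_i.
    col = [sum(c) for c in zip(*p)]                    # column marginals p_.j
    if sum(1 for x in row if x > 0) < 2:
        return 0.0                                     # one row cluster: nothing to predict
    e_row = 1.0 - sum(x * x for x in row)              # E_CO
    e_row_given_col = 1.0 - sum(                       # E_CO|CF
        p[i][j] ** 2 / col[j]
        for i in range(len(row))
        for j in range(len(col)) if col[j] > 0)
    return (e_row - e_row_given_col) / e_row


# Toy tables (made up for illustration):
perfect = goodman_kruskal_tau([[5, 0], [0, 5]])      # column cluster determines row cluster
independent = goodman_kruskal_tau([[1, 1], [1, 1]])  # column cluster carries no information
```

Swapping the roles of rows and columns (transposing the table) gives τ_CF|CO, which in general differs from τ_CO|CF.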

  11. Co-Clustering. Optimization strategy:
     - τ is asymmetric, so the algorithm alternates the optimization of two functions, τ_CO|CF and τ_CF|CO
     - Stochastic optimization (example on rows):
         # Start with an initial partition on rows
         for i in 1..n_times
           # Augment the current partition with an empty cluster
           # Move one element at random from one cluster to another
           # If the objective function improves, keep the solution; else undo the operation
           # If there is an empty cluster, remove it
         end
     - This optimization allows the number of clusters to grow or decrease
     - (Robardet02) introduced an efficient way to update the objective function incrementally
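
The row-side loop on this slide can be sketched in Python as follows. This is a simplified illustration, not the τ-CoClust implementation: it recomputes τ from scratch at every step instead of using the incremental update from (Robardet02), it keeps the column partition fixed, and all function names, the seed, and the toy data are assumptions made for the example.

```python
import random


def tau_rows_given_cols(table):
    """Goodman-Kruskal tau for predicting the row cluster given the column cluster."""
    total = float(sum(sum(r) for r in table))
    p = [[v / total for v in r] for r in table]
    row = [sum(r) for r in p]
    col = [sum(c) for c in zip(*p)]
    if sum(1 for x in row if x > 0) < 2:
        return 0.0  # a single row cluster: nothing to predict
    e_row = 1.0 - sum(x * x for x in row)
    e_cond = 1.0 - sum(p[i][j] ** 2 / col[j]
                       for i in range(len(row))
                       for j in range(len(col)) if col[j] > 0)
    return (e_row - e_cond) / e_row


def cocluster_table(data, row_labels, col_labels):
    """Sum the data values falling in each (row cluster, column cluster) cell."""
    table = [[0.0] * (max(col_labels) + 1) for _ in range(max(row_labels) + 1)]
    for i, row in enumerate(data):
        for j, value in enumerate(row):
            table[row_labels[i]][col_labels[j]] += value
    return table


def compact(labels):
    """Renumber labels as 0..k-1, which drops any empty cluster."""
    mapping = {}
    return [mapping.setdefault(lab, len(mapping)) for lab in labels]


def optimize_rows(data, row_labels, col_labels, n_times=200, seed=0):
    rng = random.Random(seed)
    labels = compact(row_labels)
    best = tau_rows_given_cols(cocluster_table(data, labels, col_labels))
    for _ in range(n_times):
        n_clusters = max(labels) + 1              # index n_clusters is the extra empty cluster
        i = rng.randrange(len(labels))            # pick one row at random
        target = rng.randrange(n_clusters + 1)    # any cluster, including the empty one
        if target == labels[i]:
            continue
        candidate = compact(labels[:i] + [target] + labels[i + 1:])
        score = tau_rows_given_cols(cocluster_table(data, candidate, col_labels))
        if score > best:                          # keep the move only if tau improves
            labels, best = candidate, score
    return labels, best


# Toy count matrix (made up): 4 rows, 2 columns, each column in its own cluster.
data = [[5, 0], [4, 1], [0, 6], [1, 5]]
labels, best = optimize_rows(data, [0, 0, 0, 0], [0, 1])
```

Because a move may target the extra empty cluster and emptied clusters are dropped, the number of row clusters can both grow and shrink during the search, as the slide notes.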

  12. Hierarchical Co-Clustering. HiCC:
     - hierarchical co-clustering algorithm that extends τ-CoClust
     - divisive approach
     - no parameter settings needed
     - no predefined number of splits for each node of the hierarchy

  13. Hierarchical Co-Clustering. HiCC:
       At the beginning, use τ-CoClust
       repeat
         - From the current Row/Column partitions
         - Fix the Column partition
         - For each cluster in the Row partition:
             re-cluster with τ-CoClust and optimize the obj. func. τ_CO|CF
         - Update the Row hierarchy
         - Fix the new Row partition
         - For each cluster in the Column partition:
             re-cluster with τ-CoClust and optimize the obj. func. τ_CF|newCO
         - Update the Column hierarchy
       until (TERMINATION)
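
The alternating loop on this slide can be sketched as a small driver that records one partition per hierarchy level. The sketch makes several assumptions that the slide does not state: the initial partitions are passed in (on the slide they come from a first τ-CoClust run), the per-cluster re-clustering is delegated to hypothetical callables `split_rows` and `split_cols`, and the loop stops when neither partition changes or after `max_levels` iterations, since the actual termination condition is not given here. The `halve` splitter is only a stand-in so that the example runs.

```python
def hicc(row_partition, col_partition, split_rows, split_cols, max_levels=20):
    """Divisive driver: each level refines the previous row and column partitions.

    split_rows(cluster, col_partition) and split_cols(cluster, row_partition)
    return a list of sub-clusters (n-ary split), or [cluster] if no split is made.
    """
    row_levels = [row_partition]
    col_levels = [col_partition]
    for _ in range(max_levels):
        # Fix the column partition and re-cluster every row cluster.
        new_rows = [sub for c in row_levels[-1]
                    for sub in split_rows(c, col_levels[-1])]
        # Fix the new row partition and re-cluster every column cluster.
        new_cols = [sub for c in col_levels[-1]
                    for sub in split_cols(c, new_rows)]
        # Assumed termination: stop when no cluster was split on either side.
        if (len(new_rows) == len(row_levels[-1])
                and len(new_cols) == len(col_levels[-1])):
            break
        row_levels.append(new_rows)
        col_levels.append(new_cols)
    return row_levels, col_levels


def halve(cluster, _other_partition):
    """Stand-in splitter (not tau-CoClust): cut a cluster into two halves."""
    if len(cluster) < 2:
        return [cluster]
    mid = len(cluster) // 2
    return [cluster[:mid], cluster[mid:]]


row_levels, col_levels = hicc([[0, 1, 2, 3]], [[0, 1]], halve, halve)
```

Each entry of `row_levels` and `col_levels` is one level of the corresponding hierarchy; a real splitter may return any number of sub-clusters per node, which is what makes the splits n-ary.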

  14. Hierarchical Co-Clustering. A simple example (figure). The procedure goes on until the termination condition is satisfied.

  15. Results. Experimentation:
     - No previous hierarchical co-clustering algorithm exists
     - We use a flat co-clustering algorithm with the same number of clusters obtained by our approach at each level
     - We choose the information-theoretic approach (KDD03); for each level we perform 50 runs and then compute the average
     - We use document-word datasets to validate our approach:
         * OHSUMED (collection of PubMed abstracts) {oh0, oh15}
         * REUTERS-21578 (collected and labeled by Carnegie Group) {re0, re1}
         * TREC (Text Retrieval Conference) {tr11, tr21}

  16. Results. An example of a row hierarchy on OHSUMED (figure). We label each cluster with its majority class.

  17. Results. An example of a column hierarchy on REUTERS (figure). We label each cluster with its top 10 words ranked by mutual information.

  18. Results. External validation indices:
     - Purity
     - Normalized Mutual Information (NMI)
     - Adjusted Rand Index
     Hierarchical setting: we combine the per-level results with a formula (not preserved in this transcript) in which:
     - one term is one of the external validation indices
     - αi is a weight for hierarchy level i; in our case αi = 1/i
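
To make the evaluation concrete, the sketch below computes purity for one level and then combines per-level scores with the weights αi = 1/i. The combination formula itself is missing from the transcript, so the normalized weighted average used in `combine_levels` is an assumption, not necessarily the authors' formula; both function names and the toy labels are also illustrative.

```python
from collections import Counter


def purity(cluster_labels, class_labels):
    """Fraction of items that belong to the majority class of their cluster."""
    by_cluster = {}
    for cluster, cls in zip(cluster_labels, class_labels):
        by_cluster.setdefault(cluster, Counter())[cls] += 1
    majority_total = sum(max(counts.values()) for counts in by_cluster.values())
    return majority_total / len(class_labels)


def combine_levels(index_per_level):
    """ASSUMED combination: weighted average with alpha_i = 1/i, levels numbered from 1."""
    weights = [1.0 / i for i in range(1, len(index_per_level) + 1)]
    weighted = sum(w * v for w, v in zip(weights, index_per_level))
    return weighted / sum(weights)


# Toy example (made up): two clusters, one misplaced item.
level_purity = purity([0, 0, 1, 1], ["a", "a", "b", "a"])
overall = combine_levels([1.0, 0.5])  # scores for hierarchy levels 1 and 2
```

With αi = 1/i, levels closer to the root weigh more than deeper levels, whatever the exact form of the combination.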

  19. Results. Performance results.

  20. Results. Performance results on the re1 dataset.

  21. Conclusions. We propose a new approach to hierarchical co-clustering that:
     - is parameter-free
     - has no a priori fixed number of splits (n-ary splits)
     - obtains good results
     - builds hierarchies on both dimensions simultaneously
     - improves the exploration of co-clustering results

  22. Conclusions. Future work:
     - parallelize the algorithm to improve time performance
     - push constraints inside it to use background knowledge
     - extend the framework to manage continuous data

  23. Any questions? Thank you for your attention.
