Merging network patterns: a general framework to summarize biomedical network data. Yang Xiang, David Fuhry, Kamer Kaya, Ruoming Jin, Umit V. Catalyurek, Kun Huang. Network Modeling Analysis in Health Informatics and Bioinformatics.
Introduction • Identifying network patterns is an important task in bioinformatics. • Summarization is frequently needed because we often find more patterns than we can focus on. • In this work, we propose merging network patterns, a method that achieves both goals.
Related work: Network Partition • Clustering • Coclustering (biclustering) • Graph partitioning Figure source: Ronhovde et al., Detection of hidden structures for arbitrary scales in complex physical systems, Scientific Reports 2, 2012 http://www.nature.com/srep/2012/120329/srep00329/fig_tab/srep00329_F1.html
Related work: Clique or Biclique generation • Listing all maximal cliques: Bron-Kerbosch algorithm • Listing all maximal bicliques: frequent closed itemsets + supporting transactions • Problem: too many patterns are generated
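To make the "too many" problem concrete, here is a minimal sketch of the Bron-Kerbosch algorithm with pivoting, which lists every maximal clique of an undirected graph. The adjacency-dictionary representation, the function name `bron_kerbosch`, and the four-vertex example graph are assumptions made for illustration; they are not taken from the slides or from the authors' implementation.

```python
def bron_kerbosch(graph, r=None, p=None, x=None):
    """Yield every maximal clique of an undirected graph.

    graph: dict mapping each vertex to the set of its neighbours
           (hypothetical representation chosen for this sketch).
    r: vertices in the clique being built
    p: candidates that can still extend r
    x: vertices already explored (used to avoid duplicates)
    """
    if r is None:
        r, p, x = set(), set(graph), set()
    if not p and not x:
        # Nothing can extend r, so it is maximal.
        yield frozenset(r)
        return
    # Pivot on the vertex with the most neighbours in p to prune branches.
    pivot = max(p | x, key=lambda v: len(graph[v] & p))
    for v in list(p - graph[pivot]):
        yield from bron_kerbosch(graph, r | {v}, p & graph[v], x & graph[v])
        p = p - {v}
        x = x | {v}


# Toy graph: a triangle a-b-c with a pendant vertex d attached to c.
example_graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
maximal_cliques = set(bron_kerbosch(example_graph))
```

On this toy graph the maximal cliques are {a, b, c} and {c, d}. The number of maximal cliques can grow exponentially with the number of vertices, which is why the enumeration output usually needs summarizing.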
Related work: Pattern summarization • Hyper and Hyper+ • Problems: • Still not compact enough • No guarantee on the quality of each discovered pattern
Related work: Pattern growing • QCM, eQCM, and other variations • Problems: • No theoretical guarantee of covering important patterns • Running time is too long (O(n^5)) for large datasets
Algorithm Framework • Minimize q: maximum matching • Minimize q, then maximize overall density: maximum weighted maximum cardinality matching • Maximize overall density with q = p - 1: choose the pair that yields the maximum density after the merge operation
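The last case of the framework (merge the single pair that gives the highest density, so that p patterns become p - 1) can be sketched as below. This is only an illustration under stated assumptions: patterns are vertex sets over an unweighted graph, density is the fraction of possible edges present in the merged vertex set, and β is treated as a lower bound on the density of any merged pattern, which the slides do not state explicitly. The names `density`, `merge_best_pair`, and `greedy_summarize` are hypothetical, and the matching-based variants are not shown.

```python
from itertools import combinations


def density(nodes, edges):
    """Fraction of possible edges present among `nodes` (1.0 for a clique)."""
    n = len(nodes)
    if n < 2:
        return 1.0
    present = sum(1 for u, v in combinations(nodes, 2)
                  if frozenset((u, v)) in edges)
    return present / (n * (n - 1) / 2)


def merge_best_pair(patterns, edges):
    """Merge the pair whose union is densest: p patterns become p - 1."""
    i, j = max(combinations(range(len(patterns)), 2),
               key=lambda ij: density(patterns[ij[0]] | patterns[ij[1]], edges))
    merged = patterns[i] | patterns[j]
    rest = [pat for k, pat in enumerate(patterns) if k not in (i, j)]
    return rest + [merged], density(merged, edges)


def greedy_summarize(patterns, edges, beta):
    """Repeat the best-pair merge while the merged density stays >= beta."""
    patterns = list(patterns)
    while len(patterns) > 1:
        candidate, merged_density = merge_best_pair(patterns, edges)
        if merged_density < beta:  # assumed stopping rule
            break
        patterns = candidate
    return patterns


# Toy input (made up): two overlapping triangles and one separate edge.
toy_edges = {frozenset(e) for e in ["ab", "ac", "bc", "bd", "cd", "ef"]}
toy_patterns = [frozenset("abc"), frozenset("bcd"), frozenset("ef")]
summary_strict = greedy_summarize(toy_patterns, toy_edges, beta=0.7)
summary_loose = greedy_summarize(toy_patterns, toy_edges, beta=0.3)
```

With β = 0.7 the two triangles merge into {a, b, c, d} (5 of 6 possible edges) and {e, f} stays separate; with β = 0.3 everything collapses into one pattern. Each round examines every pair, so this sketch is quadratic per merge and meant only to show the objective, not an efficient implementation.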
Performance study on merging unweighted bipartite network patterns • Gene-phenotype dataset • All maximal bicliques • Compare the performance of MultiMerge and SingleMerge • Implemented in C++
Running time of MultiMerge and SingleMerge algorithms for summarizing 1,000 to 10,000 patterns
Number of summarized (output) network patterns produced by the MultiMerge and SingleMerge algorithms under various β values.
Merging a large number of network patterns • As the number of network patterns to be merged increases, SingleMerge and MultiMerge hit the memory limit before the running time becomes unacceptably long. • What should we do to handle millions of patterns? The answer is batch processing.
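The slides do not describe the batching scheme itself. One plausible reading, sketched below, is to summarize fixed-size batches independently so that only one batch is held by the merge algorithm at a time, and then summarize the pooled batch outputs. The function `summarize_in_batches`, its `batch_size` parameter, and the toy summarizer `merge_overlapping` (which simply unions patterns that share a vertex) are all hypothetical stand-ins, not the authors' method.

```python
def merge_overlapping(patterns):
    """Toy stand-in summarizer: union patterns that share any vertex."""
    merged = []
    for pattern in patterns:
        pattern = set(pattern)
        untouched = []
        for other in merged:
            if other & pattern:
                pattern |= other      # absorb every overlapping pattern
            else:
                untouched.append(other)
        merged = untouched + [pattern]
    return merged


def summarize_in_batches(patterns, summarize, batch_size):
    """Summarize each batch on its own, then summarize the pooled results."""
    pooled = []
    for start in range(0, len(patterns), batch_size):
        pooled.extend(summarize(patterns[start:start + batch_size]))
    # Second pass merges patterns that ended up in different batches.
    return summarize(pooled)


# Made-up input: two chains of overlapping patterns, interleaved.
toy_patterns = [{1, 2}, {5, 6}, {2, 3}, {6, 7}, {3, 4}]
batched_summary = summarize_in_batches(toy_patterns, merge_overlapping, 2)
```

In practice any summarizer with the same list-in, list-out shape could be passed as `summarize`. If the pooled output is still too large, the same function could be applied again with the pooled list as input.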
Application study on merging weighted network patterns • Gene coexpression network (Spearman correlation) built on the microarray dataset GSE2034. • Backbone threshold: 0.6
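As a rough illustration of the thresholding step, the sketch below computes Spearman correlations between expression profiles and keeps only the gene pairs at or above the backbone threshold. It assumes the threshold is applied to the signed correlation (the slides do not say whether the absolute value is used), and the expression values, gene names, and the helper names `ranks`, `spearman`, and `backbone` are made up for the example.

```python
from itertools import combinations
from math import sqrt


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors.

    Assumes neither profile is constant (otherwise the denominator is 0).
    """
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def backbone(expression, threshold=0.6):
    """Keep gene pairs whose Spearman correlation is >= threshold."""
    edges = {}
    for g, h in combinations(sorted(expression), 2):
        rho = spearman(expression[g], expression[h])
        if rho >= threshold:
            edges[(g, h)] = rho
    return edges


# Made-up expression profiles for three hypothetical genes.
toy_expression = {
    "g1": [1, 2, 3, 4, 5],
    "g2": [2, 4, 5, 9, 10],   # rises with g1
    "g3": [5, 3, 4, 1, 2],    # mostly falls as g1 rises
}
toy_backbone = backbone(toy_expression, threshold=0.6)
```

Only the g1-g2 pair survives here; the pairs involving g3 are negatively correlated and fall below the threshold. The surviving edges form the unweighted backbone on which clique mining and merging would then run.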
Backbone and merge [Figure: example weighted network, its backbone at threshold = 3, and the result of the merge]
Results • Clique mining results: 633,725 cliques • Merging results (β=0.7): 1,130 networks • Passing survival tests (GSE2034, GSE1456, NKI, NKI ER-Neg, NKI LN-Pos): 242 networks
Macro patterns • Obtained either by setting a low β or by specifying the number of macro patterns desired.
Workflow of network merging [Figure: flowchart with the labels: weighted graphs, thresholding, unweighted graphs, clique or biclique mining algorithms, cliques or bicliques, merging, micro patterns, survival tests, enrichment, merging, macro patterns]
Thanks. Questions?