RedundancyMiner: A novel method of clustering in genomic studies
Barry Zeeberg, NCI
Hongfang Liu, NCI and GU
Gene Ontology (GO) AmiGO browser: Hierarchical organization of categories and mapped genes
High-Throughput GoMiner (HTGM)
Typical HTGM result: clustered image map (CIM)
Redundancy problem
• Because of the hierarchical structure of GO, parent and child categories may contain partially redundant gene mappings
• This can “inflate” the number of categories in the CIM
• The inflation obscures the core information content of the CIM
• The redundancy itself can be studied to reveal fine-detail, nuanced associations among category clusters
RedundancyMiner (RM) is an attempt to solve that problem
• Remove the redundancy from the CIM
• Redundancy can cause the CIM to be inflated by, e.g., 3-fold
• Place the redundancy into a META CIM
• Study the redundancy as nuanced themes of association among groups of GO categories
RM paradigm
• The similarity metric is a probabilistic value based on the number of genes mapped in common to two GO categories
• Groups in the META CIM follow a “complete linkage” criterion for a selected p-value threshold (every pair of categories in a group must satisfy the threshold)
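One way to picture the similarity metric is as an overlap p value. The slides do not say which test RM uses, so the sketch below assumes a hypergeometric upper tail (one-sided Fisher exact test). The names `overlap_p_value`, `category_p_value`, and `n_total` (size of the gene universe) are illustrative, not RM's own.

```python
from math import comb


def overlap_p_value(n_common, size_a, size_b, n_total):
    """Chance of seeing at least n_common shared genes between two
    categories of sizes size_a and size_b drawn from n_total genes.

    ASSUMPTION: hypergeometric upper tail; the slides only say that the
    metric is probabilistic and based on the number of genes in common.
    """
    upper = min(size_a, size_b)
    denom = comb(n_total, size_b)
    # comb(n, k) is 0 when k > n, so impossible overlaps contribute nothing.
    tail = sum(
        comb(size_a, k) * comb(n_total - size_a, size_b - k)
        for k in range(n_common, upper + 1)
    )
    return tail / denom


def category_p_value(genes_a, genes_b, n_total):
    """Convenience wrapper taking the two categories as sets of gene names."""
    return overlap_p_value(len(genes_a & genes_b), len(genes_a), len(genes_b), n_total)
```

A smaller p value means the two categories share more genes than chance would suggest, so under this assumption a pair counts as "similar" when its p value falls at or below the selected threshold.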
RM overcomes two problems of traditional hierarchical clustering
• Every object is forced into some cluster, even if it is truly an outlier
• Each object can appear in only one cluster, even though it may be related to several clusters
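Both points can be illustrated with a small sketch. It assumes that a "complete linkage" group is a maximal set of categories in which every pair meets the p-value threshold (a maximal clique), and it finds such sets with a plain Bron-Kerbosch search. This is an interpretation of the slides, not RM's actual implementation; the function name and the toy p values are hypothetical.

```python
def complete_linkage_groups(items, p_value, threshold):
    """Return maximal groups in which every pair has p_value <= threshold.

    ASSUMPTION: groups are maximal cliques of the "similar enough" graph.
    Singletons are dropped, so outliers end up in no group at all.
    """
    items = list(items)
    neighbors = {
        a: {b for b in items if b != a and p_value(a, b) <= threshold}
        for a in items
    }
    groups = []

    def expand(clique, candidates, excluded):
        if not candidates and not excluded:
            if len(clique) > 1:
                groups.append(frozenset(clique))
            return
        for v in list(candidates):
            expand(clique | {v}, candidates & neighbors[v], excluded & neighbors[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(items), set())
    return groups


# Toy p values, invented purely for illustration (not real GO results).
toy_p = {
    frozenset(("A", "B")): 0.001,
    frozenset(("B", "C")): 0.002,
    frozenset(("A", "C")): 0.4,
}
groups = complete_linkage_groups(
    ["A", "B", "C", "D"],
    lambda a, b: toy_p.get(frozenset((a, b)), 1.0),
    threshold=0.01,
)
```

In this toy case the groups are {A, B} and {B, C}: category B appears in two groups, and the outlier D appears in none.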
Additional example: gene expression in NCI-60 cell lines
• NCI-60 is a set of 60 well-studied cancer cell lines
• It comprises roughly 5 or 6 cell lines from each of roughly 8 or 9 cancer types
Problem
• The full CIM of 60 cell lines × 20,000 gene expression values is too dense to allow meaningful viewing
• The solution is to select a sub-portion of the CIM based on RM analysis
Sub-CIM of highest correlating genes from group 33
• Gene expression values are adjusted z-scores
• Red = positive z-score
• Green = negative z-score
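The color coding can be sketched as follows. The slides do not describe the "adjustment" applied to the z-scores, so this example uses a plain per-gene z-score across cell lines; `row_z_scores` and `cim_color` are hypothetical names, and the neutral color for a z-score of exactly zero is an assumption.

```python
from statistics import mean, pstdev


def row_z_scores(values):
    """Standardize one gene's expression values across cell lines.

    ASSUMPTION: ordinary z-score (population standard deviation); the
    adjustment mentioned on the slide is not specified and is omitted.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A gene with constant expression carries no contrast.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def cim_color(z):
    """Map a z-score to the slide's scheme: red positive, green negative."""
    if z > 0:
        return "red"
    if z < 0:
        return "green"
    return "neutral"  # assumption: the slide does not say how zero is drawn
```

Standardizing each gene separately keeps highly expressed genes from dominating the color scale, which is one common reason to plot z-scores rather than raw values.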
Conclusions
• RM can remove redundancy from the primary CIM
• RM can display the nuanced themes of the redundancy structure in the META CIM
• The META CIM can be used as the basis for further investigation