RedundancyMiner: A novel method of clustering in genomic studies

RedundancyMiner: A novel method of clustering in genomic studies. Barry Zeeberg, NCI; Hongfang Liu, NCI and GU. Gene Ontology (GO) AmiGO browser: hierarchical organization of categories and mapped genes. High-Throughput GoMiner (HTGM). Typical HTGM result: clustered image map (CIM).


Presentation Transcript


  1. RedundancyMiner: A novel method of clustering in genomic studies. Barry Zeeberg, NCI; Hongfang Liu, NCI and GU

  2. Gene Ontology (GO) AmiGO browser: hierarchical organization of categories and mapped genes

  3. High-Throughput GoMiner (HTGM)

  4. Typical HTGM result: clustered image map (CIM)

  5. Redundancy problem
     • Because GO is hierarchical, parent and child categories may contain partially redundant gene mappings
     • This can “inflate” the number of categories in the CIM
     • The inflation obscures the core information content of the CIM
     • The redundancy itself can be studied to reveal fine-grained, nuanced associations among category clusters

  6. RedundancyMiner (RM) is an attempt to solve that problem
     • Remove the redundancy from the CIM
     • Redundancy can inflate the CIM by, e.g., 3-fold
     • Place the redundancy into a META CIM
     • Study the redundancy as nuanced themes of association among groups of GO categories

  7. RM paradigm
     • The similarity metric is a probabilistic value based on the number of genes mapped in common to two GO categories
     • Groups in the META CIM follow a “complete linkage” criterion for a selected p-value threshold
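Slide 7 does not give the formula behind the probabilistic similarity metric. One plausible reading, offered here only as an assumption, is an upper-tail hypergeometric probability: the chance that two categories of the observed sizes share at least the observed number of genes if genes were assigned at random. The function name `overlap_p_value`, the `universe_size` parameter, and the example gene sets are all hypothetical.

```python
from math import comb


def overlap_p_value(genes_a, genes_b, universe_size):
    """Upper-tail hypergeometric probability of the observed overlap.

    ASSUMPTION: the slides only say the metric is "probabilistic"; the
    hypergeometric tail is a common choice, not necessarily the one RM uses.
    """
    a, b = set(genes_a), set(genes_b)
    shared = len(a & b)                      # genes mapped to both categories
    n_a, n_b = len(a), len(b)
    total = comb(universe_size, n_b)         # ways to draw category B at random
    upper = min(n_a, n_b)                    # largest possible overlap
    tail = sum(
        comb(n_a, i) * comb(universe_size - n_a, n_b - i)
        for i in range(shared, upper + 1)
    )
    return tail / total


# Hypothetical parent and child categories: the child's genes all map to the parent.
parent = {"TP53", "BRCA1", "ATM", "CHEK2"}
child = {"TP53", "ATM", "CHEK2"}
p = overlap_p_value(parent, child, universe_size=20000)
```

A small p-value here would mark the two categories as redundant with respect to each other; the threshold applied to it is the user-selected value mentioned on the slide.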

  8. RM overcomes two problems of traditional hierarchical clustering
     • Every object is put into one cluster or another, even if the object is truly an outlier
     • Each object can appear in only one cluster, even though it may be related to several clusters
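The two properties on slide 8 follow naturally if a "complete linkage" group is read as a set of objects in which every pair passes the similarity threshold. Under that reading (an assumption, since the slides do not spell out the algorithm), the groups are the maximal cliques of the thresholded similarity graph. The sketch below uses the basic Bron-Kerbosch procedure; `maximal_groups`, `related`, and the toy edges are hypothetical names and data.

```python
def maximal_groups(items, related):
    """Return every maximal group in which all pairs satisfy related(x, y).

    ASSUMPTION: "complete linkage" is interpreted as all-pairs similarity,
    so groups are maximal cliques. related must be symmetric.
    """
    neighbors = {
        x: {y for y in items if y != x and related(x, y)} for x in items
    }
    groups = []

    def expand(current, candidates, excluded):
        if not candidates and not excluded:
            if len(current) > 1:             # singletons are outliers, not groups
                groups.append(frozenset(current))
            return
        for v in list(candidates):
            expand(current | {v}, candidates & neighbors[v], excluded & neighbors[v])
            candidates = candidates - {v}    # v is fully explored
            excluded = excluded | {v}

    expand(set(), set(items), set())
    return groups


# Toy example: C is related to two groups, E is related to nothing.
edges = {frozenset(pair) for pair in [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]}
groups = maximal_groups("ABCDE", lambda x, y: frozenset((x, y)) in edges)
```

In the toy example, `C` lands in two groups and `E` lands in none, which is exactly the behavior a single dendrogram cannot express.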

  9. CIM after RM

  10. META CIM

  11. Additional example: gene expression in NCI-60 cell lines
     • NCI-60 is a set of 60 well-studied cancer cell lines
     • It comprises roughly 5 or 6 cell lines from each of 8 or 9 cancer types

  12. Problem
     • The full CIM of 60 cell lines × 20,000 gene expression values is too dense to view meaningfully
     • The solution is to select a sub-portion of the CIM based on RM analysis

  13. NCI-60 META CIM based on correlation threshold = 0.20

  14. Sub-CIM of highest correlating genes from group 33
     • Gene expression values are adjusted z-scores
     • Red = positive z-score
     • Green = negative z-score

  15. Sub-CIM of highest correlating genes from group 32
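Slides 13 to 15 combine two ideas: a correlation threshold that decides which genes belong together, and z-scores that determine the red/green coloring. The sketch below illustrates both with plain per-gene z-scores and Pearson correlation. It is a guess at the general shape of the computation, not the presenters' procedure: the slides do not say how the z-scores are "adjusted", and `select_sub_cim`, the seed-gene approach, and the toy expression values are hypothetical.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize one gene's expression across cell lines.

    ASSUMPTION: plain z-scores; the "adjustment" mentioned on the slide is
    not described, so it is not reproduced. A constant profile would divide
    by zero and is not handled here.
    """
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def pearson(x, y):
    """Pearson correlation as the mean product of population z-scores."""
    zx, zy = z_scores(x), z_scores(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(x)


def select_sub_cim(expression, seed_gene, threshold):
    """Keep genes whose profile correlates with the seed at or above threshold."""
    seed = expression[seed_gene]
    return {
        gene: z_scores(profile)
        for gene, profile in expression.items()
        if pearson(seed, profile) >= threshold
    }


# Toy data: three genes measured in four cell lines.
expression = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.0],
    "g3": [4.0, 3.0, 2.0, 1.0],
}
sub_cim = select_sub_cim(expression, seed_gene="g1", threshold=0.20)
```

Positive entries in `sub_cim` would be drawn red and negative entries green, matching the color key on slide 14.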

  16. Conclusions
     • RM can remove redundancy from the primary CIM
     • RM can display the nuanced themes of redundancy structure in the META CIM
     • The META CIM can be used as the basis of further investigation
