A Combinatorial Approach to the Analysis of Differential Gene Expression Data

Presentation Transcript


  1. A Combinatorial Approach to the Analysis of Differential Gene Expression Data: The Use of Graph Algorithms for Disease Prediction and Screening

  2. The Goal
     • To classify patients based on expression profiles
       • Presence of cancer
       • Type of cancer
       • Response to treatment
     • To identify the genes required for accurate classification
       • Too many = unnecessary noise
       • Too few = insufficient information

  3. Classic Clustering Problem
     • Current techniques:
       • Hierarchical Clustering
       • K-Means Clustering
       • Self-Organizing Maps
       • Others
     • Drawbacks:
       • Determining cluster boundaries is difficult with diffuse data
       • Objects can only belong to one group

  4. Algorithmic Training: Raw Data → Eliminate Poorly Discriminating Genes → Eliminate Poorly Covering Genes → Calculate Sample Similarities → Apply Threshold → Verify by Classification → Set of Discriminatory Genes, Gene Scores (stage labels: Gene Scoring, Dominating Set, Maximal Cliques)

  5. Algorithmic Training: Raw Data → Eliminate Poorly Discriminating Genes

  6. The Gene Scoring Function: Identifying Discriminators vs.

  7. Algorithmic Training: Raw Data → Eliminate Poorly Discriminating Genes → Eliminate Poorly Covering Genes

  8. Eliminate Poorly Covering Genes [figure: genes versus samples, with samples grouped into Class 1 and Class 2]
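
The transcript does not preserve how a gene is judged to "cover" a sample, so the following is only a minimal sketch of one common way to pick a small covering set: the greedy set-cover heuristic. It is a stand-in, not the authors' dominating set procedure. The `covers` mapping (gene name to the set of samples that gene is assumed to cover) and the function name `greedy_cover` are hypothetical.

```python
def greedy_cover(covers):
    """Greedily pick genes until every coverable sample is covered.

    covers: dict mapping a gene name to the set of samples it covers
            (hypothetical input; the coverage criterion is an assumption).
    Returns the chosen genes in the order they were selected.
    """
    # Every sample that at least one gene covers still needs covering.
    uncovered = set().union(*covers.values())
    chosen = []
    while uncovered:
        # Take the gene covering the most still-uncovered samples;
        # sorting the names makes tie-breaking deterministic.
        best = max(sorted(covers), key=lambda g: len(covers[g] & uncovered))
        chosen.append(best)
        uncovered -= covers[best]
    return chosen


example_covers = {
    "g1": {"s1", "s2", "s3"},
    "g2": {"s3", "s4"},
    "g3": {"s4"},
    "g4": {"s1"},
}
selected = greedy_cover(example_covers)
```

Genes that are never selected (here `g3` and `g4`) add no coverage beyond the chosen ones, which is the sense in which they would be candidates for elimination under this assumed criterion.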

  9. Algorithmic Training: Raw Data → Eliminate Poorly Discriminating Genes → Eliminate Poorly Covering Genes → Calculate Sample Similarities → Apply Threshold

  10. Create Unweighted Graph
     • Complete, edge-weighted graph
       • Vertices = samples
       • Edge weight = similarity metric
     • Remove edge weights
       • If edge weight < threshold, remove edge from graph
       • Otherwise, keep edge, ignore weight
     • Result: incomplete unweighted graph
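
The thresholding step on this slide can be sketched as follows. This is a minimal illustration, assuming the pairwise similarities are already available as a dictionary keyed by sample pairs; the names `threshold_graph`, `similarity`, and the sample labels are hypothetical.

```python
def threshold_graph(similarity, threshold):
    """Turn a complete edge-weighted graph into an unweighted one.

    similarity: dict mapping a pair (u, v) of sample names to its edge weight
                (hypothetical layout, one entry per unordered pair).
    Returns an adjacency dict: sample name -> set of neighbouring samples.
    """
    adjacency = {}
    for (u, v), weight in similarity.items():
        # Every sample stays a vertex even if all of its edges are removed.
        adjacency.setdefault(u, set())
        adjacency.setdefault(v, set())
        # Edges below the threshold are dropped; the rest are kept
        # and their weights are ignored from here on.
        if weight >= threshold:
            adjacency[u].add(v)
            adjacency[v].add(u)
    return adjacency


weights = {("s1", "s2"): 0.9, ("s1", "s3"): 0.2, ("s2", "s3"): 0.7}
graph = threshold_graph(weights, threshold=0.5)
```

With these example weights, the edge between `s1` and `s3` is removed, so the resulting graph is no longer complete.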

  11. The Edge Weight Function [equation not preserved], where expression value_ij = expression value of gene i for sample j

  12. Algorithmic Training: Raw Data → Eliminate Poorly Discriminating Genes → Eliminate Poorly Covering Genes → Calculate Sample Similarities → Apply Threshold → Verify by Classification → Set of Discriminatory Genes, Gene Scores

  13. What is a Clique?
     • A completely connected subset of vertices in a graph
     • Maximal clique = local optimum (no further vertex can be added)
     • Finding a maximum clique is NP-complete (in its decision form)
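
To make the notion of a maximal clique concrete, here is a short sketch of the classic Bron-Kerbosch enumeration with pivoting. The slides do not say which clique algorithm the group used, so this is an illustrative choice rather than their implementation; the function name and the adjacency-dict input format are assumptions.

```python
def maximal_cliques(adjacency):
    """Yield every maximal clique of an undirected graph as a frozenset.

    adjacency: dict mapping each vertex to the set of its neighbours
               (no self-loops; assumed symmetric).
    """
    def expand(clique, candidates, excluded):
        if not candidates and not excluded:
            # Nothing can extend the clique, so it is maximal.
            yield frozenset(clique)
            return
        # Pivot on the vertex with the most neighbours among the candidates
        # to reduce the number of recursive branches.
        pivot = max(candidates | excluded,
                    key=lambda v: len(adjacency[v] & candidates))
        for v in list(candidates - adjacency[pivot]):
            yield from expand(clique | {v},
                              candidates & adjacency[v],
                              excluded & adjacency[v])
            # v has been fully explored: move it from candidates to excluded.
            candidates = candidates - {v}
            excluded = excluded | {v}

    yield from expand(set(), set(adjacency), set())


example_graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
cliques = set(maximal_cliques(example_graph))
```

In the example, the triangle `a, b, c` and the edge `c, d` are both maximal, and vertex `c` belongs to both. This illustrates how clique-based grouping lets an object belong to more than one group.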

  14. Classification Using Clique [figure: graph whose cliques are labeled Class 1, Class 2, and Class 3]

  15. A Selection of Discriminators

  16. The Algorithm - Unsupervised: (Raw Data; Set of Discriminatory Genes, Scores) → Calculate Sample Similarities → Apply Threshold → Classify Unknown Samples

  17. Summary
     • Intersection of clique and dominating set techniques improves results
     • Combined orthogonal scoring identifies limited number of discriminatory genes
     • Clique offers means of validating obtained scores and weights
     • Our technique identifies differing set of discriminatory genes from original paper
     • Clique-based classification a viable complement to present clustering methods

  18. Ongoing and Future Research
     • Reverse Training
     • Train to distinguish among types of cancer
     • Experiment with different weight functions (e.g., Pearson’s correlation coefficient)
     • Investigate using less stringent techniques
       • Near-cliques
       • Neighborhood search
       • K-dense subgraphs
     • Port codes to SGI Altix supercomputer
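
As an illustration of the alternative weight function named on this slide, Pearson's coefficient between two samples could be computed as in the sketch below. It assumes each sample is represented as a list of expression values over the same genes in the same order; the function name `pearson` and the example values are hypothetical, and this is not the edge weight function used in the presentation.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two vectors of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of co-deviations and of squared deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / sqrt(var_x * var_y)


sample_a = [1.0, 2.0, 3.0, 4.0]
sample_b = [2.0, 4.0, 6.0, 8.0]   # same pattern as sample_a, scaled
sample_c = [4.0, 3.0, 2.0, 1.0]   # opposite pattern
```

Because the coefficient ranges from -1 to 1, a threshold applied to it would need to be chosen on that scale.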

  19. Our Research Group
     • Mike Langston, Ph.D.
     • Lan Lin
     • Chris Symons
     • Xinxia Peng
     • Bing Zhang, Ph.D.
