Probing the systems biology of Mycobacterium tuberculosis through gene expression and genomic data
Luke Alden Yancy, Jr.
Mentor: Robert Riley
Broad Institute of MIT & Harvard, Cambridge, MA
What is Tuberculosis?
Source: http://staff.vbi.vt.edu/pathport/pathinfo_images/Mycobacterium_tuberculosis/AerosolTransmission.jpg
The Problem
TB mortality, all forms (per 100 000 population per year), by country, total, 2006
Source: WHO Stop TB Department, website: www.who.int/tb
Why this study?
• Biclustering
  • Bimax (Prelic et al. 2006)
  • CC (Cheng and Church, 2000)
  • Plaid Model (Turner et al. 2003)
  • Spectral (Kluger et al. 2003)
  • Xmotifs (Murali and Kasif, 2003)
• Traditional Clustering
  • K-Means (MacQueen, 1967)
  • Hierarchical (Eisen et al. 1998)
• Learn more about Mycobacterium tuberculosis (Mtb) through analysis of gene expression data
Biclustering vs. Standard Clustering
Source: Machine Learning and Its Applications to Biology, Tarca et al. 2007. (Editor: Fran Lewitter, Whitehead Institute)
What did we do?
Workflow: Boshoff data (processed: 3924 genes, 359 experiments), clustered with Bimax and K-Means, yielding clusters of genes
Source: The Transcriptional Responses of Mycobacterium tuberculosis to Inhibitors of Metabolism. (Boshoff et al. 2004)
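The K-Means half of this workflow can be sketched in a few lines. This is only an illustration under stated assumptions: the slides do not say which implementation, distance measure, or number of clusters was used, so the function `kmeans`, its parameters, and the toy profiles below are hypothetical, and plain Lloyd's algorithm with squared Euclidean distance stands in for whatever the project actually ran.

```python
import random


def kmeans(profiles, k, max_iter=100, seed=0):
    """Cluster expression profiles (one list of floats per gene) with Lloyd's k-means.

    Hypothetical helper for illustration; not the implementation used in the study.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs give the same labels
    centers = [list(p) for p in rng.sample(profiles, k)]  # k genes as initial centers
    labels = None
    for _ in range(max_iter):
        # Assignment step: each gene goes to the nearest center.
        new_labels = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centers[c])))
            for p in profiles
        ]
        if new_labels == labels:  # no gene changed cluster: converged
            break
        labels = new_labels
        # Update step: each center becomes the mean profile of its members.
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy data (made up): 4 genes x 3 experiments, two obvious expression patterns.
toy_profiles = [
    [1.0, 1.1, 0.9],
    [0.9, 1.0, 1.0],
    [-1.0, -0.9, -1.1],
    [-1.1, -1.0, -0.9],
]
toy_labels = kmeans(toy_profiles, k=2)
```

On the toy data the first two genes end up in one cluster and the last two in the other. The real input would be the 3924 x 359 processed matrix, with one row per gene.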
Benchmarking Biclusters Using Operons
[Figure: proS loci of Mtb. Diagram labels: Operon (m), Cluster (n), overlap (k), total (N); elements are gene pairs]
Significance of overlap k estimated using the hypergeometric distribution
(Source: http://www.nature.com/nature/journal/v409/n6823/full/4091007a0.html)
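A minimal sketch of how such an overlap test could be computed. It assumes, from the figure labels, that N is the number of all possible gene pairs, m the number of gene pairs sharing an operon, n the number of gene pairs sharing a cluster, and k the number of pairs in both. The slide does not show the formula itself, so the upper-tail probability P(X >= k) used here is an assumption, and the function names and toy gene sets are hypothetical.

```python
from itertools import combinations
from math import comb


def gene_pairs(groups):
    """Return the set of unordered gene pairs that share at least one group."""
    pairs = set()
    for genes in groups:
        pairs.update(frozenset(p) for p in combinations(sorted(set(genes)), 2))
    return pairs


def overlap_p_value(N, m, n, k):
    """Upper tail P(X >= k) of a hypergeometric distribution.

    N: all gene pairs, m: operon pairs, n: cluster pairs, k: observed overlap.
    """
    tail = sum(comb(m, i) * comb(N - m, n - i) for i in range(k, min(m, n) + 1))
    return tail / comb(N, n)


# Toy example (made-up genes): 5 genes, so N = C(5, 2) = 10 possible pairs.
operons = [["a", "b", "c"], ["d", "e"]]
clusters = [["a", "b", "d"], ["c", "e"]]

operon_pairs = gene_pairs(operons)          # 4 pairs
cluster_pairs = gene_pairs(clusters)        # 4 pairs
k = len(operon_pairs & cluster_pairs)       # 1 shared pair: {a, b}
p = overlap_p_value(comb(5, 2), len(operon_pairs), len(cluster_pairs), k)
```

In the toy example p = 195/210, about 0.93, so a single shared pair is what chance alone would produce. A small p would indicate that a clustering groups operon partners together more often than expected at random.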
Algorithm Performance
[Figure: operon overlap for Bimax biclustering]
Source: Prolinks: a database of protein functional linkages derived from coevolution (Bowers et al. 2005)
Problems with Biclustering
• Random step: lacks reproducibility
• No biological soundness
  • Artificial arrangement of data
  • Large data sets produce statistically significant, but small clusters
• Practicality
  • Implementation
  • Large input data sets
Conclusions & Next Steps
• K-Means clustering performs better than biclustering on our data set
• Next, use motif recognition methods to identify regulatory motifs in clusters
• Further development of improved biclustering algorithms
Acknowledgments
• Project Team
  • Robert Riley (Mentor)
  • Brian Weiner
• The Broad Institute
  • Eric Lander
  • Core Members
  • SRPG Program Members
• Summer Research Program in Genomics (SRPG)
  • Shawna Young
  • Bruce Birren
  • Lucia Vielma
  • Maura Silverstein