Luke Alden Yancy, Jr. Mentor: Robert Riley Broad Institute of MIT & Harvard Cambridge, MA

Probing the systems biology of Mycobacterium tuberculosis through gene expression and genomic data

What is Tuberculosis? Source: http://staff.vbi.vt.edu/pathport/pathinfo_images/Mycobacterium_tuberculosis/AerosolTransmission.jpg

The Problem
Figure: TB mortality, all forms (per 100,000 population per year), by country, total, 2006.
Source: WHO Stop TB Department, website: www.who.int/tb

Why this study?
• Learn more about Mycobacterium tuberculosis (Mtb) using analysis of gene expression data
• Biclustering
  • Bimax (Prelic et al. 2006)
  • CC (Cheng and Church, 2000)
  • Plaid Model (Turner et al. 2003)
  • Spectral (Kluger et al. 2003)
  • Xmotifs (Murali and Kasif, 2003)
• Traditional Clustering
  • K-Means (MacQueen, 1967)
  • Hierarchical (Eisen et al. 1998)

What are clustering and biclustering?

Biclustering vs. Standard Clustering Source: Machine Learning and Its Applications to Biology, Tarca et al. 2007. (Editor: Fran Lewitter, Whitehead Institute)

What did we do?
Diagram: Boshoff data (processed: 3924 genes, 359 experiments), clustered with Bimax and with K-Means, yielding clusters of genes.
Source: The Transcriptional Responses of Mycobacterium tuberculosis to Inhibitors of Metabolism. (Boshoff et al. 2004)
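To make the K-Means side of this comparison concrete, here is a minimal sketch of standard K-Means (Lloyd's algorithm) applied to the rows of a genes-by-experiments matrix. It is a generic illustration, not the pipeline used in this project: the function names, the toy matrix `expr`, the choice of `k`, and the seed are all hypothetical, and the slides do not say which implementation or settings were used.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(rows, k, n_iter=50, seed=0):
    """Cluster rows (genes) into k groups; returns (labels, centroids)."""
    rng = random.Random(seed)          # the random step: initial centroids
    centroids = [list(r) for r in rng.sample(rows, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each gene joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(row, centroids[c]))
            for row in rows
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [r for r, lab in zip(rows, new_labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:       # assignments stopped changing
            break
        labels = new_labels
    return new_labels, centroids

# Hypothetical toy matrix: 6 genes (rows) x 4 experiments (columns).
expr = [
    [2.1, 1.9, 2.3, 2.0],
    [1.8, 2.2, 2.0, 2.4],
    [2.5, 2.0, 1.7, 2.1],
    [-1.9, -2.2, -2.0, -1.6],
    [-2.3, -1.8, -2.1, -2.0],
    [-2.0, -2.4, -1.7, -2.2],
]
labels, centroids = kmeans(expr, k=2, seed=0)
```

Every gene receives exactly one label and the whole profile across all experiments is used, which is the property that distinguishes this approach from biclustering. Because the starting centroids are sampled at random, fixing `seed` is what makes a run repeatable.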

Benchmarking Biclusters Using Operons (proS loci of Mtb)
Significance of overlap k estimated using hypergeometric distribution.
Diagram labels, counted in gene pairs: (N) total, (m) operon, (n) cluster, (k) overlap.
(Source: http://www.nature.com/nature/journal/v409/n6823/full/4091007a0.html)
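The overlap test on this slide can be sketched as a hypergeometric tail probability. The sketch assumes the usual reading of the labels: N is the number of possible gene pairs, m the number of pairs that share an operon, n the number of pairs that share a cluster, and k the number of pairs in both. The helper names and the gene identifiers are hypothetical, and the slides do not show the exact formula or code that was used.

```python
from itertools import combinations
from math import comb

def gene_pairs(groups):
    """All unordered gene pairs whose two genes share at least one group."""
    pairs = set()
    for genes in groups:
        pairs.update(frozenset(p) for p in combinations(sorted(set(genes)), 2))
    return pairs

def overlap_pvalue(N, m, n, k):
    """P(X >= k) when n pairs are drawn from N, of which m are operon pairs."""
    tail = sum(comb(m, i) * comb(N - m, n - i) for i in range(k, min(m, n) + 1))
    return tail / comb(N, n)

# Hypothetical toy input: two operons and two clusters over six genes.
operons = [["geneA", "geneB", "geneC"], ["geneD", "geneE"]]
clusters = [["geneA", "geneB", "geneD"], ["geneC", "geneE", "geneF"]]

all_genes = {g for group in operons + clusters for g in group}
operon_pairs = gene_pairs(operons)
cluster_pairs = gene_pairs(clusters)

N = comb(len(all_genes), 2)            # every possible gene pair
m = len(operon_pairs)                  # pairs in the same operon
n = len(cluster_pairs)                 # pairs in the same cluster
k = len(operon_pairs & cluster_pairs)  # pairs in both
p_value = overlap_pvalue(N, m, n, k)
```

A small p-value means the clusters recover operon pairs more often than random pair selection would, which is the sense in which operons serve as a benchmark here.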

Algorithm Performance Bimax Biclustering Operon Overlap Source: Prolinks: a database of protein functional linkages derived from coevolution (Bowers et al. 2005)

Problems with Biclustering
• Random step – lacks reproducibility
• No biological soundness
• Artificial arrangement of data
• Large data sets produce statistically significant, but small clusters
• Practicality
• Implementation
• Large Input Data Sets

Conclusions & Next Steps
• K-Means clustering performs better than biclustering on our data set
• Next, use motif recognition methods to identify regulatory motifs in clusters
• Further development of improved biclustering algorithms

Acknowledgments
• Project Team: Robert Riley (Mentor), Brian Weiner
• The Broad Institute: Eric Lander, Core Members, SRPG Program Members
• Summer Research Program in Genomics (SRPG)
• Shawna Young
• Bruce Birren
• Lucia Vielma
• Maura Silverstein
