Probing the systems biology of Mycobacterium tuberculosis through gene expression and genomic data

Luke Alden Yancy, Jr.
Mentor: Robert Riley
Broad Institute of MIT & Harvard, Cambridge, MA

What is Tuberculosis? Source: http://staff.vbi.vt.edu/pathport/pathinfo_images/Mycobacterium_tuberculosis/AerosolTransmission.jpg

The Problem TB mortality, all forms (per 100 000 population per year), By Country, Total, 2006 Source: WHO Stop TB Department, website: www.who.int/tb

Why this study?
• Biclustering
  • Bimax (Prelic et al. 2006)
  • CC (Cheng and Church, 2000)
  • Plaid Model (Turner et al. 2003)
  • Spectral (Kluger et al. 2003)
  • Xmotifs (Murali and Kasif, 2003)
• Traditional Clustering
  • K-Means (MacQueen, 1967)
  • Hierarchical (Eisen et al. 1998)
• Learn more about Mycobacterium tuberculosis (Mtb) using analysis of gene expression data

What are clustering and biclustering?

Biclustering vs. Standard Clustering Source: Machine Learning and Its Applications to Biology, Tarca et al. 2007. (Editor: Fran Lewitter, Whitehead Institute)

What did we do?
Workflow: Boshoff data (processed: 3924 genes, 359 experiments), clustered with Bimax and K-Means, yielding clusters of genes.
Source: The Transcriptional Responses of Mycobacterium tuberculosis to Inhibitors of Metabolism. (Boshoff et al. 2004)
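The K-Means half of this workflow can be sketched in a few lines. This is a minimal illustration, not the pipeline used in the study: the function names (`sq_dist`, `farthest_first_init`, `kmeans`), the toy expression profiles, and the deterministic farthest-first seeding are all assumptions made for the example. Standard K-Means usually starts from random centroids; the seeding is fixed here only so the result is reproducible.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_init(rows, k):
    """Deterministic seeding (an assumption of this sketch): start at the
    first row, then repeatedly add the row farthest from all chosen seeds."""
    centroids = [list(rows[0])]
    while len(centroids) < k:
        farthest = max(rows, key=lambda r: min(sq_dist(r, c) for c in centroids))
        centroids.append(list(farthest))
    return centroids


def kmeans(rows, k, max_iter=100):
    """Lloyd's algorithm: rows are genes, columns are experiments.
    Returns one cluster label per gene."""
    centroids = farthest_first_init(rows, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each gene goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(row, centroids[c])) for row in rows
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: each centroid becomes the mean profile of its members.
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical toy data: 4 genes measured in 3 experiments (log ratios).
toy_profiles = [
    [2.0, 2.1, 1.9],
    [1.8, 2.2, 2.0],
    [-1.9, -2.0, -2.1],
    [-2.2, -1.8, -2.0],
]
toy_labels = kmeans(toy_profiles, k=2)
print(toy_labels)  # [0, 0, 1, 1]
```

Every gene receives exactly one label across all experiments, which is the property that distinguishes this from a biclustering method such as Bimax, where a gene group is tied to a subset of experiments.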

Benchmarking Biclusters Using Operons
Figure: proS loci of Mtb (Source: http://www.nature.com/nature/journal/v409/n6823/full/4091007a0.html)
Significance of overlap k estimated using hypergeometric distribution.
Venn diagram of gene pairs: all pairs (N), operon pairs (m), cluster pairs (n), overlap (k).
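The overlap test on this slide can be sketched as follows. The slide names the hypergeometric distribution and the symbols N, m, n, and k, but the exact formula is not recoverable from the transcript, so this example assumes the usual upper-tail p-value, P(X >= k) = sum over i from k to min(m, n) of C(m, i) C(N - m, n - i) / C(N, n). It also assumes N counts all unordered gene pairs, m the pairs sharing an operon, and n the pairs sharing a cluster. The helper names and the six-gene toy data are hypothetical.

```python
from itertools import combinations
from math import comb


def gene_pairs(groups):
    """Return the set of unordered gene pairs that share at least one group."""
    pairs = set()
    for genes in groups:
        pairs.update(frozenset(p) for p in combinations(sorted(set(genes)), 2))
    return pairs


def hypergeom_tail(k, N, m, n):
    """P(X >= k): chance of seeing at least k operon pairs among n cluster
    pairs drawn from N total pairs, of which m are operon pairs."""
    upper = min(m, n)
    favourable = sum(comb(m, i) * comb(N - m, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


# Hypothetical toy genome with six genes.
genes = ["g1", "g2", "g3", "g4", "g5", "g6"]
operons = [["g1", "g2", "g3"], ["g4", "g5"]]
clusters = [["g1", "g2", "g3", "g6"]]

operon_pairs = gene_pairs(operons)
cluster_pairs = gene_pairs(clusters)

N = comb(len(genes), 2)               # all gene pairs: 15
m = len(operon_pairs)                 # pairs in the same operon: 4
n = len(cluster_pairs)                # pairs in the same cluster: 6
k = len(operon_pairs & cluster_pairs) # pairs in both: 3

p_value = hypergeom_tail(k, N, m, n)
print(N, m, n, k, round(p_value, 4))  # 15 4 6 3 0.1429
```

A small p-value means the clusters recover operon structure more often than chance would. Exact integer arithmetic with `math.comb` keeps the sketch dependency-free; for a genome-scale N a library routine such as a survival function from a statistics package would be the more practical choice.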

Algorithm Performance
Figure: operon overlap of Bimax biclustering.
Source: Prolinks: a database of protein functional linkages derived from coevolution (Bowers et al. 2005)

Problems with Biclustering
• Random step: lacks reproducibility
• No biological soundness
• Artificial arrangement of data
• Large data sets produce statistically significant, but small clusters
• Practicality
• Implementation
• Large Input Data Sets

Conclusions & Next Steps
• K-Means clustering performs better than biclustering on our data set
• Next, use motif recognition methods to identify regulatory motifs in clusters
• Further development of improved biclustering algorithms

Acknowledgments
• Project Team: Robert Riley (Mentor), Brian Weiner
• The Broad Institute: Eric Lander, Core Members, SRPG Program Members
• Summer Research Program in Genomics (SRPG)
• Shawna Young
• Bruce Birren
• Lucia Vielma
• Maura Silverstein
