Distinguishing Regulators of Biomolecular Pathways
Sean Caonguyen, SoCalBSI, 8/21/08
Mentor: Dr. Xiwei Wu, City of Hope

Expression Pattern Analysis
• Microarray technology is a powerful tool for investigating cellular activity at different levels
• DNA microarrays can be used to identify genetic “signatures” for disease
Pan et al. (2005)
Image: http://www.sciencedaily.com/images/2007/09/070912102212.jpg

A Traditional Approach to DNA Microarray Analysis
• Individual Gene Analysis
  • Two-step process
  • Selects genes using an arbitrarily chosen cut-off
  • From the selected genes, one infers the biological meaning of the gene expression data
[Diagram: Gene Expression Data → Threshold → Genes Selected → Biological Interpretation]
Jiang Z and Gentleman R. (2006) and Nam D, et al. (2007)
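
The two-step process above can be sketched in a few lines of Python. The gene names, fold-change values, and cut-offs are made up for illustration; the sketch only shows that the selected gene list, and therefore the interpretation, depends on the chosen threshold.

```python
# Hypothetical log2 fold changes (treated vs. control); not real data.
log2_fold_change = {
    "geneA": 2.3,
    "geneB": 1.1,
    "geneC": -0.4,
    "geneD": -1.6,
    "geneE": 0.9,
}


def select_genes(fold_changes, cutoff):
    """Step 1: keep genes whose absolute log2 fold change meets the cut-off."""
    return sorted(g for g, fc in fold_changes.items() if abs(fc) >= cutoff)


# Step 2 (biological interpretation) works only from the selected list,
# so a different cut-off can lead to a different conclusion.
strict = select_genes(log2_fold_change, cutoff=1.5)
lenient = select_genes(log2_fold_change, cutoff=1.0)
print(strict)
print(lenient)
```

With these made-up values, geneB is selected under the lenient cut-off but not under the strict one.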

Emerging Approach to DNA Microarray Analysis
• Gene Set Analysis (GSA)
  • Rank all genes based on their phenotype association
  • Calculate a maximal enrichment score for each gene set
  • Rank the gene set scores for biological interpretation
[Diagram: Gene Expression Data and Gene Set Database → Assess gene set directly → Biological Interpretation]
Jiang Z and Gentleman R. (2006) and Nam D, et al. (2007)
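
As a rough illustration of the "maximal enrichment score" step, the sketch below uses an unweighted running sum (a Kolmogorov-Smirnov-like statistic). This is a simplification, not the weighted score of Subramanian et al. (2005), and it assumes the genes are already ranked by phenotype association. Gene names and gene sets are hypothetical.

```python
def enrichment_score(ranked_genes, gene_set):
    """Return the running-sum value with the largest deviation from zero.

    Walking down the ranked list, the sum rises at each gene in the set
    and falls at each gene outside it (unweighted simplification).
    """
    hits = [g in gene_set for g in ranked_genes]
    n_hits = sum(hits)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must cover some, but not all, ranked genes")
    running, best = 0.0, 0.0
    for is_hit in hits:
        running += 1.0 / n_hits if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical ranking, most phenotype-associated gene first.
ranked = ["g1", "g2", "g3", "g4", "g5", "g6"]
es_top = enrichment_score(ranked, {"g1", "g2"})     # set clustered at the top
es_bottom = enrichment_score(ranked, {"g5", "g6"})  # set clustered at the bottom
print(es_top, es_bottom)
```

Gene sets would then be ranked by these scores. Note that the score is computed from set membership alone, with no information about how the genes relate to each other within the pathway.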

Biological Significance of Gene Set Analyses
• Can identify subtle changes in gene expression that are undetectable by traditional approaches
• Requires no arbitrary threshold
• Generates results that are easier to interpret

Current Problem with GSA
• Reduces a gene set to a list of names
• Makes no distinction between up-regulation and down-regulation
• Directionality is lost
[Figure: two example pathway diagrams (nodes A through P) with different genes marked as up-regulated or down-regulated. One pattern suggests that the pathway is activated; the other suggests a lower probability of pathway activation.]

Enriched Gene Set Analysis
[Diagram labels: Gene Expression Data, Gene Set Database, Assess gene set directly, Curated Analysis, Biological Interpretation]

Useful Tools for the Pathway Analysis Program
• National Cancer Institute (NCI) Pathway Interaction Database (http://pid.nci.nih.gov/PID/index.shtml)
  • Contains information about molecular interactions and biological processes in signaling pathways
  • Focuses on cancer research in human cells
  • Can be searched by biomolecule or process, or browsed by pathway
• Data formats
  • Graphics: SVG or GIF
  • Text: XML or BioPAX

Segment of the Phosphoinositide 3-Kinases (PI3K) Signaling Pathway
[Figure: the non-lipid kinase pathway of Class IB PI3K, shown as a pathway diagram with a key to icons and as XML script]

Project Objective
• Create a program to distinguish the activators and inhibitors in each signaling pathway
• Requires extensive use of an XML parser in Python

Approach to Project
• Identify all the elements in the pathway
• Record the pairwise interactions
  • Link each interaction
• Determine the role of each molecule
  • Find each leaf node
  • Use a traceback method
[Diagram: example pathway with nodes A, B, C, D, E, F, G, and P]

1) Identify the Elements in the Pathway
• Properly assign each ID to reference a “preferred symbol”
• Locate each interaction ID
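
A minimal sketch of this step using Python's standard `xml.etree.ElementTree` parser. The XML below is a simplified, made-up sample: its element and attribute names (`Molecule`, `Name`, `Interaction`, `Component`, `role`, `molecule_idref`) are assumptions for illustration and may not match the actual Pathway Interaction Database schema.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical XML; the real PID files may be structured differently.
SAMPLE_XML = """
<Pathway>
  <MoleculeList>
    <Molecule id="m1">
      <Name type="preferred symbol" value="A"/>
    </Molecule>
    <Molecule id="m2">
      <Name type="alias" value="B-alias"/>
      <Name type="preferred symbol" value="B"/>
    </Molecule>
  </MoleculeList>
  <InteractionList>
    <Interaction id="i1">
      <Component role="agent" molecule_idref="m1"/>
      <Component role="output" molecule_idref="m2"/>
    </Interaction>
  </InteractionList>
</Pathway>
"""

root = ET.fromstring(SAMPLE_XML.strip())

# Map each molecule ID to its preferred symbol, ignoring other name types.
symbols = {}
for molecule in root.iter("Molecule"):
    for name in molecule.findall("Name"):
        if name.get("type") == "preferred symbol":
            symbols[molecule.get("id")] = name.get("value")

# Locate each interaction ID and list its (role, symbol) participants.
interactions = {}
for interaction in root.iter("Interaction"):
    interactions[interaction.get("id")] = [
        (c.get("role"), symbols[c.get("molecule_idref")])
        for c in interaction.findall("Component")
    ]

print(symbols)
print(interactions)
```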

2) Record the Pairwise Interactions
• How can we store each interaction?
  • Memory efficient
  • Easy extraction of data
• Sparse matrix!
[Diagram: example pathway with nodes A, B, C, D, E, F, G, and P]

Sparse Matrix Initialization
[Figure: the example pathway (nodes A through P) beside its sparse matrix. The matrix axes are labeled Regulators and Output, and its nonzero entries are 1 or -1.]
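
One way to hold such a matrix in Python is a dictionary of dictionaries that stores only the nonzero entries. In the sketch below the edge list is hypothetical (the edges of the slide's example pathway are not recoverable from the text), and the sign convention, 1 for activation and -1 for inhibition, is an assumption.

```python
from collections import defaultdict

# Hypothetical signed interactions: (regulator, output, sign).
# Assumed convention: 1 = activation, -1 = inhibition.
INTERACTIONS = [
    ("A", "B", 1),
    ("B", "D", 1),
    ("C", "E", 1),
    ("E", "F", -1),
    ("D", "F", 1),
    ("F", "G", 1),
    ("G", "P", 1),
]


def build_sparse_matrix(interactions):
    """Store only nonzero entries as matrix[regulator][output] = sign."""
    matrix = defaultdict(dict)
    for regulator, output, sign in interactions:
        matrix[regulator][output] = sign
    return matrix


matrix = build_sparse_matrix(INTERACTIONS)
stored = sum(len(row) for row in matrix.values())
print(matrix["E"])              # outputs regulated by E
print(matrix["A"].get("P", 0))  # entries that were never stored read as 0
print(stored)                   # number of stored entries
```

For these eight molecules a dense matrix would need 64 cells, while the sparse form stores only the 7 interactions, and looking up the outputs of a regulator is a single dictionary access.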

3) Determine the Role of Each Molecule
• Identify each leaf node
• Trace back from each leaf node
[Figure: the sparse matrix (axes labeled Regulators and Output, entries of 1 or -1) beside the example pathway (nodes A through P)]
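
The slide does not spell out the traceback rule, so the following is only one plausible reading. It assumes that a leaf node is a molecule that regulates nothing else, and that a molecule's role is the product of the signs along its path to a leaf (positive for activator, negative for inhibitor, mixed for ambiguous). The edge list is hypothetical.

```python
from collections import defaultdict

# Hypothetical signed edges: (regulator, output, sign); 1 activates, -1 inhibits.
EDGES = [
    ("A", "B", 1),
    ("B", "D", 1),
    ("C", "E", 1),
    ("E", "F", -1),
    ("D", "F", 1),
    ("F", "G", 1),
    ("G", "P", 1),
]


def find_leaves(edges):
    """Leaf nodes: molecules that appear as outputs but never as regulators."""
    regulators = {r for r, _, _ in edges}
    outputs = {o for _, o, _ in edges}
    return sorted(outputs - regulators)


def trace_roles(edges):
    """Walk backward from each leaf, multiplying signs along the way."""
    incoming = defaultdict(list)
    for regulator, output, sign in edges:
        incoming[output].append((regulator, sign))

    effects = defaultdict(set)  # molecule -> set of net signs on any leaf
    for leaf in find_leaves(edges):
        stack = [(leaf, 1, frozenset([leaf]))]
        while stack:
            node, net, seen = stack.pop()
            for regulator, sign in incoming[node]:
                if regulator in seen:  # do not loop around feedback cycles
                    continue
                effects[regulator].add(net * sign)
                stack.append((regulator, net * sign, seen | {regulator}))

    labels = {frozenset([1]): "activator", frozenset([-1]): "inhibitor"}
    return {m: labels.get(frozenset(s), "ambiguous") for m, s in effects.items()}


leaves = find_leaves(EDGES)
roles = trace_roles(EDGES)
print(leaves)
print(sorted(roles.items()))
```

In this made-up pathway, C counts as an inhibitor even though it activates E, because E in turn inhibits F on the way to the leaf P.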

Locate Activated Pathways for Better Biological Interpretation
• Gene expression data
  • Up-regulation of B and D
  • Down-regulation of E
• Enriched Gene Set Analysis
[Figure: the example pathway (nodes A through P) with B and D marked as up-regulated and E marked as down-regulated, captioned “Possible activation of pathway”]
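
One way the roles and the expression data could be combined is a simple agreement score. This scoring rule is an assumption for illustration, not the method on the slide. It also assumes that B and D are activators and E is an inhibitor, which the slide implies but does not state.

```python
# Assumed roles: +1 = activator, -1 = inhibitor (illustrative only).
ROLES = {"B": 1, "D": 1, "E": -1}

# Observed expression changes from the slide: +1 = up, -1 = down.
EXPRESSION = {"B": 1, "D": 1, "E": -1}


def activation_score(roles, expression):
    """Sum role * direction over genes that have both a role and a measurement.

    An up-regulated activator or a down-regulated inhibitor adds +1;
    the opposite combinations add -1.
    """
    shared = roles.keys() & expression.keys()
    return sum(roles[g] * expression[g] for g in shared)


score = activation_score(ROLES, EXPRESSION)
# If E were up-regulated instead, the evidence would be mixed.
mixed = activation_score(ROLES, {"B": 1, "D": 1, "E": 1})
print(score, mixed)
```

A name-only gene set analysis would treat both expression patterns the same way, whereas the signed score separates them (3 versus 1 here).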

Results
• For each pathway menu, one can:
  • Find a list of proteins with associated roles for each node
  • Look at each protein in an interaction
  • Find a list of all interactions in a pathway

Conclusion
• Successfully parsed XML files
• Pathway analysis program works
• ~50% of pathways include inhibitors
• 20% of the pathways contain >=5% inhibitors
• Average total molecules = 60

Future Directions
• Improvements to software
  • Ambiguous roles: proteins in different complexes may have different roles
  • Fine-tune the overall role of proteins in each pathway
• Run the program with a real expression data set
• Improve prognoses and drugs for diseases

References
• Pan KH, Lih CJ, Cohen SN. Effects of threshold choice on biological conclusions reached during analysis of gene expression by DNA microarrays. Proc Natl Acad Sci 2005, 102:8961-5.
• Subramanian A, Tamayo P, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci 2005, 102:15545-50.
• Nam D, Kim SY. Gene-set approach for expression pattern analysis. Brief Bioinform 2008, 9:189-97.
• Dupuy A, Simon RM. Critical review of published microarray studies for cancer outcome and guidelines on statistical analysis and reporting. J Natl Cancer Inst 2007, 99:147-57.
• Jiang Z, Gentleman R. Extensions to gene set enrichment. Bioinformatics 2007, 23:306-13.
• Dinu I, Potter JD, et al. Improving gene set analysis of microarray data by SAM-GS. BMC Bioinformatics 2007, 8:242.
• Liu Q, Dinu I, et al. Comparative evaluation of gene-set analysis methods. BMC Bioinformatics 2007, 8:431.

Acknowledgements
• Mentor
  • Xiwei Wu
• SoCalBSI Faculty and Staff
  • Jamil Momand
  • Sandy Sharp
  • Nancy Warter-Perez
  • Wendie Johnston
• Funding for SoCalBSI
  • DOE and NASA
  • LA / Orange County Biotechnology Center
  • NSF, NIH, and Economic & Workforce Development
• Funding at City of Hope
  • National Cancer Institute
  • National Institutes of Health