
Characterizing Gene Functional Expression Profiles

Zoran Obradovic, Slobodan Vucetic, Hongbo Xie, Hao Sun, Pooja Hedge. Information Science and Technology Center, Temple University.


Presentation Transcript


  1. Characterizing Gene Functional Expression Profiles • Zoran Obradovic, Slobodan Vucetic, Hongbo Xie, Hao Sun, Pooja Hedge • Information Science and Technology Center, Temple University

  2. Outline • Microarray Data Analysis Process • Functional Expression Profile Analysis • Functional Expression Profile Ranking • Functional Expression Profile Clustering • Functional Characterization of Plasmodium falciparum, Saccharomyces cerevisiae, Mus musculus and Homo sapiens

  3. What is a DNA Microarray? • DNA microarray technology measures the expression of tens of thousands of genes at a time (Source: Analysis of Replicated Experiments, Gordon Smyth, Walter and Eliza Hall Institute)

  4. Scanning/Signal Detection • Channels: Cy3 channel, Cy5 channel • Legend: equal expression; higher expression in Cy3; higher expression in Cy5

  5. Microarray Data Analysis Process • Designing gene expression experiments • Image processing and analysis • Preprocessing raw intensity data • Discovering differentially expressed genes • Advanced analysis • Finding relevant pathways • Discovering gene expression patterns • Understanding gene functions More information: • www.ist.temple.edu/research/biocore.html

  6. Designing Gene Expression Experiments • Comparative designs shown: reference design, loop design, saturated design (Source: http://discover.nci.nih.gov/microarrayAnalysis/Experimental.Design.jsp)

  7. Image Processing and Analysis (figure obtained using Imagene software)

  8. Preprocessing Raw Intensity Data • Normalize (Source: Analysis of Replicated Experiments, Gordon Smyth, Walter and Eliza Hall Institute)

  9. Discovering Differentially Expressed Genes • Fold change (log ratio) • Statistical methods: 1) t-test 2) ANOVA 3) Non-parametric analysis (Wilcoxon rank-sum test)
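
The first two items on this slide can be sketched in a few lines. The example below is a minimal illustration, not the authors' pipeline: it computes a log2 fold change from mean intensities and Welch's form of the t statistic on log-transformed values. The function names and the replicate intensities are hypothetical.

```python
import math
from statistics import mean, variance


def log2_fold_change(treated, control):
    """log2 of the ratio of mean intensities (treated over control)."""
    return math.log2(mean(treated) / mean(control))


def welch_t(a, b):
    """Welch's t statistic for two independent samples (unequal variances)."""
    standard_error = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


# Hypothetical replicate intensities for a single gene.
control = [100.0, 110.0, 90.0, 105.0]
treated = [210.0, 190.0, 205.0, 220.0]

lfc = log2_fold_change(treated, control)
# The t statistic is computed on log2 intensities, a common convention.
t_stat = welch_t([math.log2(x) for x in treated],
                 [math.log2(x) for x in control])
```

A gene would typically be flagged only if both the fold change and the test statistic pass thresholds chosen for the experiment; converting `t_stat` to a p-value requires a t distribution, which is outside the standard library.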

  10. Advanced Analysis: Finding Relevant Pathways (figure obtained using Ingenuity software)

  11. Advanced Analysis: Discovering Gene Expression Patterns • Plasmodium falciparum intraerythrocytic developmental cycle • Genes are sorted based on expression time peaks (Bozdech Z et al., PLoS Biol. 2003 Oct;1(1))

  12. Advanced Analysis: Identifying Unknown Gene Functions Based on Expression Profiles • Standard practice: • Basic assumption: Expression profiles of functionally related genes are correlated • Objectives: Confirm a specific biological hypothesis; predict functional properties of less characterized genes; or uncover new/unexpected biological knowledge • Methodology: Clustering genes based on similarity of their expression profiles, followed by functional analysis of the obtained clusters • Figure: Gene 1 expression profile with function A; gene 2 expression profile with function B; the unknown sequence tag has high correlation with the gene 1 expression profile, so the sequence tag is assigned function A. Is this alignment reliable?

  13. Problems with old approaches • Genes with the same function do not necessarily have the same expression profiles • Clustering on all genes' expression profiles could be unreliable

  14. Our Approach: Analyzing Microarray Functional Expression Profiles (FEP) • FEPs: Compute an FEP as the average profile of all genes associated with a given highly correlated GO term • GO:0004721: phosphoprotein phosphatase activity • GO:0016311: dephosphorylation
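
The FEP definition above can be sketched as a per-time-point average. This is a minimal illustration: the data layout (a dict of gene profiles and a dict of GO annotations) and the gene names are assumptions, not the format used by the authors' tools.

```python
from statistics import mean


def functional_expression_profile(profiles, annotations, go_term):
    """Average, time point by time point, the profiles of all genes
    annotated with go_term."""
    members = [profiles[gene]
               for gene, terms in annotations.items()
               if go_term in terms and gene in profiles]
    if not members:
        raise ValueError(f"no profiled genes annotated with {go_term}")
    # zip(*members) groups the values measured at the same time point.
    return [mean(column) for column in zip(*members)]


# Hypothetical expression profiles (three time points) and annotations.
profiles = {
    "geneA": [1.0, 2.0, 3.0],
    "geneB": [3.0, 2.0, 1.0],
    "geneC": [0.0, 0.0, 9.0],
}
annotations = {
    "geneA": {"GO:0016311"},
    "geneB": {"GO:0016311", "GO:0004721"},
    "geneC": {"GO:0004721"},
}

fep = functional_expression_profile(profiles, annotations, "GO:0016311")
```

Here `geneA` and `geneB` have opposite trends, so their average is flat; this is the kind of case where an FEP is only meaningful if the member genes are first shown to be correlated.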

  15. Questions that we address: • How to perform functional analysis in an objective manner? • How to estimate the biological significance of discoveries?

  16. Tools and Applications • Developed tools to: • (1) Explore which functions have conserved expression profiles (Tool 1: functional expression profile ranking package) • (2) Explore which functions have similar expression profiles and test their functional similarity (Tool 2: functional expression profile clustering package) • Applications: • Functional characterization of gene expression related to the intraerythrocytic developmental cycle of Plasmodium falciparum, and to Saccharomyces cerevisiae, Mus musculus and Homo sapiens

  17. Tools Architecture • Diagram labels: microarray raw data; gene function annotation database; data pre-processing; functional expression profile ranking; functional expression profile clustering; gene function semantic distance mapping space • Report: list of significantly correlated GO terms; clusters of functional expression profiles

  18. Tool 1: Functional Expression Profile (FEP) Ranking Package • Objective: • Identify genes with the same function that have correlated expression profiles • Task: • Evaluate gene expression correlation within each FEP • Methodology: • Step 1: calculate the average pairwise correlation coefficient S among the n gene expression profiles for a given function term • Step 2: randomly select n genes from the whole dataset and compute their average pairwise correlation coefficient S' • Step 3: repeat Step 2 m times (m>10,000) and compare the distribution of S' to the original S to evaluate the p-value
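
The three steps above can be sketched as a resampling test. This is an illustration under stated assumptions, not the ranking package itself: Pearson correlation is assumed as the correlation coefficient, the p-value is estimated as the fraction of random draws with S' at least as large as S, and every profile is assumed to be non-constant. All names and the toy data are hypothetical.

```python
import math
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_pairwise_correlation(profiles):
    """Step 1: mean correlation over all unordered pairs of profiles."""
    pairs = list(combinations(profiles, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


def fep_p_value(term_genes, all_profiles, m=10000, seed=0):
    """Steps 2 and 3: compare S with S' from m random gene sets of size n."""
    rng = random.Random(seed)
    s = average_pairwise_correlation([all_profiles[g] for g in term_genes])
    names = list(all_profiles)
    n = len(term_genes)
    hits = 0
    for _ in range(m):
        sample = rng.sample(names, n)
        s_random = average_pairwise_correlation(
            [all_profiles[g] for g in sample])
        if s_random >= s:
            hits += 1
    return s, hits / m


# Toy data: 20 background genes with random profiles, 3 co-expressed genes.
data_rng = random.Random(1)
profiles = {f"bg{i}": [data_rng.random() for _ in range(6)] for i in range(20)}
profiles["g1"] = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
profiles["g2"] = [2.0, 4.1, 6.0, 8.2, 10.0, 12.1]
profiles["g3"] = [0.5, 1.4, 2.6, 3.5, 4.4, 5.6]

s, p = fep_p_value(["g1", "g2", "g3"], profiles, m=2000)
```

Sampling from the whole dataset means a random draw can occasionally include the term's own genes; whether to exclude them is a design choice the slide does not specify.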

  19. Dataset 1: Plasmodium falciparum Intraerythrocytic Developmental Cycle (Bozdech Z et al. (2003) PLoS Biol. Oct; 1(1)) • Objective: Identification of P. falciparum genes whose RNA levels vary periodically within the asexual intraerythrocytic developmental cycle (IDC) transcriptome • Materials: 5080 ORFs, 3532 unique genes, 46 assays (sampled in time) using cDNAs • Methods: Permutation test with the Fast Fourier Transform algorithm and correlations • Found: 60% of genes are transcriptionally active, and most genes are active only once during the IDC • Figure: Major morphological stages during the IDC and transcriptional profiles of 2712 genes

  20. Dataset 2: Saccharomyces cerevisiae Cell Cycle (Spellman et al. (1998) Molecular Biology of the Cell 9, 3273-3297) • Objective: Identification of yeast genes whose RNA levels vary periodically within the cell cycle • Materials: 6178 ORFs, 4450 unique genes, 77 assays (sampled in time) using cDNAs • Methods: Periodicity and correlation algorithm • Found: 800 genes that meet an objective minimum criterion for cell cycle regulation • Figure: The M/G1 clusters

  21. Dataset 3: Homo sapiens Cell Cycle (R. Cho et al. (2001) Nature, 27) • Objective: Identification of human genes whose RNA levels vary periodically within the cell cycle • Materials: 6800 ORFs, 5795 unique genes, 14 assays (sampled in time) using Affymetrix arrays • Methods: Fold change • Found: 700 genes that display transcriptional fluctuation with a periodicity consistent with that of the cell cycle • Figure: Clustering analysis of cell-cycle-regulated transcripts

  22. Dataset 4: Mus musculus Cell Cycle (Ishida S et al. (2001) Mol. Cell. Biol. 21, 4684-4699) • Objective: Analysis of gene regulation during the mammalian cell cycle • Materials: 6347 unique genes, 14 assays • Methods: Clustering • Found: 7 distinct clusters of genes that exhibit unique patterns of expression • Figure: Patterns of gene expression following growth stimulation and during the mammalian cell cycle

  23. Applying FEP Ranking Package: Cumulative Distributions of GO Term p-Values for Human, Yeast, Mouse and P. falciparum

  24. Applying FEP Ranking Package: GO Terms with the Most Conserved FEP Across Multiple Organisms

  25. Applying FEP Ranking Package: Selection of GO Terms with Significantly Correlated Expression Patterns in the Plasmodium falciparum Developmental Cycle Data • Figure: Cumulative distribution of p-values for GO terms associated with at least two genes • GO:0016311: dephosphorylation • GO:0007028: cytoplasm organization and biosynthesis • Selected: 46% of all function GO terms are significantly correlated; 52% of all process GO terms are significantly correlated

  26. Plasmodium falciparum: Processes and Functions with the Highest/Lowest Correlation • Panels: highest correlation; lowest correlation

  27. Plasmodium falciparum: Findings by FEP Ranking Package • Of 12 FEPs referenced by Bozdech et al., two have a p-value larger than 0.05. • E.g., the average correlation coefficient among genes associated with the Ribonucleotide Synthesis function is only 0.258 (p-value = 0.11), which weakens the claim that it is related to the ring stage of the IDC. • No linear relationship was found between the number of genes associated with a given GO term and the average correlation coefficient among these genes • Ranking GO terms by p-value could be useful for rapid identification of functions that are closely related to a specific developmental stage (of Plasmodium falciparum)

  28. All Datasets: Findings by FEP Ranking Package • To some extent, genes with identical functions have similar expression profiles • However, a large fraction of functions do not follow the underlying hypothesis! • Higher-level organisms seem to have a lower fraction of significantly correlated expression profiles for identical functions. • Fractions of correlated FEPs*: • Saccharomyces cerevisiae: 59% (643/1,083) • Plasmodium falciparum: 48.4% (428/884) • Homo sapiens: 16.4% (249/1,514) • Mus musculus: 13.3% (182/1,366) *fractions are for both processes and functions

  29. Tool 2: FEP Clustering Package • Objective: • Identify genes with similar functions and similar expression profiles • Tasks: • Cluster the FEPs selected by the FEP ranking package • Evaluate the found clusters for biological relevance by • Identifying similar functions based on the GO term hierarchy tree structure • Evaluating inter-cluster GO term distance • Methodology: • Randomly generate k sets, each containing the same number of GO terms as the corresponding cluster • Calculate the total GO term distance within each generated set and sum the total distances of all sets to get the overall score S' • Repeat the procedure 1000 times and compare the distribution of S' to the overall distance obtained through clustering
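
The methodology above can be sketched as follows. Several details are assumptions made for illustration: the random sets are drawn by shuffling the pooled clustered terms and cutting them into sets of the original cluster sizes, and the comparison is summarized as a z-score of the form (mean of S' minus observed distance) divided by the standard deviation of S'. The distance function is passed in as a parameter; the integer "terms" and absolute-difference distance in the example are stand-ins for GO terms and a GO tree distance.

```python
import random
from itertools import combinations
from statistics import mean, stdev


def total_distance(terms, dist):
    """Sum of pairwise distances among the terms of one set."""
    return sum(dist(a, b) for a, b in combinations(terms, 2))


def cluster_z_score(clusters, dist, repeats=1000, seed=0):
    """Compare the observed within-cluster distance with random groupings.

    Assumed z-score: (mean(S') - S_observed) / stdev(S'), so that tighter
    clusters than random give a larger positive value.
    """
    rng = random.Random(seed)
    observed = sum(total_distance(c, dist) for c in clusters)
    pool = [term for c in clusters for term in c]
    random_scores = []
    for _ in range(repeats):
        rng.shuffle(pool)
        score, start = 0, 0
        for c in clusters:
            # Random set with the same size as the corresponding cluster.
            score += total_distance(pool[start:start + len(c)], dist)
            start += len(c)
        random_scores.append(score)
    return (mean(random_scores) - observed) / stdev(random_scores)


# Stand-in example: integers as "terms", absolute difference as distance.
clusters = [[0, 1, 2, 3], [100, 101, 102, 103]]
z = cluster_z_score(clusters, lambda a, b: abs(a - b), repeats=200)
```

If every random grouping had the same score, the standard deviation would be zero and the z-score undefined; a production version would need to handle that case.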

  30. Structure of GO Term Tree (Example) • Level 1: GO:0008150: biological process • Level 2: GO:0007275: development; GO:0007582: physiological process • Level 3: GO:0007389: pattern specification; GO:0008152: metabolism • Level 4: GO:0000003: reproduction; GO:0009798: axis specification • Level 5: GO:0009948: anterior/posterior axis specification • Measuring the distance between GO terms X and Y uses: the length of the minimal chain between X and Y in the GO tree; the length of the maximal chain from the top to the bottom
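
The minimal chain length between two terms can be sketched as a breadth-first search over the tree. The parent links below are assumed from the level layout of the example (the slide text does not preserve the edges, and GO:0000003 is left out because its parent is unclear), and dividing by the maximal chain length is one plausible way to combine the two quantities, not a formula taken from the slide.

```python
from collections import deque

# Child -> parents, assumed from the example's level layout.
PARENTS = {
    "GO:0007275": ["GO:0008150"],  # development
    "GO:0007582": ["GO:0008150"],  # physiological process
    "GO:0007389": ["GO:0007275"],  # pattern specification
    "GO:0008152": ["GO:0007582"],  # metabolism
    "GO:0009798": ["GO:0007389"],  # axis specification
    "GO:0009948": ["GO:0009798"],  # anterior/posterior axis specification
}


def build_graph(parents):
    """Undirected adjacency sets built from child -> parents links."""
    graph = {}
    for child, parent_list in parents.items():
        for parent in parent_list:
            graph.setdefault(child, set()).add(parent)
            graph.setdefault(parent, set()).add(child)
    return graph


def chain_length(graph, x, y):
    """Number of edges on the shortest path between terms x and y."""
    if x == y:
        return 0
    seen = {x}
    queue = deque([(x, 0)])
    while queue:
        node, depth = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor == y:
                return depth + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, depth + 1))
    raise ValueError("terms are not connected")


def normalized_distance(graph, x, y, max_chain):
    """Assumed normalization: minimal chain length over maximal chain length."""
    return chain_length(graph, x, y) / max_chain


graph = build_graph(PARENTS)
# Pattern specification to metabolism passes through the root: 4 edges.
d = chain_length(graph, "GO:0007389", "GO:0008152")
```

In this toy tree the longest top-to-bottom chain has 4 edges (five levels), so `normalized_distance(graph, "GO:0007389", "GO:0008152", 4)` evaluates to 1.0.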

  31. Determination of Number of Clusters • Measured by z-score • A larger z-score indicates a better grouping of functions within clusters.

  32. Number of Clusters vs Z-score: Results for Plasmodium falciparum • Panels: biological processes (number of clusters vs z-score); molecular functions (number of clusters vs z-score)

  33. Applying FEP Clustering Package: Results on Plasmodium falciparum Processes • k-means clustering profiles of FEPs for 238 identified processes (clusters 1 to 4) • Cluster vs stage of IDC

  34. Applying FEP Clustering Package: Results on Plasmodium falciparum Functions • k-means clustering profiles of FEPs for 199 identified molecular functions (clusters 1 to 4) • Cluster vs stage of IDC

  35. GO Trees of Functions: 4 Clusters of Plasmodium falciparum

  36. Statistical Evaluation: Found vs. Random Clusters for P. falciparum • Panels: biological processes; molecular functions (found clusters marked in each) • The distance from the found clusters to the random clusters is larger for biological processes. • Random clusters for biological processes have smaller variance

  37. Statistical Evaluation: Clustering All GO Terms for P. falciparum • Clustering all GO terms leads to a smaller z-score, which means worse-quality clusters • The right figure is the P. falciparum functional clustering result (found clusters marked): the z-score is 8.5, compared to 12 for clustering correlated GO terms only

  38. Statistical Evaluation: Found vs. Random Clusters for S. cerevisiae and Homo sapiens • Panels (found clusters marked in each): yeast processes; human processes; yeast functions; human functions

  39. Remarks • Statistical significance of identified clusters (separation between clusters and random groupings) is increased by • Normalizing data (Plasmodium falciparum) • Eliminating noise through singular value decomposition (SVD) • Reducing data through Principal Component Analysis

  40. Conclusions • The proposed microarray tools help identify • genes with the same function and correlated expression profiles • genes with similar functions and similar expression profiles • Measuring GO tree based distance was useful for evaluating the biological relevance of clusters; however, • many GO terms have only 1 associated gene • many genes do not even have a GO term • parenthood and sibling relationships in GO trees should be differentiated, with a smaller penalty for sibling relationships than for parenthood • More robust clustering methods could be used

  41. Thank You! • More information: www.ist.temple.edu/research/biocore.html • Contact: Zoran Obradovic, Director, IST Center, Temple University, 215 204-6265, zoran@ist.temple.edu
