Identification of cell cycle-related regulatory motifs using a kernel canonical correlation analysis
Presented by Rhee, Je-Keun
Graduate Program in Bioinformatics, Center for Biointelligence Technology (CBIT), Biointelligence Laboratory, Seoul National University
Contents
• Introduction
• Kernel canonical correlation analysis (kernel CCA)
• Datasets & Experiments
• Experimental results
• Conclusion
(c) 2009 Biointelligence Laboratory, Seoul National University
Introduction
• A major challenge in gene regulation studies is identifying the regulators that affect the expression of their target genes in specific biological processes.
• In the present study, we propose a kernel-based approach that uses gene expression profiles to efficiently identify core regulatory elements involved in specific biological processes.
• Using yeast cell cycle data, we explored significant relationships between motifs and expression profiles, and searched for regulatory motifs and motif pairs correlated with specific expression patterns.
[Figure: cell cycle diagram with phases G1, S, G2, M]
Kernel methods
• The kernel trick solves a nonlinear problem by implicitly mapping the original observations into a higher-dimensional feature space, Φ: x → φ(x), where a linear method can be applied.
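A minimal sketch of the idea above, assuming a degree-2 polynomial kernel on 2-D inputs (the slides do not say which kernel was used, so the kernel and the function names here are illustrative only). It shows that evaluating the kernel on the original inputs gives the same value as an inner product after the explicit mapping φ, which is why the mapping never has to be computed in practice.

```python
import math

def poly2_kernel(x, z):
    """Homogeneous degree-2 polynomial kernel: k(x, z) = (x . z)^2."""
    return sum(a * b for a, b in zip(x, z)) ** 2

def poly2_feature_map(x):
    """Explicit map phi for a 2-D input: (x1^2, sqrt(2)*x1*x2, x2^2)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def dot(u, v):
    """Plain inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

x, z = (1.0, 2.0), (3.0, -1.0)
implicit = poly2_kernel(x, z)                                # computed in input space
explicit = dot(poly2_feature_map(x), poly2_feature_map(z))   # computed in feature space
print(implicit, explicit)  # the two values agree up to rounding
```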
Canonical correlation analysis (CCA)
• Canonical correlation analysis (CCA) is a classical multivariate statistical method for finding linearly correlated features in a pair of datasets.
• Given a pair of multivariate variables xi and xj, CCA finds a pair of linear transformations (weight vectors ai and aj) such that the correlation coefficient between the extracted features ui and uj is maximized.
[Figure: xi and xj are projected by ai and aj onto ui and uj]
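The maximization described above can be sketched with NumPy as follows. This is one common textbook formulation (whitening each dataset and taking an SVD of the cross-covariance), not necessarily the computation used in the study; the function name `linear_cca`, the small ridge term `reg`, and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def linear_cca(X, Y, reg=1e-8):
    """Canonical correlations and weights for X (n x p) and Y (n x q).

    `reg` is a small ridge term added for numerical stability (an assumption).
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Cxx = X.T @ X / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Y.T @ Y / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / (n - 1)

    def inv_sqrt(C):
        # Inverse matrix square root of a symmetric positive definite matrix.
        w, V = np.linalg.eigh(C)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    Wx, Wy = inv_sqrt(Cxx), inv_sqrt(Cyy)
    # Singular values of the whitened cross-covariance are the canonical correlations.
    U, s, Vt = np.linalg.svd(Wx @ Cxy @ Wy, full_matrices=False)
    A = Wx @ U       # columns are weight vectors for X
    B = Wy @ Vt.T    # columns are weight vectors for Y
    return s, A, B

# Synthetic data (hypothetical): both datasets share one latent signal z.
rng = np.random.default_rng(0)
z = rng.normal(size=200)
X = np.column_stack([z + 0.1 * rng.normal(size=200), rng.normal(size=200)])
Y = np.column_stack([rng.normal(size=200), -z + 0.1 * rng.normal(size=200)])
s, A, B = linear_cca(X, Y)
print(s)  # canonical correlations, largest first
```

The first column of `A` and of `B` play the role of ai and aj, and projecting the centered data onto them gives the features ui and uj.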
Kernel canonical correlation analysis (kernel CCA)
• Kernel CCA overcomes the linearity limitation by projecting the data into a higher-dimensional feature space.
• While CCA is limited to linear features, kernel CCA can capture nonlinear relationships.
[Figure: expression profiles xexp and sequence data xseq are mapped by Φexp and Φseq to fexp and fseq, and then to uexp and useq]
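A rough sketch of a regularized kernel CCA, to make the contrast with linear CCA concrete. It assumes an RBF kernel and a simple ridge-style regularizer `kappa`; the slides do not specify the kernel, the regularization, or the solver, so all of these (and the names `rbf_gram`, `center_gram`, `kernel_cca`) are illustrative choices rather than the authors' implementation.

```python
import numpy as np

def rbf_gram(X, gamma=1.0):
    """Gram matrix of the RBF kernel exp(-gamma * ||x - z||^2)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def center_gram(K):
    """Center a Gram matrix, i.e. center the data in feature space."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def kernel_cca(Kx, Ky, kappa=0.1):
    """Regularized kernel CCA on two uncentered Gram matrices.

    Returns the canonical correlations rho and dual weights alpha, beta;
    the canonical variates are center_gram(Kx) @ alpha and center_gram(Ky) @ beta.
    """
    n = Kx.shape[0]
    Kx, Ky = center_gram(Kx), center_gram(Ky)
    Rx = Kx + kappa * np.eye(n)
    Ry = Ky + kappa * np.eye(n)
    # M = Rx^{-1} Kx Ky Ry^{-1}; its singular values are the regularized correlations.
    M = np.linalg.solve(Rx, Kx) @ np.linalg.solve(Ry, Ky).T
    U, rho, Vt = np.linalg.svd(M)
    alpha = np.linalg.solve(Rx, U)
    beta = np.linalg.solve(Ry, Vt.T)
    return rho, alpha, beta

# Toy data (hypothetical): y depends on x only through x^2, so the
# linear correlation is essentially zero although the dependence is exact.
x = np.linspace(-2.0, 2.0, 60)
y = x ** 2
Kx, Ky = rbf_gram(x[:, None]), rbf_gram(y[:, None])
rho, alpha, beta = kernel_cca(Kx, Ky)
u = center_gram(Kx) @ alpha[:, 0]
v = center_gram(Ky) @ beta[:, 0]
print(np.corrcoef(x, y)[0, 1], rho[0], np.corrcoef(u, v)[0, 1])
```

Without regularization the leading correlation tends toward 1 on any dataset, so `kappa` has to be tuned in real use, for example by cross-validation.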
Preparation of datasets
• Gene expression datasets
  • Expression profiles of all ORFs (open reading frames) during the yeast cell cycle, measured at 18 time points (Spellman et al.)
• Sequence datasets
  • Upstream sequences of ORFs scanned for the presence of 42 known motifs extracted by Pilpel et al. using the AlignACE program
  • Raw upstream sequences: the ~1 kb region upstream of each gene
Experiments
• Identification of the relationship between gene expression and known motifs, using a set of motifs extracted by AlignACE
  • 42 motifs
• Identification of cell cycle-related motifs from raw upstream sequences
  • A total of 1,024 features (all 4^5 possible 5-mers; window size l = 5)
• Combinatorial effects of regulatory motifs
  • Search for motif pairs that have synergistic or co-regulatory effects in the yeast cell cycle
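The 1,024 features from raw upstream sequences correspond to the 4^5 possible 5-mers over A, C, G, T. A minimal sketch of building such a feature vector with a sliding window is shown below; whether the study used raw counts, presence/absence, or some other encoding is not stated in the slides, so the counting scheme and the names `kmer_index` and `kmer_counts` are assumptions.

```python
from itertools import product

def kmer_index(k=5, alphabet="ACGT"):
    """Map every possible k-mer to a fixed feature position (4^k entries)."""
    return {"".join(p): i for i, p in enumerate(product(alphabet, repeat=k))}

def kmer_counts(seq, k=5, index=None):
    """Count each k-mer in `seq` using a sliding window of width k."""
    if index is None:
        index = kmer_index(k)
    counts = [0] * len(index)
    seq = seq.upper()
    for start in range(len(seq) - k + 1):
        pos = index.get(seq[start:start + k])
        if pos is not None:  # windows containing N or other symbols are skipped
            counts[pos] += 1
    return counts

index = kmer_index(5)
features = kmer_counts("ACGCGTNACGCG", k=5, index=index)
print(len(features), features[index["ACGCG"]])
```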
Known regulatory motifs in yeast
Relationship between gene expression and sequence motifs
List of the top-ranked motifs identified by kernel CCA
Weight distributions for motifs derived from cell cycle-related and non-cell cycle-related datasets
[Figure: panels label the motifs MCB, SWI5, and SFF’]
Correlation between expression profiles and motifs derived by using the raw upstream sequence data
High-scoring motifs in the first and second components using 5-mers from raw upstream sequences
Measurement of the effect of motif pairs
• ECRScore (Expression Coherence coRrelation Score)
• It is calculated from the Pearson correlation coefficients of the expression profiles for all possible pairs of genes whose upstream regions contain both motifs, mi and mj.
• N(mi ∩ mj) is the number of all pairs of genes whose upstream regions contain both motifs.
• Nτ(mi ∩ mj) is the number of those gene pairs whose correlation coefficient is larger than the threshold τ.
• The threshold was chosen based on the fifth percentile of the distribution of correlation coefficients of randomly sampled gene pairs.
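The score formula itself did not survive in this transcript. The sketch below assumes the natural reading of the two definitions above, namely ECRScore = Nτ(mi ∩ mj) / N(mi ∩ mj), the fraction of co-occurring gene pairs whose expression correlation exceeds τ. That ratio, the function names, and the toy profiles are assumptions for illustration; τ is passed in rather than derived here.

```python
from itertools import combinations
from math import sqrt

def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb)

def ecr_score(profiles, genes_with_mi, genes_with_mj, tau):
    """Assumed form: N_tau / N over gene pairs carrying both motifs."""
    shared = sorted(set(genes_with_mi) & set(genes_with_mj))
    pairs = list(combinations(shared, 2))       # N(mi ∩ mj) = len(pairs)
    if not pairs:
        return 0.0
    coherent = sum(1 for g, h in pairs
                   if pearson(profiles[g], profiles[h]) > tau)  # N_tau
    return coherent / len(pairs)

# Toy expression profiles (hypothetical values, 4 time points each).
profiles = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.0],
    "g3": [4.0, 3.0, 2.0, 1.0],
    "g4": [1.0, 3.0, 2.0, 5.0],
}
score = ecr_score(profiles, ["g1", "g2", "g3"], ["g1", "g2", "g3", "g4"], tau=0.5)
print(score)
```

In this toy case three genes carry both motifs, giving three gene pairs, of which only (g1, g2) is positively correlated above τ.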
Heat map of weight values of motif pairs related to cell cycle regulation
Combinatorial effects of regulatory motifs
Conclusion
• We presented a novel kernel-based method that can identify candidate condition-specific regulatory motifs.
• In summary, given expression profiles, our method was able to identify regulatory motifs involved in specific biological processes.
• The method could be applied to elucidating unknown regulatory mechanisms associated with complex gene regulatory processes.
• In future research, we will apply the proposed method to diverse gene expression datasets, especially cancer-related datasets.