Computing Co-Expression Relationships Wen-Dar Lin
Contents • Motivation • Basic Idea • Case Studies • An Example of a Single Experiment • An Example of a Time-Course Experiment • Potential Applications • Availability • Future Work
Motivation • Given a set of differentially displayed genes reported by an array experiment, • we would like to know the relationships among these genes. • These relationships may reveal important modules or motifs relevant to the experiment.
Motivation • Co-expression relationships are among the most biologically meaningful and most easily computed kinds of relationship. • Co-expression relationships form modules that may reveal important biological information. • They can be computed from the large amount of publicly available array data.
Basic Idea • Array data can be retrieved from publicly available data repositories • such as NASCarrays, NCBI GEO, and EMBL-EBI ArrayExpress • The data should be normalized before co-expression relationships are computed • e.g. by the RMA method
Basic Idea • Defining co-expression relationships • A co-expression relationship exists between two genes if the Pearson correlation coefficient between their normalized expression levels is greater than or equal to a chosen threshold. (Figure: scatter plot with axes X and Y)
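The definition above can be sketched in a few lines of Python. This is a minimal illustration, not the tool kit's code: the function names `pearson` and `is_coexpressed`, the default threshold, and the expression values are all assumptions made for the example.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / sqrt(var_x * var_y)

def is_coexpressed(x, y, threshold=0.7):
    """True if the two profiles meet the co-expression threshold."""
    return pearson(x, y) >= threshold

# Hypothetical normalized expression levels of three genes across five arrays.
gene_x = [5.1, 6.0, 7.2, 8.1, 9.0]
gene_y = [4.9, 6.2, 6.9, 8.3, 9.1]   # rises with gene_x
gene_z = [9.0, 8.2, 7.1, 6.0, 5.2]   # falls as gene_x rises

print(is_coexpressed(gene_x, gene_y))  # profiles move together
print(is_coexpressed(gene_x, gene_z))  # negatively correlated pair
```

Note that a strongly negative correlation does not count as co-expression under this definition, because only values at or above the threshold qualify.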
Basic Idea • Properties of the Pearson correlation coefficient • Let Correl(A, B) be the Pearson correlation coefficient between the normalized expression levels of gene A and gene B. • −1 ≤ Correl(A, B) ≤ 1 (Figure: scatter plots illustrating correlation, including negative correlation; from http://www.gseis.ucla.edu/courses/ed230bc1/notes1/var1.html)
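For reference, the standard formula for the coefficient is given below. The symbols are introduced here for illustration: $a_i$ and $b_i$ are the normalized expression levels of gene A and gene B on array $i$ of $n$ arrays, and $\bar{a}$, $\bar{b}$ are their means.

```latex
\mathrm{Correl}(A, B) =
  \frac{\sum_{i=1}^{n} (a_i - \bar{a})(b_i - \bar{b})}
       {\sqrt{\sum_{i=1}^{n} (a_i - \bar{a})^2}\,
        \sqrt{\sum_{i=1}^{n} (b_i - \bar{b})^2}},
\qquad -1 \le \mathrm{Correl}(A, B) \le 1
```

Values near 1 mean the two genes rise and fall together across arrays, values near −1 mean one rises as the other falls, and values near 0 mean there is no linear relationship.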
Basic Idea • Computational assistance • Given a set of genes of interest • Compute co-expression relationships among them • Identify co-expression clusters
Case Studies • We have implemented the aforementioned ideas in a tool kit and applied it to two case studies. • A single experiment • A time-course experiment
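One simple way to identify clusters, once co-expression relationships are known, is to treat genes as nodes and relationships as edges, and then take the connected components of that graph. The sketch below assumes this definition of a cluster, which the slides do not spell out. The function name `find_clusters` and the gene names are hypothetical.

```python
def find_clusters(edges):
    """Return the connected components of an undirected co-expression graph.

    `edges` is an iterable of (gene, gene) pairs. Genes without any
    co-expression relationship never appear and so are left unclustered.
    """
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    seen = set()
    clusters = []
    for start in sorted(neighbors):
        if start in seen:
            continue
        # Depth-first traversal collecting every gene reachable from `start`.
        component = set()
        stack = [start]
        while stack:
            gene = stack.pop()
            if gene in component:
                continue
            component.add(gene)
            stack.extend(neighbors[gene] - component)
        seen |= component
        clusters.append(component)
    return clusters

# Hypothetical relationships that passed the correlation threshold.
edges = [("g1", "g2"), ("g2", "g3"), ("g4", "g5")]
clusters = find_clusters(edges)
print(clusters)
```

Stricter definitions (for example, requiring every pair in a cluster to be co-expressed) would yield smaller, denser clusters. Connected components are simply the easiest starting point.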
A Single Experiment • In this example, an array experiment was performed • 178 differentially displayed genes were identified. • Co-expression relationships were computed from RMA array data of 300 ATH1 slides downloaded from NASCarrays • the sample on each slide was derived, not necessarily exclusively, from roots • Threshold for the Pearson correlation coefficient = 0.7
A Single Experiment (Figure: co-expression graph based on root-array data, showing two larger clusters and one minor subcluster)
A Single Experiment • We may also compute co-expression relationships from array data of all kinds of experiments • Based on RMA array data of 1436 ATH1 slides downloaded from TAIR, co-expression relationships were identified • Threshold for the Pearson correlation coefficient = 0.7
A Single Experiment (Figure: co-expression graph based on all-array data, showing two larger clusters)
A Single Experiment • Is there any difference between the graph based on root-array data and the graph based on all-array data? • This can be checked by marking the clusters of one graph, each with a distinct mark, onto the other graph.
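Marking the clusters of one graph onto the other amounts to labeling each gene in the target graph with the cluster it belongs to in the source graph. The sketch below shows one way to do this. The function name `mark_clusters` and all gene names are hypothetical.

```python
def mark_clusters(target_genes, source_clusters):
    """Label each gene of the target graph with its cluster index in the
    source graph, or None if the source graph does not cluster it."""
    labels = {
        gene: index
        for index, members in enumerate(source_clusters)
        for gene in members
    }
    return {gene: labels.get(gene) for gene in target_genes}

# Hypothetical clusters found in the all-array graph.
all_array_clusters = [{"g1", "g2", "g3"}, {"g4", "g5"}]
# Hypothetical genes that are clustered in the root-array graph.
root_graph_genes = ["g1", "g2", "g4", "g6", "g7"]

marks = mark_clusters(root_graph_genes, all_array_clusters)
unmarked = sorted(gene for gene, mark in marks.items() if mark is None)
print(marks)
print(unmarked)
```

Genes that stay unmarked are clustered in the root-array graph but not in the all-array graph, which makes them candidates for root-specific co-expression.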
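A Single Experiment (Figure: the root-array graph marked with clusters from the all-array graph; two clusters are mapped by the other graph, and one cluster should be root-specific)
A Single Experiment (Figure: cluster sizes; one cluster of size 9 and two clusters of sizes 47 and 14)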
A Single Experiment • Some remarks • The number of differentially displayed genes reported by the experiment is 178 • The number of clustered genes is 47+14+9 = 70 • Reduced by more than 50% • Co-expression relationships are recovered • Each cluster may be a module whose genes usually work together. • Tissue-specific co-expression relationships can be found • by mapping the graph based on all-array data onto the graph based on tissue-related array data.
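The reduction quoted above follows directly from the reported counts:

```latex
47 + 14 + 9 = 70, \qquad
1 - \frac{70}{178} \approx 0.607
```

So roughly 61% of the 178 differentially displayed genes fall outside the three clusters, consistent with "more than 50%".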
A Single Experiment • In addition to clustering genes according to co-expression relationships, we also fished for genes that are potentially co-expressed with them. • These genes may not have been identified as differentially displayed in the experiment.
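Fishing can be sketched as a scan over all genes on the array: any gene whose profile correlates with a bait gene at or above the threshold is reported. The code below is an illustration under assumptions, not the tool kit's implementation. The function names, the `min_hits` parameter, and the profiles are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def fish(baits, profiles, threshold=0.7, min_hits=1):
    """Return {gene: [baits it is co-expressed with]} for every non-bait
    gene that is co-expressed with at least `min_hits` bait genes."""
    fished = {}
    for gene, profile in profiles.items():
        if gene in baits:
            continue
        hits = [b for b in sorted(baits)
                if pearson(profiles[b], profile) >= threshold]
        if len(hits) >= min_hits:
            fished[gene] = hits
    return fished

# Hypothetical normalized expression profiles across five arrays.
profiles = {
    "bait1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneA": [2.0, 4.0, 6.0, 8.0, 10.5],  # follows the bait
    "geneB": [5.0, 1.0, 4.0, 2.0, 3.0],   # unrelated pattern
    "geneC": [5.0, 4.0, 3.0, 2.0, 1.0],   # opposite of the bait
}
fished = fish({"bait1"}, profiles)
print(fished)
```

Raising `min_hits` makes the scan stricter by requiring a fished gene to agree with several baits rather than one.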
A Single Experiment • A GO enrichment analysis was also carried out • using the GOBU software (gobu.iis.sinica.edu.tw) • to give a conceptual view of the clustered genes.
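A common way to score GO term enrichment in a cluster is a one-sided hypergeometric test. The sketch below illustrates that general technique only. It is an assumption that a test of this kind is appropriate here, and it is not a description of what GOBU computes. The function name and the numbers are hypothetical.

```python
from math import comb

def enrichment_p_value(population, annotated, sample, sample_annotated):
    """One-sided hypergeometric p-value.

    Probability of seeing at least `sample_annotated` genes carrying a GO
    term when `sample` genes are drawn without replacement from
    `population` genes, of which `annotated` carry the term.
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(sample_annotated, upper + 1)
    )
    return tail / total

# Hypothetical toy numbers: 10 genes in total, 3 carry the term,
# and a cluster of 4 genes contains 2 of them.
p = enrichment_p_value(population=10, annotated=3, sample=4, sample_annotated=2)
print(p)
```

A small p-value suggests the term is over-represented in the cluster. When many terms are tested, a multiple-testing correction would normally be applied as well.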
A time-course experiment • In this example, a time-course array experiment was performed • Three time points • About 800 genes were differentially displayed at one or more time points. • Based on array data of 300 ATH1 slides extracted from RMA array data of about 2600 ATH1 slides downloaded from NASCarrays • Threshold for the Pearson correlation coefficient = 0.8
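A time-course experiment (Figure: co-expression graph at time point 1, with two clusters of about 100 genes each)
A time-course experiment (Figure: co-expression graph at time point 2, with two clusters of about 100 genes each)
A time-course experiment (Figure: co-expression graph at time point 3, with two clusters of about 100 genes each)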
A time-course experiment • Although this clustering, combined with the time-course expression data, shows some biological meaning, • the number of clustered genes (more than 200) • makes the graph too complex and • is too large to be understood in a short time.
A time-course experiment • Reducing the number of clustered genes may help in • reducing the complexity of the graph and • understanding the revealed co-expression modules • We reduced the graph by removing co-expression relationships that exist generally across the entire plant • based on RMA array data of about 2600 ATH1 slides downloaded from NASCarrays • Threshold for the Pearson correlation coefficient = 0.7
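A time-course experiment • Edges (relationships) to be removed (Figure: scatter plot with axes X and Y; points labeled root-related and others)
A time-course experiment • Edges (relationships) to be retained (Figure: scatter plot with axes X and Y; points labeled root-related and others)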
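In graph terms, this reduction is a set difference on edges: an edge of the root-based graph is removed if the same gene pair is also connected in the whole-plant graph, and retained otherwise. A minimal sketch follows, with hypothetical function and gene names.

```python
def normalize(edges):
    """Store each undirected edge as a sorted pair so (a, b) equals (b, a)."""
    return {tuple(sorted(edge)) for edge in edges}

def split_edges(tissue_edges, whole_plant_edges):
    """Split tissue-graph edges into (retained, removed).

    Retained edges appear only in the tissue graph. Removed edges also
    appear in the whole-plant graph, so they are not tissue-specific.
    """
    tissue = normalize(tissue_edges)
    general = normalize(whole_plant_edges)
    return tissue - general, tissue & general

# Hypothetical co-expression edges from the two data sets.
root_edges = [("g1", "g2"), ("g2", "g3"), ("g4", "g5")]
plant_edges = [("g2", "g1"), ("g6", "g7")]

retained, removed = split_edges(root_edges, plant_edges)
print(sorted(retained))
print(sorted(removed))
```

Genes left without any retained edge drop out of the graph, which is why the cluster sizes shrink after this step.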
A time-course experiment (Figure: reduced co-expression graph at time point 1, with clusters of about 20, 50, and 60 genes)
A time-course experiment (Figure: reduced co-expression graph at time point 2, with clusters of about 20, 50, and 60 genes)
A time-course experiment (Figure: reduced co-expression graph at time point 3, with clusters of about 20, 50, and 60 genes)
A time-course experiment • Some remarks • The number of genes differentially displayed at one or more time points is about 800. • The number of clustered genes is about 60+50+20 = 130 • Reduced by more than 80% • The retained graph contains edges, i.e., gene pairs, that are co-expressed in root but not in the entire plant • The recovered clusters should be root-specific.
Potential Applications • We have created a tool kit that • computes co-expression relationships based on array data • where probe names can be replaced by aliases, e.g. from an orthologous mapping • so it can be used to study a non-model organism using array data of a model organism.
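Replacing probe names by aliases can be sketched as a dictionary lookup applied before correlations are computed. The probe names, ortholog names, and the function `apply_aliases` below are hypothetical. How the tool kit itself handles unmapped probes or several probes sharing one alias is not stated in the slides, so the choices made here are assumptions.

```python
def apply_aliases(profiles, aliases):
    """Re-key expression profiles from probe names to alias names.

    Probes without an alias are dropped (assumption). If several probes
    share one alias, all of their profiles are kept in a list (assumption).
    """
    renamed = {}
    for probe, profile in profiles.items():
        alias = aliases.get(probe)
        if alias is None:
            continue  # no ortholog known for this probe
        renamed.setdefault(alias, []).append(profile)
    return renamed

# Hypothetical model-organism probes and their expression profiles.
profiles = {
    "probe_1_at": [7.1, 7.4, 8.0],
    "probe_2_at": [5.0, 5.2, 4.9],
    "probe_3_at": [6.3, 6.1, 6.6],
}
# Hypothetical ortholog mapping to genes of a non-model organism.
aliases = {"probe_1_at": "orthoA", "probe_3_at": "orthoA"}

renamed = apply_aliases(profiles, aliases)
print(renamed)
```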
Potential Applications • We have created a tool kit that • colors graphs according to • intensity fold-changes, or • clusters in another graph
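Coloring by fold-change can be done by mapping the log2 ratio of two intensities onto a color scale. The sketch below assumes raw (not log-transformed) intensities and a red/white/blue scale. Both are illustrative choices, and the function name and the `max_abs_log2` parameter are hypothetical.

```python
from math import log2

def fold_change_color(treated, control, max_abs_log2=2.0):
    """Map an intensity fold-change to a hex color.

    Red marks up-regulation, blue marks down-regulation, and white marks
    no change. Changes beyond `max_abs_log2` are drawn at full saturation.
    """
    value = log2(treated / control)
    level = max(-1.0, min(1.0, value / max_abs_log2))
    fade = round(255 * (1 - abs(level)))
    if level >= 0:
        return f"#ff{fade:02x}{fade:02x}"
    return f"#{fade:02x}{fade:02x}ff"

# Hypothetical raw intensities (treated, control) for three genes.
print(fold_change_color(8.0, 2.0))  # four-fold up
print(fold_change_color(2.0, 8.0))  # four-fold down
print(fold_change_color(5.0, 5.0))  # unchanged
```

If the intensities are already on a log2 scale, as RMA output is, the ratio would be replaced by a simple difference.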
Potential Applications • We have created a tool kit that • removes/retains co-expression relationships that exist in another graph • finds specific or common co-expression relationships (Figure: two graphs, labeled 200 genes and 120 genes)
Potential Applications • We have created a tool kit that • fishes genes that are potentially co-expressed with assigned bait genes
Future Work • Incorporate a pathway database • such as AraCyc • for finding relationships between co-expression clusters and known pathways • Build a user-friendly interface that would • facilitate use of this tool kit and • help manage output data
Availability • The tool kit is now an open-source project • http://maccu.sourceforge.net • Project name: MACCU • Multi-Array Correlation Computation Utility • A detailed description of each program module is available. • A runnable script with an example is provided.
Special Thanks • I would like to thank • Drs. Chang (Bill), Schmidt & Wu • for raising this idea, • the initial implementation, and • valuable comments.