Computing Co-Expression Relationships

Computing Co-Expression Relationships Wen-Dar Lin

Contents • Motivation • Basic Idea • Case Studies • An Example of a Single Experiment • An Example of a Time-Course Experiment • Potential Applications • Availability • Future Work

Motivation • Given a set of differentially displayed genes reported by an array experiment, • we would like to know the relationships among these genes. • These relationships may reveal important modules or motifs relevant to the experiment.

Motivation • Co-expression relationships are among the most biologically meaningful and most easily computable relationships. • Co-expression relationships form modules that may carry important biological information. • They can be computed from the large amount of publicly available array data.

Basic Idea • Array data can be retrieved from publicly available data repositories • such as NASCarrays, NCBI GEO, and EMBL-EBI ArrayExpress • The data should be normalized before computing the co-expression relationships, • e.g. by the RMA method

Basic Idea • Defining co-expression relationships • We say that a co-expression relationship exists between two genes if the Pearson correlation coefficient between their normalized expression levels is greater than or equal to a certain threshold. • [Figure: plot with axes X and Y]
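The definition above can be sketched in a few lines of Python. This is only an illustration, not the tool kit's actual code: the function name `coexpression_edges`, the gene names, and the toy expression values are all hypothetical, and the expression matrix is assumed to hold one row per gene and one column per slide, already normalized.

```python
import numpy as np

def coexpression_edges(expr, genes, threshold=0.7):
    """Return (gene, gene, r) for every pair with Pearson r >= threshold.

    expr  : 2-D array, one row per gene, one column per slide (assumed layout)
    genes : list of gene names, in the same order as the rows of expr
    """
    corr = np.corrcoef(expr)  # np.corrcoef treats each row as one variable
    edges = []
    for i in range(len(genes)):
        for j in range(i + 1, len(genes)):
            if corr[i, j] >= threshold:
                edges.append((genes[i], genes[j], float(corr[i, j])))
    return edges

# Toy data (hypothetical values): geneB rises with geneA, geneC falls.
expr = np.array([
    [1.0, 2.0, 3.0, 4.0],  # geneA
    [2.1, 3.9, 6.2, 8.0],  # geneB
    [4.0, 3.0, 2.0, 1.0],  # geneC
])
edges = coexpression_edges(expr, ["geneA", "geneB", "geneC"], threshold=0.7)
pairs = [(a, b) for a, b, _ in edges]
```

Only the geneA/geneB pair passes the threshold here; the strongly negative correlations with geneC do not count as co-expression under this definition.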

Basic Idea • Properties of the Pearson correlation coefficient • Let Correl(A, B) be the Pearson correlation coefficient between the normalized expression levels of gene A and gene B. • -1 ≤ Correl(A, B) ≤ 1 • [Figure: illustration of correlation, including negative correlation, from http://www.gseis.ucla.edu/courses/ed230bc1/notes1/var1.html]
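For reference, the standard textbook definition of the Pearson correlation coefficient can be written as follows. The symbols are assumptions introduced here, not taken from the slides: a_i and b_i are the normalized expression levels of gene A and gene B on slide i, n is the number of slides, and a bar denotes the mean over all slides.

```latex
\mathrm{Correl}(A, B) =
  \frac{\sum_{i=1}^{n} (a_i - \bar{a})(b_i - \bar{b})}
       {\sqrt{\sum_{i=1}^{n} (a_i - \bar{a})^2}\,\sqrt{\sum_{i=1}^{n} (b_i - \bar{b})^2}}
```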

Basic Idea • The computational assistance • Given a set of genes of interest • Compute the co-expression relationships among them • Identify co-expression clusters
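The slides do not say how clusters are identified. One simple possibility, assumed here purely for illustration, is to treat genes as nodes and co-expression relationships as edges, and to report each connected component as a cluster. The function name `coexpression_clusters` and the gene names are hypothetical.

```python
from collections import defaultdict

def coexpression_clusters(edges, min_size=2):
    """Group genes into connected components of the co-expression graph.

    edges : iterable of (gene_a, gene_b) pairs
    Assumption: a cluster is a connected component; the original tool kit
    may use a different clustering rule.
    """
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen = set()
    clusters = []
    for start in list(adjacency):
        if start in seen:
            continue
        # Depth-first traversal collects every gene reachable from start.
        stack = [start]
        seen.add(start)
        component = set()
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        if len(component) >= min_size:
            clusters.append(component)
    # Largest clusters first.
    return sorted(clusters, key=len, reverse=True)

example_edges = [("g1", "g2"), ("g2", "g3"), ("g4", "g5")]
clusters = coexpression_clusters(example_edges)
```

With these toy edges the result is one cluster of three genes and one of two.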

Case Studies • We have implemented the aforementioned ideas in a tool kit and applied it to two case studies. • A single experiment • A time-course experiment

A Single Experiment • In this example, an array experiment was performed • 178 differentially displayed genes were identified. • Co-expression relationships were computed from RMA array data of 300 ATH1 slides downloaded from NASCarrays • the sample on each slide was derived, not necessarily exclusively, from roots • Threshold for the Pearson correlation coefficient = 0.7

A Single Experiment • [Figure: co-expression graph showing two larger clusters and one minor subcluster]

A Single Experiment • We may also compute co-expression relationships from array data of all kinds of experiments • Co-expression relationships were identified from RMA array data of 1436 ATH1 slides downloaded from TAIR • Threshold for the Pearson correlation coefficient = 0.7

A Single Experiment • [Figure: co-expression graph showing two larger clusters]

A Single Experiment • Is there any difference between the graph based on root-array data and the graph based on all-array data? • This can be checked by marking the clusters of one graph distinctly on the other graph.

A Single Experiment • [Figure: graph with one cluster that should be root-specific and two clusters mapped by the other graph]

A Single Experiment • [Figure labels: cluster size 9; cluster sizes 47 and 14]

A Single Experiment • Some remarks • The number of differentially displayed genes reported by the experiment is 178 • The number of clustered genes is 47+14+9 = 70 • Reduced by more than 50% • The co-expression relationships are recovered • Each cluster may be a module of genes that usually work together. • Finding tissue-specific co-expression relationships • Can be done by mapping the graph based on all-array data onto the graph based on tissue-related array data.
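The mapping of one graph's clusters onto the other can be sketched as a set-overlap check. This is an assumed reading of the slides rather than the tool kit's method: a cluster of the tissue-related graph that shares no gene with any cluster of the all-array graph is flagged as a candidate tissue-specific cluster. The name `map_clusters` and the gene names are hypothetical.

```python
def map_clusters(target_clusters, reference_clusters):
    """For each target cluster, list the reference clusters it overlaps.

    Returns one list per target cluster holding the indices of the
    reference clusters that share at least one gene with it. An empty
    list means the cluster is not mapped by the reference graph.
    """
    mapping = []
    for cluster in target_clusters:
        hits = [k for k, ref in enumerate(reference_clusters) if cluster & ref]
        mapping.append(hits)
    return mapping

# Toy clusters (hypothetical gene names).
root_clusters = [{"g1", "g2", "g3"}, {"g4", "g5"}, {"g6", "g7"}]
all_array_clusters = [{"g1", "g2"}, {"g4", "g5", "g8"}]

mapping = map_clusters(root_clusters, all_array_clusters)
# Indices of root clusters with no counterpart in the all-array graph.
root_specific = [i for i, hits in enumerate(mapping) if not hits]
```

In this toy case the third root cluster has no counterpart in the all-array graph, so it is the one flagged as a candidate root-specific cluster.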

A Single Experiment • In addition to clustering genes according to co-expression relationships, we also fished for genes that are potentially co-expressed with them. • These genes may not have been identified as differentially displayed in the experiment.
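Fishing can be sketched as follows, under assumptions the slides do not spell out: a candidate gene is fished if its Pearson correlation with at least `min_baits` of the bait genes reaches the threshold. The function name `fish_genes`, the `min_baits` parameter, and the toy data are hypothetical.

```python
import numpy as np

def fish_genes(expr, genes, baits, threshold=0.7, min_baits=1):
    """Return non-bait genes correlated with at least min_baits baits.

    expr  : 2-D array, one row per gene, one column per slide (assumed layout)
    genes : gene names in row order
    baits : names of the bait genes (must appear in genes)
    """
    corr = np.corrcoef(expr)
    index = {gene: i for i, gene in enumerate(genes)}
    bait_rows = [index[b] for b in baits]
    bait_set = set(baits)
    fished = []
    for gene, i in index.items():
        if gene in bait_set:
            continue
        # Count how many baits this candidate is co-expressed with.
        hits = int(np.sum(corr[i, bait_rows] >= threshold))
        if hits >= min_baits:
            fished.append(gene)
    return fished

# Toy data (hypothetical values).
expr = np.array([
    [1.0, 2.0, 3.0, 4.0],  # bait1
    [1.1, 2.0, 3.2, 3.9],  # cand1: follows bait1 closely
    [4.0, 1.0, 3.0, 2.0],  # cand2: weakly negatively correlated with bait1
])
fished = fish_genes(expr, ["bait1", "cand1", "cand2"], baits=["bait1"])
```

Here only cand1 is fished; cand2 has a correlation of -0.4 with the bait.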

A Single Experiment • A GO enrichment analysis was also carried out • using the GOBU software (gobu.iis.sinica.edu.tw) • which should give a conceptual view of clustered genes.

A time-course experiment • In this example, a time-course array experiment was performed • Three time points • About 800 genes were differentially displayed at at least one time point. • Co-expression relationships were computed from array data of 300 ATH1 slides extracted from RMA array data of about 2600 ATH1 slides downloaded from NASCarrays • Threshold for the Pearson correlation coefficient = 0.8

A time-course experiment • [Figure: co-expression graph at time point 1, with two clusters of about 100 genes each]

A time-course experiment • Although this clustering and the time-course expression data show some biological meaning, • the number of clustered genes (more than 200) • makes the graph too complex and • is too large to be interpreted in a short time.

A time-course experiment • Reducing the number of clustered genes may help in • reducing the complexity of the graph and • interpreting the revealed co-expression modules • We reduced the graph by removing co-expression relationships that generally exist in the entire plant • based on RMA array data of about 2600 ATH1 slides downloaded from NASCarrays • Threshold for the Pearson correlation coefficient = 0.7

A time-course experiment • Edges (relationships) to be removed • [Figure: plot with axes X and Y; points labeled root-related and others]

A time-course experiment • Edges (relationships) to be retained • [Figure: plot with axes X and Y; points labeled root-related and others]

A time-course experiment • [Figure: reduced co-expression graph at time point 1, with clusters of about 20, 50, and 60 genes]

A time-course experiment • Some remarks • The number of genes differentially displayed at at least one time point is about 800. • The number of clustered genes is about 60+50+20 = 130 • Reduced by more than 80% • The retained graph contains edges, i.e., gene pairs, that are co-expressed in root but not in the entire plant • The recovered clusters should be root-specific.
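The graph reduction described in this case study amounts to a set difference between two edge lists. The sketch below assumes edges are undirected gene pairs; the function names and gene names are hypothetical and do not come from the tool kit.

```python
def normalize(edges):
    """Represent each undirected edge as a frozenset so (a, b) == (b, a)."""
    return {frozenset(edge) for edge in edges}

def specific_edges(tissue_edges, global_edges):
    """Edges present in the tissue-related graph but not in the global graph."""
    return normalize(tissue_edges) - normalize(global_edges)

def common_edges(tissue_edges, global_edges):
    """Edges present in both graphs."""
    return normalize(tissue_edges) & normalize(global_edges)

# Toy edge lists (hypothetical gene names).
root_edges = [("g1", "g2"), ("g2", "g3"), ("g4", "g5")]
whole_plant_edges = [("g2", "g1"), ("g6", "g7")]

retained = specific_edges(root_edges, whole_plant_edges)
removed = common_edges(root_edges, whole_plant_edges)
```

The g1/g2 relationship also holds across the whole plant, so it is removed; the other two root edges are retained.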

Potential Applications • We have created a tool kit that • computes co-expression relationships based on array data • where probe names can be replaced by aliases derived from, for example, orthologous mapping • and so can be used for studying a non-model organism using array data of a model organism.

Potential Applications • We have created a tool kit that • fills colors into graphs according to • intensity fold-changes, or • clusters in another graph

Potential Applications • We have created a tool kit that • removes/retains co-expression relationships found in another graph • finds specific or common co-expression relationships • [Figure labels: 200 genes; 120 genes]

Potential Applications • We have created a tool kit that • fishes genes that are potentially co-expressed with assigned bait

Future Work • Incorporate a pathway database • such as AraCyc • for finding relationships between co-expression clusters and known pathways • A user-friendly interface which would • facilitate use of this tool kit and • help manage output data

Availability • The tool kit is now an open-source project • http://maccu.sourceforge.net • Project name: MACCU • Multi-Array Correlation Computation Utility • A detailed description of each program module is available. • A running script with an example is provided.

Special Thanks • I would like to thank • Drs. Chang (Bill), Schmidt & Wu • for raising this idea, • the initial implementation, and • valuable comments.

Thank you!
