
Functional annotation and network reconstruction through cross-platform integration of microarray data




  1. Functional annotation and network reconstruction through cross-platform integration of microarray data
     X. J. Zhou et al. 2005

  2. Challenges in microarray data analysis
     • Integration of multiple microarray data sets
        • Different platforms, e.g. cDNA arrays and Affymetrix arrays
        • Different experimental parameters
     • Identification of functionally related genes that do not have similar expression patterns
     • Reconstruction of transcriptional regulatory networks
        • It is difficult to elucidate the cooperativity between TFs, because changes in their expression are often subtle and their activities are often controlled at levels other than expression.

  3. Data pre-processing
     • The 618 expression profiles are classified into 39 data sets. Each data set contains the expression profiles measured under related conditions.
        • 19 cDNA data sets from SMD (Stanford Microarray Database)
        • 4 Affymetrix data sets from GEO (Gene Expression Omnibus)
        • 16 data sets from Rosetta

  4. 19 SMD data sets, corresponding to 19 SMD subcategories
     • Alpha factor release
     • cdc15 block release
     • DTT exposure
     • Elutriation
     • Forkhead regulation
     • Gamma radiation
     • Menadione exposure
     • DNA damage (MMS) response
     • Nitrogen depletion
     • Nutrition limitation
     • Osmotic shock
     • SIR proteins (chromatin silencing)
     • Sorbitol effects
     • H2O2 response
     • Heat shock
     • Heat steady
     • CellCycle Factor
     • YPD stationary phase
     • Zinc homeostasis

  5. 4 GEO data sets
     • Aging
     • Chitin synthesis
     • Fermentation time course
     • Ume6 regulon

  6. 16 Rosetta data sets
     • Cell cycle control
     • Cell wall organization
     • Chromatin assembly
     • Ion homeostasis
     • Nucleotide metabolism
     • Organelle biogenesis
     • Perception of external stimulus
     • Protein biosynthesis
     • Protein degradation
     • Protein metabolism
     • Protein phosphorylation
     • Protein transport
     • Pseudohyphal growth
     • Steroid metabolism
     • Amino acid starvation
     • MAPK pathway
     Classification is based on the Gene Ontology (GO) biological process categories of the deleted genes.

  7. The idea: 2nd-order expression correlation
     • 1st-order expression correlation
        • Correlation of two genes' expression patterns within one data set
        • For each pair of genes, computing this in every data set yields a vector of length n, where n is the number of data sets (the 1st-order expression correlation profile).
     • 2nd-order expression correlation
        • Correlation between the 1st-order expression correlation profiles of two gene pairs
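The two correlation orders can be sketched in a few lines. This is a minimal illustration, assuming each data set is a genes × conditions NumPy array whose rows refer to the same genes in every data set, and assuming Pearson correlation at both orders (the slides do not name the correlation measure). All function names are hypothetical.

```python
import numpy as np


def pearson(x, y):
    """Pearson correlation of two 1-D sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float(xc @ yc / np.sqrt((xc @ xc) * (yc @ yc)))


def first_order_profile(datasets, g1, g2):
    """1st-order profile: correlation of genes g1 and g2 within each data set.

    datasets: list of 2-D arrays (genes x conditions), one per data set,
    with row indices assumed to denote the same gene in every data set.
    Returns a vector of length n = number of data sets.
    """
    return np.array([pearson(d[g1], d[g2]) for d in datasets])


def second_order_correlation(datasets, pair_a, pair_b):
    """2nd-order correlation: correlation of two gene pairs' 1st-order profiles."""
    profile_a = first_order_profile(datasets, *pair_a)
    profile_b = first_order_profile(datasets, *pair_b)
    return pearson(profile_a, profile_b)
```

For example, `second_order_correlation(datasets, (0, 1), (2, 3))` compares how the co-expression of genes 0 and 1 varies across data sets with that of genes 2 and 3, which is the sense in which the four genes in the next slide are related even though their direct expression similarity is low.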

  8. An example
     • The overall expression similarity between the two gene pairs is not significantly high.
     • However, their 1st-order expression correlation profiles are highly correlated; that is, the four genes have high 2nd-order expression correlation.

  9. Clustering functionally related genes
     • Procedure
        • Identification of doublets: a doublet is a pair of genes that is tightly co-expressed in multiple data sets.
        • Clustering of doublets based on their 1st-order expression correlation profiles
     • Results
        • 72 of the 100 tightest clusters are functionally homogeneous.
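The procedure can be sketched as a filter followed by a grouping step. This is only an illustration under stated assumptions: "tightly co-expressed" is modeled as a 1st-order correlation of at least `min_corr` in at least `min_datasets` data sets, and the clustering step is a simple single-linkage grouping (link two doublets when their profiles correlate at `link_corr` or above). The thresholds, function names, and the choice of single linkage are hypothetical stand-ins, not the clustering method used in the paper.

```python
import itertools

import numpy as np


def find_doublets(profiles, min_corr=0.7, min_datasets=3):
    """Keep gene pairs tightly co-expressed in at least `min_datasets` data sets.

    profiles: dict mapping (g1, g2) -> 1st-order profile (length-n sequence).
    Thresholds are illustrative assumptions.
    """
    return [pair for pair, prof in profiles.items()
            if int(np.sum(np.asarray(prof) >= min_corr)) >= min_datasets]


def cluster_doublets(profiles, doublets, link_corr=0.8):
    """Single-linkage grouping of doublets by their 1st-order profiles.

    Two doublets are linked when the correlation of their profiles
    (their 2nd-order correlation) is at least `link_corr`.
    """
    parent = {d: d for d in doublets}  # union-find forest

    def find(d):
        while parent[d] != d:
            parent[d] = parent[parent[d]]  # path halving
            d = parent[d]
        return d

    for a, b in itertools.combinations(doublets, 2):
        r = np.corrcoef(profiles[a], profiles[b])[0, 1]
        if r >= link_corr:
            parent[find(a)] = find(b)

    clusters = {}
    for d in doublets:
        clusters.setdefault(find(d), []).append(d)
    return list(clusters.values())
```

Single linkage is chosen here only because it is short to write; a tighter method (e.g. requiring every member to correlate with every other) would better match the notion of "tight clusters" on the slide.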

  10. Gene function prediction
     • A function is predicted for a doublet only if the doublet lies in a tight cluster that contains at least three doublets and all remaining doublets in that cluster share the same function.
     • 79 functions are assigned to 67 genes of unknown function. Some of these predictions have been verified experimentally.
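The prediction rule above can be expressed directly as code. This sketch assumes each doublet already carries a set of known functions (for instance, derived from its annotated gene), with an empty set meaning the doublet involves an uncharacterized gene; how the paper assigns functions to doublets is not stated on the slide, so `doublet_functions` and the function name are hypothetical.

```python
def predict_functions(cluster, doublet_functions, min_size=3):
    """Apply the prediction rule to one tight cluster of doublets.

    cluster: list of doublets (gene pairs) in one tight cluster.
    doublet_functions: dict doublet -> set of known functions
        (an empty set marks a doublet with no annotation).
    Returns dict doublet -> predicted functions for unannotated doublets.
    """
    predictions = {}
    if len(cluster) < min_size:  # cluster must contain at least three doublets
        return predictions
    for target in cluster:
        if doublet_functions.get(target):
            continue  # already annotated; nothing to predict
        others = [d for d in cluster if d != target]
        labels = [doublet_functions.get(d, set()) for d in others]
        if not all(labels):
            continue  # every remaining doublet must have a known function
        shared = set.intersection(*labels)
        if shared:  # remaining doublets agree on at least one function
            predictions[target] = shared
    return predictions
```

Because the rule demands agreement among all remaining doublets rather than a majority, it trades coverage for precision, which fits the modest count of 79 predictions on 67 genes.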

  11. Reconstruction of regulatory networks
     • A transcription module (TM) is defined as a set of genes that are regulated by the same transcription factor(s), according to genome-wide location data, and that are co-expressed in multiple data sets.
     • 60 TMs are identified.
     • For each TM, a 1st-order average expression correlation profile (a vector whose length equals the number of data sets) is calculated. This profile can be interpreted as the activity profile of the transcription factor(s) regulating the module.
     • The 2nd-order expression correlation between the activity profiles of two transcription factors is calculated to measure the cooperativity between them.
     • 34 pairs show high 2nd-order correlation.
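A possible reading of the module activity profile and the cooperativity score is sketched below. It assumes the "1st-order average expression correlation" of a module is the mean Pearson correlation over all gene pairs in the module, computed separately in each data set, and that cooperativity is the Pearson correlation of two such profiles. Data layout (genes × conditions arrays with shared row indices) and function names are assumptions for illustration.

```python
import numpy as np


def module_activity_profile(datasets, module_genes):
    """Average 1st-order correlation profile of a transcription module.

    datasets: list of 2-D arrays (genes x conditions) sharing row indices.
    module_genes: list of at least two row indices forming the module.
    Returns one value per data set: the mean correlation over all gene pairs.
    """
    genes = list(module_genes)
    upper = np.triu_indices(len(genes), k=1)  # each gene pair counted once
    profile = []
    for d in datasets:
        corr = np.corrcoef(d[genes])  # gene-by-gene correlation matrix
        profile.append(corr[upper].mean())
    return np.array(profile)


def tf_cooperativity(datasets, module_a, module_b):
    """2nd-order correlation between the activity profiles of two modules."""
    profile_a = module_activity_profile(datasets, module_a)
    profile_b = module_activity_profile(datasets, module_b)
    return float(np.corrcoef(profile_a, profile_b)[0, 1])
```

Averaging over all pairs makes the profile robust to a single noisy target gene, which matters because, as noted earlier, TF activity often cannot be read from the TF's own expression.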

  12. Clustering of modules

  13. Annotation of TFs
     • The function of a TF is predicted from two lines of evidence:
        • the functions of known genes in its target module
        • the functions of known genes in other modules of the same module cluster
     • For example, the TF GAT3 is predicted to play a role in the mitotic and meiotic cell cycles.
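One way to combine the two lines of evidence is to require that a function be over-represented both in the TF's target module and in the other modules of its cluster. The sketch below uses a one-sided hypergeometric test for each source; the test, the significance cutoff `alpha`, the background `genome_size`, and all names are assumptions swapped in for illustration, since the slide does not state how the evidence is scored.

```python
from math import comb


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    total = comb(population, draws)
    upper = min(successes, draws)
    return sum(comb(successes, i) * comb(population - successes, draws - i)
               for i in range(k, upper + 1)) / total


def annotate_tf(target_genes, other_cluster_genes, annotations,
                genome_size, alpha=0.01):
    """Candidate functions for a TF, supported by both lines of evidence.

    target_genes: genes in the TF's target module.
    other_cluster_genes: genes in the other modules of the same module cluster.
    annotations: dict function -> set of known genes with that function.
    genome_size: size of the annotated background (an assumption).
    Returns (function, p_target, p_others), weakest p-value first sorted.
    """
    target = set(target_genes)
    others = set(other_cluster_genes) - target
    results = []
    for func, genes in annotations.items():
        p_target = hypergeom_sf(len(target & genes), genome_size,
                                len(genes), len(target))
        p_others = hypergeom_sf(len(others & genes), genome_size,
                                len(genes), len(others))
        if p_target < alpha and p_others < alpha:  # both sources must agree
            results.append((func, p_target, p_others))
    return sorted(results, key=lambda r: max(r[1], r[2]))
```

Requiring significance in both sources is a conservative choice; pooling the two gene sets into a single test would give more predictions at the cost of letting one strong module dominate.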
