
MDMS-A Web Tool to Manage & Analyze Gene Expression Microarray Data


Presentation Transcript


  1. MDMS-A Web Tool to Manage & Analyze Gene Expression Microarray Data Sachin Mathur

  2. Overview • Steps in analysis of Gene Expression Microarray Data • Preprocessing • Filtering • Statistical Analysis • Machine Learning & Data Mining (Clustering) • Functional Analysis • Data Analysis features in MDMS • Workflow in MDMS • Analysis of Early Lung Development dataset using MDMS • MDMS Demo

  3. Steps in Microarray Data Analysis • Pipeline: Image Quantification & Quality Control → Preprocessing → Filtering → Statistical Analysis → Machine Learning → Functional Analysis • Analysis of data ~ deriving a knowledge base from the data and mining information from that knowledge base

  4. Steps in Microarray Data Analysis • Image Quantification • Check for artifacts, segmentation • Extraction of expression values of genes • Preprocessing • Background Correction • Normalization • Summarization • Methods: MAS5, RMA, GC-RMA, dChip • (www.swegene.org/SWEGENE_microarray_eng.php?Id=18)
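
To make the normalization step concrete, here is a minimal sketch of quantile normalization, the normalization used inside RMA. It is an illustration only, not the MDMS or Bioconductor implementation: the function name and the toy values are hypothetical, the input is assumed to be already background-corrected, and ties are not treated specially.

```python
def quantile_normalize(arrays):
    """Force every array (chip) to share the same distribution of values.

    arrays: list of equal-length lists, one list of probe values per array.
    Returns new lists; the input is left untouched.
    """
    n_probes = len(arrays[0])
    # Sort each array, then average across arrays at each rank.
    sorted_arrays = [sorted(a) for a in arrays]
    rank_means = [
        sum(s[rank] for s in sorted_arrays) / len(arrays)
        for rank in range(n_probes)
    ]
    normalized = []
    for a in arrays:
        # order[rank] is the index of the probe holding that rank in this array.
        order = sorted(range(n_probes), key=lambda i: a[i])
        out = [0.0] * n_probes
        for rank, probe_index in enumerate(order):
            out[probe_index] = rank_means[rank]
        normalized.append(out)
    return normalized


# Hypothetical toy data: three arrays, four probes each.
chips = [
    [5.0, 2.0, 3.0, 4.0],
    [4.0, 1.0, 4.5, 2.0],
    [3.0, 4.0, 6.0, 8.0],
]
normalized = quantile_normalize(chips)
```

After normalization every array contains the same set of values; only the assignment of values to probes differs, which preserves each probe's rank within its array.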

  5. Steps in Microarray Data Analysis • Filtering • About 10%-50% of the genome is not expressed in a given tissue • Aim is to isolate the genes that are expressed • Also improves the accuracy of statistical significance tests • Specific & non-specific filtering • Filter on Presence/Absence calls • Filter on expression signal and on variability in gene expression

  6. Steps in Microarray Data Analysis • Statistical Analysis • Many genes are expressed to perform routine tasks in the cell • Aim is to isolate the genes responsible for phenotypic variation • Interesting vs. random • Various significance tests ~ t-test, ANOVA • Multiple testing correction
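
As a rough illustration of the "interesting vs. random" idea, the sketch below computes a per-gene Welch t-statistic between control and treated replicates. Welch's unequal-variance form is an assumption made here for simplicity; the slides only name the t-test and ANOVA. The gene values are hypothetical log2 expression values, and turning the statistic into a p-value (which needs the t distribution) is left out.

```python
import math
import statistics


def welch_t(control, treated):
    """Welch t-statistic for one gene (treated minus control)."""
    mean_c, mean_t = statistics.mean(control), statistics.mean(treated)
    # Sample variances (n - 1 in the denominator).
    var_c, var_t = statistics.variance(control), statistics.variance(treated)
    standard_error = math.sqrt(var_c / len(control) + var_t / len(treated))
    return (mean_t - mean_c) / standard_error


# Hypothetical log2 expression values, three replicates per condition.
t_responsive = welch_t([7.1, 7.3, 7.2], [9.0, 9.4, 9.2])
t_housekeeping = welch_t([10.0, 10.2, 10.1], [10.1, 10.0, 10.2])
```

A routinely expressed gene whose level does not change between conditions gets a statistic near zero, while a gene that responds to the treatment gets a large one.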

  7. Steps in Microarray Data Analysis • Machine Learning Approaches ~ Data Mining • Small changes in gene expression, which by themselves may not be statistically significant, can collectively regulate an important pathway • Statistical analysis is limited when there are few replicates and when approximate models are fitted to the data • Aim is to find significant patterns in the data set • Periodic, time-lagged, cyclic • Machine learning approaches mine data for information ~ data mining using computational and statistical techniques (e.g., clustering)
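
Clustering, the example named above, can be sketched with a small K-Means routine that groups genes by the shape of their expression profiles. This is a bare-bones illustration with hypothetical profiles and a fixed random seed; it is not the clustering code used by MDMS.

```python
import random


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(profiles, k, n_iter=50, seed=0):
    """Return one cluster label per expression profile."""
    rng = random.Random(seed)
    # Initialise centroids from k randomly chosen profiles.
    centroids = [list(p) for p in rng.sample(profiles, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each profile goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in profiles
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical profiles over four time points: three rising, three falling.
profiles = [
    [1.0, 2.0, 3.0, 4.0], [1.1, 2.1, 2.9, 4.2], [0.9, 1.8, 3.1, 3.9],
    [4.0, 3.0, 2.0, 1.0], [4.2, 3.1, 1.9, 1.0], [3.8, 2.9, 2.1, 1.1],
]
labels = kmeans(profiles, k=2)
```

On this toy input the rising and falling profiles end up in separate clusters, even though no single gene was tested for significance.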

  8. Functional Analysis • Given a statistically significant pattern or list of significant genes, how significant is it biologically? • Aim is to find genes that are responsible for the phenotypic condition • Extracting annotations and finding functionally similar genes • Gene Ontology • Gene set enrichment, relating genes to known pathways http://cardioserve.nantes.inserm.fr/ptf-puce/images/camembert_go.gif
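
One common way to ask whether a gene list is enriched for an annotation is a one-sided hypergeometric test. The sketch below is a generic illustration of that test; the slides do not say which enrichment statistic is used, and all counts in the example are hypothetical.

```python
from math import comb


def enrichment_pvalue(genome_size, annotated_in_genome, list_size, annotated_in_list):
    """P(X >= annotated_in_list) when list_size genes are drawn at random.

    X follows a hypergeometric distribution: the number of annotated genes
    in a random gene list of the same size.
    """
    total = comb(genome_size, list_size)
    upper = min(annotated_in_genome, list_size)
    tail = sum(
        comb(annotated_in_genome, k)
        * comb(genome_size - annotated_in_genome, list_size - k)
        for k in range(annotated_in_list, upper + 1)
    )
    return tail / total


# Hypothetical counts: 10,000 genes on the array, 100 carry the annotation,
# and 4 of the 30 genes in the list of interest carry it.
p_value = enrichment_pvalue(10_000, 100, 30, 4)
```

With these counts only about 0.3 annotated genes would be expected by chance, so observing 4 gives a small p-value. In practice the test is repeated for many annotation terms, so the resulting p-values need multiple testing correction.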

  9. Data Analysis Features in MDMS • All data analysis features in MDMS are implemented through Bioconductor packages (http://www.bioconductor.org) • Bioconductor covers many aspects of data analysis for gene expression, SNP and custom-made arrays • Many different tests for quality control, preprocessing, filtering, statistical analysis, machine learning and functional analysis • Large user community, helpful mailing lists, used by many labs in many countries • Tutorials are available on the website, and hands-on training is also available • Covers more aspects of data analysis than other available packages • Open source

  10. Data Analysis Features in MDMS • MDMS supports Affymetrix Gene Expression arrays • No Image Quantification (usually done at microarray facility) • Quality Control • 3’/5’ bias • % Detection calls • Background signals • Correlation coefficients between arrays
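
The last quality-control item, correlation between arrays, can be sketched as follows: compute pairwise Pearson correlations and flag any array whose typical correlation with the others is low. The array names, the values, the use of the median, and the 0.8 threshold are all assumptions chosen for illustration, not MDMS settings.

```python
import math
import statistics


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def flag_outlier_arrays(arrays, min_median_r=0.8):
    """Return names of arrays whose median correlation with the rest is low."""
    flagged = []
    for name, values in arrays.items():
        others = [pearson(values, v) for n, v in arrays.items() if n != name]
        # The median keeps one bad array from dragging down the good ones.
        if statistics.median(others) < min_median_r:
            flagged.append(name)
    return flagged


# Hypothetical log2 signals for five probe sets on four arrays.
arrays = {
    "ctrl_1": [7.0, 8.5, 10.2, 6.1, 9.3],
    "ctrl_2": [7.1, 8.4, 10.0, 6.3, 9.5],
    "ctrl_3": [7.2, 8.6, 10.1, 6.0, 9.2],
    "suspect": [9.8, 6.2, 7.0, 10.1, 6.5],
}
flagged = flag_outlier_arrays(arrays)
```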

  11. MDMS - Preprocessing • Preprocessing • MAS5 – Default Affymetrix normalization • RMA – Robust Multi-array Average • GC-RMA, dChip (Li-Wong) • MAS5 and RMA are highly recommended • Available literature shows significant advantages of RMA over MAS5

  12. MDMS - Filtering • Filtering • Expression value cut-off • e.g., all genes with expression > 200 • Detection calls • e.g., all genes that are detected as Present • Fold change • e.g., all genes with a fold change greater than 2 or less than -2 • Inter-Quartile Range (between the 1st & 3rd quartiles) • For genes that show higher variability • All analysis is done on a log2 scale
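
The three numeric filters listed above can be combined in a few lines. The sketch below is an assumption-laden illustration: it takes raw signal values for one gene, requires at least one sample above the expression cut-off, measures fold change as the difference of mean log2 values, and uses a hypothetical IQR threshold of 0.5 log2 units. Detection calls are not modelled, and none of the thresholds are MDMS defaults except the 200 and 2-fold values quoted on the slide.

```python
import math
import statistics


def passes_filters(control, treated, min_expr=200.0, min_fold=2.0, min_iqr=0.5):
    """Decide whether one gene survives filtering.

    control, treated: raw (not yet logged) signal values for the gene.
    """
    values = list(control) + list(treated)
    log_values = [math.log2(v) for v in values]
    # Expression cut-off: at least one sample above the threshold (assumption).
    expressed = max(values) > min_expr
    # Fold change on the log2 scale: a 2-fold change is 1 log2 unit.
    log_fc = (statistics.mean([math.log2(v) for v in treated])
              - statistics.mean([math.log2(v) for v in control]))
    changed = abs(log_fc) > math.log2(min_fold)
    # Inter-quartile range of the log2 values as a variability measure.
    q1, _, q3 = statistics.quantiles(log_values, n=4)
    variable = (q3 - q1) > min_iqr
    return expressed and changed and variable


# Hypothetical genes.
responsive = passes_filters([100, 110, 90], [400, 420, 380])
not_expressed = passes_filters([50, 60, 55], [52, 58, 61])
unchanged = passes_filters([500, 520, 480], [510, 490, 505])
```

Only the first gene passes: the second never exceeds the expression cut-off, and the third is expressed but does not change between conditions.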

  13. MDMS – Statistical Analysis • Significance Tests • LIMMA (Linear Models for Microarray Data) • SAM (Significance Analysis of Microarrays) • EBAM (Empirical Bayes Analysis of Microarrays) • Correction for Multiple Testing • FDR, Bonferroni, Holm’s correction • Machine Learning • Clustering • Hierarchical Clustering, K-Means, Self-Organizing Maps
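
The three multiple testing corrections named above can be written compactly as adjusted p-values. The sketch below uses the Benjamini-Hochberg procedure for FDR, which is an assumption, since the slide does not say which FDR method is meant. The input p-values are hypothetical.

```python
def bonferroni(pvals):
    """Multiply every p-value by the number of tests, capped at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def holm(pvals):
    """Holm step-down adjustment (controls the family-wise error rate)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        running_max = max(running_max, min(1.0, (m - rank) * pvals[i]))
        adjusted[i] = running_max
    return adjusted


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg step-up adjustment (controls the FDR)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m - 1, -1, -1):
        i = order[rank]
        running_min = min(running_min, pvals[i] * m / (rank + 1))
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four genes.
raw = [0.01, 0.04, 0.03, 0.005]
adj_bonferroni = bonferroni(raw)
adj_holm = holm(raw)
adj_bh = benjamini_hochberg(raw)
```

For any input, the Benjamini-Hochberg values are no larger than the Holm values, which in turn are no larger than the Bonferroni values, reflecting how conservative each correction is.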

  14. MDMS-Functional Analysis • Functional Analysis through GOAPhAR • Gene Annotation • Protein Annotation • Biological Pathways • Gene Ontology Annotation • Protein Interaction Evidence • All gene lists generated using the data analysis options can be saved in the database for future use. They can also be downloaded as text files.

  15. MDMS Workflow • [Workflow diagram; labels: Microarray Core, User, Data Repository, Software, Rat2302, Hg133U, MDMS Database, Preprocessing, Filtering, Statistical Analysis, Machine Learning, GOAPhAR Annotation]

  16. Data Analysis Example • Data set specifications (GSE3541) • The aim of the study is to find genes involved in early lung development • Mechanical stress was applied to fetal type II epithelial cells taken from 19-day-old rat embryos • Data set processing • Data were preprocessed with MAS5 • Expression > 200, invariant change between pairs of control & experiment samples > 50 (75% filtered) • The SAM statistical method was used to find significant genes (92 genes: 63 up-regulated and 29 down-regulated) • 34 up-regulated genes were selected for further analysis

  17. Biological Significance of Clusterings • K-Means was applied to the 34 genes, with K = 2, 3, 4, …, 29 • Random clusterings were generated for K = 2, 3, 4, …, 29 so that the K-Means clusterings could be compared against random ones • Biological significance scores were calculated for all clusterings • A z-score and P-value were calculated for each value of K
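
The comparison against random clusterings can be sketched as follows. The slides do not define the biological significance score, so the score used here (the fraction of same-cluster gene pairs that share an annotation term) is a hypothetical stand-in. Generating random clusterings by shuffling the labels, which keeps cluster sizes fixed, is likewise an assumption. The annotations are toy data.

```python
import random
import statistics
from itertools import combinations


def shared_annotation_score(labels, annotations):
    """Fraction of same-cluster gene pairs sharing at least one annotation term."""
    pairs = [
        (i, j) for i, j in combinations(range(len(labels)), 2)
        if labels[i] == labels[j]
    ]
    if not pairs:
        return 0.0
    hits = sum(1 for i, j in pairs if annotations[i] & annotations[j])
    return hits / len(pairs)


def z_score_vs_random(labels, annotations, n_random=500, seed=0):
    """Compare a clustering's score with the scores of shuffled clusterings."""
    rng = random.Random(seed)
    observed = shared_annotation_score(labels, annotations)
    shuffled = list(labels)
    null_scores = []
    for _ in range(n_random):
        rng.shuffle(shuffled)  # same cluster sizes, random membership
        null_scores.append(shared_annotation_score(shuffled, annotations))
    mu = statistics.mean(null_scores)
    sigma = statistics.stdev(null_scores)
    return (observed - mu) / sigma if sigma > 0 else 0.0


# Hypothetical example: eight genes, two annotation terms, K = 2.
annotations = [{"transport"}] * 4 + [{"synthesis"}] * 4
cluster_labels = [0, 0, 0, 0, 1, 1, 1, 1]
z = z_score_vs_random(cluster_labels, annotations)
```

A large positive z-score means the clustering groups functionally related genes together more often than random assignments of the same sizes do. Repeating this for each K gives one z-score per K, as on the slide.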

  18. Biological Significance of Clusterings • The study found that genes related to amino acid synthesis, amino acid transport and sodium ion transport contributed to lung development. • 1 gene for sodium ion transport • 4 genes for amino acid transport were found in 2 clusters • 4 genes for amino acid synthesis were found in 2 clusters

  19. MDMS • Demonstration - Using MDMS to analyze data

  20. MDMS • Questions, comments, suggestions
