Microarray Data Analysis Differential Gene Expression

Microarray Data AnalysisDifferential Gene Expression David Montaner dmontaner@cipf.es http://bioinfo.cipf.es Bioinformatics Department Centro de Investigacion Principe Felipe (Valencia) March 2008

Analysis of Differential Gene Expression

Data • Gene normalized intensities • Gene names or gene identifiers • Microarray information • In an independent text file • Using special lines and reserved words within the text file containing gene intensity measurements

Data entry File with ID and intensity measurements Class file Intensity file with special line for microarray class Text files tab delimited

Data entry • Reserved words #NAMES #CLASS #INDEPENDENT_VARIABLE #TIME_VARIABLE #CENSORING_VARIABLE #CONTIN #SERIES • Comment lines # • Better do not use empty lines…

Results Parameter estimates, p-values, adjusted p-values, posterior probabilities… Raw data Output file Ordered genes Graphic display of results Ordered genes Redirect output to other GEPAS’ tools

Results Ordered genes and ordered arrays • We perform one hypothesis test for each gene • There is an increased chance of finding false positives • We need to adjust p-values to control • FWER (family-wise error rate) • FDR (false discovery rate)

Two class comparison We can rank the genes according to a straightforward biological meaning

Two class comparison • t-test • data-adaptive statistic • Empirical Bayes (hierarchical mixture model) • CLEAR test

t – test for a gene expression For each gene, we check if its mean expression is equal or different across the two classes Null hypothesis: the mean expression is equal n both groups. Alternative hypothesis: the mean expression is different between the groups. Mean in group 1 Mean in group 2 Test Statistic: Estimation of the variability of the differences

p - value Under the null hypothesis Frequency histogram

p - value Under the null hypothesis Frequency histogram Distribution function

p - value Under the null hypothesis Frequency histogram p-value (area) Distribution function Rejection region Confidence region

p - value • If we reject when p-value < 0.05 there is a 5% chance of getting a false positive • On average • If you test 100 hypotheses 5 will be false positives (appear significantly wrong) • If you test 10000 hypotheses 500 will appear as false positives • Multiple testing correction is needed

Multi-class comparison It is not clear how to arrange genes by their pattern across classes

Multi-class comparison • ANOVA • CLEAR

Gene expression related to a continuous variable Expression data Continuous Variable #INDEPENDENT_VARIABLE Assessing linear relationships

Regression

Regression gene1 slope

Regression gene2 slope gene3 slope … gene1 slope

Correlation • Pearson correlation coefficient • Spearman correlation coefficient • Linear regression Arrays ranked according to the independent variable Genes ranked by correlation to the continuous variable

Survival data Expression Survival times & Censoring indicator #TIME_VARIABLE #CENSORING_VARIABLE Cox proportional hazards regression model

Survival data • Cox model coefficients • Estimate for the statistics • p-values Arrays ranked according to the survival time Genes ranked by their relationship with survival times

Time course analysis / Dose analysis Expression data • Time variable and series classification • #CONTIN • #SERIES

Redirecting to Babelomics

Microarray Data Analysis Differential Gene Expression

Microarray Data Analysis Differential Gene Expression

Presentation Transcript

Clustering analysis of microarray gene expression data

Differential Gene Expression

Microarray technology and analysis of gene expression data

Microarray Gene Expression Data Analysis

Analysis of Gene Expression Data

Introduction to Microarray Gene Expression

Gene expression: Microarray data analysis

Classification of Microarray Gene Expression Data

Microarray and Gene Expression Profiling

Differential gene expression

A Gene Expression Barcode for Microarray Data

Analysis of honey bee microarray gene expression data

4. Gene Expression Data Analysis

BIOL6900 Chapter 9 Gene expression: Microarray data analysis

Classification of Microarray Gene Expression Data

Differential Gene Expression

Differential Gene Expression Analysis of RNA Seq and Microarray Data highlights role of Hfq gene in expression regulatio

Lecture 4 MicroArray Gene expression

Clustering analysis of microarray gene expression data

Bioinformatics : Gene Expression Data Analysis

Eigensolvers for analysis of microarray gene expression data