1 / 26

Microarray Data Analysis Differential Gene Expression

Microarray Data Analysis Differential Gene Expression. David Montaner dmontaner@cipf.es http://bioinfo.cipf.es Bioinformatics Department Centro de Investigacion Principe Felipe (Valencia) March 2008. Analysis of Differential Gene Expression. Data. Gene normalized intensities

harris
Download Presentation

Microarray Data Analysis Differential Gene Expression

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Microarray Data AnalysisDifferential Gene Expression David Montaner dmontaner@cipf.es http://bioinfo.cipf.es Bioinformatics Department Centro de Investigacion Principe Felipe (Valencia) March 2008

  2. Analysis of Differential Gene Expression

  3. Data • Gene normalized intensities • Gene names or gene identifiers • Microarray information • In an independent text file • Using special lines and reserved words within the text file containing gene intensity measurements

  4. Data entry File with ID and intensity measurements Class file Intensity file with special line for microarray class Text files tab delimited

  5. Data entry • Reserved words #NAMES #CLASS #INDEPENDENT_VARIABLE #TIME_VARIABLE #CENSORING_VARIABLE #CONTIN #SERIES • Comment lines # • Better do not use empty lines…

  6. Results Parameter estimates, p-values, adjusted p-values, posterior probabilities… Raw data Output file Ordered genes Graphic display of results Ordered genes Redirect output to other GEPAS’ tools

  7. Results Ordered genes and ordered arrays • We perform one hypothesis test for each gene • There is an increased chance of finding false positives • We need to adjust p-values to control • FWER (family-wise error rate) • FDR (false discovery rate)

  8. Two class comparison We can rank the genes according to a straightforward biological meaning

  9. Two class comparison • t-test • data-adaptive statistic • Empirical Bayes (hierarchical mixture model) • CLEAR test

  10. t – test for a gene expression For each gene, we check if its mean expression is equal or different across the two classes Null hypothesis: the mean expression is equal n both groups. Alternative hypothesis: the mean expression is different between the groups. Mean in group 1 Mean in group 2 Test Statistic: Estimation of the variability of the differences

  11. p - value Under the null hypothesis Frequency histogram

  12. p - value Under the null hypothesis Frequency histogram Distribution function

  13. p - value Under the null hypothesis Frequency histogram p-value (area) Distribution function Rejection region Confidence region

  14. p - value • If we reject when p-value < 0.05 there is a 5% chance of getting a false positive • On average • If you test 100 hypotheses 5 will be false positives (appear significantly wrong) • If you test 10000 hypotheses 500 will appear as false positives • Multiple testing correction is needed

  15. Multi-class comparison It is not clear how to arrange genes by their pattern across classes

  16. Multi-class comparison • ANOVA • CLEAR

  17. Gene expression related to a continuous variable Expression data Continuous Variable #INDEPENDENT_VARIABLE Assessing linear relationships

  18. Regression

  19. Regression gene1 slope

  20. Regression gene1 slope

  21. Regression gene2 slope gene3 slope … gene1 slope

  22. Correlation • Pearson correlation coefficient • Spearman correlation coefficient • Linear regression Arrays ranked according to the independent variable Genes ranked by correlation to the continuous variable

  23. Survival data Expression Survival times & Censoring indicator #TIME_VARIABLE #CENSORING_VARIABLE Cox proportional hazards regression model

  24. Survival data • Cox model coefficients • Estimate for the statistics • p-values Arrays ranked according to the survival time Genes ranked by their relationship with survival times

  25. Time course analysis / Dose analysis Expression data • Time variable and series classification • #CONTIN • #SERIES

  26. Redirecting to Babelomics

More Related