Analysis of Differential Expression

Analysis of Differential Expression. Topics: t-test, ANOVA, non-parametric methods, correlation, regression. Research question: Do nicotine-exposed rats have different X gene expression than control rats in the ventral tegmental area?

emccombs
Presentation Transcript


  1. Analysis of Differential Expression • T-test • ANOVA • Non-parametric methods • Correlation • Regression

  2. Research Question • Do nicotine-exposed rats have different X gene expression than control rats in ventral tegmental area? • Design an experiment in which treatment rats (N>2) are exposed to nicotine and control rats (N>2) are exposed to saline. • Collect RNA from VTA, convert to cDNA • Determine the amount of X transcript in each individual. • Perform a test of means considering the variability within each group.

  3. Observed difference between groups • May be due to • Treatment • Chance

  4. Hypothesis Testing • Null hypothesis: There is no difference between the means of the groups. • Alternative hypothesis: Means of the groups are different.

  5. Hypothesis testing • You cannot accept (prove) the null hypothesis • You can reject it • You can fail to reject it, in which case the data are consistent with it

  6. P-value • The ‘P’ stands for probability. The P value is the probability of observing a difference between groups at least as large as the one seen if chance alone were at work (that is, if the null hypothesis were true).

  7. P-value • A difference between groups is declared significant if the P value is below a preset threshold (e.g., <0.05). • The threshold (significance level), not the P value itself, is the probability of a type I error when the null hypothesis is true. • Type I error: wrongly concluding that there is a difference between groups (false positive). • Type II error: wrongly concluding that there is no difference between groups (false negative).

  8. Multiple tests on the same data • Expression data on multiple genes from the same individuals • Subsets of genes are coregulated thus they are not independent. • Such data requires multiple tests.

  9. Why not do multiple t-tests? Or if you do, adjust the p-values • Because multiple testing increases type I error: • In a study involving four treatments, there are six possible pairwise comparisons. • If the chance of a type I error in one such comparison is 0.05, then the chance of not committing a type I error is 1 – 0.05 = 0.95. • If the six comparisons are independent, the chance of not committing a type I error in any of them is 0.95^6 = 0.74. • Cumulative type I error = 1 – 0.74 = 0.26
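The arithmetic on this slide can be checked with a short sketch. It assumes the comparisons are independent, as the slide's calculation does; the function name `familywise_error` is illustrative and not from the presentation.

```python
from math import comb

def familywise_error(alpha: float, n_tests: int) -> float:
    """Chance of at least one type I error across n independent tests."""
    return 1 - (1 - alpha) ** n_tests

n_treatments = 4
n_pairs = comb(n_treatments, 2)          # number of pairwise comparisons
fwer = familywise_error(0.05, n_pairs)   # cumulative type I error

print(n_pairs, round(fwer, 2))
```

With four treatments this gives six comparisons and a cumulative error of about 0.26, matching the slide.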

  10. Normal Distribution • It is entirely defined by two quantities: its mean and its standard deviation (SD). • The mean determines where the peak occurs and • the SD determines the spread (width) of the curve.

  11. Curves: same mean, different stds

  12. Rules of normal distribution • 68.3% of the distribution falls within 1 SD of the mean (i.e. between mean – SD and mean + SD); • 95.4% of the distribution falls between mean – 2 SD and mean + 2 SD; • 99.7% of the distribution falls between mean – 3 SD and mean + 3 SD.

  13. Most commonly used rule • 95% of the distribution falls between mean – 1.96 SD and mean + 1.96 SD • If the data are normally distributed, this gives a range within which 95% of the data fall; the same 1.96 multiplier is used to build 95% confidence intervals.
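The coverage figures quoted for the Normal distribution can be reproduced with Python's standard library. This is a minimal sketch; the helper name `coverage` is an assumption introduced for illustration.

```python
from statistics import NormalDist

z = NormalDist()  # standard Normal: mean 0, SD 1

def coverage(k: float) -> float:
    """Fraction of a Normal distribution within k SDs of the mean."""
    return z.cdf(k) - z.cdf(-k)

for k in (1, 2, 3, 1.96):
    print(f"within {k} SD: {100 * coverage(k):.1f}%")

# Multiplier that leaves 2.5% in each tail (the '1.96' in the rule above)
z_95 = z.inv_cdf(0.975)
print(round(z_95, 2))
```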

  14. A sample • Samples vary • Samples are collected in limited numbers • They are representatives of a population. • A sample: • E.g., nicotine treated rat RNA

  15. Sample means • Consider all possible samples of fixed size (n) drawn from a population. • Each of these samples has its own mean, and these means will vary between samples. • Each sample has its own distribution, and thus its own SD.

  16. Population mean • The mean of all the sample means is equal to the population mean (μ). • The SD of the sample means measures the deviation of individual sample means from the population mean (μ).

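  17. Standard error • The standard error (SE) is the SD of the sample means, estimated from one sample as SD / √n. • It reflects the effect of sample size: a large SE means that either the variation is high or the sample size is small.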

  18. Confidence Intervals • A confidence interval gives a range of values within which the true population value is likely to lie. • It is defined as follows: • 95% confidence interval: (sample mean – 1.96 SE) to (sample mean + 1.96 SE). • A 99% confidence interval is calculated as mean ± 2.58 SE.
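A minimal sketch of the standard error and confidence interval calculations follows. The expression values are hypothetical, and the function uses the Normal multiplier described on the slide; for small samples a t-distribution multiplier would usually be more appropriate.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def confidence_interval(values, level=0.95):
    """Normal-approximation confidence interval for the mean."""
    se = stdev(values) / sqrt(len(values))          # standard error
    z = NormalDist().inv_cdf(0.5 + level / 2)       # 1.96 for 95%, 2.58 for 99%
    m = mean(values)
    return m - z * se, m + z * se

# Hypothetical relative expression values for one group of rats
expression = [4.1, 5.0, 4.6, 5.3, 4.8, 4.4, 5.1, 4.7]

ci_95 = confidence_interval(expression, 0.95)
ci_99 = confidence_interval(expression, 0.99)
print(ci_95, ci_99)
```

The 99% interval is always wider than the 95% interval computed from the same sample.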

  19. T-distribution • The t-distribution is similar in shape to the Normal distribution, being symmetrical and unimodal, but is generally more spread out with longer tails. • The exact shape depends on a quantity known as the ‘degrees of freedom’, which in this context is equal to the sample size minus 1.

  20. T-distribution

  21. One-sample t-test • Null hypothesis: the population mean does not differ from the hypothesized mean, e.g., 0 (H0: μ = 0) • A t-statistic (t) is calculated. • t is the number of SEs that separate the sample mean from the hypothesized value. • The associated P value is obtained by comparison with the t distribution. • The larger the t-statistic, the lower the probability of obtaining such a large value under the null hypothesis; thus P is smaller and the result more significant.
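The t-statistic described above can be sketched as follows. The data are hypothetical, and the function returns only the statistic and degrees of freedom; the P value would then be read from a t table or statistical software.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(values, mu0=0.0):
    """Return (t, degrees of freedom) for H0: population mean == mu0."""
    n = len(values)
    se = stdev(values) / sqrt(n)       # standard error of the mean
    t = (mean(values) - mu0) / se      # number of SEs from the hypothesized value
    return t, n - 1

# Hypothetical log fold-changes of gene X, tested against 0
log_fold_changes = [0.8, 1.2, -0.3, 0.9, 1.5, 0.4]
t_stat, df = one_sample_t(log_fold_changes, mu0=0.0)
print(round(t_stat, 2), df)
```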

  22. Paired t-test • Used with paired data. • Paired data arise in a number of different situations, such as • a matched case–control study in which individual cases and controls are matched to each other, or • a repeated-measures study in which some measurement is made on the same set of individuals on more than one occasion
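A paired t-test amounts to a one-sample t-test on the within-pair differences. The sketch below illustrates this under that standard formulation; the measurements and the name `paired_t` are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(first, second):
    """Return (t, degrees of freedom) for paired measurements."""
    if len(first) != len(second):
        raise ValueError("paired data need equal-length inputs")
    diffs = [b - a for a, b in zip(first, second)]   # within-pair differences
    n = len(diffs)
    se = stdev(diffs) / sqrt(n)
    return mean(diffs) / se, n - 1

# Hypothetical measurements on the same individuals at two occasions
occasion_1 = [10.2, 9.8, 11.1, 10.5, 9.9]
occasion_2 = [11.0, 10.1, 12.0, 10.9, 10.8]
t_paired, df_paired = paired_t(occasion_1, occasion_2)
print(round(t_paired, 2), df_paired)
```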

  23. Paired t-test

  24. Two-sample t-test • Comparison of two groups with unpaired data. • E.g., comparison of treatment individuals with control individuals for a particular variable. • Now there are two independent populations, and thus two SDs

  25. Calculation of pooled STD • The pooled SD for the difference in means is calculated as follows:

  26. Calculation of pooled SE • The combined SE gives more weight to the larger sample (if sample sizes are unequal) because this is likely to be more reliable. The pooled SE for the difference in means is calculated as follows:

  27. Two-sample t-test • Comparison of the means of two groups based on a t-statistic and Student's t-distribution. • The t-statistic is obtained by dividing the difference between the sample means by the standard error of the difference.

  28. T-statistic • A P value may be obtained by comparison with the t distribution on n1 + n2 – 2 degrees of freedom. • Again, the larger the t statistic, the smaller the P value will be.
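The slides' formulas for the pooled SD and SE did not survive in this transcript, so the sketch below uses the standard textbook equal-variance formulation, which is an assumption about what the slides showed. The data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(group_a, group_b):
    """Return (t, degrees of freedom) for an equal-variance two-sample t-test."""
    n1, n2 = len(group_a), len(group_b)
    df = n1 + n2 - 2
    # Pooled SD: variances weighted by each group's degrees of freedom
    pooled_sd = sqrt(((n1 - 1) * variance(group_a)
                      + (n2 - 1) * variance(group_b)) / df)
    # SE of the difference between the two sample means
    se_diff = pooled_sd * sqrt(1 / n1 + 1 / n2)
    return (mean(group_a) - mean(group_b)) / se_diff, df

# Hypothetical expression values: nicotine-treated vs. saline control
treated = [5.1, 5.6, 4.9, 5.8, 5.3]
control = [4.2, 4.6, 4.4, 4.9]
t_two, df_two = pooled_t(treated, control)
print(round(t_two, 2), df_two)
```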

  29. Example

  30. Calculation of SD

  31. Calculation of SE

  32. T-statistic • t = (95 – 81)/2.41 = 14/2.41 = 5.81, • with a corresponding P value less than 0.0001. • Reject the null hypothesis that the group means do not differ.

  33. Analysis of Variance • ANOVA • A technique for analyzing the way in which the mean of a variable is affected by different types and combinations of factors. • E.g., the effect of three different diets on total serum cholesterol

  34. Sample Experiment Variance:

  35. Sum of squares calculations • Between groups • Within groups • Total

  36. Degrees of freedom

  37. Sources of variation • A P value of 0.0039 indicates that at least two of the treatment group means differ.
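The sums of squares, degrees of freedom, and F ratio of a one-way ANOVA can be sketched as below. The cholesterol values are hypothetical and are not the data behind the slides' P value of 0.0039; the function name is illustrative.

```python
from statistics import mean

def one_way_anova(groups):
    """Return (SS between, SS within, df between, df within, F)."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Variation of group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of observations around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_ratio

# Hypothetical total serum cholesterol under three diets
diets = [[6.1, 5.8, 6.4, 6.0], [5.2, 5.5, 5.1, 5.6], [5.9, 6.2, 5.7, 6.0]]
ss_b, ss_w, df_b, df_w, f_value = one_way_anova(diets)
print(round(ss_b, 3), round(ss_w, 3), df_b, df_w, round(f_value, 2))
```

The total sum of squares equals the between-group plus within-group sums, which is a useful check on the calculation.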

  38. Multiple Tests • Post hoc comparisons between pairs of treatments. • Overall type I error rate increases by increasing number of pairwise comparisons. • One has to maintain the 0.05 type I error rate after all of the comparisons.

  39. Bonferroni Adjustment • Use a significance threshold of 0.05 / (number of tests) • Can be too conservative when many tests are performed
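A short sketch of the Bonferroni adjustment follows. The P values are hypothetical; the adjusted P value (raw P multiplied by the number of tests, capped at 1) is the equivalent, commonly reported form of the same rule.

```python
def bonferroni(p_values, alpha=0.05):
    """Return (raw p, adjusted p, significant?) for each test."""
    m = len(p_values)
    threshold = alpha / m                      # 0.05 / number of tests
    return [(p, min(1.0, p * m), p < threshold) for p in p_values]

# Hypothetical P values from three post hoc pairwise comparisons
raw_p = [0.001, 0.02, 0.04]
results = bonferroni(raw_p)
for p, p_adj, significant in results:
    print(p, round(p_adj, 3), significant)
```

Here only the first comparison remains significant, although all three raw P values are below 0.05.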

  40. Nonparametric methods • Many statistical methods require assumptions. • The t-test requires that samples are normally distributed. • Data that violate this assumption may require transformation. • Nonparametric methods require few or no assumptions about the distribution.

  41. Wilcoxon signed rank test for paired data

  42. Wilcoxon signed rank test

  43. Central venous oxygen saturation on admission and after 6 h into ICU. • Take the difference between the paired data points. • Patients have SvO2 values on admission and after 6 hours.

  44. Central venous oxygen saturation on admission and after 6 h into ICU. • Rank differences regardless of their sign. • Give a sign to the ranked differences

  45. Calculate • Sum of positive ranks (R+) • Sum of negative ranks (R-)
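The ranking steps of the Wilcoxon signed rank test can be sketched as follows. The SvO2 values are hypothetical, and dropping pairs with zero difference is a common convention assumed here rather than something stated on the slides.

```python
def average_ranks(values):
    """Rank values in increasing order; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1                  # average of 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def signed_rank_sums(first, second):
    """Return (R+, R-): sums of ranks of positive and negative differences."""
    diffs = [b - a for a, b in zip(first, second) if b != a]  # drop zero differences
    ranks = average_ranks([abs(d) for d in diffs])            # rank ignoring sign
    r_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    r_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return r_plus, r_minus

# Hypothetical SvO2 (%) on admission and 6 h later
admission = [60, 65, 70, 55]
after_6h = [62, 64, 75, 55]
r_plus, r_minus = signed_rank_sums(admission, after_6h)
print(r_plus, r_minus)
```

The smaller of the two sums is then compared with a table of critical values for the number of non-zero differences.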

  46. Sum of positive and negative ranks

  47. Critical values for WSR test when n = 10 5

  48. Wilcoxon rank-sum or Mann-Whitney test • The Wilcoxon signed rank test is suited to paired data. • For unpaired data, the Wilcoxon rank-sum test is used.

  49. Steps of Wilcoxon rank-sum test

  50. Total drug doses in patients with a 3 to 5 day stay in intensive care unit. • Rank all observations in increasing order regardless of grouping • Use the average rank if values tie • Add up the ranks within each group • Select the smaller of the two rank sums and calculate a p-value for it.
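The steps above can be sketched as follows. The dose values are hypothetical, and the sketch stops at the rank sums; the P value would come from a table of critical values or statistical software.

```python
def average_ranks(values):
    """Rank values in increasing order; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1                  # average of 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def rank_sums(group_a, group_b):
    """Rank the pooled observations, then sum the ranks within each group."""
    ranks = average_ranks(list(group_a) + list(group_b))
    n_a = len(group_a)
    return sum(ranks[:n_a]), sum(ranks[n_a:])

# Hypothetical total drug doses in two unpaired patient groups
doses_a = [1, 3, 5]
doses_b = [2, 4, 4]
sum_a, sum_b = rank_sums(doses_a, doses_b)
smaller = min(sum_a, sum_b)    # the statistic compared with critical values
print(sum_a, sum_b, smaller)
```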
