Genomic Profiles of Brain Tissue in Humans and Chimpanzees II

Genomic Profiles of Brain Tissue in Humans and Chimpanzees II. Naomi Altman Oct 06. SAM. Significance Analysis of Microarrays is a popular method of differential expression analysis, freely available from www-stat.stanford.edu/~tibs



  1. Genomic Profiles of Brain Tissue in Humans and Chimpanzees II Naomi Altman Oct 06

  2. SAM Significance Analysis of Microarrays (SAM) is a popular method of differential expression analysis, freely available from www-stat.stanford.edu/~tibs. It uses permutation-based tests and supports some common models, including paired and unpaired t-tests, one-way ANOVA, and some simple block designs, along with a few other analyses. The data must be normalized in advance. The tests themselves do not allow missing data, so SAM includes a method to "fill in" (impute) missing values, assuming they are sparse and missing at random.

  3. SAM SAM can be run from Excel through an interface that sends data to and from R. samr is the underlying R package. I will demonstrate the Excel interface, which is the more popular way to run it.

  4. SAM Like Limma, SAM starts by computing a test statistic for each gene. SAM uses a regularized denominator: the test statistic is based on a paired or two-sample t-test, or an ANOVA F-test, but a small constant s0, computed from all the data, is added to the gene's within-treatment standard deviation estimate si in the denominator. The variance of a gene is assumed to be the same for all treatments.

  5. SAM Table of usual versus moderated test statistics for the 2-sample, paired, and ANOVA cases.

  6. s0 The constant s0 is computed from the values of si across all the genes, using an ad hoc procedure based on simulations.
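A minimal sketch of a two-sample statistic with a regularized denominator, assuming the constant s0 is added to the gene's pooled standard error (the form used in the published SAM statistic). The function name `moderated_t` and the example values are hypothetical, and s0 is passed in by hand rather than estimated from all the genes.

```python
from math import sqrt
from statistics import mean, variance


def moderated_t(x, y, s0=0.0):
    """Two-sample statistic r / (s + s0) for one gene.

    x, y : expression values for the gene in treatments 1 and 2.
    s0   : regularizing constant (assumed given; 0 gives the usual t).
    """
    n1, n2 = len(x), len(y)
    r = mean(x) - mean(y)
    # Pooled variance: assumes the gene has the same variance in both treatments.
    pooled = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    s = sqrt(pooled * (1 / n1 + 1 / n2))
    return r / (s + s0)


usual = moderated_t([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
moderated = moderated_t([1.0, 2.0, 3.0], [2.0, 3.0, 4.0], s0=0.5)
```

With s0 > 0 the statistic shrinks toward zero, and it shrinks most for genes whose own standard error is small, which is the point of the regularization.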

  7. Selecting the Significant Genes SAM uses a quantile-quantile plot of the observed test statistics versus the expected quantiles of the null distribution. Observations sufficiently far off the identity line are considered detections. The FDR is estimated based on the percentage of the randomization values that would have been "detected".

  9. Example for Random Normals We sort the data into y(1) < y(2) < ... < y(n). The sampling distribution of y(i) has mean nz(i), the ith normal score. We plot y(i) versus nz(i). If the data are standard normal, the points should lie close to the line y = x. (Note that for N(m, s^2) data, we often plot against the normal scores for N(0,1); the points should then lie close to the line y = m + sx.)
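The plot coordinates for random normals can be sketched as follows. Exact normal scores are expected order statistics; this sketch swaps in Blom's approximation, inv_cdf((i - 0.375) / (n + 0.25)), which is an assumption and not necessarily what the slides used. The function names are hypothetical.

```python
import random
from statistics import NormalDist


def normal_scores(n):
    """Approximate expected order statistics of n N(0,1) draws (Blom)."""
    std = NormalDist()
    return [std.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]


def qq_points(data):
    """Pair each sorted observation y(i) with its normal score nz(i)."""
    y = sorted(data)
    return list(zip(normal_scores(len(y)), y))


random.seed(0)
sample = [random.gauss(0, 1) for _ in range(200)]
points = qq_points(sample)  # (nz(i), y(i)) pairs, ready to plot
```

For standard normal data the pairs should fall near the identity line; plotting them requires a plotting library, which is left out here.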

  11. Selecting the Significant Genes SAM computes a test statistic Di for the ith gene. Then the sample labels are permuted. For each permutation, the ordered statistics D(1) < D(2) < ... < D(G) are saved. These are averaged over the permutations to obtain the X-axis of the plot (call these the DN scores). In addition, all the distances dist(i) = |D(i) - DN(i)| are recorded. The median, over permutations, of the number of values with dist(i) > K is taken as the estimate of the expected number of false discoveries at distance K.
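The permutation step can be sketched as below. To keep the example short, a plain difference in means stands in for SAM's regularized statistic (an assumption), labels are coded 1 and 2, and the names `diff_stat` and `expected_order_stats` are hypothetical.

```python
import random
from statistics import mean


def diff_stat(values, labels):
    """Stand-in statistic for one gene: mean of group 1 minus mean of group 2."""
    g1 = [v for v, lab in zip(values, labels) if lab == 1]
    g2 = [v for v, lab in zip(values, labels) if lab == 2]
    return mean(g1) - mean(g2)


def expected_order_stats(data, labels, n_perm=100, seed=0):
    """Average the sorted statistics over label permutations (the DN scores).

    data   : list of genes, each a list of expression values per sample.
    labels : treatment label (1 or 2) for each sample.
    """
    rng = random.Random(seed)
    totals = [0.0] * len(data)
    perm = list(labels)
    for _ in range(n_perm):
        rng.shuffle(perm)  # permute the sample labels
        d_sorted = sorted(diff_stat(gene, perm) for gene in data)
        for i, d in enumerate(d_sorted):
            totals[i] += d
    return [t / n_perm for t in totals]
```

The same permuted label vector is applied to every gene within one permutation, so the correlation between genes is preserved in the null distribution.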

  12. Selecting the Significant Genes SAM computes a test statistic Di for the ith gene. The user selects a distance. SAM computes the number of genes detected at that distance, R, and estimates the expected number of false discoveries at that distance, V, to obtain an estimate of the FDR, V/R.
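A small sketch of the FDR estimate at a chosen distance K, assuming the DN scores and the per-permutation statistics are already available. The function names and the toy numbers are hypothetical and chosen only so the arithmetic is easy to follow.

```python
from statistics import median


def count_detections(stats, dn_scores, k):
    """Number of ordered statistics farther than k from their DN score."""
    return sum(1 for d, e in zip(sorted(stats), dn_scores) if abs(d - e) > k)


def estimate_fdr(observed, permuted, dn_scores, k):
    """Return (FDR estimate, R, V) at distance k.

    R is the number of detections in the observed data; V is the median
    number of detections across the permuted data sets.
    """
    r = count_detections(observed, dn_scores, k)
    v = median(count_detections(p, dn_scores, k) for p in permuted)
    return (v / r if r else 0.0), r, v


dn_scores = [-1.0, 0.0, 1.0]
observed = [3.0, 0.1, -2.5]
permuted = [[-1.1, 0.0, 0.9], [-1.8, 0.2, 1.1], [-1.0, 0.1, 1.9]]
fdr, r, v = estimate_fdr(observed, permuted, dn_scores, k=0.5)
```

Here two observed statistics lie more than 0.5 from their DN score (R = 2), the permutations give 0, 1, and 1 detections (V = 1), and the estimated FDR is 1/2.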

  13. Example for Random Normals If this is the plot for the data, the points indicated are the discoveries. For each permutation data set, we also compute the number of discoveries, and then obtain an estimate of V.

  14. Running SAM
      • Write the normalized data to a file compatible with Excel (tab or comma delimited).
      • Start Excel. The first 2 columns should be gene ids. The first row contains the numbers 1 ... T giving the treatments.
      • Select the rows and columns of the spreadsheet that you want to analyze.
      • Click on SAM in the GUI. Select the type of analysis, random seed, and number of permutations.
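The first step, writing the normalized data in the layout described above, might look like the sketch below. It assumes the two leading cells of the treatment row are left empty and that both id columns are plain text; the slide does not specify either detail, and the function names and example values are hypothetical.

```python
import csv
import io


def sam_rows(gene_ids, gene_names, treatments, values):
    """Build rows: a treatment-label row, then two id columns plus data per gene."""
    rows = [["", ""] + list(treatments)]  # assumption: id cells blank in row 1
    for gid, name, row in zip(gene_ids, gene_names, values):
        rows.append([gid, name] + list(row))
    return rows


def write_tab_delimited(rows, handle):
    """Write rows as tab-delimited text that Excel can open."""
    csv.writer(handle, delimiter="\t", lineterminator="\n").writerows(rows)


buffer = io.StringIO()  # stands in for an open file handle
write_tab_delimited(
    sam_rows(
        ["g1", "g2"],
        ["geneA", "geneB"],
        [1, 1, 2, 2],
        [[7.1, 6.9, 8.4, 8.6], [5.0, 5.2, 5.1, 4.9]],
    ),
    buffer,
)
```

To write to disk instead, pass a handle from `open(path, "w", newline="")` in place of the in-memory buffer.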

  15. Running SAM
      • The SAM qqplot comes up. Select a distance or use the slider to assess the FDR.
      • Print the genelist. The contrasts are:

  16. Limma vs SAM
      Limma
      • model-based
      • can handle small numbers of replicates
      • handles ANOVA-type problems including 1 random effect
      • handles missing data
      • produces a genelist and CIs
      • can determine significance of any linear contrast
      • hard to use
      SAM
      • nonparametric
      • cannot handle small numbers of replicates
      • handles limited ANOVA-type problems and survival
      • "imputes" missing data
      • produces only a genelist
      • only determines significance of deviation from mean
      • easy to use
