Genomic Profiles of Brain Tissue in Humans and Chimpanzees II

Naomi Altman, Oct 06

SAM Significance Analysis of Microarrays (SAM) is a popular method of differential expression analysis, freely available from www-stat.stanford.edu/~tibs. It uses permutation-based tests and supports some common models, including paired and unpaired t-tests, one-way ANOVA, and some simple block designs, along with a few other analyses. The data must be normalized in advance. The tests themselves do not allow missing data, so SAM includes a method to "fill in" (impute) missing values, assuming they are missing at random and sparse.

SAM SAM can be run from Excel through an interface that sends data to and from R; samr is the underlying R package. I will demonstrate the Excel interface, which is the more popular way to run SAM.

SAM Like Limma, SAM starts by computing a test statistic for each gene. SAM uses a regularized denominator: the test statistic is based on a paired or two-sample t-test, or an ANOVA F-test, but a small constant s0, computed from all the data, is added to the gene's within-treatment standard deviation estimate si in the denominator. The variance of a gene is assumed to be the same for all treatments.

SAM [Table: usual versus moderated forms of the test statistic for the 2-sample, paired, and ANOVA cases]

s0 The constant s0 is computed from the values of si across all genes, using an ad hoc procedure based on simulations.
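To make the effect of the regularized denominator concrete, here is a minimal Python sketch for the two-sample case. It assumes the moderated statistic has the form d = (difference in means) / (s + s0). The function name `two_sample_stats` and the choice of s0 as the median of the per-gene s values are illustrative assumptions only; the median is a stand-in and is not SAM's actual procedure for choosing s0.

```python
import math
import statistics


def two_sample_stats(x, y, s0=0.0):
    """Return (t, d, s) for one gene measured under two treatments.

    t is the usual pooled-variance t statistic, d is the moderated
    statistic with s0 added to the denominator, and s is the standard
    error of the difference in means. Assumes the data are not constant
    (s > 0) and that the variance is the same in both treatments.
    """
    n1, n2 = len(x), len(y)
    m1, m2 = statistics.fmean(x), statistics.fmean(y)
    ss = sum((v - m1) ** 2 for v in x) + sum((v - m2) ** 2 for v in y)
    pooled_var = ss / (n1 + n2 - 2)
    s = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    diff = m1 - m2
    return diff / s, diff / (s + s0), s


# Three hypothetical genes; the first has a tiny variance and a tiny difference.
genes = [
    ([1.00, 1.01, 1.02], [1.10, 1.11, 1.12]),
    ([2.0, 3.5, 2.7], [4.1, 5.9, 5.2]),
    ([0.5, 1.9, 1.1], [1.0, 0.2, 1.6]),
]

# ASSUMPTION: s0 taken as the median of the per-gene s values (placeholder rule).
s_values = [two_sample_stats(x, y)[2] for x, y in genes]
s0 = statistics.median(s_values)

for x, y in genes:
    t, d, s = two_sample_stats(x, y, s0)
    print(f"s={s:.4f}  usual t={t:.2f}  moderated d={d:.2f}")
```

Under these assumptions, the first gene gets a very large usual t statistic only because its variance estimate is tiny; adding s0 shrinks it far more than it shrinks the statistics of the other genes.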

Selecting the Significant Genes SAM uses a quantile-quantile plot of the data versus the expected quantiles of the null distribution. Observations off the identity line are considered detections. The FDR is estimated based on the percentage of the randomization values that would have been "detected".

Example for Random Normals We sort the data into y(1) < y(2) < ... < y(n). y(i) has a sampling distribution with mean nz(i), the ith normal score. We plot y(i) versus nz(i). If the data are normally distributed, then the points should lie on the line y = x. (Note that in the case of N(m, s^2) data, we often plot against the normal scores for N(0,1); then the points should lie on the line y = m + sx.)
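The plot described above can be checked numerically with a short Python sketch. It uses Blom's approximation, Phi^{-1}((i - 0.375) / (n + 0.25)), for the N(0,1) normal scores; that approximation, the function names, and the simulated N(5, 2^2) sample are assumptions made for illustration. A least-squares line through the points (nz(i), y(i)) should then have intercept near m and slope near s.

```python
import random
from statistics import NormalDist, fmean


def normal_scores(n):
    """Approximate expected order statistics of n N(0,1) draws (Blom)."""
    std = NormalDist()
    return [std.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]


def fit_line(xs, ys):
    """Least-squares slope and intercept of ys regressed on xs."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


rng = random.Random(0)
m, s = 5.0, 2.0                      # hypothetical mean and standard deviation
y = sorted(rng.gauss(m, s) for _ in range(2000))
nz = normal_scores(len(y))

slope, intercept = fit_line(nz, y)
print(f"slope (estimates s): {slope:.3f}")
print(f"intercept (estimates m): {intercept:.3f}")
```

Plotting `y` against `nz` with any plotting tool gives the quantile-quantile plot itself; the fitted line is just a numerical summary of it.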

Selecting the Significant Genes SAM computes a test statistic Di for the ith gene. Then the sample labels are permuted. For each permutation, the ordered statistics D(1) < D(2) < ... < D(G) are saved. These are averaged over the permutations to obtain the X-axis of the plot (call these the DN scores). The distances dist(i) = |D(i) - DN(i)| are also recorded. For each permutation, the number of values with dist(i) > K is counted; the median of these counts over the permutations is taken as the estimate of the expected number of false discoveries at distance K.

Selecting the Significant Genes SAM computes a test statistic Di for the ith gene. The user selects a distance. SAM computes the number of genes detected at that distance, R, and estimates the expected number of false discoveries at that distance, V, to obtain an estimate of the FDR, V/R.
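The permutation procedure on these two slides can be sketched in Python as follows. This is a simplified illustration of the steps as described here, not the samr implementation: the two-sample moderated statistic, the fixed s0 = 0.1, the function names, and the simulated data are all assumptions.

```python
import random
import statistics


def d_stat(values, labels, s0):
    """Moderated two-sample statistic for one gene; labels are 1 or 2."""
    g1 = [v for v, lab in zip(values, labels) if lab == 1]
    g2 = [v for v, lab in zip(values, labels) if lab == 2]
    n1, n2 = len(g1), len(g2)
    m1, m2 = statistics.fmean(g1), statistics.fmean(g2)
    ss = sum((v - m1) ** 2 for v in g1) + sum((v - m2) ** 2 for v in g2)
    s = (ss / (n1 + n2 - 2) * (1.0 / n1 + 1.0 / n2)) ** 0.5
    return (m1 - m2) / (s + s0)


def sam_fdr(data, labels, K, n_perm=50, s0=0.1, seed=0):
    """Return (R, V, FDR estimate) at distance K."""
    rng = random.Random(seed)
    G = len(data)
    observed = sorted(d_stat(row, labels, s0) for row in data)

    # Ordered statistics D(1) <= ... <= D(G) for each permutation of the labels.
    perm_sorted = []
    for _ in range(n_perm):
        perm = list(labels)
        rng.shuffle(perm)
        perm_sorted.append(sorted(d_stat(row, perm, s0) for row in data))

    # DN scores: average of the ith ordered statistic over the permutations.
    dn = [statistics.fmean(p[i] for p in perm_sorted) for i in range(G)]

    # R: genes detected in the real data at distance K.
    R = sum(abs(observed[i] - dn[i]) > K for i in range(G))

    # V: median number of "detections" among the permutation data sets.
    counts = [sum(abs(p[i] - dn[i]) > K for i in range(G)) for p in perm_sorted]
    V = statistics.median(counts)
    return R, V, (V / R if R > 0 else 0.0)


# Simulated example: 200 genes, 5 + 5 samples, first 20 genes truly shifted.
rng = random.Random(1)
labels = [1] * 5 + [2] * 5
data = []
for g in range(200):
    shift = 3.0 if g < 20 else 0.0
    data.append([rng.gauss(0.0, 1.0) + (shift if lab == 2 else 0.0) for lab in labels])

for K in (0.5, 1.0, 2.0):
    R, V, fdr = sam_fdr(data, labels, K)
    print(f"K={K}: R={R}, V={V}, estimated FDR={fdr:.3f}")
```

Moving K plays the role of the distance the user selects: a larger K detects fewer genes, and the estimated number of false discoveries changes with it.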

Example for Random Normals If this is the plot for the data, the points indicated are the discoveries. For each permutation data set, we also compute the number of discoveries, and then obtain an estimate of V.

Running SAM
• Write normalized data to a file compatible with Excel (tab or comma delimited).
• Start Excel. The first 2 columns should be gene ids. The first row holds the numbers 1 ... T giving the treatments.
• Select the rows and columns of the spreadsheet that you want to analyze.
• Click on SAM in the GUI. Select the type of analysis, the random seed, and the number of permutations.
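The file layout described in the first two steps can be sketched in Python with the standard csv module. The function name `to_sam_table`, the example gene names and ids, and the choice to leave the first two cells of the label row empty are assumptions for illustration; the exact layout the Excel interface expects should be checked against the SAM documentation.

```python
import csv
import io


def to_sam_table(gene_names, gene_ids, treatments, matrix):
    """Return tab-delimited text: a treatment row, then one row per gene."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    # First row: treatment numbers 1 ... T above the data columns.
    # ASSUMPTION: the two cells above the gene id columns are left empty.
    writer.writerow(["", ""] + list(treatments))
    # Remaining rows: two gene id columns, then the normalized values.
    for name, gid, row in zip(gene_names, gene_ids, matrix):
        writer.writerow([name, gid] + list(row))
    return buf.getvalue()


# Hypothetical normalized data: 2 genes, 2 samples in each of 2 treatments.
text = to_sam_table(
    gene_names=["geneA", "geneB"],
    gene_ids=["id001", "id002"],
    treatments=[1, 1, 2, 2],
    matrix=[[7.1, 7.3, 8.0, 8.2], [5.5, 5.4, 5.6, 5.3]],
)
print(text)
```

Writing `text` to a `.txt` file gives a tab-delimited file that Excel can open directly.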

Running SAM
• The SAM Q-Q plot comes up. Select a distance or use the slider to assess FDR.
• Print the gene list. The contrasts are:

Limma vs SAM
Limma
• model-based
• can handle small numbers of replicates
• handles ANOVA-type problems, including 1 random effect
• handles missing data
• produces a gene list and CIs
• can determine significance of any linear contrast
• hard to use
SAM
• nonparametric
• cannot handle small numbers of replicates
• handles limited ANOVA-type problems and survival
• "imputes" missing data
• produces only a gene list
• only determines significance of deviation from mean
• easy to use
