Gene Expression Data Analysis Lab Session
CAD course, Jian Li, 01.28.2011
Gene expression signatures
• Loosely defined here as a set of genes that are functionally associated with each other in some way.
• When using expression profiling to define genes, a gene expression signature consists of two things:
  • A set of genes going “up” (relative to a reference condition).
  • A set of genes going “down” (relative to a reference condition).
Five oncogenic pathway signatures in human cancers: MYC, Ras, E2F3, β-catenin (b-cat), Src
Figure labels: (1) One combined signature; (2); (3, 4) Compare 5 signatures
Excel functions/features you will need for the computational exercise
TTEST
TTEST(array1,array2,tails,type)
• array1 is the first data set.
• array2 is the second data set.
• tails specifies the number of distribution tails (use “2” for a two-tailed test).
• type is the kind of t-test to perform (use “2”, the two-sample equal-variance test).
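To cross-check a spreadsheet result in R (the language used later in this session), here is a minimal sketch of the same test. TTEST with tails = 2 and type = 2 corresponds to a two-sided, two-sample t-test assuming equal variances, which in R is `t.test(..., var.equal = TRUE)`. The expression values and the variable names `group1` and `group2` are made up for illustration.

```r
# Hypothetical expression values for one gene in two groups (made-up numbers).
group1 <- c(7.1, 6.8, 7.4, 7.0, 6.9)
group2 <- c(8.2, 8.0, 7.9, 8.4, 8.1)

# Two-sided, equal-variance two-sample t-test: the counterpart of
# TTEST(array1, array2, 2, 2) in Excel.
res <- t.test(group1, group2, alternative = "two.sided", var.equal = TRUE)
p_value <- res$p.value

# Difference of the group means, the counterpart of subtracting two
# AVERAGE() results.
mean_diff <- mean(group2) - mean(group1)
```

A positive `mean_diff` with a small `p_value` would place the gene in the “up” part of a signature; a negative one would place it in the “down” part.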
AVERAGE
AVERAGE(number1,number2,...)
• number1, number2, ... are 1 to 30 numeric arguments for which you want the average.
• The arguments must be numbers, or names, arrays, or references that contain numbers.
Data > Filter > AutoFilter
• Arrows appear to the right of the column labels.
• Filtered items appear in blue.
• Complex criteria: rows that contain values within a specific range (e.g. p<0.01).
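The filtering step can be sketched in R as a row subset. This is only an illustration: the data frame `results`, its column names, and all values are hypothetical.

```r
# Hypothetical per-gene results table (made-up values).
results <- data.frame(
  gene    = c("geneA", "geneB", "geneC", "geneD"),
  p_value = c(0.0004, 0.2, 0.008, 0.03),
  diff    = c(1.2, 0.1, -0.9, 0.4),
  stringsAsFactors = FALSE
)

# Keep rows with p < 0.01, as the AutoFilter criterion would.
significant <- results[results$p_value < 0.01, ]

# Split the filtered genes into "up" and "down" by the sign of the difference.
up_genes   <- significant$gene[significant$diff > 0]
down_genes <- significant$gene[significant$diff < 0]
```

Here `up_genes` and `down_genes` together play the role of the two halves of a signature.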
MATCH
MATCH(lookup_value,lookup_array,match_type)
• lookup_value: the value you are looking for.
• lookup_array: the range of cells to search.
• match_type: should be 0 (exact match) for our purposes.
COUNT
COUNT(range)
• range: the cells to count.
• Only numbers in the range are counted. Empty cells, logical values, text, or error values in the array or reference are ignored.
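MATCH and COUNT together give the number of genes shared by two lists: MATCH returns a position for each gene that is found and an error otherwise, and COUNT counts only the positions. A rough R equivalent, using made-up gene identifiers, might look like this:

```r
# Hypothetical gene identifiers (made up for illustration).
sig_a <- c("g1", "g2", "g3", "g4", "g5")
sig_b <- c("g4", "g9", "g2")

# match() returns the position of each sig_b gene in sig_a, or NA if absent,
# much like MATCH(lookup_value, lookup_array, 0) returning a position or #N/A.
positions <- match(sig_b, sig_a)

# Counting the non-missing positions mirrors COUNT() over the MATCH column.
n_shared <- sum(!is.na(positions))
shared   <- sig_b[!is.na(positions)]
```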
Compare two signatures
• Sig A: 1152 genes
• Sig B: 119 genes
• Genes on both platforms: 11079
• Genes shared by both gene signatures: 44
Test: one-sided Fisher's exact test
R function for one-sided Fisher's exact test: dhyper
• Example:
  • 100 balls
  • 10 of the balls are red
  • I grab 20 balls
  • Five of my 20 balls are red
• Is five red balls significantly more than expected by chance?
> m <- 10   # number of red balls
> n <- 90   # number of other balls (total pop - m)
> k <- 20   # number of balls selected
> x <- 0:k  # vector of successes
> # elements 1:5 are the probabilities for x = 0..4, so this is P(X >= 5)
> 1 - sum(dhyper(x, m, n, k)[1:5])
[1] 0.02546455
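The same upper-tail probability can also be read directly from the cumulative hypergeometric function `phyper`. With `lower.tail = FALSE` it returns P(X > q), so q is set to one less than the observed count. The variable name `observed` is introduced here for illustration only.

```r
m <- 10        # number of red balls
n <- 90        # number of other balls
k <- 20        # number of balls selected
observed <- 5  # red balls among the selected ones

# Summing the point probabilities for x = 0..4 and subtracting from 1 ...
p_sum <- 1 - sum(dhyper(0:(observed - 1), m, n, k))

# ... gives the same upper tail P(X >= 5) that phyper returns directly.
p_tail <- phyper(observed - 1, m, n, k, lower.tail = FALSE)
```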
R function for one-sided Fisher's exact test: dhyper
• Sig A: 1162
• Sig B: 119
• Genes on both platforms: 11079
• Genes shared by both gene signatures: 44
> m <- 119          # number of Sig B genes
> n <- 11079 - 119  # number of other genes
> k <- 1162         # number of Sig A genes
> x <- 0:k          # vector of successes
> 1 - sum(dhyper(x, m, n, k)[1:44])
[1] 1.265654e-14
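As a cross-check, the same comparison can be written as a 2 x 2 contingency table and passed to `fisher.test`. The sketch below uses the counts from the R code on this slide; the variable names are illustrative. It also computes the tail with `phyper`, which is likely to be more accurate than `1 - sum(...)` when the result is this close to zero, so the two approaches may differ slightly in the last digits from the value printed above.

```r
shared   <- 44     # genes in both signatures
sig_a    <- 1162   # Sig A genes
sig_b    <- 119    # Sig B genes
universe <- 11079  # genes on both platforms

# 2 x 2 table: rows = in Sig B / not in Sig B,
# columns = in Sig A / not in Sig A.
tab <- matrix(
  c(shared,         sig_b - shared,
    sig_a - shared, universe - sig_a - sig_b + shared),
  nrow = 2, byrow = TRUE
)

# One-sided test for over-representation (more overlap than expected).
p_fisher <- fisher.test(tab, alternative = "greater")$p.value

# Upper hypergeometric tail P(X >= 44), computed without subtracting from 1.
p_hyper <- phyper(shared - 1, sig_b, universe - sig_b, sig_a,
                  lower.tail = FALSE)
```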
GSEA (rank-based) enrichment analysis
All the genes in the dataset are used here.
• Start from the top of the ranked list.
• Add points to the “random walk” for each gene you find in the gene set S.
• Remove points from the “random walk” for each gene not in S.
Subramanian, Aravind et al. (2005) Proc. Natl. Acad. Sci. USA 102, 15545-15550
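The random walk described above can be sketched in a few lines of R. This is a simplified, unweighted version (every hit counts equally), not the weighted statistic of the published method; the function name `enrichment_score` and the example genes are hypothetical.

```r
# Unweighted running-sum enrichment score (a simplification, not the weighted
# statistic of the published method).
# ranked_genes: gene identifiers ordered from top to bottom of the ranked list.
# gene_set:     identifiers of the genes in the set S.
enrichment_score <- function(ranked_genes, gene_set) {
  in_set <- ranked_genes %in% gene_set
  n_hit  <- sum(in_set)
  n_miss <- length(ranked_genes) - n_hit
  stopifnot(n_hit > 0, n_miss > 0)
  # Step up for each gene in S, step down for each gene not in S; the step
  # sizes are scaled so the walk returns to zero at the end of the list.
  steps <- ifelse(in_set, 1 / n_hit, -1 / n_miss)
  walk  <- cumsum(steps)
  # The score is the largest deviation of the walk from zero.
  list(walk = walk, es = walk[which.max(abs(walk))])
}

# Hypothetical ranked list of 10 genes and a set concentrated near the top.
ranked <- paste0("g", 1:10)
S      <- c("g1", "g2", "g4")
res    <- enrichment_score(ranked, S)
```

A set whose genes cluster near the top of the list gives a large positive score; a set spread evenly through the list keeps the walk close to zero.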
GSEA (rank-based) enrichment analysis
Assign a nominal P value.
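One way to picture the nominal P value is as a permutation test: compute the score many times under random conditions and ask how often it is at least as large as the observed one. The published method builds its null distribution by permuting phenotype labels; the sketch below instead shuffles the ranked gene list, which is a simpler stand-in and an assumption of this example. The function `es_unweighted`, the gene identifiers, and the number of permutations are all hypothetical.

```r
# Unweighted running-sum score (simplified, not the weighted published statistic).
es_unweighted <- function(ranked_genes, gene_set) {
  in_set <- ranked_genes %in% gene_set
  steps  <- ifelse(in_set, 1 / sum(in_set), -1 / sum(!in_set))
  walk   <- cumsum(steps)
  walk[which.max(abs(walk))]
}

set.seed(1)                            # reproducible shuffles
ranked <- paste0("g", 1:200)           # hypothetical ranked list
S      <- paste0("g", c(1:8, 15, 30))  # hypothetical set, mostly near the top

observed <- es_unweighted(ranked, S)

# Null distribution: scores for the same set against randomly shuffled rankings.
n_perm  <- 1000
null_es <- replicate(n_perm, es_unweighted(sample(ranked), S))

# Nominal P value: fraction of null scores at least as large as the observed
# one (the +1 terms keep the estimate away from exactly zero).
p_nominal <- (sum(null_es >= observed) + 1) / (n_perm + 1)
```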
Rank-based enrichment analysis
Rank-based approaches use all of the genes from one of the datasets to determine enrichment (they do not make a “cut”).
Figure: locations of genes from set Y along the rank-ordered genes from dataset X.