Project of CZ5225 Zhang Jingxian: g0800791@nus.edu.sg
Identifying biomarkers of drug response for cancer patients
Aims:
• To develop predictors of response to drugs
• To learn how to get public microarray data
• To learn how to preprocess microarray raw data
• To annotate the genes of interest
Requirements
• Each group investigates ONE kind of cancer patient drug response
• Two datasets from different studies are needed
• Download the raw data
• Use Bioconductor in R to preprocess the raw data
• Identify a certain number of genes
• Annotate the identified genes in your report
• Each group needs only ONE report
Requirements
• Any Affymetrix expression dataset related to drug response in cancer patients may be used
• The dataset needs to contain at least 20 samples
• The dataset needs two comparable outcome groups: response vs. non-response, resistance vs. non-resistance, etc.
Bioconductor & R • http://www.bioconductor.org
Advantages
• Cross-platform: Linux, Windows and macOS
• Comprehensive and centralized: analyzes both Affymetrix and two-color spotted microarrays, and covers the various stages of data analysis in a single environment
• Cutting-edge analysis methods: new methods and functions can easily be incorporated and implemented
• Quality check of data analysis methods: algorithms and methods are evaluated by statisticians and computer scientists before launch, and in many cases there are also literature references
• Good documentation: comprehensive manuals, documentation, course materials, course notes and a discussion group are available
• A good chance to learn statistics and programming
Installing R & Bioconductor
• Install R from: http://cran.stat.nus.edu.sg/
• Open R, then execute:
> source("http://bioconductor.org/biocLite.R")
> biocLite()
• Check the installed libraries by executing:
> library()
Case study • Dataset source (GSE19697): http://www.ncbi.nlm.nih.gov/geo
Extract the raw data into: D://gse19697
• Create title.txt:
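The contents of title.txt are not shown here. As a minimal sketch, assuming the layout that simpleaffy's read.affy expects for its covariate file (CEL file names in an unnamed first column, followed by one header-named column per factor), the file could be written from R. The CEL file names below are hypothetical placeholders; the column name title and the labels pCR and RD are the ones used in the later pairwise.comparison call.

```r
# Hypothetical CEL file names; replace with the files extracted from the GEO archive.
cel_files <- c("sample1.CEL", "sample2.CEL", "sample3.CEL", "sample4.CEL")

# One covariate column named "title" holding each sample's outcome group.
pheno <- data.frame(title = c("pCR", "pCR", "RD", "RD"), row.names = cel_files)

# Written to a temporary file so the sketch runs anywhere; in the case study
# the target would be title.txt in the working directory.
covdesc_path <- file.path(tempdir(), "title.txt")
write.table(pheno, file = covdesc_path, sep = "\t", quote = FALSE)

# Read the file back to check its structure (file names become row names).
check <- read.table(covdesc_path, header = TRUE)
print(check)
```

The header line has one field fewer than the data lines, which is how R recognises the first column as row names.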
Open R
• Set the working directory:
> setwd('d://gse19697')
• Load the simpleaffy package:
> library(simpleaffy)
• Load the data:
> eset <- read.affy('title.txt')
• Calculate expression values:
> eset.rma <- call.exprs(eset, 'rma')
• Compare the two groups:
> pc.result <- pairwise.comparison(eset.rma, "title", c("pCR", "RD"), eset)
• Filter the markers that change significantly between the two groups:
> significant <- pairwise.filter(pc.result, fc=log2(1.5), tt=0.001)
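To make the two thresholds concrete: expression values after RMA are on a log2 scale, so fc=log2(1.5) asks for at least a 1.5-fold difference between group means, and tt=0.001 caps the t-test p-value. The following is a simplified base-R sketch of that idea on synthetic data. It is not simpleaffy's implementation, and it ignores any other criteria pairwise.filter applies (such as a minimum expression level). The matrix, probe names and group sizes are made up for illustration.

```r
set.seed(1)  # synthetic data, for illustration only

n_probes <- 200
groups <- rep(c("pCR", "RD"), each = 10)

# Log2-scale expression matrix: probes in rows, samples in columns.
expr <- matrix(rnorm(n_probes * length(groups), mean = 8, sd = 0.2),
               nrow = n_probes,
               dimnames = list(paste0("probe_", seq_len(n_probes)), NULL))

# Plant one clearly changed probe: +3 log2 units (8-fold) in the RD group.
expr["probe_1", groups == "RD"] <- expr["probe_1", groups == "RD"] + 3

is_pcr <- groups == "pCR"

# Difference of group means on the log2 scale = log2 fold change.
log2_fc <- rowMeans(expr[, !is_pcr]) - rowMeans(expr[, is_pcr])

# Per-probe two-sample (Welch) t-test p-value.
p_val <- apply(expr, 1, function(x) t.test(x[!is_pcr], x[is_pcr])$p.value)

# Keep probes that pass both the fold-change and the p-value threshold.
keep <- abs(log2_fc) > log2(1.5) & p_val < 0.001
kept_probes <- names(which(keep))
print(kept_probes)
```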
• Plot the significantly changed markers:
> plot(significant)
• Annotate the selected markers:
> significant
> significant <- pairwise.filter(pc.result, fc=log2(1), tt=0.001)
> pid <- rownames(significant@means)
> eset.hm <- eset.rma[pid,]
> install.packages("RColorBrewer")
> library(RColorBrewer)
> hmcol <- colorRampPalette(brewer.pal(10, "RdBu"))(256)
> spcol <- ifelse(eset.hm$title == "pCR", "goldenrod", "skyblue")
> heatmap(exprs(eset.hm), col = hmcol, ColSideColors = spcol)
Assignment 2
• Genetics of gene expression (eQTL)
• Aim: to identify potential genetic variants that cause differential expression
• Deadline of report: two weeks before the final examination
Expression Quantitative Trait Locus (eQTL)
• eQTL mapping tries to find genomic variation that explains expression traits.
• One difference between eQTL mapping and traditional QTL mapping is that a traditional mapping study focuses on one or a few traits, while in most eQTL studies thousands of expression traits are analyzed and thousands of QTLs are declared.
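As a rough illustration of what a single eQTL test looks like, the sketch below regresses one gene's expression on the allele count (0, 1 or 2) at each SNP and ranks the SNPs by p-value. It uses plain linear regression on synthetic data; that choice is an assumption made for clarity, not the test statistic GGtools computes, and all names (snp_1 and so on) are invented.

```r
set.seed(42)  # synthetic genotypes and expression, for illustration only

n_samples <- 90
n_snps <- 50

# Genotypes coded as minor-allele counts (0, 1, 2); samples in rows.
geno <- matrix(rbinom(n_samples * n_snps, size = 2, prob = 0.3),
               nrow = n_samples,
               dimnames = list(NULL, paste0("snp_", seq_len(n_snps))))

# Expression trait driven by snp_7 plus noise (a planted eQTL).
expr <- 8 + 1.0 * geno[, "snp_7"] + rnorm(n_samples, sd = 0.3)

# One regression per SNP; keep the p-value of the genotype slope.
p_val <- apply(geno, 2, function(g) {
  fit <- summary(lm(expr ~ g))
  fit$coefficients["g", "Pr(>|t|)"]
})

# SNPs ordered from strongest to weakest association.
top <- head(sort(p_val), 5)
print(top)
```

In a real eQTL study this loop is repeated for thousands of expression traits, which is why multiple-testing control matters far more than in a traditional QTL analysis.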
> biocLite("GGtools")
> biocLite("GGdata")
> library(GGtools)
> library(GGdata)
> c17 = getSS("GGdata", "17")
> # get("CSDA", revmap(illuminaHumanv1SYMBOL))
> t1 = gwSnpTests(genesym("CSDA") ~ male, c17, chrnum("17"))
> # t1 = gwSnpTests(probeId("GI_21359983-S") ~ male, c17, chrnum("17"))
> topSnps(t1)
> plot_EvG(genesym("CSDA"), rsid("rs7212116"), c17)
> # c_full = getSS("GGdata", as.character(1:22))
Requirements for assignment 2
• Identify the genetic cause (eQTL) of the genes selected in assignment 1
• Get the SNPs with significant association (p < 10e-4) from each chromosome
• Paste the plot image for each association
• Annotate the SNPs in dbSNP
• Submit one report for each group