Workshop: Analysis of Microbial Transcriptomics Data - Data Analysis of mRNA expression profiles - Identification of differential gene-expression. Martien Caspers, BSc; Remco Kort, Prof, Dr. TNO, Microbiology & Systems Biology, Zeist, NL. 16 Sep 2011, VU, Amsterdam.
mRNA expression profiles
• Arrays (this course):
  • Hybridisation of (fluorescently) labelled cDNA populations (whole cell) to arrays with probes of known DNA sequences / functions / genes
  • Result: signal per probe
• RNAseq:
  • Mass-sequencing of cDNA populations
  • Mass-BLAST to annotated genome(s)
  • Result: number of sequences per gene / genetic element
Types of Arrays
• Oligo arrays
  • Designed from annotated genomes
  • Industry-made (Nimblegen, Affymetrix, Agilent)
  • ~50-80 bases/feature
  • ~5-20 µm/feature
• Lab-made arrays
  • Random genomic or cDNA fragments
  • 500-2000 bp PCR fragments/feature
  • Printed on glass with a spotter
  • 100-200 µm/spot
Experimental use of arrays
• Double hybridisation: 2 colour
  • Treated sample, e.g. Cy5-labelled
  • Untreated sample (ctrl), e.g. Cy3-labelled
  • Each spot is its own control: this corrects for spot differences between slides, which is needed for lab-spotted arrays
  • Disadvantage: each slide needs a ctrl
• Single hybridisation (this presentation)
  • Industry arrays: reproducible spots from slide to slide
  • Each sample on a separate slide
Study design
• Cell sorter fractions: Total Population / GFP+ / GFP-
• cDNA (from RNA): 1x / 1x / 1x
• Amplified cDNA (Klenow polymerase): 1x / 2x / 2x
• Total: 8 expression array hybridisations
Public microarray data: GEO database
• http://www.ncbi.nlm.nih.gov/geo/
• Study identifier: GSE16345
• B. subtilis heterogeneity: motile / non-motile
Description of the NimbleGen array design
• Name: TI224308_60mer_expr.
• 4,104 genes from B. subtilis subsp. subtilis strain 168 (NC_000964)
• Eight probe pairs (PM/MM) per gene
• Each probe is replicated 2 times
• The design includes control probes (random oligos)
GEO content download
• .pdf = array oligo specs
• .ngd = array gene identifiers
• .pair = expression data (eight probe pairs (PM/MM) per gene)
• .ndf = grid file used to read array features
• GPL7146_Layout_spotted_oligo_array_Bsub_9.2k.txt = array layout
• GSE16345_series_matrix.txt.gz = RMA-normalised dataset (1 value per gene oligo set)
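As a rough illustration of how a .pair file could be condensed to one value per gene outside Excel, the sketch below reads a tab-delimited table and averages the PM signal per gene. The column names SEQ_ID and PM, the "#" comment lines and the use of a plain mean are assumptions that should be checked against the downloaded files; the series matrix itself uses RMA, which this sketch does not reproduce. The sample values are made up.

```python
import csv
import io
from collections import defaultdict
from statistics import mean


def read_pair(handle):
    """Return {gene_id: mean PM signal} from a tab-delimited .pair-like table.

    Assumes lines starting with '#' are comments and that the table has
    columns named SEQ_ID (gene) and PM (perfect-match signal).
    """
    data_lines = (line for line in handle if not line.startswith("#"))
    reader = csv.DictReader(data_lines, delimiter="\t")
    per_gene = defaultdict(list)
    for row in reader:
        per_gene[row["SEQ_ID"]].append(float(row["PM"]))
    return {gene: mean(values) for gene, values in per_gene.items()}


# Made-up example content, only to show the expected layout.
SAMPLE = "\n".join([
    "# hypothetical header comment",
    "\t".join(["SEQ_ID", "PROBE_ID", "PM", "MM"]),
    "\t".join(["geneA", "p1", "100", "20"]),
    "\t".join(["geneA", "p2", "200", "30"]),
    "\t".join(["geneB", "p3", "40", "10"]),
])

signals = read_pair(io.StringIO(SAMPLE))
print(signals)
```

For a real file, the same function can be called on an opened file handle instead of the in-memory sample.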
Expression data
• [Figure: scatter plots, X/Y axes = signal per spot]
• “Noise” between replicates: Treatment1.2 (control) vs Treatment1.1 (control)
• Differential expression: Treatment2 vs Treatment1.1 (control)
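One possible way to put numbers on the two scatter plots is to measure the spread of the log2 ratios between the two control replicates and to flag spots in the treated slide that fall well outside that spread. The sketch below does this; the cut-off of k standard deviations and the function names are illustrative assumptions, not part of the workshop material, and the signals are made up.

```python
import math
from statistics import pstdev


def log2_ratios(numerators, denominators):
    """Spot-wise log2(numerator / denominator) for two equally long lists."""
    return [math.log2(a / b) for a, b in zip(numerators, denominators)]


def flag_differential(control1, control2, treatment, k=3.0):
    """Flag spots whose |log2 ratio| vs control1 exceeds k * replicate noise.

    Replicate noise is taken as the standard deviation of the log2 ratios
    between the two control replicates (an assumption of this sketch).
    """
    noise = pstdev(log2_ratios(control2, control1))
    threshold = k * noise
    ratios = log2_ratios(treatment, control1)
    return [abs(r) > threshold for r in ratios], threshold


# Made-up signals for four spots.
control1 = [100, 200, 400, 800]
control2 = [110, 190, 420, 780]
treatment = [100, 800, 400, 790]

flags, threshold = flag_differential(control1, control2, treatment)
print(flags, round(threshold, 3))
```

The slides should be normalised first, otherwise a global intensity difference between slides is counted as differential expression.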
Total-Signal Normalisation: adapt Avg(S) of SlideX to Avg(S) of Control
• S = signal of each spot in a slide
• B = background per slide = Percentile(range, 0.005)
• Assumption: differential spots have a minor effect on Avg(S-2B)
• For Avg(S-2B) of SlideX use only data points with significant S:
  • S > 2B
  • Optional: S < saturation value
• Floor S: if S < 2B, S = 2B_Control
• Normalise all data points of SlideX:
  • Sn = (S - 2B) / Avg(S-2B)_X * Avg(S-B)_Control + [B_Control]
• [Figure: scatter plots of signal per spot (X/Y axes) with the B and 2B levels marked: normalised Treatment1.2 (control) vs Treatment1.1 (control), and normalised Treatment2 vs Treatment1.1 (control)]
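The normalisation steps can also be sketched in Python instead of Excel. This is one reading of the slide, not a reference implementation: the percentile uses linear interpolation (assumed to mirror a spreadsheet PERCENTILE function), the control average is restricted to significant spots in the same way as SlideX, and floored spots are set to 2B of the control after normalisation. All function names are hypothetical.

```python
import math
from statistics import mean


def percentile(values, fraction):
    """Percentile with linear interpolation; fraction is between 0 and 1."""
    ordered = sorted(values)
    position = fraction * (len(ordered) - 1)
    lower = math.floor(position)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (position - lower)


def significant(values, background, saturation=None):
    """Spots with S > 2B and, optionally, S < saturation value."""
    return [s for s in values
            if s > 2 * background and (saturation is None or s < saturation)]


def normalise_slide(slide, control, saturation=None):
    """Sn = (S - 2B) / Avg(S-2B)_X * Avg(S-B)_Control + B_Control."""
    b_x = percentile(slide, 0.005)      # background of SlideX
    b_c = percentile(control, 0.005)    # background of the control slide
    avg_x = mean(s - 2 * b_x for s in significant(slide, b_x, saturation))
    avg_c = mean(s - b_c for s in significant(control, b_c, saturation))
    scale = avg_c / avg_x
    normalised = []
    for s in slide:
        if s < 2 * b_x:
            normalised.append(2 * b_c)  # floor: S < 2B becomes 2B_Control
        else:
            normalised.append((s - 2 * b_x) * scale + b_c)
    return normalised


# Made-up signals: SlideX is simply twice as bright as the control.
control = [10, 20, 100, 200, 400, 800]
slide_x = [20, 40, 200, 400, 800, 1600]
print(normalise_slide(slide_x, control))
```

With real data each list would hold one signal per spot, in the same spot order for both slides.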
Normalisation and ratio calculation in Excel
• Combine duplicate gene-probeset data
• Add ORF names
• Add array gene identifiers
• Normalise using
  • Avg(S) of slide “TotNoAmp” and Avg(S) of all other slides
  • 2B (background) of each slide
• Calculate 2log(R) for relevant slide pairs (e.g. GFP + / -)
  • R per spot = S1/S2 for slide 1 and 2
  • 2logR (base-2 logarithm of R) = symmetric up/down regulation
  • 2log2 = 1
  • 2log1 = 0
  • 2log(1/2) = -1
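The ratio step is easy to check outside Excel. The sketch below combines duplicate probe-set values per gene and computes the 2log ratio between two slides. Averaging the duplicates is an assumption (the slide does not say how they are combined), and the gene names and signals are made up.

```python
import math


def combine_duplicates(rows):
    """Average the signals of duplicate probe sets; rows = (gene_id, signal)."""
    totals = {}
    for gene, signal in rows:
        total, count = totals.get(gene, (0.0, 0))
        totals[gene] = (total + signal, count + 1)
    return {gene: total / count for gene, (total, count) in totals.items()}


def log2_ratio(s1, s2):
    """2log(R) with R = S1/S2; up and down regulation become symmetric."""
    return math.log2(s1 / s2)


# Made-up normalised signals for two slides (e.g. GFP+ and GFP-).
gfp_plus = combine_duplicates([("geneA", 700), ("geneA", 900), ("geneB", 100)])
gfp_minus = combine_duplicates([("geneA", 400), ("geneB", 200)])

for gene in gfp_plus:
    print(gene, log2_ratio(gfp_plus[gene], gfp_minus[gene]))
```

Here geneA has R = 2 and geneB has R = 1/2, which give 2logR values of equal size and opposite sign.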
ORF names
• Find the B. subtilis genome in NCBI
• Download the BSU and ORF name list to Excel
T-profiler: identifies significant differential expression of regulons, pathways, cognitive groups etc.
• Profilers:
  • http://www.science.uva.nl/~boorsma/t-profiler-bacillusnew/ (try first; needs ORF + R-column)
  • http://biocyc.org/LMON265669/expression.html
  • http://mgv2.cmbi.ru.nl/genome/index.html
  • http://www.geneontology.org/
Workshop Task
• Identify genes/pathways/regulons related to motility-related GFP expression
• Get data from the internet (GEO, NCBI ….?)
• Pre-process data in Excel (normalisation etc.)
• Identify relevant scientific questions and the corresponding array-pair comparisons and controls, e.g.:
  • +/- GFP
  • Reproducibility
  • Effect of cDNA amplification