Microbial Transcriptomics Data Analysis Workshop: Identifying Gene Expression Profiles

Workshop: Analysis of Microbial Transcriptomics Data
- Data analysis of mRNA expression profiles
- Identification of differential gene expression
Martien Caspers, BSc
Remco Kort, Prof. Dr.
TNO, Microbiology & Systems Biology, Zeist, NL
16 Sep 2011, VU, Amsterdam

mRNA expression profiles
• Arrays (this course):
  • Hybridisation of (fluorescently) labelled cDNA populations (whole cell) to arrays with probes of known DNA sequences / functions / genes
  • Result: signal per probe
• RNA-seq:
  • Mass sequencing of cDNA populations
  • Mass BLAST of the reads against annotated genome(s)
  • Result: number of sequences per gene / genetic element

Types of arrays
• Oligo arrays
  • Designed from annotated genomes
  • Industry-made (NimbleGen, Affymetrix, Agilent)
  • ~50-80 bases/feature
  • ~5-20 µm/feature
• Lab-made arrays
  • Random genomic or cDNA fragments
  • 500-2000 bp PCR fragments/feature
  • Printed on glass with a spotter
  • 100-200 µm/spot

Experimental use of arrays
• Double hybridisation: 2-color
  • Treated sample, e.g. Cy5-labelled
  • Untreated sample (ctrl), e.g. Cy3-labelled
  • Each spot is its own control → corrects for spot differences per slide; needed for lab-spotted arrays
  • Disadvantage: each slide needs a ctrl
• Single hybridisation (this presentation)
  • Industry arrays → reproducible spots from slide to slide
  • Each sample on a separate slide

Study design
• Cell sorter: total population, GFP+ fraction, GFP- fraction
• RNA → cDNA: 1x (total), 1x (GFP+), 1x (GFP-)
• Klenow polymerase-amplified cDNA: 1x (total), 2x (GFP+), 2x (GFP-)
• In total: 8 expression array hybridisations
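One way to read the design slide is that each "1x"/"2x" counts array hybridisations, which adds up to the stated 8 arrays. The sketch below encodes that reading and generates one label per slide; apart from "TotNoAmp" (the slide name used in the normalisation step), the labels are hypothetical names chosen for illustration.

```python
# Assumed reading of the study-design slide: the numbers are array
# hybridisations per population. Labels other than "TotNoAmp" are hypothetical.
DESIGN = {
    # population: (unamplified cDNA arrays, Klenow-amplified cDNA arrays)
    "Tot": (1, 1),
    "GFPpos": (1, 2),
    "GFPneg": (1, 2),
}


def slide_labels(design):
    """Return one label per hybridisation, e.g. 'TotNoAmp', 'GFPposAmp2'."""
    labels = []
    for population, (n_unamp, n_amp) in design.items():
        for kind, n in (("NoAmp", n_unamp), ("Amp", n_amp)):
            for i in range(n):
                # Number the replicates only when there is more than one.
                suffix = str(i + 1) if n > 1 else ""
                labels.append(f"{population}{kind}{suffix}")
    return labels


labels = slide_labels(DESIGN)
print(len(labels))  # 8
print(labels)
```

Listing the slides this way makes the later pair comparisons (GFP+ vs GFP-, amplified vs unamplified, replicate vs replicate) easy to enumerate.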

Public microarray data: GEO database
• http://www.ncbi.nlm.nih.gov/geo/
• Study identifier: GSE16345
• B. subtilis heterogeneity: motile/non-motile

Description of the NimbleGen array design
• Name: TI224308_60mer_expr
• 4,104 genes from B. subtilis subsp. subtilis strain 168 (NC_000964)
• Eight probe pairs (PM/MM: perfect match/mismatch) per gene
• Each probe is replicated 2 times
• The design includes control probes (random oligos)
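With eight PM/MM pairs per gene and two spots per probe, probe-level data must be collapsed to one value per gene. The sketch below is a deliberately simplified stand-in, not RMA (which also applies background correction, quantile normalisation and median polish across arrays): it averages the duplicate spots of each probe and takes the median log2 PM over a gene's probes. The input layout, tuples of (gene, probe, PM, MM), is an assumption and not the column layout of the .pair file.

```python
import math
from collections import defaultdict
from statistics import mean, median


def summarise_genes(records):
    """Collapse spot-level records to one log2 value per gene.

    records: iterable of (gene_id, probe_id, pm, mm) tuples, one per spot
    (assumed layout). MM values are ignored, as RMA also uses PM only.
    """
    # 1. Average the replicate spots of each probe (2 per probe on this design).
    per_probe = defaultdict(list)
    for gene, probe, pm, _mm in records:
        per_probe[(gene, probe)].append(pm)

    # 2. Median of the probes' log2 PM values per gene.
    per_gene = defaultdict(list)
    for (gene, _probe), pms in per_probe.items():
        per_gene[gene].append(math.log2(mean(pms)))
    return {gene: median(values) for gene, values in per_gene.items()}
```

The median makes the gene value robust to a single deviating probe, which is the same motivation behind the median polish step in RMA.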


GEO content download
• .pdf = array oligo specs
• .ngd = array gene identifiers
• .pair = expression data (eight probe pairs (PM/MM) per gene)
• .ndf = grid file used to read the array features
• GPL7146_Layout_spotted_oligo_array_Bsub_9.2k.txt = array layout
• GSE16345_series_matrix.txt.gz = RMA-normalised dataset: each gene oligo set → 1 value
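The series-matrix file is tab-separated text: metadata lines start with "!", and the expression table sits between the lines !series_matrix_table_begin and !series_matrix_table_end, with an ID_REF column followed by one column per sample (GSM). A minimal standard-library reader, assuming that layout, could look like this; empty cells are read as NaN.

```python
import csv
import gzip


def read_series_matrix(lines):
    """Return (sample_ids, {id_ref: [values]}) from GEO series-matrix lines."""
    inside = False
    header = None
    table = {}
    for row in csv.reader(lines, delimiter="\t"):  # csv strips the quotes
        if not row:
            continue
        if row[0] == "!series_matrix_table_begin":
            inside = True
            continue
        if row[0] == "!series_matrix_table_end":
            break
        if not inside:
            continue  # metadata line before the table
        if header is None:
            header = row[1:]  # sample IDs after the ID_REF column
            continue
        table[row[0]] = [float(v) if v else float("nan") for v in row[1:]]
    return header, table


def load_series_matrix(path):
    """Read e.g. GSE16345_series_matrix.txt.gz directly from the gzip file."""
    with gzip.open(path, "rt", newline="") as fh:
        return read_series_matrix(fh)
```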

Array data (.pair)

Array data (RMA-normalised; GSE16345_series_matrix.txt.gz)

Expression data
[Figure: scatter plots of signal per spot on the X/Y axes; Treatment1.1 (control) vs Treatment1.2 (control) shows the "noise" between replicates, Treatment1.1 (control) vs Treatment2 shows differential expression.]
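The replicate plot suggests a simple use of the two controls: their log2 ratios show how much spread arises from noise alone, and a treatment-vs-control ratio is only interesting when it clearly exceeds that spread. The sketch below illustrates this idea; the population standard deviation as noise measure and the threshold k = 2 are illustrative assumptions, not values from the workshop.

```python
import math
from statistics import pstdev


def log2_ratios(a, b):
    """log2(a/b) per gene, for genes present in both dicts."""
    return {g: math.log2(a[g] / b[g]) for g in a if g in b}


def replicate_noise(ctrl1, ctrl2):
    """Spread (population SD) of log2 ratios between two control replicates."""
    return pstdev(list(log2_ratios(ctrl1, ctrl2).values()))


def flag_differential(treated, ctrl, noise_sd, k=2.0):
    """Genes whose |log2(treated/ctrl)| exceeds k times the replicate noise.

    k = 2 is an arbitrary illustrative threshold (assumption).
    """
    ratios = log2_ratios(treated, ctrl)
    return sorted(g for g, r in ratios.items() if abs(r) > k * noise_sd)
```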


Total-signal normalisation: adapt Avg(S) of slide X to Avg(S) of the control
• S = signal of each spot on a slide
• B = background per slide = PERCENTILE(range, 0.005)
• Assumption:
  • Differential spots have a minor effect on Avg(S-2B)
• For Avg(S-2B) of slide X use only data points with significant S:
  • S > 2B
  • Optional: S < saturation value
• Floor S: if S < 2B → S = 2B_Control
• Normalise all data points of slide X:
  • Sn = (S-2B) / Avg(S-2B)_X * Avg(S-B)_Control + [B_Control]
[Figure: normalised Treatment1.2 (control) and normalised Treatment2 plotted against Treatment1.1 (control), signal per spot on the X/Y axes, with the B and 2B levels marked.]
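The normalisation recipe above can be sketched in code. Several details are assumptions where the slide leaves room: the background uses Excel's PERCENTILE convention (linear interpolation over rank p·(n-1)); the control average Avg(S-B)_Control, kept as written on the slide, is taken over control spots with S > 2B_Control; the floor step is read as clamping normalised values at 2·B_Control; and the optional [B_Control] term is a parameter.

```python
import math
from statistics import mean


def percentile_inc(values, p):
    """Linear-interpolation percentile (Excel PERCENTILE / PERCENTILE.INC convention)."""
    xs = sorted(values)
    pos = p * (len(xs) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])


def total_signal_normalise(slide, control, p=0.005, saturation=None,
                           add_control_background=True):
    """Scale slide X so its average significant signal matches the control slide."""
    b_x = percentile_inc(slide, p)    # background B of slide X
    b_c = percentile_inc(control, p)  # background B of the control slide

    def significant(s, b):
        # Only spots above 2B (and optionally below saturation) enter the averages.
        return s > 2 * b and (saturation is None or s < saturation)

    avg_x = mean(s - 2 * b_x for s in slide if significant(s, b_x))
    # Assumption: the control average uses the same S > 2B filter.
    avg_c = mean(s - b_c for s in control if significant(s, b_c))
    offset = b_c if add_control_background else 0.0
    floor = 2 * b_c  # assumed reading of "Floor S: S = 2B_Control"
    return [max((s - 2 * b_x) / avg_x * avg_c + offset, floor) for s in slide]
```

Applied to every slide against the same control (the slide "TotNoAmp" in the Excel step), this puts all slides on a common intensity scale before ratios are taken.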

Normalisation and ratio calculation in Excel
• Combine duplicate gene-probeset data
• Add ORF names
• Add array gene identifiers
• Normalise using:
  • Avg(S) of slide "TotNoAmp" and Avg(S) of all other slides
  • 2B (background) of each slide
• Calculate log2(R) for relevant slide pairs (e.g. GFP+ / GFP-)
  • R per spot = S1/S2 for slides 1 and 2
  • log2(R) makes up- and down-regulation symmetric:
    • log2(2) = 1
    • log2(1) = 0
    • log2(1/2) = -1
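The ratio step is a per-gene log2 of S1/S2 (in Excel, =LOG(S1/S2, 2) gives the same value). A small sketch, assuming the normalised signals of each slide are stored as a dict keyed by gene identifier:

```python
import math


def log2_ratio(s1, s2):
    """log2(R) with R = S1/S2: +1 is two-fold up, -1 two-fold down, 0 unchanged."""
    if s1 <= 0 or s2 <= 0:
        raise ValueError("signals must be positive (floor them first)")
    return math.log2(s1 / s2)


def ratio_table(slide1, slide2):
    """log2(S1/S2) for every gene measured on both slides, e.g. GFP+ vs GFP-."""
    return {g: log2_ratio(slide1[g], slide2[g])
            for g in sorted(slide1.keys() & slide2.keys())}


print(log2_ratio(2, 1), log2_ratio(1, 1), log2_ratio(1, 2))  # 1.0 0.0 -1.0
```

The positivity check is why flooring matters: without it, background-subtracted signals near zero would give undefined or extreme log ratios.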

ORF names
• Find the B. subtilis genome in NCBI
• Download the list of BSU identifiers and ORF names to Excel
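Adding ORF names is a lookup join on the BSU identifier, the same operation as a VLOOKUP in Excel. A sketch, assuming the downloaded list has been read as (locus tag, ORF name) pairs; the identifiers in the usage example are hypothetical.

```python
def annotate(expression, orf_names):
    """Attach ORF names to expression values keyed by BSU locus tag.

    expression: {locus_tag: value}; orf_names: iterable of (locus_tag, orf_name).
    Tags without a name keep the tag itself, like a VLOOKUP with a fallback.
    """
    names = dict(orf_names)
    return [(tag, names.get(tag, tag), value)
            for tag, value in sorted(expression.items())]


# Hypothetical identifiers, for illustration only.
rows = annotate({"BSU_A": 1.5, "BSU_B": -0.2}, [("BSU_A", "orfA")])
print(rows)
```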

T-profiler: identifies significant differential expression of regulons, pathways, cognitive groups, etc.
Profilers to try first:
• http://www.science.uva.nl/~boorsma/t-profiler-bacillusnew/ (needs ORF + R column)
• http://biocyc.org/LMON265669/expression.html
• http://mgv2.cmbi.ru.nl/genome/index.html
• http://www.geneontology.org/
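The idea behind a group-level profiler such as T-profiler is to ask whether the genes of a regulon or pathway shift together, by comparing their mean log2 ratio with that of all other genes. The sketch below computes a pooled-variance two-sample t-statistic for one gene group; it illustrates the principle as we understand it and does not reproduce the web tool's exact statistics, p-values or multiple-testing correction.

```python
import math
from statistics import mean, variance


def group_t(log_ratios, group):
    """Pooled-variance t-statistic: genes in `group` vs all other genes.

    log_ratios: {gene: log2(R)}; group: set of gene identifiers (e.g. a regulon).
    """
    inside = [r for g, r in log_ratios.items() if g in group]
    outside = [r for g, r in log_ratios.items() if g not in group]
    n1, n2 = len(inside), len(outside)
    if n1 < 2 or n2 < 2:
        raise ValueError("need at least two genes inside and outside the group")
    pooled_var = ((n1 - 1) * variance(inside)
                  + (n2 - 1) * variance(outside)) / (n1 + n2 - 2)
    return (mean(inside) - mean(outside)) / math.sqrt(
        pooled_var * (1 / n1 + 1 / n2))
```

A large positive t for a group means its genes are, on average, up-regulated relative to the rest of the genome in that comparison (e.g. GFP+ vs GFP-).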

Workshop task
• Identify genes/pathways/regulons related to motility-related GFP expression
• Get data from the internet (GEO, NCBI, ...?)
• Pre-process the data in Excel (normalisation etc.)
• Identify relevant scientific questions and the corresponding array-pair comparisons and controls, e.g.:
  • +/- GFP
  • Reproducibility
  • Effect of cDNA amplification
