
Workshop: Analysis of Microbial Transcriptomics Data

Workshop: Analysis of Microbial Transcriptomics Data. Data analysis of mRNA expression profiles; identification of differential gene expression. Martien Caspers, BSc; Remco Kort, Prof. Dr.; TNO, Microbiology & Systems Biology, Zeist, NL. 16 Sep 2011, VU, Amsterdam.


Presentation Transcript


  1. Workshop: Analysis of Microbial Transcriptomics Data - Data analysis of mRNA expression profiles - Identification of differential gene expression. Martien Caspers, BSc; Remco Kort, Prof. Dr.; TNO, Microbiology & Systems Biology, Zeist, NL. 16 Sep 2011, VU, Amsterdam

  2. mRNA expression profiles
     • Arrays (this course):
       • Hybridisation of (fluorescently) labelled cDNA populations (whole cell) to arrays with probes of known DNA sequences / functions / genes
       • Result: signal per probe
     • RNAseq:
       • Mass sequencing of cDNA populations
       • Mass BLAST against annotated genome(s)
       • Result: number of sequences per gene / genetic element

  3. Types of arrays
     • Oligo arrays
       • Designed from annotated genomes
       • Industry-made (NimbleGen, Affymetrix, Agilent)
       • ~50-80 bases per feature
       • ~5-20 µm per feature
     • Lab-made arrays
       • Random genomic or cDNA fragments
       • 500-2000 bp PCR fragments per feature
       • Printed on glass with a spotter
       • 100-200 µm per spot

  4. Experimental use of arrays
     • Double hybridisation: 2 colours
       • Treated sample, e.g. Cy5-labelled
       • Untreated sample (ctrl), e.g. Cy3-labelled
       • Each spot is its own control → corrects for spot differences between slides → needed for lab-spotted arrays
       • Disadvantage: each slide needs a ctrl
     • Single hybridisation (this presentation)
       • Industry arrays → reproducible spots from slide to slide
       • Each sample on a separate slide

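  5. Study design
     • Cell sorter: total population → GFP+ and GFP- fractions
     • RNA → cDNA (not amplified): 1x total population, 1x GFP+, 1x GFP-
     • Klenow polymerase-amplified cDNA: 1x total population, 2x GFP+, 2x GFP-
     • In total: 8 expression array hybridisations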

  6. Public microarray data: GEO database
     • http://www.ncbi.nlm.nih.gov/geo/
     • Study identifier: GSE16345
     • B. subtilis heterogeneity motile/non-motile

  7. Description of the NimbleGen array design
     • Name: TI224308_60mer_expr.
     • 4,104 genes from B. subtilis subsp. subtilis strain 168 (NC_000964)
     • Eight probe pairs (PM/MM) per gene
     • Each probe is replicated 2 times
     • The design includes control probes (random oligos)

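  8. Study design
     • Cell sorter: total population → GFP+ and GFP- fractions
     • RNA → cDNA (not amplified): 1x total population, 1x GFP+, 1x GFP-
     • Klenow polymerase-amplified cDNA: 1x total population, 2x GFP+, 2x GFP-
     • In total: 8 expression array hybridisations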

  9. GEO content download
     • .pdf = array oligo specs
     • .ngd = array gene identifiers
     • .pair = expression data (eight probe pairs (PM/MM) per gene)
     • .ndf = grid file used to read array features
     • GPL7146_Layout_spotted_oligo_array_Bsub_9.2k.txt = array layout
     • GSE16345_series_matrix.txt.gz = RMA-normalised dataset: gene oligo sets → 1 value
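The workshop itself pre-processes the data in Excel, but the series matrix file can also be read with a short script. The sketch below relies on the GEO series matrix layout: tab-separated text, metadata lines starting with "!", and the data table enclosed between "!series_matrix_table_begin" and "!series_matrix_table_end". The function name read_series_matrix is illustrative, and treating empty or non-numeric cells as missing values is an assumption.

```python
import csv
import gzip


def _to_float(text):
    """Convert a table cell to float; empty or non-numeric cells become None."""
    try:
        return float(text)
    except ValueError:
        return None


def read_series_matrix(path):
    """Return (sample_ids, {row_id: [values]}) from a GEO series matrix file."""
    opener = gzip.open if str(path).endswith(".gz") else open
    samples, table, in_table = [], {}, False
    with opener(path, "rt", encoding="utf-8", newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if not row:
                continue
            if row[0] == "!series_matrix_table_begin":
                in_table = True
            elif row[0] == "!series_matrix_table_end":
                break
            elif in_table and not samples:
                samples = row[1:]  # header row: "ID_REF" followed by sample ids
            elif in_table:
                table[row[0]] = [_to_float(v) for v in row[1:]]
    return samples, table
```

Usage: samples, table = read_series_matrix("GSE16345_series_matrix.txt.gz") gives one list of values per row identifier, in the same order as the sample ids.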

  10. Array data (.pair)

  11. Array data (RMA-normalised) (GSE16345_series_matrix.txt.gz)

  12. Expression data
     [Figure: scatter plots, X/Y axes = signal per spot. Treatment1.2 (control) against Treatment1.1 (control): “noise” between replicates. Treatment2 against Treatment1.1 (control): differential expression.]

  13. Total-signal normalisation: adapt Avg(S)_SlideX to Avg(S)_Control
     • S = signal of each spot in a slide
     • B = background per slide = Percentile(range, 0.005)
     • Assumption: differential spots have a minor effect on Avg(S-2B)
     [Figure: scatter plots, X/Y axes = signal per spot, with the B and 2B levels marked. Normalised Treatment1.2 (control) and normalised Treatment2, each against Treatment1.1 (control).]

  14. Total-signal normalisation: adapt Avg(S)_SlideX to Avg(S)_Control
     • S = signal of each spot in a slide
     • B = background per slide = Percentile(range, 0.005)
     • Assumption: differential spots have a minor effect on Avg(S-2B)
     • For Avg(S-2B) of SlideX use only data points with significant S:
       • S > 2B
       • Optional: S < saturation value
     • Floor S: if S < 2B → S = 2B_Control
     • Normalise all data points of SlideX:
       • Sn = (S - 2B) / Avg(S - 2B)_X * Avg(S - B)_Control + [B_Control]
     [Figure: scatter plots, X/Y axes = signal per spot, with the B and 2B levels marked. Normalised Treatment1.2 (control) and normalised Treatment2, each against Treatment1.1 (control).]
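The normalisation steps on this slide can be sketched in Python as a cross-check for the Excel sheet. This is a minimal sketch under stated assumptions, not the authors' worksheet: percentile() is assumed to mirror Excel's PERCENTILE (linear interpolation), the "significant S" filter is assumed to apply to the control slide as well, and the floor is read as "spots below 2B on SlideX get the value 2B_Control". The formula is copied as written on the slide, including Avg(S - B) and + B_Control for the control terms.

```python
from statistics import mean


def percentile(values, k):
    """Linear-interpolation percentile, k in [0, 1] (assumed to mirror Excel's PERCENTILE)."""
    xs = sorted(values)
    pos = k * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])


def significant(signals, b, saturation=None):
    """Keep spots with S > 2B and, optionally, S < saturation value."""
    return [s for s in signals
            if s > 2 * b and (saturation is None or s < saturation)]


def normalise_slide(slide_x, control, saturation=None):
    """Scale the signals of slide X to the control slide, as on the slide above."""
    b_x = percentile(slide_x, 0.005)   # background of slide X
    b_c = percentile(control, 0.005)   # background of the control slide
    avg_x = mean(s - 2 * b_x for s in significant(slide_x, b_x, saturation))
    # Assumption: the same "significant S" filter is used for the control.
    avg_c = mean(s - b_c for s in significant(control, b_c, saturation))
    normalised = []
    for s in slide_x:
        if s < 2 * b_x:
            normalised.append(2 * b_c)  # floor (one reading of "S = 2B_Control")
        else:
            normalised.append((s - 2 * b_x) / avg_x * avg_c + b_c)
    return normalised
```

Saturated spots are only left out of the averages; they are still normalised like every other spot.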

  15. Normalisation and ratio calculation in Excel
     • Combine duplicate gene-probeset data
     • Add ORF names
     • Add array gene identifiers
     • Normalise using
       • Avg(S) of slide “TotNoAmp” and Avg(S) of all other slides
       • 2B (background) of each slide
     • Calculate 2log(R) for relevant slide pairs (e.g. GFP +/-)
       • R per spot = S1/S2 for slides 1 and 2
       • 2log(R) = symmetric up/down regulation
         • 2log(2) = 1
         • 2log(1) = 0
         • 2log(1/2) = -1
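The ratio step can be checked outside Excel with a few lines of Python. In this sketch the function names and gene identifiers are hypothetical, and "combine duplicates" is assumed to mean averaging the signals that share a gene identifier.

```python
from collections import defaultdict
from math import log2
from statistics import mean


def combine_duplicates(rows):
    """Average the signals of rows sharing a gene identifier (assumed meaning of "combine").

    rows: iterable of (gene_id, signal) pairs.
    """
    grouped = defaultdict(list)
    for gene_id, signal in rows:
        grouped[gene_id].append(signal)
    return {gene_id: mean(signals) for gene_id, signals in grouped.items()}


def log2_ratios(slide1, slide2):
    """2log(S1/S2) for every gene present on both slides."""
    return {g: log2(slide1[g] / slide2[g]) for g in slide1 if g in slide2}


# Hypothetical normalised signals for a GFP+ / GFP- slide pair.
gfp_plus = combine_duplicates([("geneA", 300), ("geneA", 500), ("geneB", 100), ("geneC", 50)])
gfp_minus = combine_duplicates([("geneA", 200), ("geneB", 100), ("geneC", 100)])
ratios = log2_ratios(gfp_plus, gfp_minus)
print(ratios)  # {'geneA': 1.0, 'geneB': 0.0, 'geneC': -1.0}
```

A twofold increase and a twofold decrease give +1 and -1, which is the symmetry the slide refers to.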

  16. ORF names
     • Find the B. subtilis genome in NCBI
     • Download the BSU and ORF name list to Excel

  17. T-profiler: identifies significant differential expression of regulons, pathways, cognitive groups etc.
     • Profilers (try the first one first):
       • http://www.science.uva.nl/~boorsma/t-profiler-bacillusnew/ (needs ORF + R-column)
       • http://biocyc.org/LMON265669/expression.html
       • http://mgv2.cmbi.ru.nl/genome/index.html
       • http://www.geneontology.org/
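The idea behind group-level scoring can be illustrated with a plain t statistic: compare the mean 2log(R) of the genes in a predefined group (e.g. a regulon) with the mean of all other genes. The sketch below uses a generic Welch t statistic for illustration only; it is not T-profiler's implementation, whose exact statistic and pre-processing may differ. Gene names and values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def group_t_value(log_ratios, group):
    """Welch t statistic: mean log-ratio of genes in `group` versus all other genes.

    log_ratios: {gene_id: 2log(R)}; group: set of gene ids.
    Needs at least two genes inside and two outside the group.
    """
    inside = [v for g, v in log_ratios.items() if g in group]
    outside = [v for g, v in log_ratios.items() if g not in group]
    standard_error = sqrt(variance(inside) / len(inside)
                          + variance(outside) / len(outside))
    return (mean(inside) - mean(outside)) / standard_error


# Hypothetical log-ratios: genes a-c form one group and are all up-regulated.
demo = {"a": 1.0, "b": 1.2, "c": 0.8, "d": 0.0, "e": 0.1, "f": -0.1}
t_value = group_t_value(demo, {"a", "b", "c"})
```

A large positive value indicates that the group as a whole is up-regulated relative to the rest of the genome; a large negative value indicates down-regulation.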

  18. Workshop task
     • Identify genes/pathways/regulons related to motility-related GFP expression
     • Get data from the internet (GEO, NCBI ….?)
     • Pre-process data in Excel (normalisation etc.)
     • Identify relevant scientific questions and the corresponding array-pair comparisons and controls, e.g.:
       • +/- GFP
       • Reproducibility
       • Effect of cDNA amplification
