Microarray data analysis Annamaria Carissimo carissimo@tigem.it
Outline
• Microarray analysis pipeline
Practicals:
• ArrayExpress
• Gene Ontology with the DAVID tool
• Gene Set Enrichment Analysis (GSEA)
What is a DNA microarray?
• A grid of DNA spots on a substrate (chip), used to detect complementary sequences
• It monitors the expression of several thousand genes at the same time
Hybridization on a chip
• Labeled cDNA/RNA is hybridized to the probe array, giving the hybridized array
• Detection: a fluorescent stain is used for data acquisition
• Intensity -> how much hybridization occurred for each probe
Data flow
• Chip scanning
• Image processing
• Intensity files: .CEL (Affymetrix), .txt (Illumina, Agilent)
• Data analysis using our pipeline
Microarray analysis pipeline http://microarrayanalysis.tigem.it/index_i.html
Supported platforms
• 3’ expression arrays
  Mouse -> MOE430A, Mouse430_2, MG_U74Av2
  Human -> HG-U133A, HG-U133A_2, HG-U133_Plus_2
• Whole-transcript expression and exon arrays
  Mouse -> Mouse Gene 1.0 ST, Mouse Exon 1.0 ST
  Human -> Human Gene 1.0 ST, Human Exon 1.0 ST
• Agilent GE 4x44
  Human and Mouse -> two color and one color
• Illumina BeadChip
  Human and Mouse -> WG-6, Ref-8 and HT-12
Affymetrix 3’ microarray
• A chip consists of a number of probesets.
• Each probeset is intended to measure the expression of a specific mRNA.
• Each probeset is complementary to a target sequence, which is derived from one or more mRNA sequences.
• A probeset consists of 25-mer probe pairs selected from the target sequence: one Perfect Match (PM) and one Mismatch (MM) probe for each chosen target position.
• Each chip has a corresponding Chip Description File (CDF), which (among other things) describes probe locations and probeset groupings on the chip.
Target sequences and probes
Example:
• 1415771_at
• Description: Mus musculus nucleolin mRNA, complete cds
• LocusLink: AF318184.1 (NT sequence is 2412 bp long)
• Target sequence is 129 bp long
• 11 probe pairs tile the target sequence:
gagaagtcaaccatccaaaactctgtttgtcaaaggtctgtctgaggataccactgaagagaccttaaaagaatcatttgagggctctgttcgtgcaagaatagtcactgatcgggaaactggttctt
Affymetrix probeset
• Probe pair:
  Perfect match: ctgtctgaggataccactgaagaga
  Mismatch:      ctgtctgaggattccactgaagaga
• Probe pair values -> summarization -> ONE probeset value
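The relation between the two probes in the pair above can be sketched in a few lines of Python. On Affymetrix 3’ arrays the mismatch probe is the perfect-match probe with its middle (13th) base replaced by its complement, which is what the two sequences on the slide show (a -> t). The function name `mismatch_probe` is hypothetical and used only for illustration.

```python
# Sketch: derive a Mismatch (MM) probe from a Perfect Match (PM) probe.
# mismatch_probe is a hypothetical helper name, not part of any Affymetrix tool.
COMPLEMENT = {"a": "t", "t": "a", "c": "g", "g": "c"}


def mismatch_probe(perfect_match: str) -> str:
    """Return the PM probe with its middle base replaced by its complement."""
    middle = len(perfect_match) // 2  # index 12, i.e. the 13th base of a 25-mer
    swapped = COMPLEMENT[perfect_match[middle]]
    return perfect_match[:middle] + swapped + perfect_match[middle + 1:]


pm = "ctgtctgaggataccactgaagaga"  # PM probe from the slide
mm = mismatch_probe(pm)
print("PM:", pm)
print("MM:", mm)
```

Because the MM probe differs from the target at a single central base, its signal is meant to estimate non-specific hybridization for the corresponding PM probe.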
Background correction and normalization
• Goal: compare different samples hybridized on different microarray chips
• Example:
  Control: sample1, sample2, sample3 (replicates)
  Treatment: sample1, sample2, sample3 (replicates)
• Normalize all chips together
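To make "normalize all together" concrete, here is a minimal sketch of quantile normalization, the normalization step used by RMA. The slide does not say which algorithm the pipeline applies at this point, so treat the choice as an assumption; the function name and the toy intensities are hypothetical.

```python
# Sketch: quantile normalization across chips, in pure Python.
# Assumption: every chip has the same number of probes; ties are broken by position.
def quantile_normalize(samples):
    """samples: list of equal-length lists, one list of probe intensities per chip."""
    n_probes = len(samples[0])
    # Reference distribution: mean of the r-th smallest value across all chips.
    sorted_chips = [sorted(chip) for chip in samples]
    reference = [
        sum(chip[r] for chip in sorted_chips) / len(samples)
        for r in range(n_probes)
    ]
    normalized = []
    for chip in samples:
        # Rank the probes within the chip, then replace each value
        # by the reference value of the same rank.
        order = sorted(range(n_probes), key=lambda i: chip[i])
        out = [0.0] * n_probes
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized


# Toy data: two chips, three probes each (hypothetical values).
chips = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
for chip in quantile_normalize(chips):
    print(chip)
```

After this step every chip has the same intensity distribution, so remaining differences between control and treatment chips are differences in probe ranks rather than in overall chip brightness.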
Differential expression
• Goal: compare two biologically different conditions by identifying differentially expressed genes
• Example:
  Control: sample1, sample2, sample3 (replicates)
  Treatment: sample1, sample2, sample3 (replicates)
• t-test for each gene
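A per-gene test of this kind could look like the sketch below. It computes a plain Welch t statistic for each gene from three control and three treatment replicates; it is a simplified stand-in and not necessarily the exact test run by the pipeline. The gene names and expression values are made up for illustration.

```python
# Sketch: one t statistic per gene, control versus treatment.
# Assumption: unpaired design, log-scale expression values, non-zero variance.
from math import sqrt
from statistics import mean, variance


def welch_t(control, treatment):
    """Welch t statistic for two independent groups of replicates."""
    standard_error = sqrt(
        variance(control) / len(control) + variance(treatment) / len(treatment)
    )
    return (mean(treatment) - mean(control)) / standard_error


# Hypothetical expression values: gene -> (control replicates, treatment replicates)
expression = {
    "geneA": ([7.1, 7.3, 7.2], [9.0, 9.2, 9.1]),
    "geneB": ([5.0, 5.4, 4.8], [5.1, 5.0, 5.3]),
}

for gene, (control, treatment) in expression.items():
    print(gene, round(welch_t(control, treatment), 2))
```

With only three replicates per condition the variance estimates are noisy, which is one motivation for regularized tests.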
Processing microarray data (from .CEL files to gene expression)
• Background correction
• Normalization
• Expression summary
  Microarray Analysis Suite (MAS5) (Affymetrix proprietary method)
  Robust Multi-array Average (RMA) (Irizarry, 2003)
• Identifying significantly differentially expressed genes in treatment versus control
  Bayesian t-test (Cyber-T tool)
  Multiple testing correction -> false discovery rate (FDR)
  Paired or unpaired design?
• Output is a text file (Excel-compatible) with the resulting analysis.
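As an illustration of the multiple testing step, the sketch below adjusts a list of per-gene p-values with the Benjamini-Hochberg procedure. The slides only say "FDR" and do not name the procedure, so Benjamini-Hochberg is an assumption here, and the input p-values are invented.

```python
# Sketch: Benjamini-Hochberg FDR adjustment of per-gene p-values.
# Assumption: the slide's "FDR" is taken to mean Benjamini-Hochberg.
def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted
    # values never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.5]  # hypothetical p-values for four genes
for p, q in zip(raw, benjamini_hochberg(raw)):
    print(p, round(q, 4))
```

Genes whose adjusted value falls below the chosen FDR threshold would be reported as differentially expressed.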
Microarray Pipeline - step 1: upload your .CEL files
• On Mac:
Microarray Pipeline - step 1: upload your .CEL files
• On Windows: