A Review/Update of the ERCC & MAQC Microarray Consortia and Some Applications of Their Findings. “Expressionist” Seminar Group, Johns Hopkins School of Public Health. Ernest S. Kawasaki, NCI Advanced Technology Center Microarray Facility. August 9, 2006.
• ERCC Summary/Update (External RNA Controls Consortium) • MAQC Summary/Update (MicroArray Quality Control Consortium) • Possible Use of ERCC/MAQC Standards & Large Data Set
Organizations/Consortia Developing Standards & Controls for Gene Expression Profiling Technologies • MGED -- Microarray Gene Expression Database: standard for data reporting (MIAME) • MAQC -- Microarray Quality Control Group: FDA-sponsored RNA standards, reference datasets, etc. (Leming Shi et al) • ERCC -- External RNA Controls Consortium (M. Salit, J. Warrington et al) • NIST -- Metrology for Gene Expression Program: provides a better understanding of the fundamentals of microarray technologies (M. Salit, M. Satterfield, et al)
Rapid Increase in Microarray Publications • 2005 -- the 10-year anniversary of the first expression microarray • No common standards are used across platforms, so data are difficult or impossible to compare. [Bar chart: number of publications per year for 1995-8, 1999, 2000, 2001, 2002, 2003, 2004 and 2005, rising from 55 to 5400; yearly summary from PubMed]
Proliferation of Whole Genome Arrays (platform: probe length, probe sets) • ABI: 60mer, 31,000 • Affymetrix: 25mer, 54,000 • Agilent: 60mer, 44,000 • GE Amersham: 30mer, 55,000 • Illumina: 50mer, 46,000 • Microarrays Inc.: 70mer, 49,000 • NimbleGen: 60mer, 38,000 • Phalanx Biotech: 60mer, ~30,000 • Home Brew: 70mer, cDNA, etc., ~40,000 • Many other companies (Combimatrix) are making smaller custom arrays. • A DNA-DNA hybrid occupies ~4 nm² on the slide surface.
ERCC: External RNA Controls Consortium • Conception in March 2003, Stanford University • The private, public, and academic sectors working together to produce control materials for gene expression analysis. • Mark Salit (NIST) / Janet Warrington (Affymetrix)
Mission of the ERCC The ERCC is developing external RNA controls useful for gene expression assays in Microarrays & QRT-PCR on a wide variety of platforms. J. Warrington -- Affymetrix
Members of the ERCC • More than 70 and counting… • A good mix of academic, government and commercial organizations with ~115 scientists, 10 countries • Affymetrix; Agilent; Ambion; Applied Biosystems; ATCC; Biomerieux; BMS; Cambridge University; Capital Bio; Celera Diagnostics; Cenetron; Centers for Disease Control; Centers for Medicare & Medicaid Services; Clinical & Laboratory Standards Institute; Clinical Hospital Center Zagreb; Combimatrix; Eli Lilly; Eppendorf Microarray Division; Expression Analysis; FDA, CBER; FDA, CDER; FDA, CDRH; FDA, NCTR; FDA, OIVD; GE Healthcare; Genetics Society of Vietnam; Harvard University; Illumina; Informax, Inc.; International Federation of Clinical Chemistry & Laboratory Medicine; Invitrogen; Johns Hopkins University; Lawrence Livermore Lab; LGC; Marine Molecular Quality Controls; Mayo Clinic; National Institute of Standards & Technology; NIH, National Cancer Institute; Northwestern; Nugenic; Qiagen; Queens University Hospital; Roche Molecular Systems; Stanford University; Stratagene; Tokyo University; UCLA; University Health Network; US Department of Agriculture; Veridex, Johnson & Johnson; Vialogy; Vigentech; etc. • J. Warrington -- Affymetrix
The ERCC is producing standardized expression controls, analysis tools and protocols • Well-characterized, widely accepted RNA standard controls for multiple platforms • Certified Reference Material (CRM) • Protocols for multiple applications, research and the clinical laboratory (CLSI – Clinical & Laboratory Standards Inst) Approved July 2006! • Software tools to support development work • Software tools to support multiple applications J. Warrington -- Affymetrix
Control Sequences June 2006 L. Reid -- Expression Analysis, J. Warrington -- Affymetrix
Testing Strategy for RNA Controls • Design and development -- generate reagents -- ~100 in place w/70 sequenced • Prototype testing -- validate reagents • Proof of concept -- validate the assays • Functional testing -- validate the product • Performance review -- analyze all data • Testing begins in the 4th quarter. L. Reid et al
Uses of RNA Controls/Standards • Negative Controls -- Determine “true” background -- QC for slide quality, hybridization, etc. • Positive Controls -- QC as above -- Labeling efficiency -- Dilution series: determine sensitivity of assay, determine lowest conc. with reliable signal -- Ratiometric series: normalization tool • Will allow better comparison of intra- or inter-lab data, with the same or different array platforms.
Tests for Validation of ERCC Controls • Negative control test – background studies • Cross-hybridization – determine if any of the controls hybridize to each other or to mRNAs • Labeling test – determine efficiency in the presence of complex RNA sample • Latin square – test controls over a range of concentrations (1:5,000,000 to 1:1000) • Linear range test and ratiometric studies • Above studies will require ~102 arrays per site!
Latin Squares Design for Testing Controls A1 – A4 = the 4 arrays used G1 – G4 = the 4 transcripts being studied L1 – L4 = the 4 concentrations of each transcript L. Reid, BMC Genomics 6:150
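A design of this kind can be sketched with a cyclic Latin square, in which every transcript is seen at every concentration exactly once and every array carries every concentration exactly once. The Python below is only an illustration: the function name `latin_square` and the cyclic construction are assumptions, and the actual layout in Reid's paper may differ.

```python
def latin_square(n):
    """Return an n x n cyclic Latin square.

    Entry [array][transcript] is a 0-based concentration level.
    The cyclic shift is an assumed construction, not the published layout.
    """
    return [[(a + g) % n for g in range(n)] for a in range(n)]


# 4 arrays (A1-A4) x 4 transcripts (G1-G4) x 4 concentration levels (L1-L4)
design = latin_square(4)
for a, row in enumerate(design, start=1):
    cells = ", ".join(f"G{g}=L{level + 1}" for g, level in enumerate(row, start=1))
    print(f"A{a}: {cells}")
```

Each printed row is one array; reading down a column shows one transcript stepping through all four concentrations.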
ERCC Test Sites • >100 Arrays/Site for Validating Controls • Affymetrix • GE Healthcare • Illumina • NIAID • Novartis • Qiagen • Agilent, ABI, Roche maybe
The MAQC Project • MicroArray Quality Control • An FDA-sponsored consortium (Leming Shi) • Founded to address concerns of the microarray community about the reproducibility of expression profiling experiments. • Group consists of over 140 members from academia, government, pharma & biotech. • A large study was designed to compare expression data from 10 different platforms and 40 different test sites with >650 arrays. • Study has been completed and results will be published in Nature Biotechnology. Data will be released next month.
MAQC Study Goals/Exptl. Design • Establish a set of reference standards for use in the MAQC, but more importantly for the array community • Generate large collection of reference data sets using multiple microarray platforms and many diff. labs… • … • Promote the use of reference RNA samples… • Make recommendations on the appropriate uses of microarray technology. • The MAQC group first tested multiple RNAs with 160 arrays and then chose two for titration studies with 200 arrays. Two RNAs at two concentrations were chosen for repeated assays (5 arrays per sample) in four pools. The samples were UHRR from Stratagene and Human Brain Ref (HBRR) from Ambion. The four pools were: A. 100% UHRR; B. 100% HBRR; C. 75% UHRR : 25% HBRR; D. 25% UHRR : 75% HBRR. • At the completion of this study there are data from over 1026 arrays!
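The value of the two mixed pools is that their signal can be predicted from the two pure pools. A minimal Python sketch follows, assuming a linear signal response and that signal mixes in proportion to the RNA fractions; both are simplifications that real samples may violate, and the function name and signal values are hypothetical.

```python
def expected_pool_signal(signal_a, signal_b, fraction_a):
    """Expected signal of a mixture containing fraction_a of pool A.

    Assumes signal is linear in the amount of each RNA (an idealization).
    """
    return fraction_a * signal_a + (1.0 - fraction_a) * signal_b


# Hypothetical signals for one gene in the pure pools A (UHRR) and B (HBRR)
a, b = 1000.0, 200.0
c = expected_pool_signal(a, b, 0.75)  # pool C: 75% A, 25% B
d = expected_pool_signal(a, b, 0.25)  # pool D: 25% A, 75% B
print(c, d)
```

Under these assumptions a gene higher in A than in B should follow the order A > C > D > B, which gives a simple per-gene check on a platform's titration response.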
Platforms Used In MAQC Study (code, vendor: format, size) • ABI (Applied Biosystems): One-Color Array, 32,878 Probes • AFX (Affymetrix): One-Color Array, 54,675 Probes • AGL (Agilent): Two-Color Array, 43,931 Probes • AGI (Agilent): One-Color Array, 43,931 Probes • CBC (CapitalBioCorp): One & Two Color, 23,231 Probes • EPP (Eppendorf): One-Color Array, 294 Probes • GEH (GE Healthcare): One-Color Array, 54,359 Probes • ILM (Illumina): One-Color Array, 47,293 Probes • NCI (NCI-Operon): Two-Color Array, 37,632 Probes • TCI (TeleChem Int): One & Two Color, 27,648 Probes • TAQ (Applied Biosystems): TaqMan® Assays, 1,004 PCRs • QGN (Panomics): QuantiGene Assay, 245 Probes • GEX (GeneExpress): StaRT-PCR™ Assay, 205 Probes
MAQC STUDY DESIGN • 12,091 Genes Used for Comparison Across All Platforms. (Damir Herman, Jean Thierry-Mieg)
Take Home Messages/General Findings From MAQC Study • Large data sets are available for objectively assessing platform performance and various data analysis algorithms. • Microarray technology is reproducible and reliable when one has an understanding of its limitations. • Cross-platform analysis requires very careful annotation & mapping of probe sequences. • All the platforms had good intra-lab repeatability, and inter-lab reproducibility after removal of outliers. • Methods of microarray analysis are an important variable, and this large data set will help resolve issues in this area (statisticians and bioinformaticists take delight…)
Manuscripts in MAQC Study -- Entire Issue of Nature Biotechnology, Sept. 2006 • Editorial • FDA Foreword • Stanford - Data quality in genomics and microarrays • Impact of microarray data quality in genomic data submissions to the FDA • US EPA efforts to develop a framework for using genomics data in risk assessment and regulatory decision making • MAQC main manuscript – overall description • The reproducibility of differentially expressed gene lists in microarray studies* • An analysis and comparison of alternative platforms • Use of RNA titrations to assess platform performance • Performance of one-color vs two-color arrays
MAQC Manuscripts (cont.) • External RNA controls for assessment of microarray analytical performance • Normalization and technical variation in gene expression measurements* • Toxicogenomics and microarrays: biological response measurements are preserved across platforms • Reproducibility probability score: a metric incorporating measurement variability across labs for gene comparison* • Late news: 9 manuscripts submitted and 6 were accepted. With 3 commentaries there are 9 articles in the Sept. Nature Biotechnology Suppl. from the MAQC.
With proper use of negative and positive controls, microarrays may be used to identify, quantitate expression and count the absolute number of genes being expressed in any given cell or tissue sample. ………Anonymous…………. aka ESK Nature May 25, 2006
Present (P) & Absent (A) Calls in Spotted Long Oligo Arrays • “Average” cell expresses <10,000 genes. • “Whole” genome array contains >25,000 genes. • Therefore, Present calls should be 40% or less, i.e. 60% or more Absent. • However, P calls are usually 90% or more using the usual image analysis systems like GenePix. • Why is this? Why do we care? • Good negative controls may resolve this issue.
What is Background? Articles are still being written about how to determine “true” background. Controls can be used to settle this issue. Internal Background External Background Li et al (2005) Bioinformatics 21:2875
Common Methods for Background Subtraction W Yin et al (2005) Bioinformatics 21:2410
Use of Negative Controls for Background Subtraction • Internal Background ~ 500-1000 units • External Background ~ 100-200 units • %Present using external = 96% (21,565/22,464) • %Present using internal = 76% (17,010/22,464) • Background subtraction eliminated 4,555 genes from further analysis. Good or bad?? • Use of negative controls can dramatically change values for % genes expressed and gene expression ratios!
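The effect of the background choice on Present calls can be sketched in a few lines of Python. This assumes a probe is called Present simply when its signal exceeds the chosen background; the slides do not state the exact call rule, and the function name and the five example signals are hypothetical. The last two lines redo the arithmetic from the counts given on the slide, and the second value works out to about 76%.

```python
def percent_present(signals, background):
    """Percentage of probes whose signal exceeds the chosen background."""
    present = sum(1 for s in signals if s > background)
    return 100.0 * present / len(signals)


# Hypothetical spot signals (arbitrary units)
signals = [50, 150, 300, 800, 1200]
print(percent_present(signals, 100))  # low (external-style) background
print(percent_present(signals, 500))  # high (negative-control-style) background

# Percentages recomputed from the counts on the slide
print(round(100 * 21565 / 22464, 1))
print(round(100 * 17010 / 22464, 1))
```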
Negative Controls & Background [Figure labels: Low Signal; Negative Control Background (N); External Background]
Signal distribution of noise background (B), negative control background (median)(neg) and mean intensities of all probes (F) on the slide separated by Cy5 and Cy3 channels
Influence of Type of Background Subtraction on Expression Ratios • Assume the control sample has a signal of 600 units for a gene. • The experimental sample has a signal of 5600 for the same gene. • The external background is 100 units. • Therefore, the calculated ratio would be 11: (5600 - 100)/(600 - 100) = 5500/500 = 11 • But if the negative control background is 500, the ratio is now 51: (5600 - 500)/(600 - 500) = 5100/100 = 51 • Use of negative controls as background may relieve some of the “compression” in ratios for these types of arrays and give a more accurate expression value.
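The worked example above can be reproduced with a small Python function. The function name is hypothetical; the numbers are the ones on the slide.

```python
def background_corrected_ratio(experimental, control, background):
    """Expression ratio after subtracting the same background from both signals."""
    corrected_control = control - background
    if corrected_control <= 0:
        # The ratio is undefined once the control is at or below background
        raise ValueError("control signal does not exceed background")
    return (experimental - background) / corrected_control


print(background_corrected_ratio(5600, 600, 100))  # external background
print(background_corrected_ratio(5600, 600, 500))  # negative control background
```

The guard matters in practice: as the background estimate approaches the control signal the denominator shrinks and the ratio grows very quickly, so ratios for low-signal genes are the most sensitive to the background choice.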
Box plots of CV (data are loess normalized; one set with negative-control background subtraction, another set without). This figure shows that background subtraction could improve the data quality. Boxes: 1. jurkat; 2. jurkat_neg; 3. L428; 4. L428_neg; 5. lncap; 6. lncap_neg; 7. mcf; 8. mcf_neg; 9. oci; 10. oci_neg; 11. sud; 12. sud_neg
Probability of True Positives and True Negatives Using 3 SD Cutoff
Sensitivity & Specificity Cutoff Threshold At a 3 SD Value
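A 3 SD cutoff of this kind can be sketched as follows. The slides do not say how the cutoff was derived, so this assumes it is the mean plus three sample standard deviations of the negative control signals; the function names and all signal values are hypothetical.

```python
from statistics import mean, stdev


def three_sd_cutoff(negative_signals):
    """Cutoff = mean + 3 * sample SD of the negative control signals (assumed rule)."""
    return mean(negative_signals) + 3 * stdev(negative_signals)


def sensitivity_specificity(positive_signals, negative_signals, cutoff):
    """Fraction of positives called above the cutoff, and of negatives at or below it."""
    true_pos = sum(s > cutoff for s in positive_signals)
    true_neg = sum(s <= cutoff for s in negative_signals)
    return true_pos / len(positive_signals), true_neg / len(negative_signals)


# Hypothetical control signals (arbitrary units)
negatives = [90, 100, 110, 95, 105]
positives = [120, 150, 400, 900]

cutoff = three_sd_cutoff(negatives)
sens, spec = sensitivity_specificity(positives, negatives, cutoff)
print(cutoff, sens, spec)
```

Note that specificity here is measured on the same negatives that set the cutoff, which flatters the result; a separate set of negative controls would give a fairer estimate.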
Perfect Match (PM) and Mis-Match (MM) Courtesy Eric Hoffman
Perfect Match (PM) and Mismatch (MM): The Affy Image Quantitation Methods • GCOS (GeneChip Operating System): default Affy analysis software. • RMA (Robust Multiarray Average): Irizarry method using only PM signals. • GCRMA: similar to RMA but takes GC content into account. • dChip: similar to GCOS but can be run with or without MM probes. • YW Chip: the Yonghong Wang method; PM only, using only sequence-validated oligos in the analysis.
Influence of Different Methods of Background Subtraction: Correlation Between 2 Technical Replicates – Affy Chips [Figure labels: GCOS; No background subtraction; MAS background subtraction; RMA background subtraction]
PM Only vs PM-MM Analysis of Technical Replicates [Figure: PM Only and PM-MM panels; labels include Log2 Intensities, Mean Values, Intensities S.D., Dist. Probesets, 4 Reps, PM, MM]
Correlation Study of Genes With Absent Calls • Genes here were called absent by GCOS in 8 hybs from 2 technical replicates. • The data indicate that genes called absent may not be truly absent in many cases.
The MM Probes: C or T at 13th Position May Result in Artefactual High Signal • 92% of all MM probes with higher signal than PM have C or T at that position.
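A tally like the 92% figure could be computed along these lines. This is only a sketch: the input format (MM probe sequence, PM signal, MM signal), the function name, and the example probes are all hypothetical, and position 13 is taken as the middle base of a 25-mer.

```python
def fraction_c_or_t_at_13(probe_pairs):
    """Among probe pairs where MM signal > PM signal, return the fraction
    whose MM sequence has C or T at position 13 (index 12).

    probe_pairs: iterable of (mm_sequence, pm_signal, mm_signal).
    Returns None if no pair has MM brighter than PM.
    """
    brighter_mm = [seq for seq, pm, mm in probe_pairs if mm > pm]
    if not brighter_mm:
        return None
    hits = sum(1 for seq in brighter_mm if seq[12].upper() in "CT")
    return hits / len(brighter_mm)


def make_probe(middle_base):
    """Build a made-up 25-mer with the given base at position 13."""
    return "A" * 12 + middle_base + "G" * 12


# Hypothetical probe pairs: (MM sequence, PM signal, MM signal)
pairs = [
    (make_probe("C"), 100, 300),
    (make_probe("T"), 200, 250),
    (make_probe("G"), 150, 400),
    (make_probe("A"), 500, 100),  # PM brighter, so excluded from the tally
]
print(fraction_c_or_t_at_13(pairs))
```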
Probe Mapping Data Will Be Available for All Platforms Used In The MAQC Study
Analysis of Probe Sequences Within Probe Sets in Affy Gene Chip (# of “correct” or mapped oligos per probe set: # of probe sets in that category) • 1: 692 • 2: 514 • 3: 433 • 4: 450 • 5: 425 • 6: 499 • 7: 626 • 8: 862 • 9: 1608 • 10: 3771 • 11: 36562
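The table can be summarized with a short Python calculation. The counts are those on the slide; the variable names are hypothetical. The slide lists no category for probe sets with zero mapped oligos, so the total below covers only the tabulated probe sets.

```python
# Number of probe sets by count of correctly mapped oligos (from the slide)
probe_sets_by_mapped_oligos = {
    1: 692, 2: 514, 3: 433, 4: 450, 5: 425, 6: 499,
    7: 626, 8: 862, 9: 1608, 10: 3771, 11: 36562,
}

total = sum(probe_sets_by_mapped_oligos.values())
fully_mapped_fraction = probe_sets_by_mapped_oligos[11] / total

print(total)
print(round(100 * fully_mapped_fraction, 1))
```

About four in five of the tabulated probe sets have all 11 oligos mapped, which leaves a substantial minority where a PM-only analysis restricted to validated oligos would use fewer probes.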
How The ERCC & MAQC Can Increase The Reliability/Acceptance of Microarray Data • A set of controls used by all expression platforms will go a long way to end confusion about comparability of data from related experiments. • Probe mapping and sequences from all platforms will be extremely useful for cross-platform comparisons. • Very large data set from all major platforms will point out problem areas in present protocols/technologies, which, hopefully, will result in their improvement. • Large data sets from ERCC and MAQC combined will provide a great resource for critically evaluating algorithms used in analyzing arrays. Which analysis method provides “true” answers? • Hopefully, a (workable) consensus about utilization of microarray technologies will arise from these two large exercises in (sometimes a bit contentious) human scientific cooperation.
In Closing……. My attempt at being funny….. USF Is your back to the wall? Are you under a lot of pressure?
Do you feel you’re on the Treadmill of Life? Moebius Strip II by M.C. Escher Nature vol. 246, p776, 2003
Keep on smilin’, ’cause when you’re smilin’, the whole world smiles with you… • 100 nm Nano Smiley DNAs: Many Happy Genomes • Courtesy P. Rothemund, Nature 440:297 (2006)