Minimum Information About a Microarray Experiment - MIAME. Alvis Brazma, European Bioinformatics Institute, European Molecular Biology Laboratory.
What is MIAME? • A document, the goal of which is to specify the minimum information that must be reported about a microarray experiment in order to ensure its interpretability, as well as potential verification of the results • Underlying motivation – • to enable the establishment of public repositories for microarray data • to serve as a basis for designing a microarray data exchange format
Acknowledgements • MIAME working group • MAML working group • MGED steering committee • John Aach, Wilhelm Ansorge, Pascal Hingamp, Frank Holstege, Alex Lash, John Quackenbush, Alan Robinson, Paul Spellman, Criss Stoeckert, Martin Vingron
MIAME history • A need to establish a public repository or repositories for microarray gene expression data became apparent in 1998 • That requires data standards • MGED 1 meeting in Cambridge in November, 1999 establishes five working groups, including the microarray data annotation group (MIAME) • Several MIAME drafts produced by the group • MGED steering committee meeting in November 2000 in Bethesda endorses a MIAME draft • Last revision yesterday in MIAME working group meeting
Outline of this talk • Considerations behind the MIAME design – why • The MIAME details – what • Future developments and use of MIAME – how
How to think about MIAME: what minimum information about a microarray gene expression experiment should be recorded in a database for the database entries to be usable on a stand-alone basis? • the users may not know any background information that is not recorded • the database should be usable for automated data analysis and mining, i.e. not only on a record-by-record basis • the data may come from different laboratories and different technology platforms
Gene expression database – a conceptual view [Figure: a gene expression matrix of gene expression levels, genes by samples, with attached gene annotations and sample annotations]
Three parts of a gene expression database • Gene annotation – might be given by links to gene sequence databases and GO – the state of the art is not perfect, but let's not worry about it • Sample annotation – we do not have any external databases for sample description (except species taxonomy) – problem 1 • Gene expression matrix – what are the measurement units for gene expression levels? – problem 2
Problem/consideration 1 – sample annotation • Gene expression data have meaning only in the context of a detailed description of the sample • If the data are going to be interpreted by independent parties, the information about the sample has to be in the database • Controlled vocabularies and ontologies (species, cell types, compound nomenclature, treatments, etc) are needed for an unambiguous sample description if it is to be queried
Sample annotation – what can be done • Some use of free text descriptions is unavoidable • Controlled vocabularies and ontologies should be used wherever available • Externally defined controlled vocabularies and ontologies should be used whenever they exist
Problem/consideration 2 – the lack of gene expression measurement units • What we would like to have • gene expression levels expressed in some standard units (e.g. molecules per cell) • reliability measure associated with each value (e.g. standard deviation) • What we do have • each experiment using different units • no reliability information
Comparing expression data [Figure: two measurements labelled "cm" and "inc"]
Comparing expression data [Figure: two measurements labelled "?" and "?"]
From microarray images to gene expression data [Figure: raw data (array scans) → intermediate data (spot/image quantitations, spots by images) → final data (gene expression levels, genes by samples)]
What to do in the absence of standard measurement units? • Record raw, intermediate and final analysis data together with a detailed annotation of how the analysis has been performed • This effectively passes the responsibility for interpreting the final analysis data on to the user
Measurement units • In the longer term: • standard controls for experiments (on chips and in the samples) should be introduced • replicate measurements will become the norm • Temporary solution: • storing intermediate analysis results (including the images) and annotations of how they were obtained - i.e., the evidence
Problem/consideration 3 • A compromise must be found between the burden on data producers of annotating and providing the data and the need of database users for sufficiently annotated data • Too much detail may turn away potential data providers and complicate data submission and storage • Too little detail may limit the usability of the data • The current draft is a compromise between these two
Some more general principles • MIAME is aimed at a cooperative data provider; it is not a legal document designed to close all loopholes • MIAME is an informal specification • The concept of ‘qualifier, value, source’ triplets, e.g., • qualifier – cell type • value – epithelial • source – Human Anatomy (author, edition) • The concept of ‘experimental protocol’
General principles - continued • MIAME is not designed as a ‘questionnaire’ to be filled in, but as an informal specification on which such a questionnaire (in practice, an annotation tool) can be based • Although MIAME is conceptually independent of any database, the aim of establishing a microarray database should be kept in mind when reading MIAME
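The ‘qualifier, value, source’ triplet from the general principles maps naturally onto a small record type. The following is only a sketch: the class name `Annotation` and the helper `uses_external_vocabulary` are hypothetical and not part of MIAME, and treating the literal string "user defined" as the marker for a non-external source is an assumption.

```python
from typing import NamedTuple


class Annotation(NamedTuple):
    """One 'qualifier, value, source' triplet (hypothetical class name)."""
    qualifier: str  # what is being described, e.g. "cell type"
    value: str      # the term chosen, e.g. "epithelial"
    source: str     # where the term is defined, or "user defined"


def uses_external_vocabulary(annotation: Annotation) -> bool:
    """Assumption: any source other than 'user defined' is external."""
    return annotation.source.strip().lower() != "user defined"


# The example triplet from the slide, plus a user-defined one.
cell_type = Annotation("cell type", "epithelial", "Human Anatomy (author, edition)")
survival = Annotation("survival data", "given", "user defined")

print(uses_external_vocabulary(cell_type))
print(uses_external_vocabulary(survival))
```

Keeping the source next to every value is what would let a database tell externally controlled terms apart from free, user-defined ones.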
Outline of this talk • Considerations behind the MIAME design – why • The MIAME details – what
Annotation of an experiment - a major challenge [Figure: a microarray experiment in ArrayExpress, with boxes for Experiment, Sample, Array, Hybridisation, Normalisation and Analysis, and external links to Source (e.g., Taxonomy), Gene (e.g., EMBL) and Publication (e.g., PubMedCentral)]
MIAME six parts: 1. Experimental design: the set of the hybridisation experiments as a whole 2. Array design: each array used and each element (spot) on the array 3. Samples: samples used, the extract preparation and labeling 4. Hybridizations: procedures and parameters 5. Measurements: images, quantitation, specifications 6. Controls: types, values, specifications www.mged.org
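As a rough illustration of how an annotation tool might use the six parts, the sketch below checks which parts a submission still lacks. The key names and the dictionary layout are assumptions made for this example; MIAME itself is an informal specification and prescribes no such format.

```python
# The six MIAME parts, written as hypothetical dictionary keys.
MIAME_PARTS = (
    "experimental_design",
    "array_design",
    "samples",
    "hybridizations",
    "measurements",
    "controls",
)


def missing_parts(record: dict) -> list:
    """Return the MIAME parts that are absent or empty in a record."""
    return [part for part in MIAME_PARTS if not record.get(part)]


# A deliberately incomplete submission.
record = {
    "experimental_design": {"type": "time course"},
    "samples": ["S1", "S2"],
    "controls": {},  # present but empty, so still reported as missing
}

print(missing_parts(record))
```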
Part 1 - Experimental design: the set of the hybridisation experiments as a whole • Normally ‘an experiment’ should consist of one or more hybridisations that are in some way related and performed within a limited period of time, e.g. all related to the same publication • Author, contact information, citations • Type of experiment (e.g., time course, normal vs diseased comparison) • Experimental factors – i.e. tested parameters in the experiment (e.g. time, dose, genetic variation, response to a compound) • List of organisms used in the experiment • List of platforms used
Experimental design - continued • List of samples, arrays and hybridisations and their relationships, e.g.: • Samples: S1, S2, S3 • Arrays: A1, A2, A3 • Hybridisations: • H1 is S1 and S2 on A1 • H2 is S2 and S3 on A2 • H3 is S1 and S2 on A3 • Which hybridisations are replicates, • e.g. H1 and H3 are replicates
Experimental design – continued 2 • Quality related indicators • Optional user defined ‘qualifier, value, source’ triplet – e.g.: • qualifier – survival data • value – given • source – user defined • Description of the experiment or link to a publication
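The sample, array and hybridisation relationships listed for the experimental design can be held in a simple mapping. The sketch below encodes the H1 to H3 example and groups hybridisations that use the same set of samples. That grouping rule is an assumption used only as a consistency check; under MIAME the submitter states which hybridisations are replicates.

```python
from collections import defaultdict

# Hybridisation -> (samples, array), taken from the example on the slide.
hybridisations = {
    "H1": (frozenset({"S1", "S2"}), "A1"),
    "H2": (frozenset({"S2", "S3"}), "A2"),
    "H3": (frozenset({"S1", "S2"}), "A3"),
}


def candidate_replicates(hybs):
    """Group hybridisations that share the same sample set.

    Assumed criterion: identical samples means a possible replicate.
    """
    groups = defaultdict(list)
    for name, (samples, _array) in sorted(hybs.items()):
        groups[samples].append(name)
    return [names for names in groups.values() if len(names) > 1]


print(candidate_replicates(hybridisations))  # [['H1', 'H3']]
```

The result agrees with the slide's statement that H1 and H3 are replicates.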
Part 2 - Array design: each array used and each element (spot) on the array • This part is separate for each type of array used in the experiment • For the database, the array description should normally be submitted only once • For each physical array used in the experiment a unique ID and the array type are given
Array design – continued • Array design related information (e.g. platform type – in situ synthesized or spotted; array provider; surface type – glass, membrane, other, etc) • Properties of each type of element on the array, grouped by the protocol that generated them (e.g. synthesized oligos, PCR products, plasmids, colonies, others) – may be simple or composite (Affymetrix) • Each element (spot) on the array
Array design – continued • Each element (spot) on the array • Elements may be simple or composite • Each element must be identified by either the sequence, clone ID, PCR primer pair, or in any other unambiguous way • Composite elements may be identified by a reference sequence • May be linked to genes (preferably) • Will normally be provided in a separate file (e.g. spreadsheet)
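The rule that each element must be identified unambiguously can be checked mechanically when the element list arrives as a separate file. The sketch below assumes each row has already been read into a dictionary. The field names (`spot`, `sequence`, `clone_id`, `primer_pair`, `other_id`) and the example values are hypothetical.

```python
# Hypothetical column names for the ways an element may be identified.
IDENTIFIER_FIELDS = ("sequence", "clone_id", "primer_pair", "other_id")


def unidentified(elements):
    """Return the spots that carry none of the accepted identifiers."""
    return [
        element.get("spot", "?")
        for element in elements
        if not any(element.get(field) for field in IDENTIFIER_FIELDS)
    ]


# Made-up rows for illustration only.
elements = [
    {"spot": "1.1", "clone_id": "hypothetical-clone-1"},
    {"spot": "1.2"},  # no identifier at all
    {"spot": "1.3", "sequence": "ACGT"},
]

print(unidentified(elements))  # ['1.2']
```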
Part 3 - Samples: samples used, the extract preparation and labeling • Sample source and treatment • Organism (NCBI taxonomy) • Additional ‘qualifier, value, source’ list • cell source and type • developmental stage • organism part (tissue) • animal/plant strain or line • genetic variation • disease state or normal • … Typically only some of these qualifiers are relevant – an ontology tree is needed to implement the annotation tool for sample source and treatment
Sample - continued • Hybridisation extract preparation • Laboratory protocol, including extraction method, whether RNA, mRNA, or genomic DNA is extracted, amplification method • Labelling • Laboratory protocol, including amount of nucleic acids labelled, label used (e.g. Cy3, Cy5, 33P, etc)
Part 4 - Hybridizations: procedures and parameters • Laboratory protocol including • The solution (e.g. concentration of solutes) • Blocking agent • Wash procedure • Quantity of labelled target used • Time, concentration, volume, temperature • Description of the hybridisation instruments • Optional additional ‘qualifier, value, source’ list
Raw, intermediate and final data [Figure: raw data (array scans) → intermediate data (spot/image quantitations, spots by images) → final data (gene expression levels, genes by samples)]
Part 5 - Measurements: images, quantitation, specifications • Hybridisation scan raw data – image • Intermediate data – image analysis and quantitation • Final data – summarised information from possible replicates
From microarray images to gene expression data [Figure: raw data (array scans)]
Measurements continued • Image data • The scanner image file (e.g. TIFF, DAT) • Scanning information • Scan parameters, including laser power, spatial resolution, pixel space, PMT voltage • Laboratory protocol for scanning, including scanning hardware and software used
From microarray images to gene expression data [Figure: raw data (array scans) → intermediate data (spot/image quantitations, spots by images)]
Measurements continued • Image analysis and quantitation • Complete image analysis output (of the particular image analysis software) for each element – normally given as separate file (e.g. spreadsheet) • Image analysis information • Image analysis software specification • All parameters
From microarray images to gene expression data [Figure: raw data (array scans) → intermediate data (spot/image quantitations, spots by images) → final data (gene expression levels, genes by samples)]
Measurements continued • Summarised information from possible replicates • Derived measurement values summarising related elements as used by the author • Reliability information for these values, as used by the author (may be ‘unknown’) (these will be typically given in a spreadsheet) • Specifications of these two (e.g., median value of the replicates, standard deviation)
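The slide gives "median value of the replicates, standard deviation" as an example specification for the summarised values. A minimal sketch of that particular choice follows. The function name, the gene names and the numbers are made up for illustration, and `None` stands in for a reliability value of ‘unknown’ when only one measurement exists.

```python
from statistics import median, stdev


def summarise(values):
    """Return (median, sample standard deviation or None) for one gene."""
    if not values:
        raise ValueError("at least one measurement is required")
    # A standard deviation needs two or more replicates; otherwise 'unknown'.
    spread = stdev(values) if len(values) > 1 else None
    return median(values), spread


# Made-up replicate measurements for two genes.
replicates = {"gene_a": [2.0, 4.0, 6.0], "gene_b": [1.5]}
summary = {gene: summarise(values) for gene, values in replicates.items()}

for gene, (value, spread) in summary.items():
    print(gene, value, spread)
```

Whatever statistics an author actually uses, the point of this part of MIAME is that the choice is written down alongside the values.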
Part 6 - Controls: types, values, specifications • Normalisation strategy (spiking, housekeeping genes, total array, other) • Normalisation algorithm • Control array elements • Hybridisation extract preparation
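To make the difference between a normalisation strategy and a normalisation algorithm concrete, the sketch below shows one simple reading of the ‘total array’ strategy: rescale one channel so that its summed intensity matches the other. This is an illustrative assumption, not an algorithm prescribed by MIAME, and the function name and intensity values are made up.

```python
def total_intensity_scale(reference, target):
    """Scale `target` so that its total intensity equals that of `reference`.

    Returns the scaling factor and the rescaled target values.
    """
    total_reference = sum(reference)
    total_target = sum(target)
    if total_target == 0:
        raise ValueError("target channel has zero total intensity")
    factor = total_reference / total_target
    return factor, [value * factor for value in target]


# Made-up spot intensities for two labelled channels.
cy3 = [100.0, 200.0, 700.0]
cy5 = [50.0, 150.0, 300.0]

factor, cy5_scaled = total_intensity_scale(cy3, cy5)
print(factor, cy5_scaled)
```

Recording both the strategy ("total array") and the exact algorithm, including the resulting factor, is what allows a database user to reproduce or undo the step.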
Outline of this talk • Considerations behind the MIAME design – why • The MIAME details – what • Future developments and use of MIAME – how
How to use MIAME • Data exchange format (MAML) for communicating MIAME information • Establishing MIAME compliant databases (e.g. ArrayExpress) • Developing annotation tools for generating MIAME compliant information • Journals and public funding agencies may establish MIAME related policies