
Minimum Information About a Microarray Experiment - MIAME

Alvis Brazma, European Bioinformatics Institute, European Molecular Biology Laboratory


Presentation Transcript


  1. Minimum Information About a Microarray Experiment - MIAME Alvis Brazma European Bioinformatics Institute European Molecular Biology Laboratory

  2. What is MIAME? • A document, the goal of which is to specify the minimum information that must be reported about a microarray experiment in order to ensure its interpretability, as well as potential verification of the results • Underlying motivation – • to enable the establishment of public repositories for microarray data • to serve as a basis for designing a microarray data exchange format

  3. Acknowledgements • MIAME working group • MAML working group • MGED steering committee • John Aach, Wilhelm Ansorge, Pascal Hingamp, Frank Holstege, Alex Lash, John Quackenbush, Alan Robinson, Paul Spellman, Criss Stoeckert, Martin Vingron

  4. MIAME history • A need to establish a public repository or repositories for microarray gene expression data became apparent in 1998 • That requires data standards • MGED 1 meeting in Cambridge in November, 1999 establishes five working groups, including the microarray data annotation group (MIAME) • Several MIAME drafts produced by the group • MGED steering committee meeting in November 2000 in Bethesda endorses a MIAME draft • Last revision yesterday in MIAME working group meeting

  5. Outline of this talk • Considerations behind the MIAME design – why • The MIAME details – what • Future developments and use of MIAME – how

  6. How to think about MIAME What minimum information about a microarray gene expression measurement experiment should be recorded in a database for the database entries to be usable on a stand-alone basis: • the users may not know any background information that is not recorded • the database should be usable for automated data analysis and mining, i.e. not only on a record-by-record basis • the data may come from different laboratories and different technology platforms

  7. Gene expression database – a conceptual view [Figure labels: gene expression matrix of gene expression levels (genes, samples); gene annotations; sample annotations]

  8. Three parts of a gene expression database • Gene annotation – might be given by links to gene sequence databases and GO – not a perfect state of the art, but let's not worry about it • Sample annotation – we do not have any external databases for sample description (except species taxonomy) – problem 1 • Gene expression matrix – what are the measurement units for gene expression levels? – problem 2

  9. Problem/consideration 1 – sample annotation • Gene expression data have meaning only in the context of a detailed description of the sample • If the data are going to be interpreted by independent parties, the information about the sample has to be in the database • Controlled vocabularies and ontologies (species, cell types, compound nomenclature, treatments, etc.) are needed for unambiguous sample description if it is to be queried

  10. Sample annotation – what can be done • Some use of free text descriptions is unavoidable • Controlled vocabularies and ontologies should be used wherever available • Externally defined controlled vocabularies and ontologies should be used whenever they exist

  11. Problem/consideration 2 – the lack of gene expression measurement units • What we would like to have • gene expression levels expressed in some standard units (e.g. molecules per cell) • reliability measure associated with each value (e.g. standard deviation) • What we do have • each experiment using different units • no reliability information

  12. Comparing expression data [Figure labels: cm, inc]

  13. Comparing expression data [Figure labels: ?, ?]

  14. Comparing expression data

  15. From microarray images to gene expression data [Figure labels: Raw data: array scans; Intermediate data: spot/image quantitations (spots, images); Final data: gene expression levels (genes, samples)]

  16. What to do in the absence of standard measurement units? • Record raw, intermediate and final analysis data together with a detailed annotation of how the analysis has been performed • This effectively passes the responsibility for interpreting the final analysis data on to the user

  17. Measurement units • In perspective: • standard controls for experiments (on chips and in the samples) should be introduced • replicate measurements will become the norm • Temporary solution: • storing intermediate analysis results (including the images) and annotations of how they were obtained - i.e., the evidence

  18. Problem/consideration 3 • We need to find a compromise between the burden on the data producers to annotate and provide the data and the need for the data to be sufficiently annotated for the database users • Too much detail may turn away the potential data providers and complicate the data submission and storage • Too little detail may limit the usability of the data • The current draft is a compromise between these two

  19. Some more general principles • MIAME is aimed at cooperative data providers; it is not a legal document designed to close all loopholes • MIAME is an informal specification • The concept of ‘qualifier, value, source’ triplets, e.g., • qualifier – cell type • value – epithelial • source – Human Anatomy (author, edition) • The concept of ‘experimental protocol’
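The ‘qualifier, value, source’ triplet on this slide can be sketched as a small record type. This is only an illustration: MIAME is an informal specification and prescribes no data structure, so the class name `Annotation` and the helper `values_for` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One 'qualifier, value, source' triplet (hypothetical class name)."""
    qualifier: str  # what is being described, e.g. "cell type"
    value: str      # the term used, e.g. "epithelial"
    source: str     # where the term is defined


# The example triplet given on the slide.
cell_type = Annotation(
    qualifier="cell type",
    value="epithelial",
    source="Human Anatomy (author, edition)",
)


def values_for(annotations, qualifier):
    """Return the values of all triplets that carry the given qualifier."""
    return [a.value for a in annotations if a.qualifier == qualifier]


print(values_for([cell_type], "cell type"))
```

Keeping the source alongside the value is what lets two databases decide whether the same term means the same thing in both.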

  20. General principles - continued • MIAME is not designed as a ‘questionnaire’ that can be filled in, but only as an informal specification on which such a questionnaire, in fact an annotation tool, can be based • Although MIAME is conceptually independent of databases, the aim of establishing a microarray database should be kept in mind when reading MIAME

  21. Outline of this talk • Considerations behind the MIAME design – why • The MIAME details – what

  22. Annotation of an experiment - a major challenge [Figure labels: a microarray experiment in ArrayExpress: Experiment, Hybridisation, Analysis, Sample, Array, Normalisation; external links: Source (e.g., Taxonomy), Gene (e.g., EMBL), Publication (e.g., PubMedCentral)]

  23. MIAME six parts: 1. Experimental design: the set of the hybridisation experiments as a whole 2. Array design: each array used and each element (spot) on the array 3. Samples: samples used, the extract preparation and labeling 4. Hybridizations: procedures and parameters 5. Measurements: images, quantitation, specifications 6. Controls: types, values, specifications www.mged.org
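As a rough illustration of how an annotation tool might use the six parts, the sketch below checks which parts of a submission are still empty. The key names and the function `missing_parts` are assumptions made for this example; MIAME itself defines no field names or file format.

```python
# Hypothetical keys, one per MIAME part as listed on the slide.
MIAME_PARTS = (
    "experimental_design",
    "array_design",
    "samples",
    "hybridizations",
    "measurements",
    "controls",
)


def missing_parts(record):
    """Return the MIAME parts that are absent or empty in a submission."""
    return [part for part in MIAME_PARTS if not record.get(part)]


# A partially annotated submission (illustrative content only).
draft = {
    "experimental_design": {"type": "time course"},
    "samples": [{"organism": "Homo sapiens"}],
    "controls": {},
}

print(missing_parts(draft))
```

A check at this level only reports that a part is missing, not whether the part that is present is detailed enough.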

  24. MIAME six parts: 1. Experimental design: the set of the hybridisation experiments as a whole

  25. Part 1 - Experimental design: the set of the hybridisation experiments as a whole • Normally ‘an experiment’ should consist of one or more hybridisations that are in some way related and performed within a limited period of time, e.g. all related to the same publication • Author, contact information, citations • Type of experiment (e.g., time course, normal vs diseased comparison) • Experimental factors – i.e. tested parameters in the experiment (e.g. time, dose, genetic variation, response to a compound) • List of organisms used in the experiment • List of platforms used

  26. Experimental design - continued • List of samples, arrays and hybridisations and their relationships, e.g.: • Samples: S1, S2, S3 • Arrays: A1, A2, A3 • Hybridisations: • H1 is S1 and S2 on A1 • H2 is S2 and S3 on A2 • H3 is S1 and S2 on A3 • Which hybridisations are replicates, • e.g. H1 and H3 are replicates
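The sample, array and hybridisation relationships on this slide can be written down as plain data and checked for consistency. The layout below is a sketch with hypothetical names. Note that the slide expects the submitter to state which hybridisations are replicates; the function `replicate_candidates` only suggests candidates by grouping hybridisations that use the same samples, which is an assumption of this example.

```python
from collections import defaultdict

samples = {"S1", "S2", "S3"}
arrays = {"A1", "A2", "A3"}

# Hybridisations exactly as listed on the slide.
hybridisations = {
    "H1": {"samples": ("S1", "S2"), "array": "A1"},
    "H2": {"samples": ("S2", "S3"), "array": "A2"},
    "H3": {"samples": ("S1", "S2"), "array": "A3"},
}


def dangling_references(hybs, known_samples, known_arrays):
    """Return hybridisations that mention an undeclared sample or array."""
    bad = []
    for name, hyb in sorted(hybs.items()):
        if hyb["array"] not in known_arrays or not set(hyb["samples"]) <= known_samples:
            bad.append(name)
    return bad


def replicate_candidates(hybs):
    """Group hybridisations that use the same set of samples."""
    groups = defaultdict(list)
    for name, hyb in sorted(hybs.items()):
        groups[frozenset(hyb["samples"])].append(name)
    return [names for names in groups.values() if len(names) > 1]


print(dangling_references(hybridisations, samples, arrays))
print(replicate_candidates(hybridisations))
```

For the slide's example this groups H1 with H3, which matches the replicate pair given there.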

  27. Experimental design – continued 2 • Quality related indicators • Optional user defined ‘qualifier, value, source’ triplet – e.g.: • qualifier – survival data • value – given • source – user defined • Description of the experiment or link to a publication

  28. MIAME six parts: 1. Experimental design: the set of the hybridisation experiments as a whole 2. Array design: each array used and each element (spot) on the array

  29. Part 2 - Array design: each array used and each element (spot) on the array • This part is separate for each type of array used in the experiment • For the database, the array description should be normally submitted only once • For each physical array used in the experiment a unique ID and the array type are given

  30. Array design – continued • Array design related information (e.g. platform type – in situ synthesized or spotted, array provider, surface type – glass, membrane, other, etc.) • Properties of each type of element on the array that are generated by similar protocols (e.g. synthesized oligos, PCR products, plasmids, colonies, others) – may be simple or composite (Affymetrix) • Each element (spot) on the array

  31. Array design – continued • Each element (spot) on the array • Elements may be simple or composite • Each element must be identified by either the sequence, clone ID, PCR primer pair, or in any other unambiguous way • Composite elements may be identified by a reference sequence • May be linked to genes (preferably) • Will normally be provided in a separate file (e.g. spreadsheet)
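The requirement that each element be identified unambiguously lends itself to a simple check over the per-element file. The sketch below assumes each row of that file has been read into a dictionary; the column names (`spot`, `sequence`, `clone_id`, `primer_pair`, `other_id`) are hypothetical and not taken from MIAME.

```python
# Hypothetical column names for the ways of identifying an element.
IDENTIFIER_FIELDS = ("sequence", "clone_id", "primer_pair", "other_id")


def unidentified_elements(elements):
    """Return the spot labels of elements that carry no identifier at all."""
    return [
        element.get("spot")
        for element in elements
        if not any(element.get(field) for field in IDENTIFIER_FIELDS)
    ]


# Illustrative rows, e.g. as read from a spreadsheet export.
rows = [
    {"spot": "1.1", "clone_id": "clone-0001"},
    {"spot": "1.2", "sequence": "ACGTACGTAC"},
    {"spot": "1.3", "clone_id": ""},
]

print(unidentified_elements(rows))
```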

  32. MIAME six parts: 1. Experimental design: the set of the hybridisation experiments as a whole 2. Array design: each array used and each element (spot) on the array 3. Samples: samples used, the extract preparation and labeling

  33. Part 3 - Samples: samples used, the extract preparation and labeling • Sample source and treatment • Organism (NCBI taxonomy) • Additional ‘qualifier, value, source’ list • cell source and type • developmental stage • organism part (tissue) • animal/plant strain or line • genetic variation • disease state or normal • … Typically only some of these qualifiers are relevant – an ontology tree is needed to implement the annotation tool for sample source and treatment

  34. Sample - continued • Hybridisation extract preparation • Laboratory protocol, including extraction method, whether RNA, mRNA, or genomic DNA is extracted, amplification method • Labelling • Laboratory protocol, including amount of nucleic acids labelled, label used (e.g. Cy3, Cy5, 33P, etc)

  35. Annotation of an experiment - a major challenge [Figure labels: a microarray experiment in ArrayExpress: Experiment, Hybridisation, Analysis, Sample, Array, Normalisation; external links: Source (e.g., Taxonomy), Gene (e.g., EMBL), Publication (e.g., PubMedCentral)]

  36. MIAME six parts: 1. Experimental design: the set of the hybridisation experiments as a whole 2. Array design: each array used and each element (spot) on the array 3. Samples: samples used, the extract preparation and labeling 4. Hybridizations: procedures and parameters 5. Measurements: images, quantitation, specifications 6. Controls: types, values, specifications

  37. Part 4 - Hybridizations: procedures and parameters • Laboratory protocol including • The solution (e.g. concentration of solutes) • Blocking agent • Wash procedure • Quantity of labelled target used • Time, concentration, volume, temperature • Description of the hybridisation instruments • Optional additional ‘qualifier, value, source’ list

  38. MIAME six parts: 1. Experimental design: the set of the hybridisation experiments as a whole 2. Array design: each array used and each element (spot) on the array 3. Samples: samples used, the extract preparation and labeling 4. Hybridizations: procedures and parameters 5. Measurements: images, quantitation, specifications

  39. Raw, intermediate and final data [Figure labels: Raw data: array scans; Intermediate data: spot/image quantitations (spots, images); Final data: gene expression levels (genes, samples)]

  40. Part 5 - Measurements: images, quantitation, specifications • Hybridisation scan raw data – image • Intermediate data – image analysis and quantitation • Final data – summarised information from possible replicates

  41. From microarray images to gene expression data [Figure labels: Raw data: array scans]

  42. Measurements continued • Image data • The scanner image file (e.g. TIFF, DAT) • Scanning information • Scan parameters, including laser power, spatial resolution, pixel space, PMT voltage • Laboratory protocol for scanning, including scanning hardware and software used

  43. From microarray images to gene expression data [Figure labels: Raw data: array scans; Intermediate data: spot/image quantitations (spots, images)]

  44. Measurements continued • Image analysis and quantitation • Complete image analysis output (of the particular image analysis software) for each element – normally given as separate file (e.g. spreadsheet) • Image analysis information • Image analysis software specification • All parameters

  45. From microarray images to gene expression data [Figure labels: Raw data: array scans; Intermediate data: spot/image quantitations (spots, images); Final data: gene expression levels (genes, samples)]

  46. Measurements continued • Summarised information from possible replicates • Derived measurement values summarising related elements as used by the author • Reliability information for these values, as used by the author (may be ‘unknown’) (these will be typically given in a spreadsheet) • Specifications of these two (e.g., median value of the replicates, standard deviation)
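The slide's own example of a specification, the median of the replicates with a standard deviation as the reliability value, can be sketched with the Python standard library. The function name `summarise_replicates` and the returned keys are hypothetical, and `None` stands in for the slide's ‘unknown’ when there is only one measurement.

```python
import statistics


def summarise_replicates(values):
    """Summarise replicate measurements of one element.

    Returns the derived value, its reliability, and a description of how
    each was computed, mirroring the 'specifications' bullet on the slide.
    """
    if not values:
        raise ValueError("at least one measurement is required")
    # A standard deviation needs two or more values; otherwise report
    # the reliability as unknown (None).
    reliability = statistics.stdev(values) if len(values) > 1 else None
    return {
        "value": statistics.median(values),
        "reliability": reliability,
        "value_spec": "median of the replicates",
        "reliability_spec": "sample standard deviation",
    }


print(summarise_replicates([2.0, 4.0, 6.0]))
print(summarise_replicates([5.0]))
```

Storing the two specification strings next to the numbers is the point of the last bullet: without them a database user cannot tell a median from a mean.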

  47. MIAME six parts: 1. Experimental design: the set of the hybridisation experiments as a whole 2. Array design: each array used and each element (spot) on the array 3. Samples: samples used, the extract preparation and labeling 4. Hybridizations: procedures and parameters 5. Measurements: images, quantitation, specifications 6. Controls: types, values, specifications

  48. Part 6 - Controls: types, values, specifications • Normalisation strategy (spiking, housekeeping genes, total array, other) • Normalisation algorithm • Control array elements • Hybridisation extract preparation
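The slide names normalisation strategies without defining them. As one possible reading of the ‘total array’ strategy, the sketch below rescales one channel so that its summed intensity matches that of a reference channel. Both this interpretation and the function name `scale_to_total` are assumptions of the example, not part of MIAME.

```python
def scale_to_total(reference, test):
    """Rescale `test` intensities so their sum equals the sum of `reference`.

    Returns the rescaled intensities and the scaling factor, so that the
    factor can be reported as part of the normalisation annotation.
    """
    total_test = sum(test)
    if total_test == 0:
        raise ValueError("test channel has zero total intensity")
    factor = sum(reference) / total_test
    return [intensity * factor for intensity in test], factor


# Illustrative intensities for three spots in two channels.
scaled, factor = scale_to_total([100.0, 200.0, 300.0], [50.0, 100.0, 150.0])
print(scaled, factor)
```

Whatever algorithm is actually used, recording it together with its parameters (here the scaling factor) is what allows another party to reproduce the final values from the intermediate data.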

  49. Outline of this talk • Considerations behind the MIAME design – why • The MIAME details – what • Future developments and use of MIAME – how

  50. How to use MIAME • Data exchange format (MAML) allowing MIAME information to be communicated • Establishing MIAME compliant databases (e.g. ArrayExpress) • Developing annotation tools for generating MIAME compliant information • Journals and public funding agencies may establish MIAME related policies
