Microarray Databases and MIAME (Minimum Information About a Microarray Experiment) Yong Liu Bioinformatics Unit
Outline • Review of microarray technology from data/database perspective • Motivation behind the MIAME standard • MIAME: what’s in it? • Current existing microarray databases • Future development
DNA Microarray Technology [Figure: two-colour microarray; dye labels: Cy5 (~650 nm), Cy3 (~550 nm); spot legend: induced, repressed, no differential expression]
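The three spot categories in the figure can be illustrated with a log-ratio cut-off. This is a minimal sketch, not a method from the slides: it assumes Cy5 labels the test sample and Cy3 the reference (a common convention that the slide does not state), and the two-fold threshold and the function name `classify_spot` are hypothetical.

```python
import math

def classify_spot(cy5, cy3, fold=2.0):
    """Classify one spot from its two channel intensities.

    Assumption: Cy5 = test sample, Cy3 = reference sample.
    `fold` is an illustrative threshold, not a value from the slides.
    """
    log_ratio = math.log2(cy5 / cy3)
    cutoff = math.log2(fold)
    if log_ratio >= cutoff:
        return "induced"
    if log_ratio <= -cutoff:
        return "repressed"
    return "no differential expression"

for cy5, cy3 in [(800.0, 200.0), (100.0, 400.0), (500.0, 450.0)]:
    print(cy5, cy3, classify_spot(cy5, cy3))
```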
Context is Everything! • “An observed phenotype is specific for the conditions under study” (Pat Brown, Stanford University) • Information recorded in a microarray database should be usable on a standalone basis • Any background information • Automated data analysis and mining, i.e. not only on a record-by-record basis • Data from different laboratories and different technology platforms
How Much Data? • Experiments • 100 000 genes in human • 320 cell types • 2000 compounds • 3 time points • 2 concentrations • 2 replicates • Data volume • 8 x 10^11 data points • 1 x 10^15 bytes = 1 petabyte of data
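The data-point estimate is the product of the experimental factors listed above, which can be checked directly. The per-data-point storage figure in this sketch is only what the slide's 1 petabyte total implies; the slide does not state it.

```python
# Factors taken from the slide.
genes = 100_000
cell_types = 320
compounds = 2_000
time_points = 3
concentrations = 2
replicates = 2

data_points = (
    genes * cell_types * compounds * time_points * concentrations * replicates
)

# The slide quotes 1 x 10^15 bytes in total. The storage per data point
# below is derived from that total, not given in the source.
total_bytes = 1e15
bytes_per_point = total_bytes / data_points

print(f"data points: {data_points:.2e}")            # 7.68e+11, about 8 x 10^11
print(f"implied bytes per data point: {bytes_per_point:.0f}")  # 1302
```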
Gene Expression Matrix [Figure labels: images; spot/image quantitations (spots); gene expression matrix (genes x samples, gene expression levels)] The final gene expression matrix (on the right) is needed for higher-level analysis and mining.
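The step from spot quantitations to the gene expression matrix can be sketched as follows. The input tuples, the gene and sample names, and the choice to average replicate spots are all assumptions made for illustration; the slides do not prescribe how replicates are combined.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical spot-level quantitations: (gene, sample, measured intensity).
spot_quantitations = [
    ("geneA", "sample1", 1200.0),
    ("geneA", "sample1", 1000.0),  # replicate spot for the same gene
    ("geneA", "sample2", 400.0),
    ("geneB", "sample1", 300.0),
    ("geneB", "sample2", 900.0),
]

def build_expression_matrix(spots):
    """Collapse replicate spots into one value per (gene, sample) cell."""
    cells = defaultdict(list)
    for gene, sample, intensity in spots:
        cells[(gene, sample)].append(intensity)
    genes = sorted({g for g, _ in cells})
    samples = sorted({s for _, s in cells})
    # Rows are genes, columns are samples; cells with no spots become None.
    matrix = [
        [mean(cells[(g, s)]) if (g, s) in cells else None for s in samples]
        for g in genes
    ]
    return genes, samples, matrix

genes, samples, matrix = build_expression_matrix(spot_quantitations)
for gene, row in zip(genes, matrix):
    print(gene, row)
```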
MGED and MIAME • The need for a public repository (or repositories) for microarray gene expression data became apparent in 1998; such a repository requires data standards • MGED-1 (Microarray Gene Expression Database) Group: November 14-15, 1999, Cambridge, UK • Established five working groups, including the microarray data annotation group (MIAME) • MGED-2: May 25-27, 2000, Heidelberg, Germany • Endorsed a MIAME draft • MGED-3: March 29-31, 2001, Stanford University • Adopted MIAME 1.0 • MGED-4: Feb. 13-16, 2002, Boston • Adopted MIAME 1.1
MIAME Part 1 - Experimental Design: the set of the hybridisation experiments as a whole • Author, contact information, citations • Type of experiment (e.g., time course, normal vs diseased comparison) • Experimental factors – i.e. tested parameters in the experiment (e.g. time, dose, genetic variation, response to a compound) • List of organisms used in the experiment • List of platforms used
MIAME Part 2 - Array Design: each array used and each element (spot) on the array • Array design related information (e.g. platform type – in situ synthesized or spotted, array provider, surface type – glass, membrane, other, etc.) • Properties of each type of element on the array, grouped by the protocol used to generate them (e.g. synthesized oligos, PCR products, plasmids, colonies, others) – may be simple or composite (Affymetrix) • Each element (spot) on the array
MIAME Part 3 - Samples: samples used, the extract preparation and labeling • Sample source and treatment • Hybridisation extract preparation • Laboratory protocol, including extraction method, whether RNA, mRNA, or genomic DNA is extracted, amplification method • Labelling • Laboratory protocol, including amount of nucleic acids labelled, label used (e.g. Cy3, Cy5, ³³P, etc.)
MIAME Part 4 - Hybridizations: procedures and parameters • The solution (e.g. concentration of solutes) • Blocking agent • Wash procedure • Quantity of labelled target used • Time, concentration, volume, temperature • Description of the hybridisation instruments
MIAME Part 5 - Measurements: images, quantitation, specifications • Scanning information • Scan parameters, including laser power, spatial resolution, pixel space, PMT voltage • Laboratory protocol for scanning, including scanning hardware and software used • Image analysis information • Image analysis software specification • All parameters • Summarised information from possible replicates
MIAME Part 6 – Normalization: types, values, specifications • Normalisation strategy (spiking, housekeeping genes, total array, other) • Normalisation algorithm • Control array elements
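As an illustration of the "total array" strategy, the sketch below scales one channel so that both channels have the same summed intensity and then reports the factor together with the normalised log ratios, since MIAME asks for the algorithm and its values to be recorded. This is one simple variant under stated assumptions, not the definition used by any particular package; the function name and the intensities are hypothetical.

```python
import math

def total_array_normalise(cy5, cy3):
    """Scale Cy5 so both channels have the same total intensity.

    Returns the normalisation factor and the per-spot log2(Cy5/Cy3)
    ratios computed after scaling.
    """
    if len(cy5) != len(cy3):
        raise ValueError("channels must have the same number of spots")
    factor = sum(cy3) / sum(cy5)
    log_ratios = [math.log2((r * factor) / g) for r, g in zip(cy5, cy3)]
    return factor, log_ratios

# Hypothetical raw intensities for four spots.
cy5 = [200.0, 800.0, 400.0, 200.0]
cy3 = [100.0, 100.0, 400.0, 200.0]

factor, log_ratios = total_array_normalise(cy5, cy3)
print("normalisation factor:", factor)
print("log2 ratios:", log_ratios)
```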
Current Existing Microarray Databases • Local Installation • AMAD, GeneDirector, mAdb, maxdSQL, NOMAD • Public Queries only • ChipDB, RAD • Public Queries and Local Installation • SMD • Public Data Deposition and Queries • ArrayExpress, GEO, GXD • GeneX and GeNet • For more information: Margaret Gardiner-Garden and Timothy G. Littlejohn, “A comparison of microarray databases”, Briefings in Bioinformatics, May 2001
MIAME-compliant Systems • Different labs have different needs: a lab-centric system is more desirable • MIAME-compliant microarray database systems are still under development • Commercial • GeneTraffic (www.iobion.com) • PARTISAN arrayLIMS (www.clondiag.com) • Rosetta Resolver (www.rosettabio.com) • ……. • Open Source • GeneX and NOMAD, among others, are still being developed towards MIAME compliance
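A lab-centric system could run a very coarse completeness check against the six MIAME parts before accepting a submission. The sketch below is only an illustration: the section keys and the example record are hypothetical, and real MIAME compliance depends on the detailed content of each part, not merely on its presence.

```python
# One key per MIAME part described in the slides (key names are hypothetical).
MIAME_SECTIONS = (
    "experimental_design",
    "array_design",
    "samples",
    "hybridizations",
    "measurements",
    "normalization",
)

def missing_miame_sections(record):
    """Return the MIAME sections that are absent or empty in `record`."""
    return [name for name in MIAME_SECTIONS if not record.get(name)]

# Hypothetical submission with one empty and one missing section.
submission = {
    "experimental_design": {"type": "time course", "factors": ["time"]},
    "array_design": {"platform": "spotted", "surface": "glass"},
    "samples": {"label": "Cy3"},
    "hybridizations": {},
    "measurements": {"software": "(hypothetical image analysis tool)"},
}

missing = missing_miame_sections(submission)
print("missing sections:", missing)
```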
Future Development • Establishing MIAME-compliant databases • Different labs continue to develop their own systems • Data exchange format (MAGE-ML) for communicating MIAME information • Microarray data has no central DB yet: distributed data queries and data mining? • HTTP/XML • SOAP (Simple Object Access Protocol) • WSDL (Web Services Description Language) • UDDI (Universal Description, Discovery, and Integration)
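The idea of exchanging MIAME information as XML between independent lab systems can be sketched with Python's standard library. The element names below are invented for illustration and do not follow the actual MAGE-ML schema; the point is only that a record serialised by one system can be parsed back by another.

```python
import xml.etree.ElementTree as ET

def to_xml(record):
    """Serialise a {section: {field: value}} record to an XML string."""
    root = ET.Element("experiment")  # hypothetical root element, not MAGE-ML
    for section, fields in record.items():
        node = ET.SubElement(root, section)
        for key, value in fields.items():
            ET.SubElement(node, key).text = str(value)
    return ET.tostring(root, encoding="unicode")

def from_xml(text):
    """Parse the XML produced by `to_xml` back into nested dictionaries."""
    root = ET.fromstring(text)
    return {
        section.tag: {field.tag: field.text for field in section}
        for section in root
    }

# Hypothetical record using string values only.
record = {
    "experimental_design": {"type": "time course", "organism": "Homo sapiens"},
    "normalization": {"strategy": "total array"},
}

xml_text = to_xml(record)
round_trip = from_xml(xml_text)
print(xml_text)
```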