1. Introduction and Applications of Microarray Databases
Chen-hsiung Chan
Department of Computer Science and Information Engineering
National Taiwan University
2. MIAME (Minimum Information About a Microarray Experiment)
MIAME describes the Minimum Information About a Microarray Experiment that is needed to enable the interpretation of the results of the experiment unambiguously and potentially to reproduce the experiment. [Brazma et al, Nature Genetics]
3. MIAME
raw data (CEL or GPR files)
final processed (normalized) data
essential sample annotation including experimental factors and their values
experimental design including sample data relationships
sufficient annotation of the array
essential laboratory and data processing protocols
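The six items above can be treated as a checklist. The following is a minimal sketch, assuming a submission is represented as a plain dictionary; the key names and the `missing_miame_components` helper are hypothetical and not part of any MIAME tool.

```python
# The six MIAME components listed on this slide.
# The short key names are illustrative assumptions, not an official vocabulary.
MIAME_COMPONENTS = {
    "raw_data": "raw data (CEL or GPR files)",
    "processed_data": "final processed (normalized) data",
    "sample_annotation": "essential sample annotation, including experimental factors and their values",
    "experimental_design": "experimental design, including sample data relationships",
    "array_annotation": "sufficient annotation of the array",
    "protocols": "essential laboratory and data processing protocols",
}


def missing_miame_components(submission):
    """Return the keys of MIAME components that are absent or empty."""
    return [key for key in MIAME_COMPONENTS if not submission.get(key)]


# Made-up submission: three components present, one empty, two absent.
submission = {
    "raw_data": ["sample1.CEL", "sample2.CEL"],
    "processed_data": "normalized.txt",
    "sample_annotation": {"sample1": {"treatment": "control"}},
    "protocols": [],
}

for key in missing_miame_components(submission):
    print("missing:", MIAME_COMPONENTS[key])
```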
4. Databases using MIAME
ArrayExpress at EBI
GEO at NCBI
CIBEX at DDBJ
5. ArrayExpress
http://www.ebi.ac.uk/microarray-as/aer/
Stores transcriptomics and related data
Data warehouse stores gene-indexed expression profiles
Follows MGED recommendations (MIAME)
7. ArrayExpress statistics
Experiment repository: 2,914 experiments (each with at least 6 microarrays) and growing
Expression profiles: 267 experiments, 121,891 genes
Data warehouse updated every day
8. Searching ArrayExpress
Keywords: breast cancer, cell cycle, etc.
Accession numbers of the form E-XXXX-d, e.g. E-AFFY-1281, E-TIGR-372
Secondary accession numbers: GEO accessions, e.g. GSE5389
Species names: mainly Latin names (e.g. Homo sapiens); common names (e.g. human) may be used as well
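The query types above can be told apart by their shape. The sketch below is illustrative only: the regular expressions are assumptions inferred from the example accessions on this slide (four uppercase letters and a number for ArrayExpress, "GSE" plus a number for GEO Series), and `classify_query` is a hypothetical helper, not part of the ArrayExpress interface.

```python
import re

# Assumed accession shapes, inferred from the examples on the slide.
ARRAYEXPRESS_RE = re.compile(r"E-[A-Z]{4}-\d+")
GEO_SERIES_RE = re.compile(r"GSE\d+")


def classify_query(term):
    """Label a search term as an accession number or a free-text keyword."""
    term = term.strip()
    if ARRAYEXPRESS_RE.fullmatch(term):
        return "arrayexpress_accession"
    if GEO_SERIES_RE.fullmatch(term):
        return "geo_series_accession"
    return "keyword"


for term in ["E-AFFY-1281", "E-TIGR-372", "GSE5389", "breast cancer", "Homo sapiens"]:
    print(term, "->", classify_query(term))
```

Species names and keywords both fall into the "keyword" class here; distinguishing them would need a species list, which is outside this sketch.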
10. ArrayExpress interface
12. ArrayExpress Search/Browse Result
Keyword: lung cancer
13. ArrayExpress Search/Browse Result
Detailed view
19. Expression Profile results
Thumbnail view
BigPlot view
Gene ranking (most differentially expressed experiments are top ranked)
Similarity search: search for genes with similar expression levels
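One common way to express "similar expression" is the Pearson correlation between two expression profiles. The sketch below assumes that measure; the slides do not say which similarity measure ArrayExpress actually uses, and the gene names and values are made up.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (both must vary)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def most_similar(profiles, query, top=3):
    """Rank all other genes by correlation with the query gene, highest first."""
    scores = [(gene, pearson(profiles[query], values))
              for gene, values in profiles.items() if gene != query]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top]


# Made-up profiles: one value per sample, same sample order for every gene.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [1.0, 3.0, 2.0, 4.0],
}

for gene, score in most_similar(profiles, "geneA"):
    print(gene, round(score, 2))
```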
23. Gene Expression Omnibus (GEO)
http://www.ncbi.nlm.nih.gov/geo/
Gene expression/molecular abundance repository
MIAME compliant
Supports browsing, query and retrieval
25. GEO record types
Platform
Sample
Series
DataSet
Profile
26. GEO Platform
A Platform record defines the list of elements that may be detected and quantified in that experiment (e.g., cDNAs, oligonucleotide probesets)
Each Platform record is assigned a unique and stable GEO accession number (GPLxxx)
A Platform may reference many Samples that have been submitted by multiple submitters
27. GEO Sample
A Sample record describes the conditions under which an individual Sample was handled, the manipulations it underwent, and the abundance measurement of each element derived from it
Each Sample record is assigned a unique and stable GEO accession number (GSMxxx)
A Sample entity must reference only one Platform and may be included in multiple Series
29. GEO Series
A Series record links together a group of related Samples and provides a focal point and description of the whole study
Series records may also contain tables describing extracted data, summary conclusions, or analyses
Each Series record is assigned a unique and stable GEO accession number (GSExxx)
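The relationships among Platform, Sample, and Series records can be sketched as a small data model: a Sample references exactly one Platform, and the same Sample may appear in several Series. The class and field names below are illustrative assumptions, not GEO's actual schema, and the accession strings are placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Platform:
    accession: str       # GPLxxx
    elements: List[str]  # elements that can be detected, e.g. probe IDs


@dataclass
class Sample:
    accession: str                 # GSMxxx
    platform: Platform             # exactly one Platform per Sample
    measurements: Dict[str, float] # element -> abundance measurement


@dataclass
class Series:
    accession: str  # GSExxx
    title: str
    samples: List[Sample] = field(default_factory=list)


def series_containing(sample, all_series):
    """List the accessions of every Series that includes the given Sample."""
    return [s.accession for s in all_series if sample in s.samples]


# Placeholder records (not real GEO accessions).
platform = Platform("GPL_demo", ["probe_1", "probe_2"])
sample_1 = Sample("GSM_demo_1", platform, {"probe_1": 7.2, "probe_2": 3.9})
sample_2 = Sample("GSM_demo_2", platform, {"probe_1": 6.8, "probe_2": 4.4})
series_a = Series("GSE_demo_A", "hypothetical study A", [sample_1, sample_2])
series_b = Series("GSE_demo_B", "hypothetical study B", [sample_1])

print(series_containing(sample_1, [series_a, series_b]))
```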
31. GEO DataSet
Assembled by NCBI
Samples are all equivalently measured and normalized
Can be viewed and analyzed with NCBI’s advanced data display and analysis tools
33. GEO Profile
A Profile consists of the expression measurements for an individual gene across all Samples in a DataSet
Profiles can be searched using Entrez GEO Profiles
Similar to Expression Profile in ArrayExpress
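A Profile is, in effect, one gene's row read across every Sample column of a DataSet. A minimal sketch, assuming the DataSet is held as a dictionary of per-sample measurements; the layout, names, and values are hypothetical.

```python
# Hypothetical DataSet: an ordered sample list plus per-sample measurements.
dataset = {
    "samples": ["sample_1", "sample_2", "sample_3"],
    "values": {
        "sample_1": {"geneA": 7.2, "geneB": 3.9},
        "sample_2": {"geneA": 6.8, "geneB": 4.4},
        "sample_3": {"geneA": 9.1},  # geneB was not measured in this sample
    },
}


def profile(dataset, gene):
    """Return (sample, value) pairs for one gene across all samples.

    Samples without a measurement for the gene are reported as None.
    """
    return [(sample, dataset["values"][sample].get(gene))
            for sample in dataset["samples"]]


print(profile(dataset, "geneA"))
print(profile(dataset, "geneB"))
```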
36. SOFT (Simple Omnibus Format in Text)
Text based
Line based
Easily parsed with text-processing languages such as Perl, Python, Ruby, and PHP
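Because SOFT is line based, a parser can dispatch on the first character of each line: `^` starts a new entity, `!` carries an attribute, `#` describes a table column, and other lines inside a table are tab-delimited data rows. The following is a minimal sketch under those assumptions; `parse_soft` is a hypothetical helper, the example record is made up, and real SOFT files have details this does not cover.

```python
def parse_soft(text):
    """Parse SOFT text into a list of entity dictionaries."""
    entities, current, in_table = [], None, False
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith("^"):
            # Entity indicator, e.g. "^SAMPLE = <id>"
            kind, _, name = line[1:].partition("=")
            current = {"type": kind.strip(), "id": name.strip(),
                       "attributes": {}, "columns": {}, "table": []}
            entities.append(current)
            in_table = False
        elif current is None:
            continue  # ignore anything before the first entity line
        elif line.startswith("!"):
            key, _, value = line[1:].partition("=")
            key = key.strip()
            if key.endswith("_table_begin"):
                in_table = True
            elif key.endswith("_table_end"):
                in_table = False
            else:
                # Attributes may repeat, so collect values in a list.
                current["attributes"].setdefault(key, []).append(value.strip())
        elif line.startswith("#"):
            key, _, value = line[1:].partition("=")
            current["columns"][key.strip()] = value.strip()
        elif in_table:
            current["table"].append(line.split("\t"))
    return entities


# Made-up record for illustration; not taken from GEO.
example = "\n".join([
    "^SAMPLE = demo_sample",
    "!Sample_title = made-up example",
    "!Sample_platform_id = demo_platform",
    "#ID_REF = probe identifier",
    "#VALUE = normalized signal",
    "!sample_table_begin",
    "ID_REF\tVALUE",
    "probe_1\t7.2",
    "probe_2\t3.9",
    "!sample_table_end",
])

record = parse_soft(example)[0]
print(record["type"], record["id"], record["table"])
```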
39. Network Biology Visualization and Analysis
40. Cytoscape
Open source network visualization and analysis software
Core features include network layout and querying, and integrating the network visualization with state data
Can be extended by plugins
41. Cytoscape developers
University of California at San Diego (Trey Ideker)
Institute for Systems Biology (Leroy Hood)
Memorial Sloan-Kettering Cancer Center (Chris Sander)
Institut Pasteur (Benno Schwikowski)
Agilent Technologies (Annette Adler)
University of California at San Francisco (Bruce Conklin)
42. Cytoscape
A Java application
Requires Java 5 or 6 (JDK 5/6 or JRE 5/6)
44. Simple Interaction Format (SIF)
Each line denotes one interaction: InteractorA xx InteractorB
‘xx’ is the interaction type:
pp: protein-protein interaction
pd: protein-DNA interaction (transcription factor/regulation)
pr (protein-reaction), rc (reaction-compound), cr (compound-reaction), gl (genetic-lethal), pm (protein-metabolite), mp (metabolite-protein)
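A SIF file can be read into an edge list with a few lines of code. The sketch below assumes whitespace-delimited fields and node names without spaces; any extra fields after the third are treated as additional targets of the same source. `parse_sif` and `neighbors` are hypothetical helpers, and the network is made up.

```python
from collections import defaultdict


def parse_sif(text):
    """Return a list of (source, interaction_type, target) tuples."""
    edges = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # blank lines or lines without a full interaction
        source, interaction = fields[0], fields[1]
        for target in fields[2:]:
            edges.append((source, interaction, target))
    return edges


def neighbors(edges):
    """Build an undirected adjacency map from the edge list."""
    adjacency = defaultdict(set)
    for source, _, target in edges:
        adjacency[source].add(target)
        adjacency[target].add(source)
    return adjacency


# Made-up network: one protein-protein and one protein-DNA interaction.
sif_text = """geneA pp geneB
geneA pd geneC
"""

edges = parse_sif(sif_text)
print(edges)
print(sorted(neighbors(edges)["geneA"]))
```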
45. Other interaction formats supported
GML
XGMML
SBML
BioPAX
PSI-MI
Tab-delimited text tables and Excel
47. Applications of Gene Expression
Gene selection (differentially expressed genes)
State annotation in networks (expression level)
Gene regulatory network identification
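As an illustration of gene selection, one simple criterion is the log2 fold change between two conditions. The slides do not specify a selection method, so this is only one possible sketch: the threshold, group names, and expression values are all assumptions, and real analyses usually add a statistical test.

```python
from math import log2
from statistics import mean


def log2_fold_change(control, treated):
    """log2 ratio of mean treated expression to mean control expression."""
    return log2(mean(treated) / mean(control))


def select_genes(expression, threshold=1.0):
    """Keep genes whose absolute log2 fold change reaches the threshold."""
    selected = {}
    for gene, groups in expression.items():
        change = log2_fold_change(groups["control"], groups["treated"])
        if abs(change) >= threshold:
            selected[gene] = change
    return selected


# Made-up, non-log expression values (must be positive).
expression = {
    "geneA": {"control": [100, 110, 90], "treated": [400, 420, 380]},
    "geneB": {"control": [50, 50, 50], "treated": [55, 45, 50]},
    "geneC": {"control": [200, 200], "treated": [50, 50]},
}

for gene, change in select_genes(expression).items():
    print(gene, round(change, 2))
```

The selected genes and their fold changes could then serve as the state data that annotates nodes in a network view.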