
ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data

ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data. Gabriella Rustici, PhD, Functional Genomics Team, EBI-EMBL (gabry@ebi.ac.uk). Talk structure: Why do we need a database for functional genomics data? ArrayExpress database: Archive, Gene Expression Atlas. Database content


Presentation Transcript


  1. ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data. Gabriella Rustici, PhD, Functional Genomics Team, EBI-EMBL, gabry@ebi.ac.uk

  2. Talk structure • Why do we need a database for functional genomics data? • ArrayExpress database • Archive • Gene Expression Atlas • Database content • Query the database • Data download • Data submission ArrayExpress

  3. Components of a functional genomics experiment. Microarray: • Sample (sample source; sample treatments; RNA extraction protocol; labelling protocol) • Array (array design information; location of each element; description of each element) • Hybridization protocol • Chip • Image (scanning protocol; software specifications) • Quantification matrix (software specifications) • Raw data • Data analysis (control array elements; normalization method) • Normalized data • Data analysis. High-throughput sequencing: • Sample (sample source; sample treatments; template preparation) • Library (library preparation) • Cluster amplification • Sequencing and imaging • From images to sequences • Raw data • Data analysis (quality control; sequence alignment; assembly; specific steps depending on the application)

  4. ArrayExpress (www.ebi.ac.uk/arrayexpress/) • Is a public repository for functional genomics data, mostly generated using microarray or high-throughput sequencing (HTS) assays • Serves the scientific community as an archive for data supporting publications, together with GEO at NCBI and CIBEX at DDBJ • Provides easy access to well-annotated microarray data in a structured and standardized format • Facilitates the sharing of microarray designs and experimental protocols • Based on FGED standards: the MIAME checklist, the MAGE-TAB format and the MGED Ontology (MO) • MINSEQE checklist for HTS data (http://www.mged.org/minseqe/)

  5. Reporting standards for microarrays: MIAME checklist (Minimal Information About a Microarray Experiment). The 6 most critical elements contributing towards MIAME are: • Essential sample annotation, including experimental factors and their values (e.g. compound and dose) • Experimental design, including sample-data relationships (e.g. which raw data file relates to which sample) • Sufficient array annotation (e.g. gene identifiers, genomic coordinates, probe sequences or array catalog number) • Essential laboratory and data processing protocols (e.g. normalization method used) • Raw data for each hybridization (e.g. CEL or GPR files) • Final normalized data for the set of hybridizations in the experiment

  6. Reporting standards for sequencing: MINSEQE checklist (Minimal Information about a high-throughput Nucleotide SEQuencing Experiment). The proposed guidelines for MINSEQE are (still a work in progress): • General information about the experiment • Essential sample annotation, including experimental factors and their values (e.g. compound and dose) • Experimental design, including sample-data relationships (e.g. which raw data file relates to which sample) • Essential experimental and data processing protocols • Sequence read data with quality scores, raw intensities and processing parameters for the instrument • Final processed data for the set of assays in the experiment

  7. Reporting standards for microarrays: MAGE-TAB format. MAGE-TAB is a simple spreadsheet format that uses a number of different files to capture information about a microarray experiment.
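MAGE-TAB splits an experiment description across several tab-delimited files: an IDF (Investigation Description Format) file with experiment-level information, an SDRF (Sample and Data Relationship Format) file linking samples to assays and data files, ADF (Array Design Format) files describing the arrays, plus the raw and processed data files. As a minimal sketch, the Python below sorts a list of downloaded file names into these groups. It assumes the `.idf.txt`, `.sdrf.txt` and `.adf.txt` suffixes commonly seen in ArrayExpress downloads; the function name and the example file names are hypothetical.

```python
from collections import defaultdict

# Assumed suffix conventions; real file names may differ.
MAGETAB_SUFFIXES = {
    ".idf.txt": "IDF",    # Investigation Description Format: experiment-level metadata
    ".sdrf.txt": "SDRF",  # Sample and Data Relationship Format: sample -> assay -> data file links
    ".adf.txt": "ADF",    # Array Design Format: array layout and element annotation
}


def classify_magetab_files(filenames):
    """Group file names by MAGE-TAB component; anything unrecognised counts as a data file."""
    groups = defaultdict(list)
    for name in filenames:
        lowered = name.lower()
        kind = "data"  # raw (e.g. CEL) or processed data files
        for suffix, label in MAGETAB_SUFFIXES.items():
            if lowered.endswith(suffix):
                kind = label
                break
        groups[kind].append(name)
    return dict(groups)


if __name__ == "__main__":
    files = ["E-TEST-1.idf.txt", "E-TEST-1.sdrf.txt", "A-TEST-1.adf.txt", "E-TEST-1.raw.1.zip"]
    for kind, names in sorted(classify_magetab_files(files).items()):
        print(kind, names)
```

Anything that does not match a metadata suffix is treated as data, which keeps the sketch tolerant of the many raw-file formats (CEL, GPR, archives) an experiment can include.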

  8. Reporting standards: what semantics (or ontology) should we use to best describe an experiment's annotation? An ontology is a formal specification of terms in a particular subject area and the relations among them. Its purpose is to provide a basic, stable and unambiguous description of such terms and relations, in order to avoid improper and inconsistent use of the terminology pertaining to a given domain. Thus far, the Gene Ontology (GO) has been the most successful ontology initiative: GO is a controlled vocabulary used to describe the biology of a gene product in any organism.

  9. Reporting standards for microarrays: MGED Ontology (MO) • The MO provides terms for annotating all aspects of a microarray experiment, from the design of the experiment and the array layout, through the preparation of the biological sample, to the protocols used to hybridize the RNA and analyze the data • The MO was developed to provide terms for annotating experiments in line with the MIAME guidelines, i.e. to provide the semantics to describe a microarray experiment according to the concepts specified in MIAME • See also the Open Biomedical Ontologies (OBO) initiative (www.obofoundry.org) for the development of life-science ontologies

  10. ArrayExpress – two databases

  11. How to query AE and Atlas? • AE Archive: query by experiment, sample and experimental factor annotations; filter on species, array platform, molecule assayed and technology used • Gene Expression Atlas: gene and/or condition queries; query across experiments and across platforms

  12. ArrayExpress – two databases

  13. How much data in AE Archive?

  14. Archive by species

  15. Browsing the AE Archive

  16. Browsing the AE Archive: • The date when the data were loaded in the Archive • Number of assays • Species investigated • Curated title of the experiment • AE unique experiment ID • 'Loaded in Atlas' flag • Raw sequencing data available in ENA • Direct links to raw and processed data; an icon indicates that this type of data is available • The total number of experiments and assays retrieved • The list of experiments retrieved can be printed, saved in tab-delimited format, or exported to Excel or as an RSS feed

  17. Browsing the AE Archive

  18. Experimental factor ontology (EFO): http://www.ebi.ac.uk/efo • Application-focused ontology modeling experimental factors (EFs) in AE • Developed to: increase the richness of the annotations currently made in the AE Archive; promote consistency; facilitate automatic annotation and integrate external data • EFs are transformed into an ontological representation, forming classes and relationships between those classes • EFO terms map to multiple existing domain-specific ontologies, such as the Disease Ontology and the Cell Type Ontology
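Because EFO arranges terms into classes linked by parent-child (is_a) relationships, a query on a general term can be widened to every more specific term beneath it. The sketch below walks a small, invented parent-to-children table breadth-first to collect all descendants of a term. The terms and links are illustrative only and are not taken from EFO, which has its own term identifiers and structure.

```python
from collections import deque

# Invented parent -> children table (is_a links read downwards); not real EFO content.
CHILDREN = {
    "disease": ["cancer", "infectious disease"],
    "cancer": ["kidney cancer", "breast cancer"],
    "kidney cancer": ["renal cell carcinoma"],
}


def descendants(term, children=CHILDREN):
    """Collect every term below `term`, breadth-first, each listed once."""
    found, seen = [], set()
    queue = deque(children.get(term, []))
    while queue:
        child = queue.popleft()
        if child in seen:
            continue  # a term can be reachable through more than one parent
        seen.add(child)
        found.append(child)
        queue.extend(children.get(child, []))
    return found


if __name__ == "__main__":
    print(descendants("cancer"))
    # ['kidney cancer', 'breast cancer', 'renal cell carcinoma']
```

The `seen` set matters because an ontology is a graph rather than a strict tree: one term may have several parents.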

  19. Experimental factor ontology (EFO): an example

  20. Searching AE Archive: simple query (EFO)

  21. Searching AE Archive: simple query • Search across all fields: • AE accession number, e.g. E-MEXP-568 • Secondary accession numbers, e.g. GEO series accession GSE5389 • Experiment name • Submitter's experiment description • Sample attributes, experimental factors and values, including species (e.g. GeneticModification, Mus musculus, DREB2C over-expression) • Publication title, authors and journal name, PubMed ID • Synonyms for terms are always included in searches, e.g. 'human' and 'Homo sapiens'
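The simple query behaves like a keyword search over many fields at once, with synonyms added automatically. The sketch below mimics that on in-memory records. The field names, the hand-written synonym table and the experiment records (including the `E-TEST-*` and `GSE0000` accessions) are all hypothetical; ArrayExpress itself most likely draws its synonyms from its ontology rather than from a fixed table like this one.

```python
# Hypothetical synonym table (stand-in for ontology-driven synonyms).
SYNONYMS = {
    "human": {"homo sapiens"},
    "homo sapiens": {"human"},
    "mouse": {"mus musculus"},
    "mus musculus": {"mouse"},
}

# Hypothetical field names covering the kinds of fields listed above.
SEARCH_FIELDS = (
    "accession", "secondary_accessions", "name", "description",
    "sample_attributes", "publication",
)


def expand(term):
    """Return the lower-cased term together with its synonyms."""
    term = term.lower()
    return {term} | SYNONYMS.get(term, set())


def simple_search(experiments, term):
    """Accessions of experiments in which the term, or a synonym, appears in any searched field."""
    variants = expand(term)
    hits = []
    for exp in experiments:
        text = " ".join(str(exp.get(field, "")) for field in SEARCH_FIELDS).lower()
        if any(variant in text for variant in variants):
            hits.append(exp["accession"])
    return hits


if __name__ == "__main__":
    experiments = [
        {"accession": "E-TEST-1", "name": "Liver time course", "sample_attributes": "Homo sapiens"},
        {"accession": "E-TEST-2", "secondary_accessions": "GSE0000", "sample_attributes": "Mus musculus"},
    ]
    print(simple_search(experiments, "human"))  # ['E-TEST-1']
    print(simple_search(experiments, "mouse"))  # ['E-TEST-2']
```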

  22. AE Archive query output • Matches to exact terms are highlighted in yellow • Matches to synonyms are highlighted in green • Matches to child terms in the EFO are highlighted in pink
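The three highlight colors correspond to three ways a hit can relate to the query term. A minimal sketch of that classification follows, assuming the synonyms and the already expanded set of child terms for each query term are supplied as plain Python dictionaries; the function name and both tables in the usage example are invented.

```python
# Highlight colors as described for the AE Archive query output.
HIGHLIGHT = {"exact": "yellow", "synonym": "green", "child": "pink"}


def highlight_color(query, matched, synonyms, child_terms):
    """Return the highlight color for a matched term, or None if it is unrelated to the query.

    synonyms:    term -> set of synonyms (hypothetical table)
    child_terms: term -> set of all more specific terms below it (hypothetical table)
    """
    q, m = query.lower(), matched.lower()
    if m == q:
        kind = "exact"
    elif m in synonyms.get(q, set()):
        kind = "synonym"
    elif m in child_terms.get(q, set()):
        kind = "child"
    else:
        return None
    return HIGHLIGHT[kind]


if __name__ == "__main__":
    synonyms = {"human": {"homo sapiens"}}
    child_terms = {"cancer": {"kidney cancer", "renal cell carcinoma"}}
    print(highlight_color("human", "Homo sapiens", synonyms, child_terms))    # green
    print(highlight_color("cancer", "kidney cancer", synonyms, child_terms))  # pink
    print(highlight_color("cancer", "Cancer", synonyms, child_terms))         # yellow
```

Checking exact matches first means a term that is both typed verbatim and listed as a synonym is still reported as an exact (yellow) match.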

  23. AE Archive – experiment view

  24. How does processed data look? • Samples • Sample annotation • Genes • Gene expression levels or count level data • Gene annotations

  25. AE Archive – SDRF file

  26. SDRF file – sample & data relationship
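The SDRF is what makes sample-data relationships explicit: each row traces one sample through its processing to the data file it produced, and `Factor Value[...]` columns record the experimental factors. Below is a minimal Python sketch using the standard `csv` module on a toy SDRF; the column headings follow MAGE-TAB naming, but the function name, rows, file names and values are invented. Real SDRFs often repeat column headings (for example several `Protocol REF` columns), and `csv.DictReader` keeps only the last of any repeated heading, so a production parser needs to read the header row as a list instead.

```python
import csv
import io

# Toy SDRF: headings follow MAGE-TAB naming; sample names, files and values are invented.
SDRF_TEXT = (
    "Source Name\tCharacteristics[organism]\tArray Data File\tFactor Value[compound]\n"
    "sample_1\tHomo sapiens\tsample_1.CEL\tcontrol\n"
    "sample_2\tHomo sapiens\tsample_2.CEL\tdrug A\n"
)

FACTOR_PREFIX = "Factor Value["


def sample_to_data(handle):
    """Map each Source Name to its raw data file and its experimental factor values."""
    mapping = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        factors = {
            key[len(FACTOR_PREFIX):-1]: value  # "Factor Value[compound]" -> "compound"
            for key, value in row.items()
            if key and key.startswith(FACTOR_PREFIX)  # skip the None key used for surplus cells
        }
        mapping[row["Source Name"]] = {"raw_file": row["Array Data File"], "factors": factors}
    return mapping


if __name__ == "__main__":
    for sample, info in sample_to_data(io.StringIO(SDRF_TEXT)).items():
        print(sample, info)
```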

  27. AE Archive – ADF file

  28. AE Archive – Old interface

  29. AE Archive – all files

  30. AE Archive – all files

  31. Searching AE Archive: advanced query • Combine search terms • Enter two or more keywords in the search box with the operators AND, OR or NOT. AND is the default operator: a search for kidney cancer will return hits with a match to 'kidney' AND 'cancer' • Search terms of more than one word must be entered inside quotes, otherwise only the first word will be searched for, e.g. "kidney cancer" • Specify fields for searches • Particular fields for searching can also be specified in the format fieldname:value
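To see why the quotes matter, here is a simplified evaluator for this kind of keyword query, run against a free-text description. It is not ArrayExpress's search engine. It only assumes the behavior described above (adjacent terms default to AND, the AND/OR/NOT operators, double-quoted phrases matched as a unit) and combines terms strictly left to right, without operator precedence or parentheses; the function name is hypothetical.

```python
import shlex


def matches(query, text):
    """Evaluate a simplified keyword query against free text.

    Terms combine left to right; adjacent terms default to AND, upper-case AND/OR/NOT
    are operators, and double-quoted phrases are matched as a single unit.
    """
    text = text.lower()
    result = None
    op = "AND"
    negate = False
    for token in shlex.split(query):  # shlex keeps "kidney cancer" together as one token
        if token in ("AND", "OR"):
            op = token
            continue
        if token == "NOT":
            negate = True
            continue
        hit = token.lower() in text
        if negate:
            hit = not hit
            negate = False
        if result is None:
            result = hit
        elif op == "AND":
            result = result and hit
        else:
            result = result or hit
        op = "AND"  # default operator between adjacent terms
    return bool(result)


if __name__ == "__main__":
    description = "Gene expression in kidney tumour and breast cancer samples"
    print(matches("kidney cancer", description))    # True
    print(matches('"kidney cancer"', description))  # False
```

With this description, the unquoted query `kidney cancer` matches because both words occur somewhere, while the quoted phrase does not occur at all. Note that `shlex.split` also treats single quotes as quoting characters, so a term such as `Crohn's` would raise `ValueError` in this sketch.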

  32. Searching AE Archive: advanced query – fieldnames

  33. Searching AE Archive: advanced query • Filtering experiments by counts of a particular attribute • Experiments fulfilling certain count criteria can also be searched for, e.g. those having more than 10 assays (hybridizations)
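Count-based filtering amounts to a numeric comparison on a per-experiment attribute. The sketch below applies a condition such as `assays > 10` to in-memory records. The condition syntax, the function name, the field names and the records are hypothetical and do not reproduce the ArrayExpress fieldname syntax.

```python
import operator
import re

# Comparison operators for the hypothetical "field OP number" syntax used here.
OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le, "=": operator.eq}
CONDITION_RE = re.compile(r"^(\w+)\s*(>=|<=|>|<|=)\s*(\d+)$")


def filter_by_count(experiments, condition):
    """Return accessions whose numeric field satisfies a condition such as 'assays > 10'."""
    match = CONDITION_RE.match(condition.strip())
    if match is None:
        raise ValueError(f"unsupported condition: {condition!r}")
    field, compare, threshold = match.group(1), OPS[match.group(2)], int(match.group(3))
    # Records lacking the field are treated as having a count of 0.
    return [e["accession"] for e in experiments if compare(e.get(field, 0), threshold)]


if __name__ == "__main__":
    experiments = [{"accession": "E-TEST-1", "assays": 12}, {"accession": "E-TEST-2", "assays": 6}]
    print(filter_by_count(experiments, "assays > 10"))  # ['E-TEST-1']
```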

  34. Searching AE Archive: advanced query – an example

  35. Exercise 1

  36. ArrayExpress – two databases

  37. Gene Expression Atlas: experiment selection criteria. The criteria we use for selecting experiments for inclusion in the Atlas are as follows: • Array designs relating to the experiment must be provided to enable re-annotation using Ensembl or UniProt (or have the potential for this to be done) • High MIAME scores • The experiment must have 6 or more hybridizations • Sufficient replication and large sample size • Experimental factors (EFs) and their values (EFVs) must be well annotated • Adequate sample annotation must be provided • Processed data must be provided, or raw data that can be renormalized must be available
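Some of these criteria are mechanical and some require a curator's judgment. As a hedged illustration, the sketch below encodes only the checkable ones (array design present, at least 6 hybridizations, usable processed or raw data, factors annotated) as tests on a hypothetical, simplified experiment record. MIAME scores, replication and sample-annotation quality are left out, and the class and function names are invented.

```python
from dataclasses import dataclass


@dataclass
class ExperimentSummary:
    """Hypothetical, simplified experiment record; real curation is partly manual."""
    accession: str
    hybridizations: int
    array_design_provided: bool
    has_processed_data: bool
    has_renormalizable_raw_data: bool
    factors_annotated: bool


def atlas_selection_problems(exp, min_hybridizations=6):
    """List the checkable selection criteria this experiment fails (empty list = passes them)."""
    problems = []
    if not exp.array_design_provided:
        problems.append("no array design for re-annotation")
    if exp.hybridizations < min_hybridizations:
        problems.append(f"fewer than {min_hybridizations} hybridizations")
    if not (exp.has_processed_data or exp.has_renormalizable_raw_data):
        problems.append("no processed data and no raw data that can be renormalized")
    if not exp.factors_annotated:
        problems.append("experimental factors and values not well annotated")
    return problems


if __name__ == "__main__":
    exp = ExperimentSummary("E-TEST-1", hybridizations=4, array_design_provided=True,
                            has_processed_data=False, has_renormalizable_raw_data=True,
                            factors_annotated=True)
    print(atlas_selection_problems(exp))  # ['fewer than 6 hybridizations']
```

Returning a list of reasons rather than a single boolean makes it easier to see why an experiment was left out.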

  38. Gene Expression Atlas: Atlas construction • A new meta-analytical tool for searching gene expression profiles across experiments in AE • Data are taken as normalized by the submitter • Gene-wise linear models (limma) and t-statistics are applied to measure how strongly each gene is differentially expressed in each condition, across experiments • The result is a two-dimensional matrix in which rows correspond to genes and columns correspond to conditions, rather than samples • The matrix entries are p-values together with a sign, indicating the significance and direction of differential expression
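The idea of collapsing an experiment into a gene x condition table of signed p-values can be sketched in a few lines. This is a deliberate simplification, not the Atlas pipeline. limma fits gene-wise linear models with empirically moderated t-statistics, whereas the code below substitutes an ordinary Welch t-statistic and a normal approximation for the two-sided p-value (anti-conservative for small groups). It also assumes a one-condition-versus-all-other-samples comparison. Input values are assumed to be already normalized log expression levels, with at least two samples in each group; all names are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch two-sample t statistic for group a versus group b."""
    if len(a) < 2 or len(b) < 2:
        raise ValueError("each group needs at least two samples")
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        raise ValueError("zero variance in both groups")
    return (mean(a) - mean(b)) / se


def signed_pvalue(a, b):
    """Two-sided p-value (normal approximation to t), signed + if a > b and - if a < b."""
    t = welch_t(a, b)
    p = math.erfc(abs(t) / math.sqrt(2))  # P(|Z| > |t|) for a standard normal Z
    return math.copysign(p, t)


def condition_matrix(expr, labels):
    """Build gene -> {condition: signed p-value}, testing each condition against all other samples.

    expr:   gene -> list of normalized log-expression values, one per sample
    labels: condition label of each sample, in the same order as the values
    """
    conditions = sorted(set(labels))
    matrix = {}
    for gene, values in expr.items():
        matrix[gene] = {}
        for cond in conditions:
            inside = [v for v, lab in zip(values, labels) if lab == cond]
            outside = [v for v, lab in zip(values, labels) if lab != cond]
            matrix[gene][cond] = signed_pvalue(inside, outside)
    return matrix


if __name__ == "__main__":
    labels = ["ctrl", "ctrl", "ctrl", "drug", "drug", "drug"]
    expr = {
        "geneA": [1.0, 2.0, 1.5, 3.0, 4.0, 3.5],  # higher under drug
        "geneB": [2.0, 2.5, 3.0, 2.0, 2.5, 3.0],  # no difference
    }
    for gene, row in condition_matrix(expr, labels).items():
        print(gene, {cond: f"{p:+.2g}" for cond, p in row.items()})
```

The sign carries the direction: a positive entry means higher expression in that condition than in the remaining samples, a negative entry lower. With only two conditions the two columns mirror each other.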

  39. Gene Expression Atlas: Atlas construction

  40. Gene Expression Atlas: Atlas construction • up-regulated • down-regulated • no change

  41. Gene Expression Atlas

  42. Atlas home page: http://www.ebi.ac.uk/gxa/ • Restrict search by direction of differential expression • Query for gene(s) • Query for condition(s) • The 'advanced search' option allows building more complex queries
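Conceptually, the gene, condition and direction search options act as filters on a gene x condition matrix of signed p-values like the one described under Atlas construction. The sketch below treats them that way. The 0.05 cut-off, the function name and the toy matrix are assumptions for illustration, not the Atlas's own settings.

```python
import math


def atlas_query(matrix, genes=None, conditions=None, direction="any", alpha=0.05):
    """Return (gene, condition, 'up'|'down') for significant cells of a signed p-value matrix.

    matrix:            gene -> {condition: signed p-value}
    genes, conditions: optional collections restricting the search
    direction:         'up', 'down' or 'any'
    alpha:             significance cut-off (an assumed value)
    """
    if direction not in ("up", "down", "any"):
        raise ValueError("direction must be 'up', 'down' or 'any'")
    hits = []
    for gene, row in matrix.items():
        if genes is not None and gene not in genes:
            continue
        for cond, signed_p in row.items():
            if conditions is not None and cond not in conditions:
                continue
            if abs(signed_p) >= alpha:
                continue  # not significant: "no change"
            # copysign also reads the sign of -0.0 correctly
            change = "up" if math.copysign(1.0, signed_p) > 0 else "down"
            if direction in ("any", change):
                hits.append((gene, cond, change))
    return hits


if __name__ == "__main__":
    matrix = {"geneA": {"ctrl": -1e-6, "drug": 1e-6}, "geneB": {"ctrl": 1.0, "drug": 1.0}}
    print(atlas_query(matrix, direction="up"))  # [('geneA', 'drug', 'up')]
```

A gene-only query leaves `conditions` unset, a condition-only query leaves `genes` unset, and the direction filter corresponds to restricting the search to up- or down-regulation.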

  43. Atlas home page: the 'Genes' search box and auto-complete function

  44. Atlas home page: the 'Conditions' search box and ontology browsing

  45. Atlas home page: a single gene query

  46. Atlas gene summary page

  47. Atlas experiment page • Experimental factors list • Expression plot • Table containing gene information and drop-down menus for searching within the experiment

  48. Atlas experiment page – HTS data

  49. Atlas home page: a 'Conditions'-only query

  50. Atlas heatmap view
