1 / 55

ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data

ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data. Gabriella Rustici, PhD Functional Genomics Team EBI-EMBL gabry@ebi.ac.uk. Talk structure. Why do we need a database for functional genomics data? ArrayExpress database Archive Gene Expression Atlas

mauli
Download Presentation

ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Gabriella Rustici, PhD Functional Genomics TeamEBI-EMBL gabry@ebi.ac.uk

  2. Talk structure • Why do we need a database for functional genomics data? • ArrayExpress database • Archive • Gene Expression Atlas • ArrayExpress content • How to query the database • How to download data • How to submit data ArrayExpress

  3. What is functional genomics (FG)? The aim of FG is to understand the function of genes and other parts of the genome FG experiments typically utilize genome-wide assays to measure and track many genes (or proteins) in parallel under different conditions High-throughput technologies such as microarrays and high-throughput sequencing (HTS) are frequently used in this field to interrogate the transcriptome ArrayExpress

  4. What biological questions is FG addressing? When and where are genes expressed? How do gene expression levels differ in various cell types and states? What are the functional roles of different genes and in what cellular processes do they participate? How are genes regulated? How do genes and gene products interact? How is gene expression changed in various diseases or following a treatment? ArrayExpress

  5. Components of a FG experiment ArrayExpress

  6. ArrayExpresswww.ebi.ac.uk/arrayexpress/ • Is a public repository for FG data, which provides easy access to well annotated data in a structured and standardized format • Serves the scientific community as an archive for data supporting publications, together with GEO at NCBI and CIBEX at DDBJ • Facilitates the sharing of experimental information associated with the data such as microarray designs, experimental protocols,…… • Based on community standards: MIAME guidelines & MAGE-TAB format for microarray, MINSEQE guidelines for HTS data (http://www.mged.org/minseqe/) ArrayExpress

  7. Reporting standards for microarraysMIAME checklist Minimal Information About a Microarray Experiment The 6 most critical elements contributing towards MIAME are: Essential sample annotation including experimental factors and their values (e.g. compound and dose) Experimental design including sample data relationships (e.g. which raw data file relates to which sample, ….) Sufficient array annotation (e.g. gene identifiers, genomic coordinates, probe sequences or array catalog number) Essential laboratory and data processing protocols (e.g. normalization method used) Raw data for each hybridization (e.g. CEL or GPR files) Final normalized data for the set of hybridizations in the experiment ArrayExpress

  8. Reporting standards for sequencingMINSEQE checklist Minimal Information about a high-throughput Nucleotide SEQuencing Experiment The proposed guidelines for MINSEQE are (still work in progress): General information about the experiment Essential sample annotation including experimental factors and their values (e.g. compound and dose) Experimental design including sample data relationships (e.g. which raw data file relates to which sample, ….) Essential experimental and data processing protocols Sequence read data with quality scores, raw intensities and processing parameters for the instrument Final processed data for the set of assays in the experiment ArrayExpress

  9. MAGE-TAB is a simple spreadsheet format that uses a number of different files to capture information about a microarray experiment. We now adapted it to handle HTS data: Reporting standards for microarraysMAGE-TAB format ArrayExpress

  10. ArrayExpress – two databases ArrayExpress

  11. What is the difference between them? ArrayExpress ArrayExpress Archive • Central object: experiment • Query to retrieve experimental information and associated data Expression Atlas • Central object: gene/condition • Query for gene expression changes across experiments and across platforms

  12. ArrayExpress – two databases ArrayExpress

  13. ArrayExpress Archive – when to use it? • Find FG experiments that might be relevant to your research • Download data and re-analyze it. Often data deposited in public repositories can be used to answer different biological questions from the one asked in the original experiments. • Submit microarray or HTS data that you want to publish. Major journals will require data to be submitted to a public repository like ArrayExpress as part of the peer-review process. ArrayExpress

  14. How much data in AE Archive? ArrayExpress

  15. HTS data in AE Archive

  16. Browsing the AE Archive ArrayExpress

  17. Browsing the AE Archive The date when the data were loaded in the Archive Number of assays Species investigated Curated title of experiment AE unique experiment ID loaded in Atlas flag Raw sequencing data available in ENA The direct link to raw and processed data. An icon indicates that this type of data is available. The total number of experiments and assay retrieved The list of experiments retrieved can be printed, saved as Tab-delimited format or exported to Excel or as RSS feed ArrayExpress

  18. Browsing the AE Archive ArrayExpress

  19. Experimental factor ontology (EFO)http://www.ebi.ac.uk/efo • Application focused ontology modeling experimental factors (EFs) in AE – selected by default • Developed to: • increase the richness of annotations that are currently made in AE Archive • to promote consistency • to facilitate automatic annotation and integrate external data • EFs are transformed into an ontological representation, forming classes and relationships between those classes • EFO terms map to multiple existing domain specific ontologies, such as the Disease Ontology and Cell Type Ontology ArrayExpress

  20. Building EFO An example Take all experimental factors Find the logical connection between them Organize them in an ontology disease disease sarcoma is the parent term [-] neoplasm disease neoplasm cancer is a type of [-] cancer neoplasm cancer neoplasm is synonym of [-] sarcoma disease sarcoma cancer is a type of [-] Kaposi’s sarcoma Kaposi’s sarcoma Kaposi’s sarcoma sarcoma is a type of ArrayExpress

  21. Exploring EFO An example ArrayExpress

  22. Searching AE ArchiveSimple query ArrayExpress

  23. Searching AE ArchiveSimple query • Search across all fields: • AE accession number e.g. E-MEXP-568 • Secondary accession numbers e.g. GEO series accession GSE5389 • Experiment name • Submitter's experiment description • Sample attributes, experimental factor and values, including species (e.g. GeneticModification, Mus musculus, DREB2C over-expression) • Publication title, authors and journal name, PubMed ID • Synonyms for terms are always included in searches e.g. 'human' and 'Homo sapiens’ ArrayExpress

  24. AE Archive query output • Matches to exact terms are highlighted in yellow • Matches to synonyms are highlighted in green • Matches to child terms in the EFO are highlighted in pink

  25. AE Archive – experiment view MIAME or MINSEQE scores show how much the experiment is standard compliant Link to files available. This varies between sequencing and microarray data. For microarray experiments you also have array design file and you can view a graph of the experimental design Experimental factor(s) and its values ArrayExpress

  26. AE Archive – SDRF file ArrayExpress

  27. SDRF file – sample & data relationship ArrayExpress

  28. Searching AE ArchiveAdvanced query • Combine search terms • Enter two or more keywords in the search box with the operators AND, OR or NOT. AND is the default search term; a search for kidney cancer' will return hits with a match to ‘kidney' AND ‘cancer’ • Search terms of more than one word must be entered inside quotes otherwise only the first word will be searched for. E.g. “kidney cancer” • Specify fields for searches • Particular fields for searching can also be specified in the format of fieldname:value For more info see http://www.ebi.ac.uk/fg/doc/help/ae_help.html ArrayExpress

  29. Searching AE ArchiveAdvanced query – examples ArrayExpress

  30. ArrayExpress – two databases ArrayExpress

  31. Expression Atlas – when to use it? Find out if the expression of a gene (or a group of genes with a common gene attribute, e.g. GO term) change(s) across all the experiments available in the Expression Atlas; Discover which genes are differentially expressed in a particular biological condition that you are interested in. ArrayExpress

  32. The criteria we use for selecting experiments for inclusion in the Atlas are as follows: Array designs relating to experiment must be provided to enable re-annotation using Ensembl or Uniprot (or have the potential for this to be done) High MIAME scores Experiment must have 6 or more hybridizations Sufficient replication and large sample size EF and EFV must be well annotated Adequate sample annotation must be provided Processed data must be provided or raw data which can be renormalized must be available Expression Atlas constructionExperiment selection criteria ArrayExpress

  33. Expression AtlasconstructionAnalysispipeline • New meta-analytical tool for searching gene expression profiles across experiments in AE • Data is taken as normalized by the submitter • Gene-wise linear models (limma) and t-statistics are applied to calculate the strength of genes’ differential expression across conditions across experiments • The result is a two-dimensional matrix where rows correspond to genes and columns correspond to conditions, rather than samples. • The matrix entries are p-values together with a sign, indicating the significance and direction of differential expression ArrayExpress

  34. Expression Atlas construction ArrayExpress

  35. Expression Atlas construction ArrayExpress

  36. Expression Atlas ArrayExpress

  37. Atlas home pagehttp://www.ebi.ac.uk/gxa/ Restrict query by direction of differential expression Query for genes Query for conditions The ‘advanced query’ option allows building more complex queries ArrayExpress

  38. Atlas home pageThe ‘Genes’ and ‘Conditions’ search boxes ArrayExpress

  39. Atlas home pageA single gene query ArrayExpress

  40. Atlas gene summary page ArrayExpress

  41. Atlas experiment page ArrayExpress

  42. Atlas experiment page – HTS data ArrayExpress

  43. Atlas home pageA ‘Conditions’ only query ArrayExpress

  44. Atlas heatmap view ArrayExpress

  45. Atlas gene-condition query ArrayExpress

  46. Atlas query refining ArrayExpress

  47. Atlas gene-condition query ArrayExpress

  48. Atlas query refining ArrayExpress

  49. Atlas query refining ArrayExpress

  50. Data submission to AE ArrayExpress

More Related