
ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data



Presentation Transcript


  1. ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data. Amy Tang PhD, ArrayExpress Production Team, Functional Genomics Group, EMBL-EBI, amytang@ebi.ac.uk

  2. What’s covered this morning? • Why do we need a database for functional genomics data? • ArrayExpress databases: • Archive • Gene Expression Atlas • What’s in each database, how to browse, search, interpret, download data • Hands-on exercises • (How to submit data to ArrayExpress?) ArrayExpress

  3. Functional genomics (FG) data • The aim of FG is to understand the function of genes and other (non-genic) parts of the genome • Often involves high-throughput technologies (microarrays, high-throughput sequencing [HTS]) • Questions addressed: • Gene expression - when? where? how much? changes? • Gene function - roles of different genes in cellular processes, pathways • Gene regulation - e.g. epigenetic modifications of histones or DNA

  4. ArrayExpress (www.ebi.ac.uk/arrayexpress) • Public repository for functional genomics data (both microarray and sequencing) • Together with GEO at NCBI and CIBEX at DDBJ, serves the scientific community as an archive for data supporting publications • Provides access to curated data in a structured and standardised format • Facilitates the sharing of experimental information • Submissions are curated based on community standards: • MIAME guidelines & MAGE-TAB format for microarray data • MINSEQE guidelines & MAGE-TAB format for HTS data

  5. Community standards for data requirements • MIAME = Minimum Information About a Microarray Experiment (http://www.mged.org/Workgroups/MIAME/miame_2.0.html) • MINSEQE = Minimum Information about a high-throughput Nucleotide SEQuencing Experiment (http://www.mged.org/minseqe) • The checklist:

  6. What is an experimental factor? • The main variable(s) studied in the experiment • It is often the independent variable of the microarray or HTS experiment. Values of the factor (“factor values”) should vary. • Examples:

  7. Reporting standards - MAGE-TAB format. MAGE-TAB is a simple spreadsheet format that uses a number of different files to capture information about a microarray or sequencing experiment.
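Because MAGE-TAB files are tab-delimited spreadsheets, they can be read with ordinary tools. The following is a minimal Python sketch, assuming an SDRF-style table with the column headings shown; the sample names, file names and factor values are invented for illustration and do not come from a real submission.

```python
import csv
import io

# Toy SDRF-style table. All rows are invented for illustration only.
SDRF_TEXT = (
    "Source Name\tCharacteristics[organism]\tFactor Value[disease]\tArray Data File\n"
    "sample 1\tHomo sapiens\tnormal\tsample1.CEL\n"
    "sample 2\tHomo sapiens\tprostate adenocarcinoma\tsample2.CEL\n"
    "sample 3\tHomo sapiens\tprostate adenocarcinoma\tsample3.CEL\n"
)


def read_sdrf(text):
    """Return the SDRF rows as a list of dicts keyed by column header."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def factor_values(rows, factor):
    """Collect the distinct values of one experimental factor, in file order."""
    column = f"Factor Value[{factor}]"
    seen = []
    for row in rows:
        if row[column] not in seen:
            seen.append(row[column])
    return seen


rows = read_sdrf(SDRF_TEXT)
print(factor_values(rows, "disease"))
```

Each row links one sample to its annotation and data file, which is the "sample and data relationship" role the SDRF plays in the later slides.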

  8. MAGE-TAB Example: IDF ArrayExpress & Atlas

  9. MAGE-TAB Example: SDRF

  10. ArrayExpress – two databases

  11. What is the difference between them? ArrayExpress Archive: • Central object: experiment • Contains both microarray and HTS experiments • Query to retrieve experimental information and associated data. Expression Atlas: • Central object: gene/condition • Contains data from mainly microarray experiments (HTS coming very soon!) • Query for up/downregulated genes across experiments and across platforms

  12. ArrayExpress – two databases

  13. ArrayExpress Archive – when to use it? • Find FG experiments that might be relevant to your research • Download data and re-analyse it yourself. Data deposited in public repositories may shed light on biological questions different from the one asked in the original experiments. • Submit microarray or HTS data that you want to publish. Major journals will require data to be submitted to a public repository like ArrayExpress as part of the peer-review process.

  14. How much data in AE Archive? (as of September 2012)

  15. HTS data in AE Archive (as of mid-September 2012): microarray vs HTS; RNA-, DNA-, ChIP-seq breakdown

  16. Browsing the AE Archive (www.ebi.ac.uk/arrayexpress)

  17. Browsing the AE Archive • The date when the data were loaded in the Archive • “Loaded in Atlas” flag • Species investigated • Number of assays • AE unique experiment ID • Curated title of experiment • Raw sequencing data available in ENA • The total number of experiments and assays retrieved • The direct link to raw and processed data; an icon indicates that this type of data is available • The list of experiments retrieved can be printed, saved in tab-delimited format, or exported to Excel or as an RSS feed

  18. Browsing the AE Archive

  19. Experimental factor ontology (EFO) (http://www.ebi.ac.uk/efo) • An ontology modeling the relationship between experimental factors (EFs) and other data elements • Used in EBI databases and external projects (e.g. NHGRI GWAS Catalogue) • Combines terms from a subset of well-maintained and compatible ontologies, e.g. Gene Ontology (cellular component + biological process terms), NCBI Taxonomy

  20. Experimental factor ontology (EFO) (http://www.ebi.ac.uk/efo) EFO was developed to: • increase the richness of annotations in databases • expand on search terms when querying ArrayExpress and Gene Expression Atlas • using synonyms (e.g. “cerebral cortex” = “adult brain cortex”) • using child terms (e.g. “bone” → “rib” and “vertebra”) • promote consistency (e.g. F/female, 1 day/24 hours) • facilitate automatic annotation and integration of external data (e.g. changing “gender” to “sex” automatically)
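The query expansion described on this slide can be sketched in a few lines of Python. This is a toy illustration only: the two dictionaries hold just the slide's own examples, and the function name `expand_query` is hypothetical, not part of any ArrayExpress or EFO interface.

```python
# Toy ontology fragment built from the slide's examples; real EFO is far larger.
SYNONYMS = {"cerebral cortex": ["adult brain cortex"]}
CHILDREN = {"bone": ["rib", "vertebra"]}


def expand_query(term):
    """Return the term plus its synonyms and all of its descendant terms."""
    expanded = [term]
    expanded.extend(SYNONYMS.get(term, []))
    queue = list(CHILDREN.get(term, []))
    while queue:
        child = queue.pop(0)
        if child not in expanded:
            expanded.append(child)
            # Follow grandchildren too, so the whole subtree is searched.
            queue.extend(CHILDREN.get(child, []))
    return expanded


print(expand_query("bone"))
print(expand_query("cerebral cortex"))
```

A search for "bone" would then also retrieve experiments annotated with "rib" or "vertebra", which is the behaviour the slide attributes to EFO-based searching.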

  21. Building EFO: an example • Take all experimental factors: disease, neoplasm, cancer, sarcoma, Kaposi’s sarcoma • Find the logical connection between them: disease is the parent term; neoplasm is a type of disease; cancer is a synonym of neoplasm; sarcoma is a type of cancer; Kaposi’s sarcoma is a type of sarcoma • Organize them in an ontology: disease > neoplasm (synonym: cancer) > sarcoma > Kaposi’s sarcoma

  22. Exploring EFO: an example

  23. Searching AE Archive: simple query • “Auto-complete” with suggestions (like Google search) • Avoid acronyms as search terms • Filter your search results by: • Species of interest • One array design (platform) • Molecule (DNA, RNA, protein, etc.) • Technology (microarray or HTS)

  24. Searching AE Archive: simple query • Search across all fields: • AE accession number, e.g. E-MEXP-568 • Secondary accession numbers, e.g. GEO series accession GSE5389 • Experiment title, submitter’s experiment description • Submitter’s email address • Sample attributes, experimental factors and values, including species (e.g. GeneticModification, Mus musculus, DREB2C over-expression) • Publication title, authors and journal name, PubMed ID • Synonyms for terms are always included in searches, e.g. ‘human’ and ‘Homo sapiens’

  25. AE Archive query output • Matches to exact terms are highlighted in yellow • Matches to synonyms are highlighted in green • Matches to child terms in the EFO are highlighted in pink

  26. AE Archive – experiment view • Experimental factor(s) and their values • MIAME or MINSEQE scores show how far the experiment complies with the standard (* = compliant) • Links to the available files; these vary between sequencing and microarray data. Microarray experiments also have an array design file (ADF)

  27. SDRF file – sample & data relationship

  28. Searching AE Archive: advanced query • Combine search terms • Join two or more keywords in the search box with the operators AND, OR or NOT (in CAPS), e.g. brain OR prostate NOT mouse • Search terms of more than one word must be entered inside quotes, otherwise only the first word will be searched for, e.g. “kidney cancer” • Specify fields for searches • E.g. search only for human assays on Agilent microarrays: species: “homo sapiens” AND array:Agilent* * For more details and examples, see http://www.ebi.ac.uk/fg/doc/help/ae_help.html
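The quoting and operator rules above are easy to get wrong when queries are typed by hand, so a small helper can assemble the query string. The Python sketch below only builds text to paste into the search box. The helper names `quote`, `field` and `combine` are hypothetical, and it assumes straight double quotes and no space after the field colon.

```python
def quote(term):
    """Wrap a multi-word term in double quotes so it is searched as a phrase."""
    return f'"{term}"' if " " in term else term


def field(name, value):
    """Restrict a term to one field, e.g. species or array."""
    return f"{name}:{quote(value)}"


def combine(operator, *parts):
    """Join query parts with AND, OR or NOT (upper case, as the slide requires)."""
    if operator not in ("AND", "OR", "NOT"):
        raise ValueError(f"unsupported operator: {operator!r}")
    return f" {operator} ".join(parts)


keywords = combine("NOT", combine("OR", "brain", "prostate"), "mouse")
fielded = combine("AND", field("species", "homo sapiens"), field("array", "Agilent*"))
print(keywords)
print(fielded)
```

The two printed strings correspond to the slide's own examples: a keyword query mixing OR and NOT, and a field-restricted query for human assays on Agilent arrays.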

  29. Hands-on exercise 1: Find RNA-seq assays studying human prostate adenocarcinoma

  30. ArrayExpress – two databases

  31. Expression Atlas – when to use it? • Find out if the expression of a gene (or a group of genes with a common gene attribute, e.g. GO term) changes across all the experiments available in the Expression Atlas • Discover which genes are differentially expressed in a particular biological condition that you are interested in • Experiments in the Archive are curated before being introduced into the Atlas

  32. Expression Atlas construction: experiment selection criteria during curation • Array (platform) designs relating to the experiment must be provided • Probe annotation must be adequate to enable re-annotation of external references (e.g. Ensembl gene ID, UniProt ID) • At least 3 replicates for each value of the experimental factor • Maximum 4 experimental factors • Adequate sample annotation using EFO terms • Presence of data files: CEL raw data files for Affymetrix assays, processed data files for non-Affymetrix ones
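Two of these criteria, the replicate minimum and the factor maximum, are simple counts and can be checked mechanically. The Python sketch below is an illustration under stated assumptions, not the curation team's actual tooling: the function name `check_design` and the sample data are invented, and replicates are counted per factor value, one factor at a time.

```python
from collections import Counter

MIN_REPLICATES = 3  # "At least 3 replicates for each value of the experimental factor"
MAX_FACTORS = 4     # "Maximum 4 experimental factors"


def check_design(samples):
    """Check a list of per-assay {factor name: factor value} dicts.

    Returns a list of human-readable problems; an empty list means both
    counting criteria are met.
    """
    problems = []
    factors = sorted({name for sample in samples for name in sample})
    if len(factors) > MAX_FACTORS:
        problems.append(f"{len(factors)} experimental factors (maximum {MAX_FACTORS})")
    for name in factors:
        counts = Counter(sample[name] for sample in samples if name in sample)
        for value, n in sorted(counts.items()):
            if n < MIN_REPLICATES:
                problems.append(f"{name}={value}: only {n} replicate(s)")
    return problems


# Invented design: three normal samples but only two tumour samples.
samples = [{"disease": "normal"}] * 3 + [{"disease": "tumour"}] * 2
print(check_design(samples))
```

The remaining criteria (array design availability, probe annotation quality, EFO annotation, data file presence) need curator judgement or file inspection and are not covered by this sketch.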

  33. Expression Atlas construction: analysis pipeline • Input data (Affy CEL, non-Affy processed) • Linear model* (Bio/C Limma) • Output: 2-D matrix of genes × conditions (a dummy example with Cond.1, Cond.2, Cond.3), where 1 = differentially expressed and 0 = not differentially expressed * More information about the statistical methodology: http://nar.oxfordjournals.org/content/38/suppl_1/D690.full

  34. Expression Atlas construction: analysis pipeline • “Is gene X differentially expressed in condition 1 in this experiment?” • Legend: one plotted symbol = a single expression value for gene X • Plot labels: Cond.1 mean, Cond.2 mean, Cond.3 mean, mean of all samples • Compare and calculate statistic

  35. Exp. 1 (Cond.1, Cond.2, Cond.3), Exp. 2 (Cond.4, Cond.5, Cond.6), ..., Exp. n (Cond.X, Cond.Y, Cond.Z): a statistical test is applied to the genes × conditions data of each experiment. Each experiment has its own “verdict” or “vote” on whether a gene is differentially expressed or not under a certain condition.

  36. Expression Atlas construction: summary of the “verdicts” from different experiments
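The summary step can be pictured as counting votes. The following Python sketch is a toy illustration of that idea and not the Atlas implementation: the gene names, experiment names and 1/0 calls are invented, and it assumes the same condition label can be matched across experiments.

```python
from collections import defaultdict

# Each experiment reports 1 (differentially expressed) or 0 (not) for every
# gene/condition pair it tested. All values here are invented.
verdicts = {
    "Exp. 1": {("gene A", "Cond.1"): 1, ("gene A", "Cond.2"): 0, ("gene B", "Cond.1"): 0},
    "Exp. 2": {("gene A", "Cond.1"): 1, ("gene B", "Cond.1"): 1},
    "Exp. 3": {("gene A", "Cond.1"): 0, ("gene B", "Cond.1"): 0},
}


def summarise(verdicts):
    """Count, per (gene, condition), how many experiments tested it and how many called it DE."""
    summary = defaultdict(lambda: {"tested": 0, "de": 0})
    for calls in verdicts.values():
        for key, call in calls.items():
            summary[key]["tested"] += 1
            summary[key]["de"] += call
    return dict(summary)


summary = summarise(verdicts)
for (gene, condition), counts in sorted(summary.items()):
    print(gene, condition, f'{counts["de"]}/{counts["tested"]} experiments')
```

Reporting both counts keeps a gene called differentially expressed in 2 of 3 experiments distinguishable from one called in 2 of 20.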

  37. Expression Atlas

  38. Atlas home page (http://www.ebi.ac.uk/gxa) • Restrict query by direction of differential expression • Query for conditions • Query for genes • The ‘advanced query’ option allows building more complex queries

  39. Atlas home page: the ‘Genes’ and ‘Conditions’ search boxes

  40. Atlas single gene query: gene summary page

  41. Atlas single gene query (cont’d): experiment page

  42. Atlas single gene query: gene summary page – jump to orthologs. Orthology comes from the Ensembl Compara database

  43. Atlas single gene query: compare orthologs – heatmap view

  44. Atlas ‘condition-only’ query

  45. Atlas ‘condition-only’ query (cont’d): heatmap view

  46. Atlas gene + condition query

  47. Atlas query refining (method 1). What if there are no terms in the “REFINE YOUR QUERY” box which fit my biological question?

  48. Atlas query refining (method 2)

  49. Atlas query refining (method 2)

  50. Atlas query refining (method 2)
