Gene expression data in VectorBase

Presentation Transcript


  1. Gene expression data in VectorBase Fotis Kafatos, George Christophides, Bob MacCallum & Seth Redmond Imperial College London (thanks also to EBI, Sanger and ND)

  2. Outline • Project goals • What’s currently available • Current challenges and future plans

  3. Project goals • For vector biologists: easy access to gene expression data; consistent data processing • For array specialists: ArrayExpress submission; advanced analysis tools; array annotation

  4. [Diagram: Expression data, Bulk loader, Storage & analysis] • BASE: BioArray Software Environment • http://base.thep.lu.se/ • Open source, active development and user community • LIMS, data storage, export and analysis • Web-based, user/group access control • BASE 2.x adoption will bring Affy support

  5. Data submission • Community submission guidelines available • First batch of experiments loaded by us • Bulk data loader • Sample/experiment annotation requires intervention from curators

  6. [Diagram: ArrayExpress, Expression data, Bulk loader, ‘Public’ storage, Storage & analysis] • Data held in BASE is largely MIAME compliant • Script for semi-automated export in TAB2MAGE format • One experiment submitted so far

  7. [Diagram: ArrayExpress, Expression data, Bulk loader, ‘Public’ storage, Storage & analysis]

  8. [Diagram: ArrayExpress, Expression data, Bulk loader, ‘Public’ storage, Storage & analysis, Data summaries] • BASE web interface offers a powerful and extensible analysis environment • Can be used for multi-site collaborations on pre-publication data • Steep learning curve; not 100% intuitive • Not easy to link to from other pages • We provide simpler views so the casual user can quickly draw biological inferences

  9. Standardised data • All displayed data is processed in the same way • Poor-quality spots removed (currently using submitted spot flags) • Normalisation (“lowess” for two-colour experiments)
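
As a rough illustration of the two steps on this slide, the sketch below drops flagged spots and then applies a simplified lowess correction to the log-ratios of a two-colour array. It is not the actual VectorBase or BASE pipeline: the `Spot` record, its field names, and the `frac` default are assumptions, and the smoother is a plain tricube-weighted local linear fit without the robustness iterations of full lowess.

```python
import math
from dataclasses import dataclass


@dataclass
class Spot:
    probe: str
    red: float     # Cy5 channel intensity (hypothetical field name)
    green: float   # Cy3 channel intensity (hypothetical field name)
    flagged: bool  # True if the submitter flagged the spot as poor quality


def local_fit(a_values, m_values, a0, frac):
    """Tricube-weighted local linear fit of M on A, evaluated at a0."""
    k = min(len(a_values), max(2, math.ceil(frac * len(a_values))))
    # Bandwidth: distance from a0 to its k-th nearest neighbour.
    h = sorted(abs(a - a0) for a in a_values)[k - 1] or 1e-12
    sw = swa = swm = swaa = swam = 0.0
    for a, m in zip(a_values, m_values):
        u = abs(a - a0) / h
        if u >= 1.0:
            continue  # outside the local window
        w = (1.0 - u ** 3) ** 3
        sw += w
        swa += w * a
        swm += w * m
        swaa += w * a * a
        swam += w * a * m
    denom = sw * swaa - swa * swa
    if abs(denom) < 1e-12:
        return swm / sw  # degenerate window: fall back to weighted mean
    slope = (sw * swam - swa * swm) / denom
    intercept = (swm - slope * swa) / sw
    return intercept + slope * a0


def lowess_normalise(spots, frac=0.4):
    """Return {probe: normalised log2 ratio} for unflagged spots."""
    good = [s for s in spots if not s.flagged and s.red > 0 and s.green > 0]
    # A = mean log intensity, M = log ratio (the usual MA transform).
    a = [0.5 * (math.log2(s.red) + math.log2(s.green)) for s in good]
    m = [math.log2(s.red) - math.log2(s.green) for s in good]
    # Subtract the intensity-dependent trend from each log ratio.
    return {s.probe: mi - local_fit(a, m, ai, frac)
            for s, ai, mi in zip(good, a, m)}
```

With this sketch, a uniform dye bias (for example, red always twice green) is removed entirely, so the corrected log-ratios sit near zero and only spot-specific differences remain.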

  10. 3 probe types, 6 array designs • Mapping handled via Ensembl pipeline: Oligo → exonerate; PCR → e-PCR; cDNA → exonerate2genes [Diagram: ArrayExpress, Expression data, Bulk loader, Probe mapping, ‘Public’ storage, Storage & analysis, Data summaries]

  11. [Diagram: VectorBase, ArrayExpress, Expression data, Genomic data, Bulk loader, Probe mapping, Automatic annotation, GFF3, ‘Public’ storage, Storage & analysis, Data summaries, Genome browser]
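
The diagram on this slide lists GFF3 among the pieces connecting probe mapping to the genome browser. As a sketch of what one mapped probe might look like in that format, the code below writes a single nine-column GFF3 feature line. The `ProbeHit` record, the `probe_mapping` source label, and the `oligo` feature type are assumptions for illustration; the slides do not say which values the real export uses.

```python
from dataclasses import dataclass


@dataclass
class ProbeHit:
    probe_id: str
    seqid: str   # chromosome or scaffold name
    start: int   # 1-based, inclusive, as GFF3 requires
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def to_gff3(hit, source="probe_mapping", feature_type="oligo"):
    """Format one probe alignment as a tab-separated GFF3 feature line."""
    if hit.start > hit.end:
        raise ValueError("GFF3 requires start <= end")
    attributes = f"ID={hit.probe_id};Name={hit.probe_id}"
    columns = [
        hit.seqid,
        source,
        feature_type,
        str(hit.start),
        str(hit.end),
        ".",          # score: not available in this sketch
        hit.strand,
        ".",          # phase: only meaningful for CDS features
        attributes,
    ]
    return "\t".join(columns)


def write_gff3(hits):
    """Return a complete GFF3 document, including the version pragma."""
    return "\n".join(["##gff-version 3"] + [to_gff3(h) for h in hits]) + "\n"
```

A split alignment would need one line per aligned block sharing a common parent or ID, which this sketch does not attempt.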

  12. contigview

  13. featureview

  14. [Diagram: VectorBase, ArrayExpress, Expression data, Genomic data, Bulk loader, Probe mapping, Automatic annotation, ‘Public’ storage, Storage & analysis, Data summaries, Genome browser, Data mining, Array biologists, Genome biologists, Vector biologists]

  15. BioMart • Beta version currently available • http://base.vectorbase.org:9999/biomart/martview • Improvements still needed: experiment annotations; alignments (i.e. handle split alignments); federation with current marts; integration with new data?

  16. Current challenges and future plans • How do you want to query? • CVs & ontologies • APIs • Community submission • Manual annotation

  17. Querying strategy • What do you want to query on? Fetch all genes upregulated under condition X; fetch all experiments with gene X and condition Y; fetch all probes with expression similar to probe X • All essentially boil down to: define probe (genes etc.); define significant expression (ANOVA? up/down-regulation with respect to what?); define experimental conditions (sample annotation, experimental design)
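
The three decisions on this slide (which probes or genes, what counts as significant, which conditions) can be made concrete with a small sketch of the first example query. Everything named here is hypothetical: the `Measurement` record, the log2 ratio threshold of 1.0, and the probe-to-gene map stand in for whatever schema and statistics are eventually chosen.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    probe: str
    condition: str     # taken from sample annotation
    log2_ratio: float  # test versus reference sample


def genes_upregulated(measurements, probe_to_gene, condition, min_log2=1.0):
    """Fetch all genes upregulated under the given condition.

    'Upregulated' is defined here as a log2 ratio at or above min_log2,
    which is a placeholder for a proper significance test.
    """
    hits = set()
    for m in measurements:
        if m.condition != condition:
            continue  # define experimental conditions
        if m.log2_ratio < min_log2:
            continue  # define significant expression
        gene = probe_to_gene.get(m.probe)  # define probe -> gene
        if gene is not None:
            hits.add(gene)
    return sorted(hits)
```

Probes with no gene mapping are silently skipped in this sketch; a real query would have to decide whether to report them separately.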

  18. [Diagram: ArrayExpress, Expression data, Genomic data, Bulk loader, Probe mapping, Automatic annotation, CV / ontology, ‘Public’ storage, Storage & analysis, Data summaries, Genome browser, Data mining, Array biologists, Genome biologists, Vector biologists]

  19. [Diagram: Probe mapping, AE API?, e! API, ‘Public’ storage, Storage & analysis, Data summaries, Genome browser, MartJ / MQL, Data mining, ArrayExpress, Expression data, Genomic data, Bulk loader, CV / ontology, Automatic annotation, Array API?]

  20. Array API • Perl / Java objects for retrieval / handling of array data • Dual purpose: consistency & efficiency of VB expression website; computational access to VB data for all • Objects must be: general, DB-independent; compatible with pre-existing Bio API (BioPerl / BioJava) • N.B. there may be a pre-existing solution: ArrayExpress API? BioPerl-Expression? MAGE-OM-stk • http://neuron.cse.nd.edu/vectorbase/index.php/Array_API_proposal

  21. Community data submission • Carrot? • Help with ArrayExpress submission • Analysis tools • Dissemination • Stick? • Outreach (courses, conferences) • Networking

  22. GE data → manual annotators • Gene-build designed arrays • Negative evidence less compelling • EST clone-based arrays • http://tinyurl.com/vlkwo

  23. Longer term plans • Host-parasite GE data integration & analysis • GE-clusters → “upstream” regions → regulatory elements, upstream TFs • RNAi phenotypes • Images

  24. CVs & ontologies • Integrate MGED and specialist ontologies for: body parts; developmental stages; disease processes; … • Allows comparison across experiments with similar experimental conditions

  25. BioMart • Most biomarts: gene-based; mostly ‘binary’ data (e.g. a gene either has a signal domain or doesn’t); easily linked with other (gene-based) biomarts • VB BioMart: probe-based; many probes not aligned; expression data less clear (e.g. how to define ‘differential expression’); exports gene/transcript IDs for linking to other marts

  26. Clustering • A priority? • Easy to do on reporter level within experiments • Harder to do at gene level across all experiments • Binary gene profile: “yes/no differentially expressed in experiment” ? • Amazon-style links to “genes which may have similar expression profiles”?
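
One way to picture the binary gene profile idea on this slide is sketched below: each gene gets a yes/no vector across experiments, and "similar" genes are those whose vectors overlap. The Jaccard index used here is a choice made for this illustration, not something the slides specify, and the function names and the 0.5 cut-off are likewise assumptions.

```python
def binary_profile(calls, experiments):
    """Build a 0/1 tuple: was the gene differentially expressed in each experiment?

    `calls` maps experiment name -> bool; missing experiments count as 'no'.
    """
    return tuple(1 if calls.get(e, False) else 0 for e in experiments)


def jaccard(p, q):
    """Share of experiments where both genes are 'yes' among those where either is."""
    both = sum(1 for a, b in zip(p, q) if a and b)
    either = sum(1 for a, b in zip(p, q) if a or b)
    return both / either if either else 0.0


def similar_genes(target, profiles, min_similarity=0.5):
    """Return (gene, similarity) pairs for genes resembling `target`, best first."""
    reference = profiles[target]
    scored = [
        (gene, jaccard(reference, profile))
        for gene, profile in profiles.items()
        if gene != target
    ]
    keep = [(g, s) for g, s in scored if s >= min_similarity]
    return sorted(keep, key=lambda pair: (-pair[1], pair[0]))
```

A binary profile discards the direction and size of each change, so two genes regulated in opposite directions in the same experiments would score as similar under this sketch.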

  27. BASE 2.x • Adoption delayed, now in progress • Brings Affymetrix support • Cleaner/modern interface • Better API (Java)