
MAGE-TAB workshop NCI, January 24, 2008

MAGE-TAB is a simple tab-delimited format for describing microarray (and potentially other) experiments in a MIAME-compliant way. What is needed to describe a microarray (and potentially any) experiment adequately (MIAME)?

barton

Presentation Transcript


  1. MAGE-TAB: a simple tab-delimited format for describing microarray (and potentially other) experiments in a MIAME-compliant way. MAGE-TAB workshop, NCI, January 24, 2008

  2. What is needed to describe a microarray (and potentially any) experiment adequately (MIAME)?
     • Description of biological sample (research subject, aliquot, i.e., biomaterial) properties
     • Description of the assay (e.g., microarray design)
     • Data from the assays, raw and processed
     • Material and data processing protocols
     • Experiment design: which sample went to which assay and produced which data file

  3. How to do this?
     • Description of biological sample (research subject, aliquot, i.e., biomaterial) properties
       – a list of properties
       – free text or ontology entries
       – table (spreadsheet)
     • Description of the assay (e.g., microarray design)
       – a list of features on the array and their properties: sequence, annotation, location
       – table (spreadsheet)
     • Data from the assays, raw and processed
       – files or tables
     • Material and data processing protocols
       – free text
     • Experiment design: which sample went to which assay and produced which data file
       – a graph (normally a DAG)

  4. 5. Experiment design graph [Figure: Sample 1 (Cy3) and Sample 2 (Cy5) both feed a single Hybridisation node, which yields Data]

  5. 5. Experiment design graph [Figure: design graph with nodes Sample 1 (Homo s., brain), Sample 2 (Homo s., kidney), Hybridisation (SMD array 1) and Data; edges annotated Protocol (Cy3), Protocol (Cy5) and Protocol]

  6. [Figure: experiment design graph. Samples: liver 1, liver 2, kidney 1, kidney 2, brain 1, brain 2 (all Homo sap.). Hybridisations: Hybridisation 1, 2 and 3 (HG_U95A). Raw data: Data1.cel, Data2.cel, Data3.cel. Processed data: FGDM.txt. Protocols: material processing protocols (P-XMPL-1), normalisation protocol (P-XMPL-2)]

  7. [Figure: experiment design graph. Samples: liver 1, liver 2, kidney 1, kidney 2, brain 1, brain 2 (all Homo sap.). Hybridisations: Hybridisation 1, 2 and 3 (HG_U95A). Raw data: Data1.cel, Data2.cel, Data3.cel. Processed data: FGDM.txt. Protocols: material processing protocols (P-XMPL-1), normalisation protocol (P-XMPL-2)]

  8. Important observation
     • In high-throughput experiments the experiment design graphs have a simple structure:
       – They are regular (similar subgraphs repeated many times)
       – Most nodes have only a small number of incoming and outgoing edges
       – They can be presented in ‘layers’ in a natural way (in fact any DAG can be represented in layers)
     • This makes their representation as a spreadsheet simple and natural
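The layering idea can be sketched in a few lines of Python. This is a minimal illustration, not part of the presentation: the function names `layer_dag` and `rows_from_dag` are hypothetical, and the example graph is the two-sample, one-hybridisation design from the earlier slide. Each node gets the length of the longest path reaching it as its layer, and each root-to-leaf path becomes one spreadsheet row.

```python
from collections import defaultdict


def layer_dag(edges):
    """Assign every node a layer: the length of the longest path reaching it."""
    succ = defaultdict(list)
    indeg = defaultdict(int)
    nodes = set()
    for a, b in edges:
        succ[a].append(b)
        indeg[b] += 1
        nodes.update((a, b))
    layer = {n: 0 for n in nodes}
    ready = [n for n in nodes if indeg[n] == 0]  # roots, e.g. samples
    visited = 0
    while ready:
        n = ready.pop()
        visited += 1
        for m in succ[n]:
            layer[m] = max(layer[m], layer[n] + 1)
            indeg[m] -= 1
            if indeg[m] == 0:
                ready.append(m)
    if visited != len(nodes):
        raise ValueError("graph contains a cycle, so it is not a DAG")
    return layer


def rows_from_dag(edges):
    """Turn every root-to-leaf path into one spreadsheet row."""
    succ = {}
    has_parent = set()
    for a, b in edges:
        succ.setdefault(a, []).append(b)
        has_parent.add(b)
    roots = sorted({a for a, _ in edges} - has_parent)
    rows = []

    def walk(node, path):
        path = path + [node]
        children = succ.get(node, [])
        if not children:
            rows.append(path)
        for child in children:
            walk(child, path)

    for root in roots:
        walk(root, [])
    return rows


edges = [
    ("Sample 1", "Hybridisation"),
    ("Sample 2", "Hybridisation"),
    ("Hybridisation", "Data"),
]
layers = layer_dag(edges)
rows = rows_from_dag(edges)
for row in rows:
    print("\t".join(row))
```

Because the graph is regular, the rows all have the same length and the columns line up with the layers, which is what makes the spreadsheet form natural.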

  9. Layers in a more complex DAG [Figure: DAG with nodes A, B, C, E, F, G, H, I and W arranged in layers 0 to 5]

  10. In reality things are often even simpler

  11. ... or even simpler than that

  12. [Figure: general row layout. Each row i = 1, ..., k holds one node per layer l1, l2, l3 (v_i1, v_i2, v_i3), and each node is followed by its tuple of attributes, e.g. (c_111, ..., c_11n) for v_11 and (c_121, ..., c_12m) for v_12]

  13. Real world examples • Simplified E-TABM-234.sdrf

  14. Ontology usage • Characteristics can be either free text or an ‘ontology entry’ • An ontology entry is identified by a ‘source’ column following it
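A minimal sketch of how a reader of such a table might tell the two cases apart. It assumes the column headings `Characteristics[...]` and `Term Source REF` as used in the MAGE-TAB specification; the function name `read_characteristics`, the example row and the source label `NCBITaxon` are illustrative assumptions, not taken from the slides.

```python
import re


def read_characteristics(header, row):
    """Map each characteristic to (value, source); source is None for free text."""
    result = {}
    for i, column in enumerate(header):
        match = re.fullmatch(r"Characteristics\s*\[(.+)\]", column.strip())
        if not match:
            continue
        source = None
        # An ontology entry is flagged by a source column right after the value.
        if i + 1 < len(header) and header[i + 1].strip() == "Term Source REF":
            source = row[i + 1] or None
        result[match.group(1).strip()] = (row[i], source)
    return result


# Hypothetical header and row, for illustration only.
header = ["Source Name", "Characteristics[Organism]", "Term Source REF", "Characteristics[Age]"]
row = ["liver 1", "Homo sapiens", "NCBITaxon", "adult"]
characteristics = read_characteristics(header, row)
print(characteristics)
```

Here `Organism` comes back with a source and is therefore an ontology entry, while `Age` has no source column after it and is treated as free text.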

  15. Elements to describe an experiment
     • Description of biological sample (research subject, aliquot, i.e., biomaterial) properties
       – a list of properties
       – free text or ontology entries
       – table (spreadsheet)
     • Description of the assay (e.g., microarray design)
       – a list of features on the array and their properties: sequence, annotation, location
       – table (spreadsheet)
     • Data from the assays, raw and processed
       – files or tables
     • Material and data processing protocols
       – free text
     • Experiment design: which sample went to which assay and produced which data file
       – a graph (normally a DAG)
       – also a spreadsheet!

  16. MAGE-TAB • 1., 5. Sample properties and experiment design – SDRF • 2. Array design – ADF (Array design file) • 4. Protocols – IDF (Investigation design file) • 3. Data files and data matrices

  17. Array Design (ADF) • ADF has been around since MAGE-ML times as a way to describe an array • A table (spreadsheet) with one row per array feature, listing feature coordinates, the sequence, biological annotation, etc.

  18. ADF

  19. Investigation design file (IDF) • Lists general information about the experiment and gives a (free-text) description of all the protocols
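As a rough illustration, an IDF can be read as a list of tag rows: the first cell names the field and the remaining cells hold its values. The sketch below assumes that layout and the field names `Investigation Title`, `Protocol Name` and `Protocol Description` from the MAGE-TAB specification; the helper names and the sample text (including the protocol descriptions) are hypothetical.

```python
import csv
import io


def parse_idf(text):
    """Read tab-delimited IDF text into {tag: [values...]}."""
    fields = {}
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row or not row[0].strip():
            continue  # skip blank lines
        fields[row[0].strip()] = row[1:]
    return fields


def protocols(fields):
    """Pair protocol names with their free-text descriptions by column position."""
    names = fields.get("Protocol Name", [])
    descriptions = fields.get("Protocol Description", [])
    return dict(zip(names, descriptions))


# Hypothetical IDF content; the descriptions are placeholders.
idf_text = (
    "Investigation Title\tExample experiment\n"
    "Protocol Name\tP-XMPL-1\tP-XMPL-2\n"
    "Protocol Description\tMaterial processing\tNormalisation\n"
)
idf = parse_idf(idf_text)
protocol_descriptions = protocols(idf)
print(protocol_descriptions)
```

Other files can then refer to a protocol by its name alone, since the description lives in one place.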

  20. Data files • Raw data – native formats (CEL, GenePix) • Normalised, summarised data – columns may be individual references

  21. [Figure: experiment design graph. Samples: liver 1, liver 2, kidney 1, kidney 2, brain 1, brain 2 (all Homo sap.). Hybridisations: Hybridisation 1, 2 and 3 (HG_U95A). Raw data: Data1.cel, Data2.cel, Data3.cel. Processed data: FGDM.txt. Protocols: material processing protocols (P-XMPL-1), normalisation protocol (P-XMPL-2)]

  22. FGEM.txt:

  23. Experimental Factors • Most important experimental variables (e.g., organs – liver, kidney, brain – in the examples above) • Any column in the EDF can be marked as an experimental factor • These can be propagated down the edges of the graph to columns in FGEM • In FGEM they serve as concise annotation
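Propagating a factor down the edges can be sketched as a simple graph walk. This is an illustration under stated assumptions: the function name `propagate_factor` is hypothetical, and the wiring of samples to hybridisations below is assumed for the example rather than read off the slides.

```python
from collections import defaultdict


def propagate_factor(edges, factor_values):
    """Push each annotated node's factor value to everything downstream of it."""
    succ = defaultdict(list)
    for a, b in edges:
        succ[a].append(b)
    reached = defaultdict(set)

    def visit(node, value):
        if value in reached[node]:
            return  # already annotated with this value
        reached[node].add(value)
        for nxt in succ[node]:
            visit(nxt, value)

    for node, value in factor_values.items():
        visit(node, value)
    return {node: sorted(values) for node, values in reached.items()}


# Assumed wiring, for illustration only.
edges = [
    ("liver 1", "Hybridisation 1"),
    ("kidney 1", "Hybridisation 2"),
    ("Hybridisation 1", "Data1.cel"),
    ("Hybridisation 2", "Data2.cel"),
    ("Data1.cel", "FGDM.txt"),
    ("Data2.cel", "FGDM.txt"),
]
organ = {"liver 1": "liver", "kidney 1": "kidney"}
annotation = propagate_factor(edges, organ)
print(annotation["Hybridisation 1"], annotation["Hybridisation 2"])
```

The values that arrive at each hybridisation are what would label the corresponding column of the data matrix, while the matrix file as a whole collects every value that flows into it.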

  24. MAGE-TAB • Investigation design file – IDF • Array design file – ADF • Experiment (sample) design file – SDRF • Data files and data matrices (FGEM)

  25. Standard design templates
     • simple iterated design;
     • iterated design with technical replicates;
     • iterated design with pooling;
     • iterated designs for dual channel experiments;
     • dual channel iterated designs with dye swap;
     • dual channel iterated designs with a reference sample;
     • dual channel iterated design with a reference and dye swap;
     • dual channel iterated design with a pooled reference;
     • loop design;
     • loop design with dye swap;
     • time series experiments;
     http://www.mged.org/Workgroups/MAGE/mage.html#mage-tab
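To show why such templates are easy to fill in mechanically, here is a small sketch that generates the rows for a dual channel design with dye swap. It is not one of the published templates: the function name `dye_swap_rows` is hypothetical, only three columns are produced, and the headings `Source Name`, `Label` and `Hybridization Name` are assumed from the MAGE-TAB specification.

```python
def dye_swap_rows(pairs):
    """For each sample pair, emit two hybridisations with the dyes exchanged."""
    rows = [("Source Name", "Label", "Hybridization Name")]
    hyb_number = 0
    for first, second in pairs:
        # The second pass swaps which sample receives Cy3 and which Cy5.
        for cy3_sample, cy5_sample in ((first, second), (second, first)):
            hyb_number += 1
            hyb_name = f"Hybridisation {hyb_number}"
            rows.append((cy3_sample, "Cy3", hyb_name))
            rows.append((cy5_sample, "Cy5", hyb_name))
    return rows


design = dye_swap_rows([("Sample 1", "Sample 2")])
for row in design:
    print("\t".join(row))
```

Every pair contributes four rows, two per hybridisation, so a large experiment is just the same block repeated with different sample names.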

  26. MAGE-TAB • Any experiment can be represented in MAGE-TAB in a structured way to MIAME granularity • Large experiments with a regular design can be represented in a natural way • It is possible to create MAGE-TAB files using generic spreadsheet software • It is flexible – the granularity of the experiment description can vary

  27. Some points for later discussion • What should be the level of prescription on ID column names (source, sample, extract, ..., summary data)? • For instance, it is often unclear to me what is a source and what is a sample • Do we need the concept of Experimental Factors at all? • An alternative could be optional labelling of variable characteristics as ‘intentional’

  28. Acknowledgements • Tim Rayner, Helen Parkinson • Cathy Ball, Don Maier • Philippe Rocca-Serra (ADF) • Michael Miller, Ugis Sarkans, Paul Spellman, Anna Farne • Mohammad Shojatalab – MAGE-TAB export from ArrayExpress • MAGE working group • Funding – NHGRI/NIBIB, MGED sponsors
