
Data Standards for Flow Cytometry

Data Standards for Flow Cytometry. Ryan Brinkman, TFL, BC Cancer Research Centre. Why flow cytometry data standards? Increasing data throughput… 1,000 samples * 6 parameters / day. Data (information) processing has to increase: slow, error prone, not standardized.


Presentation Transcript


  1. Data Standards for Flow Cytometry Ryan Brinkman TFL, BC Cancer Research Centre

  2. Why flow cytometry data standards? • Increasing data throughput… • 1,000 samples * 6 parameters / day • Data (information) processing has to increase • Slow • Error prone • Not standardized • Most limiting aspect of technology

  3. Our view of the solution • Standards • Exchange data and analyses between software applications & researchers • Allow for a flexible analysis pipeline • Consistent and thorough data annotation • Which sample under what conditions? • High throughput QA/QC • Early identification of problems essential in high throughput flow

  4. Data Standards for Flow Cytometry [Diagram labels: Collaborative Web Space; Compendium; Translation; Graphs; Statistics; Visualization; R (rflowcyt); Java; Descriptive Vocabulary; Object Model; Database Schema; File Formats; Experimental Description Checklist; Gating Specification; XML; UML; SQL; OWL] www.flowcyt.org flowcyt.sf.net

  5. Experimental Checklist • Minimal information to describe a flow cytometry experiment • Experimental overview • Hypothesis, researcher • Description of biomaterial • Genus, species, contact details for sample • Instrument settings • Manufacturer, flow velocity, laser type, power • Data collection and analysis • FCS, sort gates, compensation • Provide enough information to compare/reproduce experiments www.flowcyt.org flowcyt.sf.net
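A checklist like the one on slide 5 can be checked mechanically before an experiment is shared. The sketch below is only an illustration: the section and field names are hypothetical labels derived from the slide's bullets, not the names used by the actual checklist.

```python
# Hypothetical field names, loosely following the bullets on slide 5.
REQUIRED_FIELDS = {
    "experiment_overview": ["hypothesis", "researcher"],
    "biomaterial": ["genus", "species", "sample_contact"],
    "instrument_settings": ["manufacturer", "flow_velocity", "laser_type", "laser_power"],
    "data_collection_and_analysis": ["fcs_files", "sort_gates", "compensation"],
}


def missing_fields(record):
    """Return 'section.field' names that are absent or empty in the record."""
    missing = []
    for section, fields in REQUIRED_FIELDS.items():
        present = record.get(section, {})
        for field in fields:
            if present.get(field) in (None, ""):
                missing.append(f"{section}.{field}")
    return missing


# Made-up example record with two sections left out.
record = {
    "experiment_overview": {"hypothesis": "example hypothesis", "researcher": "A. Researcher"},
    "biomaterial": {"genus": "Homo", "species": "sapiens", "sample_contact": ""},
}
gaps = missing_fields(record)
```

Here `gaps` lists the empty sample contact plus every instrument and data-collection field, which is the kind of early feedback a minimal-information checklist is meant to give.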

  6. Proposal for FCS4 • Focus FCS on data only • no metadata • list mode, uncompensated • Focus on interoperability with a canonical approach • Single data type • External Data Representation (XDR) • Single precision floating point & big endian • No user defined keywords or segments • $PAR (# of parameters) • $PnN (short name for parameter) • $PnR (range for each parameter) www.flowcyt.org flowcyt.sf.net
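The "single data type" idea on slide 6 can be illustrated with a few lines of Python. This is a minimal sketch, assuming list-mode events are stored row by row as 4-byte big-endian IEEE 754 floats (the XDR single-precision encoding). The function names and the row layout are assumptions made for illustration, not part of any FCS specification.

```python
import struct


def pack_events(events):
    """Pack rows of parameter values as big-endian single-precision floats."""
    n_params = len(events[0])
    if any(len(row) != n_params for row in events):
        raise ValueError("every event must have the same number of parameters")
    fmt = ">" + "f" * n_params  # '>' = big endian, 'f' = 4-byte float
    return b"".join(struct.pack(fmt, *row) for row in events)


def unpack_events(buf, n_params):
    """Inverse of pack_events; n_params plays the role of $PAR."""
    fmt = ">" + "f" * n_params
    size = struct.calcsize(fmt)
    if len(buf) % size != 0:
        raise ValueError("buffer length is not a whole number of events")
    return [struct.unpack_from(fmt, buf, off) for off in range(0, len(buf), size)]


# Two made-up events with three parameters each.
events = [(1.0, 2.5, 1024.0), (0.5, 8.0, 256.0)]
raw = pack_events(events)
decoded = unpack_events(raw, 3)
```

With one fixed encoding, a reader needs only the parameter count to decode the data segment, which is the interoperability argument the slide makes.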

  7. So where does all the other (meta)data go? • XML: gates, compensation, transformation • FCS: data • Not how to gate, but record what was gated [Diagram: Software Tool #1 and Software Tool #2, each labeled 42]

  8. Gating-ML • XML based description of gates • Supported gate types • Rectangular gates (n dimensions) • Polygon gates (2 dimensions) • Polytope gate (n dimensions, convex only) • Ellipsoid gates (n dimensions) • Decision trees • Boolean collections of any of the types of gates www.flowcyt.org flowcyt.sf.net
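To make the gate types on slide 8 concrete, here is a minimal sketch of a rectangular gate, a 2D polygon gate, and Boolean combinations, written as plain predicates over events. It is not the Gating-ML schema or any reference implementation. The function names, the choice of inclusive lower and exclusive upper bounds, and the even-odd rule for polygons are all assumptions.

```python
def rectangle_gate(bounds):
    """bounds maps a parameter index to (low, high); low inclusive, high exclusive."""
    def inside(event):
        return all(lo <= event[i] < hi for i, (lo, hi) in bounds.items())
    return inside


def polygon_gate(x, y, vertices):
    """2D polygon gate on parameters x and y, using even-odd ray casting."""
    def inside(event):
        px, py = event[x], event[y]
        hit = False
        n = len(vertices)
        for k in range(n):
            (x1, y1), (x2, y2) = vertices[k], vertices[(k + 1) % n]
            if (y1 > py) != (y2 > py):  # edge crosses the horizontal ray
                x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
                if px < x_cross:
                    hit = not hit
        return hit
    return inside


def and_gate(*gates):
    return lambda event: all(g(event) for g in gates)


def or_gate(*gates):
    return lambda event: any(g(event) for g in gates)


def not_gate(gate):
    return lambda event: not gate(event)


def apply_gate(gate, events):
    """Return the events that fall inside the gate."""
    return [e for e in events if gate(e)]


# Made-up events with three parameters each.
events = [(100, 200, 5), (300, 250, 50), (150, 900, 20)]
rect = rectangle_gate({0: (50, 200), 1: (100, 1000)})
tri = polygon_gate(0, 1, [(0, 0), (400, 0), (0, 400)])
both = apply_gate(and_gate(rect, tri), events)
either = apply_gate(or_gate(rect, tri), events)
```

Because each gate is a deterministic predicate, applying the same gate to the same events always returns the same subset, regardless of which tool evaluates it.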

  9. Outstanding gating issues • As defined, gates are meaningless without FCS files • All gates are sort gates • Data file + filter → Result (unique answer) • No probabilistic descriptions • No concept of a reusable gate • (e.g., lymphocyte) www.flowcyt.org flowcyt.sf.net

  10. Transformation-ML • XML based description of parameter transformation • Gated using different scales • Data visualization issues • Predefined transformations • Linear, quadratic, log (base e, 10, or any other), hyperlog, bi-exponential, logicle, split-scale • Support for universal transformation description (MathML) www.flowcyt.org flowcyt.sf.net
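The simpler predefined transformations on slide 10 can be sketched as small function factories. The parameterizations below (a·x + b, a·x² + b·x + c, and a logarithm in an arbitrary base) are assumptions chosen for illustration. Hyperlog, bi-exponential, logicle, and split-scale are deliberately left out because their definitions are more involved.

```python
import math


def linear(a, b):
    """x -> a*x + b (assumed parameterization)."""
    return lambda x: a * x + b


def quadratic(a, b, c):
    """x -> a*x^2 + b*x + c (assumed parameterization)."""
    return lambda x: a * x * x + b * x + c


def log_transform(base):
    """x -> log_base(x); only defined for positive values."""
    def f(x):
        if x <= 0:
            raise ValueError("log transform requires a positive value")
        return math.log(x, base)
    return f


log10 = log_transform(10)


def gate_on_log_scale(x):
    """A gate drawn on the log10 scale: 2 <= log10(x) < 3, i.e. roughly 100 to 1000."""
    return 2 <= log10(x) < 3


inside = gate_on_log_scale(500.0)
outside = gate_on_log_scale(50.0)
```

Recording the transformation alongside the gate matters because the same boundary means different things on different scales. The gate above only selects the intended events if the reader also knows that it was drawn on log10-transformed values.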

  11. FlowRDF • Resource Description Framework (RDF) • W3C standardized methodology on how to provide metadata to virtually “anything” • Based on RDF statements (triplets): • subject, predicate, object. • XML encoded (RDF/XML) • Common reusable concepts • (e.g., Dublin Core) • Direct ontology links • Web Ontology Language (OWL) builds on RDF • Links immutable FCS files to metadata through Life Science Identifier (LSID) www.flowcyt.org flowcyt.sf.net
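The subject, predicate, object statements on slide 11 can be sketched with plain tuples. The Dublin Core namespace below is the real one, but the LSID, the `example.org` predicate, and all values are hypothetical placeholders, and the `match` helper is a toy stand-in for a real RDF query engine.

```python
DC = "http://purl.org/dc/elements/1.1/"       # Dublin Core element namespace
FCS_FILE = "urn:lsid:example.org:fcs:12345"   # hypothetical LSID for one FCS file

# Each statement is a (subject, predicate, object) triple. All values are made up.
triples = [
    (FCS_FILE, DC + "creator", "Example Lab"),
    (FCS_FILE, DC + "date", "2006-01-01"),
    (FCS_FILE, "http://example.org/flow#species", "Homo sapiens"),
]


def match(statements, s=None, p=None, o=None):
    """Return statements matching the given fields; None acts as a wildcard."""
    return [
        t for t in statements
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


creators = match(triples, p=DC + "creator")
about_file = match(triples, s=FCS_FILE)
```

Because the FCS file is only referenced by its identifier, metadata can be added or corrected without touching the immutable data file.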

  12. Ontology A structured vocabulary describing relationships between things • What is an ontology for? • To allow reuse of information across multiple applications • Aid researchers in the collection of metadata surrounding each flow cytometry experiment • To conduct structured queries on elements of flow cytometry experiments

  13. Coordinated ontology effort • OBI (formerly FuGO) • Ontology for Biomedical Investigations • Creating a general standard in which to encode data for functional genomics experiments [Diagram labels: OBI; Biological Materials; Hypotheses; Protocols; Cytometers; Gating; Flow Cytometry; Microarray; SNP] obi.sf.net

  14. Object Model • Focus of another talk by Josef Spidlen

  15. Implementations: Java & R Tools for data file manipulation and analysis. Reference implementation of standards • Java • File format translation • Process XML files and output results • Facilitate the exchange of experimental details • R (rflowcyt) • Analyze FCM data using R statistical package • Standard and novel visualizations of data • Automated QA/QC • Implemented as part of BioConductor

  16. Reproducible Research [Diagram: a document of placeholder text with embedded XML, <Dataset>BrinkmanLab 123</Dataset> <Gate>123.12</Gate> <Select>I..II</Select>, shown together with a Database and an AnalysisTool]

  17. Implications of FCM Standards • Promote & reinforce open scientific inquiry • Exchange of data & analyses • Automated, traceable and flexible analyses • Mechanism for new discoveries • Facilitate basic and clinical research • Provide dataset larger than any single lab

  18. Acknowledgements • NIH/NIBIB (EB5034) • Michael Ochs • Thomas Moloshok • Robert Gentleman • Clayton Smith • Perry Haaland • Adam Treister • Josef Spidlen • Nolwenn LeMeur • ISAC • IEEE www.flowcyt.org flowcyt.sf.net
