A Unique Opportunity in Biological Information Standards
C. Forbes Dewey, Jr.
Massachusetts Institute of Technology
ExperiBase
Our view of experimental biology
[Figure: a diagram linking Query, Experiments, Databases, Interpretation, and Models. Panels show a reaction scheme annotated with rate constants (K) and a plot of cell speed (microns/min.) versus polymer fraction for bovine endothelium, human melanoma, and mouse fibroblast cells.]
Driving issues in experimental biological computing
• Large data sets: terabytes in every lab, petabytes at national labs
• Large calculations: petaflop-level computing for days
• Time is critical: biologists want infrastructure yesterday
• Interchange is crucial: unshared data is unused data
We need standards.
Keys to biological computing standards
• Semantics: investigators can agree on meaning
  • Ontologies for standardizing meaning
  • Curation of ontologies: the LSID (Life Science Identifier)
• Schema: share schema and concepts
• Scalability: the ability to scale to larger problems in the future
• Standard tools: ontologies and schema for storage and query
  • The possibility of writing reusable software!
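The LSID (Life Science Identifier) mentioned above is a URN of the form urn:lsid:<authority>:<namespace>:<object>, optionally followed by :<revision>. As a minimal sketch that assumes only this syntax (the function name parse_lsid and the example identifiers are hypothetical), an LSID can be split into its parts like this:

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]  # None when the LSID carries no revision


def parse_lsid(text: str) -> LSID:
    """Split an LSID URN into authority, namespace, object, and revision.

    The "urn" and "lsid" prefixes are matched case-insensitively; the
    remaining components are returned exactly as written.
    """
    parts = text.strip().split(":")
    if len(parts) not in (5, 6) or parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    if not all(parts[2:5]):
        raise ValueError(f"empty LSID component in {text!r}")
    revision = parts[5] if len(parts) == 6 and parts[5] else None
    return LSID(parts[2], parts[3], parts[4], revision)
```

For example, parse_lsid("urn:lsid:example.org:experiments:1234:2") yields authority "example.org", namespace "experiments", object "1234", and revision "2". Keeping the identifier structured is what lets a curation service resolve the authority separately from the object it names.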
ExperiBase: “DICOM for Biology”
• Based on ontology standards
• Conceptual consistency between different experimental methods
• Reuse of concepts between different experimental methods
• Portable platform, independent of the operating system
ExperiBase top-level design
[Diagram: levels labeled Study Plan, Administration, Experiment, Sample, and High Level Analysis, with a callout reading Most “silo” applications.]
Supported Object Models for Experimental Biology
• Gel Electrophoresis (Western Blot, 1D Gel, 2D Gel): HUPO
• Flow Cytometry / FACS: CytometryML
• Microarray Experiments: BASE, MAGE-OM
• Mass Spectrometry: HUPO
• Microscope Images: OME
Status legend: Complete, In progress, Preliminary
FACS Experiments
[Figure: schematic of a flow cytometer (cell suspension, treated cell, flow cell, laser, lenses, dichroic mirror, forward-scatter, side-scatter, and fluorescence detectors, and a computer for analysis, display, and data storage) alongside the FACS object model. Model elements: Sample (Cell), Sample Treatment, Binding Species, Reactive Func.; Hardware (Parts Info), Parameter, Detector, Beam-Splitter, Emission-Filter, Amplifier, Light-Source, Excitation-Filter, Settings; Experiment Description, Protocol, Data File (FCS), Method, Meta Data; displays: Histogram, Dot Plot, Density Plot, Contour Plot.]
CytometryML: Robert C. Leif, Suzanne B. Leif, et al., XML_Med, a Division of Newport Instruments
FACS IOD (Information Object Definition). Reference: Leif, Leif, and Leif, Cytometry 54A:56-65 (2003)
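The FACS object model includes a Data File (FCS) element. In the FCS 3.0 layout, a file begins with a 58-byte HEADER: the version string "FCS3.0" in bytes 0-5, four spaces, then six right-justified ASCII offsets of eight characters each, giving the first and last bytes of the TEXT, DATA, and ANALYSIS segments. The TEXT segment starts with a delimiter character and holds keyword/value pairs. The sketch below reads only the TEXT offsets and keywords; the function name is hypothetical, and it assumes the delimiter never appears escaped (doubled) inside a value, which a production reader would have to handle.

```python
def parse_fcs_text(raw: bytes) -> dict[str, str]:
    """Return the TEXT-segment keywords of an FCS 3.0-style file.

    Sketch only: doubled (escaped) delimiters inside values are not handled.
    """
    if raw[:3] != b"FCS":
        raise ValueError("missing FCS version identifier")
    # HEADER offsets are ASCII integers, right-justified in 8-byte fields.
    text_begin = int(raw[10:18].decode("ascii"))
    text_end = int(raw[18:26].decode("ascii"))  # offset of the LAST TEXT byte
    segment = raw[text_begin:text_end + 1].decode("utf-8")
    delimiter, body = segment[0], segment[1:]
    fields = body.split(delimiter)
    if fields and fields[-1] == "":
        fields.pop()  # the final value is closed by a trailing delimiter
    if len(fields) % 2:
        raise ValueError("unpaired keyword in TEXT segment")
    # Alternate entries are keyword, value, keyword, value, ...
    return dict(zip(fields[0::2], fields[1::2]))
```

Keywords such as $TOT (total events) and $PAR (number of parameters) are what an object model like the FACS IOD would map into its Meta Data and Parameter elements.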
Separation of data from analysis
Gel electrophoresis example: the image is stored in the database, the image is analyzed, and the analysis is saved with the object.
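A minimal sketch of separating data from analysis, assuming a hypothetical in-memory model (GelImage, Analysis, and their fields are illustrative, not ExperiBase classes): the raw image is kept as immutable bytes, and each analysis is appended next to it rather than overwriting it, so the original data and every interpretation of it stay available.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Analysis:
    method: str          # name of the analysis routine (hypothetical)
    results: dict        # e.g. detected bands and their intensities
    created: datetime    # UTC timestamp of when the analysis was recorded


@dataclass
class GelImage:
    image_id: str
    pixels: bytes        # raw image data, held as immutable bytes
    analyses: list[Analysis] = field(default_factory=list)

    def add_analysis(self, method: str, results: dict) -> Analysis:
        """Save a new analysis with the object instead of replacing the data."""
        analysis = Analysis(method, dict(results), datetime.now(timezone.utc))
        self.analyses.append(analysis)
        return analysis
```

Re-running a different band-detection method simply adds a second Analysis; the stored image is untouched.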
Microarray IOD: based on the Stanford Microarray Database
Microscope Image IOD: converted from OME
ExperiBase: XML and Object-Relational Database Schema

XML Document
<?xml version="1.0" encoding="UTF-8"?>
<params:Parameter xmlns:params="parameters.xsd"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="parameters.xsd parameters.xsd">
  <Dectector_Info>
    <Detector>PMT</Detector>
    <Detector_Setting>600</Detector_Setting>
    <Detector_Units Prefix="none" Si_Unit_Name="volt"/>
    <Measurement>Fluorescence</Measurement>
    <Beam_Splitter_Info Prefix="nano" Unit="meter">
      <Beam_Splitter>Dichroic_Reflect_Low</Beam_Splitter>
      <Low_Cut_Off_1>505</Low_Cut_Off_1>
      <Description>505DRLP</Description>
      <Item_General_Info>
        <Manufacturer>Omega Optical</Manufacturer>
        <Model_Name>XF2010</Model_Name>
      </Item_General_Info>
    </Beam_Splitter_Info>
    <Emission_Filter_Info Prefix="nano" Unit="meter">
      <Emission_Filter>Band_Block</Emission_Filter>
      <Band_Width_Location>unknown</Band_Width_Location>
      <Peak_1>535</Peak_1>
      <Band_Width_1>45</Band_Width_1>
      <Description>535AF45</Description>
      <Item_General_Info>
        <Manufacturer>Omega Optical</Manufacturer>
        <Model_Name>XF3084</Model_Name>
      </Item_General_Info>
    </Emission_Filter_Info>
  </Dectector_Info>
</params:Parameter>

Object-Relational Database Schema (DB2)
-- Structured subtypes of detector_info_t; REF(...) attributes reference
-- instances of the unit, unit-prefix, and item-information types.
CREATE TYPE detector_desc_t UNDER detector_info_t AS (
    detector            varchar(64),
    detector_setting    real,
    detector_unit_pref  REF(unit_prefix_t),
    detector_unit       REF(unit_t),
    measurement         varchar(64)
) MODE DB2SQL;

CREATE TYPE beam_splitter_t UNDER detector_info_t AS (
    beam_splitter   varchar(64),
    low_cut_off_1   real,
    high_cut_off_1  real,
    low_cut_off_2   real,
    high_cut_off_2  real,
    low_cut_off_3   real,
    high_cut_off_3  real,
    unit_prefix     REF(unit_prefix_t),
    unit            REF(unit_t),
    description     varchar(64),
    item_info       REF(item_info_t)
) MODE DB2SQL;
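The pairing of the XML document with the DB2 types implies a mapping from elements to attributes. A minimal sketch of that mapping with Python's standard ElementTree, assuming an abbreviated copy of the detector document (element names, including Dectector_Info, are kept as they appear on the slide; the helper names _num and to_rows are hypothetical). REF attributes are shown as the plain unit strings they would point to; a real loader would resolve them to rows of the unit and unit-prefix types.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Abbreviated detector document; xmlns:xsi is declared so that the
# xsi:schemaLocation attribute is namespace-well-formed.
DOC = b"""<?xml version="1.0" encoding="UTF-8"?>
<params:Parameter xmlns:params="parameters.xsd"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="parameters.xsd parameters.xsd">
  <Dectector_Info>
    <Detector>PMT</Detector>
    <Detector_Setting>600</Detector_Setting>
    <Detector_Units Prefix="none" Si_Unit_Name="volt"/>
    <Measurement>Fluorescence</Measurement>
    <Beam_Splitter_Info Prefix="nano" Unit="meter">
      <Beam_Splitter>Dichroic_Reflect_Low</Beam_Splitter>
      <Low_Cut_Off_1>505</Low_Cut_Off_1>
      <Description>505DRLP</Description>
    </Beam_Splitter_Info>
  </Dectector_Info>
</params:Parameter>"""


def _num(elem: ET.Element, tag: str) -> Optional[float]:
    """Return the child's text as a float, or None if the child is absent."""
    text = elem.findtext(tag)
    return float(text) if text is not None else None


def to_rows(xml_bytes: bytes) -> tuple[dict, dict]:
    """Map the XML onto dicts keyed by the DB2 attribute names."""
    info = ET.fromstring(xml_bytes).find("Dectector_Info")
    if info is None:
        raise ValueError("no Dectector_Info element")
    units = info.find("Detector_Units")
    detector = {
        "detector": info.findtext("Detector"),
        "detector_setting": _num(info, "Detector_Setting"),
        "detector_unit_pref": units.get("Prefix"),     # REF(unit_prefix_t) in DB2
        "detector_unit": units.get("Si_Unit_Name"),    # REF(unit_t) in DB2
        "measurement": info.findtext("Measurement"),
    }
    bs = info.find("Beam_Splitter_Info")
    splitter = {
        "beam_splitter": bs.findtext("Beam_Splitter"),
        "low_cut_off_1": _num(bs, "Low_Cut_Off_1"),
        "high_cut_off_1": _num(bs, "High_Cut_Off_1"),  # absent here -> None
        "unit_prefix": bs.get("Prefix"),
        "unit": bs.get("Unit"),
        "description": bs.findtext("Description"),
    }
    return detector, splitter
```

Optional elements that are missing from the document become None, which corresponds to a NULL attribute in the object-relational row.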
Recommendations and implementation
• Consensus on ontological standards
  • LSID
  • OWL
• Backing of major players
  • Industry
  • Government
  • International
• Semantic Web
  • Use RDF to represent data in ExperiBase and make the data available through web services
  • Use OWL for a collaborative semantic network
Ubiquitous Networked Biological Computing
Sponsored by a continuing grant from DOE (PNNL), with additional sponsorship by the NIH and DARPA.
The informatics collaborators: Aidan Downes, Howard Chou, Shiva Ayyadurai, Ngon Dao, Pat McCormack, Jeannette Stephenson, Catherine Howell, Shixin Zhang, Ben Fu
Data integration today
• Database federation and distributed intelligence
• Correlation of data in disparate databases
• Archiving and analysis of derived data
• Integration of higher-level analyses
• Imaging and image analysis
• Multiple-protein interactions
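Correlating data held in separate databases comes down to joining records on a shared identifier. As a minimal stand-in for a real federation layer, the sketch below uses SQLite's ATTACH DATABASE to query two independent in-memory databases in one statement; the table names, columns, and sample values are hypothetical, and in practice the databases would be separate files or separate servers.

```python
import sqlite3


def correlate(conn: sqlite3.Connection) -> list[tuple]:
    """Join expression data with annotations kept in a different database."""
    return conn.execute(
        """
        SELECT e.sample_id, e.intensity, a.annotation
        FROM main.expression AS e
        JOIN annot.annotations AS a ON a.sample_id = e.sample_id
        ORDER BY e.sample_id
        """
    ).fetchall()


def demo() -> list[tuple]:
    conn = sqlite3.connect(":memory:")
    # A second, independent in-memory database, visible under the name "annot".
    conn.execute("ATTACH DATABASE ':memory:' AS annot")
    conn.execute("CREATE TABLE main.expression (sample_id TEXT, intensity REAL)")
    conn.execute("CREATE TABLE annot.annotations (sample_id TEXT, annotation TEXT)")
    conn.executemany("INSERT INTO main.expression VALUES (?, ?)",
                     [("s1", 2.5), ("s2", 0.7)])
    conn.executemany("INSERT INTO annot.annotations VALUES (?, ?)",
                     [("s1", "treated"), ("s3", "control")])
    result = correlate(conn)
    conn.close()
    return result
```

Only s1 appears in both sources, so it is the only correlated row; the join works only because both databases agree on the identifier, which is the case for shared standards such as the LSID.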
Open Microscopy Environment (OME)
http://openmicroscopy.org/index.html
• The Open Microscopy Environment (OME) is an open-source software project to develop a database-driven system for the quantitative analysis of biological images.
• Founders: Ilya Goldberg (MIT/NIH), Jason Swedlow (Wellcome Trust Biocentre, Dundee), and Peter Sorger (MIT)
General transformation process
[Diagram: an Experiment Data File and a Data Description File are processed by a Data Transformer into ExperiBase.]
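In this process, a Data Transformer reads an experiment's data file with the help of a separate description file and produces ExperiBase records. A minimal sketch, assuming the description is a simple column-to-field mapping with type names and the data file is CSV; both formats and every field name below are hypothetical, since the slide does not specify them.

```python
import csv
from typing import Iterable

# Hypothetical description file: source column -> (ExperiBase field, type name).
DESCRIPTION = {
    "Sample": ("sample_id", "str"),
    "FL1-H": ("fluorescence", "float"),
    "Events": ("event_count", "int"),
}
CASTS = {"str": str, "float": float, "int": int}


def transform(lines: Iterable[str], description: dict) -> list[dict]:
    """Rename and type-convert each CSV row according to the description."""
    records = []
    for row in csv.DictReader(lines):
        records.append({
            target: CASTS[type_name](row[column])
            for column, (target, type_name) in description.items()
        })
    return records
```

Because the mapping lives in the description rather than in code, supporting a new instrument's output means writing a new description file, not a new transformer. Columns not listed in the description are ignored, and a listed column missing from the data raises KeyError.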
MIAMExpress transformation
[Diagram: components labeled ExperiBase Specific Component, Request Dispatcher, ExperiBase Translator, MIAMExpress, and Storage Database.]
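The MIAMExpress transformation names a Request Dispatcher and an ExperiBase Translator. One common way to structure such a dispatcher, sketched here under the assumption that each request names a source format and that translators register under that name, is a lookup table of translator functions. The registry, format keys, payload fields, and translator below are all hypothetical.

```python
from typing import Callable, Dict

Translator = Callable[[dict], dict]
_TRANSLATORS: Dict[str, Translator] = {}


def register(fmt: str) -> Callable[[Translator], Translator]:
    """Decorator that records a translator under a format name."""
    def wrap(fn: Translator) -> Translator:
        _TRANSLATORS[fmt] = fn
        return fn
    return wrap


def dispatch(request: dict) -> dict:
    """Route a request to the translator registered for its 'format' field."""
    fmt = request.get("format")
    if fmt not in _TRANSLATORS:
        raise KeyError(f"no translator registered for {fmt!r}")
    return _TRANSLATORS[fmt](request["payload"])


@register("miamexpress")
def miamexpress_to_experibase(payload: dict) -> dict:
    # Hypothetical field mapping for illustration only.
    return {"experiment_name": payload["name"], "array_design": payload["array"]}
```

Adding another source then means registering one more translator; the dispatcher itself does not change.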
Feeding ArrayExpress
[Diagram: Storage Database, ExperiBase Translator, and MAGE-ML output feeding ArrayExpress.]
Typical user page: Pacific Northwest National Laboratory ExperiBase
Web pages: http://schiele.mit.edu:8080/ExperiBase/