SBML Level 2 Version 2 and Beyond. Andrew Finney Physiomics PLC, UK Mike Hucka California Institute of Technology. Overview. Two parts New Features in SBML Level 2 Version 2 Proposed Features for SBML Level 3 and beyond This is NOT a tutorial on SBML assumes basic knowledge of
SBML Level 2 Version 2 and Beyond • Andrew Finney, Physiomics PLC, UK • Mike Hucka, California Institute of Technology
Overview • Two parts • New Features in SBML Level 2 Version 2 • Proposed Features for SBML Level 3 and beyond • This is NOT a tutorial on SBML • assumes basic knowledge of • SBML Level 2 Version 1
List of Major Changes in SBML L2V2 • Application Software Specific Annotations • One element per XML namespace per annotation element • Interoperable Annotations • sboTerm field for relating SBML elements to Systems Biology Ontology (SBO) • MIRIAM based scheme for relating SBML elements to Biochemical concepts • Units • Removal of offset from Unit Definitions • Abstraction • CompartmentType • SpeciesType • Mathematical Features • Constraints • Initial Assignment Structures • Reaction Symbols • Ordering of Assignments • Use of Id Attribute on SimpleSpeciesReference
Single Element per XML Namespace per Annotation Structure • In L2V1 round-trip editor support of annotation elements was problematic • Editor is supposed to preserve existing elements within annotation elements • Editor only adds and removes elements in namespaces associated with the editor • Issues • Should the ordering of elements be preserved? • Where should new elements be inserted? • Solution • Simplify the editor's task by constraining the structure of the annotation elements: • Only one top level element per namespace per annotation • Order of top level elements is not significant • Content of top level elements is only relevant to associated applications. • Top level elements can’t be in SBML namespaces
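The two structural rules on this slide (one top level element per namespace, none in an SBML namespace) can be checked mechanically. The following is a minimal sketch using Python's standard xml.etree.ElementTree; the function names, the example.org namespaces and the list of SBML namespace URIs are assumptions made for illustration, not part of the presentation.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Assumption: namespace URIs treated as "SBML namespaces" for this check.
SBML_NAMESPACES = {
    "http://www.sbml.org/sbml/level2",
    "http://www.sbml.org/sbml/level2/version2",
}

def namespace_of(tag):
    """Return the namespace URI of an ElementTree tag of the form '{uri}local'."""
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""

def check_annotation(annotation):
    """Return a list of rule violations among the top level children."""
    counts = Counter(namespace_of(child.tag) for child in annotation)
    problems = []
    for ns, n in counts.items():
        if n > 1:
            problems.append(f"{n} top level elements in namespace {ns!r}")
        if ns in SBML_NAMESPACES:
            problems.append(f"top level element in SBML namespace {ns!r}")
    return problems

# Hypothetical annotations: two tools with one element each, then one tool with two.
GOOD = ET.fromstring(
    '<annotation xmlns:a="http://example.org/a" xmlns:b="http://example.org/b">'
    "<a:layout/><b:notes/></annotation>"
)
BAD = ET.fromstring(
    '<annotation xmlns:a="http://example.org/a"><a:layout/><a:colour/></annotation>'
)

print(check_annotation(GOOD))
print(check_annotation(BAD))
```

Because the order of top level elements is not significant, the check only counts elements per namespace and never compares positions.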
Motivation for Interoperable Annotations • Existing parallel ad-hoc annotation of SBML with controlled vocabulary (CV) terms • Annotation important to effectiveness of applications • e.g. performance (hard coding of rate laws) • e.g. graphical representation in editors • Annotation interoperability extremely limited • Essential for Future Applications of SBML • Example: Development of model databases • Requires CV annotation for effective queries • Requires a standard, since databases aim to become repositories of models created by a wide range of software applications over a long period of time • Example: Development of model manipulation software
Why Annotations? • Why not develop these CVs as part of SBML Standard? • For many biological concepts there exist good CVs • CVs will evolve faster than the core mathematical elements of SBML • Many tools are not going to care about CV term annotations • CVs have a more constrained form that can support rapid change more robustly than a classic object oriented scheme like SBML • CVs can be supported by libSBML and other software via a single generic API rather than a large set of new classes
Two Types of Interoperable Annotations • Reference to Systems Biology Ontology (SBO) • see Mike Hucka’s SBO talk • Reference to Existing Controlled Vocabularies • see my MIRIAM talk
Annotation with SBO terms: Semantics • The thing encoded by the SBML object is an instance of the class defined by the referenced SBO term • The set of SBO terms that can be referenced by a particular SBML class is restricted • Example: the KineticLaw in reaction R1 is a first-order irreversible mass action rate law • sboTerm never overrides the SBML mathematical representation • Can be ignored by analysis tools • VERY BAD practice to annotate with an SBO term that contradicts the SBML encoded mathematical representation • Key aspect of model curation
Issues with sboTerm field • Pros • Very simple syntax and semantics • Covers many of the CV use cases of existing applications • Does not preclude other schemes, including a scheme for referencing biological and biochemical resources • Cons • SBML structure can only be associated with one SBO term • SBO is constrained to support this • Only supports ‘is a’ relationship • SBO is constrained to support this
Removal of offset field from Unit structures • SBML team found it impossible to arrive at a scheme for the conversion of UnitDefinition structures that contained offset field values without more complex semantic restrictions • offset only makes sense for temperature conversions • Simplest solution is to exclude the offset field • offset is not deprecated • small backwards incompatibility between L2V2 and L2V1
CompartmentType • CompartmentType structures allow the formal indication that a set of Compartment structures are of the same type • In this proposal the model class is extended to have a list of compartment types • CompartmentType structures are entirely optional • CompartmentType structures do not affect any representation of the model dynamics • Compartment structures still used to declare variables • Compartment structures can optionally refer to CompartmentType structures
SpeciesType • SpeciesType structures allow the formal indication that species located in more than one compartment are in fact pools of the same chemical entity type • In this proposal the model class is extended to have a list of species types • SpeciesType structures are entirely optional • SpeciesType structures do not affect any representation of the model dynamics • Species structures still used to declare variables • Species structures can optionally refer to SpeciesType structures
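As an illustration of what the optional reference buys a tool, the sketch below groups species into pools of the same entity type. It uses plain dictionaries rather than any SBML library, and the species ids, compartment names and the "glucose" type are hypothetical.

```python
from collections import defaultdict

# Hypothetical species records; the speciesType reference is optional.
species = [
    {"id": "glc_cyt", "compartment": "cytosol", "speciesType": "glucose"},
    {"id": "glc_ext", "compartment": "extracellular", "speciesType": "glucose"},
    {"id": "atp_cyt", "compartment": "cytosol"},  # no speciesType: still valid
]

def pools_by_type(species_list):
    """Map each referenced species type to the ids of the species that refer to it."""
    pools = defaultdict(list)
    for s in species_list:
        species_type = s.get("speciesType")
        if species_type is not None:
            pools[species_type].append(s["id"])
    return dict(pools)

print(pools_by_type(species))
```

The grouping is purely descriptive: each species still declares its own variable, and nothing about the dynamics changes.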
Constraint - Concept • Concept • Define constraints that enable the detection of internal inconsistencies in a model and/or external perturbations of variables and parameters which render a model invalid. • Simply math expressions that are either true or false given some subset of variables and constants defined in the model. • Constraints are not structured to facilitate the definition of the time course behaviour of the modelled system but may facilitate other types of analyses e.g. flux balance analysis. • Requirement • Constraints must be clearly separated from other structures that are used directly in defining simulation/time course behaviour. • To facilitate the use of constraints as ‘assertions’ an error string can be optionally associated with the constraints. • Example • Define quantitatively the assumption in a rate law that the product concentration is much lower than that of the enzyme • If the product concentration becomes large enough to render the rate law invalid during a simulation the simulator can notify the user
Constraint Structure • Model has an optional list of Constraint structures • A Constraint structure has • mandatory math boolean MathML field • optional message HTML field • No other universal semantic rules • Semantics only defined loosely in simulation • when the constraint becomes false • the message may be displayed to the user • ideally the simulation time of violation would be reported • Other analyses may be driven by the constraints
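One possible reading of the loose simulation semantics is a monitor that scans a computed time course and reports the first time at which the constraint is false, together with its optional message. This is a sketch under that assumption; the threshold of 0.1, the symbols P and E, and the sample trajectory are all invented for the example.

```python
def first_violation(times, states, constraint, message=None):
    """Return (time, message) for the first state where the constraint is false, else None."""
    for t, state in zip(times, states):
        if not constraint(state):
            return t, message
    return None

# Hypothetical constraint: product P stays well below enzyme E (here P < 0.1 * E).
def product_much_lower_than_enzyme(state):
    return state["P"] < 0.1 * state["E"]

MESSAGE = "Product concentration too high for the rate law to remain valid"

# Hypothetical simulation output: P grows while E stays at 10.
times = [0.0, 1.0, 2.0, 3.0]
states = [
    {"P": 0.0, "E": 10.0},
    {"P": 0.5, "E": 10.0},
    {"P": 1.2, "E": 10.0},
    {"P": 2.0, "E": 10.0},
]

print(first_violation(times, states, product_much_lower_than_enzyme, MESSAGE))
```

The monitor only observes the trajectory and never feeds back into it, which keeps the constraint separate from the structures that define time course behaviour.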
InitialAssignment • Allows the calculation of the initial value of a symbol from the initial value of other symbols • symbol may be constant or variable • Model contains optional list of InitialAssignment structures • InitialAssignment structure contains • mandatory symbol Sid field • mandatory math MathML field • must return numeric result – not boolean! • Semantic Rules • symbol field must contain the value of a species, compartment or parameter id field • only executed once with the results applied at t=0 • any initial value on symbol declaring structure should be ignored • other constraints described later in presentation
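The semantic rules above can be sketched as a single pass that runs once before t=0. The example assumes the assignments are already in a valid evaluation order and represents each math field as a Python callable; the symbol names and values are hypothetical.

```python
def apply_initial_assignments(declared, assignments):
    """Return the t=0 values after applying (symbol, math) pairs once, in the given order."""
    values = dict(declared)
    for symbol, math in assignments:
        if symbol not in values:
            raise KeyError(f"{symbol!r} is not a declared species, compartment or parameter")
        result = math(values)
        if isinstance(result, bool):
            raise TypeError(f"math for {symbol!r} must be numeric, not boolean")
        # Any value on the declaring structure is ignored and replaced here.
        values[symbol] = float(result)
    return values

# Hypothetical model: the declared value 999.0 for S_amount is ignored.
declared = {"V": 2.0, "S_conc0": 1.5, "S_amount": 999.0}
initial_assignments = [("S_amount", lambda v: v["S_conc0"] * v["V"])]

initial_values = apply_initial_assignments(declared, initial_assignments)
print(initial_values["S_amount"])
```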
Reaction Symbols • The id value of a reaction structure can be used as a symbol value in MathML expressions • Represents the flux of the reaction • the direct result of a rate law • substance / time units • Cannot assign a value to the symbol • e.g. can’t occur in variable field of an AssignmentRule • Other Semantic Rules discussed later
Constraints on Assignments • KineticLaw, InitialAssignment and AssignmentRule structures form a set of assignment statements • AssignmentRule structures are no longer constrained to be in topological order • Interaction with reaction assignments can’t constrain algebraic loops • Assignment statement set must not contain algebraic loops • Numerical analysis software will need to topologically sort assignment statements • Not that difficult actually!
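A minimal sketch of the sort, using graphlib from the Python standard library (Python 3.9 or later). It assumes the tool has already extracted, for every assigned symbol, the symbols its math refers to; the symbol names are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter

def order_assignments(dependencies):
    """Return assigned symbols in an order where each follows the symbols it depends on.

    `dependencies` maps each assigned symbol to the symbols used in its math.
    Symbols that are never assigned (plain constants) impose no ordering.
    """
    assigned = set(dependencies)
    graph = {
        symbol: {dep for dep in deps if dep in assigned}
        for symbol, deps in dependencies.items()
    }
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as err:
        # args[1] holds the list of nodes that form the detected cycle.
        raise ValueError(f"algebraic loop: {err.args[1]}") from err

# Hypothetical model: reaction R1 uses k_eff, and x uses the flux of R1.
deps = {"x": ["R1"], "R1": ["k_eff", "S"], "k_eff": ["k1"]}
order = order_assignments(deps)
print(order)
```

Passing a set with mutual dependencies, such as {"a": ["b"], "b": ["a"]}, raises ValueError, which is the algebraic loop case the slide rules out.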
id on SimpleSpeciesReference • Create new id attribute on SimpleSpeciesReference • Has type SId • Required by diagram layout proposal • Value is unique amongst all id values declared at the global level • Optional name attribute of type string as well • In addition to existing species attribute that refers to a species
Proposals for SBML Development • Model Composition • Arrays • Sets • Multicomponent Species • Diagram Layout • Controlled Vocabularies • Assertions • Other ad-hoc features • See www.sbml.org Wiki and Forums for documentation
Model Composition Proposals • Proposals from • Martin Ginkel, MPI Magdeburg • Jonathan Webb, BBN • Andrew Finney • Common idea: compose larger models from smaller ones • Model contains • Submodel definitions (or at least references to them) • Instances of submodels • Arbitrary links between objects inside instances and objects in enclosing model • Links are directional and define attribute overload • Direct links allow objects to refer directly to each other • Issues • Do we need interfaces? • Should language define legality of links? • Obviously can only link objects of same type • Are XML standards, such as XLink, appropriate? • Do we need to support arbitrary depth links? • What are the semantics of links? • Reference Implementation under development
Model Composition: Example of Direct Links • [Diagram: Instance A of Model Z and Instance B of Model Z; object labels shown: i, f, g, h]
Model Composition: Example of Direct Links, Flattened • [Diagram: flattened model; object labels shown: i, a/f, b/f, a/g, b/g, a/h, b/h]
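The flattened labels on this slide are consistent with a simple naming scheme in which each object inside an instance is prefixed with the instance name. The sketch below reproduces only that naming step; it is an assumption about the scheme, not the reference implementation, and it does not model the links themselves.

```python
def flatten_names(enclosing_objects, instances):
    """Return flat object names: enclosing objects as-is, instance objects as 'instance/object'."""
    flat = list(enclosing_objects)
    for instance_name, objects in instances.items():
        flat.extend(f"{instance_name}/{obj}" for obj in objects)
    return flat

# Labels taken from the slide: object i in the enclosing model,
# instances a and b of a model containing f, g and h.
flat_names = flatten_names(["i"], {"a": ["f", "g", "h"], "b": ["f", "g", "h"]})
print(flat_names)
```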
Model Composition: Example of Link Overloading • [Diagram: Instance A of Model Z and Instance B of Model Z; object labels shown: i, f, g, h]
Model Composition: Example of Link Overloading, Flattened • [Diagram: flattened model; object labels shown: i, a/f, b/g, a/g, a/h]
Multicomponent Species Proposal: Limitations of SBML Level 2 • Species represents • single state • in a specific compartment • Species states have to be enumerated • Reactions are specific to a compartment • Composition of species from components not represented, thus for example • Bond types between components in complexes not represented • Reactions forming complexes not represented • Reaction cannot be generalized to apply to a set of states
Multicomponent Species Proposal: New Features • Hierarchical SpeciesType • Biochemical entity type • Independent of compartment • Reaction generalized across compartments • SpeciesType is graph of SpeciesTypeInstance nodes • Arcs are pairs of binding sites • Unspecified association represented by disconnected parts • State Generalized Reactions • Products and Reactants are also graphs • Graphs contain ‘wildcards’ copied from products to reactants • State Generalization is already being employed by • Alpha Project, Molecular Sciences Institute, USA (Lok) • T10 Group, Los Alamos National Laboratory, USA (Hlavacek et al.) • StochSim, University of Cambridge, UK (Firth, Le Novère, Shimizu) • Cell Systems Initiative, University of Washington, USA (Loriaux et al.) • BIOCHAM, INRIA, France (Fages, Chabrier, Soliman) • others?
Examples of Multicomponent Species Proposal: Species Types • Empty SpeciesType is atomic • Simple Association • Binding sites • Binding • [Diagrams: species type graphs; labels shown include t, u, n, m, v, x, y, p, q, 0, A, C, D]
Examples of Multicomponent Species Proposal: State Generalized Reactions • Basic Reaction • Generalized Reaction • Generalized Reaction simplified • [Diagrams: graphs of the three reactions; labels shown include A, B, C, D, G, o, p, q, r, s, t, v, w, y, 0]