Minimum Information Requested In the Annotation of biochemical Models
Minimum Information Requested In the Annotation of biochemical Models Le Novère N., Finney A., Hucka M., Bhalla U., Campagne F., Collado-Vides J., Crampin E., Halstead M., Klipp E., Mendes P., Nielsen P., Sauro H., Shapiro B., Snoep J.L., Spence H.D., Wanner B.L. Nature Biotechnology (2005), 23: 1509-1515
Overview • The Problem • MIRIAM as the Solution • Scope • Rules of Reference Correspondence • Annotation • Example • Encoding scheme in SBML
The Problem • SBML and CellML define standard encoding formats for models to enable their exchange • Encoding a model in these formats does not guarantee that • The encoding corresponds to a published model in a useful way • It reproduces the results reported in the article • It contains all the components described in the article • The encoding contains information about who created it or which published model it corresponds to • The encoding contains information about which biochemical concepts or entities are represented by the model elements • Model collections are difficult to store and search efficiently • In short • Can I trust this model or set of models?
The Solution • MIRIAM • Minimum Information Requested In the Annotation of biochemical Models • a set of rules for curating encodings of quantitative models
Compliance • MIRIAM can be thought of as defining a curation process whose result is a MIRIAM-compliant model encoding. • A MIRIAM-compliant model encoding • Contains attribution annotation • With a reference to the model documentation • Can be analyzed directly by tools in the way the documentation describes, giving the same results as the documentation • Typically simulation • Contains some degree of unambiguous links from model components to the corresponding concepts • Links should correspond to the documentation • Encapsulation • The output of the curation process is encapsulated in the model encoding • Model and annotations can be exchanged as a single entity
Scope of MIRIAM • Only relevant to models linked to a unique reference description • Undocumented models assembled from documented models are not covered • MIRIAM does not involve an assessment of the scientific content of the model • That is the peer reviewers’ job; they should: • review the model’s ability to predict and represent the quantitative behavior of biological systems and/or • review its theoretical contribution • MIRIAM focuses on the correspondence of the encoded model to its associated description • MIRIAM is restricted to predictive models • Correspondence of predictions is a key component of MIRIAM compliance
Components of MIRIAM • Minimum annotation of encoding for attribution • Rules for model correspondence • Scheme for annotating encoding with identifiers for concepts
The Rules for Correspondence • The model must be encoded in a public, standardized, machine-readable format (SBML, CellML, GENESIS ...) • The model must comply with the standard in which it is encoded! • The model must be clearly related to a single reference description. If a model is composed from different parts, there should still be a description of the derived/combined model. • The encoded model structure must reflect the biological processes listed in the reference description. • The model must be instantiable in a simulation: all quantitative attributes have to be defined, including initial conditions. • When instantiated, the model must be able to reproduce the results given in the reference description within an epsilon (allowing for differences in algorithms and rounding errors)
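The last rule leaves the size of the epsilon to the curator. The following is a minimal sketch using a hypothetical helper, `reproduces_reference`. It assumes that the simulated and published time courses have already been sampled at the same time points. It then compares them point by point with `math.isclose`. The tolerance values are illustrative choices, not MIRIAM requirements.

```python
import math


def reproduces_reference(simulated: dict[str, list[float]],
                         reference: dict[str, list[float]],
                         rel_tol: float = 1e-3,
                         abs_tol: float = 1e-9) -> bool:
    """Hypothetical check: every published time course is matched point by point.

    The tolerances stand in for the 'epsilon' of the MIRIAM rule; their
    values are illustrative assumptions, not part of MIRIAM.
    """
    for variable, expected in reference.items():
        observed = simulated.get(variable)
        # A variable that is missing or sampled differently cannot be compared.
        if observed is None or len(observed) != len(expected):
            return False
        for o, e in zip(observed, expected):
            if not math.isclose(o, e, rel_tol=rel_tol, abs_tol=abs_tol):
                return False
    return True


# Illustrative values only, not taken from any published model.
reference = {"C": [0.010, 0.250, 0.480]}
simulated = {"C": [0.0100002, 0.2500001, 0.4800003]}
print(reproduces_reference(simulated, reference))  # True
```

A relative tolerance is used so that the same epsilon works for variables of very different magnitudes. The small absolute tolerance keeps values near zero from failing spuriously.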
Attribution Annotation • The model encoding must be annotated with • A citation of the reference description (complete citation, unique identifier, unambiguous URL). The citation should allow the authors of the model to be identified. • The authors publish a description of the model • The names and contact details of the model creators must be included. • The creators produce the curated encoding of the model • The date and time of creation and last modification should be specified. A history is useful but not required. • A precise statement about the terms of distribution. MIRIAM does not require “freedom of use” or “no cost”.
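The attribution rules amount to a checklist. This is a minimal sketch, and it makes several assumptions:

- `Attribution` and `Creator` are hypothetical in-memory records, not part of any SBML library.
- An e-mail address is treated as the creator's contact.
- The record is filled with the creator and dates from the Goldbeter1991_MinMitOscil attribution example.

The sketch separates the items MIRIAM requires ("must") from those it only recommends ("should").

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class Creator:
    family: str
    given: str
    email: Optional[str] = None          # assumed here to be the "contact"
    organization: Optional[str] = None


@dataclass
class Attribution:
    reference_citation: Optional[str] = None   # citation of the reference description
    creators: list[Creator] = field(default_factory=list)
    created: Optional[datetime] = None
    modified: Optional[datetime] = None
    distribution_terms: Optional[str] = None


def check_attribution(a: Attribution) -> tuple[list[str], list[str]]:
    """Return (missing required items, missing recommended items)."""
    required: list[str] = []
    recommended: list[str] = []
    if not a.reference_citation:
        required.append("citation of the reference description")
    if not a.creators or any(not c.email for c in a.creators):
        required.append("name and contact of each creator")
    if not a.distribution_terms:
        required.append("terms of distribution")
    # Dates "should" be given, so their absence is only reported as advice.
    if a.created is None:
        recommended.append("creation date")
    if a.modified is None:
        recommended.append("last modification date")
    return required, recommended


record = Attribution(
    creators=[Creator("Shapiro", "Bruce", "bshapiro@jpl.nasa.gov",
                      "NASA Jet Propulsion Laboratory")],
    created=datetime(2005, 2, 6, 23, 39, 40),
    modified=datetime(2005, 9, 13, 13, 24, 56),
)
print(check_attribution(record))
# (['citation of the reference description', 'terms of distribution'], [])
```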
External resource annotation • The annotation unambiguously relates biological or biochemical concepts or entities to model constituents • This is achieved by encoding triples in the model: • Object: the model component • Relation: a verb • Subject: a term or identifier for a biochemical or biological concept or entity • E.g. Object: Species C; Relation: is a version of; Subject: CyclinD (“Species C is a version of CyclinD”)
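The triple structure can be written down directly as a small data type. In this sketch the field names `component`, `relation` and `concept` are hypothetical stand-ins for the slide's Object, Relation and Subject. The free-text label "CyclinD" is kept from the slide, although in a real annotation it would be a URI for the concept.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    component: str  # Object: the model component, e.g. a species id
    relation: str   # Relation: the verb linking component and concept
    concept: str    # Subject: term or identifier for the biological concept


annotations = [
    Triple("C", "is version of", "CyclinD"),
]


def concepts_for(component: str, triples: list[Triple]) -> list[tuple[str, str]]:
    """Return the (relation, concept) pairs attached to one model component."""
    return [(t.relation, t.concept) for t in triples if t.component == component]


print(concepts_for("C", annotations))  # [('is version of', 'CyclinD')]
```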
Relations • Relations define the link between the model component and entity or concept • A series of relations is required to support model annotation: “has a”, “is version of”, “is homolog to” etc. • More later…
Use of Uniform Resource Identifiers (URIs) to Systematically Encode Terms or Identifiers • Terms or identifiers representing concepts or entities are encoded as Uniform Resource Identifiers (URIs). A URI names a concept or entity; unlike a URL, it need not locate a physical resource. • Terms or identifiers are grouped into concept classes • A concept class is written as a URI • E.g. http://www.ebi.ac.uk/IntEnz/ for EC terms • Individual concepts within a class are identified by an identifier string that is unique within the concept class • E.g. 3.1.4.11 for IP3 production • A concept class URI and the identifier string of a specific concept in that class are combined into a single URI for the concept • E.g. http://www.ebi.ac.uk/IntEnz/3.1.4.11 • These URIs can be translated into physical URLs
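Combining a concept class and an identifier, then optionally resolving the result, can be sketched as follows. Three points are assumptions:

- The identifier is appended with a plain "/" join, as in the IntEnz example. Other URIs in these slides use "#".
- The `RESOLVERS` table and its `resolver.example.org` URL template are hypothetical. They illustrate an advisory URI-to-URL mapping rather than any official one.
- `quote` is applied so that characters such as spaces are percent-encoded.

```python
from typing import Optional
from urllib.parse import quote


def concept_uri(concept_class: str, identifier: str) -> str:
    """Combine a concept-class URI and an identifier into one concept URI."""
    if not concept_class.endswith("/"):
        concept_class += "/"
    # Percent-encode characters (e.g. spaces) that are not allowed in a URI.
    return concept_class + quote(identifier, safe=":")


# Hypothetical, advisory table mapping concept classes to URL templates.
RESOLVERS = {
    "http://www.ebi.ac.uk/IntEnz/": "https://resolver.example.org/intenz/{id}",
}


def to_url(uri: str, resolvers: dict[str, str]) -> Optional[str]:
    """Translate a concept URI into a physical URL, if a template is known."""
    for prefix, template in resolvers.items():
        if uri.startswith(prefix):
            return template.format(id=uri[len(prefix):])
    return None


uri = concept_uri("http://www.ebi.ac.uk/IntEnz/", "3.1.4.11")
print(uri)                      # http://www.ebi.ac.uk/IntEnz/3.1.4.11
print(to_url(uri, RESOLVERS))   # https://resolver.example.org/intenz/3.1.4.11
```

Keeping the resolver table separate from the URIs reflects the point that the URIs identify concepts, while any mapping to physical resources is advisory and can change.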
Encoding MIRIAM • The following components are required to encode MIRIAM • Model encoding formats • SBML, CellML, GENESIS, etc. • Relation elements • URI set • mapping to physical resources is advisory • Systems Biology Ontology (SBO) • The standardization processes of the above components are all independent of each other
Annotating a Model in SBML • Two parts: • Use of sboTerm • Just for the SBO concept class • see SBML talk • A standard annotation encoding format for everything else • including the attribution data
Standard Annotation Format in SBML L2V2 • Embedded in the annotation element: looks just like any other application-specific annotation • Uses a deliberately constrained form of RDF • Some overloaded semantics won’t be detected by RDF processors
Reference Annotation Format: Syntax
<SBML_ELEMENT +++ metaid="SBML_META_ID" +++ >
  +++
  <annotation>
    +++
    <rdf:RDF >
      <rdf:Description rdf:about="#SBML_META_ID">
        [MODEL_HISTORY]
        <RELATION_ELEMENT>
          <rdf:Bag>
            <rdf:li rdf:resource="URI" />
            ...
          </rdf:Bag>
        </RELATION_ELEMENT>
        ...
      </rdf:Description>
      +++
    </rdf:RDF>
    +++
  </annotation>
  +++
</SBML_ELEMENT>
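The template can be generated programmatically with Python's standard `xml.etree.ElementTree`. The sketch below has deliberate limits:

- It covers only the relation-element and bag structure.
- It omits the SBML namespace, the optional model history and the `bqmodel` qualifiers.
- The `bqbiol` namespace URI is the BioModels biology-qualifier namespace, stated here as an assumption to check against the qualifier list in the references.
- The metaid `ip3prod` is a hypothetical reaction.

```python
import xml.etree.ElementTree as ET

# Standard RDF namespace and the BioModels biology-qualifier namespace.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BQBIOL = "http://biomodels.net/biology-qualifiers/"
ET.register_namespace("rdf", RDF)
ET.register_namespace("bqbiol", BQBIOL)


def build_annotation(metaid: str,
                     relations: list[tuple[str, list[str]]]) -> ET.Element:
    """Build <annotation><rdf:RDF><rdf:Description> for one SBML element.

    Each (relation, uris) pair becomes one relation element holding an
    rdf:Bag with one rdf:li per URI, following the syntax template.
    """
    annotation = ET.Element("annotation")
    rdf_root = ET.SubElement(annotation, f"{{{RDF}}}RDF")
    description = ET.SubElement(rdf_root, f"{{{RDF}}}Description",
                                {f"{{{RDF}}}about": f"#{metaid}"})
    for relation, uris in relations:
        relation_el = ET.SubElement(description, f"{{{BQBIOL}}}{relation}")
        bag = ET.SubElement(relation_el, f"{{{RDF}}}Bag")
        for uri in uris:
            ET.SubElement(bag, f"{{{RDF}}}li", {f"{{{RDF}}}resource": uri})
    return annotation


# Hypothetical reaction with metaid "ip3prod", annotated with an EC concept URI.
annotation = build_annotation(
    "ip3prod", [("isVersionOf", ["http://www.ebi.ac.uk/IntEnz/3.1.4.11"])])
print(ET.tostring(annotation, encoding="unicode"))
```

The `rdf:about` value is derived from the element's metaid, which is why the template requires the annotated SBML element to carry a `metaid` attribute.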
Reference Annotation Format: Direct Relation Elements • bqmodel:is • The object encoded by the SBML component is the subject of the referenced resource. • For instance, this qualifier should be used to link the model to a copy of the model in a model database. • bqmodel:isDescribedBy • The object encoded by the SBML component is described by the referenced resource. • This relation should be used to link SBML components to literature that describes the component.
Reference Annotation Format: Biochemical Relation Elements • bqbiol:is • The object represented by the SBML component is the subject of the referenced resource. • This relation could be used to link a reaction to its exact counterpart in KEGG or Reactome for instance. • bqbiol:hasPart • The object represented by the SBML component includes the subject of the referenced resource, either physically or logically. • bqbiol:isPartOf • The object represented by the SBML component is a physical or logical part of the subject of the referenced resource. • bqbiol:isVersionOf • The object represented by the SBML component is a version or an instance of the subject of the referenced resource. • bqbiol:hasVersion • The subject of the referenced resource is a version or an instance of the object represented by the SBML component. • bqbiol:isHomologTo • The object represented by the SBML component is a homolog to the referenced resource.
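The relation elements above belong to two XML namespaces: one for model qualifiers (`bqmodel`) and one for biology qualifiers (`bqbiol`). The sketch below is a hypothetical lookup that expands a prefixed qualifier name to its full URI. It covers only the qualifiers listed on these slides; the published qualifier list may contain others. The namespace URIs are assumed to be the BioModels qualifier namespaces used in SBML. The inverse pairs follow from the definitions given above.

```python
BQMODEL = "http://biomodels.net/model-qualifiers/"
BQBIOL = "http://biomodels.net/biology-qualifiers/"

# Only the qualifiers listed on these slides, grouped by prefix.
QUALIFIERS = {
    "bqmodel": (BQMODEL, {"is", "isDescribedBy"}),
    "bqbiol": (BQBIOL, {"is", "hasPart", "isPartOf", "isVersionOf",
                        "hasVersion", "isHomologTo"}),
}

# Pairs that, by the definitions above, state the same link in reverse.
INVERSES = {"hasPart": "isPartOf", "isPartOf": "hasPart",
            "isVersionOf": "hasVersion", "hasVersion": "isVersionOf"}


def expand(qname: str) -> str:
    """Expand e.g. 'bqbiol:hasPart' to its full URI, rejecting unknown names."""
    prefix, _, local = qname.partition(":")
    if prefix not in QUALIFIERS:
        raise ValueError(f"unknown qualifier prefix: {prefix!r}")
    namespace, names = QUALIFIERS[prefix]
    if local not in names:
        raise ValueError(f"unknown qualifier: {qname!r}")
    return namespace + local


print(expand("bqbiol:isVersionOf"))
# http://biomodels.net/biology-qualifiers/isVersionOf
print(INVERSES["hasPart"])  # isPartOf
```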
Reference Annotation Format: Hidden Semantics • Multiple elements of the same relation on the same SBML element represent alternatives • The statements made by these separate elements are NOT simultaneously true.
Reference Annotation Format: Example
<reaction id="adenineProd" metaid="adeprod">
  <annotation>
    <rdf:RDF>
      <rdf:Description rdf:about="#adeprod">
        <bqbiol:hasPart>
          <rdf:Bag>
            <rdf:li rdf:resource="http://www.ebi.ac.uk/intenz/#EC 2.5.1.22"/>
            <rdf:li rdf:resource="http://www.ebi.ac.uk/intenz/#EC 3.2.2.16"/>
          </rdf:Bag>
        </bqbiol:hasPart>
        <bqbiol:hasPart>
          <rdf:Bag>
            <rdf:li rdf:resource="http://www.genome.jp/kegg/reaction/#R00178"/>
            <rdf:li rdf:resource="http://www.genome.jp/kegg/reaction/#R01401"/>
          </rdf:Bag>
        </bqbiol:hasPart>
      </rdf:Description>
    </rdf:RDF>
  </annotation>
</reaction>
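Read together with the hidden-semantics rule, this example states two alternative decompositions of the reaction: one in terms of EC numbers and one in terms of KEGG reactions. The sketch below recovers those alternatives with `xml.etree.ElementTree`. It rests on two assumptions:

- As the example suggests, all resources inside one `rdf:Bag` apply jointly.
- The `xmlns` declarations are added here because the slide omits them. The `bqbiol` namespace URI is an assumption.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BQBIOL = "http://biomodels.net/biology-qualifiers/"

# The annotation from the example, with namespace declarations added.
EXAMPLE = f"""<annotation xmlns:rdf="{RDF}" xmlns:bqbiol="{BQBIOL}">
  <rdf:RDF>
    <rdf:Description rdf:about="#adeprod">
      <bqbiol:hasPart>
        <rdf:Bag>
          <rdf:li rdf:resource="http://www.ebi.ac.uk/intenz/#EC 2.5.1.22"/>
          <rdf:li rdf:resource="http://www.ebi.ac.uk/intenz/#EC 3.2.2.16"/>
        </rdf:Bag>
      </bqbiol:hasPart>
      <bqbiol:hasPart>
        <rdf:Bag>
          <rdf:li rdf:resource="http://www.genome.jp/kegg/reaction/#R00178"/>
          <rdf:li rdf:resource="http://www.genome.jp/kegg/reaction/#R01401"/>
        </rdf:Bag>
      </bqbiol:hasPart>
    </rdf:Description>
  </rdf:RDF>
</annotation>"""


def alternatives(annotation: ET.Element, relation: str) -> list[list[str]]:
    """One inner list per repeated relation element (the alternatives, OR);
    the URIs inside each list come from one rdf:Bag and hold jointly (AND)."""
    result = []
    for rel in annotation.iter(f"{{{BQBIOL}}}{relation}"):
        result.append([li.get(f"{{{RDF}}}resource")
                       for li in rel.iter(f"{{{RDF}}}li")])
    return result


for option in alternatives(ET.fromstring(EXAMPLE), "hasPart"):
    print(" AND ".join(option))
```

The two printed lines are the two alternatives. An RDF processor that merges both `bqbiol:hasPart` elements into one set of statements would lose this distinction, which is the overloaded semantics mentioned earlier.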
Attribution Format
<dc:creator rdf:parseType="Resource">
  <rdf:Bag>
    <rdf:li rdf:parseType="Resource">
      [[ +++
      <vCard:N rdf:parseType="Resource">
        <vCard:Family>FAMILY_NAME</vCard:Family>
        <vCard:Given>GIVEN_NAME</vCard:Given>
      </vCard:N>
      +++
      [<vCard:EMAIL>EMAIL_ADDRESS</vCard:EMAIL>]
      +++
      [<vCard:ORG>
        <vCard:Orgname>ORGANIZATION_NAME</vCard:Orgname>
      </vCard:ORG>]
      +++ ]]
    </rdf:li>
    ...
  </rdf:Bag>
</dc:creator>
<dcterms:created rdf:parseType="Resource">
  <dcterms:W3CDTF>DATE</dcterms:W3CDTF>
</dcterms:created>
<dcterms:modified rdf:parseType="Resource">
  <dcterms:W3CDTF>DATE</dcterms:W3CDTF>
</dcterms:modified>
...
(elements may appear in any order)
Attribution Example
<model metaid="_180340" id="GMO" name="Goldbeter1991_MinMitOscil">
  <annotation>
    <rdf:RDF>
      <rdf:Description rdf:about="#_180340">
        <dc:creator rdf:parseType="Resource">
          <rdf:Bag>
            <rdf:li rdf:parseType="Resource">
              <vCard:N rdf:parseType="Resource">
                <vCard:Family>Shapiro</vCard:Family>
                <vCard:Given>Bruce</vCard:Given>
              </vCard:N>
              <vCard:EMAIL>bshapiro@jpl.nasa.gov</vCard:EMAIL>
              <vCard:ORG>
                <vCard:Orgname>NASA Jet Propulsion Laboratory</vCard:Orgname>
              </vCard:ORG>
            </rdf:li>
          </rdf:Bag>
        </dc:creator>
        <dcterms:created rdf:parseType="Resource">
          <dcterms:W3CDTF>2005-02-06T23:39:40</dcterms:W3CDTF>
        </dcterms:created>
        <dcterms:modified rdf:parseType="Resource">
          <dcterms:W3CDTF>2005-09-13T13:24:56</dcterms:W3CDTF>
        </dcterms:modified>
      </rdf:Description>
    </rdf:RDF>
  </annotation>
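A curation tool could read this attribution back out of a model. The sketch below does so with `xml.etree.ElementTree`, and its assumptions are:

- The namespace URIs in `NS` are the ones commonly declared in SBML annotations. The slide excerpt omits them, so check them against the SBML specification.
- `read_attribution` is a hypothetical helper, not a library function.
- W3CDTF values are parsed with `datetime.fromisoformat`.

```python
import xml.etree.ElementTree as ET
from datetime import datetime
from typing import Optional

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "vCard": "http://www.w3.org/2001/vcard-rdf/3.0#",
}

# The rdf:Description from the example, with namespace declarations added.
EXAMPLE = f"""<rdf:Description xmlns:rdf="{NS['rdf']}" xmlns:dc="{NS['dc']}"
    xmlns:dcterms="{NS['dcterms']}" xmlns:vCard="{NS['vCard']}" rdf:about="#_180340">
  <dc:creator rdf:parseType="Resource">
    <rdf:Bag>
      <rdf:li rdf:parseType="Resource">
        <vCard:N rdf:parseType="Resource">
          <vCard:Family>Shapiro</vCard:Family>
          <vCard:Given>Bruce</vCard:Given>
        </vCard:N>
        <vCard:EMAIL>bshapiro@jpl.nasa.gov</vCard:EMAIL>
        <vCard:ORG><vCard:Orgname>NASA Jet Propulsion Laboratory</vCard:Orgname></vCard:ORG>
      </rdf:li>
    </rdf:Bag>
  </dc:creator>
  <dcterms:created rdf:parseType="Resource"><dcterms:W3CDTF>2005-02-06T23:39:40</dcterms:W3CDTF></dcterms:created>
  <dcterms:modified rdf:parseType="Resource"><dcterms:W3CDTF>2005-09-13T13:24:56</dcterms:W3CDTF></dcterms:modified>
</rdf:Description>"""


def _text(element: ET.Element, path: str) -> Optional[str]:
    """Return the stripped text at path, or None if the element is absent."""
    value = element.findtext(path, namespaces=NS)
    return value.strip() if value is not None else None


def _date(description: ET.Element, tag: str) -> Optional[datetime]:
    value = _text(description, f"dcterms:{tag}/dcterms:W3CDTF")
    # fromisoformat only accepts a trailing "Z" from Python 3.11 on.
    return datetime.fromisoformat(value.replace("Z", "+00:00")) if value else None


def read_attribution(description: ET.Element) -> dict:
    """Collect creators and creation/modification dates from rdf:Description."""
    creators = [
        {
            "family": _text(li, "vCard:N/vCard:Family"),
            "given": _text(li, "vCard:N/vCard:Given"),
            "email": _text(li, "vCard:EMAIL"),
            "organization": _text(li, "vCard:ORG/vCard:Orgname"),
        }
        for li in description.findall("dc:creator/rdf:Bag/rdf:li", NS)
    ]
    return {"creators": creators,
            "created": _date(description, "created"),
            "modified": _date(description, "modified")}


info = read_attribution(ET.fromstring(EXAMPLE))
print(info["creators"][0]["family"], info["created"].isoformat())
# Shapiro 2005-02-06T23:39:40
```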
References • MIRIAM • Le Novère N., Finney A., Hucka M., Bhalla U., Campagne F., Collado-Vides J., Crampin E., Halstead M., Klipp E., Mendes P., Nielsen P., Sauro H., Shapiro B., Snoep J.L., Spence H.D., Wanner B.L. Nature Biotechnology (2005), 23: 1509-1515 • Set of URIs • http://sbml.org/wiki/MIRIAM_URI_Set • Set of relations (qualifiers) • http://sbml.org/wiki/Biomodels_Qualifiers • SBML L2V2 (draft) • http://sbml.org/wiki/sbml-level-2-version-2.pdf • Uniform Resource Identifier (URI) • http://www.w3.org/Addressing/URL/URI_Overview.html
The Dream • The current BioModels curation process applies MIRIAM in post-hoc mode • The initial encoding is non-compliant • It is not necessarily created by the author • The right way is for the peer-review process to include submission of a MIRIAM-compliant model by the author • This ensures that the reviewers actually review the MIRIAM-compliant model • This is not guaranteed at all by the current review process • Reviewers become responsible for what is now model correspondence curation • Requires journal cooperation • Current mechanisms are weak • Annotation with links to external resources will probably never be done by authors
History of MIRIAM • Ad-hoc meeting at ICSB 2004, Heidelberg • Representatives of standardization efforts and emerging model curation groups attended: • SBML • Mike Hucka and Andrew Finney • BioModels • Nicolas Le Novère • CellML and CellML repository • Edmund Crampin • JWS Online • Jacky Snoep • RegulonDB • Julio Collado-Vides • DOQCS • Upinder Bhalla