What is EDAM? • EMBRACE Data and Methods • Ontology for bioinformatics tools and data • A set of defined terms, relationships between terms and rules that govern the terms and relations • Glorified glossary – with terms organised by is_a relations (class/subclass) into hierarchy • Controlled vocabulary for describing: • Web services e.g. WSDL files • Standalone tools • Web servers • Databases • Data, e.g. XSD data schema associated with a WSDL file • Data syntax and file formats • Aims to describe (coarse level) all major bioinformatics databases, data and tools in use • The "beta" release covers tools (and associated data) in the EMBRACE Registry: • http://www.embraceregistry.net/
Scope • EDAM includes 7 sub-ontologies (branches of terms in their own namespace) in the domain of “bioinformatics tool and data description”: • biological entity – “Any biological thing (or part of a thing) with a physical existence, a physical part, region or feature that can be mapped to such a thing, a collection of such things or an observable phenomenon or occurrence” • topic – “A general field of bioinformatics study, data, processing and analysis or technology.” • operation – “A specific, singular function or process performed by a tool, for example a WS operation. What is done, but not (typically) how or in what context.” • data resource – “A category of content of a data source including databases and ontologies.” • data – “A semantic description of a data entity (datum) commonly used in bioinformatics.” • format – “A reference (typically a URL) of a data format specification.” • Required terms not specific to this domain might (eventually) be removed – including the entity branch (which provides biological context for other branches).
Conceptual model Bold text within a box indicates a namespace (top-level term) Non-bold text within a box indicates a minor branch Text next to lines indicates a relation between two terms
Design Principles • It wasn’t just thrown together (honestly) … • Clearly defined scope • A purpose-independent design, not tied to a particular use case • Relevant to annotation of current: • WSDL files • XSD schema • Standalone databases, servers and tools • Comprehensive, with enough terms to be useful • Comprehensible, with terms and relations that are simple and intuitive • Uncluttered, including only commonly used terms and with as few relation types as possible • Navigable, with a simple class (is_a) hierarchy • General, including terms of general use and excluding fine-grained specialised concepts • Complementary to (not duplicating) other established ontologies • Compatible (e.g. cross-referenced) with existing resources • Integrity, compatible (so far as possible) with "upper level" ontologies • Extensible, with clear guidelines for developers • Convenient, with clear guidelines for annotators • Ideally, support automated logical inference (reasoning software) • Validatable • There is a compromise between “ontological correctness” and usability – a pragmatic approach is essential!
Limitations • EDAM is/does not: • Describe syntax or file formats in detail (syntax namespace will provide references) • Define data structures. Although has_part / is_part_of relations are defined, they are not currently used. • Include terms for every conceptual part of things. Typically a datatype is only listed if it is known to be in common use. • A catalogue of individual data structures, databases etc. Terms correspond to classes; specific instances are not included. • A full-strength ontology. Many relations and other domain features that could be expressed, e.g. in OWL format, are not modelled. • A way (in itself) to identify or unify all services and data (but it might help). • Complete (and arguably never can be).
Sources (current version) • Software collections and registries: • EMBRACE Web Services • EBI Web Services • EBI databases and retrievable fields known to the EB-eye web services • EMBOSS including EMBASSY packages (>200 applications) • WHAT-IF data and services (see also WHAT-IF help) • Lists of tools from the Web • Domain ontologies: • myGrid ontology • NAR Databases • NAR web servers • Sequence (sequence-related terms) • Sequence service (sequence service terms) • Database-related terms: • dbxref.txt (databases cross-referenced in UniProtKB/Swiss-Prot) • List of databases collated by the ELIXIR project • Lists of databases from the web • Other (not used as source of terms): • MI (molecular interactions) • MIRIAM Resources • bio2rdf
Sources (to consider) 1. BioMoby: BioMoby Object Ontology (datatypes) BioMoby Namespace Ontology (namespaces) BioMoby service types (analysis types) BioMoby web service registry (Moby-compliant services) 2. Tool collections and registries: PSICQUIC services Web services lists and registries Services supported by the bio* projects 3. Domain ontologies: PDBML Schema (Protein Data Bank Markup Language) Sequence Ontology (sequence annotation and annotation exchange) BioPAX ontology (biological pathway data) Ondex ontology DAS (sequence annotation) Map (biological map-related terms from Gramene database) 4. XML formats: BSML MACSIM HSAML BEAST MSAML PHYLIP JalView 2 Project AlignmentML EBI Application XML UniProtKB RDF 5. Other: MSD/PDBe API OMG LSR documents
Download • “Beta” version in OBO (Open Biomedical Ontologies) format: http://sourceforge.net/projects/edamontology/files/
Status • “Beta” version intended primarily for testing and feedback • Starting point for service nomenclature • Coverage is quite broad in general and quite deep for sequence analysis: • ~2000 terms with definitions • 8 basic types of relation (plus inverse relations) • Relations are defined but not used in many term definitions. Relations will be added in the future depending on requirements. • Maturing nicely through iterative cycles of development • Term names, definitions and hierarchy (is_a relations) in all branches are reasonably stable • Future versions will not be a fundamental departure • EDAM is being actively developed: • OBO uses IDs to uniquely identify terms. EDAM IDs will persist between versions: a given ID is guaranteed to identify the same concept. This does *not* imply term names, definitions and other fields will remain constant, but they will remain true to the concept. • Obsolete terms will also persist (they will not be removed and will maintain their ID). • Suggestions, requirements and collaborations welcome!
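The ID persistence promise on this slide can be checked mechanically between two releases. The following is a minimal sketch, assuming the files use ordinary OBO flat-file stanzas (`[Term]` blocks with `id:`, `name:` and `is_obsolete:` tags). The two sample "releases", the `EDAM:` prefix and the term names are made up for illustration and are not real EDAM content.

```python
def parse_obo_terms(text):
    """Return {term_id: {tag: [values]}} for every [Term] stanza in OBO text."""
    terms = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza begins; only [Term] stanzas are collected.
            current = {} if line == "[Term]" else None
            continue
        if current is None or not line or line.startswith("!"):
            continue  # header lines, blank lines and comment lines
        tag, _, value = line.partition(":")
        tag = tag.strip()
        value = value.split(" ! ")[0].strip()  # drop trailing OBO comments
        current.setdefault(tag, []).append(value)
        if tag == "id":
            terms[value] = current
    return terms


def missing_ids(old_text, new_text):
    """IDs present in the old release but absent from the new one."""
    return sorted(set(parse_obo_terms(old_text)) - set(parse_obo_terms(new_text)))


# Hypothetical sample data (not real EDAM terms).
OLD = """format-version: 1.2

[Term]
id: EDAM:0000001
name: Example operation

[Term]
id: EDAM:0000002
name: Example data
"""

NEW = """format-version: 1.2

[Term]
id: EDAM:0000001
name: Example operation (renamed)

[Term]
id: EDAM:0000002
name: Example data
is_obsolete: true

[Term]
id: EDAM:0000003
name: Example format
"""

if __name__ == "__main__":
    # An empty list means no ID was dropped between the two releases.
    print(missing_ids(OLD, NEW))
```

In this sketch a renamed term and an obsoleted term both keep their IDs, so neither is reported as missing; only an ID that disappears outright would be.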
License • EDAM is made available to all without any constraint or license on its use or redistribution other than: • EDAM is clearly acknowledged as the source of the product. • EDAM files displayed publicly include the publication date and/or version number. • EDAM files are not altered and subsequently redistributed under their original name or with the same term identifiers.
Documentation • Documentation at: • http://edamontology.sourceforge.net/ • Including clear statement of: • Branches of terms (namespaces / sub-ontologies) • Relations • Rules (governing rules and relations) • Guidelines for Developers • Guidelines for Annotators (basic) • And more …
Viewing • EDAM may be viewed in: • Any text editor • Ontology editor • OBO Ontology Editor (OBOEdit) Version 2 • http://oboedit.org • Web-based browsers: • NCBO Ontology Browser • http://bioportal.bioontology.org/visualize/42800 • EBI Ontology Look-up Service (coming soon) • http://www.ebi.ac.uk/ontology-lookup/ • SRS • EBI SRS server • http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?-page+LibInfo+-lib+EDAM
Viewing in Text Editor • Any text editor
Viewing in Ontology Editor • Ontology editor • OBO Ontology Editor (OBOEdit) Version 2 • http://oboedit.org
Viewing in Web-based Browser • Web-based browsers: • NCBO Ontology Browser • http://bioportal.bioontology.org/visualize/42800 • EBI Ontology Look-up Service (coming soon) • http://www.ebi.ac.uk/ontology-lookup/
Viewing in SRS EDAM is available in the EBI SRS server: http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?-page+LibInfo+-lib+EDAM and from the EBI dbfetch: http://wwwdev.ebi.ac.uk/Tools/dbfetch/ which allows individual terms to be addressed: http://wwwdev.ebi.ac.uk/Tools/dbfetch/dbfetch/edam/0000352 (plain text view) or http://wwwdev.ebi.ac.uk/Tools/dbfetch/dbfetch/edam/0000352?style=html (HTML view). These views are the term “end-points”.
Guidelines for Annotators • Which EDAM branch to use? • “topic” for coarse-grained annotation of tools, databases, servers and so on • “operation” for fine-grained annotation of tool functions • “data resource” for annotating data resources such as databases and servers into broad categories based on content-type • “data” and “format” for annotating data in semantic and syntactic terms respectively • Picking terms • Familiarise yourself with EDAM (use a text editor or OBOEdit) • Identify the correct branch/namespace (“operation”, “data” etc., see above) • Search EDAM using keywords to find candidate terms. Use synonyms, alternative spellings etc. • Pick the most specific term(s) available (some concepts are necessarily overlapping or general!) • Only pick a correct term (if it doesn't exist it can be added) • Use other ontologies • Use EDAM alongside other ontologies where possible and desirable. • For example, an operation that predicts specific features of a molecular sequence could be annotated with GO terms for the features.
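The term-picking steps above (choose a branch, search by keyword including synonyms, prefer the most specific match) can be sketched in a few lines. This is only an illustration under stated assumptions: the in-memory `TERMS` table, its IDs and its names are hypothetical, and pruning a candidate whenever one of its descendants also matches is one possible reading of "most specific", not an EDAM rule. The annotator still makes the final choice.

```python
# Hypothetical terms: id -> name, namespace (branch), synonyms, is_a parents.
TERMS = {
    "op:1": {"name": "Sequence alignment", "namespace": "operation",
             "synonyms": [], "is_a": []},
    "op:2": {"name": "Pairwise sequence alignment", "namespace": "operation",
             "synonyms": ["Pairwise alignment"], "is_a": ["op:1"]},
    "data:1": {"name": "Sequence alignment", "namespace": "data",
               "synonyms": [], "is_a": []},
}


def ancestors(term_id, terms):
    """All terms reachable from term_id by following is_a relations upwards."""
    seen = set()
    stack = list(terms[term_id]["is_a"])
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(terms[parent]["is_a"])
    return seen


def find_candidates(keyword, namespace, terms):
    """Case-insensitive keyword search over names and synonyms in one branch."""
    key = keyword.lower()
    hits = {
        tid for tid, term in terms.items()
        if term["namespace"] == namespace
        and any(key in label.lower() for label in [term["name"], *term["synonyms"]])
    }
    # Drop any hit that is a more general ancestor of another hit.
    general = set()
    for tid in hits:
        general |= ancestors(tid, terms) & hits
    return sorted(hits - general)


if __name__ == "__main__":
    print(find_candidates("alignment", "operation", TERMS))
```

Restricting the search to a single namespace first mirrors the guideline order: the same name can legitimately exist in more than one branch, as the "Sequence alignment" entries in the sample table show.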
Annotation of Web Services • Model of a Web Service • A WS is considered as an arbitrary (but usually related) set of one or more operations, reducing the problem of WS interoperation to one of compatibility between operations. • Operation • Discrete unit of functionality performing (typically) one or more definite functions • Reads an input • Writes an output • Uses zero or more data resources • Input • Payload of SOAP message passed in operation call • Name and (ideally) description is given in WSDL file • Input has one or more XML elements which must be set (input values) • Output • Payload of SOAP message returned from operation call • Name and (ideally) description is given in WSDL file • Output has one or more XML elements which are written (output values) • XML elements • Simple or complex XSD types given in XSD schema associated with a WSDL file • Correspond to values that are input or output by a service • Name and (ideally) description of element is given in schema • Element values are instances of a particular datatype with a semantic type and a specific syntax. • Most element values have a syntax fully specified by the schema • Some element values correspond to text in a specific file format which is not specified by the schema. Such reports may be a composite of different semantic types. • Data resources • Databases or ontologies used in the background • Not passed in a WS call • Might be specified indirectly via a parameter. For example an operation reads a database, the name of which is specified
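The model on this slide maps naturally onto a small data structure. The sketch below is an illustration only: the class names, field names and sample values are assumptions, and the `can_chain` test (an output and an input share both semantic type and syntax) is one simple way to express "compatibility between operations", not a definition given by EDAM.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Value:
    """An input or output value: a semantic type plus a specific syntax."""
    name: str
    semantic_type: str
    syntax: str


@dataclass
class Operation:
    """A discrete unit of functionality within a web service."""
    name: str
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)
    # Background databases or ontologies; not passed in the call itself.
    data_resources: list = field(default_factory=list)


def can_chain(producer, consumer):
    """True if some output of producer matches some input of consumer."""
    produced = {(v.semantic_type, v.syntax) for v in producer.outputs}
    return any((v.semantic_type, v.syntax) in produced for v in consumer.inputs)


# Hypothetical operations for illustration.
fetch = Operation(
    name="fetchSequence",
    inputs=[Value("accession", "sequence identifier", "plain text")],
    outputs=[Value("sequence", "sequence record", "FASTA")],
    data_resources=["example sequence database"],
)
align = Operation(
    name="alignSequences",
    inputs=[Value("sequences", "sequence record", "FASTA")],
    outputs=[Value("alignment", "sequence alignment", "ClustalW")],
)

if __name__ == "__main__":
    print(can_chain(fetch, align), can_chain(align, fetch))
```

Treating a service as nothing more than a list of such `Operation` objects reflects the slide's point that interoperation is decided operation by operation.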
Annotation of Web Services • Levels of annotation • Annotation of a WSDL file or associated XSD schema is possible at several levels. • Assuming SAWSDL annotation, the XML elements that may be annotated are: • Service (<wsdl:portType>) • Ideally one “Topic" term for the service as a whole • Operation (<wsdl:operation>) • Ideally one "Operation" term for each WSDL operation (more than one in exceptional circumstances) • Input (parameter) values (<xs:element>, <xs:complexType>, <xs:simpleType>, <xs:attribute>) • One "Data" term • One “Format" term • Output values (<xs:element>, <xs:complexType>, <xs:simpleType>, <xs:attribute>) • One "Data" term • One “Format" term • The expectation is for annotation of operation inputs and outputs to go into XSD schema although the WSDL file (<input> and <output> elements) might also be used. The following annotations might be useful but are not supported by SAWSDL: • Web service (<wsdl:service>) • One or more "Topic" terms to describe the general area(s) the service operates in • One or more “Data resource" terms to describe the data resources used by the service • Operation input (<input>) • One or more "Data" terms for the input(s) of each operation (if needed) • Operation output (<output>) • One or more "Data" terms for the output(s) of each operation (if needed)
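As a rough check of the "one Data term, one Format term" expectation for input and output values, the sketch below reads `sawsdl:modelReference` attributes from an XSD schema with Python's standard `xml.etree.ElementTree`. The sample schema, its element names and the term IDs are hypothetical; the PURL layout (`.../edam/<namespace>/<id>`) follows the proposal described in this deck.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
SAWSDL = "http://www.w3.org/ns/sawsdl"
ANNOTATABLE = {"element", "complexType", "simpleType", "attribute"}

# Hypothetical schema fragment; the term IDs are placeholders.
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:sawsdl="http://www.w3.org/ns/sawsdl">
  <xs:element name="sequence" type="xs:string"
      sawsdl:modelReference="http://purl.org/edam/data/0000001 http://purl.org/edam/format/0000002"/>
  <xs:element name="unannotated" type="xs:string"/>
</xs:schema>"""


def model_references(xsd_text):
    """Map (XSD construct, name) to its list of modelReference URIs."""
    root = ET.fromstring(xsd_text)
    found = {}
    for node in root.iter():
        if not node.tag.startswith("{" + XS + "}"):
            continue
        local = node.tag.split("}", 1)[1]
        if local in ANNOTATABLE:
            # modelReference holds a whitespace-separated list of URIs.
            refs = node.get("{" + SAWSDL + "}modelReference", "").split()
            found[(local, node.get("name"))] = refs
    return found


def missing_branches(refs):
    """Which of the 'data' and 'format' branches have no term in refs."""
    branches = {ref.rstrip("/").split("/")[-2] for ref in refs}
    return sorted({"data", "format"} - branches)


if __name__ == "__main__":
    for key, refs in model_references(SCHEMA).items():
        print(key, missing_branches(refs))
```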
Annotation of EMBOSS EMBOSS (European Molecular Biology Open Software Suite) >200 applications for (mostly) molecular sequence analysis Application descriptions are kept in ACD (Application Command Definition) file ACD file includes: 1 “Application definition” 1 or more “Data definitions” ACD files are annotated with EDAM terms Application definition: >=1 “topic” term >=1 “operation” term Data definition: >=1 “data” term
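The minimum annotation requirements for an ACD file listed above can be expressed as a small validator. This sketch does not parse real ACD syntax: it assumes the annotations have already been extracted into plain Python structures, and the function name, argument shapes and term IDs are hypothetical.

```python
def check_acd_annotation(application_terms, data_definitions):
    """Report violations of the minimum EDAM annotation for one ACD file.

    application_terms: list of (branch, term_id) on the application definition.
    data_definitions:  {definition_name: [(branch, term_id), ...]}.
    """
    problems = []
    app_branches = {branch for branch, _ in application_terms}
    # Application definition: at least one "topic" and one "operation" term.
    for required in ("topic", "operation"):
        if required not in app_branches:
            problems.append(
                f"application definition needs at least one '{required}' term")
    # An ACD file carries one or more data definitions.
    if not data_definitions:
        problems.append("ACD file needs at least one data definition")
    # Each data definition: at least one "data" term.
    for name, terms in data_definitions.items():
        if not any(branch == "data" for branch, _ in terms):
            problems.append(
                f"data definition '{name}' needs at least one 'data' term")
    return problems


if __name__ == "__main__":
    # Hypothetical annotation with two gaps.
    print(check_acd_annotation(
        [("topic", "0000001")],
        {"sequence": [("data", "0000002")], "outfile": []},
    ))
```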
EMBOSS Service Annotation Annotated WSDL files (and associated XSD data schema) are available from: http://wwwdev.ebi.ac.uk/soaplab/typed/services/list You will see a list of service end-points with WSDL URLs. For example: http://wwwdev.ebi.ac.uk/soaplab/typed/services/alignment_consensus.cons.sa?wsdl To see the data schema associated with a WSDL, you must replace "?wsdl" with "?xsd=1", "?xsd=2" or "?xsd=3" For example: http://wwwdev.ebi.ac.uk/soaplab/typed/services/alignment_consensus.cons.sa?xsd=1
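The URL substitution described above is easy to script. A minimal sketch: the helper name `schema_urls` is made up here, and the default of three schema documents simply follows the "?xsd=1", "?xsd=2" or "?xsd=3" wording on this slide; a given service may expose a different number. The function only builds URLs and makes no network request.

```python
def schema_urls(wsdl_url, count=3):
    """Derive the XSD schema URLs from a WSDL URL ending in '?wsdl'."""
    suffix = "?wsdl"
    if not wsdl_url.endswith(suffix):
        raise ValueError("expected a URL ending in '?wsdl'")
    base = wsdl_url[: -len(suffix)]
    return [f"{base}?xsd={n}" for n in range(1, count + 1)]


if __name__ == "__main__":
    wsdl = ("http://wwwdev.ebi.ac.uk/soaplab/typed/services/"
            "alignment_consensus.cons.sa?wsdl")
    for url in schema_urls(wsdl):
        print(url)
```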
SAWSDL annotation The proposed format of SAWSDL annotation includes the term namespace, unique identifier and a URI pointing to the term definition: <element name="elementName" sawsdl:modelReference="http://purl.org/edam/namespace/id"> Where ... * element is the XML element being annotated * elementName is the name of the XML element * namespace is the namespace of the EDAM term, e.g. "operation" * id is the unique identifier of the term, e.g. "0000295" The term name, if required, could be given as an XML comment after the annotated element: <element name="elementName" sawsdl:modelReference="http://purl.org/edam/namespace/id"> <!-- term_name --> This is not recommended, however, as term names are not guaranteed to remain constant. The value of the sawsdl:modelReference attribute is a URI pointing to the term definition. The proposal is to use PURLs (Persistent Uniform Resource Locators) which include the term namespace.
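The proposed attribute format can be produced programmatically. The sketch below builds the PURL from a namespace and ID and sets `sawsdl:modelReference` on an element using Python's standard `xml.etree.ElementTree`. The helper names are made up, and the seven-digit ID check is an assumption inferred from the example IDs in this deck (such as "0000295"), not a documented rule.

```python
import xml.etree.ElementTree as ET

SAWSDL_NS = "http://www.w3.org/ns/sawsdl"
PURL_BASE = "http://purl.org/edam"

# Serialise the attribute with the conventional "sawsdl" prefix.
ET.register_namespace("sawsdl", SAWSDL_NS)


def edam_purl(namespace, term_id):
    """Build the proposed PURL, e.g. http://purl.org/edam/operation/0000295."""
    # Assumption: IDs are seven digits, as in the examples on these slides.
    if not (term_id.isdigit() and len(term_id) == 7):
        raise ValueError(f"unexpected EDAM term id: {term_id!r}")
    return f"{PURL_BASE}/{namespace}/{term_id}"


def annotate(element, namespace, term_id):
    """Attach a sawsdl:modelReference pointing at the given EDAM term."""
    element.set(f"{{{SAWSDL_NS}}}modelReference", edam_purl(namespace, term_id))
    return element


if __name__ == "__main__":
    elem = ET.Element("element", {"name": "elementName"})
    annotate(elem, "operation", "0000295")
    print(ET.tostring(elem, encoding="unicode"))
```

Only the identifier is written into the document, in line with the advice above that term names may change between versions.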
EDAM term end-points When pasted into a browser, the PURLs: http://purl.org/edam/topic/0000182 http://purl.org/edam/operation/0000292 http://purl.org/edam/data/0000863 ... will (eventually) resolve to: http://wwwdev.ebi.ac.uk/Tools/dbfetch/dbfetch/edam/0000182 http://wwwdev.ebi.ac.uk/Tools/dbfetch/dbfetch/edam/0000292 http://wwwdev.ebi.ac.uk/Tools/dbfetch/dbfetch/edam/0000863 These are complete OBO term statements in plain text (OBO format). PURLs support text extensions allowing a format specifier to be added. For example these PURLs: http://purl.org/edam/topic/0000182?style=html http://purl.org/edam/operation/0000292?style=html http://purl.org/edam/data/0000863?style=html ... will resolve to OBO term statements in HTML such that terms referred to in the statements (via relations) will be clickable to allow navigation: http://wwwdev.ebi.ac.uk/Tools/dbfetch/dbfetch/edam/0000182?style=html http://wwwdev.ebi.ac.uk/Tools/dbfetch/dbfetch/edam/0000292?style=html http://wwwdev.ebi.ac.uk/Tools/dbfetch/dbfetch/edam/0000863?style=html
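The PURL-to-end-point correspondence shown on this slide is a plain string mapping: the namespace segment is dropped, the ID is kept, and any query string (such as `?style=html`) is carried over. A minimal sketch of that mapping follows. The function name is hypothetical, and it only computes the expected target URL; it does not contact purl.org or the dbfetch server, and the actual redirect is described in the deck as something that will "(eventually)" be in place.

```python
from urllib.parse import urlsplit

DBFETCH_BASE = "http://wwwdev.ebi.ac.uk/Tools/dbfetch/dbfetch/edam"


def purl_to_endpoint(purl):
    """Compute the dbfetch URL an EDAM PURL is expected to resolve to."""
    parts = urlsplit(purl)
    segments = parts.path.strip("/").split("/")
    if parts.netloc != "purl.org" or len(segments) != 3 or segments[0] != "edam":
        raise ValueError(f"not an EDAM PURL: {purl!r}")
    _, _namespace, term_id = segments  # the namespace is not part of the target
    target = f"{DBFETCH_BASE}/{term_id}"
    return f"{target}?{parts.query}" if parts.query else target


if __name__ == "__main__":
    print(purl_to_endpoint("http://purl.org/edam/topic/0000182"))
    print(purl_to_endpoint("http://purl.org/edam/data/0000863?style=html"))
```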
EDAM term end-points • The eventual final list of end-points will provide other formats/views: • Plain text in OBO format (default) • HTML • XML • JSON • The term in a web browser, e.g. NCBO Ontology Browser. • http://wwwdev.ebi.ac.uk/Tools/dbfetch/dbfetch/edam/0000182?style=html • http://wwwdev.ebi.ac.uk/Tools/dbfetch/dbfetch/edam/0000182?format=xml • http://wwwdev.ebi.ac.uk/Tools/dbfetch/dbfetch/edam/0000182?format=txt • http://wwwdev.ebi.ac.uk/Tools/dbfetch/dbfetch/edam/0000182?format=json • http://wwwdev.ebi.ac.uk/Tools/dbfetch/dbfetch/edam/0000182?format=browser (default) • For now, you can see this in action for this term: • http://purl.org/edam/entity/0000002 • http://purl.org/edam/entity/0000002?style=html
Parallel Developments • (and other applications) • These include: • BioXSD • EMBRACE Registry / BioCatalogue • Taverna • BioNEMUS • Ondex • ELIXIR