Working towards common naming conventions for use in CV and ontology engineering
Susanna-Assunta Sansone (EBI), on behalf of many other people acknowledged in the last slide
http://msi-ontology.sf.net/recommendations
PSI and MSI ontology WGs – Our scenario
• Proteomics and Metabolomics Standards Initiatives (PSI, MSI)
  • Large collaborative, multi-domain efforts including:
    • Database/software developers, vendors, manufacturers
    • Experimentalists (biological/biomedical applications)
  • Minimal requirements, XML exchange formats, ontology WGs
    - Experimental workflow, data produced and analysis:
      -> Design, sample characteristics, treatments, instruments, protocols
      -> Protein modification and interactions (PSI)
• We create CVs to augment the PSI and MSI XML formats
  • List of terms and definitions, organized as a taxonomy (OBO format)
    - PSI CVs are currently being used by EBI and other databases
• We build an ontology as part of OBI to minimize duplication
  • Share common terminology, where applicable, with other domains
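As an illustration of "terms and definitions organized as a taxonomy (OBO format)", here is a minimal Python sketch. The two stanzas, the `EX:` identifiers, the names and the definitions are hypothetical and are not taken from any PSI or MSI CV. The parser only handles simple `tag: value` lines and is not a full OBO parser.

```python
# Minimal sketch: read [Term] stanzas written in OBO flat-file style.
# All identifiers, names and definitions below are hypothetical examples.
OBO_TEXT = """\
[Term]
id: EX:0000001
name: instrument
def: "A device used to perform a measurement." [EX:curator]

[Term]
id: EX:0000002
name: mass spectrometer
def: "An instrument that measures mass-to-charge ratios of ions." [EX:curator]
is_a: EX:0000001 ! instrument
"""


def parse_terms(text):
    """Return a dict mapping term id -> dict of tag -> list of values."""
    terms = {}
    current = None  # tags of the stanza being read, or None outside a stanza
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas are collected here.
            current = {} if line == "[Term]" else None
            continue
        if not line or current is None:
            continue
        tag, _, value = line.partition(":")
        tag = tag.strip()
        value = value.split(" ! ")[0].strip()  # drop a trailing "! comment"
        current.setdefault(tag, []).append(value)
        if tag == "id":
            terms[value] = current
    return terms


def parents(terms, term_id):
    """Follow is_a links upward and return ancestor names, nearest first."""
    found = []
    for parent_id in terms[term_id].get("is_a", []):
        found.append(terms[parent_id]["name"][0])
        found.extend(parents(terms, parent_id))
    return found


terms = parse_terms(OBO_TEXT)
print(parents(terms, "EX:0000002"))
```

The `is_a` tag is what turns the flat list of terms into a taxonomy: each term points to its parent by identifier, and the name after `!` is only a human-readable comment.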
PSI and MSI ontology WGs – Our needs
• We use a modular engineering approach
  • To create orthogonal but integrable CVs
    - Ontology WGs are divided into subWGs, according to expertise
      -> Sample processing/separation (e.g. chromatography, gel)
      -> Instrument-specific (e.g. NMR, MS)
      -> Data analysis
      -> (PSI-MOD) Protein modification
      -> (PSI-MI) Protein interactions
• We need to use common naming conventions
  • To facilitate communication among PSI and MSI ontology subWGs
    - Heterogeneous backgrounds and no formal ontology training
  • To rely on such common conventions with the larger OBI group
• We need to observe common design procedures
  • To harmonize the appearance and design of the CV modules
What common conventions do we need?
• To talk about the representation (reference terminology)
  • Name the representational artefact* types, clarify differences
    - E.g. thesaurus, CV, ontology
  • Name specific representational units* within artefacts across representation languages (OBO, OWL) and semantics
    - E.g. classes vs. concepts or properties vs. relations
• To name and define what we represent (domain things)
  • In a common and consistent manner
  • E.g. ‘7_transmembrane_domain_receptor’ vs ‘G-Protein coupled receptor’ vs ‘GPC_receptors’ vs ‘GPCR_class’
  • E.g. ‘sample_temperature_in_autosampler’ vs ‘sample’, ‘temperature’ and ‘autosampler’ linked by relations
*Towards a Reference Terminology for Ontology Research and Development in the Biomedical Domain. Smith, Kusnierczyk, Schober and Ceusters. KR-MED 2006.
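The receptor name variants on this slide suggest how a convention could be checked mechanically. The Python sketch below is only an illustration: the three rules in `RULES` are hypothetical and are not quoted from the recommendations document, and a real convention would need to handle legitimate acronyms and proper names.

```python
import re

# Hypothetical rules, for illustration only; they are NOT the rules of the
# recommendations document. Each rule is (message, predicate on the name).
RULES = [
    ("uses underscores instead of spaces", lambda n: "_" in n),
    ("contains upper-case letters", lambda n: n != n.lower()),
    ("names the representational unit ('class')",
     lambda n: re.search(r"(^|[ _])class$", n.lower()) is not None),
]


def check_name(name):
    """Return the messages of all rules that the given term name breaks."""
    return [message for message, broken in RULES if broken(name)]


# The four variants quoted on the slide.
for name in [
    "7_transmembrane_domain_receptor",
    "G-Protein coupled receptor",
    "GPC_receptors",
    "GPCR_class",
]:
    print(name, "->", check_name(name))
```

Whatever the actual rules are, writing them as small predicates lets every subWG run the same check over its CV module.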
Why do these not exist? What is available?
• Representational artefacts built according to different:
  • Engineering methodologies
    - MethOntology, TOVE, ENTERPRISE
  • Representation languages and semantics
    - OBO, OWL and CLIPS-Frames
  • Engineering ‘schools’
    - GO, semantic web/DL, Protégé Frame, IFOMIS realism-based
• As diverse as these backgrounds are the naming schemes!
• Variety of ad hoc conventions out there, e.g.
  • BioPAX Manual, GO style guide, ISO guidelines
• Various references and material dispersed across web pages, e.g.
  • Law and order: Assessing and enforcing compliance with ontological modeling principles in the Foundational Model of Anatomy (FMA) - S Zhang, O Bodenreider, Computers in Biology and Medicine 36 (2006)
Finding adequate documentation is hard
• Implementation-specific, limited coverage or scope, e.g.
  • BioPAX Manual:
    - Naming conventions for classes, identifiers and instances are discussed at implementation level (Protégé/OWL)
      -> page 53, Technical Notes RDF:ID
    - Does not cover conventions for naming relations
  • GO style guide:
    - Has its own definition of namespace and its abbreviation, which differs from the one in OWL/semantic web
    - Refers to terms and not classes
    - Does not cover conventions for naming relations
• Visibility is also a limiting factor, e.g.
  • Information is dispersed or embedded into many documents
    - GO namespace, term names, and identifiers are explained in different documents
    - GO editor style and OBO-Edit web pages
• Acceptance is ‘limited’ to the target community
We have created our own documentation (!)
• Theory document: to name what we represent and the representation (work in progress)
  • “Working towards naming conventions for use in controlled vocabulary and ontology engineering”
    - Implementation- and format-independent document
    - Created for the MSI and PSI Ontology WGs; also targets the larger OBI group
    - A straw man proposal…
• Practice document: design principles (final review, soon used)
  • “Guidelines for the development of controlled vocabularies”
    - Implementation- and format-specific (OBO) document
    - Internal policy document for MSI and PSI ontology WGs
      -> Uses key words “MUST,” “MUST NOT,” “REQUIRED,” “SHALL,” “SHALL NOT,” “SHOULD,” “SHOULD NOT,” “RECOMMENDED,” “MAY,” and “OPTIONAL” to be interpreted as described in RFC 2119.
S. Bradner, Key words for use in RFCs to Indicate Requirement Levels, Internet Engineering Task Force, RFC 2119, http://www.ietf.org/rfc/rfc2119.txt, March 1997
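Because the practice document relies on the RFC 2119 key words, its requirement levels can be tallied automatically. The Python sketch below shows one way to do that. The `GUIDELINE` text is hypothetical and is not quoted from the “Guidelines for the development of controlled vocabularies”. Only the key word list comes from RFC 2119.

```python
import re
from collections import Counter

# Key words as listed in RFC 2119. Two-word phrases come first so that
# "MUST NOT" is matched as a whole rather than as "MUST".
KEYWORDS = [
    "MUST NOT", "SHALL NOT", "SHOULD NOT",
    "MUST", "REQUIRED", "SHALL", "SHOULD", "RECOMMENDED", "MAY", "OPTIONAL",
]
PATTERN = re.compile(r"\b(" + "|".join(KEYWORDS) + r")\b")

# Hypothetical guideline text, for illustration only.
GUIDELINE = """\
Term names MUST be unique within a CV.
Term names SHOULD NOT contain underscores.
Each term SHOULD have a definition.
Synonyms MAY be recorded.
"""


def requirement_levels(text):
    """Count the upper-case RFC 2119 key words that occur in the text."""
    return Counter(PATTERN.findall(text))


print(requirement_levels(GUIDELINE))
```

The match is case-sensitive on purpose: only the capitalised key words are counted, so an ordinary lower-case "may" or "should" in running text is ignored.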
…to elaborate this further…
• More widely accepted common naming conventions could
  • Facilitate access to ontologies through meta-tools
    - Reduce the diversity with which meta-tools have to contend
      -> E.g. OLS, NCBO BioPortal, PROMPT (text mining tools?)
  • Assist in the integration
    - Comparison, alignment and mapping
  • Certainly, serve as guidelines for new communities
We have started seeing the benefits
• Appearance of what we represent has been normalized
  • But it is not just a matter of aesthetics
• Communication has improved
  • Between developers from different domains and backgrounds
  • In geographically distributed, collaborative efforts
Acknowledgements and Resources
• Authors and those contributing to the discussion
  • Daniel Schober*, Waclaw Kusnierczyk, Barry Smith, Chris Mungall, Philippe Rocca-Serra, Suzi Lewis, Robert Stevens, Dietrich Rebholz, Frank Gibson, Luisa Montecchi-Palazzi, Jane Lomax
  • Members of MSI, PSI and OBI ontology working groups
    • http://msi-ontology.sf.net
    • http://psidev.sf.net
    • http://obi.sf.net
• Funding sources
  • *UK BBSRC e-Science, EU NuGO and CarcinoGENOMICS grants
  • *Semantic Mining NoE (visits to IFOMIS and Manchester)
http://msi-ontology.sf.net/recommendations