SeaDataNet Ontology Use Case. Coastal Atlas Interoperability Workshop, Corvallis, July 17-19 2007. Roy Lowry British Oceanographic Data Centre. (+ Lessons Learned). Summary. What is SeaDataNet? Some SeaDataNet semantic issues What has SeaDataNet done? What is SeaDataNet going to do?
SeaDataNet Ontology Use Case (+ Lessons Learned)
Coastal Atlas Interoperability Workshop, Corvallis, July 17-19 2007
Roy Lowry, British Oceanographic Data Centre
Summary
• What is SeaDataNet?
• Some SeaDataNet semantic issues
• What has SeaDataNet done?
• What is SeaDataNet going to do?
• Is SeaDataNet relevant to CAI?
What is SeaDataNet?
• SeaDataNet in a nutshell
  • Combine over 40 oceanographic data centres across Europe into a single interoperable data system
  • Approach is to adopt established standards and technologies wherever possible
  • Two phases:
    • Phase one brings 12 centres together with centralised metadata and distributed data as files. Due to be fully operational in autumn 2008 (beta next February)
    • Phase two introduces data virtualisation, aggregation, cutting and 30 more centres. Due in 2010
  • Project is well on its way up the interoperability operational implementation curve
SeaDataNet Semantic Issues
• The major problem facing the project is heterogeneous legacy content
• SeaDataNet inherited 3 independently-developed metadatabases
  • Each is heavily populated (3000-30000 records)
  • Each had its own independently developed controlled vocabularies
• These vocabularies
  • Covered overlapping domains
  • Said similar things in different ways
  • Provided a shining example of how NOT to manage vocabularies
Brief Diversion
• Vocabularies can have two types of governance
• Content governance
  • Mechanism for making decisions on vocabulary population
  • Expected deliverables include:
    • Vocabulary standards and internal consistency
    • Change on a timescale matching the needs of the user community
    • Terms with definitions!!!
• Technical governance
  • Vocabulary storage, maintenance and serving
  • Expected deliverables include:
    • Convenient access to up to date vocabularies
    • Clear, rigorous vocabulary versioning
    • Version history through audit trails
    • Maintenance that doesn’t break user systems
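The technical governance deliverables above (versioning, audit trails, non-breaking maintenance) can be illustrated with a minimal sketch. This is not how any SeaDataNet system is implemented; the class name, field names and the rule that every change bumps the version are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class VersionedVocabulary:
    """Toy vocabulary that logs every change instead of overwriting (hypothetical)."""
    name: str
    version: int = 0
    terms: dict = field(default_factory=dict)        # key -> current definition
    audit_trail: list = field(default_factory=list)  # append-only change log

    def set_term(self, key: str, definition: str, editor: str) -> None:
        # Content rule from the slide: terms must come with definitions.
        if not definition.strip():
            raise ValueError(f"term {key!r} needs a definition")
        previous = self.terms.get(key)
        if previous == definition:
            return  # nothing changed, so no new version
        self.version += 1
        self.audit_trail.append({
            "version": self.version,
            "key": key,
            "old": previous,
            "new": definition,
            "editor": editor,
            "when": datetime.now(timezone.utc).isoformat(),
        })
        self.terms[key] = definition

    def as_of(self, version: int) -> dict:
        """Rebuild the term list as it stood at a given version."""
        snapshot = {}
        for change in self.audit_trail:
            if change["version"] > version:
                break
            snapshot[change["key"]] = change["new"]
        return snapshot
```

Because old states can be rebuilt from the log, a client pinned to an earlier version keeps working after the vocabulary is updated.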
SeaDataNet Semantic Issues
• Vocabulary content governance
  • Done by individuals who were often inadequately qualified to do the job
  • Metadata entry form with an ‘Add to Vocabulary’ button used by students
• Vocabulary technical governance
  • Scattered files on servers or inaccessible database tables
  • Multiple data models (e.g. some with abbreviations, some without)
  • No versioning
  • Vocabularies updated by destructive overwrites
• Harmonisation required for related vocabularies
  • Within centralised metadata
  • Between partner local systems and centralised metadata
What has SeaDataNet Done?
• Established content governance
  • Within SeaDataNet (TTT e-mail list)
  • Further afield (SeaVoX e-mail list)
• Established technical governance
  • Adopted the NERC DataGrid Vocabulary Server
    • Heavily defended Oracle back end
    • Automated version and audit trail management
    • Web Service API front end plus clients e.g. http://vocab.ndg.nerc.ac.uk/client/vocabServer.jsp
    • Currently serving out 75 lists
• Established a Mapping Infrastructure
  • List entries connected by SKOS RDF triples
  • Operational mappings between parameter vocabularies (GCMD science keywords, CF Standard Names)
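As a rough illustration of list entries connected by SKOS RDF triples, the sketch below stores mappings as plain (subject, predicate, object) tuples. The predicate names are the standard SKOS mapping properties; the term identifiers and the particular mappings are illustrative assumptions, not entries taken from the Vocabulary Server.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"

# (subject, predicate, object); term identifiers are illustrative only.
MAPPINGS = [
    ("cf:sea_water_temperature", SKOS + "broadMatch", "gcmd:WATER_TEMPERATURE"),
    ("cf:sea_water_salinity", SKOS + "broadMatch", "gcmd:SALINITY"),
    ("local:TEMP", SKOS + "exactMatch", "cf:sea_water_temperature"),
]

# Reading a triple backwards swaps broader and narrower; exactMatch is symmetric.
INVERSE = {
    SKOS + "broadMatch": SKOS + "narrowMatch",
    SKOS + "narrowMatch": SKOS + "broadMatch",
    SKOS + "exactMatch": SKOS + "exactMatch",
}


def mapped_terms(term, triples):
    """Return (predicate, other_term) pairs for a term, in either direction."""
    found = []
    for s, p, o in triples:
        if s == term:
            found.append((p, o))
        elif o == term and p in INVERSE:
            found.append((INVERSE[p], s))
    return found
```

Calling `mapped_terms("cf:sea_water_temperature", MAPPINGS)` returns both the broader keyword it maps to and the local term that maps onto it, which is the kind of lookup a crosswalk between vocabularies needs.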
What is SeaDataNet Going To Do?
• Harmonise centralised metadata vocabularies or map if too hard
• Map centralised vocabularies to partner system vocabularies
• Build metadata crosswalks and generators (e.g. from CF) that include semantics (Use case 1)
• Implement ‘Smart Discovery’ for legacy plaintext, e.g. search for pigment, find chlorophyll (Use case 2)
• Establish URLs to represent vocabularies and individual entries delivering XML – probably SKOS – documents
• Extend mapping efforts to other areas such as ‘devices’
• Release a much improved Vocabulary Server API (mid-August)
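The ‘Smart Discovery’ use case (search for pigment, find chlorophyll) can be sketched as query expansion over broader/narrower links. This is one possible approach under stated assumptions, not the project's design: the `NARROWER` table, the function names and the substring matching are all hypothetical.

```python
# Hypothetical broader -> narrower links; a real system would load these
# from the vocabulary server rather than hard-code them.
NARROWER = {
    "pigment": ["chlorophyll", "carotenoid"],
    "chlorophyll": ["chlorophyll-a"],
}


def expand(term, narrower):
    """Return the term plus every narrower term reachable from it."""
    seen, stack = [], [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guard against cycles in the vocabulary
        seen.append(current)
        stack.extend(narrower.get(current, []))
    return seen


def smart_search(query, records, narrower):
    """Match records whose free text mentions the query or a narrower term."""
    wanted = [t.lower() for t in expand(query.lower(), narrower)]
    return [r for r in records if any(t in r.lower() for t in wanted)]
```

For example, `smart_search("pigment", ["Chlorophyll concentration from bottle samples", "Wind speed at 10 m"], NARROWER)` keeps only the first record, even though the word "pigment" never appears in it.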
Is SeaDataNet Relevant to CAI?
• This workshop is about building a coastal atlas ontology that brings together semantic resources that say similar things in different ways
• The vocabulary entry semantic content may be different from oceanographic parameters, but the problem is essentially the same
• If it works for SeaDataNet it will probably work for the CAI community
• More important – if it didn’t work for SeaDataNet then it probably won’t work for CAI
Is SeaDataNet Relevant to CAI?
• What has worked for SeaDataNet:
  • The NERC DataGrid Vocabulary Server
  • Content governance through a MODERATED e-mail list (also works pretty well for CF Standard Names)
  • Representing vocabulary terms by URNs in metadata documents
• What I believe will work in the next 12 months:
  • Semantic interoperability through mappings
  • The conceptual framework of RDF in general and SKOS in particular
  • 21st Century tooling
Is SeaDataNet Relevant to CAI?
• What hasn’t worked for SeaDataNet:
  • Weak content governance
    • Examples
      • Terms without definitions
      • Vocabularies without strict entity definitions populated by mixed entities e.g.
        • helicopter = class
        • RRS Discovery = instance
      • Vocabularies without managed deprecation
  • Poor technical governance
    • Example
      • A vocabulary served by:
        • Dynamic web page from database
        • Static HTML page
        • ASCII file as e-mail attachment
      • Each having a different number of entries….
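Two of the failures listed above, terms without definitions and copies of one vocabulary that disagree on their entries, are easy to detect automatically. The following is a minimal sketch of such checks; the function names and input shapes are assumptions made for illustration.

```python
def missing_definitions(vocab):
    """Return keys whose definition is absent or blank."""
    return sorted(k for k, d in vocab.items() if not (d or "").strip())


def compare_copies(copies):
    """Report, per copy, which keys it lacks relative to the union of all copies."""
    all_keys = set()
    for keys in copies.values():
        all_keys |= set(keys)
    return {name: sorted(all_keys - set(keys)) for name, keys in copies.items()}
```

Given `{"database": ["A", "B", "C"], "html": ["A", "B"], "ascii": ["A", "C"]}`, `compare_copies` reports that the HTML page lacks `C` and the ASCII file lacks `B`, which is the "different number of entries" symptom made explicit.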
That’s All Folks!
Thank you for your attention
Any questions?
Morals
• Always provide definitions for your terms
• If you are going to use vocabularies to build an ontology, make sure that they are properly governed