Possible solutions
Menzo.Windhouwer@mpi.nl

Which data category to use?
• If it fits your needs, try to pick one that is bound for standardization:
  • Metadata: owned by Peter Wittenburg or Daan Broeder
  • Morphosyntax: owned by Gil Francopoulo
  • Terminology: owned by Sue Ellen Wright
• If they are close to your needs, you can still contact these owners now to discuss modifications
• Once standardized, you’ll have to issue a Change Request
• Or pick one published by a group (maybe you can join the group or form a group, so you collectively maintain the DC(S))
• Create your own
  • Add a narrower/broader relationship to the RR

CLARIN-NL - Call 1 - ISOcat status

Reminder: Data category types
• complex:
  • open: writtenForm (string)
  • constrained: email (string; Constraint: .+@.+)
  • closed: grammaticalGender (string; values: neuter, feminine, masculine)
• simple: neuter, feminine, masculine

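The three kinds of complex data category differ only in how they restrict values, which a few lines of Python can make concrete. This is a minimal sketch and not ISOcat's actual data model: the `ComplexDC` class, its field names, and the `accepts` method are hypothetical names chosen for illustration. Only the three example categories and the `.+@.+` constraint come from the slide.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class ComplexDC:
    """Hypothetical model of a complex data category (not the ISOcat schema)."""
    name: str
    kind: str                        # "open", "constrained" or "closed"
    pattern: Optional[str] = None    # regular expression, for constrained DCs
    values: frozenset = frozenset()  # names of simple DCs, for closed DCs

    def accepts(self, value: str) -> bool:
        if self.kind == "open":
            # Any string of the declared data type is acceptable.
            return True
        if self.kind == "constrained":
            # The whole value must satisfy the constraint.
            return re.fullmatch(self.pattern, value) is not None
        if self.kind == "closed":
            # Only the enumerated simple DCs are acceptable.
            return value in self.values
        raise ValueError(f"unknown kind: {self.kind}")


written_form = ComplexDC("writtenForm", "open")
email = ComplexDC("email", "constrained", pattern=r".+@.+")
grammatical_gender = ComplexDC(
    "grammaticalGender", "closed",
    values=frozenset({"neuter", "feminine", "masculine"}),
)
```

With these definitions, `written_form.accepts("walk")` and `grammatical_gender.accepts("neuter")` hold, while `email.accepts("nobody")` does not, because the value lacks an `@`.
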
Which data category types?
• TDGs give DC types based on some reference model:
  • Metadata: CMDI
  • Morphosyntax: LMF
  • Terminology: TBX
• POS field (closed DC) of the lexical entry “walk” gets the value ‘verb’ (simple DC)
• If the DC type doesn’t fit your needs:
  • Verb (open DC) feature of a feature structure gets the value “walk”
  • Unfortunately, the DCR data model does not yet have facilities to share a semantic core between various types
  • Create your own and add a sameAs relationship to the RR

Data category value domains
• Is the value you need not yet known?
  • Contact the owner to add your simple DC
  • Create your own closed DC, reuse existing simple DCs, add your own simple DC(s)
  • Add a sameAs relationship to the RR
• It’s easy to create a subset of a value domain, but not to create a superset …

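One way to read the subset/superset remark is sketched below: a subset only reuses simple DCs that already exist, whereas a superset needs a closed DC of your own plus at least one simple DC you create yourself. The function `build_value_domain` and the extra value `common` are hypothetical and serve only as illustration. The three gender values are taken from the earlier reminder slide.

```python
# Value domain of an existing closed DC (owned by someone else).
GRAMMATICAL_GENDER = frozenset({"neuter", "feminine", "masculine"})


def build_value_domain(existing, reuse, own=()):
    """Return the value domain of a new closed DC that you own.

    `reuse` names simple DCs taken from the existing domain;
    `own` names simple DCs you have to create yourself.
    """
    existing = frozenset(existing)
    reuse = frozenset(reuse)
    unknown = reuse - existing
    if unknown:
        raise ValueError(f"not in the existing value domain: {sorted(unknown)}")
    return reuse | frozenset(own)


# Subset: nothing new has to be created.
subset = build_value_domain(GRAMMATICAL_GENDER, {"feminine", "masculine"})

# Superset: requires a new simple DC ("common" is a made-up example).
superset = build_value_domain(GRAMMATICAL_GENDER, GRAMMATICAL_GENDER, own={"common"})
```

In this reading, the new closed DC would then be tied back to the original one with a relationship in the RR, as the slide suggests.
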
Granularity issues
• How much (application) context to take into account?
• Generic data categories (context insensitive):
  • Pro: reusable; the data model of your application provides the proper context, i.e., it determines that only a subset of all possible instances of the data category is interesting for your application domain; this context might be further described in the future by a new type of data category: container data categories
  • Con: if too generic, the data category is no more than a data type, e.g., “this is a date”
• Specific data categories (context incorporated in the specification):
  • Pro: (semantic) search can be based on a relationship with a specific data category without the need to take the context into account (inference about the context)
  • Con: if too specific, the data category may not be reusable in any other resource and has almost no use for semantic interoperability (although this might be remedied by relationships in the RR with more generic data categories)
• In search of the balance:
  • Go at least one level above the data type, e.g., /untilDate/
  • If your data category is one of a kind, make it specific; maybe in the future others will ask you to generalize it

Granularity
• A resource listing actors in a play
• Which data category to associate with the name of an actor?
  • String (data type)
  • Name (data field)
  • Name of a person
  • Name of an actor
  • Name of a stage actor
  • Name of a stage actor in my type of resources
  • Name of a stage actor in my resource
generic → specific

Composite values
• Some values are actually composites, e.g., “first plural exclusive vernacular”
• If the composite is the lowest level in your data model, you can’t link the parts to data categories, as:
  • [a data category is the] result of the specification of a given data field [or its value] (ISO 12620:2009)
• However, you can link the composite value in the RR to multiple data categories or concepts, maybe using partOf or subClassOf relationships

Relation Registry
• The Relation Registry is basically a triple store:
  • Subject: resource 1
  • Predicate: relationship
  • Object: resource 2
• You can use Turtle (or N3 or RDF/XML or …) to specify these triples:

    @prefix rel: <http://www.isocat.org/rr/relations#> .
    @prefix isocat: <http://www.isocat.org/datcat/> .

    # /first plural exclusive vernacular/ is-a /vernacular/
    isocat:DC-1234 rel:subClassOf isocat:DC-4 .
    # /first person/ part-of /first plural exclusive vernacular/
    isocat:DC-1 rel:partOf isocat:DC-1234 .
    # /plural/ part-of /first plural exclusive vernacular/
    isocat:DC-2 rel:partOf isocat:DC-1234 .
    # /exclusive/ part-of /first plural exclusive vernacular/
    isocat:DC-3 rel:partOf isocat:DC-1234 .

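To make the "basically a triple store" idea tangible, the same four statements can be held as plain (subject, predicate, object) tuples and queried with a list comprehension. This is a sketch in standard-library Python, not the Relation Registry's implementation or API. The helper names `subjects` and `to_ntriples` are invented here, and the DC numbers are the slide's own example identifiers.

```python
REL = "http://www.isocat.org/rr/relations#"
ISOCAT = "http://www.isocat.org/datcat/"

# The slide's example: DC-1234 is the composite value, DC-1..DC-4 its relatives.
triples = [
    (ISOCAT + "DC-1234", REL + "subClassOf", ISOCAT + "DC-4"),  # is-a /vernacular/
    (ISOCAT + "DC-1", REL + "partOf", ISOCAT + "DC-1234"),      # /first person/
    (ISOCAT + "DC-2", REL + "partOf", ISOCAT + "DC-1234"),      # /plural/
    (ISOCAT + "DC-3", REL + "partOf", ISOCAT + "DC-1234"),      # /exclusive/
]


def subjects(store, predicate, obj):
    """All subjects that stand in `predicate` to `obj`, in insertion order."""
    return [s for s, p, o in store if p == predicate and o == obj]


def to_ntriples(store):
    """Serialize with full URIs, one statement per line (N-Triples style)."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in store)


# Which data categories are parts of the composite value?
parts = subjects(triples, REL + "partOf", ISOCAT + "DC-1234")
```

Here `parts` holds the URIs of DC-1, DC-2 and DC-3. Because subjects and objects are just URIs, the same structure would also hold relationships to resources outside ISOcat.
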
Relation types
• rel:related
• rel:sameAs
• (rel:distinct)
• rel:subClassOf/rel:superClassOf
• rel:narrower/rel:broader
• rel:partOf, rel:directPartOf, rel:indirectPartOf
• …
Inspired by OWL and SKOS, but maybe other relation types are needed or other sets should be used?

Relation type taxonomy
• rel:related
• rel:sameAs
• rel:narrower
• rel:superClassOf
• rel:broader
• rel:subClassOf
• rel:partOf
• rel:directPartOf
• rel:indirectPartOf
• …

Relationships to other concepts
• The Relation Registry accepts all kinds of URIs
• So relationships can go outside of ISOcat:
  • Dublin Core elements
  • GOLD concepts
  • …
  • or your own public OWL ontology or SKOS taxonomy or …