CLARIN? ISOCAT! Ineke Schuurman ISOcat content coördinator CLARIN-NL Amsterdam 30-08-2012. 1. ISOcat general use in CLARIN An example Your task wrt ISOcat. Overview. 2. ISOcat: Data Category Registry defining widely accepted data categories (DCs) http://www.isocat.org
CLARIN? ISOCAT!
Ineke Schuurman, ISOcat content coördinator
CLARIN-NL, Amsterdam, 30-08-2012
Overview
ISOcat: general; use in CLARIN
An example
Your task with respect to ISOcat
ISOcat
ISOcat: a Data Category Registry defining widely accepted data categories (DCs), http://www.isocat.org
A registry that stores DCs for language resources and their metadata, together with properties of the DCs (definition, administration, examples, etc.)
Use in CLARIN
What is meant by DC X in resource A? There may be several (valid) definitions!
Does X have the same meaning in resources A and B?
In CLARIN this is needed first and foremost for tools (so that they 'know' what the elements in resources mean)
  Especially important for search in data and metadata
  But also for other tools that apply to data (cf. the last talk, on TTNWW)
Human use is only secondary, but humans must after all fill the ISOcat registry and make the right mappings
An example with 'ev'
Have a look at these two tags:
  WW(pv,tgw,ev)
  N(soort,ev,dim,onz,stan)
All parts of such tags, like ev, are to be included in ISOcat. The full tags are to be included as well.
ev, enkelvoud, sg, sing, singular, singulier, …
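To make "all parts of such tags, plus the full tags" concrete, here is a minimal Python sketch. It assumes every tag has the shape HEAD(feature,feature,...), as the two tags on the slide do; the function names `split_tag` and `categories_to_register` are hypothetical and not part of any CLARIN or ISOcat tool.

```python
from __future__ import annotations

import re

# Assumption: tags look like HEAD(feature,feature,...), as on the slide.
TAG_PATTERN = re.compile(r"^(?P<head>[^()]+)\((?P<features>[^()]*)\)$")


def split_tag(tag: str) -> tuple[str, list[str]]:
    """Split a tag such as 'WW(pv,tgw,ev)' into its head and feature list."""
    match = TAG_PATTERN.match(tag.strip())
    if match is None:
        raise ValueError(f"unexpected tag shape: {tag!r}")
    features = [f.strip() for f in match.group("features").split(",") if f.strip()]
    return match.group("head"), features


def categories_to_register(tags: list[str]) -> set[str]:
    """Collect each full tag together with its head and every feature."""
    needed: set[str] = set()
    for tag in tags:
        head, features = split_tag(tag)
        needed.add(tag)       # the full tag is a data category too
        needed.add(head)      # e.g. 'WW' or 'N'
        needed.update(features)
    return needed


if __name__ == "__main__":
    for item in sorted(categories_to_register(["WW(pv,tgw,ev)", "N(soort,ev,dim,onz,stan)"])):
        print(item)
```

Note that 'ev' occurs in both tags but ends up in the set only once, which matches the idea that one part corresponds to one data category.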
singular
All these representations can be mapped onto one DC:
  singular (DC-4918): word form indicating that one entity is involved
In full: http://www.isocat.org/datcat/DC-4918
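The mapping of several surface labels onto one DC can be sketched as a simple lookup table in Python. The labels and the DC-4918 identifier come from the slide; the names `LABEL_TO_PID`, `resolve` and `same_category` are hypothetical, and the case-insensitive matching is an assumption made for this illustration only.

```python
from __future__ import annotations

# Persistent identifier of the DC 'singular', as given on the slide.
SINGULAR_PID = "http://www.isocat.org/datcat/DC-4918"

# Every representation listed on the slide points to the same DC.
LABEL_TO_PID = {
    label: SINGULAR_PID
    for label in ("ev", "enkelvoud", "sg", "sing", "singular", "singulier")
}


def resolve(label: str) -> str | None:
    """Return the PID for a label, or None if the label is not mapped."""
    # Assumption: labels are compared case-insensitively.
    return LABEL_TO_PID.get(label.strip().lower())


def same_category(label_a: str, label_b: str) -> bool:
    """True only if both labels are mapped and share one DC."""
    pid_a, pid_b = resolve(label_a), resolve(label_b)
    return pid_a is not None and pid_a == pid_b


if __name__ == "__main__":
    print(resolve("ev"))
    print(same_category("enkelvoud", "sg"))
```

A tool searching resource A for 'ev' and resource B for 'sg' could use such a table to decide that both queries target the same category.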
Other cats
ISOcat: defining DCs (ongoing)
RELcat: relating DCs (started)
SCHEMAcat: a registry of schemas, a schema being a description of the structure of your data format (just started)
Call 4 projects
Each Call 4 project must:
  check, for each DC used in its resource or its metadata, whether a corresponding DC exists in ISOcat
  if not, extend ISOcat with such a DC, with all its properties (definitions, examples, etc.)
  create a schema with a mapping that maps each DC used in the resources and metadata to an ISOcat DC
All this will be explained in tutorials
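The check described above (which of my DCs already have an ISOcat counterpart, and which still need to be added) can be sketched in Python as follows. This is only an illustration under stated assumptions: the mapping is a hand-made toy table containing just the one identifier shown in this talk, the function name `check_coverage` is hypothetical, and the "missing" entries say nothing about the actual contents of ISOcat.

```python
from __future__ import annotations


def check_coverage(
    used: list[str], mapping: dict[str, str]
) -> tuple[dict[str, str], list[str]]:
    """Split the DCs used in a resource into mapped ones and missing ones."""
    mapped = {dc: mapping[dc] for dc in used if mapping.get(dc)}
    missing = sorted(dc for dc in used if not mapping.get(dc))
    return mapped, missing


if __name__ == "__main__":
    # Toy mapping (assumption): only the DC from the 'ev' example is filled in.
    toy_mapping = {"ev": "http://www.isocat.org/datcat/DC-4918"}
    mapped, missing = check_coverage(["WW", "pv", "tgw", "ev"], toy_mapping)
    print("mapped:", mapped)
    print("still to look up or add:", missing)
```

The `missing` list is the to-do list for the project: each entry must either be matched to an existing DC or added to the registry with its definition and examples.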
Call 4 projects
Do NOT underestimate this ISOcat task!
Good news: DCs used in some common formats are already included in ISOcat:
  CGN / D-Coi tagset
  TEI header elements
  Many DCs concerning metadata
Contact a CLARIN centre ASAP to help you with this, or contact the helpdesk (helpdesk@clarin.nl)
Thank you for your attention. Any questions?
CLARIN-NL
XML-format

CGN (CGN-format):
<pw ref="fn000248.20.4" w="is" pos="WW(pv,tgw,ev)" lem="zijn" … pq="man" />

VU-DNC (FoLiA-format):
<w xml:id="BAObi1.s.5.w.18">
  <t>is</t>
  <lemma class="zijn"/>
  <pos class="WW(pv,tgw,ev)"> … </pos>
  <t class="ocroutput">is</t>
</w>
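The two formats store the same word, lemma and tag in different places (attributes in CGN, child elements in FoLiA). The Python sketch below reads both with the standard library. It rests on several assumptions: the elided parts ("…") are simply left out of the samples, the tag is written as WW(pv,tgw,ev) in both, the FoLiA sample has no XML namespace (real FoLiA files declare one), and the function names `cgn_token` and `folia_token` are hypothetical.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Simplified samples based on the slide; elided parts are omitted (assumption).
CGN_SAMPLE = '<pw ref="fn000248.20.4" w="is" pos="WW(pv,tgw,ev)" lem="zijn" pq="man" />'
FOLIA_SAMPLE = """<w xml:id="BAObi1.s.5.w.18">
  <t>is</t>
  <lemma class="zijn"/>
  <pos class="WW(pv,tgw,ev)"/>
  <t class="ocroutput">is</t>
</w>"""


def cgn_token(xml_text: str) -> dict[str, str | None]:
    """Read word, lemma and tag from the attributes of a CGN <pw> element."""
    element = ET.fromstring(xml_text)
    return {
        "word": element.get("w"),
        "lemma": element.get("lem"),
        "pos": element.get("pos"),
    }


def folia_token(xml_text: str) -> dict[str, str | None]:
    """Read word, lemma and tag from the children of a FoLiA-style <w> element."""
    element = ET.fromstring(xml_text)
    lemma = element.find("lemma")
    pos = element.find("pos")
    return {
        "word": element.findtext("t"),  # text of the first <t> child
        "lemma": lemma.get("class") if lemma is not None else None,
        "pos": pos.get("class") if pos is not None else None,
    }


if __name__ == "__main__":
    print(cgn_token(CGN_SAMPLE))
    print(folia_token(FOLIA_SAMPLE))
```

Once both readers return the same structure, the tag can be mapped to ISOcat DCs independently of the format it came from.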