This workshop explores the issues surrounding the adoption of existing data categories in ISOcat and CLARIN standards, as well as how to deal with larger amounts of data. Topics include when to adopt an existing data category, the use of flagged data categories, the relationship between data categories and profiles, and considerations for including details in ISOcat.
CLARIN-NL ISOcat workshop 2011, part 2
Ineke Schuurman
Menzo Windhouwer
Part A
• Issues brought up by participants
• When (not) to adopt an existing DC
• What about (CLARIN) standards
• What about ‘flagged’ DCs
• Relation DCS – profile
• What should be included in ISOcat (level of detail, abbreviations, …)
• What about TEI, metadata, webservice?
• How to deal with larger amounts of data
Part B
• ISOcat and CLARIN: Do’s and don’ts (version 0.1)
• Introduction and discussion
When (not) to adopt an existing DC
• It should match the way you use a specific notion in your annotation scheme, application, …
• It should come with the same profile
• It should handle the same phenomenon: SpeakerID ≠ SingerID
Speaker vs Singer
String → Name → Person → Singer → Opera → Opera singer → Tenor → Tenor in La Bohème
• The first is too generic, the last too specific
• The others are candidates
• Note that SingerID and SpeakerID are siblings, whereas SingerID is a subclass of both Singer and ID (RELcat!)
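The selection rule on this slide can be sketched in a few lines of Python. This is only an illustration: the list copies the chain from the slide, and the helper name `candidates` is hypothetical, not part of ISOcat or RELcat.

```python
# Specificity chain from the slide, ordered from most generic to most specific.
chain = [
    "String", "Name", "Person", "Singer",
    "Opera", "Opera singer", "Tenor", "Tenor in La Bohème",
]

def candidates(chain):
    """Drop the first (too generic) and the last (too specific) notion."""
    return chain[1:-1]

print(candidates(chain))
```

Which of the remaining notions you actually adopt still depends on how your own scheme uses the notion.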
Standards
• Within ISOcat there are currently few or no standards
• Therefore CLARIN NL and VL will set up their own set of ‘standardized DCs’; Ineke will be in charge (she will consult with others)
Flagged DCs
• Never link with ‘deprecated’ DCs! (In case of doubt, consult Ineke or Menzo)
• In other cases the flags show whether the DC specification is correct from a technical point of view
• Note that only DCs with a green marking qualify for standardization
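As a minimal sketch, the two rules on this slide could be written as checks on a DC's flag. The flag values `"deprecated"` and `"green"` follow the slide's wording. Representing a flag as a plain string, and both function names, are assumptions made for illustration and do not reflect the ISOcat interface.

```python
def may_link(flag):
    """Rule from the slide: never link with a deprecated DC."""
    return flag != "deprecated"

def qualifies_for_standardization(flag):
    """Rule from the slide: only DCs with a green marking qualify."""
    return flag == "green"

# Any other flag value (the placeholder "other" here is hypothetical)
# may be linked to but does not qualify for standardization.
for flag in ("deprecated", "green", "other"):
    print(flag, may_link(flag), qualifies_for_standardization(flag))
```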
DC/DCS and profile
• Profiles are not added automatically; a DCS may contain elements with various profiles
• If the profile you need is not yet available, contact Menzo and Ineke
What to include?
• Cf. the slide on SingerID/SpeakerID
• In general: all linguistically meaningful notions mentioned in your schema, manual, definition (cf. Part B)
• Abbreviations (PST for /past tense/) are to be given as Data Element Name
TEI, metadata, webservice
• TEI: likely to be taken care of at a ‘higher level’; if not, YOU are to insert the TEI definitions you use
• Metadata: new in CMDI? In that case a definition in ISOcat is to be provided as well
• Webservice: to be taken care of in CMDI
Larger amounts?
• In such a case, contact Menzo Windhouwer (menzo.windhouwer@mpi.nl)
Part B: do’s & don’ts
Do’s:
• Create a DCS for your scheme (name project, annotation scheme, …)
• Provide a clear definition (short, to the point) for your scheme, application, …
• Take care not to leave concepts used in your definition undefined or vague
• Use appropriate vocabulary (per profile)
• Check ‘adopted’ DCs regularly until standardization!
Do’s (continued)
When creating a DC, fill out:
• Justification: used in XYZ, part of tagset N
• Language section
  • Always an English language section
  • Strong recommendation: sections for the object language(s) and for the working language of the manual
  • Sections in the various languages should match (more or less be translations of each other)
Do’s (continued)
When creating a DC, fill out:
• Example section
• Note that *negative* examples may be very helpful! (jongens, mannen, not: gelovigen (is a form of ADJ))
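The fill-out rules above can be read as a checklist. The sketch below checks a draft DC against them. The draft is modelled as a plain Python dict whose keys (`justification`, `language_sections`, `example`) are hypothetical and chosen for illustration only; this is not the ISOcat data model.

```python
# Parts the do's ask you to fill out for every DC (hypothetical key names).
REQUIRED_FIELDS = ("justification", "example")

def missing_parts(dc):
    """List the parts the do's ask for that are absent or empty in a draft DC."""
    missing = [field for field in REQUIRED_FIELDS if not dc.get(field)]
    # An English language section is always required.
    if "en" not in dc.get("language_sections", {}):
        missing.append("language section: en")
    return missing

# A draft with a justification and a Dutch section,
# but no example and no English section.
draft = {
    "justification": "used in XYZ, part of tagset N",
    "language_sections": {"nl": "..."},
    "example": "",
}
print(missing_parts(draft))
```

A check like this only covers presence. Whether the sections in different languages really match each other still needs a human reader.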
Example sections
Suppose you want to illustrate a German phenomenon:
• Example section in EN language section
  • German example with translation in English
• Example section in NL language section
  • German example with translation in Dutch
• Example section in EN linguistic section
  • EN example
• Example section in NL linguistic section
  • NL example with translation in English
Don’ts
• Confuse Language and Linguistic section
  • The latter contains language-specific values for closed domains
• Be (too) language specific in definition
• Mention scheme in definition
• Use several definitions in one DC
• Circular definitions
• Rely on authority
• Rely on standardized status
The definition should fit YOUR scheme, etc.