CLARIN-NL ISOcat workshop 2011, part 2. Ineke Schuurman, Menzo Windhouwer
Part A • Issues brought up by participants • When (not) to adopt an existing DC • What about (CLARIN) standards? • What to do with ‘flagged’ DCs • Relation between DCS and profile • What should be included in ISOcat (level of detail, abbreviations, …) • What about TEI, metadata, web services? • How to deal with larger amounts of data
Part B • ISOcat and CLARIN: Do’s and don’ts (version 0.1) • Introduction and discussion
Part A: Issues brought up by participants
When (not) to adopt an existing DC • It should match the way you use a specific notion in your annotation scheme, application, … • It should belong to the same profile • It should describe the same phenomenon: SpeakerID ≠ SingerID
Speaker vs Singer • String → Name → Person → Singer → Opera singer → Tenor → Tenor in La Bohème • The first is too generic, the last too specific; the others are candidates • Note that SingerID and SpeakerID are siblings, whereas SingerID is a subclass of both Singer and ID (such relations are expressed in RELcat!)
Standards • Currently there are few or no standardized DCs within ISOcat. Therefore: • CLARIN-NL and CLARIN-VL will set up their own set of ‘standardized DCs’; Ineke will be in charge (in consultation with others)
Flagged DCs • Never link to ‘deprecated’ DCs! (In case of doubt, consult Ineke or Menzo.) • In all other cases the flags show whether the DC specification is correct from a technical point of view • Note that only DCs with a green marking are eligible for standardization
DC/DCS and profile • Profiles are not added automatically; a DCS may contain DCs with various profiles • If the profile you need is not yet available, contact Menzo and Ineke
What to include? • Level of detail: cf. the slide on SingerID/SpeakerID • In general: all linguistically meaningful notions mentioned in your scheme, manual or definitions (cf. Part B) • Abbreviations (e.g. PST for /past tense/) should be entered as the Data Element Name
TEI, metadata, web services • TEI: likely to be taken care of at a ‘higher level’; if not, YOU should insert the TEI definitions you use • Metadata: is the element new in CMDI? In that case, a definition must be provided in ISOcat as well • Web services: to be taken care of in CMDI
Larger amounts of data? • In such a case, contact Menzo Windhouwer (menzo.windhouwer@mpi.nl)
Part B: Do’s & don’ts. Do’s: • Create a DCS for your scheme (named after the project, annotation scheme, …) • Provide clear definitions (short, to the point) for your scheme, application, … • Take care not to leave concepts used in your definitions undefined or vague • Use the appropriate vocabulary (per profile) • Check ‘adopted’ DCs regularly until they are standardized!
Do’s (continued) When creating a DC, fill out: • Justification: e.g. ‘used in XYZ, part of tagset N’ • Language sections • Always an English language section • Strongly recommended: sections for the object language(s) and for the working language of the manual • Sections in the various languages should match (i.e. be more or less translations of each other)
Do’s (continued) When creating a DC, fill out: • Example section • Note that *negative* examples may be very helpful! (e.g. jongens ‘boys’, mannen ‘men’, but not gelovigen ‘believers’, which is a form of an ADJ)
Example sections. Suppose you want to illustrate a German phenomenon: • Example section in the EN language section: German example with English translation • Example section in the NL language section: German example with Dutch translation • Example section in the EN linguistic section: English example • Example section in the NL linguistic section: Dutch example with English translation
Don’ts • Confuse the Language and Linguistic sections (the latter contains language-specific values for closed domains) • Be (too) language-specific in a definition • Mention your scheme in a definition • Use several definitions in one DC • Give circular definitions • Rely on authority • Rely on standardized status: the definition should fit YOUR scheme, etc.
-- End --