CLARIN-NL ISOcat workshop 2012, part 2 (10-10-2012)
Ineke Schuurman, Menzo Windhouwer
Issues brought up by participants
• Which elements are to be included in ISOcat
• (CLARIN) standards, TEI etc.
• Type of DC (data category)
• When to create a new DC / adapt an existing one
• When to create several DCSs (data category selections)
• Name of DC; several DCs with the same name
• How to deal with larger amounts of data
What to include?
• ALL concepts dealing with linguistics/metadata
• Van Dale EN-NE entry: include (overgankelijk werkwoord) 1) omvatten 2) (mede) opnemen
• ==> 'overgankelijk werkwoord' / 'transitive verb' is to be included; the same goes for the abbreviations 'overg.ww', 'trns.v.'
• All of these belong to one and the same DC!
What to include? ‘transitive verb’
• Several entries in ISOcat
• DC-1405: A verb which takes a direct object; that is, a verb that expresses an action which directly affects another person or thing.
• DC-3532: A transitive verb is a verb that takes a direct object, and describes a relation between two participants [Crystal 1997: 397; Payne 1997: 171]
• And several more, so... which one to select?
When (not) to adopt an existing DC
• It should ‘match’ the way you use a specific notion in your annotation scheme, application, …
• It should come with the same profile and type
• That being said: reuse a CLARIN NL/VL DC when possible (contact Ineke when such a definition is incorrect)
Same name
• Not really a problem when they are good DCs, not even when they come with the same profile
• Example: PositivePolarity
• DC-3405: In general, positive polarity refers to an assertion that contains no marker of negation [Crystal 1980: 299].
• myDC-xx: the property of a word or concept to express positive sentiment
• Whether you can reuse DC-3405 depends on your use of the concept!
Same name
• Do not avoid reusing a name when it is the name commonly used!
• Another type of duplicate name arises where one concept entails the other:
• meewerkend voorwerp (indirect object)
• meewerkend en belanghebbend voorwerp (indirect and benefactive object)
• event (also called 'eventuality', and including 'state')
• event (sister of 'state')
What defines a good DC?
• Reusable definition
• NOT:
• conversation (DC-2661): Communication event with more than two participants
• mother tongue (DC-2955): […] a speaker’s mother tongue
What defines a good DC?
• Correct definition
• NOT (?):
• Actor (DC-4146): a participant in an action or process
• Question: is an addressee to be considered an actor? (used in DC-4158, no proper definition yet)
What defines a good DC?
• Meaningful definition
• NOT:
• annotation format (DC-2562): Specifies the annotation format that is used …
• source language (DC-2494): Indicates if a language is a source language
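The two definitions above say little beyond the DC name itself. A rough way to flag such candidates for manual review can be sketched as follows. This is only a heuristic under stated assumptions: the function names, the stop-word list and the `max_extra` threshold are all hypothetical, and nothing here is part of ISOcat.

```python
import re

# Small, hypothetical stop-word list; extend it for real use.
STOP_WORDS = {"a", "an", "the", "is", "are", "if", "that", "which",
              "of", "in", "to", "and", "or", "whether"}

def content_words(text):
    """Lower-case the text and return its words minus stop words."""
    return [w for w in re.findall(r"[a-z']+", text.lower())
            if w not in STOP_WORDS]

def looks_tautological(name, definition, max_extra=2):
    """Flag a definition that repeats the name and adds little else.

    Assumption: 'little else' means at most `max_extra` content words
    that are not already part of the name.
    """
    name_words = set(content_words(name))
    def_words = content_words(definition)
    # Every word of the name must reappear in the definition ...
    if not name_words.issubset(def_words):
        return False
    # ... and hardly anything of substance may be left over.
    extra = [w for w in def_words if w not in name_words]
    return len(extra) <= max_extra

if __name__ == "__main__":
    print(looks_tautological(
        "source language",
        "Indicates if a language is a source language"))
    print(looks_tautological(
        "transitive verb",
        "A verb which takes a direct object"))
```

A flagged definition is not necessarily wrong, and an unflagged one is not necessarily meaningful; the check only narrows down what a human should look at.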
Not-so-good examples
• Mother tongue (DC-2955): Specifies whether the language is a speaker’s mother tongue
• Mother’s language (DC-4516): […] NOT necessarily the mother tongue […]
• There is no definition of the concept ‘mother tongue’ (relation with /home language/, /primary language/, /heritage language/?)
• And why ‘speaker’?
Rule
Make your definition
• as general as possible
• as specific as necessary
Standards
• Within ISOcat there are currently few or no standards
• Therefore CLARIN NL and VL will set up their own set of ‘standardized DCs’; Ineke will be in charge, selecting the new flag “recommended by CLARIN NL/VL”
Standards
• Another issue with respect to standards 'included' in ISOcat: the Athens Core DCs (recommended by metadata/CMDI). We are currently adapting them in order to avoid tautologies and/or correct smaller ‘errors’:
• Target language: indicates if the language is the target language
• Conversation: […] three or more participants
• The same may be necessary for TEI headers etc.
DC/DCS and profile
• Profiles are not added automatically; a DCS may contain elements with various profiles (although you may decide to create several DCSs, in which case do select proper names!)
• In case the profile you need is not yet available, contact Menzo and Ineke
Part B: do’s & don’ts
Do’s:
• Create a DCS for your scheme (name of project, annotation scheme, …)
• Provide a clear definition (short, to the point) for your scheme, application, …
• Take care not to leave concepts used in your definition undefined or vague
• Use appropriate vocabulary (per profile)
• Check ‘adopted’ DCs regularly until they are standardized!
Do’s (continued)
When creating a DC, fill out:
• Justification: used in XYZ, part of tagset N
• Language section
• Always include an English language section
• Strong recommendation: add sections for the object language(s) and for the working language of the manual
• Sections in the various languages should match (be more or less translations of each other)
Do’s (continued)
When creating a DC, fill out:
• Example section
• Note that *negative* examples may be very helpful! (jongens 'boys', mannen 'men'; not: gelovigen 'believers', which is a form of an ADJ)
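The fill-out checklist from the two "Do’s (continued)" slides can be sketched as a small completeness check. The `DataCategory` class, its field names and the `missing_parts` helper are hypothetical illustrations; they do not mirror the actual ISOcat data model or any ISOcat API.

```python
from dataclasses import dataclass, field

@dataclass
class DataCategory:
    """Hypothetical, simplified stand-in for a DC record."""
    name: str
    justification: str = ""
    # language code -> definition text in that language
    language_sections: dict = field(default_factory=dict)
    # free-text examples, positive or negative
    examples: list = field(default_factory=list)

def missing_parts(dc):
    """Return the checklist items that are still empty for this DC."""
    problems = []
    if not dc.justification.strip():
        problems.append("justification")
    # The slides ask for an English language section in every DC.
    if not dc.language_sections.get("en", "").strip():
        problems.append("English language section")
    if not dc.examples:
        problems.append("example section")
    return problems

if __name__ == "__main__":
    draft = DataCategory(
        name="transitive verb",
        justification="used in XYZ, part of tagset N",
    )
    # Lists what still has to be filled in for the draft.
    print(missing_parts(draft))
```

The check covers only presence, not quality: whether the language sections really match each other, or whether the examples are well chosen, still needs a human reader.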
Example sections
Suppose you want to illustrate a German phenomenon:
• Example section in the EN language section: German example with translation in English
• Example section in the NL language section: German example with translation in Dutch
• Example section in the EN linguistic section: EN example
• Example section in the NL linguistic section: NL example with translation in English
Don’ts
• Confuse the Language and Linguistic sections (the latter contains language-specific values for closed domains)
• Be (too) language-specific in the definition
• Mention the scheme in the definition
• Use several definitions in one DC
• Use circular definitions
• Rely on authority
• Rely on standardized status: the definition should fit YOUR scheme, etc.