CLARIN-NL ISOcat workshop 2012 part 2 (10-10-2012)

This workshop discusses the elements to include in ISOcat standards, when to create new data categories or adapt existing ones, how to deal with larger amounts of data, and other relevant topics. It also provides guidelines for creating good data categories.

jthornton

Presentation Transcript


  1. CLARIN-NL ISOcat workshop 2012 part 2 (10-10-2012). Ineke Schuurman, Menzo Windhouwer

  2. Issues brought up by participants
     • Which elements are to be included in ISOcat
     • (CLARIN) standards, TEI, etc.
     • Type of DC (data category)
     • When to create a new DC or adapt an existing one
     • When to create several DCSs (data category selections)
     • Name of DC; several DCs with the same name
     • How to deal with larger amounts of data

  3. What to include?
     • ALL concepts dealing with linguistics or metadata
     • Example, Van Dale EN-NE entry: include (overgankelijk werkwoord) 1) omvatten 2) (mede) opnemen
       ==> the label 'overgankelijk werkwoord' ('transitive verb') is to be included; the same holds for the abbreviations 'overg.ww' and 'trns.v.'
     • All of these map to one and the same DC!

  4. What to include? ‘transitive verb’
     • Several entries in ISOcat:
       - DC-1405: A verb which takes a direct object; that is, a verb that expresses an action which directly affects another person or thing.
       - DC-3532: A transitive verb is a verb that takes a direct object, and describes a relation between two participants [Crystal 1997: 397; Payne 1997: 171]
     • And several more, so... which one to select?

  5. When (not) to adopt an existing DC
     • It should match the way you use a specific notion in your annotation scheme, application, …
     • It should come with the same profile and type
     • That being said: reuse a CLARIN NL/VL DC when possible (contact Ineke when such a definition is incorrect)
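The two mechanical criteria on this slide (same profile, same type) can be sketched in code; whether a definition matches your own use of a notion remains a human judgment. What follows is a minimal Python sketch, not ISOcat's API: the `DataCategory` class, the `adoption_candidates` function, the identifiers `myDC-1` and `myDC-2`, and the profile and type labels are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataCategory:
    """Hypothetical, simplified stand-in for a data category (DC) record."""
    identifier: str  # made-up identifier, not a real ISOcat entry
    name: str
    definition: str
    profile: str     # made-up profile label
    dc_type: str     # made-up type label


def adoption_candidates(needed_profile, needed_type, existing):
    """Keep only DCs whose profile and type both match what the scheme needs.

    Checking that the definition fits your own use of the notion is left
    to the person adopting the DC; it is deliberately not automated here.
    """
    return [dc for dc in existing
            if dc.profile == needed_profile and dc.dc_type == needed_type]


# Illustrative records; the profile and type values are assumptions.
existing = [
    DataCategory("myDC-1", "transitive verb",
                 "A verb which takes a direct object.",
                 "Morphosyntax", "simple"),
    DataCategory("myDC-2", "transitive verb",
                 "A verb that describes a relation between two participants.",
                 "Syntax", "simple"),
]

matches = adoption_candidates("Morphosyntax", "simple", existing)
print([dc.identifier for dc in matches])  # ['myDC-1']
```

The filter only narrows the list of candidates; the surviving definitions still have to be read and compared with your own scheme.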

  6. Same name
     • Not really a problem when they are good DCs, not even when they come with the same profile
     • Example: PositivePolarity
       - In general, positive polarity refers to an assertion that contains no marker of negation [Crystal 1980: 299]. (DC-3405)
       - the property of a word or concept to express positive sentiment (myDC-xx)
     • Whether you can reuse DC-3405 depends on your use of the concept!

  7. Same name
     • Do not avoid reuse of a name when it is the name commonly used!
     • Another type of duplicate name, where one concept entails the other:
       - meewerkend voorwerp (indirect object)
       - meewerkend en belanghebbend voorwerp (indirect and benefactive object)
       - event (also called 'eventuality', and including 'state')
       - event (sister of 'state')

  8. What defines a good DC?
     • Reusable definition
     • NOT:
       - conversation (DC-2661): Communication event with more than two participants
       - mother tongue (DC-2955): […] a speaker’s mother tongue

  9. What defines a good DC?
     • Correct definition
     • NOT (?):
       - Actor (DC-4146): a participant in an action or process
       - Question: is an addressee to be considered an actor? (used in DC-4158, no proper definition yet)

  10. What defines a good DC?
     • Meaningful definition
     • NOT:
       - annotation format (DC-2562): Specifies the annotation format that is used …
       - source language (DC-2494): Indicates if a language is a source language
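Both counterexamples on this slide share one symptom: the definition repeats the name of the DC instead of explaining it. A rough way to spot that symptom is sketched below in Python. The function names `_words` and `repeats_own_name` are hypothetical, and the check is only a heuristic assumed for illustration: it also flags legitimate definitions that open with the term itself (such as "A transitive verb is a verb that ..."), so a hit is a prompt to reread the definition, not a verdict.

```python
import re


def _words(text: str) -> str:
    """Lower-case the text and keep only alphabetic words, space-separated."""
    return " ".join(re.findall(r"[a-z]+", text.lower()))


def repeats_own_name(name: str, definition: str) -> bool:
    """Return True if the definition contains the DC name as a whole phrase."""
    # Padding with spaces prevents matches inside longer words.
    return f" {_words(name)} " in f" {_words(definition)} "


print(repeats_own_name("annotation format",
                       "Specifies the annotation format that is used"))  # True
print(repeats_own_name("source language",
                       "Indicates if a language is a source language"))  # True
print(repeats_own_name("transitive verb",
                       "A verb which takes a direct object"))  # False
```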

  11. Not-so-good examples
     • Mother tongue (DC-2955): Specifies whether the language is a speaker’s mother tongue
     • Mother’s language (DC-4516): […] NOT necessarily the mother tongue […]
     • Problems:
       - There is no definition of the concept ‘mother tongue’ (what is its relation with /home language/, /primary language/, /heritage language/?)
       - And why ‘speaker’?

  12. Rule: make your definition
     • as general as possible
     • as specific as necessary

  13. Standards
     • Within ISOcat there are currently few or no standards. Therefore:
     • CLARIN NL and VL will set up their own set of ‘standardized DCs’. Ineke will be in charge of the selection, using a new flag “recommended by CLARIN NL/VL”

  14. Standards
     • Another issue with respect to standards 'included' in ISOcat: the Athens Core DCs (recommended by metadata/CMDI). We are currently adapting them in order to avoid tautologies and/or correct smaller ‘errors’, for example:
       - Target language: indicates if the language is the target language
       - Conversation: […] three or more participants
     • The same may be necessary for TEI Headers, etc.

  15. DC/DCS and profile
     • Profiles are not added automatically; a DCS may contain elements with various profiles (although you may decide to create several DCSs; do select proper names!)
     • In case the profile you need is not yet available, contact Menzo and Ineke

  16. Part B: do’s & don’ts
     Do’s:
     • Create a DCS for your scheme (name project, ann.scheme, …)
     • Provide a clear definition (short, to the point) for your scheme, application, …
     • Take care not to leave concepts used in your definition undefined or vague
     • Use appropriate vocabulary (per profile)
     • Check ‘adopted’ DCs regularly until standardization!

  17. Do’s (continued)
     When creating a DC, fill out:
     • Justification: used in XYZ, part of tagset N
     • Language section
       - Always an English language section
       - Strong recommendation: sections for the object language(s) and for the working language of the manual
       - Sections in the various languages should match (+/- be translations of each other)
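The checklist above lends itself to a simple completeness check before a DC is submitted. The Python sketch below assumes a hypothetical draft representation (a plain dictionary with the keys `justification`, `language_sections`, and `object_languages`); none of these names come from ISOcat itself. It only verifies that the parts exist, not that the sections in different languages actually match each other.

```python
def missing_parts(draft: dict) -> list:
    """List what a DC draft still lacks according to the checklist above."""
    problems = []
    if not draft.get("justification", "").strip():
        problems.append("justification is empty")
    sections = draft.get("language_sections", {})
    # The English language section is always required.
    if "en" not in sections:
        problems.append("English language section is missing")
    # Sections for object languages are a strong recommendation only.
    for lang in draft.get("object_languages", []):
        if lang not in sections:
            problems.append(f"recommended: add a language section for '{lang}'")
    return problems


# Hypothetical draft: Dutch section present, English and German absent.
draft = {
    "justification": "used in XYZ, part of tagset N",
    "language_sections": {"nl": "..."},
    "object_languages": ["nl", "de"],
}
print(missing_parts(draft))
# ['English language section is missing',
#  "recommended: add a language section for 'de'"]
```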

  18. Do’s (continued)
     When creating a DC, fill out:
     • Example section
     • Note that *negative* examples may be very helpful! (jongens 'boys', mannen 'men'; not: gelovigen 'believers', which is a form of an ADJ)

  19. Example sections
     Suppose you want to illustrate a German phenomenon:
     • Ex.sec. in EN language section: German example with translation in English
     • Ex.sec. in NL language section: German example with translation in Dutch
     • Ex.sec. in EN linguistic section: EN example
     • Ex.sec. in NL linguistic section: NL example with translation in English

  20. Don’ts
     • Confuse Language and Linguistic section
       - The latter contains language-specific values for closed domains
     • Be (too) language specific in definition
     • Mention scheme in definition
     • Use several definitions in one DC
     • Circular definitions
     • Rely on authority
     • Rely on standardized status
       - Definition should fit YOUR scheme, etc.

  21. Procedure - 1

  22. Procedure - 2

  23. -- End --
