Discussion, Outlook and Further Directions
Standardizing Data Categories in ISOcat - Implementing Group Work for Thematic Domains
Topics
• Container Data Categories
• Relation Registries
• Data Category Concepts
• …
Container Data Categories - I
[Figure: two schemas, "A (schema for a) typological database" and "An LMF (ISO 24613:2008) compliant (schema for a) lexicon". Class labels in the figure: Lexicon, Lexical Entry, Form, Sense, Lemma, Word Form, with multiplicities 1..*, 1..*, 0..*, 0..*. Data category labels in the figure: partOfSpeech, writtenForm, grammaticalGender, lexicalType, wordOrder.]
Container Data Categories - II
• A (TC 37) meta model which is instantiated with a domain/application specific data category selection into a data model
• A proprietary data model with a related data category selection
• A tweaked standardized meta model:
  • e.g., additional classes added to the LMF meta model
• Problem: where are the semantics of these ‘containers’ described?
  • LMF meta model in ISO 24613:2008
  • But no standard place for own adaptations/models
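The first bullet, a meta model instantiated with a data category selection into a data model, can be sketched roughly as below. This is an illustration only: the container names follow the LMF classes mentioned in the slides, while `DataCategorySelection`, `add_entry` and the particular selection are hypothetical and not taken from ISO 24613:2008 or ISOcat.

```python
from dataclasses import dataclass, field


@dataclass
class DataCategorySelection:
    """Hypothetical: which data categories may decorate which container."""
    allowed: dict  # container name -> set of data category names

    def check(self, container, category):
        if category not in self.allowed.get(container, set()):
            raise ValueError(f"{category!r} is not selected for {container!r}")


@dataclass
class Form:
    features: dict = field(default_factory=dict)


@dataclass
class LexicalEntry:
    features: dict = field(default_factory=dict)
    forms: list = field(default_factory=list)


@dataclass
class Lexicon:
    selection: DataCategorySelection
    entries: list = field(default_factory=list)

    def add_entry(self, entry_features, form_features):
        # Validate every feature against the selection before storing anything.
        for name in entry_features:
            self.selection.check("LexicalEntry", name)
        forms = []
        for feats in form_features:
            for name in feats:
                self.selection.check("Form", name)
            forms.append(Form(dict(feats)))
        entry = LexicalEntry(dict(entry_features), forms)
        self.entries.append(entry)
        return entry


# Assumed selection, for illustration only.
selection = DataCategorySelection({
    "LexicalEntry": {"partOfSpeech"},
    "Form": {"writtenForm", "grammaticalGender"},
})
lexicon = Lexicon(selection)
entry = lexicon.add_entry({"partOfSpeech": "verb"}, [{"writtenForm": "walk"}])
```

Note that the sketch can say which data categories a container accepts, but nothing in it describes what the containers themselves mean, which is the problem raised in the last bullet.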
Container Data Categories - III
• Use the administrative and descriptive parts to manage standardization and describe the containers (components/tables/classes/objects/inner nodes…) of a meta/data model in the DCR
• But the relationships between components and complex data categories wouldn’t be stored in the DCR (maybe in the RR)
Relation Registries - I
• Value domain membership
• Subsumption relationships between simple data categories (legacy)
• Relationships between complex data categories are not stored in the DCR
[Figure: example with the labels partOfSpeech, string, pronoun, personal pronoun]
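The two relationship kinds listed first can be illustrated with a very small sketch. It assumes that partOfSpeech has a closed value domain containing both pronoun and personal pronoun, and that personal pronoun is subsumed by pronoun; the slide only shows these labels, so the exact arrangement here is an assumption, as are the names `value_domains`, `is_a`, `in_value_domain` and `subsumes`.

```python
# Hypothetical mini registry (assumed arrangement of the labels on the slide).
value_domains = {"partOfSpeech": {"pronoun", "personal pronoun"}}
is_a = {"personal pronoun": "pronoun"}  # narrower simple DC -> broader simple DC


def in_value_domain(category, value):
    """Value domain membership: is `value` allowed for the closed DC `category`?"""
    return value in value_domains.get(category, set())


def subsumes(broader, narrower):
    """True if `broader` equals `narrower` or is one of its ancestors."""
    current, seen = narrower, set()
    while current is not None and current not in seen:
        if current == broader:
            return True
        seen.add(current)  # guards against accidental cycles
        current = is_a.get(current)
    return False
```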
Relation Registries - II
• Rationale for not storing ontological relationships in the DCR:
  • Relation types and modeling strategies for a given data category may differ from application to application;
  • Motivation to agree on relation and modeling strategies will be stronger at individual application level;
  • Integration of multiple relation structures in the DCR itself could lead to endless ontological clutter.
Relation Registries - III
• TC 37 needs ontological relationships:
  • resurrect ‘broader generic concept’
  • is-a relationships (between complex DCs?)
  • …
• Bridges
  • within the DCR:
    • users create the same (or very close) DCs
  • between ISOcat and ISO/CDB
    • ISOcat PID vs IRDI PID
  • between various registries:
    • interoperability between various communities
  • same-as relationships
• Resource discovery needs context:
  • granularity of DCs
  • the /title/ of a book or the /title/ of a …
  • has-a relationships
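A relation registry that keeps same-as and has-a links outside the DCR could look roughly like the sketch below. The `RelationRegistry` class and all identifiers such as `isocat:title-book` are made-up placeholders, not real ISOcat or ISO/CDB PIDs, and the transitive lookup is just one possible design.

```python
from collections import defaultdict


class RelationRegistry:
    """Hypothetical relation registry: typed links between PIDs."""

    def __init__(self):
        self._links = defaultdict(set)  # (relation, subject) -> set of objects

    def add(self, subject, relation, obj, symmetric=False):
        self._links[(relation, subject)].add(obj)
        if symmetric:
            self._links[(relation, obj)].add(subject)

    def related(self, subject, relation):
        """All PIDs reachable from `subject` by following `relation` links."""
        found, todo = set(), [subject]
        while todo:
            current = todo.pop()
            for nxt in self._links.get((relation, current), ()):
                if nxt != subject and nxt not in found:
                    found.add(nxt)
                    todo.append(nxt)
        return found


rr = RelationRegistry()
# Bridges between registries (placeholder identifiers).
rr.add("isocat:title-book", "same-as", "cdb:title-book", symmetric=True)
rr.add("cdb:title-book", "same-as", "other:book-title", symmetric=True)
# Context for resource discovery: a book has a title.
rr.add("isocat:book", "has-a", "isocat:title-book")
```

With these links, `rr.related("isocat:title-book", "same-as")` also reaches the entry in the third registry, even though no direct link to it was recorded.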
Relation Registries - IV
[Figure: three groups of labels. Relation registries: MPI RR, Typological Database System RR. Data category registries: MPI DCR, ISO DCR. Linguistic resources: resource, TDS database, MPI archive.]
Data Category Concepts - I
• TDGs create DCs from within a certain domain modeling view:
  • CMDI
  • LMF
  • TBX
  • …
• DCs get types based on these views. However, users with other (proprietary) data models might want to use a DC whose type doesn’t fit.
  • The POS field (closed DC) of the lexical entry “walk” gets the value ‘verb’ (simple DC)
  • The Verb feature (open DC) of a feature structure gets the value “walk”
  • Both DCs could be semantically equivalent
• Decouple some of the semantics of the DC specification and move it to a (DC) Concept so multiple DCs, with different types, can reuse it?
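The decoupling proposed in the last bullet might be pictured as follows. This is a sketch under assumptions: `Concept`, `SimpleDC`, `OpenDC` and `semantically_equivalent` are hypothetical names, and the identifier and definition strings are placeholders rather than ISOcat content.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    """Hypothetical shared semantics, independent of any DC type."""
    identifier: str
    definition: str


@dataclass(frozen=True)
class SimpleDC:
    """A value in a closed value domain, e.g. 'verb' for a POS field."""
    name: str
    concept: Concept


@dataclass(frozen=True)
class OpenDC:
    """A field accepting arbitrary values, e.g. Verb = "walk"."""
    name: str
    concept: Concept


def semantically_equivalent(a, b):
    # Two DCs of different types are equivalent if they wrap the same concept.
    return a.concept == b.concept


verb_concept = Concept("concept:verb", "placeholder definition of 'verb'")
verb_value = SimpleDC("verb", verb_concept)    # used as a value
verb_feature = OpenDC("Verb", verb_concept)    # used as a feature name

lexical_entry = {"lemma": "walk", "partOfSpeech": verb_value}
feature_structure = {verb_feature: "walk"}
```

Both usages from the slide then point at one shared concept, while each DC keeps its own type.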
Data Category Concepts - II
• GOLD has been put into the DCR, but
  • Only some of the ontological relationships can be maintained
  • Only some concepts make sense as DCs, e.g., the upper ontology is too abstract
• But you might still want to share/standardize these semantics and maintain these relationships …
Data Category Concepts - III
[Figure: labels Linguistic resource (schema), Linguistic knowledge base, Data categories, Containers, Concepts, Relation]
Data Category Concepts - IV
<lmf:lexicon xml:lang="jp" alphabet="ipa">
  <lmf:entry>
    <lmf:lemma>
      <lmf:writtenForm>nihongo</…>
      …
    </…>
    …
  </…>
  …
</…>
Data Category Concepts - V
[Figure: two groups, Data model and Knowledge base; labels: lexicon, language, entry, alphabet, japanese, ipa, lemma, writtenForm]
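One possible reading of this split is that element and attribute names belong to the data model, while the values they carry point into a knowledge base. The sketch below applies that reading to a completed version of the XML from the previous slide. The closing tags and the namespace URI `http://example.org/lmf` are filled in here only so the example parses; they are assumptions, not part of the original slide.

```python
import xml.etree.ElementTree as ET

# Completed for illustration; namespace URI and closing tags are assumed.
XML = """<lmf:lexicon xmlns:lmf="http://example.org/lmf" xml:lang="jp" alphabet="ipa">
  <lmf:entry>
    <lmf:lemma>
      <lmf:writtenForm>nihongo</lmf:writtenForm>
    </lmf:lemma>
  </lmf:entry>
</lmf:lexicon>"""


def local(name):
    """Strip a '{namespace}' prefix from an ElementTree tag or attribute name."""
    return name.rsplit("}", 1)[-1]


root = ET.fromstring(XML)
structure = set()  # element and attribute names (data model side)
values = set()     # attribute values and text content (knowledge side)
for element in root.iter():
    structure.add(local(element.tag))
    for attr, value in element.attrib.items():
        structure.add(local(attr))
        values.add(value)
    text = (element.text or "").strip()
    if text:
        values.add(text)
```

The raw document only yields the abbreviated forms (`lang`, `jp`); mapping them to labels such as language and japanese would need the registries discussed in these slides.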
Data Category Concepts - VI
• Data Category Concepts use the administrative and descriptive parts of the DCR data model; complex and simple DCs stay as they are now
• Complex and simple DCs wrap around the semantics of a Data Category Concept and add information specific to their type
• Is that possible? Or does the semantic description reflect the type?
Data Category Concepts - VII
• The DCR would move to or include a concept registry
• Relationship to ISO/CDB?
  • a standardized snapshot is also available in ISO/CDB
  • the grassroots approach possibly leads to many more non-standardized concepts being available in ISOcat
  • alignment of PIDs using the RR