90 likes | 190 Views
Merging Terminology and Disambiguation. Tadej Štajner, Mārcis Pinnis MLW-LT WG Meeting, 24.1.2013, Prague. Summary. 1) We can combine Terminology and Disambiguation under one umbrella, but at the same time do not loose backwards compatibility.
E N D
Merging Terminology and Disambiguation TadejŠtajner, MārcisPinnis MLW-LT WG Meeting, 24.1.2013, Prague
Summary • 1) We can combine Terminology and Disambiguation under one umbrella, but at the same time do not loose backwards compatibility. • 2) There are compromises, however (in what can and what cannot be annotated)!
Current state • term, termInfo*, • disambigIdent* • disambigClass* • disambigGranularity (entity|lexicalConcept|ontologyConcept) • disambigConfidence • Can’t support simultaneous annotation on multiple granularity levels • Can there be more granularity levels?
Common ground • Both data categories model things in the form of <fragment, relationship, URI> • The current relationships that we model in both DCs are term, entity, lexical concept, ontology concept, class type • With some exceptions: • its:term=“yes” • its:disambigSource, its:disambigIdent
Scenario A • Add ‘term’ as another disambigGranularity level, keep its:term as a flag to allow for the simple use case • Fits in same framework • Precludes use of multiple levels on the same node, so it fails on this constraint
Suggestions • Divide Disambiguation in smaller parts dividing the "granularities" (or levels) in separate parts to accomodate multiple simultaneous levels • each of the levels serves a different purpose for different target audiences and scenarios • Treat Terminology as equally important level • This is where we end up under one umbrella • Implement this in the spec as a set of attributes per level that have a similar pattern for every level • This will allow independency of the different levels • Keep the cardinality to 1 value for every attribute: • users should use one tool for one text analysis level - if they use multiple, they have to create a wrapper that is able to combine the outputs. Otherwise you will end up having conflicting annotations anyway – we shouldn‘t encourage that. • Keep the inheritance of annotations as-is - no hierarchical entities
Scenario B • Keep Terminology as-is, drop disambigGranularity, and encode the levels in the attributes themselves: • entityIdent*, lexicalConcept*, ontologyConcept* - many new attributes, following the same pattern • Allows annotating multiple levels on same node • Every “level” gets its own set of attributes: • *ref • *refPointer • * + *source • *confidence Same pattern appears everywhere – this would make adoption easier compared to declaring granularity.
Hierarchical named entities? • Difficult to interpret on the individual node level: • <b disambigClassRef=“nerd:Person”>Mayor of <b disambigClassRef=“nerd:City”>London</b></b> • “London” would have classes Person and City. Can we really say that London is a Person? (Answer: not without looking at the surrounding nodes) Painful to implement!
Other possible levels • Proposal: add a loosely-defined “keyword” level for annotations which don’t fit in others