making sense of content TM Porting terminologies to the Semantic Web (aka: the Semiotic Web) bernard.vatant@mondeca.com ISKO Linked Data Event - London - 2010-09-14
Mondeca at a glance
• Facts and figures
  • Established : 1999 - Founder and CEO : Jean Delahousse - Staff 2010 : 22
  • Bernard Vatant has been Senior Consultant for Mondeca since 2000
• Products
  • Intelligent Topic Manager (Vocabularies and Knowledge base management)
  • CA Manager (Content integration through semantic annotation)
• Services
  • Consulting and training in Semantic Web technologies deployment
  • Modeling, data and vocabulary migration and integration
• References
  • Publication, territorial management, tourism, public sector, health
  • Lexis Nexis, Wolters Kluwer, Thomson, BnF, Documentation Française, OPOCE …
• Participation in many national and European research projects
  • Including DataLift http://datalift.org/ (just about to kick off)
• Ongoing participation in Semantic Web standards and linked data community
  • From Topic Maps (2000-2001) to OWL, SKOS, …
  • In the Cloud : geonames.org, lingvoj.org ontologies
Summary
• A semiotic view of terminology
  • « Every sign is a thing » : signs (terms) are resources (business objects)
  • The semiotic triangle : terms, concepts and referents
• Current approaches to term representations
  • SKOS-XL, BS 8723, ISO 25964
  • The Eurovoc model : a term is a denotation of a concept
  • Lexvo.org : a term is a sign defined by string + language
  • ISO TC-37 standards (LMF) : only XML schemas, no ontology
• Moving forward
  • Limits of current approaches
  • A strawman « Simple Term System »
  • Introducing explicit « meaning » objects (aka : references or significations)
The pervasive Web – quick reminder
• Internet (ca. 1970)
  • Network of identified, connected and addressable computers
  • Technical support : IP addresses
• Web 1.0 (ca. 1990)
  • Network of identified, connected and addressable resources
  • Technical support : URLs, HTTP
• Semantic Web (ca. 2010)
  • Network of identified, connected and addressable representations
  • Technical support : URIs, RDF, content negotiation
• Just about anything can be represented and connected
  • People (Social Web), Devices (Web of Things), Places (GeoSemantic Web), Concepts (Web of Vocabularies) …
« Everything is a Thing »
Everything? Even signs?
Every sign is a thing (& vice versa)
[Image : Impasse Saint-Quentin, http://fr.wikipedia.org/wiki/Fichier:Impasse_%C3%A0_sens_unique.jpg]
The semiotic triangle : road signs
[Diagram : triangle with vertices « signifiant », « signifié » and « référent », and edges labelled « representation » and « denotation »]
• impasse, cul-de-sac, voie sans issue, no through road, dead end, 死路 …
• have to get out using the path you get in … sometimes no way to get out at all
The semiotic triangle : lexical signs (terms)
[Diagram : triangle with vertices « signifiant », « signifié » and « référent », and edges labelled « representation » and « denotation »]
• « signifiant » : ‘Arctique’@fr
• « signifié » : L’Arctique est la région entourant le pôle Nord de la Terre à l’intérieur et aux abords du cercle polaire nord (Wikipédia)
Sorting out Terms, Concepts and Things
• Terms are lexical entities (signifiants)
  • Generally used as denotations for concepts or things
  • If possible qualified by terminologists
  • Expressed in some identified natural language
  • Devil in the details : encoding system, writing system (script)
• Concepts are specific representations of « things »
  • In a certain view of the world
  • For a specific functional purpose
    • Indexing, classification, search, inference
• Things are ... just things
  • What users care about at the end of the day (people, places, products, ideas …)
• Terms, Concepts and Things should all be first-class citizens in the Semantic Web
• Switching from a term-centric to a concept-centric view …
  • Like in SKOS and ISO 25964
• … does not mean that terms and terminology are out of the picture!
  • They simply need to be defined and managed at a different level
Translation into Semantic Web languages
• Something (« référent ») : owl:Thing
  • http://dbpedia.org/resource/Arctic
• Concept (« signifié ») : skos:Concept
  • http://stitch.cs.vu.nl/vocabularies/rameau/ark:/12148/cb11940481m
• Term (« signifiant ») : skosxl:Label
  • http://lexvo.org/id/term/fra/Arctique
  • skosxl:literalForm ‘Arctique’@fr
• Concept represents Something : foaf:focus
• Term denotes Concept : lvont:means, skosxl:prefLabel
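The mapping on this slide can be sketched as a handful of triples. This is only an illustration in plain Python, with no RDF library assumed: property names are kept as prefixed strings, the three URIs are the ones shown on the slide, and the helper names `objects` and `referents_of` are hypothetical. The directions of `lvont:means` (term to concept) and `skosxl:prefLabel` (concept to label) are how the diagram is read here.

```python
# The three corners of the semiotic triangle, using the URIs from the slide.
TERM = "http://lexvo.org/id/term/fra/Arctique"
CONCEPT = "http://stitch.cs.vu.nl/vocabularies/rameau/ark:/12148/cb11940481m"
THING = "http://dbpedia.org/resource/Arctic"

# (subject, predicate, object) tuples; a literal is a (string, language) pair.
TRIPLES = [
    (TERM, "rdf:type", "skosxl:Label"),
    (TERM, "skosxl:literalForm", ("Arctique", "fr")),
    (TERM, "lvont:means", CONCEPT),           # term denotes concept
    (CONCEPT, "rdf:type", "skos:Concept"),
    (CONCEPT, "skosxl:prefLabel", TERM),      # concept points back to its label
    (CONCEPT, "foaf:focus", THING),           # concept represents thing
    (THING, "rdf:type", "owl:Thing"),
]


def objects(subject, predicate, triples=TRIPLES):
    """Return every object of the triples matching subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def referents_of(term, triples=TRIPLES):
    """Walk term -> concept (lvont:means) -> thing (foaf:focus)."""
    return [
        thing
        for concept in objects(term, "lvont:means", triples)
        for thing in objects(concept, "foaf:focus", triples)
    ]


print(referents_of(TERM))
```

Walking two edges of the triangle from the term reaches the referent, which is the point of keeping all three corners as addressable resources.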
Concept-centric approach to terms (SKOS)
• The concept-centric approach puts concepts at the center of discourse
  • Terms are denotations of concepts
  • Standalone terms can be considered in theory, but not in practice
• Minimal, shallow level of description of terms
  • Basic properties : lexical form + language
  • No support for proper lexical properties
    • Part of speech, lemma, tokenization, variant
  • Basic expressivity for term-to-term relationships
    • skosxl:labelRelation is just an abstract superproperty
• Good expressivity of the term-to-concept relationships
  • But clearly asserted from a concept viewpoint
• No support for context
  • Implicit context : the term-concept relationship inside a given concept scheme
• Similar approach used by BS 8723 and ISO 25964
  • Also used in EUROVOC model with customized extensions
Concept-centric approach to homographs
• A term can denote more than one concept
  • aka : the homography, ambiguity … issue
• Q : Are homograph terms (denoting different concepts) the same resource, or not?
  • In other words : should they be given the same URI?
• The SKOS-XL approach
  • SKOS-XL statement : If two instances of the class skosxl:Label have the same literal form, they are not necessarily the same resource.
  • IOW : Existence of distinct terms (distinct URIs) bearing the same literal form in the same language is not forbidden.
  • « table@en » can be the literal form of different terms (different URIs), e.g., denoting different concepts such as « table (furniture) », « table (database) » …
• SKOS-XL does not enforce this distinction, either
  • Using the same term (same URI) for different concepts is not forbidden
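A minimal sketch of the first SKOS-XL option, where homographs are distinct label resources sharing one literal form. All `example.org` URIs and the `homographs` helper are hypothetical, made up for illustration; only the idea (same literal form, different URIs) comes from the slide.

```python
from collections import defaultdict

# Two skosxl:Label resources with the same literal form 'table'@en, each
# attached to a different concept, plus one unambiguous label.
LABELS = {
    "http://example.org/label/table-1": {
        "literal_form": ("table", "en"),
        "concept": "http://example.org/concept/table-furniture",
    },
    "http://example.org/label/table-2": {
        "literal_form": ("table", "en"),
        "concept": "http://example.org/concept/table-database",
    },
    "http://example.org/label/chair-1": {
        "literal_form": ("chair", "en"),
        "concept": "http://example.org/concept/chair",
    },
}


def homographs(labels):
    """Group label URIs by (string, language) and keep the ambiguous groups."""
    groups = defaultdict(list)
    for uri, description in labels.items():
        groups[description["literal_form"]].append(uri)
    return {form: sorted(uris) for form, uris in groups.items() if len(uris) > 1}


print(homographs(LABELS))
```

Nothing in the data links the two « table » labels to each other; the relation only shows up when grouping by literal form, as the result of a query.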
Concept-centric model : EUROVOC
• EUROVOC model is built as an extension of SKOS
  • Subclasses of skosxl:Label
    • eu:ThesaurusTerm, eu:PreferredTerm, eu:SimpleNonPreferredTerm …
  • Type of term defined by the type of relationship to a concept
  • No « standalone definition » of a term : a term is attached to a single concept
• Specific relationships between terms
  • Translation, Permuted lexical form
  • Full name/short name, Acronym/expansion
• No lexical (grammatical) level properties
  • No POS, lemma, variants …
• Homographs are distinct terms
  • Hence homographs attached to different concepts
    • Have different URIs …
    • … are not linked whatsoever, except appearing as sibling results of a query …
    • … should not occur since EUROVOC should be a unique name space
A concept representation in EUROVOC
[Screenshot : as seen in Mondeca back-office (ITM). Callouts :]
• User language choice (25 languages available)
• pref label in current language
• concept attributes
• preferred term in current language
• preferred terms in other languages
• concept schemes hierarchy (domains and microthesauri)
• related concepts
A concept representation (continued)
[Screenshot callouts :]
• non-preferred terms in various languages
• broader-narrower hierarchy
• Display uses terms in current user language
Term representation level
[Screenshot callouts :]
• User language choice (25 languages available)
• lexical form
• term type
• term attributes
• The term « meaning » concept
• Display uses the preferred term in current user language
• relationships between terms
The term-centric (semiotic) approach
• As used by Lexvo.org
  • A term is uniquely defined by a string and a language
  • This definition is made functional in the URI structure
    • Example : http://lexvo.org/id/term/fra/Arctique
• A term can have zero or more declared « meanings »
  • Values of the « lvont:means » property
  • The URI is functional whether there is zero, one or more declared « meanings »
• Simple approach, but the number of meanings is anyone's guess
  • http://www.lexvo.org/id/term/eng/hubject
    • No meaning found in the database, but the world is open
  • http://www.lexvo.org/id/term/eng/photosphere
    • Two meanings found, linked by a lvont:nearlySameAs relationship
  • http://www.lexvo.org/id/term/eng/table
    • How many meanings?
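The « functional » URI structure can be sketched as a small function from (language, string) to URI. The base pattern follows the example URI on the slide; the percent-encoding of the string, the `example.org` meaning URI and the names `term_uri` and `meanings` are assumptions made for illustration, not Lexvo.org's documented behaviour.

```python
from urllib.parse import quote

# Base pattern taken from the example URI on the slide.
LEXVO_TERM_BASE = "http://lexvo.org/id/term"


def term_uri(language_code, string):
    """Build a term URI from a language code and a string.

    Percent-encoding the string is an assumption of this sketch.
    """
    return f"{LEXVO_TERM_BASE}/{language_code}/{quote(string, safe='')}"


# Declared meanings (values of lvont:means), keyed by term URI.
# The concept URI is a placeholder, not real Lexvo.org data.
DECLARED_MEANINGS = {
    term_uri("fra", "Arctique"): ["http://example.org/concept/arctic"],
}


def meanings(language_code, string):
    """Return the declared meanings of a term, possibly none at all."""
    return DECLARED_MEANINGS.get(term_uri(language_code, string), [])


print(term_uri("fra", "Arctique"))
print(meanings("eng", "hubject"))
```

The URI can be computed for any string and language, whether or not any meaning has been declared for it, which is what makes the approach open.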
What « table@en » means
… many more of the same …
ISO TC-37 terminology standards
• Built on top of various other (ISO) standards
• Define a lot of data models or schemas
  • Either UML or XML schemas
• Delve into deep, complex lexical details
  • Addressing fine-grained terminology management issues
• But provide no interoperability with the Semantic Web universe
  • Not even as informative annexes
• Example : Lexical Markup Framework
  • An attempt to produce an OWL representation of the LMF model
  • Neither normative nor even OWL-conformant
  • Has been sitting unused on the LMF website for two years
  • Any feedback? Does anyone really care? http://www.lexicalmarkupframework.org/
• Even if published in Semantic Web formats
  • Chances of mainstream adoption are weak
  • Due to their sheer complexity …
Adding context to the semiotic triangle
[Diagram : the semiotic triangle (« signifiant », « signifié », « référent », edges « representation » and « denotation ») with an added « context » element]
• « signifiant » : ‘table’@en
• « signifié » : http://sw.opencyc.org/2009/04/07/concept/en/Table_PieceOfFurniture
• « context » : Furniture
Context of meaning in existing approaches
• In SKOS and concept-centric models
  • The context of the meaning is the Concept Scheme
    <http://id.loc.gov/authorities/sh85131792#concept> a skos:Concept ;
        skos:prefLabel "Table"@en ;
        skos:inScheme <http://id.loc.gov/authorities#topicalTerms> .
  • Reads from the viewpoint of the term
    • ‘Table’ is the English preferred term for concept ‘#sh85131792’ in the context of LCSH topical terms
• In the purely semiotic approach of Lexvo.org
  • The only context is the declared language
  • Ambiguity is assumed, but not resolved
  • A term description is a bag of possible meanings and translations
  • Useful, but not enough
• In a nutshell, regarding context
  • Concept-centric approach is too restrictive …
  • Lexvo.org approach is too open …
Trying to capture context
• Context can be more than an implicit skos:ConceptScheme
  • A language
  • A country, a community
  • A document or corpus lexical context
  • Any combination of the above …
• Actually a context might be any kind of relevant resource
  • Including a list of resources
• Neither term nor concept should be linked directly to a context
• Need to define « reference » or « meaning » resources
  • Linking one term to one concept and one context
  • Allowing attachment of metadata (e.g., Dublin Core)
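A rough sketch of such a « meaning » resource, assuming the simplest possible shape: one term, one concept, one context, plus free metadata. The `Meaning` class, the `concepts_for` helper and every `example.org` URI are hypothetical; the term URI and the OpenCyc concept URI are the ones used elsewhere in these slides.

```python
from dataclasses import dataclass, field


@dataclass
class Meaning:
    """Links exactly one term to one concept in one context."""
    term: str
    concept: str
    context: str
    metadata: dict = field(default_factory=dict)  # e.g. Dublin Core properties


TABLE = "http://www.lexvo.org/id/term/eng/table"
FURNITURE = "http://example.org/context/furniture"   # placeholder context
DATABASES = "http://example.org/context/databases"   # placeholder context

MEANINGS = [
    Meaning(
        term=TABLE,
        concept="http://sw.opencyc.org/2009/04/07/concept/en/Table_PieceOfFurniture",
        context=FURNITURE,
        metadata={"dcterms:creator": "example editor"},
    ),
    Meaning(
        term=TABLE,
        concept="http://example.org/concept/table-database",
        context=DATABASES,
    ),
]


def concepts_for(term, context, meanings=MEANINGS):
    """Resolve a term to the concepts it denotes in the given context."""
    return [m.concept for m in meanings if m.term == term and m.context == context]


print(concepts_for(TABLE, FURNITURE))
```

Neither the term nor the concept carries the context; it lives only on the meaning resource, so the same term can be reused unchanged across contexts.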
Requirements for « STS »
• STS = « Simple Terminology System »
  • aka : « Simple Terminology Semiotics »
• As simple as SKOS is for representation of concepts
  • And as extensible
• Based on core classes of LMF or any relevant ISO TC-37 model
  • Simpler than LMF but extensible to capture all LMF subtleties
• Interoperable with concept-layer formats (SKOS and SKOS-XL)
• As open and robust as the semiotic approach of Lexvo.org
• Including representation of context/meanings/references
• And of course recommended by a relevant standards body
  • Food for another W3C recommendation track?
STS draft model (built upon lexvo ontology)
[Diagram : classes and properties of the draft model]
• sts:Meaning, with dcterms:* (Dublin Core metadata)
  • sts:signified → skos:Concept
  • sts:signifier → lvont:Term
  • sts:inContext → sts:Context
• sts:Context
  • sts:contextProperty → anything
  • sts:spatialContext → geo:SpatialThing
  • sts:timeContext → time:Period
• lvont:Term
  • lvont:language → lvont:Language
  • skosxl:literalForm → rdf:Literal
  • sts:lexicalProperty → anything
• skosxl:Label
• Extensions to fit e.g., TC-37 LMF schemas or EUROVOC management specifics …
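One way to picture an instance of this draft model is to write a single sts:Meaning out as Turtle. The sketch below uses the property names from the diagram (sts:signifier, sts:signified, sts:inContext), read here as properties of sts:Meaning; the `sts:` namespace URI is a placeholder since the slides give none, and the function name and the `example.org` URIs are hypothetical.

```python
# The sts: namespace URI is a placeholder; the slides do not define one.
PREFIXES = {
    "sts": "http://example.org/sts#",
    "dcterms": "http://purl.org/dc/terms/",
}


def meaning_to_turtle(meaning_uri, term_uri, concept_uri, context_uri, source_uri=None):
    """Serialize one sts:Meaning resource as a small Turtle document."""
    lines = [f"@prefix {prefix}: <{ns}> ." for prefix, ns in PREFIXES.items()]
    lines.append("")
    lines.append(f"<{meaning_uri}> a sts:Meaning ;")
    properties = [
        ("sts:signifier", term_uri),      # the term (lvont:Term)
        ("sts:signified", concept_uri),   # the concept (skos:Concept)
        ("sts:inContext", context_uri),   # the context (sts:Context)
    ]
    if source_uri is not None:
        properties.append(("dcterms:source", source_uri))  # optional metadata
    body = [f"    {prop} <{uri}>" for prop, uri in properties]
    lines.append(" ;\n".join(body) + " .")
    return "\n".join(lines)


print(meaning_to_turtle(
    "http://example.org/meaning/table-furniture",
    "http://www.lexvo.org/id/term/eng/table",
    "http://sw.opencyc.org/2009/04/07/concept/en/Table_PieceOfFurniture",
    "http://example.org/context/furniture",
))
```

The function only concatenates strings and assumes its inputs are already valid URIs; a real implementation would more likely build the graph with an RDF library.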
Ready for a standardization track ?