Learn how to add annotations in HTML using ITS 2.0 and NIF for automated language processing. Explore its benefits for localization, language technologies, and internationalization.
“How to put an annotation in HTML?” Ioannis Stavrakantonakis
Outline
• Research question
• ITS 2.0
• NIF
• What about Microdata?
• Demo
• References
Research question
We want to annotate Springfield with a URI to make sure that the computer understands we mean the Springfield in Massachusetts.
HTML:
<p>It is well known, that Springfield has mild summers and short, but hard winters.</p>
HTML with annotation (something like that):
<p>It is well known, that <span about="http://sws.geonames.org/4951788/">Springfield</span> has mild summers and short, but hard winters.</p>
We don't want to add whole triples, but just annotate the HTML and say "this element refers to the following URI".
From: Denny Vrandečić
Sent: Wednesday, April 24, 2013 1:59 PM
To: semantic-web at W3C
Subject: How to put an annotation in HTML?
ITS 2.0
• Internationalization Tag Set (ITS) [2]
  • enhances the foundation to integrate automated processing of human language into core Web technologies;
  • focuses on HTML and XML-based formats in general, and can leverage processing based on the XML Localization Interchange File Format (XLIFF) as well as the Natural Language Processing Interchange Format (NIF);
  • is a technology to add metadata to Web content, for the benefit of localization, language technologies, and internationalization (see [5] for more on localization (l10n) and internationalization (i18n)).
ITS 2.0
• Potential users of ITS [2]:
  • Schema developers starting a schema from the ground up (proposals for attribute and element names to be included in their new schema)
  • Schema developers working with an existing schema (should check whether their schemas support the markup proposed in this specification and, where appropriate, add the markup proposed here to their schema)
  • Vendors of content-related tools (e.g. tools for authoring, translation, etc.)
  • Content producers (may be used by them to mark up specific bits of content)
  • Machine translation systems
  • Text analytics (automatically generated metadata for improving localization, data integration or knowledge management workflows)
  • Localization workflow managers
ITS 2.0
The Text Analysis use case:
• This data category is used to annotate content with lexical or conceptual information for the purpose of contextual disambiguation.
• Three pieces of annotation:
  • Confidence: the confidence of the agent that produced the annotation in its own computation. XSD double data type (e.g. 0.63)
  • Entity type: the type of entity, or concept class, of the text analysis target. IRI (e.g. http://nerd.eurecom.fr/ontology#Location [8])
  • Entity identifier: a unique identifier for the text analysis target. IRI or string (e.g. http://dbpedia.org/page/Innsbruck or the identifier for “Capital” from WordNet [9])
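The three pieces of annotation map naturally onto a small data structure. The sketch below is an illustration only: the class `TextAnalysisAnnotation` and the function `annotate` are hypothetical names, not part of any ITS tooling. It serializes the fields with the local HTML attribute names that ITS 2.0 uses for this data category (`its-ta-confidence`, `its-ta-class-ref`, `its-ta-ident-ref`), and assumes the confidence lies between 0 and 1. Annotator information, which the specification expects alongside a confidence value, is left out for brevity.

```python
from dataclasses import dataclass
from html import escape
from typing import Optional


@dataclass
class TextAnalysisAnnotation:
    """Hypothetical container for the three pieces of Text Analysis information."""
    ident_ref: Optional[str] = None      # entity identifier (IRI)
    class_ref: Optional[str] = None      # entity type (IRI)
    confidence: Optional[float] = None   # agent's confidence, assumed to be in [0, 1]

    def to_attributes(self) -> str:
        """Serialize the fields that are set as ITS 2.0 HTML attributes."""
        if self.confidence is not None and not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must lie between 0 and 1")
        pairs = [
            ("its-ta-confidence", self.confidence),
            ("its-ta-class-ref", self.class_ref),
            ("its-ta-ident-ref", self.ident_ref),
        ]
        return " ".join(
            f'{name}="{escape(str(value), quote=True)}"'
            for name, value in pairs
            if value is not None
        )


def annotate(text: str, annotation: TextAnalysisAnnotation) -> str:
    """Wrap `text` in a <span> that carries the annotation attributes."""
    attrs = annotation.to_attributes()
    open_tag = f"<span {attrs}>" if attrs else "<span>"
    return f"{open_tag}{escape(text)}</span>"


if __name__ == "__main__":
    innsbruck = TextAnalysisAnnotation(
        ident_ref="http://dbpedia.org/page/Innsbruck",
        class_ref="http://nerd.eurecom.fr/ontology#Location",
        confidence=0.63,
    )
    print(annotate("Innsbruck", innsbruck))
```

Keeping every field optional mirrors the fact that a tool may know the identifier of an entity without having a type or a confidence score for it.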
ITS 2.0
Rendered HTML: Welcome to Innsbruck in Austria!
HTML with ITS metadata:
<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <h2 translate="yes">Welcome to <span its-ta-ident-ref="http://dbpedia.org/page/Innsbruck" its-within-text="yes" translate="no">Innsbruck</span> in <b translate="no" its-within-text="yes">Austria</b>!</h2>
  </body>
</html>
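To show how a consumer might read such markup, here is a minimal sketch that collects the text and IRI of every element carrying `its-ta-ident-ref`, using Python's standard `html.parser`. The class and function names are hypothetical, and the sketch only looks at local attributes; it does not implement ITS global rules or inheritance.

```python
from html.parser import HTMLParser

# Elements that never have an end tag; they must not be pushed on the stack.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
}


class ItsIdentRefExtractor(HTMLParser):
    """Collect (text, IRI) pairs for elements that carry its-ta-ident-ref."""

    def __init__(self):
        super().__init__()
        self.results = []  # finished (text, iri) pairs
        self._stack = []   # one [iri_or_None, text_parts] entry per open element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return
        self._stack.append([dict(attrs).get("its-ta-ident-ref"), []])

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS or not self._stack:
            return
        iri, parts = self._stack.pop()
        text = "".join(parts)
        if iri is not None:
            self.results.append((text, iri))
        # The text of a nested element also belongs to its parent.
        if self._stack:
            self._stack[-1][1].append(text)

    def handle_data(self, data):
        if self._stack:
            self._stack[-1][1].append(data)


def extract_ident_refs(html_text):
    """Return a list of (annotated text, entity IRI) pairs found in html_text."""
    parser = ItsIdentRefExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.results


if __name__ == "__main__":
    sample = (
        '<h2 translate="yes">Welcome to '
        '<span its-ta-ident-ref="http://dbpedia.org/page/Innsbruck" '
        'its-within-text="yes" translate="no">Innsbruck</span> in '
        '<b translate="no" its-within-text="yes">Austria</b>!</h2>'
    )
    print(extract_ident_refs(sample))
```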
ITS 2.0
• Conversion to NIF [2]:
  • XML or HTML documents that contain ITS metadata can be converted to an RDF representation based on NIF.
  • The conversion algorithm that generates NIF consists of seven steps. Its output uses the ITS RDF ontology [7].
  • The conversion to NIF is a possible basis for a natural language processing (NLP) application that creates, for example, named entity annotations.
  • How to integrate the RDF annotations back into the original input document is described in [6] (NIF2ITS).
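The following sketch gives a rough idea of what the RDF output looks like. It is not the seven-step algorithm from the specification: it takes the plain text and a list of already extracted annotations, and writes Turtle that addresses substrings by character offsets. The base IRI `http://example.com/doc.html` and the function names are assumptions made for illustration, and the literals are left untyped for simplicity. The property and class names come from the NIF Core Ontology [3] and the ITS RDF ontology [7].

```python
NIF = "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#"
ITSRDF = "http://www.w3.org/2005/11/its/rdf#"


def _escape(value):
    """Escape a string for use inside a double-quoted Turtle literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def to_nif_turtle(base_iri, text, annotations):
    """Return Turtle describing `text` and its annotated substrings.

    annotations: list of (surface_form, entity_iri) pairs. Only the first
    occurrence of each surface form is annotated (a simplification).
    """
    context = f"<{base_iri}#char=0,{len(text)}>"
    lines = [
        f"@prefix nif: <{NIF}> .",
        f"@prefix itsrdf: <{ITSRDF}> .",
        "",
        f"{context} a nif:Context ;",
        f'    nif:isString "{_escape(text)}" .',
    ]
    for surface, iri in annotations:
        begin = text.find(surface)
        if begin < 0:
            raise ValueError(f"{surface!r} does not occur in the text")
        end = begin + len(surface)
        lines += [
            "",
            f"<{base_iri}#char={begin},{end}> a nif:String ;",
            f"    nif:referenceContext {context} ;",
            f"    nif:beginIndex {begin} ;",
            f"    nif:endIndex {end} ;",
            f'    nif:anchorOf "{_escape(surface)}" ;',
            f"    itsrdf:taIdentRef <{iri}> .",
        ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_nif_turtle(
        "http://example.com/doc.html",  # hypothetical document IRI
        "Welcome to Innsbruck in Austria!",
        [("Innsbruck", "http://dbpedia.org/page/Innsbruck")],
    ))
```

Addressing substrings by offsets is what lets the annotations live outside the document while still pointing at an exact span of its text.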
NLP Interchange Format (NIF)
• NIF is an RDF/OWL-based format that aims to achieve interoperability between Natural Language Processing (NLP) tools, language resources and annotations.
• NIF will soon be a normative part of ITS 2.0.
• NIF and its community project NLP2RDF serve as an umbrella project liaising with other communities of practice, especially:
  • LOD2 FP7 EU project
  • MultilingualWeb-LT Working Group
  • Best Practices for Multilingual Linked Open Data Community Group
  • Ontology-Lexica Community Group
  • Named Entity Recognition and Disambiguation (NERD)
  • Ontologies of Linguistic Annotation (OLiA)
  • University of Leipzig
How is it different to Microdata annotations?
What is the latitude and longitude of the <span ?=?>Empire State Building</span>?
Microdata + schema.org:
<div itemscope itemtype="http://schema.org/Place">What is the latitude and longitude of the <span itemprop="name">Empire State Building</span>?</div>
ITS 2.0 + DBpedia resource:
<span its-ta-ident-ref="http://live.dbpedia.org/page/Empire_State_Building">Empire State Building</span>
How is it different to Microdata annotations?
What is the latitude and longitude of the <span ?=?>Empire State Building</span>?
• Semantics of Microdata annotations: specify the type of information that is presented.
• Semantics of ITS 2.0 annotations: specify entity identifiers (IRIs) for the presented information item.
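The difference can be made concrete by looking at what a parser is able to read from each variant of the Empire State Building sentence. The sketch below is an assumption-laden illustration with hypothetical names (`AnnotationSummary`, `summarize`); it only collects the `itemtype` and `itemprop` attributes of Microdata and the `its-ta-ident-ref` attribute of ITS 2.0.

```python
from html.parser import HTMLParser


class AnnotationSummary(HTMLParser):
    """Record what Microdata and ITS 2.0 attributes say about a fragment."""

    def __init__(self):
        super().__init__()
        self.types = []        # Microdata itemtype IRIs: what kind of thing
        self.properties = []   # Microdata itemprop names: which property is shown
        self.identifiers = []  # ITS 2.0 its-ta-ident-ref IRIs: which exact thing

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if attributes.get("itemtype"):
            self.types.append(attributes["itemtype"])
        if attributes.get("itemprop"):
            self.properties.append(attributes["itemprop"])
        if attributes.get("its-ta-ident-ref"):
            self.identifiers.append(attributes["its-ta-ident-ref"])


def summarize(html_text):
    """Return the collected types, properties and identifiers as a dict."""
    parser = AnnotationSummary()
    parser.feed(html_text)
    parser.close()
    return {
        "types": parser.types,
        "properties": parser.properties,
        "identifiers": parser.identifiers,
    }


if __name__ == "__main__":
    microdata = (
        '<div itemscope itemtype="http://schema.org/Place">What is the latitude '
        'and longitude of the <span itemprop="name">Empire State Building</span>?</div>'
    )
    its = (
        'What is the latitude and longitude of the <span its-ta-ident-ref='
        '"http://live.dbpedia.org/page/Empire_State_Building">Empire State Building</span>?'
    )
    print(summarize(microdata))  # a type and a property, but no entity identifier
    print(summarize(its))        # an entity identifier, but no type or property
```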
Hands-on / Demo
• HTML with ITS metadata
• Transformation of HTML with ITS metadata to NIF
Notes:
• Based on the XSLT files shared by the W3C Working Group member Felix Sasaki (@fsasaki) [4]
• Java's built-in XSLTC processor fails to compile these XSLT files; use Saxon 9 HE instead.
References
[1] W3C semantic web list thread: http://lists.w3.org/Archives/Public/semantic-web/2013Apr/0218.html
[2] ITS 2.0 W3C working draft: http://www.w3.org/TR/its20/
[3] NIF Core Ontology: http://persistence.uni-leipzig.org/nlp2rdf/
[4] Felix Sasaki, ITS 2.0 extractor (GitHub): https://github.com/fsasaki/its20-extractor
[5] W3C, Localization vs. Internationalization: http://www.w3.org/International/questions/qa-i18n
[6] W3C, Conversion NIF2ITS: http://www.w3.org/TR/its20/#nif-backconversion
[7] W3C, ITS 2.0 / RDF Ontology: http://www.w3.org/2005/11/its/rdf-content/its-rdf.html
[8] Named Entity Recognition and Disambiguation (NERD): http://nerd.eurecom.fr/ontology
[9] WordNet Search 3.1: http://wordnetweb.princeton.edu/perl/webwn