
“How to put an annotation in HTML?”

Learn how to add annotations in HTML using ITS 2.0 and NIF for automated language processing. Explore its benefits for localization, language technologies, and internationalization.

vickiedukes

Presentation Transcript


  1. “How to put an annotation in HTML?” Ioannis Stavrakantonakis

  2. Outline • Research question • ITS 2.0 • NIF • What about Microdata? • Demo • References

  3. Research question
     We want to annotate Springfield with an URI to make sure that the computer understands we mean the Springfield in Massachusetts.
     HTML:
     <p>It is well known, that Springfield has mild summers and short, but hard winters.</p>
     HTML with annotation (something like that):
     <p>It is well known, that <span about="http://sws.geonames.org/4951788/">Springfield</span> has mild summers and short, but hard winters.</p>
     We don't want to add whole triples, but just annotate the HTML and say "this element refers to the following URI".
     From: Denny Vrandečić
     Sent: Wednesday, April 24, 2013 1:59 PM
     To: semantic-web at W3C
     Subject: How to put an annotation in HTML?

  4. ITS 2.0
     • Internationalization Tag Set (ITS) [2]
       • enhances the foundation for integrating automated processing of human language into core Web technologies;
       • focuses on HTML and XML-based formats in general, and can leverage processing based on the XML Localization Interchange File Format (XLIFF) as well as the Natural Language Processing Interchange Format (NIF);
       • is a technology for adding metadata to Web content, for the benefit of localization, language technologies, and internationalization (see [5] for more on localization (l10n) and internationalization (i18n)).

  5. ITS 2.0
     • Potential users of ITS [2]:
       • Schema developers starting a schema from the ground up (proposals for attribute and element names to be included in their new schema)
       • Schema developers working with an existing schema (should check whether their schemas support the markup proposed in this specification and, where appropriate, add the proposed markup to their schema)
       • Vendors of content-related tools (e.g. tools for authoring, translation, etc.)
       • Content producers (may use it to mark up specific bits of content)
       • Machine translation systems
       • Text analytics (automatically generated metadata for improving localization, data integration or knowledge management workflows)
       • Localization workflow managers

  6. ITS 2.0
     The Text Analysis use case:
     • This data category is used to annotate content with lexical or conceptual information for the purpose of contextual disambiguation.
     • 3 pieces of annotation:
       • Confidence: the confidence of the agent that produced the annotation in its own computation. XSD double data type (e.g. 0.63)
       • Entity type: the type of entity, or concept class, of the text analysis target. IRI (e.g. http://nerd.eurecom.fr/ontology#Location [8])
       • Entity identifier: a unique identifier for the text analysis target. IRI or string (e.g. http://dbpedia.org/page/Innsbruck or the identifier for “Capital” from WordNet [9])
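The three pieces of annotation above map onto local HTML attributes in ITS 2.0. The following is a minimal sketch, not part of the original slides: the class `TextAnalysisAnnotation` and its method `to_span` are hypothetical names chosen for illustration, and the attribute names (`its-ta-ident-ref`, `its-ta-class-ref`, `its-ta-confidence`, `its-annotators-ref`) reflect our reading of the ITS 2.0 specification [2]. Our understanding is that a confidence value should be accompanied by annotator information, which the sketch emits only when it is supplied.

```python
from dataclasses import dataclass
from html import escape
from typing import Optional


@dataclass
class TextAnalysisAnnotation:
    """Hypothetical container for the three Text Analysis values."""
    ident_ref: Optional[str] = None     # entity identifier (IRI)
    class_ref: Optional[str] = None     # entity type (IRI)
    confidence: Optional[float] = None  # agent confidence, assumed 0..1
    annotator: Optional[str] = None     # IRI of the tool (assumption)

    def to_span(self, text: str) -> str:
        """Wrap `text` in a <span> carrying ITS 2.0 local attributes."""
        attrs = []
        if self.ident_ref is not None:
            attrs.append(("its-ta-ident-ref", self.ident_ref))
        if self.class_ref is not None:
            attrs.append(("its-ta-class-ref", self.class_ref))
        if self.confidence is not None:
            if not 0.0 <= self.confidence <= 1.0:
                raise ValueError("confidence must be between 0 and 1")
            attrs.append(("its-ta-confidence", repr(float(self.confidence))))
        if self.annotator is not None:
            # Value format "text-analysis|IRI" as we read the spec.
            attrs.append(("its-annotators-ref",
                          "text-analysis|" + self.annotator))
        rendered = "".join(
            f' {name}="{escape(value, quote=True)}"' for name, value in attrs
        )
        return f"<span{rendered}>{escape(text)}</span>"


if __name__ == "__main__":
    annotation = TextAnalysisAnnotation(
        ident_ref="http://dbpedia.org/page/Innsbruck",
        class_ref="http://nerd.eurecom.fr/ontology#Location",
    )
    print(annotation.to_span("Innsbruck"))
```

Keeping the three values optional mirrors the slide: an annotation may carry only an identifier, only a type, or any combination.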

  7. ITS 2.0
     Rendered HTML: Welcome to Innsbruck in Austria!
     HTML with ITS metadata:
     <html xmlns="http://www.w3.org/1999/xhtml"><body><h2 translate="yes">Welcome to <span its-ta-ident-ref="http://dbpedia.org/page/Innsbruck" its-within-text="yes" translate="no">Innsbruck</span> in <b translate="no" its-within-text="yes">Austria</b>!</h2></body></html>
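To show how a consumer might read this markup back, here is a small sketch that collects every element carrying `its-ta-ident-ref` together with its text. It uses only Python's standard `html.parser`; the class name `ItsIdentRefExtractor` and the function `extract_ident_refs` are hypothetical, and the sketch ignores global ITS rules and inheritance, which a conforming ITS processor would have to handle.

```python
from html.parser import HTMLParser


class ItsIdentRefExtractor(HTMLParser):
    """Collects (text, IRI) pairs for elements with its-ta-ident-ref."""

    def __init__(self):
        super().__init__()
        # One entry per open element: (tag, ident-ref or None, text buffer).
        self._open = []
        self.annotations = []

    def handle_starttag(self, tag, attrs):
        ident = dict(attrs).get("its-ta-ident-ref")
        self._open.append((tag, ident, []))

    def handle_data(self, data):
        # Text belongs to every open annotated element, including ancestors.
        for _, ident, buffer in self._open:
            if ident is not None:
                buffer.append(data)

    def handle_endtag(self, tag):
        # Close the innermost open element with this name; anything opened
        # after it (e.g. a void element such as <br>) is closed implicitly.
        for i in range(len(self._open) - 1, -1, -1):
            if self._open[i][0] == tag:
                for _, ident, buffer in reversed(self._open[i:]):
                    if ident is not None:
                        self.annotations.append(("".join(buffer), ident))
                del self._open[i:]
                break


def extract_ident_refs(html_text):
    """Return a list of (annotated text, entity IRI) pairs."""
    parser = ItsIdentRefExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.annotations


if __name__ == "__main__":
    sample = (
        '<h2 translate="yes">Welcome to <span '
        'its-ta-ident-ref="http://dbpedia.org/page/Innsbruck" '
        'its-within-text="yes" translate="no">Innsbruck</span> in '
        '<b translate="no" its-within-text="yes">Austria</b>!</h2>'
    )
    for text, iri in extract_ident_refs(sample):
        print(text, "->", iri)
```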

  8. ITS 2.0
     • Conversion to NIF [2]:
       • Converts XML or HTML documents that contain ITS metadata into an RDF representation based on NIF.
       • The conversion algorithm that generates NIF consists of seven steps. Its output uses the ITS RDF ontology [7].
       • The conversion to NIF is a possible basis for a natural language processing (NLP) application that creates, for example, named entity annotations.
       • How to integrate the RDF annotations back into the original input document is described in [6] (NIF2ITS).
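As a rough illustration of what the converted output can look like, the sketch below emits Turtle for a single annotated string. It is not the seven-step algorithm from the specification: it assumes the plain text has already been extracted, it locates the annotated string by simple search, and the function name `annotation_to_nif` and the base URI `http://example.com/doc.html` are hypothetical. The `nif:` and `itsrdf:` property names follow our reading of the NIF Core Ontology [3] and the ITS RDF ontology [7].

```python
NIF = "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#"
ITSRDF = "http://www.w3.org/2005/11/its/rdf#"


def _turtle_literal(value):
    """Quote a string as a Turtle literal with basic escaping."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def annotation_to_nif(base_uri, text, surface, ident_ref):
    """Return Turtle describing `surface` inside `text` (first occurrence)."""
    begin = text.find(surface)
    if begin < 0:
        raise ValueError("annotated string not found in text")
    end = begin + len(surface)
    # Offset-based fragment identifiers; offsets count Unicode code points.
    context = f"<{base_uri}#char=0,{len(text)}>"
    entity = f"<{base_uri}#char={begin},{end}>"
    lines = [
        f"@prefix nif: <{NIF}> .",
        f"@prefix itsrdf: <{ITSRDF}> .",
        "",
        context,
        "    a nif:Context , nif:RFC5147String ;",
        f"    nif:isString {_turtle_literal(text)} .",
        "",
        entity,
        "    a nif:RFC5147String ;",
        f"    nif:anchorOf {_turtle_literal(surface)} ;",
        f'    nif:beginIndex "{begin}" ;',
        f'    nif:endIndex "{end}" ;',
        f"    nif:referenceContext {context} ;",
        f"    itsrdf:taIdentRef <{ident_ref}> .",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(annotation_to_nif(
        "http://example.com/doc.html",  # hypothetical document URI
        "Welcome to Innsbruck in Austria!",
        "Innsbruck",
        "http://dbpedia.org/page/Innsbruck",
    ))
```

The entity resource points back to the whole text through `nif:referenceContext`, which is what lets other NLP tools attach further annotations to the same offsets.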

  9. NLP Interchange Format (NIF)
     • NIF is an RDF/OWL-based format that aims to achieve interoperability between Natural Language Processing (NLP) tools, language resources and annotations.
     • NIF will soon be a normative part of ITS 2.0.
     • NIF and its community project NLP2RDF serve as an umbrella project liaising with other communities of practice, especially:
       • LOD2 FP7 EU project
       • MultilingualWeb-LT Working Group
       • Best Practices for Multilingual Linked Open Data Community Group
       • Ontology-Lexica Community Group
       • Named Entity Recognition and Disambiguation (NERD)
       • Ontologies of Linguistic Annotation (OLiA)
       • University of Leipzig

  10. How is it different to Microdata annotations?
      What is the latitude and longitude of the <span ?=?>Empire State Building</span>?
      Microdata + schema.org:
      <div itemscope itemtype="http://schema.org/Place"> What is the latitude and longitude of the <span itemprop="name">Empire State Building</span>? </div>
      ITS 2.0 + DBpedia resource:
      <span its-ta-ident-ref="http://live.dbpedia.org/page/Empire_State_Building">Empire State Building</span>

  11. How is it different to Microdata annotations?
      What is the latitude and longitude of the <span ?=?>Empire State Building</span>?
      • Semantics of Microdata annotations: specify the type of information that is presented.
      • Semantics of ITS 2.0 annotations: specify entity identifiers (IRIs) for the presented information item.

  12. Hands-on / Demo
      • HTML with ITS metadata
      • Transformation of HTML with ITS metadata to NIF
      Notes:
      • Based on the XSLT files shared by the W3C Working Group member Felix Sasaki (@fsasaki) [4]
      • The Java internal XSLTC processor fails to compile the XSLTs. Use Saxon 9 HE.

  13. References
      [1] W3C semantic web list thread: http://lists.w3.org/Archives/Public/semantic-web/2013Apr/0218.html
      [2] ITS 2.0 W3C working draft: http://www.w3.org/TR/its20/
      [3] NIF Core Ontology: http://persistence.uni-leipzig.org/nlp2rdf/
      [4] Felix Sasaki ITS 2.0 extractor (github): https://github.com/fsasaki/its20-extractor
      [5] W3C, Localization vs. Internationalization: http://www.w3.org/International/questions/qa-i18n
      [6] W3C, Conversion NIF2ITS: http://www.w3.org/TR/its20/#nif-backconversion
      [7] W3C, ITS 2.0 / RDF Ontology: http://www.w3.org/2005/11/its/rdf-content/its-rdf.html
      [8] Named Entity Recognition and Disambiguation (NERD): http://nerd.eurecom.fr/ontology
      [9] WordNet Search 3.1: http://wordnetweb.princeton.edu/perl/webwn
