NLP Interchange Format (NIF). Presented by: Swaran Lata. Email: slata@mit.gov.in. Dated: 1st March 2013.
Paradigm shift in the evolution of the Internet • “Internet is the network of networks.” • Stages: Web 1.0, Web 2.0, Web 3.0
Web 1.0 • The first stage was linking web pages and sharing information through them • The concept of the hyperlink was introduced in 1993 • Characteristics • Personal web pages • Static web pages • HTML-based sites • HTML forms sent via email • Use of framesets • The main type of connection was dial-up, with about 50 kbit/s of bandwidth • Read-only content • E.g. YouTube (business paradigm shift on the web) • Rebecca Black and Justin Bieber became international stars overnight • Dhanush’s Kolaveri Di became an international hit
Web 1.0 era (figure): portals, directories, static HTML web pages, content management systems, Netscape
Web 2.0 • Web 1.0 graduated into Web 2.0 during 2003-06 • Web 2.0 is about user-generated content and the read-write web: people consume as well as contribute information, for example through blogs • Concept of the “prosumer”, i.e. minimal differentiation between producer and consumer of content • Examples • Social networking sites • Blogs • Wikis • Video-sharing sites • Hosted services • Web applications • Mashups • Folksonomies
Web 2.0 era (figure): RSS feeds
Web 3.0 • Will be a metaverse • Will be a web development layer with characteristics such as: • TV-quality open video • 3D simulations • Augmented reality • Human-constructed semantic standards • Pervasive broadband, wireless, and sensors • A time when “the internet swallows the television.” • Web 3.0 will allow users to sit back and let the Internet do all of the work for them
Web 3.0 (Contd..) • Web 3.0 Technologies (Semantic Web) Includes 1. Artificial intelligence 2. Automated reasoning 3. Cognitive architecture 4. Composite applications 5. Distributed computing 6. Knowledge representation 7. Ontology (computer science) 8. Recombinant text 9. Scalable vector graphics 10. Semantic Web 11. Semantic Wiki 12. Software agents
Web 3.0 era (figure): cloud, ontologies, better search engines, SPARQL, RDF, Linked Data, machine-readable data
What is the Semantic Web • A web of data • The Semantic Web is an extension of the current Web • It provides well-defined information, enabling computers and people to work in cooperation • A framework for sharing and reusing data • Correlation of data with real-world objects
Important components of the Semantic Web • Major components: • Resource Description Framework (RDF) • Web Ontology Language (OWL) • Linked Data • Vocabulary • SPARQL • Simple Knowledge Organization System (SKOS)
Resource Description Framework (RDF) • A data model, commonly serialized as XML (RDF/XML), used to describe resources • Resources can include entities, concepts, properties and relations • Captures the metadata about the “externals” of a document • Data can be described using a serialized model, RDF triples, special notation, or graphs
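As a minimal sketch of the triple view of RDF, the snippet below stores statements as (subject, predicate, object) tuples and prints them in N-Triples syntax. The example.org URIs and the helper names (Literal, Triple, to_ntriples) are illustrative assumptions, not part of the slides; only the Dublin Core property URIs are real vocabulary terms.

```python
from typing import NamedTuple


class Literal(NamedTuple):
    """A plain string value, as opposed to a URI."""
    value: str


class Triple(NamedTuple):
    subject: str    # URI of the resource being described
    predicate: str  # URI of the property
    obj: object     # either a URI (str) or a Literal


def to_ntriples(t: Triple) -> str:
    """Serialize one triple as a single N-Triples line."""
    if isinstance(t.obj, Literal):
        escaped = t.obj.value.replace("\\", "\\\\").replace('"', '\\"')
        obj = f'"{escaped}"'
    else:
        obj = f"<{t.obj}>"
    return f"<{t.subject}> <{t.predicate}> {obj} ."


DOC = "http://example.org/slides/nif"  # hypothetical document URI

triples = [
    Triple(DOC, "http://purl.org/dc/terms/title",
           Literal("NLP Interchange Format (NIF)")),
    Triple(DOC, "http://purl.org/dc/terms/creator",
           "http://example.org/people/swaran-lata"),  # hypothetical person URI
]

for t in triples:
    print(to_ntriples(t))
```

Each printed line is one statement; a set of such lines forms a graph whose nodes are the subjects and objects.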
Web Ontologies (OWL) • An ontology is an explicit specification of a conceptualization. • An ontology consists of a set of axioms which place constraints on sets of individuals (called "classes") and the types of relationships permitted between them. • OWL is used to define and instantiate Web ontologies. • OWL is a family of knowledge representation languages for authoring ontologies. • OWL differs from an XML schema in that it is a knowledge representation, not a message format. • Documents from different domains can be merged together to answer a user query.
Linked Data and its components • Linked Data describes a method of publishing structured data that makes it more useful and easier to understand. • Linked Data publishes data on the web in such a way that it is machine-readable. • Linked Data may be as diverse as databases maintained by two organisations in different geographical locations, or heterogeneous systems within one organisation that have not easily interoperated at the data level. Components: • URIs are used to identify things. • Use HTTP URIs so that these things can be referred to and looked up ("dereferenced") by people and user agents. • Provide useful information about the thing in standard formats such as RDF/XML. • Include links to other, related URIs to improve discovery of other related information on the Web.
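The first three components above can be sketched with the Python standard library: check that an identifier is an HTTP URI, then prepare a lookup that asks for RDF/XML through content negotiation. The URI and the function names are hypothetical, and the request is only constructed here, not sent.

```python
from urllib.parse import urlparse
from urllib.request import Request


def is_http_uri(uri: str) -> bool:
    """True if the identifier can be looked up over HTTP(S)."""
    parts = urlparse(uri)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def build_lookup_request(uri: str) -> Request:
    """Prepare (but do not send) a request that asks for RDF/XML."""
    if not is_http_uri(uri):
        raise ValueError(f"not an HTTP URI: {uri}")
    return Request(uri, headers={"Accept": "application/rdf+xml"})


thing = "http://example.org/id/hindi-wordnet"  # hypothetical URI for a "thing"
req = build_lookup_request(thing)
print(req.full_url, req.get_header("Accept"))
```

Passing the request to urllib.request.urlopen would perform the actual dereferencing; whether RDF/XML comes back depends on the server.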
Linked Open Data (LOD2) Technology • The LOD2 stack is an integrated distribution of aligned tools which support the life-cycle of Linked (Open) Data, from extraction and authoring/creation, through enrichment, interlinking and fusing, to visualization and maintenance. The life-cycle comprises in particular the following stages: • Extraction of RDF from text, XML and SQL • Querying and exploration using SPARQL • Authoring of Linked Data using a Semantic Wiki • Semi-automatic link discovery between Linked Data sources • Knowledge-base enrichment and repair
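To give a feel for the querying stage, here is a rough sketch of the idea behind a single SPARQL triple pattern. It is plain Python over in-memory tuples, not SPARQL and not part of the LOD2 stack; the ex: names and the match function are made up for illustration. None plays the role of a query variable.

```python
from typing import Iterable, Iterator, Optional, Tuple

Triple = Tuple[str, str, str]


def match(triples: Iterable[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> Iterator[Triple]:
    """Yield every triple that agrees with the fixed positions of the pattern."""
    for t in triples:
        if ((s is None or t[0] == s)
                and (p is None or t[1] == p)
                and (o is None or t[2] == o)):
            yield t


data = [
    ("ex:doc1", "rdf:type", "ex:Document"),
    ("ex:doc1", "ex:mentions", "ex:SemanticWeb"),
    ("ex:doc2", "ex:mentions", "ex:SemanticWeb"),
]

# Roughly: SELECT ?s WHERE { ?s ex:mentions ex:SemanticWeb }
subjects = sorted(t[0] for t in match(data, p="ex:mentions", o="ex:SemanticWeb"))
print(subjects)
```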
Linked Open Data (LOD2) Project • NLP2RDF is a LOD2 project that is developing the NLP Interchange Format (NIF). • NIF aims to achieve interoperability between Natural Language Processing (NLP) tools, language resources and annotations. • The output of NLP tools can be converted into RDF and used in the LOD2 stack.
What is NIF • The NLP Interchange Format (NIF) is an RDF/OWL-based format that allows several Natural Language Processing (NLP) tools to be combined and chained in a flexible, light-weight way. The core of NIF consists of three parts: 1. A set of URI recipes, used to create unique and potentially stable URIs to anchor annotations in documents. 2. A vocabulary, which can represent Strings, Words and Sentences as RDF resources. 3. Transformations for the programmatic usage of the Ontologies of Linguistic Annotations (OLiA).
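A URI recipe can be pictured as a function from a document URI and a character range to a fragment identifier. The sketch below assumes an offset-based recipe of the form offset_<begin>_<end>_<URL-encoded start of the string>; that layout, the 20-character prefix length and the function name are assumptions made for illustration and should be checked against the NIF specification.

```python
from urllib.parse import quote


def offset_uri(doc_uri: str, text: str, begin: int, end: int,
               prefix_len: int = 20) -> str:
    """Build a fragment URI that anchors text[begin:end] in the document."""
    if not 0 <= begin < end <= len(text):
        raise ValueError("offsets must satisfy 0 <= begin < end <= len(text)")
    # A short, URL-encoded excerpt keeps the URI readable for humans.
    anchor = quote(text[begin:end][:prefix_len])
    return f"{doc_uri}#offset_{begin}_{end}_{anchor}"


text = "Semantic Web technologies"
print(offset_uri("http://example.org/doc.txt", text, 0, 12))
```

Because the URI is derived only from the document URI and the offsets, two tools that annotate the same span independently arrive at the same identifier.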
Important Components of NIF • Structural Interoperability: URI recipes are used to anchor annotations in documents with the help of fragment identifiers. The URI recipes are complemented by two ontologies (String Ontology and Structured Sentence Ontology), which are used to describe the basic types of these URIs (i.e. String, Document, Word, Sentence) as well as the relations between them. • Conceptual Interoperability: The Structured Sentence Ontology (SSO) was especially developed to connect existing ontologies with the String Ontology and thus attach common annotations to the text fragment URIs. The NIF ontology can easily be extended and integrates several NLP ontologies. • Access Interoperability: A REST interface description for NIF components and web services allows NLP tools to interact on a programmatic level.
NIF – Integration Architecture (figure): three NLP tools, each with a NIF wrapper; RDF model; WordNet
Associated Standards • Web Ontology Language (OWL) • NLP • Linked Data • RDF
How NIF helps meet the NLP requirements of the Web • All URIs created by the mentioned URI recipes should be typed with the respective OWL class. • In each returned NIF model there should be at least one URI that relates to the document as a whole. • Every other annotated String should be related to the URI given to the Document with a property that is a subproperty of str:subString. • For each annotation, a reference model should be used, so that the annotations are machine-interpretable.
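The first three requirements can be read as checks over a set of triples. The sketch below is one possible reading, under stated assumptions: triples are plain tuples with prefixed names, the link is written as (document, str:subString, string), and subproperties of str:subString are not resolved. The function name and the example URIs are hypothetical, and the fourth requirement (reference models) is not checked.

```python
RDF_TYPE = "rdf:type"
DOCUMENT = "str:Document"
SUBSTRING = "str:subString"


def check_nif_model(triples):
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    typed = {s for s, p, _ in triples if p == RDF_TYPE}
    documents = {s for s, p, o in triples if p == RDF_TYPE and o == DOCUMENT}
    # Every subject, plus everything pointed at by str:subString, is a string URI.
    strings = ({s for s, _, _ in triples}
               | {o for _, p, o in triples if p == SUBSTRING})

    # Requirement 1: every URI carries a type.
    for uri in sorted(strings - typed):
        problems.append(f"{uri} has no rdf:type")
    # Requirement 2: at least one URI stands for the whole document.
    if not documents:
        problems.append("no URI is typed as the document")
    # Requirement 3: every other string is linked to the document.
    linked = {o for s, p, o in triples if p == SUBSTRING and s in documents}
    for uri in sorted(strings - documents - linked):
        problems.append(f"{uri} is not linked to the document via {SUBSTRING}")
    return problems


doc = "ex:doc#offset_0_12"   # hypothetical URI for the whole text
word = "ex:doc#offset_0_8"   # hypothetical URI for one word
good = [
    (doc, RDF_TYPE, DOCUMENT),
    (word, RDF_TYPE, "str:String"),
    (doc, SUBSTRING, word),
]
print(check_nif_model(good))
print(check_nif_model(good[:2]))
```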
How NLP tools are integrated with NIF models • An NLP tool can be integrated with NIF if an adapter is created that is able to parse a NIF model into the tool's internal data structure and to output its results as a NIF serialization. An NLP pipeline can then be formed by either: • Passing the NIF RDF model from tool to tool • Passing the text to each tool and then merging the NIF outputs into one larger model. The URI recipes of NIF are designed to make it possible to have zero overhead and use only one triple per annotation.
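The second pipeline style (run each tool on the text, then merge) can be sketched with two toy "tools" that each return a set of triples. Everything here is illustrative: the tools, the simplified offset-style URIs, and the ex:capitalized property are assumptions, not real NIF wrappers.

```python
import re

DOC = "http://example.org/doc.txt"  # hypothetical document URI


def uri(begin: int, end: int) -> str:
    """Simplified offset-style URI for the span [begin, end)."""
    return f"{DOC}#offset_{begin}_{end}"


def tokenizer(text: str) -> set:
    """Toy tool 1: types every whitespace-delimited token as a word."""
    return {(uri(m.start(), m.end()), "rdf:type", "sso:Word")
            for m in re.finditer(r"\S+", text)}


def capital_tagger(text: str) -> set:
    """Toy tool 2: flags capitalized tokens with a made-up property."""
    return {(uri(m.start(), m.end()), "ex:capitalized", "true")
            for m in re.finditer(r"\b[A-Z]\w*", text)}


text = "NIF links tools"
# Merging is a set union: both tools derive the same URI for the same span,
# so their annotations end up attached to the same resource.
model = tokenizer(text) | capital_tagger(text)
for triple in sorted(model):
    print(triple)
```

Each tool contributes exactly one triple per annotation, and no extra bookkeeping is needed to align the two outputs.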
The structure of WordNet (figure): classes wn:Word, wn:WordSense and wn:Synset; properties wn:hasSense, wn:inSynset, wn:lexicalForm and rdf:type. Examples shown: संज्ञा (Noun) with lexical form बातचीत (conversation); क्रिया (Verb) with lexical form कर्म (action). Word senses are related to other word senses (e.g. antonym); synsets are related to other synsets (e.g. hypernym, hyponym).
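The word, word sense and synset structure can be written down as triples. This is a sketch only: the namespace, the identifiers and the direction of wn:hasSense (word to sense) and wn:inSynset (sense to synset) are assumptions read off the property names, not taken from a published Hindi WordNet schema.

```python
BASE = "http://example.org/hindi-wordnet/"  # hypothetical namespace


def word_to_triples(word_id: str, lexical_form: str,
                    sense_id: str, synset_id: str) -> list:
    """Describe one word, one of its senses, and the synset of that sense."""
    word, sense, synset = BASE + word_id, BASE + sense_id, BASE + synset_id
    return [
        (word, "rdf:type", "wn:Word"),
        (word, "wn:lexicalForm", lexical_form),
        (word, "wn:hasSense", sense),
        (sense, "rdf:type", "wn:WordSense"),
        (sense, "wn:inSynset", synset),
        (synset, "rdf:type", "wn:Synset"),
    ]


# Identifiers below are invented for the example.
triples = word_to_triples("word-baatcheet", "बातचीत",
                          "sense-baatcheet-1", "synset-noun-1")
for t in triples:
    print(t)
```

Relations such as antonym (between word senses) or hypernym and hyponym (between synsets) would be further triples between the sense or synset URIs.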
How WordNet is related to the Semantic Web/RDF (figure): Hindi WordNet, a database of lexical and semantic relations between Hindi words, shown together with RDF, OWL and Linked Data
XML Model • XML is a tree-structured document • Nodes • Element nodes • Children can be ordered • Recursive elements (parts under parts) • Attribute nodes • Mandatory or optional • Edges • Sub-element edges • Attribute edges • IDRef edges • Constraints • References • Value restrictions, OneOf • Cardinality • Trees are more flexible than tables • Any number of nodes can be added anywhere without breaking the model
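The tree model can be tried out with Python's xml.etree.ElementTree. The element and attribute names below are invented for the example; note that ElementTree treats id and ref as ordinary attributes and does not validate IDRef edges or other constraints.

```python
import xml.etree.ElementTree as ET

# Recursive elements (parts under parts), attributes, and an IDRef-style link.
doc = ET.fromstring(
    '<assembly id="a1">'
    '<part id="p1" name="frame"><part id="p2" name="bolt"/></part>'
    '<part id="p3" name="wheel" ref="p1"/>'
    '</assembly>'
)


def walk(node, depth=0):
    """Yield (depth, tag, attributes) for every element, in document order."""
    yield depth, node.tag, dict(node.attrib)
    for child in node:
        yield from walk(child, depth + 1)


# A new node can be added without touching the existing ones.
ET.SubElement(doc, "part", id="p4", name="seat")

for depth, tag, attrs in walk(doc):
    print("  " * depth + tag, attrs)
```

The children of the root keep their order (p1, p3, p4), which is the "children can be ordered" property from the slide.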
Future work • Converting WordNet to RDF format • WordNet with other ontologies, e.g. Library, ISSN • Matching of WordNet vis-à-vis a generic ontology • Proliferation of the Semantic Web/Linked Data through creating awareness • Development of Semantic Web/Linked Data for use in WordNet • Exploring opportunities for implementing the Semantic Web in Indian languages
Thanks & Questions • slata@mit.gov.in • 91-11-24301272