
Towards Data Driven Publishing Leveraging Knowledge Graphs and Text Analytics Contech: 2018

This article explores the use of knowledge graphs and text analytics in data-driven publishing, focusing on the Ontotext Platform. Learn how knowledge graphs enhance traditional relational databases and how text analytics can provide semantic disambiguation and annotation. Discover how the combination of graph-based reasoning and vector space similarity can improve content search, recommendation, and relevance ranking. Gain insights into dynamic semantic publishing and its benefits for content curation, search, recommendation, and workflow optimization.




Presentation Transcript


  1. Towards Data Driven Publishing Leveraging Knowledge Graphs and Text Analytics Contech: 2018 Jem Rayfield, November, 2018

  2. Outline • From: Unstructured Ambiguous Content • Knowledge Graphs • Ontotext Platform • To: Data driven publishing

  3. How can I get OP? From: Unstructured Ambiguous Content

  4. [Figure: two alternative parse trees (nodes S, NP, VP, PP over the tags Adj N V P N) for the ambiguous sentence "Stolen painting found by tree"]

  5. Knowledge and Graphs

  6. Traditional relational databases only store information...

  7. Graphs treat the connections between information with equal importance.

  8. Knowledge graphs represent information in a manner similar to how a human understands information.

  9. Ontotext GraphDB uses graph statements to reason and infer additional knowledge, and vector space indices for similarity.

  10. Graph: Reasoning & Inference • DATA (RDF): S = Berners-Lee, P = type, O = Person • KNOWLEDGE (ONTOLOGY): S = Person, P = subClassOf, O = Mammal • NEW implied DATA (RDF): S = Berners-Lee, P = type, O = Mammal
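
The inference on this slide can be sketched as a tiny forward-chaining loop over triples. This is a toy illustration of the subclass rule only, not GraphDB's rule engine; the function name `infer_types` and the plain-string triples are assumptions made for the example.

```python
# Toy forward chaining over (subject, predicate, object) triples.
# Assumption: plain strings stand in for RDF IRIs; this is not GraphDB's engine.
Triple = tuple[str, str, str]


def infer_types(triples: set[Triple]) -> set[Triple]:
    """Apply (x type C) and (C subClassOf D) => (x type D) until nothing new appears."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        for s, p, o in list(inferred):
            if p != "type":
                continue
            for s2, p2, o2 in list(inferred):
                if p2 == "subClassOf" and s2 == o:
                    new = (s, "type", o2)
                    if new not in inferred:
                        inferred.add(new)
                        changed = True
    return inferred


data = {("Berners-Lee", "type", "Person")}          # DATA (RDF)
ontology = {("Person", "subClassOf", "Mammal")}     # KNOWLEDGE (ONTOLOGY)

# Keep only the statements that were not asserted explicitly.
implied = infer_types(data | ontology) - data - ontology
print(implied)
```

Running it leaves a single implied statement, the "Berners-Lee type Mammal" triple shown on the slide.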

  11. Graph & Vector Space; Entity Awareness, Similarity +

  12. Big Knowledge Graphs: Provide Awareness • Important airports near London? • Most popular banks in UK • People mentioned together with Apple in the news

  13. Vector Space: Similarity & Concordance • Find similar content • Find similar concepts and link • Find relevant concepts for content

  14. GraphDB Vector Space: Similarity & Concordance • Documents Annotated With Graph Ids (urn:Car, urn:Make, urn:Engine, urn:Tires, urn:Model, urn:SUV, urn:MLModel, urn:Markov) • Vector Space Index • Similarity • [Figure: binary incidence matrix with one row per graph id and one 0/1 column per document; the individual cell values did not survive extraction]
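
A minimal sketch of the idea on this slide: treat each document as a binary vector over the graph ids it is annotated with, and rank other documents by cosine similarity. The grouping of ids into `doc1` to `doc3` is hypothetical (the slide's matrix is not recoverable), and this is plain Python, not GraphDB's similarity index.

```python
import math

# Hypothetical documents, each annotated with a set of graph ids.
DOCS: dict[str, set[str]] = {
    "doc1": {"urn:Car", "urn:Make", "urn:Engine", "urn:Tires"},
    "doc2": {"urn:Car", "urn:SUV", "urn:Tires", "urn:Make"},
    "doc3": {"urn:MLModel", "urn:Markov", "urn:Model"},
}


def cosine(a: set[str], b: set[str]) -> float:
    """Cosine similarity of two binary vectors represented as sets of ids."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def most_similar(doc_id: str) -> list[tuple[str, float]]:
    """Rank every other document by similarity to doc_id, best first."""
    scores = [
        (other, cosine(DOCS[doc_id], ids))
        for other, ids in DOCS.items()
        if other != doc_id
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


ranked = most_similar("doc1")
print(ranked)
```

Here the two car-related documents share three of four ids and rank together, while the machine-learning document shares none.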

  15. Ontotext Platform

  16. Analyses content • Content → Text Analytics API → Concept Suggestions, Classification, Sentiment, Relationships, ...

  17. TA: Vocabulary Aware Semantic Disambiguation • Operations: Annotate Content, Get Suggestions • Input: "Apple CEO Tim Cook was at a conference with the CEO of Samsung. Tim explained how smart phones are changing the consumer electronics market." • NLP Pipeline: Language Detection, POS, Disambiguation, Suggestions, ... • Vocabulary Gazetteer (Dynamic Vocabulary from GraphDB): Apple : Organisation; Tim Cook : Person, CEO; Tim Cook : Person, Footballer; Samsung : Organisation; ... • Entity Detection from Vocab: Apple : Organisation; Tim Cook : Person, CEO; Tim Cook : Person, Footballer; Samsung : Organisation • Vocabulary Disambiguation (ML Model) • Relevance Ranking (Statistical): 87% - Tim Cook : Person, CEO; 68% - Apple : Organisation; 56% - Samsung : Organisation; ...
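
The gazetteer lookup and disambiguation steps can be illustrated with a toy example. Everything here is an assumption for illustration: the slide names an ML model for disambiguation, whereas this sketch swaps in a simple count of context cue words, and the `CUES` sets and the `annotate` function are hypothetical, not part of any Ontotext API.

```python
import re

# Vocabulary entries taken from the slide: surface form -> candidate (type, role) pairs.
GAZETTEER = {
    "Apple": [("Organisation", None)],
    "Samsung": [("Organisation", None)],
    "Tim Cook": [("Person", "CEO"), ("Person", "Footballer")],
}

# Hypothetical context cues standing in for the slide's ML disambiguation model.
CUES = {
    "CEO": {"ceo", "conference", "market", "electronics"},
    "Footballer": {"goal", "match", "club", "league"},
}


def annotate(text: str) -> list[dict]:
    """Find gazetteer entries in text and pick the candidate with most context cues."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    annotations = []
    for surface, candidates in GAZETTEER.items():
        match = re.search(re.escape(surface), text)
        if match is None:
            continue
        # Ties (including single candidates) resolve to the first candidate listed.
        best = max(candidates, key=lambda c: len(CUES.get(c[1], set()) & tokens))
        annotations.append({
            "surface": surface,
            "type": best[0],
            "role": best[1],
            "start": match.start(),
            "end": match.end(),
        })
    return annotations


TEXT = ("Apple CEO Tim Cook was at a conference with the CEO of Samsung. "
        "Tim explained how smart phones are changing the consumer electronics market.")
by_surface = {a["surface"]: a for a in annotate(TEXT)}
print(by_surface["Tim Cook"])
```

With the slide's sentence, the business-related words in the context select the CEO reading of "Tim Cook" over the footballer.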

  18. Automated (Governed) Machine Learning • Text Analytics (Machine Learnt Model) → suggest → Curation (Accept | Reject | Modify) → modify corpus → Gold Standard Corpus [W3C Open Annotation] → moderate → Re-train → update model → load → Text Analytics

  19. Annotates content with knowledge • Content → Open Annotation API → Content + Semantic Fingerprint

  20. Annotation Graph (Content, Annotation, Vocabulary) • [Figure: graph linking a Content node to three Annotation nodes and to Vocabulary concepts. Each Annotation has a target, a tag, a relevance (56%, 68%, 87%) and textpos:123,142. Concept nodes: Samsung, Apple, Tim Cook, Organisation, Person, USA, NASDAQ, Computer Hardware. Edge labels: mentions, about, type, location, competitor, exchange, sector, ceo, target, tag]
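
One way to picture the annotation records in this graph is as small typed objects. This is a sketch under stated assumptions: the field names mirror the slide's edge labels, the `urn:` identifiers are made up, the pairing of relevance scores with concepts follows slide 17, and reading a "semantic fingerprint" as the relevance-ordered list of tags is an interpretation, not a definition from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    target: str                 # the content item being annotated
    tag: str                    # the vocabulary concept it points to
    relevance: float            # 0.0 to 1.0
    text_pos: tuple[int, int]   # character offsets in the content


# Hypothetical identifiers; relevance and textpos values as shown on the slides.
ANNOTATIONS = [
    Annotation("urn:content:1", "urn:Samsung", 0.56, (123, 142)),
    Annotation("urn:content:1", "urn:Apple", 0.68, (123, 142)),
    Annotation("urn:content:1", "urn:TimCook", 0.87, (123, 142)),
]


def semantic_fingerprint(annotations, target, min_relevance=0.0):
    """Return the tags for one content item, most relevant first."""
    ranked = sorted(annotations, key=lambda a: a.relevance, reverse=True)
    return [a.tag for a in ranked
            if a.target == target and a.relevance >= min_relevance]


fingerprint = semantic_fingerprint(ANNOTATIONS, "urn:content:1")
strong_only = semantic_fingerprint(ANNOTATIONS, "urn:content:1", min_relevance=0.6)
print(fingerprint, strong_only)
```

A relevance threshold such as `min_relevance` is one simple way to keep weak mentions out of search and recommendation.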

  21. Understands content • [Figure: Content nodes linked into the Knowledge Graph. Nodes: Content, Apple, Samsung, Tim Cook, Computer Hardware, NASDAQ, USA, UK. Edge labels: about, mentions, ceo, industry, exchange, headquarters, located in]

  22. Understands users • [Figure: User Data linked into the Knowledge Graph. Nodes: User, Apple Inc, Samsung, Tim Cook, Computer Hardware, NASDAQ, USA, UK. Edge labels: interested in, lives in, employed by, ceo, industry, exchange, headquartered in, located in]

  23. Captures behaviour • Events → Event API → User Event Index

  24. Understands behaviour • User Behaviour: concept:follow, content:view, content:scroll, content:dwell • Social Behaviour: tweet:view, hashtag:follow

  25. Mine social behaviour • Events → Social API → User Event Index → User Behaviour

  26. Behavioral + Contextual recommendation Behavioral similarity Reads
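
A rough sketch of how behavioural and contextual signals might be blended into one recommendation score. The slide does not describe the actual method, so the Jaccard measures, the `alpha` weight, and all article, user and concept data below are assumptions chosen for illustration.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap of two sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical behavioural data: which users read which article.
READERS = {
    "article1": {"u1", "u2"},
    "article2": {"u1", "u2", "u3"},
    "article3": {"u3"},
}

# Hypothetical contextual data: concepts each article is annotated with.
CONCEPTS = {
    "article1": {"Apple", "Tim Cook"},
    "article2": {"Samsung"},
    "article3": {"Apple", "Tim Cook"},
}


def recommend(item: str, alpha: float = 0.5) -> list[tuple[str, float]]:
    """Rank other articles by alpha * behavioural + (1 - alpha) * contextual similarity."""
    scores = {}
    for other in READERS:
        if other == item:
            continue
        behavioural = jaccard(READERS[item], READERS[other])
        contextual = jaccard(CONCEPTS[item], CONCEPTS[other])
        scores[other] = alpha * behavioural + (1 - alpha) * contextual
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


blended = recommend("article1")
behaviour_only = recommend("article1", alpha=1.0)
print(blended, behaviour_only)
```

With equal weights the article sharing the same concepts ranks first even though nobody has co-read it yet; with `alpha=1.0` the ranking falls back to co-readership alone.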

  27. Increased Engagement • User Behaviour + Social Behaviour + User Data Knowledge Graph + Content Knowledge Graph = Increased Engagement

  28. Architecture • [Figure: architecture diagram. Inputs: Unstructured Content, Structured Reference data, Users + Events. Components: Text Analytics, Knowledge Graph, Annotation, Semantic Fingerprint, User Events. OP APIs: Content, Concordance, Search, Recommendation. Tools & Visualisations]

  29. To: Data driven publishing

  30. Dynamic Semantic Publishing • Authoring: rapid, high-value, lower-cost content curation; capture knowledge and meaning as re-usable data • Search & Discovery: unambiguous semantic search; recommendation and similarity • Product: re-purpose and aggregate with business context; generate new revenue streams

  31. Enhanced Publishing Workflow • Stages: Authoring → Editorial → Production → Delivery • Capabilities: Discover Related Content; Annotate With Concepts & Relations; Add references; Add Context; Organise & Improve Workflow; Content Transformation; Link to products & archive; Domain Modelled IA; Dynamic Data driven products; Contextual Semantic Search; Recommend Related Content; Personalised Content Streams

  32. DSP - BBC Sport • Goals • Create a dynamic semantic publishing platform that assembles web pages on-the-fly using a variety of data sources • Deliver highly relevant data to web site visitors with sub-second response "The goal is to be able to more easily and accurately aggregate content, find it and share it across many sources. From these simple relationships and building blocks you can dynamically build up incredibly rich sites and navigation on any platform." John O’Donovan, Chief Technical Architect, BBC

  33. The IET • Goals • Manageable, discoverable, searchable; Journals, research papers and articles • Semantic search using existing taxonomies • Intelligent citations and data provenance • Automated, dynamic repurposing of content assets • Enable new revenue opportunities

  34. Thank you! Experience the technology with our demonstrators • NOW: Semantic News Portal http://now.ontotext.com • RANK: News popularity ranking for companies http://rank.ontotext.com • FactForge: Knowledge graph of linked open data and news about People and Organizations http://factforge.net
