Linked Enterprise Data

Linked Enterprise Data: Leveraging the Semantic Web stack in a corporate environment. ISWC 2012 – Boston. Fabrice LACROIX – lacroix@antidot.net. Antidot – who we are: French-based software vendor since 1999 | Paris, Lyon, Aix-en-Provence | Information access | Data management

Presentation Transcript


  1. Linked Enterprise Data: Leveraging the Semantic Web stack in a corporate environment • ISWC 2012 – Boston • Fabrice LACROIX – lacroix@antidot.net

  2. Antidot – who we are • French-based Software Vendor • Since 1999 | Paris, Lyon, Aix-en-Provence • Information access | Data management • Mission: Provide our customers with innovative customizable solutions that help them create value with their data, and make their employees more aware and efficient.

  3. Clients • Enterprises • Publishing • E-commerce • Healthcare

  4. Unstructured documents • files, ECM, collaborative spaces • intranet, extranet, Web sites • e-mails, instant messaging

  5. Structured data • CRM, ERP, directory • knowledge bases • business applications (production, support)

  6. Information systems (IS) are bloated • 1 practice => 1 need => 1 application => 1 silo • The information system is driven by processes • Data is abundant, heterogeneous and scattered

  7. Solutions or workarounds? • BI • MDM • SOA • Search

  8. Solutions and workarounds • Enterprise Search brings little value to users • Document oriented • Does not solve real business problems • (Google-like, Verity-like)

  9. What we want

  10. What we want • ERP • CRM • Production • LDAP • ECM • Support • Files

  11. Changing the paradigm • Switching from an application-centric view to a data-centric way of thinking.

  12. Bring out the implicit • Build the Giant Enterprise Graph

  13. LED • Linked Enterprise Data: the application of Semantic Web technologies and Linked Data principles to the enterprise infrastructure

  14. What works for the Web… • Federating silos on the Web http://www.w3.org/People/Ivan/CorePresentations/RDFTutorial/Slides.html#(102)

  15. …can’t always be used in corporate IS • Legacy apps can’t be "SPARQL’ed" • 80% of the data is unstructured or semi-structured and doesn’t fit the model as such • Defining vocabularies/ontologies for every silo is too complex and expensive • Users don’t want RDF per se, but valuable information • External data is available as XML/JSON through Web services • Staff are trained for RDBs, XML and Web apps • Risk-averse, stability-first strategy: SemWeb technology is considered new and immature

  16. The RDF/storage approach • Setting up a global RDF repository does not work either • IT departments are put off by the "RDF everywhere" activists

  17. Semantic Web technology is still the right solution in a corporate environment, BUT it is not an end in itself: JUST use it as a means

  18. Just do it • Think of it as a stream paradigm • build new objects using existing data • without interfering with the existing infrastructure • with SemWeb somewhere under the hood

  19. Enterprise Graph HowTo • Construct the graph • generate triples from data • create triples from documents • Leverage the graph • enrich • infer • Browse the graph • select resources • build objects • Trash the graph

  20. How: extract & normalize • Harvest and normalize • as in an ETL • fetch, clean, transform… • normalize records (names, IDs) to prepare the linking step • For databases • db2triples: an RDB2RDF implementation by Antidot (open source, W3C validated)
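The harvest-and-normalize step can be illustrated with a minimal Python sketch. It is not db2triples' code: it assumes a hypothetical base IRI `http://example.com/`, shows a simple name normalizer (accents, case, whitespace) that prepares records for linking, and a simplified take on the W3C Direct Mapping (one subject IRI per row, one predicate per column, NULL cells skipped) that ignores datatypes and foreign keys.

```python
import json
import re
import unicodedata
from urllib.parse import quote

# Hypothetical base IRI for generated resources.
BASE = "http://example.com/"
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"


def normalize_name(name: str) -> str:
    """Normalize a name so records coming from different silos can be matched."""
    # Decompose accented characters (é -> e + accent) and drop the accents.
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Collapse runs of whitespace and ignore case.
    return re.sub(r"\s+", " ", without_accents).strip().lower()


def row_to_triples(table: str, pk: str, row: dict) -> list:
    """Simplified Direct Mapping: one subject IRI per row, one predicate per column."""
    t = quote(table, safe="")
    subject = f"<{BASE}{t}/{quote(pk, safe='')}={quote(str(row[pk]), safe='')}>"
    triples = [(subject, RDF_TYPE, f"<{BASE}{t}>")]
    for column, value in row.items():
        if value is None:  # NULL cells produce no triple
            continue
        predicate = f"<{BASE}{t}#{quote(column, safe='')}>"
        # Every value becomes a plain literal; json.dumps escapes quotes and newlines.
        triples.append((subject, predicate, json.dumps(str(value), ensure_ascii=False)))
    return triples
```

Joining each tuple with spaces and appending " ." gives N-Triples lines ready to load; only the normalized keys (names, IDs) need to agree across silos for the later linking step to work.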

  21. How: semantize • Don’t transform everything into RDF • cherry-pick a subset of interesting fields for each object and create their RDF triple counterparts • interesting == needed for linking or inferring

  22. How: semantize • Triples generation • Be smart: avoid upfront ontology design, use small vocabularies • Be pragmatic: transform XML tags and field names to predicates • Be agile: only insert what you need. And when you need more, add more. • Semantic Web fuels the modeling, linking and information building process
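A minimal sketch of slides 21 and 22, assuming records arrive as XML: only a whitelist of fields (the hypothetical `INTERESTING` set) is turned into triples, and each tag name becomes a predicate in a small, hypothetical `http://example.com/vocab#` namespace instead of a predesigned ontology.

```python
import xml.etree.ElementTree as ET

# Hypothetical small vocabulary: tag names are reused as predicate local names.
VOCAB = "http://example.com/vocab#"

# Only fields needed for linking or inferring; extend this set when needs grow.
INTERESTING = {"customerId", "name", "accountManager"}


def semantize(xml_record: str, subject: str) -> list:
    """Turn the interesting child elements of an XML record into (s, p, o) triples."""
    root = ET.fromstring(xml_record)
    triples = []
    for child in root:
        if child.tag in INTERESTING and child.text and child.text.strip():
            triples.append((subject, VOCAB + child.tag, child.text.strip()))
    return triples


record = (
    "<customer><customerId>C42</customerId><name>ACME Corp</name>"
    "<notes>long free text</notes><accountManager>jdoe</accountManager></customer>"
)
for triple in semantize(record, "http://example.com/customer/C42"):
    print(triple)
```

Here `notes` stays out of the graph; being agile means adding it to `INTERESTING` only when a link or rule actually needs it.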

  23. Enterprise Graph HowTo • Construct the graph • generate triples from data • create triples from documents • Leverage the graph • enrich • infer • Browse the graph • select resources • build objects • Trash the graph

  24. How: semantize • Unstructured documents • Extract metadata and transform it into RDF as needed • Ex: author => dc:creator • Use text mining to extract named entities: people, organizations, products… • generate the entity lists from the data sources: directory for employees, CRM for companies and people, ERP for products • create triples like <doc_URI> quotes <entity_URI>
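The entity extraction above can be approximated with a dictionary (gazetteer) lookup whose labels come from the directory, CRM and ERP, as the slide suggests. This is a minimal sketch, not real text mining: matching is exact and case-insensitive, `dc:creator` is the Dublin Core element, and the `quotes` predicate and all example URIs are hypothetical.

```python
import re

DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"
QUOTES = "http://example.com/vocab#quotes"  # hypothetical predicate

# Document metadata field -> RDF predicate (e.g. author => dc:creator).
METADATA_MAP = {"author": DC_CREATOR}


def semantize_document(doc_uri: str, metadata: dict, text: str, gazetteer: dict) -> list:
    """gazetteer maps an entity label (from directory, CRM, ERP...) to its URI."""
    triples = []
    for field, value in metadata.items():
        if field in METADATA_MAP:
            triples.append((doc_uri, METADATA_MAP[field], value))
    for label, entity_uri in gazetteer.items():
        # Whole-label, case-insensitive match that is not part of a longer word.
        pattern = r"(?<!\w)" + re.escape(label) + r"(?!\w)"
        if re.search(pattern, text, re.IGNORECASE):
            triple = (doc_uri, QUOTES, entity_uri)
            if triple not in triples:  # several labels may point to one entity
                triples.append(triple)
    return triples
```

Because the gazetteer is generated from the enterprise's own sources, every match lands directly on an existing resource URI, which is what makes the `quotes` triples linkable.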

  25. How: semantize • Unstructured documents • Compare documents using various dedicated algorithms • is the same • is included • is similar • is related • Generate new triples • create triples like <docA> is_sub_version_of <docB>
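Document comparison can be sketched with word shingles (runs of k consecutive words). This is one possible technique, not necessarily the one used in the talk: identical normalized text gives "is the same", shingle-set inclusion gives "is included", and a Jaccard similarity above an arbitrary threshold gives "is similar". The `ex:` predicates are hypothetical, and "is related" is not covered.

```python
EX = "http://example.com/vocab#"  # hypothetical predicates


def shingles(words: list, k: int) -> set:
    """Set of k-word sequences; short texts yield a single shingle."""
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def compare(uri_a: str, text_a: str, uri_b: str, text_b: str, k: int = 3, threshold: float = 0.5):
    """Return one triple describing how document A relates to document B, or None."""
    words_a, words_b = text_a.lower().split(), text_b.lower().split()
    if words_a == words_b:
        return (uri_a, EX + "isSameAs", uri_b)
    sh_a, sh_b = shingles(words_a, k), shingles(words_b, k)
    if sh_a and sh_a <= sh_b:
        return (uri_a, EX + "isIncludedIn", uri_b)
    if sh_b and sh_b <= sh_a:
        return (uri_b, EX + "isIncludedIn", uri_a)
    union = sh_a | sh_b
    if union and len(sh_a & sh_b) / len(union) >= threshold:
        return (uri_a, EX + "isSimilarTo", uri_b)
    return None
```

Comparing every pair is quadratic; at scale, techniques such as MinHash are commonly used to find candidate pairs first, but the triples produced keep the same shape.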

  26. Enterprise Graph HowTo • Construct the graph • generate triples from data • create triples from documents • Leverage the graph • enrich • infer • Browse the graph • select resources • build objects • Trash the graph

  27. How: enrich • Enrich the graph • run specific algorithms to generate more links and triples (classifiers, topic detection, …) • insert external data gathered from Linked Open Data (LOD) or other external datasets or APIs
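As a stand-in for the classifiers and topic detectors mentioned above, here is a deliberately naive keyword-count tagger. The topic URIs, keyword lists, `hasTopic` predicate and `min_hits` threshold are all hypothetical; a real deployment would use a trained classifier, but the output (new triples added to the graph) has the same shape.

```python
from collections import Counter

HAS_TOPIC = "http://example.com/vocab#hasTopic"  # hypothetical predicate

# Hypothetical topics and their trigger keywords.
TOPIC_KEYWORDS = {
    "http://example.com/topic/billing": {"invoice", "payment", "billing", "refund"},
    "http://example.com/topic/support": {"ticket", "bug", "incident", "outage"},
}


def detect_topics(doc_uri: str, text: str, min_hits: int = 2) -> list:
    """Tag a document with every topic whose keywords occur at least min_hits times."""
    tokens = Counter(w.strip(".,;:!?()\"'").lower() for w in text.split())
    triples = []
    for topic, keywords in TOPIC_KEYWORDS.items():
        hits = sum(tokens[k] for k in keywords)
        if hits >= min_hits:
            triples.append((doc_uri, HAS_TOPIC, topic))
    return triples


print(detect_topics("urn:doc:1", "Refund requested: the invoice was paid twice, see ticket 12."))
# -> [('urn:doc:1', 'http://example.com/vocab#hasTopic', 'http://example.com/topic/billing')]
```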

  28. How: infer • Create new knowledge • add rules according to your needs, e.g. IF a coworker is quoted in documents AND this coworker belongs to a business unit THEN the business unit is bound to those documents
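The example rule can be run as a tiny forward-chaining loop over a set of triples. This sketch assumes hypothetical `ex:quotes`, `ex:memberOf` and `ex:isBoundTo` predicates; in a triplestore the same rule could be written as the SPARQL query `CONSTRUCT { ?unit ex:isBoundTo ?doc } WHERE { ?doc ex:quotes ?person . ?person ex:memberOf ?unit }`.

```python
QUOTES, MEMBER_OF, BOUND_TO = "ex:quotes", "ex:memberOf", "ex:isBoundTo"  # hypothetical


def infer(graph: set) -> set:
    """IF ?doc quotes ?person AND ?person memberOf ?unit THEN ?unit isBoundTo ?doc.

    Repeats until no new triple appears (a fixpoint). With this single rule one
    pass suffices, but the loop is what an engine with several rules needs.
    """
    inferred = set(graph)  # never modify the caller's graph
    while True:
        units = {}
        for s, p, o in inferred:
            if p == MEMBER_OF:
                units.setdefault(s, set()).add(o)
        new = {
            (unit, BOUND_TO, doc)
            for doc, p, person in inferred
            if p == QUOTES
            for unit in units.get(person, ())
        }
        if new <= inferred:
            return inferred
        inferred |= new
```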

  29. Enterprise Graph HowTo • Construct the graph • generate triples from data • create triples from documents • Leverage the graph • enrich • infer • Browse the graph • select resources • build objects • Trash the graph

  30. How: build • Build • select resources corresponding to object seeds (using SPARQL queries) • for each seed, follow the relevant links to create basic objects
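The build step, sketched over an in-memory set of triples: `select_seeds` plays the role of the SPARQL seed query, and `build_object` follows only a whitelist of predicates up to a fixed depth, which is one simple reading of following links "smartly". The predicate names and the JSON-like object layout (`@id` plus one list per predicate) are assumptions.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"


def select_seeds(graph, rdf_class):
    """Equivalent of: SELECT ?s WHERE { ?s rdf:type <rdf_class> }"""
    return sorted(s for s, p, o in graph if p == RDF_TYPE and o == rdf_class)


def build_object(graph, seed, follow=frozenset(), max_depth=2):
    """Gather the seed's properties, expanding only predicates listed in `follow`."""
    index = defaultdict(list)
    for s, p, o in graph:
        index[s].append((p, o))

    def expand(node, depth):
        obj = {"@id": node}
        for p, o in sorted(index.get(node, [])):
            # Expanding a link nests the target's own properties; the depth limit stops cycles.
            value = expand(o, depth + 1) if p in follow and depth < max_depth else o
            obj.setdefault(p, []).append(value)
        return obj

    return expand(seed, 0)
```

For instance, seeding on customers and following only `ex:accountManager` yields one object per customer with its account manager's details nested inside.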

  31. How: build • Finalize • decorate the new knowledge objects with data kept apart (not loaded into the triplestore) • now we have rich, user-actionable objects
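The finalize step amounts to merging each built object with data that was never loaded into the triplestore. In this sketch the side data is a hypothetical dict keyed by resource URI (think free text, figures or contact details fetched from the source application), and values coming from the graph take precedence over side values.

```python
def finalize(obj: dict, side_data: dict) -> dict:
    """Recursively attach side data (keyed by '@id') to a built object."""
    result = {}
    for key, values in obj.items():
        if isinstance(values, list):
            # Nested objects produced by link expansion are decorated too.
            result[key] = [finalize(v, side_data) if isinstance(v, dict) else v for v in values]
        else:
            result[key] = values
    # Side data never overrides what came from the graph.
    for key, value in side_data.get(obj.get("@id"), {}).items():
        result.setdefault(key, value)
    return result
```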

  32. Enterprise Graph HowTo • Construct the graph • generate triples from data • create triples from documents • Leverage the graph • enrich • infer • Browse the graph • select resources • build objects • Trash the graph

  33. How: expose • Make the new information available to users and to the entire IS • [Diagram labels: Harvest, Normalize, Semantize, Enrich, Classify, Annotate, Indexation; Relational DB, RDF triplestore (Linked Data), AFS search engine]

  34. Conclusion • It works! • The triples we create and the inference rules we add are dictated by the goal / application • usage and value oriented • We benefit from the lazy-flexible-dynamic modeling of RDF-RDFS-OWL • we are agile • What matters is the graph. But the graph is not the triplestore • storage independent

  35. There’s an app for that • Antidot Information Factory • a software solution designed specifically to leverage structured and unstructured data • enable large-scale processing of existing data • automate publishing of enriched or newly created information • Harvest, Normalize, Semantize, Enrich, Build, Expose

  36. The Giant Enterprise Graph • Now we have a path to let SemWeb enter the enterprise

  37. Discuss Understand Learn Exchange www.antidot.net info@antidot.net Thanks for your attention QUESTIONS?
