Semantics for Archives & Records Management at OECD

The Semantically Enriched Archivist. Semantics for Archives & Records Management at OECD. 45th ICA / SIO Conference, Brussels, 22 May 2019. As archivists we often face issues…

voorhees

Presentation Transcript


  1. The Semantically Enriched Archivist: Semantics for Archives & Records Management at OECD. 45th ICA / SIO Conference, Brussels, 22 May 2019

  2. As archivists we often face issues… Performance: http://www.riannegroen.com/sven-sachsalber-at-palais-de-tokyo.html (…by the way, it took the artist 18 hours to find the needle…)

  3. because information without context… …is like a fish without water

  4. Solution = Context + Structure. Well, that’s exactly the Archivist’s bread and butter,

  5. …the Fundamentals of archival description… Provenance, Business Context, Series, Dossier as metadata in… Content Type, Status. So, if we embed the: Principle of Provenance, Principle of Structure

  6. a set of Corporate Taxonomies, we can use them…

  7. to semantically empower our search & discovery!

  8. Yes, but what about the backlog ?????

  9. Manual indexing is no longer an option…

  10. We need robots to help us. But is that possible?

  11. Yes! How ? Through Semantic Analysis

  12. What do the semantic robots do?

  13. Semantic Enrichment = Structure the Unstructured

  14. How do we develop these robots? We develop on a set of test documents (test corpus). We test on the complete corpus and put into production using Web Services. We debug to correct patterns and disambiguate.

  15. Some OECD Archival Examples. Problem 1: We don’t know what type of document it is! Document Type Classification. Problem 2: We don’t have resources to index scanned documents manually! (OCR-ed) Document Indexing. Problem 3: Full text search gives too many results! Topics and Geographical Areas Classification

  16. Solution 1: Document Type Classification. Quality: 95% Precision – 85% Recall. Is this document a Report, an Agenda, an Invoice?
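
For readers less familiar with the two quality figures on this slide: precision is the share of documents labelled with a type that really are of that type, and recall is the share of documents of that type that the classifier actually found. The sketch below computes both for one document type. The function name and the five sample labels are invented for illustration; they are not OECD data and not the robots' actual evaluation code.

```python
def precision_recall(predicted, actual, label):
    """Return (precision, recall) for one document type.

    predicted and actual are equal-length lists of type labels,
    one entry per document (hypothetical evaluation data).
    """
    pairs = list(zip(predicted, actual))
    true_pos = sum(1 for p, a in pairs if p == label and a == label)
    false_pos = sum(1 for p, a in pairs if p == label and a != label)
    false_neg = sum(1 for p, a in pairs if p != label and a == label)
    # Guard against division by zero when a label never occurs.
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall


# Invented sample: what an archivist decided vs. what the robot decided.
actual = ["Report", "Report", "Agenda", "Invoice", "Report"]
predicted = ["Report", "Agenda", "Agenda", "Invoice", "Report"]
precision, recall = precision_recall(predicted, actual, "Report")
print(f"Report: precision={precision:.2f}, recall={recall:.2f}")
```

In this toy sample every document labelled "Report" really is one, but one true report was missed, which is the same pattern as the slide's figures: precision higher than recall.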

  17. Solution 2: (OCR-ed) Document Indexing

  18. (OCR-ed) Document Indexing … • Overall quality is remarkably good BUT… • 100% is not possible • And OCR can be a challenge…

  19. OCR = Problems. We can normalise dates. But titles are more difficult: (in French, lionceau = lion cub…)
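
As an illustration of what "normalising dates" can mean in practice, here is a minimal sketch that turns a few written date forms into ISO 8601 (YYYY-MM-DD). It is an assumption about the general technique, not the OECD robots' actual rules: the function name, the supported patterns, the English and French month lists, and the day-first reading of numeric dates are all choices made for this example.

```python
import re
from datetime import date

# Month names in English and French (assumed working languages for this sketch).
_EN = ["january", "february", "march", "april", "may", "june", "july",
       "august", "september", "october", "november", "december"]
_FR = ["janvier", "février", "mars", "avril", "mai", "juin", "juillet",
       "août", "septembre", "octobre", "novembre", "décembre"]
MONTHS = {name: i for names in (_EN, _FR) for i, name in enumerate(names, start=1)}

# "22 May 2019", "1er février 1962", "3rd March 1975"
_WRITTEN = re.compile(r"(\d{1,2})(?:st|nd|rd|th|er)?\s+([^\W\d_]+)\s+(\d{4})")
# "22/05/2019", "22.05.2019", "22-05-2019" (read as day-first)
_NUMERIC = re.compile(r"(\d{1,2})[./-](\d{1,2})[./-](\d{4})")


def _iso(year, month, day):
    """Return an ISO date string, or None for impossible dates (e.g. 31 February)."""
    try:
        return date(year, month, day).isoformat()
    except ValueError:
        return None


def normalise_date(text):
    """Find the first recognisable date in text and return it as YYYY-MM-DD, else None."""
    match = _WRITTEN.search(text)
    if match:
        day, month_name, year = match.groups()
        month = MONTHS.get(month_name.lower())
        if month:
            return _iso(int(year), month, int(day))
    match = _NUMERIC.search(text)
    if match:
        day, month, year = (int(part) for part in match.groups())
        return _iso(year, month, day)
    return None


print(normalise_date("Brussels, 22 May 2019"))
print(normalise_date("Paris, le 1er février 1962"))
```

Dates lend themselves to this kind of pattern-based clean-up because their forms are limited; titles have no such fixed shape, which is consistent with the slide's remark that they are harder.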

  20. BUT… Our biggest issue is: The « COLLECTION » Stamp

  21. Solution 3: Topics and Geographical Areas Classification • Identify the 15 Best Topics and Geographical Areas using the Central OECD Taxonomies
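
To make "identify the 15 best topics using a taxonomy" concrete, the sketch below scores each taxonomy concept by how often any of its labels occurs in a text and keeps the top N. This is deliberately naive term counting, offered as an assumption about the general idea; it is not the semantic analysis the OECD robots perform, and the four-concept taxonomy, its labels, and the function name are invented for the example.

```python
import re
from collections import Counter

# Hypothetical mini-taxonomy: concept -> labels (preferred term and synonyms).
# The real OECD taxonomies are far larger and are not reproduced here.
TAXONOMY = {
    "Taxation": ["tax", "taxation", "fiscal"],
    "Education": ["education", "school", "pisa"],
    "Trade": ["trade", "tariff", "export"],
    "France": ["france", "french"],
}


def top_concepts(text, taxonomy, n=15):
    """Return up to n (concept, score) pairs, highest score first.

    The score is the number of tokens in the text that match any
    label of the concept (case-insensitive, whole words only).
    """
    token_counts = Counter(re.findall(r"\w+", text.lower()))
    scores = Counter()
    for concept, labels in taxonomy.items():
        score = sum(token_counts[label] for label in labels)
        if score:
            scores[concept] = score
    return scores.most_common(n)


sample = "French tax policy: the fiscal reform lowers the tax on export."
for concept, score in top_concepts(sample, TAXONOMY):
    print(concept, score)
```

Because the scoring works on whatever words survive in the text, a few misread characters do not change the overall ranking much, which may be one reason this kind of classification tolerates OCR noise.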

  22. Topics and Geographical Areas Classification. Works remarkably well… Even on OCR-ed documents!

  23. How do we use all these Metadata?

  24. OECD Taxonomies and Ontologies

  25. NO !

  26. Taxonomies and Ontologies

  27. O.N.E Sight – OECD Semantic Discovery Interface

  28. Architecture: Semantic Layer, Data hub

  29. Multi-view annotation graphs. We tag the same resource in different ways. We can see the same resource in context from different « semantic » viewpoints. We use several semantic robots, based on several different taxonomies (generic, innovation-oriented, etc.)
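
One way to picture multi-view annotation is a store that keeps, for each resource, a separate set of tags per taxonomy "view". The sketch below is only an assumed data structure for illustration: the class name, the view names, the document identifiers and the tags are all hypothetical, and the slides do not describe how the OECD annotation graphs are actually stored.

```python
from collections import defaultdict


class AnnotationStore:
    """Keeps tags per resource and per semantic view (hypothetical design)."""

    def __init__(self):
        # resource id -> view name -> set of concept tags
        self._tags = defaultdict(lambda: defaultdict(set))

    def tag(self, resource_id, view, concept):
        """Record that a robot for the given view tagged the resource with a concept."""
        self._tags[resource_id][view].add(concept)

    def view(self, resource_id, view):
        """Return the tags of one resource as seen from one view, sorted."""
        return sorted(self._tags.get(resource_id, {}).get(view, ()))

    def find(self, view, concept):
        """Return the ids of all resources carrying the concept in the given view."""
        return sorted(
            resource_id
            for resource_id, views in self._tags.items()
            if concept in views.get(view, ())
        )


store = AnnotationStore()
store.tag("doc-1", "generic", "Taxation")
store.tag("doc-1", "innovation", "Digital economy")
store.tag("doc-2", "generic", "Taxation")
print(store.view("doc-1", "generic"))
print(store.view("doc-1", "innovation"))
print(store.find("generic", "Taxation"))
```

Keeping the views separate means the same document can be retrieved through either taxonomy without one robot's tags overwriting another's.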

  30. The OECD Semantic Timeline

  31. Conclusion: By becoming Semantically Enriched Archivists, Librarians or Information Scientists we really have become: Knowledge Gardeners. Semantics are: indispensable for our profession; true enablers for Knowledge Discovery
