
Linked Data beyond 2020

Linked Data beyond 2020: making official statistics discoverable and reusable. Objective of the discussion: the DIME/ITDG is invited to confirm the relevance of Linked Data (LD) for dissemination of official statistics in view of the objectives for the ESP 2021-27.

kirsten

Presentation Transcript


  1. Linked Data beyond 2020: Making official statistics discoverable and reusable

  2. Objective of the discussion
     The DIME/ITDG is invited:
     • To confirm the relevance of Linked Data (LD) for dissemination of official statistics in view of the objectives for the ESP 2021-27
     • To validate the need for a step-wise and community-based approach for taking benefit of LD opportunities and building ESS readiness
     • To discuss and define ambitions for activities at ESS level (see section 4.2)
     • To identify and prioritise actions suited to ESS collaboration (see section 4.3)

  3. Outline of the presentation
     Context • Objective of the discussion • General timeline • Motivation • Technology trends
     ESS “as is” and challenges/opportunities • ESSnet • ESS benchmark • Eurostat and EC
     Moving forward • Objectives • Proposed actions
     Discussion

  4. Context

  5. ESS LOD Timeline (columns: Early days, 2018, 2019, 2020)
     Milestones • Eurostat data sets catalogue published on EU data portal • NUTS as URIs (EU data portal) • First ESS benchmark and framework for deployment • LOD as an action strand of project DIGICOM • LOD Beyond 2020 • End of project report • Launching ESSnet • First Eurostat pilot contract: basic infrastructure • Architecture definition • Second Eurostat pilots: prototyping end user products, semantics thickening • Eurostat metadata architecture projects • Benchmark ESS LOD maturity / attraction
     Drivers • ESS vision 2020 objectives • Explore landscape • Build awareness • Deploy • Realize benefits • Define strategy and action plans • Assess feasibility • Assess benefits • Build capability

  6. Policy context
     • The ESP 2021-2027 is being discussed at ESS level (ESSC, VIG, VIN).
     • The ESSC paper “Beyond ESS vision 2020: implementation of the next multiannual statistical programme 2021-2027” provides a list of domains for future cooperative actions in the ESS.
     • In this context, one of the objectives is “Making it easier for users to access and understand statistics, including by providing attractive and interactive visualisations, more tailored services like on-demand data, and self-service analytics”.
     • Pending further discussion and validation at VIN and ESSC levels, the LD principles and ecosystem can play an important role in operationalizing these objectives.

  7. Context – motivation
     • Make EU statistics more discoverable by mobilising the rich ESS metadata assets to support data search and exploration
     • Bridge EU statistics with the wealth of data collected at NSI and local government level with higher granularity and dimensionality
     • Enable official statistics to be queried together with other data on the Web of Data using open Web standards
     • Increase the visibility of official statistics on the Web of Data, in particular by making them crawlable by (data) search engines
     • Improve the user experience by providing statistical indicators and contextual information across data silos

  8. Context – Linked Data Ecosystem
     • The LD standards support a human-centric way of exploring data sets, because they are based on triples (subject, predicate, object), a structure familiar from natural languages.
     • Publishing as linked data offers a flexible, non-proprietary, machine-programmable means of providing advanced and dynamic querying and visualization capabilities.
     • Statistical data and metadata expressed in RDF become (web) addressable, allowing publishers and third parties to annotate and link to this data/metadata.
     • LD standards are widely used and promoted to expose dataset catalogues (DCAT-AP) on Open Data portals.
     • LD enables combining data across datasets and silos.
     • The LD ecosystem is based on open standards (W3C). Linked datasets have been growing steadily since the mid-2000s; however, the market seems to draw most of its benefits from reduced and targeted implementations of standards such as JSON-LD and schema.org.
     • Among the different target personas, the general public hardly reaps the benefits of LOD because querying is complex. Intuitive interfaces and apps are needed.
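
The triple model mentioned on this slide can be sketched in a few lines of Python. This is only an illustration of the (subject, predicate, object) idea, not how an RDF store is implemented: the `ex:` identifiers, the titles and the `match` helper are assumptions made up for the example, and the prefixes (`rdf:`, `dct:`, `qb:`) are used as plain strings rather than resolved namespaces.

```python
from typing import Iterable, Iterator, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical identifiers and titles, for illustration only.
TRIPLES = [
    ("ex:dataset/ilc_li01", "rdf:type", "qb:DataSet"),
    ("ex:dataset/ilc_li01", "dct:title", "Example poverty data set"),
    ("ex:dataset/ilc_li01", "dct:subject", "ex:concept/poverty"),
    ("ex:dataset/ilc_mdes03", "rdf:type", "qb:DataSet"),
    ("ex:dataset/ilc_mdes03", "dct:subject", "ex:concept/material-deprivation"),
]


def match(
    triples: Iterable[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield triples matching the pattern; None acts as a wildcard."""
    for triple in triples:
        if all(want is None or want == got for want, got in zip((s, p, o), triple)):
            yield triple


# "Which resources are data sets?"
datasets = [s for s, _, _ in match(TRIPLES, p="rdf:type", o="qb:DataSet")]

# "Which resources are about poverty?"
about_poverty = [s for s, _, _ in match(TRIPLES, p="dct:subject", o="ex:concept/poverty")]

print(datasets)
print(about_poverty)
```

Each question is a triple pattern with one or more positions left open, which is the same idea a SPARQL basic graph pattern expresses.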

  9. Directions for development
     Semantic depth: drives the richness of the services/tools that can be built on the basis of the triples. It includes:
     • Reuse of existing reference ontologies to express relationships among data assets
     • Publishing of statistical metadata standards as URIs
     • Linkage to other semantic web resources
     • Encapsulation of expertise in a knowledge graph to support data search and discovery
     • Contribution to Semantic Web, Schema.org and other ontology hubs
     Target and reach out: this dimension defines the different user groups served, for which knowledge background is essential:
     • Data scientists
     • Government officials / policy analysts / data scientists
     • Data journalists
     • Redistributors / multipliers / prosumers / semantic specialists
     • General public, including students
     Interface/Tools: this dimension defines the level of ambition in providing tools to end users and other participants in the ecosystem, in particular:
     • Portals to expose data catalogues and data, e.g. EU Open Data Portal, Google Data Search, LOSD platform
     • SPARQL client and API
     • Search engines and data exploration interfaces
     • Dedicated apps for specific domains realizing data integration

  10. ESS “as is” – challenges and opportunities

  11. ESSnet LOSD

  12. Benchmarking ESS LOS maturity

  13. Readiness for LOD: high awareness but low readiness for adoption
     • Based on the NSIs’ self-assessment, the majority of them (85%) have medium or higher levels of awareness of LOD.
     • However, the NSIs’ readiness to adopt LOD is lower: 45% report a medium level of readiness and only 10% a higher level. INSEE (FR) and DZS (HR) are the NSIs that consider themselves most prepared for LOD adoption.
     • Only 20% of the surveyed NSIs are actively participating in national LOD-related activities and 10% in international LOD activities.
     • Lack of capacity and the prioritisation of other tasks are common obstacles to LOD adoption across the ESS. The general factors blocking the start of an LOD project in these NSIs are mostly lack of staff, limited external interest or an incomplete perception of the user needs related to LOD, and the prioritisation of other activities.

  14. Most important potential benefits: LOD will improve user experience and engage new user groups
     The key potential LOD benefits for the NSIs are:
     • A better user experience due to improved discoverability of data, selected by 80% of the surveyed NSIs
     • The opportunity to engage new user groups (researchers, data journalists, etc.), selected by 75% of the surveyed NSIs
     • More efficient knowledge and metadata management as well as increased reuse of data, selected by 65% of the NSIs

  15. The impact of the ESSnet LOS activities
     • 5 out of the 20 surveyed NSIs are not familiar with ESSnet LOS activities.
     • The use cases ‘NSI Statisticians Publishing LOSD’ and ‘NSI Collaboration with Third Party Partner for Developing Added-Value Services’ are more important for the NSIs than the use case ‘General Public Users Accessing LOSD’.
     • Of the tools for LOD created and used by the ESSnet, the ESSnet recommendations are the ones NSIs are most familiar with. The remaining tools for LOD are not well known by the responding NSIs.

  16. Conclusions from benchmarking: LOD is still in an early adoption stage
     • One key success criterion for the implementation of LOD in the ESS, senior management support, is fulfilled. 60% of respondents reported that their senior management is positive towards LOD. While a number of NSIs report that their management is indifferent towards LOD, not a single NSI reported that its management sees LOD negatively.
     • There is a skill gap that some NSIs will have to close to be able to provide more LOD offerings.
     • At the time of this analysis, LOD has not been adopted widely in the ESS. Of the responding countries, only 20% have operational LOD implementations. A further 20% are planning or currently developing LOD services.
     • This is in line with the adoption of standards for LOD, from URIs to metadata and data management policies: only four of the surveyed NSIs have implemented a URI policy so far.
     • The NSIs primarily ask Eurostat to provide additional training and use cases. There is clearly a demand to justify investment by providing business cases.
     • A recommendation is to prepare LOD success stories, which NSIs can use to prioritise their efforts and to address the right audiences and markets with their LOSD offerings.

  17. Eurostat and EC: Opportunity and challenges

  18. Eurostat as is: Infrastructure
     Eurostat test triple store hosted in the cloud (Amazon), with a SPARQL endpoint and Jupyter notebooks (Python)
     • SPARQL query endpoint: http://63.34.157.226:8890/sparql
     • Hosted on a cloud server
     • Jupyter interactive computing notebook
     • SPARQL kernel for connecting to the SPARQL endpoint
     • SPARQLWrapper (https://github.com/RDFLib/sparqlwrapper) interface in Python
     Content: 45 datasets, 13 dimensions, 17 concepts (LFS, SILC, HOUSING COST)
     Sources (manual import): dissemination database; Metabase; dictionaries; DBpedia and Wikipedia
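
As a rough sketch of how a notebook might talk to such an endpoint, the snippet below builds a SPARQL Protocol GET request with the Python standard library. The slide names SPARQLWrapper as the Python interface; plain `urllib` is swapped in here only to keep the example dependency-free. The query is a generic "count resources per class" example and assumes nothing about the content of the store, and the endpoint address is the one quoted on the slide, which may no longer be reachable.

```python
import json
import urllib.parse
import urllib.request

# Address quoted on the slide; it belonged to a test server and may be offline.
ENDPOINT = "http://63.34.157.226:8890/sparql"

# Generic exploratory query: the ten most frequent classes in the store.
QUERY = """
SELECT ?type (COUNT(?s) AS ?n)
WHERE { ?s a ?type }
GROUP BY ?type
ORDER BY DESC(?n)
LIMIT 10
"""


def build_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build a SPARQL Protocol GET request asking for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def run_query(endpoint: str, query: str, timeout: float = 10.0) -> list:
    """Send the query and return the result bindings (requires network access)."""
    with urllib.request.urlopen(build_request(endpoint, query), timeout=timeout) as resp:
        return json.load(resp)["results"]["bindings"]


# Building the request does not touch the network; run_query() does.
request = build_request(ENDPOINT, QUERY)
print(request.get_method(), request.full_url[:60] + "...")
```

Calling `run_query(ENDPOINT, QUERY)` would return a list of bindings in the standard SPARQL JSON results layout, provided the server is still up.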

  19. Use EC corporate platform (EU portal – VocBench): VocBench for metadata management

  20. Metadata
     • Eurostat's metadata ecosystem consists of several partially linked systems: RAMON/CODED, Statistics Explained, ESS Metadata Handler, dissemination metabase, registry

  21. ESS metadata assets: classifications
     • ~550 correspondence tables, ~90 expressed in a common model
     • ~170 classifications, ~70 expressed in a common model (SDMX)

  22. ESS metadata opportunities
     • Integrate (reorganise) different Eurostat/ESS (SDMX) metadata assets
     • Expose and maintain key metadata as URIs (using existing EC infrastructure: EU Vocabularies portal) in parallel with SDMX web services
     Diagram labels: EU Vocabularies; Eurostat SDMX Registry; OP infrastructure (triple store); enrichment via VocBench

  23. Expected benefits
     • Availability of high-quality, curated URIs representing different classifications. They can be used as a linking hub for other Linked Open Data sources (MS and beyond)
     • Development of ontologies and schemas that can be used beyond official statistics for the description of data and metadata assets

  24. Semantic: from Statistics Explained (https://ec.europa.eu/eurostat/statistics-explained)
     • Semantic description of concepts
     • Related data sources
     • Structured expert domain knowledge organised in a statistical glossary

  25. Towards knowledge graph and natural language discovery
     Bag of words: equivalised disposable income after social transfer at-risk-of-poverty threshold 60 % national median equivalised disposable income after social transfers indicator measure wealth or poverty low income comparison residents country low standard of living share people equivalised disposable income before social transfers below at-risk-of-poverty threshold after social transfers
     Graph nodes (glossary terms and data set codes): Household budget survey • Living conditions • At risk of poverty gap • At risk of poverty or social exclusion • At-risk-of-poverty rate • Persons living in households with low work intensity • Ilc_li01 • Material deprivation • Relative median at-risk-of-poverty gap • Disposable income • Ilc_mdes03 • Relative median income ratio • Equivalised disposable income • Income quintile share ratio
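
One simple way to read the "bag of words" idea on this slide is as word-overlap matching between a user's question and glossary text. The sketch below shows that reading only; the slide does not say how Eurostat's prototype works. The glossary terms are taken from the slide, but the short descriptions attached to them, and the `bag`, `score` and `rank` helpers, are assumptions written for this example.

```python
import re
from collections import Counter

# Terms come from the slide; the descriptions are illustrative stand-ins,
# not official Statistics Explained definitions.
GLOSSARY = {
    "At-risk-of-poverty rate": (
        "share of people with an equivalised disposable income after social "
        "transfers below the at-risk-of-poverty threshold, set at 60 % of the "
        "national median"
    ),
    "Material deprivation": (
        "enforced inability to afford items considered desirable or necessary "
        "for an adequate life"
    ),
    "Income quintile share ratio": (
        "ratio of total income received by the top 20 % of the population to "
        "that received by the bottom 20 %"
    ),
}


def bag(text: str) -> Counter:
    """Lower-case the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def score(query: str, description: str) -> int:
    """Number of tokens shared by the query and the description (multiset overlap)."""
    return sum((bag(query) & bag(description)).values())


def rank(query: str, glossary: dict) -> list:
    """Return glossary terms ordered from best to worst overlap with the query."""
    return sorted(glossary, key=lambda term: score(query, glossary[term]), reverse=True)


best = rank("people with low income below the poverty threshold", GLOSSARY)[0]
print(best)
```

A knowledge graph would go further by following the links between terms and data set codes, which plain word overlap cannot do.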

  26. Moving forward

  27. Lessons learned
     • Obvious benefits of cooperation (given low maturity and front runners)
     • Focus on use cases where linked technologies can make a difference and bring tangible benefits
     Key enablers • URI policy and governance • LOD platform and services (pipeline) • Common reference metadata / vocabularies • Semantic • Training

  28. Operational objectives: taking benefit of LD opportunities and continuing to build ESS readiness
     • To demonstrate and realize some of the key benefits of LD by developing minimum viable products for a limited number of use cases in specific statistical domains, making official statistics more discoverable, more visible and attractive
     • To gradually create the conditions for normalizing and bridging the open data sets published by the ESS
     • To continue increasing awareness and maturity with respect to LD in the ESS
     • To increase the role and the visibility of official statistics in the semantic web community

  29. Proposed approach: a step-wise, agile and community-based approach
     To set up a (virtual) expert group
     • Coordination
     • Communication (newsletter)
     • Maintain an inclusive community with researchers and partners
     • Organize / contribute to events (SemStats, boot camp, datathons, …)
     • Keep track of community developments and identify opportunities
     • Guidance for the implementation of standards (JSON-LD markup, StatDCAT-AP, URIs, …)
     • Supervise ESTP trainings and ESS maturity
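
To give an idea of what the JSON-LD markup guidance might cover, here is a minimal sketch that describes a data set with the schema.org `Dataset` type and wraps it in the script tag that search engines read. The property names (`name`, `description`, `url`, `keywords`, `creator`) are real schema.org properties; the `dataset_jsonld` helper, the example.org address and the descriptive texts are assumptions for illustration and are not taken from any Eurostat page.

```python
import json


def dataset_jsonld(name: str, description: str, url: str, keywords: list) -> dict:
    """Describe a data set using the schema.org Dataset vocabulary."""
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "keywords": keywords,
        "creator": {"@type": "Organization", "name": "Eurostat"},
    }


# Hypothetical values; the URL is a placeholder, not a real landing page.
markup = dataset_jsonld(
    name="Example data set on poverty thresholds",
    description="Illustrative description of an official statistics data set.",
    url="https://example.org/datasets/ilc_li01",
    keywords=["poverty", "income", "living conditions"],
)

# Embedding this tag in the data set's landing page makes it crawlable.
script_tag = (
    '<script type="application/ld+json">\n'
    + json.dumps(markup, indent=2)
    + "\n</script>"
)
print(script_tag)
```

The same dictionary could be extended with further schema.org properties such as `license` or `temporalCoverage` once the publisher has agreed which fields are mandatory.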

  30. Proposed approach
     To set up ad hoc collaboration on selected developments, e.g.
     • Focus on minimum viable products and PoCs (apps, semantic search)
     • Dedicated apps on Sustainable Development Indicators
     • Semantic search prototype (Information Dialogue)
     • Ontology and knowledge graph developments

  31. Proposed approach
     Eurostat to
     • Publish and maintain URIs for key metadata assets and strive to link them to other semantic resources
     • Secure ESTP trainings
     • Host the community on CROS and shared repositories
     • Explore ways to maintain the ESS LOD platform (migration to EC servers, synergies with EDP, cooperation model)
