LOD activities in Eurostat. As is and lessons learned Final LOSD ESSnet workshop Sofia 27-28 May 2019. Outline of the presentation. Context General timeline Context LOD development framework Eurostat « as is » and short term plans Infrastructure Data and metadata
LOD activities in Eurostat: as is and lessons learned
Final LOSD ESSnet workshop, Sofia, 27-28 May 2019
Outline of the presentation
• Context
  • General timeline
  • Context
  • LOD development framework
• Eurostat « as is » and short term plans
  • Infrastructure
  • Data and metadata
  • Semantic
• ESS « as is » and short term plan
  • Awareness & Maturity
  • Take up and institutional support
• Conclusions (way forward)
Eurostat LOD Timeline
Periods: Early days, 2018, 2019, 2020
Milestones
• Eurostat data sets catalogue published on EU data portal
• NUTS as URIs (EU data portal)
• First ESS benchmark and framework for deployment
• LOD as an action strand of project DIGICOM
• LOD Beyond 2020
• End of project report
• First Eurostat pilot contract: basic infrastructure
• Architecture definition (motivational aspects, building blocks and maturity dimensions)
• Launching ESSnet
• RDF will be supported by new dissemination API
• Second Eurostat pilots: prototyping end user products, semantics thickening
• ESS metadata architecture projects
• Benchmark ESS LOD maturity / attraction
Drivers
• ESS vision 2020 objectives
• Explore landscape
• Build capability and awareness
• Deploy
• Realize benefits
• Define strategy and action plans
• Assess feasibility
• Assess benefits
• Build capability
Context – technology
• The LOD standards support a human-centric way of exploring data sets because they are built on triples (subject, predicate, object), a structure familiar from natural language.
• Publishing as linked data offers a flexible, non-proprietary, machine-readable means of providing advanced and dynamic querying and visualisation capabilities.
• Statistical data and metadata based on RDF standards become (web) addressable. This allows publishers and third parties to annotate and link to this data/metadata.
• The EU open data portal supports the DCAT-AP specifications.
• Data exposed as LOD can be flexibly combined across datasets. The statistical data becomes an integral part of the broader web of linked data.
• First experiences show that take-up of the full LOD standards on the web of data is relatively modest. Most of the effort comes from public administrations and the research community. Lighter standards such as JSON-LD and schema.org are more widely used and bring shorter-term benefits.
• Among the different target personas, the general public hardly reaps the benefits of LOD because querying is complex. Intuitive interfaces and apps are needed.
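To make the triple idea concrete, here is a minimal sketch in plain Python rather than an RDF library. It stores a few (subject, predicate, object) statements and follows a link from an observation to the label of its region. All URIs are made-up `http://example.org/` identifiers and the value is a placeholder, not a real statistic; a production set-up would use a triple store and SPARQL instead.

```python
# Each statement is a (subject, predicate, object) triple.
# All URIs below are hypothetical example.org identifiers, not Eurostat URIs.
EX = "http://example.org/"

triples = [
    (EX + "obs/1", EX + "prop/geo", EX + "nuts/BG"),
    (EX + "obs/1", EX + "prop/indicator", EX + "ind/at-risk-of-poverty-rate"),
    (EX + "obs/1", EX + "prop/value", "0.0"),  # placeholder, not a real statistic
    (EX + "nuts/BG", EX + "prop/label", "Bulgaria"),
]


def match(pattern, store):
    """Return the triples matching an (s, p, o) pattern; None is a wildcard."""
    return [
        triple
        for triple in store
        if all(want is None or want == got for want, got in zip(pattern, triple))
    ]


def follow(subject, predicate, store):
    """Return the objects reached from `subject` via `predicate`."""
    return [obj for _, _, obj in match((subject, predicate, None), store)]


# "Linking": hop from the observation to its region, then to the region's label.
region = follow(EX + "obs/1", EX + "prop/geo", triples)[0]
print(follow(region, EX + "prop/label", triples))
```

Because the region is itself an addressable node, any other dataset that uses the same region URI can be joined to this observation without agreeing on a shared table layout.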
Context – motivation
• Eurostat open data in its present form is not fully satisfactory from a user perspective. Rich metadata assets cannot be mobilised to support data search and exploration. Linked data standards provide a framework to make these metadata active in the user experience.
• Official EU statistics can be complemented by a wealth of data collected at NSI and local government level with higher granularity and dimensionality.
• Linked data provides a standard and flexible way to bridge official statistics data sets distributed over the web.
• Linked open data standards are a way to increase the visibility of official statistics on the Web of Data.
Directions for development
Semantic depth
• The semantic depth defines the level of ambition for making metadata available in Linked Open Data stores.
• The semantic depth will drive the richness of the services/tools that can be built on the basis of the triples.
• DCAT profiles for cataloguing data
• Unified access to metadata (code lists and Statistics Explained metadata)
• Knowledge graph to support data search and discovery
• Contribution to Semantic Web Schema.org and other ontology hubs
• Linkage to other semantic web resources
Target and reach out
• This dimension defines the different user groups served, for which knowledge background is essential:
  • Data scientists
  • Government officials / policy analysts / data scientists
  • Data journalists
  • Redistributors / Multipliers / Prosumers / Semantic specialists
  • General public including students
Interface/Tools
• This dimension defines the level of ambition for the ESS community to provide different tools to the end users and the other participants in the ecosystem, in particular:
  • Portal: EU Open Data Portal, Google Data Search, LOSD platform
  • SPARQL client
  • Search engines and data exploration interface
  • Dedicated apps for specific domains realizing data integrations
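As an illustration of the "DCAT profiles for cataloguing data" item, the sketch below builds a DCAT-style catalogue record serialised as JSON-LD. The `dcat:` and `dct:` terms are real vocabulary terms, but the function name `dataset_record`, the `example.org` URIs, the title and the keywords are assumptions made for this example, and the record has not been validated against DCAT-AP.

```python
import json

# Namespace prefixes for the DCAT and Dublin Core Terms vocabularies.
CONTEXT = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
}


def dataset_record(uri, title, keywords, access_url):
    """Build a minimal DCAT-style dataset description as a JSON-LD dict."""
    return {
        "@context": CONTEXT,
        "@id": uri,
        "@type": "dcat:Dataset",
        "dct:title": title,
        "dcat:keyword": list(keywords),
        "dcat:distribution": {
            "@type": "dcat:Distribution",
            "dcat:accessURL": {"@id": access_url},
        },
    }


# Hypothetical identifiers; only the dataset code comes from the slides.
record = dataset_record(
    uri="http://example.org/dataset/ilc_li01",
    title="Example dataset title",
    keywords=["poverty", "income"],
    access_url="http://example.org/data/ilc_li01",
)
print(json.dumps(record, indent=2))
```

A record of this shape is what a portal harvester or a dataset search engine would read; richer semantic depth means adding more of the metadata (code lists, concept definitions) as linked resources rather than free text.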
Eurostat as is - Infrastructure
Eurostat test triple store hosted on cloud (Amazon), with a SPARQL endpoint and Jupyter notebooks (Python)
• SPARQL query endpoint
  • http://63.34.157.226:8890/sparql
  • Hosted on cloud server
• Jupyter interactive computing notebook
  • SPARQL kernel for connecting to the SPARQL endpoint
  • SPARQLWrapper (https://github.com/RDFLib/sparqlwrapper) interface in Python
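A notebook cell talking to such an endpoint could look roughly like the sketch below. The slide names SPARQLWrapper; this version deliberately uses only the Python standard library and the SPARQL 1.1 Protocol (an HTTP GET with a `query` parameter), so it runs without extra installs. The query is a generic "first ten triples" query that assumes nothing about the store's vocabularies, and the endpoint was a test server that may no longer respond, so the network call is wrapped in a function and not executed here.

```python
import json
import urllib.parse
import urllib.request

# Endpoint as given on the slide (test server, availability not guaranteed).
ENDPOINT = "http://63.34.157.226:8890/sparql"

# Generic query: list ten arbitrary triples from the store.
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"


def build_request(endpoint, query):
    """Build a SPARQL Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    headers = {"Accept": "application/sparql-results+json"}
    return urllib.request.Request(url, headers=headers)


def rows(results):
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in results["results"]["bindings"]
    ]


def run_query(endpoint, query, timeout=10.0):
    """Send the query and return the flattened rows (needs network access)."""
    request = build_request(endpoint, query)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return rows(json.load(response))
```

In a notebook, `run_query(ENDPOINT, QUERY)` would return a list of dictionaries that can be passed straight to a data-frame library for exploration.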
Infrastructure next moves
• Use EC corporate platform (EU portal)
• VocBench for metadata management
• Migration of LOSD platform to the cloud
Data and metadata: as is
• Data layer: 146,888,701 triples, 45 datasets
• 13 dimensions
• 17 concepts (LFS, SILC, HOUSING COST)
• Sources (manual import)
  • Dissemination database
  • Metabase
  • Dictionaries
  • DBpedia and Wikipedia
Data and metadata: next moves
• Sustainable Development Goals indicators
• Concept definitions and classifications (publishing CODED and RAMON as RDF with URIs)
Semantic next moves
• Integrate (reorganise) the different Eurostat/ESS (SDMX) metadata assets
• Expose and maintain key metadata as URIs (using existing EC infrastructure: EU data portal) in parallel with SDMX web services
Metadata assets: RAMON/CODED, Statistics Explained, ESS Metadata Handler, dissemination metabase, registry
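Exposing a code list as URIs can be sketched as follows: each code becomes a SKOS concept with its own address, written out as N-Triples. The SKOS and RDF terms are real; the base URI `http://example.org/codelist/`, the function names and the choice of SKOS itself are assumptions for illustration, not a description of how RAMON or CODED are actually published.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/codelist/"  # hypothetical base URI


def literal(text, lang="en"):
    """Format a language-tagged N-Triples literal, escaping \\ and \"."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def codelist_to_ntriples(scheme, codes):
    """Turn {code: label} into N-Triples lines describing a SKOS scheme.

    Assumes codes are plain alphanumerics that are safe inside a URI.
    """
    scheme_uri = f"{BASE}{scheme}"
    lines = [f"<{scheme_uri}> <{RDF_TYPE}> <{SKOS}ConceptScheme> ."]
    for code, label in codes.items():
        concept = f"{scheme_uri}/{code}"
        lines.append(f"<{concept}> <{RDF_TYPE}> <{SKOS}Concept> .")
        lines.append(f"<{concept}> <{SKOS}inScheme> <{scheme_uri}> .")
        lines.append(f'<{concept}> <{SKOS}notation> "{code}" .')
        lines.append(f"<{concept}> <{SKOS}prefLabel> {literal(label)} .")
    return lines


for line in codelist_to_ntriples("geo", {"BG": "Bulgaria", "FR": "France"}):
    print(line)
```

Keeping the code itself in `skos:notation` means the same value that appears in an SDMX message can be looked up as a URI, which is what allows the two channels to run in parallel.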
Semantic next moves: linking statistical concepts…
• Semantic description of concepts (https://ec.europa.eu/eurostat/statistics-explained)
• Related data sources
• Structured expert domain knowledge organised in statistical glossary
… towards knowledge graph
Bag of words: equivalised disposable income after social transfer; at-risk-of-poverty threshold; 60 % national median equivalised disposable income after social transfers; indicator; measure; wealth or poverty; low income; comparison; residents; country; low standard of living; share; people; equivalised disposable income before social transfers; below at-risk-of-poverty threshold after social transfers
Linked concepts and data sets: Household budget survey; Living conditions; At risk of poverty gap; At risk of poverty or social exclusion; At-risk-of-poverty rate; Persons living in households with low work intensity; Ilc_li01; Material deprivation; Relative median at-risk-of-poverty gap; Disposable income; Ilc_mdes03; Relative median income ratio; Equivalised disposable income; Income quintile share ratio
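One naive way to derive candidate links from a bag of words is to measure how much of each concept label is covered by the words of a glossary text. The sketch below does this with plain token overlap; the words and labels are taken from the slide, while the scoring rule and the function names are assumptions made for illustration, a stand-in for whatever NLP method is eventually used.

```python
import re

# Words taken from the slide's bag of words for the at-risk-of-poverty rate.
TEXT = (
    "share people equivalised disposable income after social transfers below "
    "at-risk-of-poverty threshold 60 % national median equivalised disposable income"
)

# Concept labels as they appear on the slide.
LABELS = [
    "Equivalised disposable income",
    "Disposable income",
    "Material deprivation",
    "Relative median income ratio",
]


def tokens(text):
    """Return the set of lower-case word tokens; hyphenated terms stay whole."""
    return set(re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower()))


def coverage(label, bag):
    """Fraction of the label's tokens that occur in the bag of words."""
    words = tokens(label)
    return len(words & bag) / len(words)


bag = tokens(TEXT)

# Rank labels by coverage; high scores are candidate edges in the graph.
links = sorted(((coverage(label, bag), label) for label in LABELS), reverse=True)
for score, label in links:
    print(f"{score:.2f}  {label}")
```

Token overlap ignores word order and synonyms, so it would only propose candidate edges for an expert to confirm.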
Semantic: next moves
• Link to external web semantic assets
• Publish standard definitions and code lists
• Contribute to / develop an official statistics knowledge graph
• Semantic search (NLP)
Benchmarking ESS LOD maturity
With a view to creating a meaningful strategy and action plan for developing LOD capabilities in the ESS, Eurostat has re-assessed the level of maturity and awareness of the NSIs in the area of LOD, their readiness for Linked Open Data, and the impact of the recent LOD activities.
A LOD survey, targeting managers of LOD-related projects, heads of IT or communication departments, and methodology and strategy leaders in the NSIs, was sent out to all ESS member states. In total, 21 responses were received from 19 Member States (closing date: 03/05/2019).
Most important potential benefits
LOD will improve user experience and engage new user groups
The key potential LOD benefits for the NSIs are:
• A better user experience due to improved discoverability of data, selected by 80% of the surveyed NSIs.
• The opportunity to engage new user groups (researchers, data journalists, etc.), selected by 75% of the surveyed NSIs.
• More efficient knowledge and metadata management as well as increased reuse of data, selected by 65% of the NSIs.
The impact of the ESSnet LOSD activities
5 out of the 20 surveyed NSIs are not familiar with ESSnet LOSD activities
• The use cases ‘NSI Statisticians Publishing LOSD’ and ‘NSI Collaboration with third Party Partner for Developing Added-Value Services’ are more important for the NSIs than the use case ‘General Public Users Accessing LOSD’.
• Of the LOD tools created and used by the ESSnet, the ESSnet recommendations are the ones NSIs are most familiar with. The remaining LOD tools are not well known by the responding NSIs.
LOD adoption across ESS
The majority of NSIs have not adopted LOD yet
• A roadmap for the adoption of LOD exists in only 20% of the surveyed NSIs, but 35% of them are planning to create one.
• 45% of the surveyed NSIs have no ongoing LOD projects and no plans to carry out LOD projects in the future.
Overview of LOD projects
A group of ‘early adopter’ NSIs have ongoing projects
• Only 20% of the surveyed NSIs have operational projects using LOD technologies. The NSIs characterised as LOD pioneers before the launch of the ESSnet (INSEE (FR), ISTAT (IT) and ONS (UK)), together with SF (FI), are the ones that reported operational LOD projects.
• In addition, CBS (NL), GUS (PL) and NSI-BG (BG) have pilot projects.
• DESTATIS (DE), DZS (HR), SORS (SK) and SURS (SI) have already planned LOD projects.
Scope of LOD projects
Geospatial data is most widely used in LOD use cases
• The combination of statistical data with geospatial data is the common focus of many projects. A number of projects aim to link only statistical data (e.g. multi-domain statistical data) and a few NSIs (INSEE, SF) have projects focusing on linked metadata (definitions, classifications, etc.).
• The statistical data (linked or to be linked) in these projects come mostly from the Geo and Census domains (in 64% and 55% of the NSIs respectively).
• More experienced NSIs extend the scope of their projects to include data from other domains, besides the ones listed in the table below. An example is INSEE, which is experimenting with data from the Tourism and SBS domains.
Data and metadata prerequisites for LOD adoption
Defined URI policy, vocabularies and ontologies for statistical data
• Half of the NSIs have adopted or are going to adopt a URI policy; the other half have neither done so nor planned it at the moment.
• 25% of the NSIs have defined/adopted vocabularies and 50% plan to do so.
• 65% of the NSIs have neither defined/adopted ontologies nor planned to do so. Only INSEE has defined/adopted ontologies.
Organisational prerequisites
Positive position of senior management, but a potential skill gap
• The majority of senior management is positive towards LOD; 30% is indifferent.
• Knowledge about standards used for LOD, programming skills and modelling skills are available in more than half of the NSIs.
• The experience with Semantic Web technologies is limited.
Readiness for LOD
High awareness but low readiness for adoption
• Based on the NSIs’ self-assessment, the majority of them (85%) have medium or higher levels of awareness of LOD. However, their readiness to adopt LOD is lower: 45% report a medium level of readiness and only 10% a higher level. INSEE (FR) and DZS (HR) are the NSIs that consider themselves more prepared for LOD adoption.
• Only 20% of the surveyed NSIs are actively participating in national LOD-related activities and 10% in international LOD activities.
• Lack of capacity and the prioritisation of other tasks are common obstacles to LOD adoption across the ESS. The factors blocking the start of an LOD project in these NSIs are mostly the lack of staff, limited external interest or an incomplete perception of the user needs related to LOD, and the prioritisation of other activities.
Initial conclusions from benchmarking
LOD is still in an early adoption stage
• One key success criterion for the implementation of LOD in the ESS, senior management support, is fulfilled. 60% of respondents reported that their senior management is positive towards LOD. While a number of NSIs report that their management is indifferent towards LOD, not a single NSI reported that its management sees LOD negatively.
• There is a skill gap that some NSIs will have to close to be able to provide more LOD offerings.
• At the time of this analysis, LOD has not been adopted widely in the ESS. Only 20% of the responding countries have operational LOD implementations. The outlook shows that a further 20% are planning or currently developing LOD services. This is in line with the adoption of standards for LOD, from URIs to metadata and data management policies: for example, only four of the surveyed NSIs have implemented a URI policy so far.
• The NSIs primarily ask Eurostat to provide additional training and use cases. There is clearly a demand to justify investment by providing business cases. One recommendation is to prepare LOD success stories, which NSIs can use to prioritise their efforts and to address the right audiences and markets with their LOSD offerings.
Lessons learned and way forward
• Obvious benefits for cooperation (given low maturity and front runners)
• Select one ESS use case where linked technologies can make a difference, to get buy-in from management
• Identify quick wins with tangible benefits (stepwise approach)
Key enablers
• URI policy
• LOD platform and service
• Common reference metadata / vocabularies
• Semantic depth