YourDataStories: Transparency & Corruption Fighting through Data Interlinking & Visual Exploration

YourDataStories: Transparency and Corruption Fighting through Data Interlinking and Visual Exploration • Georgios Petasis (1), Anna Triantafillou (2), Eric Karstens (3) • (1) National Centre for Scientific Research (NCSR) “Demokritos”, Athens, Greece • (2) Athens Technology Center (ATC), Athens, Greece • (3) European Journalism Centre (EJC), Maastricht, The Netherlands

Overview • Motivation • The YourDataStories approach • Implementation • Evaluation • Conclusions and future directions

Open Data

Open Data…

Open Data: Current Status

Open Data: Current Status • We have data! • Many datasets available • Many areas covered • Open data has the potential to spur: • economic innovation; • social transformation; and • fresh forms of political and government accountability • But… heterogeneous & immature

Motivation • Economic open data: Current status • Diverse and incompatible • Lack of standardization • Plethora of vocabularies in use • Lack of visibility of existing data • Citizens/users do not seem to be attracted by the existing solutions and tools • With great potential! • We need tools and infrastructure to alleviate these problems

YourDataStories Solution • A platform for data exploration focused on the financial flows • that are critical for transparency, collaboration, participation

Our Approach (1) • The starting point is a triple store • We assume that economic data have already been crawled, cleaned, converted to RDF following a common ontology, and stored in a triple store • A SPARQL endpoint is analysed • Through a set of predefined queries • Aiming to retrieve the underlying data model • Classes, properties, types, cardinality • Analysis at the RDFS level
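The slides do not list the predefined queries, so the following is only a minimal sketch of what such an analysis step might look like. The query texts, the `PREDEFINED_QUERIES` name, the `build_request_url` helper and the `http://example.org/sparql` endpoint are all assumptions made for illustration; only the use of a `query` parameter on a GET request follows the standard SPARQL protocol.

```python
from urllib.parse import urlencode

# Hypothetical predefined queries (not the actual YDS queries).
# Each one recovers one aspect of the data model from the instance data.
PREDEFINED_QUERIES = {
    # Which classes have instances in the store?
    "classes": """
        SELECT DISTINCT ?class WHERE { ?s a ?class }
    """,
    # Which properties are used on instances of each class?
    "properties_per_class": """
        SELECT DISTINCT ?class ?property WHERE {
            ?s a ?class ;
               ?property ?o .
        }
    """,
    # Which datatypes do literal-valued properties carry?
    "property_datatypes": """
        SELECT DISTINCT ?property (DATATYPE(?o) AS ?datatype) WHERE {
            ?s ?property ?o .
            FILTER(isLiteral(?o))
        }
    """,
}


def build_request_url(endpoint, query_name):
    """Return a SPARQL-protocol GET URL for one of the predefined queries."""
    # Collapse whitespace so the encoded URL stays compact.
    query = " ".join(PREDEFINED_QUERIES[query_name].split())
    return endpoint + "?" + urlencode({"query": query})


if __name__ == "__main__":
    # The endpoint below is a placeholder, not a real YDS address.
    for name in PREDEFINED_QUERIES:
        print(build_request_url("http://example.org/sparql", name))
```

Keeping the queries in a plain dictionary makes it easy to add further ones (for cardinality, for instance) without changing the code that sends them.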

Our Approach (2) • The result of the analysis is a set of graphs • “Top” nodes • Representing classes • Nodes representing • Properties • Data types • Nodes have: • Scale • Role • Cardinality

Our Approach (3) • The “top” concepts are heuristically identified • Using information like graph centrality • The information contained in graphs is used to automatically support operations • Data selection • Data visualisation • Analytics
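The slides mention graph centrality as one kind of information used by the heuristics, without naming a specific measure. As a rough illustration, the sketch below ranks classes by degree centrality, which is an assumption on our part; the class names and the `top_concepts` helper are hypothetical.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Degree centrality of each node in an undirected graph given as edge pairs."""
    neighbours = defaultdict(set)
    for source, target in edges:
        neighbours[source].add(target)
        neighbours[target].add(source)
    n = len(neighbours)
    if n < 2:
        return {node: 0.0 for node in neighbours}
    # Normalise by the maximum possible number of neighbours (n - 1).
    return {node: len(adj) / (n - 1) for node, adj in neighbours.items()}


def top_concepts(edges, k=2):
    """Return the k most central nodes; ties are broken alphabetically."""
    scores = degree_centrality(edges)
    return sorted(scores, key=lambda node: (-scores[node], node))[:k]


if __name__ == "__main__":
    # Hypothetical class-to-class links extracted from the data model.
    edges = [
        ("Project", "Organization"),
        ("Project", "Amount"),
        ("Project", "Country"),
        ("Organization", "Country"),
    ]
    print(top_concepts(edges, k=1))
```

In this toy graph "Project" is linked to every other class, so it is the most central node and would be the first candidate for a "top" concept.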

Data Selection (1) • Graphs are used to extract an indexing schema • Currently Apache Solr is supported • Instances of top concepts are converted into JSON-LD objects and indexed • JSON-LD objects can contain instances from non-top concepts • The YDS “advanced search” application is configured • For querying indexed resources
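To make the indexing step more concrete, here is a minimal sketch of how an instance of a top concept, with a nested instance of a non-top concept, might be expressed as a JSON-LD object and flattened into a single document for a search index. The vocabulary URL, the property names, and the `to_jsonld` and `flatten_for_index` helpers are assumptions; the slides do not describe the actual YDS schema or how it is sent to Apache Solr.

```python
import json


def to_jsonld(instance_id, type_name, properties, vocab="http://example.org/vocab#"):
    """Wrap an instance in a JSON-LD object using a hypothetical vocabulary."""
    doc = {"@context": {"@vocab": vocab}, "@id": instance_id, "@type": type_name}
    doc.update(properties)
    return doc


def flatten_for_index(doc, prefix=""):
    """Flatten nested JSON-LD into dotted field names suitable for a flat index."""
    flat = {}
    for key, value in doc.items():
        if key == "@context":
            continue  # the context is metadata, not searchable content
        name = prefix + key.lstrip("@")
        if isinstance(value, dict):
            # Nested instance (e.g. of a non-top concept): recurse with a prefix.
            flat.update(flatten_for_index(value, prefix=name + "."))
        else:
            flat[name] = value
    return flat


if __name__ == "__main__":
    project = to_jsonld(
        "http://example.org/project/1",
        "Project",
        {
            "title": "Road repair",
            "buyer": {
                "@id": "http://example.org/org/7",
                "@type": "Organization",
                "name": "Ministry",
            },
        },
    )
    print(json.dumps(flatten_for_index(project), indent=2))
```

Flattening is only one possible design; an index that supports nested documents could store the JSON-LD object as it is.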

Data Selection (2)

Data Visualisation (1) • Graphs are used to extract visualisation information • What properties can be used as x/y axes, in which plot types • The YDS “Workbench” application is configured • For generating custom plots of various types
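One way to read "what properties can be used as x/y axes, in which plot types" is as a lookup from each property's scale to the axes it may occupy. The sketch below illustrates that reading only; the scale names, the `PLOT_RULES` table, and the example properties are assumptions, not the rules used by the YDS Workbench.

```python
from itertools import product

# Hypothetical rules: plot type -> (scales allowed on x, scales allowed on y).
PLOT_RULES = {
    "line": ({"temporal"}, {"numeric"}),
    "bar": ({"categorical"}, {"numeric"}),
    "scatter": ({"numeric"}, {"numeric"}),
}


def plot_options(property_scales):
    """List (plot type, x property, y property) combinations the rules allow."""
    options = []
    for plot, (x_scales, y_scales) in PLOT_RULES.items():
        for x, y in product(property_scales, repeat=2):
            if x == y:
                continue  # a property is not plotted against itself
            if property_scales[x] in x_scales and property_scales[y] in y_scales:
                options.append((plot, x, y))
    return options


if __name__ == "__main__":
    # Hypothetical properties of a top concept, with their scales.
    scales = {"startDate": "temporal", "country": "categorical", "amount": "numeric"}
    for option in plot_options(scales):
        print(option)
```

With these three example properties, the rules allow a line plot of amount over start date and a bar plot of amount per country, but no scatter plot, because only one property is numeric.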

Data Visualisation (2)

Architecture (diagram; labels only) • Input Data • Model Analysis • SPARQL • Model Specification • Views • Search Configuration • SPARQL Cache (MySQL) • JSON-LD API • JSON Cache (MySQL) • Web Applications • Analytics • Developers • … • Search

Components and Applications

Evaluation (1) • The proposed approach has been used to create a set of applications • On some preselected datasets • Several development cycles were evaluated • More than 60 users • Primarily journalists, public sector employees, representatives of NGOs and business • Two scenarios: • Complete half-open scenarios • Explore the solution on their own

Evaluation (2) • Evaluation Results

Conclusions and Future Work • Evaluation results suggest that: • Non-experts can access and analyse data with minimal time and effort • Integrating data from different sources, along with the powerful navigation and visualisations, enables insights that cannot be gleaned from the original sources • Future work: • Test our approach on more datasets • On domains other than economic data • Assess the development of analytics for a new domain

Thank you! • http://www.yourdatastories.eu • http://platform.yourdatastories.eu
