Discover how YourDataStories uses data interlinking and visual exploration to promote transparency and fight corruption. This platform allows users to explore financial flows and analyze economic data, empowering them to make informed decisions.
YourDataStories: Transparency and Corruption Fighting through Data Interlinking and Visual Exploration • Georgios Petasis (1), Anna Triantafillou (2), Eric Karstens (3) • (1) National Centre for Scientific Research (NCSR) “Demokritos”, Athens, Greece • (2) Athens Technology Center (ATC), Athens, Greece • (3) European Journalism Centre (EJC), Maastricht, The Netherlands
Overview • Motivation • The YourDataStories approach • Implementation • Evaluation • Conclusions and future directions
Open Data: Current Status • We have data! • Many datasets available • Many areas covered • Open data has the potential to: • spur economic innovation; • spur social transformation; and • spur fresh forms of political and government accountability • But… the data are heterogeneous and immature
Motivation • Economic open data: Current status • Diverse and incompatible • Lack of standardization • Plethora of vocabularies in use • Lack of visibility of existing data • Citizens/users do not seem to be attracted by the existing solutions and tools • With great potential! • We need tools and infrastructure to alleviate these problems
YourDataStories Solution • A platform for data exploration focused on the financial flows • that are critical for transparency, collaboration, participation
Our Approach (1) • The starting point is a triple store • We assume that the economic data have already been crawled, cleaned, converted to RDF following a common ontology, and stored in a triple store • A SPARQL endpoint is analysed • Through a set of predefined queries • Aiming to retrieve the underlying data model • Classes, properties, types, cardinality • Analysis at the RDFS level
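The slides do not list the predefined queries. As a rough sketch, assuming the data model is described with standard RDFS terms, queries of the following kind could recover the classes and the declared property domains and ranges. The endpoint URL, the query texts and the function name are hypothetical; only the `query` request parameter comes from the SPARQL 1.1 Protocol.

```python
from urllib.parse import urlencode

# Illustrative RDFS-level queries (assumptions, not the queries YDS actually runs).
PREFIXES = "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"

MODEL_QUERIES = {
    # Every class that has at least one instance in the store.
    "classes": PREFIXES + "SELECT DISTINCT ?class WHERE { ?s a ?class }",
    # Declared properties with their domain and, where present, their range.
    "properties": PREFIXES + (
        "SELECT ?property ?domain ?range WHERE { "
        "?property rdfs:domain ?domain . "
        "OPTIONAL { ?property rdfs:range ?range } }"
    ),
}


def build_query_url(endpoint: str, query: str) -> str:
    """Return a SPARQL 1.1 Protocol GET URL carrying the given query."""
    return endpoint + "?" + urlencode({"query": query})


# Hypothetical endpoint, used only to show the shape of the request URL.
url = build_query_url("http://example.org/sparql", MODEL_QUERIES["classes"])
```

No request is sent here; the URL could be fetched with any HTTP client that asks for a SPARQL results format.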
Our Approach (2) • The result of the analysis is a set of graphs • “Top” nodes • Representing classes • Nodes representing • Properties • Data types • Nodes have: • Scale • Role • Cardinality
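One way to picture such a graph is the minimal sketch below. The slides name the node attributes (scale, role, cardinality) but not their allowed values, so the values used here ("numeric", "measure", "one"), the class name `ModelNode` and the example `Contract` model are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class ModelNode:
    """A node of the analysed data-model graph (hypothetical structure)."""
    name: str
    kind: str                          # "class", "property" or "datatype"
    scale: Optional[str] = None        # assumed values, e.g. "nominal", "numeric"
    role: Optional[str] = None         # assumed values, e.g. "dimension", "measure"
    cardinality: Optional[str] = None  # assumed values, e.g. "one", "many"
    children: List["ModelNode"] = field(default_factory=list)


def walk(node: ModelNode) -> Iterator[ModelNode]:
    """Yield the node and all of its descendants, depth first."""
    yield node
    for child in node.children:
        yield from walk(child)


# A "top" class node with one property node pointing at a data-type node.
amount_type = ModelNode("xsd:decimal", kind="datatype")
amount = ModelNode("amount", kind="property", scale="numeric",
                   role="measure", cardinality="one", children=[amount_type])
contract = ModelNode("Contract", kind="class", children=[amount])

names = [node.name for node in walk(contract)]
```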
Our Approach (3) • The “top” concepts are heuristically identified • Using information like graph centrality • The information contained in graphs is used to automatically support operations • Data selection • Data visualisation • Analytics
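The slides mention graph centrality only as one kind of information used by the heuristic. The sketch below substitutes plain degree centrality as a simple stand-in: classes are ranked by how many property links touch them. The class names and the function `top_concepts` are hypothetical.

```python
from collections import Counter


def top_concepts(edges, k=1):
    """Rank classes by degree (number of incident links) and return the k highest."""
    degree = Counter()
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    # Sort by descending degree, then by name so that ties are deterministic.
    ranked = sorted(degree, key=lambda name: (-degree[name], name))
    return ranked[:k]


# Each pair is a property linking two classes in an invented model.
edges = [
    ("Contract", "Buyer"),
    ("Contract", "Seller"),
    ("Contract", "Amount"),
    ("Buyer", "Address"),
]

best = top_concepts(edges, k=2)
```

With these edges, `Contract` has three links and `Buyer` two, so they are returned in that order. A real heuristic would likely combine this with other signals, which the slides do not detail.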
Data Selection (1) • Graphs are used to extract an indexing schema • Currently Apache Solr is supported • Instances of top concepts are converted into JSON-LD objects and indexed • JSON-LD objects can contain instances from non-top concepts • The YDS “advanced search” application is configured • For querying indexed resources
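To illustrate how a JSON-LD object with an embedded non-top instance might become an index document, the sketch below flattens nested objects into dotted field names. The field-naming scheme, the example contract and the function `flatten` are assumptions; the slides do not describe the actual indexing schema, and no Solr call is made here.

```python
def flatten(obj, prefix=""):
    """Flatten a nested JSON-LD-like dict into a single-level dict of index fields."""
    doc = {}
    for key, value in obj.items():
        if key == "@context":
            continue  # the context carries no instance data
        name = prefix + key.lstrip("@")
        if isinstance(value, dict):
            # Embedded instance (e.g. of a non-top concept): recurse with a dotted prefix.
            doc.update(flatten(value, name + "."))
        else:
            doc[name] = value
    return doc


# Invented instance of a top concept that embeds a non-top instance (the buyer).
contract = {
    "@context": {"@vocab": "http://example.org/vocab#"},
    "@id": "http://example.org/contract/1",
    "@type": "Contract",
    "amount": 1500.0,
    "buyer": {
        "@id": "http://example.org/org/7",
        "@type": "Organisation",
        "name": "Example Ministry",
    },
}

doc = flatten(contract)
```

The resulting `doc` has the keys `id`, `type`, `amount`, `buyer.id`, `buyer.type` and `buyer.name`, which is the kind of flat record a search index can store directly.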
Data Visualisation (1) • Graphs are used to extract visualisation information • What properties can be used as x/y axes, in which plot types • The YDS “Workbench” application is configured • For generating custom plots of various types
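A possible reading of this step is a rule table that maps the scales of two properties to the plot types that suit them. The table below, the scale names and the function `plot_options` are hypothetical; the slides state only that the graphs determine which properties can serve as axes in which plot types.

```python
# Hypothetical rule table: (x scale, y scale) -> plot types that make sense.
PLOT_RULES = {
    ("nominal", "numeric"): ["bar", "pie"],
    ("temporal", "numeric"): ["line", "bar"],
    ("numeric", "numeric"): ["scatter"],
}


def plot_options(properties):
    """List (x, y, plot type) combinations allowed by the property scales."""
    options = []
    for x_name, x_scale in properties.items():
        for y_name, y_scale in properties.items():
            if x_name == y_name:
                continue  # a property is never plotted against itself
            for plot in PLOT_RULES.get((x_scale, y_scale), []):
                options.append((x_name, y_name, plot))
    return options


# Invented properties of a top concept, with assumed scales.
properties = {"buyer": "nominal", "date": "temporal", "amount": "numeric"}
options = plot_options(properties)
```

Here `amount` is offered as the y axis against `buyer` (bar, pie) and against `date` (line, bar), while no plot is proposed with `amount` on the x axis because no rule matches.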
Architecture • [Diagram; labels shown:] Input Data • Model Analysis (SPARQL) • Model Specification • Views • Search Configuration • SPARQL Cache (MySQL) • JSON-LD API • JSON Cache (MySQL) • Web Applications • Analytics • Developers • Search
Evaluation (1) • The proposed approach has been used to create a set of applications • On some preselected datasets • Several development cycles were evaluated • More than 60 users • Primarily journalists, public sector employees, representatives of NGOs and business • Two scenarios: • Complete half-open scenarios • Explore the solution on their own
Evaluation (2) • Evaluation Results
Conclusions and Future Work • Evaluation results suggest that: • Non-experts can access and analyse data with minimal time and effort • Integrating data from different sources, along with the powerful navigation and visualisations, enables insights that cannot be gleaned from the original sources • Future work: • Test our approach on more datasets • On domains other than economic data • Assess the development of analytics for a new domain
http://www.yourdatastories.eu • http://platform.yourdatastories.eu • Thank you!