Introduction to Open Data
A generic approach
• Iraklis Varlamis
• Harokopio University of Athens
• varlamis@hua.gr
2nd SemaGrow Hackathon (in conjunction with IRSS14)
Open Data
A growing trend among scholars, government bodies and organizations to share data outputs, codebooks and software.
Open Data flow
Publish data in a machine-readable format!
Open data value
Open Data lifecycle
Publish data and keep them updated!
Increase open data value
[Diagram labels: Agency A, Agency B, Organization C, Organization D; Collect & Aggregate; Data repository; Serve through a single endpoint]
Aggregate & combine data!
Data aggregation issues
Speak the same language!
• Different sources use different notation
• Data from multiple sources may be inconsistent
• Each source may use a different identifier for the same concept
• Concept descriptions may differ or even contradict each other
• We need a common way to describe data
• We need common data description schemata
• It is good to have an ontology in order to validate data
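The identifier and consistency issues above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the source names, local identifiers, `example.org` URIs and the `stations` values are all made up, and `SAME_AS`, `normalize` and `find_conflicts` are hypothetical helpers, not part of any library.

```python
# Hypothetical mapping from each source's local identifier to one shared URI.
SAME_AS = {
    ("agency_a", "ATH"): "http://example.org/place/athens",
    ("agency_b", "Athens, GR"): "http://example.org/place/athens",
    ("agency_b", "Thessaloniki, GR"): "http://example.org/place/thessaloniki",
}


def normalize(source, records):
    """Rewrite each record's local id to the shared URI; collect unmapped ids."""
    normalized, unmapped = [], []
    for rec in records:
        uri = SAME_AS.get((source, rec["id"]))
        if uri is None:
            unmapped.append(rec["id"])
        else:
            normalized.append({**rec, "id": uri})
    return normalized, unmapped


def find_conflicts(records, field):
    """Return the URIs for which the sources report different values of `field`."""
    seen = {}
    for rec in records:
        seen.setdefault(rec["id"], set()).add(rec[field])
    return {uri: values for uri, values in seen.items() if len(values) > 1}


# Illustrative (made-up) records from two sources about the same places.
a, _ = normalize("agency_a", [{"id": "ATH", "stations": 3}])
b, missing = normalize(
    "agency_b",
    [{"id": "Athens, GR", "stations": 4}, {"id": "Patras, GR", "stations": 1}],
)
conflicts = find_conflicts(a + b, "stations")
print(conflicts)  # the two sources disagree about Athens
print(missing)    # identifiers that still need a mapping
```

Once every source speaks in the same identifiers, contradictions become visible and can be resolved, which is the kind of check an ontology would make systematic.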
Common way to describe data
• Resource Description Framework (RDF)
• A data model for metadata (similar to the E-R or relational model)
• Each concept
  • is a resource (subject)
  • has several aspects (predicates)
  • and values for these aspects (objects)
• Data expressed as graphs
• Resources are identified by a URI
• Values are either simple literals or URIs
Data aggregation: merge graphs on the common nodes (URIs)
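A minimal sketch of the subject/predicate/object model and of merging on common URIs, using plain Python tuples rather than an RDF library. The `example.org` URIs, the predicate names and the dataset values are assumptions chosen for illustration only.

```python
EX = "http://example.org/"  # hypothetical namespace

# Each triple is (subject, predicate, object); a graph is a set of triples.
graph_a = {
    (EX + "dataset/1", EX + "title", "Rainfall 2013"),
    (EX + "dataset/1", EX + "publisher", EX + "org/agency-a"),
}
graph_b = {
    (EX + "org/agency-a", EX + "name", "Agency A"),
    # The same edge as in graph_a: it collapses when the graphs are merged.
    (EX + "dataset/1", EX + "publisher", EX + "org/agency-a"),
}


def merge(*graphs):
    """Union the triple sets; nodes with the same URI become one node."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def describe(graph, subject):
    """Return all (predicate, object) pairs known about one resource."""
    return sorted((p, o) for s, p, o in graph if s == subject)


merged = merge(graph_a, graph_b)

# Follow the publisher link from the dataset to facts that came from graph_b.
publisher = dict(describe(merged, EX + "dataset/1"))[EX + "publisher"]
print(describe(merged, publisher))
```

Because both graphs use the same URI for the agency, the merged graph connects the dataset to the agency's name without any explicit join logic.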
Common data description schemas
• Let’s agree on the predicates
• We need machine-readable ontologies, taxonomies or vocabularies
• FOAF (Friend of a Friend): Agent, Person, name, title, familyName, givenName, knows etc.
• DC (Dublin Core Schema): Title, Creator, Subject, Description, Publisher, Contributor etc.
• SIOC (Semantically-Interlinked Online Communities)
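To show what agreeing on the predicates looks like in practice, the sketch below describes this presentation with FOAF and Dublin Core terms and prints the result in N-Triples syntax. The FOAF and DC namespace URIs are the published ones; the `example.org` resource URIs and the helper functions are assumptions made for the example.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
DC = "http://purl.org/dc/elements/1.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def uri(value):
    """Format a URI as an N-Triples term."""
    return f"<{value}>"


def literal(value):
    """Format a plain string as an N-Triples literal with basic escaping."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def to_ntriples(triples):
    """Serialize already formatted (subject, predicate, object) terms."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


# Hypothetical resource URIs for the author and the presentation.
author = uri("http://example.org/people/varlamis")
talk = uri("http://example.org/talks/intro-open-data")

triples = [
    (author, uri(RDF_TYPE), uri(FOAF + "Person")),
    (author, uri(FOAF + "name"), literal("Iraklis Varlamis")),
    (talk, uri(DC + "title"), literal("Introduction to Open Data")),
    (talk, uri(DC + "creator"), author),
]

ntriples = to_ntriples(triples)
print(ntriples)
```

Any consumer that knows FOAF and Dublin Core can interpret these statements without a source-specific schema.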
Query Open Data
Every endpoint is a database
Query the databases and aggregate the query results (RDF triples, i.e. edges from the graphs)
[Diagram: a SPARQL query is sent to each endpoint]
Query endpoints and merge results
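Querying several endpoints and merging the answers could look roughly like the sketch below, which uses only the Python standard library. It assumes each endpoint accepts the query as a `query` parameter over HTTP GET and returns the SPARQL JSON results format. The endpoint URL and the canned responses are hypothetical, and `run_query` is defined but not called here, so the example runs without network access.

```python
import json
import urllib.parse
import urllib.request

QUERY = "SELECT ?org ?name WHERE { ?org <http://xmlns.com/foaf/0.1/name> ?name }"


def build_request_url(endpoint, query):
    """Encode the SPARQL query as the `query` parameter of a GET request."""
    return endpoint + "?" + urllib.parse.urlencode({"query": query})


def run_query(endpoint, query, timeout=30):
    """Send the query and parse the JSON results (needs network access)."""
    request = urllib.request.Request(
        build_request_url(endpoint, query),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def merge_results(result_docs):
    """Flatten the bindings of several result documents, dropping duplicates."""
    seen, rows = set(), []
    for doc in result_docs:
        for binding in doc["results"]["bindings"]:
            row = tuple(sorted((var, cell["value"]) for var, cell in binding.items()))
            if row not in seen:
                seen.add(row)
                rows.append(dict(row))
    return rows


def canned(org, name):
    """Build one made-up binding in the SPARQL JSON results layout."""
    return {"org": {"type": "uri", "value": org}, "name": {"type": "literal", "value": name}}


# Stand-ins for what two hypothetical endpoints might return.
response_a = {"head": {"vars": ["org", "name"]},
              "results": {"bindings": [canned("http://example.org/org/a", "Agency A")]}}
response_b = {"head": {"vars": ["org", "name"]},
              "results": {"bindings": [canned("http://example.org/org/a", "Agency A"),
                                       canned("http://example.org/org/b", "Agency B")]}}

rows = merge_results([response_a, response_b])
url = build_request_url("http://example.org/sparql", QUERY)
print(rows)
```

With live endpoints, the canned responses would be replaced by `run_query(endpoint, QUERY)` for each endpoint in turn.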
In the real world
• Most organizations “publish” data on their web sites
  • Unformatted or semi-formatted data (HTML, PDF)
  • Data scraping is needed
• Some of them publish data in machine-readable formats
  • xls, xml files
• Only a few offer APIs
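As a rough illustration of the scraping step, the sketch below pulls the rows out of an HTML table with Python's standard `html.parser` module and rewrites them as CSV. The HTML snippet and its values are made up, and `TableExtractor` and `to_csv` are hypothetical helpers; real pages are usually messier and need more careful handling.

```python
import csv
import io
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside a row
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def to_csv(rows):
    """Write the rows to a CSV string, a machine-readable format."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Made-up page fragment standing in for data "published" as HTML.
HTML = """
<table>
  <tr><th>Region</th><th>Accidents</th></tr>
  <tr><td>Region X</td><td> 12 </td></tr>
  <tr><td>Region Y</td><td>7</td></tr>
</table>
"""

parser = TableExtractor()
parser.feed(HTML)
csv_text = to_csv(parser.rows)
print(csv_text)
```

The extracted values are still plain strings; assigning types and identifiers is a separate step before the data can be aggregated with other sources.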
In Greece
Data.gov.gr - Public Data Catalog
Openarchives.gr - Greek publications
Statistics.gr - Hellenic Statistical Authority
Geodata.gov.gr - Public geospatial data
opengeodata.gr - Open geospatial data
astynomia.gr/opendata/ - Accident-related data
Other datasets: Wikipedia, Europeana, GeoNames, WikiTravel, LinkedGeoData, YAGO2s, Freebase, FactForge etc.
datasets@eellak.gr:
1) https://docs.google.com/spreadsheets/d/1X9qFojnUbk1RkFWQ8653n2IxjjRewtCcEPScfNAyrqU/edit#gid=0
2) http://mycontent.ellak.gr/?s=datasets&x=0&y=0
More
APIs
Open data cloud: www.opendatacloud.gr/
Data Extraction Tool: deixto.com/
Open data portal: open-data.okfn.gr/
PORTALS
Registry for Research Data Repositories: www.re3data.org/
EU Open Data portal: https://open-data.europa.eu/en/data/
World Bank data: http://data.worldbank.org/
http://publicdata.eu/
http://oad.simmons.edu/oadwiki/Data_repositories
Roadmap
Contribute at all levels!
Source: http://www.lorax.gr/
Source: https://www.peterkrantz.com/2012/publishing-open-data-api-design/
Thank you! Questions?