<Panel: The Art & Science of Data Visualization> First they have to find it: Getting Government Data Discovered and Used. Adapted from: John S. Erickson, Ph.D., Tetherless World Constellation, Rensselaer Polytechnic Institute, Troy, New York, USA. Twitter: @olyerickson #TWCRPI
Open Government Data Around the World. Starting with efforts in the US and UK, governments around the world have recognized the need to publish their critical data. [Chart: percent of total collection, from 1M+ datasets]
Diverse Approaches to Open Gov't Data. Government data initiatives have taken many forms. GovData portals vary widely in how they help users discover and use relevant datasets. [Chart: percent of total catalogs, from 192 catalogs]
Federated Discovery of Government Data. Stakeholders have seen the need for federated discovery across catalogs, especially from within major search engines including Bing, Google, Yahoo! and Yandex.
Government Data in the Linked Open Data Cloud. Government data currently makes up over ½ of the cloud by size (~17B triples), with tens of thousands of links to other data, both internal and external. See: http://linkeddata.org/
Linked Data is Not Enough... Publishing open government data as Linked Data is not enough. For OGD to be useful, datasets must be published using metadata, markup standards and presentation that aid discovery and use.
Dataset Metadata for Discovery and Use. Recent work at TWC RPI demonstrates the value of applying emerging standards for uniformly describing government datasets and catalogs.
International Open Government Dataset Search. TWC's IOGDS application is an aggregated catalog of more than 1M datasets from over 192 dataset catalogs from governments at every level around the world. See: http://logd.tw.rpi.edu
International Open Government Dataset Search. IOGDS anticipates the W3C DCAT RDF vocabulary and demonstrates what a comprehensive federated catalog based on DCAT and an aggregation API might look like.
International Open Government Dataset Search. IOGDS is a multi-year effort based on downloading, scraping or accessing APIs, converting metadata to a proto-DCAT model, and publishing via endpoint and download. [Diagram: IOGDS workflow, with labels: Catalogs; API (ad hoc code); Download; Web (per-site scraper code); IOGDS CSV; csv2rdf4lod automation] See: http://logd.tw.rpi.edu
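The conversion step in this workflow can be illustrated with a minimal sketch. It is not the csv2rdf4lod tooling itself: it assumes a hypothetical CSV layout with `uri`, `title`, `description` and `homepage` columns, and it emits N-Triples using terms from the published W3C DCAT and Dublin Core vocabularies rather than the proto-DCAT model IOGDS actually used.

```python
import csv
import io

# Vocabulary namespaces (published W3C DCAT and Dublin Core terms).
DCAT = "http://www.w3.org/ns/dcat#"
DCT = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def literal(text):
    """Escape a string as an N-Triples plain literal."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return '"' + escaped + '"'


def row_to_triples(row, catalog_uri):
    """Turn one catalog CSV row into DCAT-style N-Triples lines.

    The column names (uri, title, description, homepage) are assumptions
    made for this sketch, not the real IOGDS CSV layout.
    """
    ds = "<" + row["uri"] + ">"
    triples = [
        f"{ds} <{RDF_TYPE}> <{DCAT}Dataset> .",
        f"<{catalog_uri}> <{DCAT}dataset> {ds} .",
        f"{ds} <{DCT}title> {literal(row['title'])} .",
    ]
    # Optional columns are only emitted when they carry a value.
    if row.get("description"):
        triples.append(f"{ds} <{DCT}description> {literal(row['description'])} .")
    if row.get("homepage"):
        triples.append(f"{ds} <{DCAT}landingPage> <{row['homepage']}> .")
    return triples


def csv_to_ntriples(csv_text, catalog_uri):
    """Convert a whole CSV document into one N-Triples string."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        lines.extend(row_to_triples(row, catalog_uri))
    return "\n".join(lines)
```

Each row yields one `dcat:Dataset` linked from its catalog, which is roughly the shape a federated catalog needs before publishing via an endpoint or a download.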
Schema.org: Semantic Markup for Discovery. TWC RPI has published dataset listings based on IOGDS using emerging microdata standards, esp. the schema.org model endorsed by Bing, Google, Yahoo!, Yandex...
Schema.org datasets extension. TWC RPI's schema.org dataset extension will enable government dataset catalogs to be parsed and indexed more easily by the major search engines... ...which will help users find relevant datasets! TWC's dataset extension entered public discussion in June 2012.
Schema.org datasets extension. The schema.org datasets extension enables relevant datasets to be more easily discovered by a range of stakeholders including researchers, data journalists, bloggers and developers.
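As a rough illustration of what such markup could look like on a dataset page, the sketch below renders one dataset as an HTML fragment with schema.org microdata. The function name and its arguments are hypothetical, and the property names (`name`, `description`, `url`, `includedInDataCatalog`) follow schema.org as published today, which may differ from the 2012 draft discussed in these slides.

```python
from html import escape


def dataset_microdata(name, description, url, catalog_name, catalog_url):
    """Render one dataset as an HTML fragment with schema.org microdata.

    Hypothetical helper for illustration; values are HTML-escaped so that
    titles and URLs containing '&' or quotes do not break the markup.
    """
    return (
        '<div itemscope itemtype="http://schema.org/Dataset">\n'
        f'  <a itemprop="url" href="{escape(url)}">'
        f'<span itemprop="name">{escape(name)}</span></a>\n'
        f'  <p itemprop="description">{escape(description)}</p>\n'
        # The nested item describes the catalog the dataset belongs to.
        '  <div itemprop="includedInDataCatalog" itemscope '
        'itemtype="http://schema.org/DataCatalog">\n'
        f'    <a itemprop="url" href="{escape(catalog_url)}">'
        f'<span itemprop="name">{escape(catalog_name)}</span></a>\n'
        '  </div>\n'
        '</div>'
    )
```

Nesting the catalog inside the dataset item gives a crawler both the dataset description and a link back to the catalog it came from.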
Schema.org datasets extension. “...we've reviewed the current datasets schema proposal in draft, and we are comfortable with the current state of things...” “...At this point, if the group would solidify on the dataset proposal, then Data.gov would support and use it.” --- Chris Musialek
CKAN Data Catalog Scheme & Protocol. API-based catalog federation is also possible: CKAN has announced a DCAT-based query/federation API that enables OAI-PMH-like harvesting and more.
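The idea behind OAI-PMH-like harvesting is that a harvester asks only for records changed since its last visit. The sketch below shows that selection step on in-memory records; it does not call CKAN or any real API, and the `modified` field name and timestamp format are assumptions made for illustration.

```python
from datetime import datetime, timezone


def parse_ts(value):
    """Parse a UTC timestamp such as '2012-06-01T12:00:00Z'."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def harvest_since(records, last_harvest):
    """Return records modified after last_harvest, plus the new checkpoint.

    `records` is a list of dicts with a hypothetical 'modified' key.
    The checkpoint is the newest timestamp seen, so the next run can
    resume from there; it is unchanged when nothing is new.
    """
    cutoff = parse_ts(last_harvest)
    changed = [r for r in records if parse_ts(r["modified"]) > cutoff]
    changed.sort(key=lambda r: parse_ts(r["modified"]))
    new_checkpoint = changed[-1]["modified"] if changed else last_harvest
    return changed, new_checkpoint
```

A federating catalog would persist the checkpoint between runs, so each pass transfers only the records that changed since the previous one.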
Demo/ links http://www.w3.org/wiki/WebSchemas/Datasets http://www.w3.org/wiki/WebSchemas/SchemaDotOrgProposals Good introduction (longer/ with more context): http://www.slideshare.net/joshsh/semantic-markup-using-schemaorg
Examples of current schema.org results http://schema-creator.org/event.php http://schema-creator.org/product.php
To do… • Get Google, Bing, Yahoo, … to crawl these pages • It might look like this: http://www.google.com/publicdata/directory
From Jim Hendler: • Google is now building custom search engines that will pull down schema.org • Dan Brickley is working on one from the Dataset schema, not yet public • There's also an open govt data search – not much in it, but looks nice – it's at http://www.google.com/publicdata/directory
Retrieve all the logd datasets:
PREFIX dgtwc: <http://data-gov.tw.rpi.edu/2009/data-gov-twc.rdf#>
PREFIX conv: <http://purl.org/twc/vocab/conversion/>
PREFIX void: <http://rdfs.org/ns/void#>
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT DISTINCT ?dataset ?catalog ?catalog_id ?title ?desc ?country ?homepage ?agency_id ?contributor_id WHERE {
  ?dataset a conv:CatalogedDataset .
  ?dataset void:inDataset ?catalog .
  ?catalog dcterms:identifier ?catalog_id .
  ?dataset <http://purl.org/dc/terms/title> ?title .
  ?dataset dcterms:description ?desc .
  OPTIONAL {
    ?dataset dgtwc:catalog_country ?country .
  }
  OPTIONAL {
    ?dataset <http://xmlns.com/foaf/0.1/homepage> ?homepage .
  }
  OPTIONAL {
    ?dataset dgtwc:agency ?agency .
    ?agency dcterms:identifier ?agency_id .
  }
  OPTIONAL {
    ?dataset <http://purl.org/dc/terms/contributor> ?contributor .
    ?contributor dcterms:identifier ?contributor_id .
  }
  #?dataset dgtwc:catalog_country <http://dbpedia.org/resource/United_States> .
}
Courtesy: Josh Shinavier (RPI/TWC)
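A query like the one above could be sent to a SPARQL endpoint over HTTP along the following lines. This is a minimal sketch using only the Python standard library: it follows the standard SPARQL protocol (a `query` parameter on a GET request) and the SPARQL JSON results format, but the endpoint URL is an assumption not given in these slides, and the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def build_request(endpoint, query):
    """Build a SPARQL-protocol GET request asking for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"})


def rows_from_results(results_json):
    """Flatten a SPARQL JSON results document into a list of dicts.

    Variables left unbound by OPTIONAL clauses are simply absent
    from the corresponding row.
    """
    doc = json.loads(results_json)
    variables = doc["head"]["vars"]
    return [
        {v: binding[v]["value"] for v in variables if v in binding}
        for binding in doc["results"]["bindings"]
    ]


def run_query(endpoint, query, timeout=30):
    """Send the query and return the result rows (needs network access)."""
    request = build_request(endpoint, query)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return rows_from_results(response.read().decode("utf-8"))


# Assumed endpoint, for illustration only:
# rows = run_query("http://logd.tw.rpi.edu/sparql", query_text)
```

Because several of the query's variables sit inside OPTIONAL blocks, callers should read them with `row.get("country")` rather than indexing directly.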
A large number of datasets: http://logd.tw.rpi.edu/schemaorg_dataset_extension http://www.google.com/webmasters/tools/richsnippets?url=http://logd.tw.rpi.edu/schemaorg_dataset_extension&view=
http://logd.tw.rpi.edu/page/international_dataset_catalog_search
Latest from Josh: • Datasets-as-Linked-Data demo. The RDFa in the pages is not only correct w.r.t. schema.org but is also presented in such a way that an RDFa-aware Linked Data crawler can hop from datasets to catalogs, back again, into DBpedia, etc. while gathering the RDFa as linked RDF. • Since we now have Datasets-ish RDFa markup in the main IOGDS dataset pages (i.e. the pages which the URIs of the datasets redirect to), we're pretty close to a completely integrated demo. • What remains: (1) the current markup has some problems. We need to fix those; (2) we need markup for catalogs as well as datasets…
Needed for (1) and (2): • To fix (1), we need to change the LODSPeaKr templates that automatically generate those pages, so that they comply with the model Josh developed. • To fix (2), we'll work with Alvaro (Graves) to create LODSPeaKr-based automation that generates catalog pages efficiently. • (2) presents more of a challenge than (1), since the IOGDS implementation of dataset details pages is already mostly correct. • Still need Dan B. to assist with getting them found…
What we need: • Willingness to adopt the dataset schema extension – we need lots of datasets to start showing up • We (TWC) will be pushing out some tools, more demos and how-tos, very soon • Wanna play? http://wiki.esipfed.org/index.php/DatasetSchema