healthdata.gov now and next challenges overview hhs ocio, health datapalooza 2012
session agenda • now • tools and features • next • target architecture • challenges • explanations in sequence
now – tools and features • Drupal • publishing workflow and community engagement • Solr • faceted search • CKAN • ‘on demand resources’ (RESTful API and feeds) • EC2 • powered by GovCloud • github.com/hhs • public repos coming soon!
publishing workbench • insert interesting workbench screenshot
community engagement • insert interesting community engagement screenshot • question and/or ideas example
hub.healthdata.gov/api/rest/dataset step 1: HTTP GET /dataset collection as JSON (GUID or name)
hub.healthdata.gov/api/rest/dataset/{name} step 2: HTTP GET each /dataset (as JSON, RDF/XML, or N3)
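The two steps above could be scripted roughly as follows. This is a minimal sketch, not the site's own client: the `http://` scheme, the assumption that step 1 returns a JSON list of names, and the helper names (`collection_url`, `dataset_url`, `get_json`, `walk_catalog`) are all illustrative.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

# Assumption: the slides give only host and path; the scheme is a guess.
BASE = "http://hub.healthdata.gov/api/rest"


def collection_url(base=BASE):
    """Step 1: URL of the dataset collection."""
    return f"{base}/dataset"


def dataset_url(name, base=BASE):
    """Step 2: URL of one dataset, addressed by GUID or name."""
    return f"{base}/dataset/{quote(name, safe='')}"


def get_json(url, timeout=30):
    """HTTP GET a URL and decode the JSON body."""
    req = Request(url, headers={"Accept": "application/json"})
    with urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def walk_catalog(fetch=get_json, limit=None):
    """Yield (name, record) pairs: list the collection, then GET each entry.

    Assumes the collection response is a JSON list of names or GUIDs.
    """
    names = fetch(collection_url())
    for name in names[:limit]:
        yield name, fetch(dataset_url(name))
```

Passing `fetch` as a parameter keeps the walk testable without network access; `for name, record in walk_catalog(limit=5): ...` would issue six GET requests against the live service.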
hub.healthdata.gov/api/search/dataset?q=medicare+costs JSON results for ‘medicare’ and ‘costs’ search query
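A small sketch of how the search URL above can be assembled from individual terms. The `http://` scheme and the response shape (a JSON object with a `results` list) are assumptions, and `search_url` and `result_names` are hypothetical helpers.

```python
from urllib.parse import urlencode

# Assumption: scheme is not stated on the slide.
SEARCH = "http://hub.healthdata.gov/api/search/dataset"


def search_url(*terms, base=SEARCH):
    """Join the terms with spaces; urlencode renders each space as '+'."""
    return f"{base}?{urlencode({'q': ' '.join(terms)})}"


def result_names(payload):
    """Pull the hit list out of a decoded response.

    Assumes the payload looks like {"count": N, "results": [...]};
    returns an empty list if the key is missing.
    """
    return list(payload.get("results", []))
```

For example, `search_url("medicare", "costs")` reproduces the query string shown on the slide.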
hub.healthdata.gov/feeds/dataset.atom atom feed for all datasets (including recent updates and changes)
hub.healthdata.gov/feeds/custom.atom?q=medicare+cost custom search query result atom feed (anything with ‘medicare+cost’)
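Either feed can be read with the standard library alone. The sketch below parses an Atom document into plain dictionaries; `SAMPLE_FEED` is made-up illustration data, not output captured from hub.healthdata.gov, and `feed_entries` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# The Atom namespace is fixed by the Atom format itself.
ATOM = {"atom": "http://www.w3.org/2005/Atom"}

# Made-up sample for illustration only.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>sample feed</title>
  <entry>
    <title>Sample dataset</title>
    <updated>2012-06-01T00:00:00Z</updated>
    <link href="http://example.org/dataset/sample"/>
  </entry>
</feed>"""


def feed_entries(xml_text):
    """Return one dict per <entry> with its title, updated stamp and link."""
    root = ET.fromstring(xml_text)
    entries = []
    for entry in root.findall("atom:entry", ATOM):
        link = entry.find("atom:link", ATOM)
        entries.append({
            "title": entry.findtext("atom:title", default="", namespaces=ATOM),
            "updated": entry.findtext("atom:updated", default="", namespaces=ATOM),
            "href": link.get("href") if link is not None else None,
        })
    return entries
```

Sorting the returned list by `updated` gives a simple "recent changes" view; the timestamps sort correctly as strings only if they all share the same format and timezone, as in the sample.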
next – target architecture • linked data • (closed) google knowledge graph • open health knowledge graph • integration framework • top down modeling • bottom up mapping • social curation
#gkg – (closed) ‘things, not strings’ “The Knowledge Graph helps us understand the relationships between things [… that are] linked in our graph. […] It’s not just a catalog of objects; it also models all these inter-relationships.” source
Linked Data Integration Framework GKG/Watson/Siri/… healthdata.gov PCAST DEAS HKG Variety Volume Velocity Health Data Actor
i2 challenges • two types • three domain specific • improve the integration and liquidity of data made available • four platform specific • enhance the capabilities of the technology components • 3 release rounds • sequenced to leverage dependencies • round 1: June through October 2012 • round 2: November 2012 through May 2013 • round 3: June through December 2013
round 1 challenges • June 2012 through October 2012 • domain specific • [1.1] cross domain and domain specific metadata • voluntary consensus standards organizations, de facto standards, other • platform specific • [1.2] Simplified Sign On (SSO) • WebID identity provider and relying parties, HDP infrastructure components • $35K: $20K 1st, $10K 2nd, $5K 3rd place prizes
round 2 challenges • November 2012 through May 2013 • domain specific • [2.3] Mapping, Reconciliation and Correlation • structural variety, authoritative URIs, linking heuristics • platform specific • [2.4] Faceted Browsing and Visualization • D3 (backbone, jQuery, etc.) • [2.5] Custom API • Linked Data API ‘configurator’ for dataset resources • each of these builds on [1.1] results
round 3 challenges • June 2013 through December 2013 • domain specific • [3.6] Correlating HHS and NHS Classifications • structural variety, authoritative URIs, linking heuristics • platform specific • [3.7] Linked Data API based Data Element Access Services • ‘securing the data, not just the device’ • builds on [1.1], [1.2], and [2.5]
domain challenge [1.1] • Metadata • requests the application of existing voluntary consensus standards for metadata common to all open government data • and invites new designs for health domain specific metadata to classify datasets in our growing catalog, creating entities, attributes and relations • that form the foundations for better discovery, integration and liquidity. • 374 on challenge.gov
hub.healthdata.gov/dataset/hospice-medicare-cost-report-data.rdf rdf/xml output uses dublin core and dcat metadata (mapping issues to work out, N3 output is incomplete, etc.)
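To show what Dublin Core and DCAT tags look like to a consumer, here is a minimal sketch that reads them from an RDF/XML document. `SAMPLE_RDF` is invented for illustration and the real `.rdf` output may be structured differently; the use of the Dublin Core terms namespace (rather than the older elements namespace) and the helper name `dataset_metadata` are assumptions. A full RDF parser would cope with serialization variants that this element-based approach does not.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",  # assumption: DC terms, not DC elements
}

# Made-up sample; not copied from hub.healthdata.gov.
SAMPLE_RDF = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dcat="http://www.w3.org/ns/dcat#"
         xmlns:dct="http://purl.org/dc/terms/">
  <dcat:Dataset rdf:about="http://example.org/dataset/sample">
    <dct:title>Sample dataset</dct:title>
    <dcat:keyword>medicare</dcat:keyword>
    <dcat:keyword>hospice</dcat:keyword>
  </dcat:Dataset>
</rdf:RDF>"""


def dataset_metadata(rdf_xml):
    """Return uri, title and keywords for each top-level dcat:Dataset."""
    root = ET.fromstring(rdf_xml)
    about = "{%s}about" % NS["rdf"]  # ElementTree spells rdf:about this way
    found = []
    for ds in root.findall("dcat:Dataset", NS):
        found.append({
            "uri": ds.get(about),
            "title": ds.findtext("dct:title", default="", namespaces=NS),
            "keywords": [k.text for k in ds.findall("dcat:keyword", NS)],
        })
    return found
```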
https://github.com/HHS/hd2-ckan/blob/master/templates/package/read.rdf ckan script that creates dc and dcat metadata tags / values (thanks @JoshData! public github repo soon :-)
W3C Data Cube – statistics refactor CQLD vocabs/data? start here and follow imports
W3C Provenance – change mgmt apply to CKAN /revisions
OMG BMM – business motivation image source
platform challenge [1.2] • WebID based SSO • will improve community engagement • by providing simplified sign on (SSO) for external users interacting across multiple HDP technology components, • making it easier for community collaborators to contribute, • leveraging new approaches to decentralized authentication. • 375 on challenge.gov
domain challenge [2.3] • Mapping, Reconciliation and Correlation • builds on the Metadata domain challenge [1.1] • begins by acknowledging disparate open government publishing practices • and seeks the demonstration of an innovative and automated solution for transforming semi-structured data into structured data, • reconciles decentralized distributions about the same data entity against the master identity of an authoritative source, • and correlates these master identities when multiple authoritative sources exist, • enabling the network effect by introducing strong identity resolution techniques that ease the ability to aggregate different data about the same entities from independent publishers.
platform challenge [2.4] • Faceted Browsing and Visualization • builds on the Metadata domain challenge [1.1] • uses the most popular browser based UI frameworks and libraries to realize novel exploration and discovery techniques for traversing large amounts of interrelated data, • contributing to a growing collection of open source widgets that make it easy for third parties to create new applications and embed health data in their content.
surfing the domain schemata no domain knowledge required to discover entities and relationships
agents construct e/r queries Siri, which {LA County} Hospitals have the best {Heart Attack} stats?