Data.gov Wiki: A Semantic Web Approach to Government Data

Tetherless World Data.gov Wiki: A Semantic Web Approach to Government Data. Li Ding, Dominic DiFranzo, Sarah Magidson, Alvaro Graves, James R. Michaelis, Xian Li, Deborah L. McGuinness, Jim Hendler. Tetherless World Constellation, Nov 2, 2009.


Presentation Transcript


  1. Tetherless World Data.gov Wiki: A Semantic Web Approach to Government Data • Li Ding, Dominic DiFranzo, Sarah Magidson, Alvaro Graves, James R. Michaelis, Xian Li, Deborah L. McGuinness, Jim Hendler • Tetherless World Constellation • Nov 2, 2009

  2. Synergy • Government: data is out there “as is” • Loop: gov data and linked data • Loop: gov data and web developers • Loop: gov data and end users

  3. Government Data on the Web

  4. Objectives • Investigate the role of the Semantic Web in producing, processing, and using government datasets • Enrich the value of data through normalization, linking, and information extraction • Realize the value of data through applications, especially visualization • Support web developers through machine-friendly data access and web services

  5. Semantic Web Architecture for Government Data (diagram, layers listed top to bottom) • View & Use Data: Sem Wiki, tagCloud, Tabulator, Google Viz, MIT Exhibit, RSS 1.0 • Data Processors (Web Services & Analyzers): XSLT Service, Diff Service, Link Annotator, RSS Generator, … • Link & Enrich Data: SPARQL Web Service, SPARQL End Point, RDF/XML, Linked Data, GOV data (RDF) • Convert Data: DATA.GOV, CSV, XSL, … • Li Ding, Dominic DiFranzo, Sarah Magidson, and Jim Hendler · Tetherless World Constellation · Rensselaer Polytechnic Institute · Aug 7 2009 · http://data-gov.tw.rpi.edu/
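
The "Convert Data" layer turns tabular data.gov files into RDF. The slides do not show the converter itself, so the following is only a minimal sketch of one plausible approach: each CSV row becomes a subject, each column a property, and each non-empty cell a literal, serialized as N-Triples. The `example.org` namespaces, the `entryN` naming scheme, and the function names are assumptions made for illustration, not the project's actual URI design.

```python
import csv
import io

# Hypothetical namespaces; the slides do not give the project's real URI scheme.
DATASET_NS = "http://example.org/dataset/demo/"
PROP_NS = "http://example.org/vocab/"


def escape_literal(value):
    """Escape characters that must not appear raw in an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def csv_to_ntriples(csv_text):
    """Convert CSV text into a list of N-Triples lines (one triple per cell)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    lines = []
    for row_number, row in enumerate(reader, start=1):
        # One subject per row, numbered in file order.
        subject = f"<{DATASET_NS}entry{row_number}>"
        for column, value in row.items():
            # Skip overflow cells (column is None) and empty or missing values.
            if column is None or not isinstance(value, str) or value == "":
                continue
            # Derive a property name from the column header.
            local_name = column.strip().lower().replace(" ", "_")
            predicate = f"<{PROP_NS}{local_name}>"
            lines.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return lines
```

Every value is kept as a plain string literal here; typing numbers and dates, or minting URIs for shared entities, would belong to the later "Link & Enrich" step.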

  6. The Landscape

  7. The catalog data

  8. Data-gov Cloud (Aug 2009) (diagram) • Category labels: Energy and Utilities, Geography and Environment, Population, Budget • Datasets: (#91) 2005-2007 ACS PUMS Population; (#10) Residential Energy Consumption Survey; (#34) Worldwide M1+ Earthquakes past 7 days; (#249) 2006 Toxics Release Inventory; (#90) 2005-2007 ACS PUMS Housing; (#191) 2005 Toxics Release Inventory; (#397) 2007 Toxics Release Inventory; (#401) Budget Authority and offsetting receipts 1976-2014; (#9) CASTNET Visibility; (#8) CASTNET Ozone; (#402) Outlays and offsetting receipts 1962-2014; (#403) Governmental Receipts 1962-2014; (@10001) CASTNET sites • Li Ding, Dominic DiFranzo, Sarah Magidson, and Jim Hendler · Tetherless World Constellation · Rensselaer Polytechnic Institute · Aug 7 2009 · http://data-gov.tw.rpi.edu/

  9. Data-gov Cloud (Oct 2009) (diagram) • Datasets: LABOR-STAT (19xx-Present), USAspending (2008-2010), EARTHQUAKE (Present), US-COMMUNITY (2005-2007), GOV-BUDGET (1962-2014), TOXIC-RELEASE (2005-2008), DATA-GOV-CATALOG (present), RECS (2005), CASTNET (1990 – Present), MED-COST (1994-2009), PUBLIC-LIB (1992-2006), STATE-LIB (2006-2007) • Other labels: GeoNames, LinkedData, Community, US agency, US location, Environment, Government, RECS code, CASTNET sites, Services • Li Ding and Jim Hendler · Tetherless World Constellation · Rensselaer Polytechnic Institute · Oct 2009 · http://data-gov.tw.rpi.edu/

  10. More statistics

  11. Demos

  12. Data.gov + epa.gov

  13. Gov Data + Corporate Data + User Data

  14. Computing Difference of Revisions

  15. More demos? • http://data-gov.tw.rpi.edu/wiki/demos

  16. Technical Issues

  17. Issues in Data.gov • Duplicated datasets: some datasets are part of another dataset. • Dataset 140 (2005 Toxics Release Inventory data for the state of California (EPA)) is a subset of Dataset 191. • Formatting issues: the format of some datasets is not friendly to machine processing. • Dataset 37 (Lower Colorado River Daily Average Water Elevations and Releases (US Bureau of Reclamation)). • Dataset 335 (National Longitudinal Surveys (US Bureau of Labor Statistics)) tells you how to order data from the government. • Access point issues: some access points are interactive web pages, which are not friendly to machine access. • Dataset 330 (Local Area Unemployment Statistics (US Bureau of Labor Statistics)).
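
The duplicated-dataset issue (one dataset being a subset of another, as with Datasets 140 and 191) can be checked mechanically once both files are available as CSV. The slides do not say how the authors detected it; the sketch below assumes the simplest case, where both files share the same header and a duplicate row is an exact cell-by-cell match. The function names are hypothetical.

```python
import csv
import io


def row_set(csv_text):
    """Return (header, set of data rows) for CSV text, trimming cell whitespace."""
    reader = csv.reader(io.StringIO(csv_text))
    header = [cell.strip() for cell in next(reader)]
    rows = {tuple(cell.strip() for cell in row) for row in reader if row}
    return header, rows


def is_subset(candidate_csv, full_csv):
    """True if both files share a header and every candidate row is in the full file."""
    candidate_header, candidate_rows = row_set(candidate_csv)
    full_header, full_rows = row_set(full_csv)
    return candidate_header == full_header and candidate_rows <= full_rows
```

Real datasets often differ in column order or formatting, so a practical check would normalize both sides first; this version only catches exact containment.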

  18. Linking Data • link similar datasets by reusing property namespace • link to rdfs:label (via rdfs:subPropertyOf) using semantic wiki • link to DBpedia (via owl:sameAs) using wikipedia widget • link instances (via common <property, literal-value> pair) • link government data with web data (via time and location) • link revisions of government data (via knowledge provenance)
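
One of the linking strategies above, linking instances through a common <property, literal-value> pair, can be sketched in a few lines. This is an illustration only: triples are modeled as plain `(subject, property, value)` tuples, and the prefixed names used in the usage note (`gov:`, `web:`, `ex:zip`) are made-up placeholders rather than identifiers from the project.

```python
from collections import defaultdict


def link_by_shared_value(triples_a, triples_b, prop):
    """Pair subjects from two datasets that share the same literal value for `prop`.

    Returns a sorted list of (subject_from_a, subject_from_b) candidate links.
    """
    # Index dataset B by literal value so each lookup from A is constant time.
    subjects_by_value = defaultdict(set)
    for subject, predicate, value in triples_b:
        if predicate == prop:
            subjects_by_value[value].add(subject)

    links = set()
    for subject, predicate, value in triples_a:
        if predicate == prop:
            for other in subjects_by_value.get(value, ()):
                links.add((subject, other))
    return sorted(links)
```

For example, with `gov = [("gov:site1", "ex:zip", "12180")]` and `web = [("web:placeA", "ex:zip", "12180")]`, calling `link_by_shared_value(gov, web, "ex:zip")` pairs `gov:site1` with `web:placeA`. Such pairs are only candidates: a shared literal does not by itself prove two resources are the same thing.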

  19. Semantic mapping: AI + CI • Map to Wikipedia/DBpedia • Names need manual disambiguation!

  20. RDF => SPARQL => Web • We use SPARQL to bridge Web developers and Semantic Web data. • A triple store is used to handle multi-million-triple RDF datasets.
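
To make the "bridge" concrete, here is a minimal sketch of what a web developer's side of a SPARQL exchange might look like: build an HTTP GET request following the standard SPARQL protocol (`query` parameter) and flatten a response in the standard SPARQL JSON results format. The endpoint URL is a placeholder, since the slides do not give the project's actual endpoint address, and the code only constructs the request without sending it.

```python
import json
import urllib.parse
import urllib.request

# Placeholder endpoint; substitute the address of a real SPARQL endpoint.
ENDPOINT = "http://example.org/sparql"

QUERY = """
SELECT ?s ?p ?o
WHERE { ?s ?p ?o }
LIMIT 5
"""


def build_request(endpoint, query):
    """Build (but do not send) a SPARQL protocol GET request asking for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def rows_from_results(results_json):
    """Flatten a SPARQL JSON results document into a list of {variable: value} dicts."""
    document = json.loads(results_json)
    variables = document["head"]["vars"]
    return [
        {name: binding[name]["value"] for name in variables if name in binding}
        for binding in document["results"]["bindings"]
    ]
```

Passing the request to `urllib.request.urlopen` and feeding the decoded body to `rows_from_results` would yield plain dictionaries that ordinary web code can consume without any RDF tooling.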

  21. Conclusion • Semantic Web-enabled portal for linked government data • 5 billion triples from data.gov • hosts apps, demos & services • provides educational services • integrates web users’ contributions
