TWC LOGD: A Portal for Linking Open Government Data Li Ding, Deborah L. McGuinness, Jim Hendler Tetherless World Constellation Rensselaer Polytechnic Institute Presented by Li Ding at Northwestern University Dec 1, 2010
The TWC LOGD Portal Highlights Real World Data • US, UK, China,… • Health, energy, economy Applied Semantic Web • Major partner of Data.gov • 8.5 billion triples in LOD End User Applications • Community Portal • Fast, Low-cost Mashups
Semantic Web Deployed at Data.gov http://www.data.gov/semantic
Data.gov and World-Wide Open Government Data Activities data.gov online data.gov relaunch with semantic web featured January 1, 2009 “Openness will strengthen our democracy and promote efficiency and effectiveness in Government.” --- President Obama May 21, 2010 May 21, 2009 2009 2010 … January 19, 2010 June 30, 2009 Putting Government Data Online • Many countries • US • UK • Australia • New Zealand • … data.gov.uk online
First anniversary of Data.gov: the Semantic Web and RDF logos appeared on the front page of the US Data.gov website
Semantic Web deployed at Data.gov: RDF data, SPARQL endpoint, semantic mashups
Government Adoption Process data.gov online SPARQL Endpoint & RDF data & Demos Replicated at Data.gov data.gov relaunch with semantic web featured May 21, 2009 May 21, 2010 May, 2010 New Application published by a team at DOE Oct, 2010 2009 2010 … Demos Tutorials Videos SPARQL Endpoint Data-gov Wiki @RPI online Two-day Mashathon in Washington DC July, 2009 2009-2010 Aug, 2010 TWC LOGD Drupal Site announced Oct, 2010
The Largest Real World LOD Dataset http://logd.tw.rpi.edu/twc-logd
Categories of Data.gov Datasets • Statistical data about various aspects of society • Over 3000 Datasets
Raw Government Data Now • Metadata in PDF • Data in Excel
Enhancement: Linking Open Government Data PHSY_ST: state abbreviation ID: unique id cost: unit is million US dollars year: 1975-2008 Metadata (field definition) Metadata (value definition) Correlated dataset Complement dataset DS123:NY owl:sameAs
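The `DS123:NY owl:sameAs` link on this slide can be illustrated with a minimal Python sketch that emits N-Triples lines. This is not the portal's actual converter. The base URIs, the `DS123` path, and the target resource name `New_York` are placeholders assumed for illustration. Only the `owl:sameAs` property URI is standard.

```python
from typing import Dict, List

# Standard OWL property stating that two URIs identify the same thing.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def same_as_triple(subject_uri: str, object_uri: str) -> str:
    """Return one N-Triples line linking two URIs with owl:sameAs."""
    return f"<{subject_uri}> <{OWL_SAME_AS}> <{object_uri}> ."


def link_states(dataset_base: str, target_base: str,
                states: Dict[str, str]) -> List[str]:
    """Link each state abbreviation in a dataset to an external resource.

    `states` maps an abbreviation (as found in a field such as PHSY_ST)
    to the local name of the external resource. The mapping itself is
    supplied by the caller; nothing is looked up automatically.
    """
    return [
        same_as_triple(f"{dataset_base}/{abbr}", f"{target_base}/{name}")
        for abbr, name in sorted(states.items())
    ]


# Hypothetical URIs, used only to show the shape of the output.
lines = link_states(
    "http://example.org/DS123",
    "http://example.org/resource",
    {"NY": "New_York"},
)
for line in lines:
    print(line)
```

Once such links exist, a query that starts from the dataset's `NY` record can follow `owl:sameAs` to whatever the linked resource says about the same state.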
The Largest Real World LOD Dataset • 8.5+ billion triples from real world • 7500+ LOD links • Accessible via Data Browser, e.g. Tabulator
Consuming Linked Open Government Data http://logd.tw.rpi.edu/demos
LOGD Consumption Workflow • Sources on the WWW: dbpedia, data.gov.uk, TWC LOGD • Query Data (SPARQL Query, SPARQL Results) • Format Data (JSON, XML, CSV) • Integrate Data • Visualize Data • LOGD Application UI
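The "Format Data" step of this workflow can be sketched in Python, assuming the endpoint returns the W3C SPARQL JSON results format. The sample document below is invented for illustration and is not portal data. The helper names `bindings_to_rows` and `rows_to_csv` are hypothetical. No network call is made, so the sketch covers only the flattening of a result set into rows and CSV for a later integration or visualization step.

```python
import csv
import io
import json
from typing import Dict, List, Optional

# Made-up result document in the SPARQL JSON results layout:
# "head.vars" lists the variables, "results.bindings" holds one
# object per solution, and unbound variables are simply absent.
SAMPLE = """
{
  "head": {"vars": ["state", "value"]},
  "results": {"bindings": [
    {"state": {"type": "literal", "value": "NY"},
     "value": {"type": "literal", "value": "10"}},
    {"state": {"type": "literal", "value": "CA"}}
  ]}
}
"""


def bindings_to_rows(doc: dict) -> List[Dict[str, Optional[str]]]:
    """Flatten SPARQL JSON bindings into plain dicts (None if unbound)."""
    variables = doc["head"]["vars"]
    rows = []
    for binding in doc["results"]["bindings"]:
        rows.append({
            var: binding[var]["value"] if var in binding else None
            for var in variables
        })
    return rows


def rows_to_csv(variables: List[str],
                rows: List[Dict[str, Optional[str]]]) -> str:
    """Serialize rows as CSV text; unbound values become empty cells."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=variables, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


doc = json.loads(SAMPLE)
rows = bindings_to_rows(doc)
csv_text = rows_to_csv(doc["head"]["vars"], rows)
print(csv_text)
```

Keeping unbound variables as `None` rather than dropping them gives every row the same columns, which makes it simpler to join results from several endpoints afterwards.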
Mashing up LOGD Data • Sources: Data.gov, epa.gov: CASTNET Ozone (CSV), CASTNET Site (CSV) • Steps: 1. Convert raw dataset into linkable RDF 2. Query multiple RDF datasets via SPARQL endpoint 3. Drill down for details 4. Surf to EPA applications • Data Mashup, Web Application Mashup, Visualization Mashup (Visualization API, Exhibit) Created by Dominic DiFranzo, PhD student at RPI, http://www.data.gov/semantic/Castnet/html/exhibit
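The first step, converting a raw CSV dataset into linkable RDF, might look roughly like the following Python sketch. It is not the converter used for the CASTNET demo. The column names, the site id, and the `http://example.org/castnet` base URI are invented for illustration. Each row becomes a subject URI built from a key column, and every other non-empty cell becomes a plain literal in N-Triples.

```python
import csv
import io
from typing import List
from urllib.parse import quote


def escape_literal(text: str) -> str:
    """Escape characters that may not appear raw in an N-Triples literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def csv_to_ntriples(csv_text: str, base_uri: str, key_column: str) -> List[str]:
    """Turn each CSV row into N-Triples lines about one subject URI.

    The subject is minted from `key_column`, so two datasets converted
    with the same base and key values describe the same subjects and
    can be queried together.
    """
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{base_uri}/record/{quote(row[key_column], safe='')}>"
        for column, value in row.items():
            # Skip the key itself, empty cells, and overflow cells
            # (DictReader stores extra cells under the key None).
            if column is None or column == key_column or not value:
                continue
            predicate = f"<{base_uri}/vocab/{quote(column, safe='')}>"
            triples.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return triples


# Hypothetical site table: the header names and values are made up.
SITE_CSV = "site_id,state\nABC123,NY\n"
site_triples = csv_to_ntriples(SITE_CSV, "http://example.org/castnet", "site_id")
for triple in site_triples:
    print(triple)
```

Converting the ozone table with the same `base_uri` and `key_column` would yield triples about the same site subjects, which is what lets step 2 query both datasets through one SPARQL endpoint.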
Smoking Prevalence vs. Tax, Policy … Extensible and accountable mashups with NCI: Trends in Smoking Prevalence, Tobacco Policy Coverage and Tobacco Prices (1991-2007) Extensible Mashups via Linked Data • Diverse datasets from NIH • Potentially linking to “unemployment rate” Accountable Mashups via Provenance • Annotate datasets used in demos • Feed users’ comments back to the gov contact (e.g. %) Created by Li Ding, Tim Lebo, RPI, http://logd.tw.rpi.edu/project/popscigrid
Smoking Prevalence vs. Other Factors: integrating different sources for discovery. Gov data provides knowledge for population science study [Spatial Mashup] Data.gov (Population) + NIH (Tobacco Tax, Smoking rate) Created by Sarah Magidson, U. Chicago. http://data-gov.tw.rpi.edu/demo/stable/tobacco-smoker/demo-state-10026-smoke-rate-statevarsapi.html
Linking GDP of the US and China: linking international government data meaningfully GDP of the US (Billion Dollar) 8.3 6.3 2000 2010 GDP of China (Billion Chinese Yuan) [Temporal Mashup] bea.gov + federalreserve.gov + stats.gov.cn Created by Li Ding, RPI, http://logd.tw.rpi.edu/demo/linking_us_and_chinas_gdp_data/
Semantic Search on LOGD data: rich snippet in results Web Search (HTML) http://data-gov.tw.rpi.edu/ XHTML+RDFa Rich Snippet (RDFa) ARC2
Adding Social Factors to Mashups • Import socially contributed data, e.g. DBpedia • Let users contribute • links • feedback Other Social Web Apps Enhance* Import/export RDF Raw Data Publish* consume* User feedback
US Wildland Fire and Budget: linking to Wikipedia (socially contributed) Category:Wildfires In The United States Budget on wildfire “DOI” and “USDA” (OMB) Wildland fire (NIFC) [Temporal Mashup] Data.gov (statistics + budget) + Wikipedia (famous fires) Created by Li Ding, RPI, http://data-gov.tw.rpi.edu/demo/stable/demo-1187-40x-wildfire-budget.html
White House Visitor Search: leveraging linked data (DBpedia & New York Times) NYTimes Wikipedia dbpedia:Barack_Obama Semantic Wiki “POTUS” The White House • [Person Mashup] Data.gov (statistics) + DBpedia (personal profiles) + NYTimes (news) • [Technologies] Semantic MediaWiki, Google Visualization, iPad Apps available in Apple Store Created by Dominic DiFranzo, Evan Patton, RPI, http://data-gov.tw.rpi.edu/demo/stable/white-house-visitor/top100-visitees.php
USPS Spending and News: government data + user feedback [Temporal Mashup] Data.gov (budget) + USPS + User Contributed News Created by Sarah Magidson, http://data-gov.tw.rpi.edu/demo/linked/demo-401-usps-news.html
Current Status of TWC LOGD http://data-gov.tw.rpi.edu (Semantic MediaWiki) => http://logd.tw.rpi.edu (Drupal + RDFa)
Website Statistics • 378,128 page hits • 28,481 visits • 16,041 visitors • 4,126 cities • 34 countries Note: the above statistics are for http://data-gov.tw.rpi.edu. Dataset access is not counted.
Data Abstraction and Versioning • Data publishing stages: OGD (part1) Snapshot, OGD (part2) Snapshot, …; Version; Conversion Layer: LOGD (raw), LOGD (e1), … • Levels of structural data granularity (axis labeled high to low): Source, Dataset, Table, Record, …
Provenance and Workflow Convert Access Enhance Version SemDiff derive derive create revision derive
Linking Open Source Community: linking semantic web with web developers • Social Semantic Web extensions/modules to popular CMS, e.g. Semantic Wiki, Drupal • Process/consume integrated gov data in a number of different ways: social networks, natural language technologies, workflows, search… Web n-grams
Education: Linked Tutorials, Demos… project dataset dcterms:relation logd:uses_dataset dcterms:source logd:uses_datasource demo dcterms:contributor source person dcterms:relation logd:uses_technology video technology dcterms:relation tutorial http://logd.tw.rpi.edu/tutorials
Summary of the TWC LOGD Portal http://logd.tw.rpi.edu Real World Data • 8.5+ billion triples • 400+ datasets • 10+ sources • Many domains Semantic Web Technology • completely open source • Demos/tutorials/videos Community and Users • partner of US government • open source community • education in university Beyond just dogfood; Linking Open Government Data Now!
The Team and Sponsors • Leaders • Jim Hendler • Deborah L. McGuinness • Li Ding • Members • Dominic DiFranzo • Sarah Magidson • James Michaelis • Alvaro Graves • Jin Guang Zheng • Xian Li • Gregory Todd Williams • Tim Lebo • Zhenning Shangguan • Devin Gaffney • Peter Coons • Adam Bell • William Cooper • Brian Zaik • Johanna Flores Government Sponsors DARPA NSF NASA IARPA NIH/NCI …