3 Round Stones: All Content As Big Data. Dr. Brand Niemann Director and Senior Enterprise Architect – Data Scientist Semantic Community http://semanticommunity.info/ AOL Government Blogger http://gov.aol.com/bloggers/brand-niemann/ March 15, 2013 http://semanticommunity.info/3_Round_Stones.
Awarded Top Semantic Technology Startup http://semanticweb.com/3-round-stones-named-%E2%80%9Ctop-semantic-technology-start-up%E2%80%9D-at-semantic-tech-business-conference_b29646
Linked Data Book by David Wood, et al http://www.meetup.com/Northern-Virginia-Semantic-Web-Meetup/events/104544852/
Current US Government Semantic Web Strategy
• Data.gov Advocates RDFa 1.1 Lite for Semantic Web Strategy.
• See Comment From Owen Ambur on Next Slide.
• I believe there is a better way to handle this, which I showed the W3C eGov Special Interest Group on January 21st and have recommended for the reintroduction of the Data Act to the 113th Congress.
• Create a Semantic Index of Strong Relationships (SR) in RDF Format in a Spreadsheet.
• See next slide for example (spreadsheet and words)
• Integrate That With Other Spreadsheets and Relational Databases in An Interoperability Interface (e.g. Dashboard) That Can Be Searched.
• Essentially:
• Computer Scientists Use RD2RDF (James Hendler)
• Data Scientists Use SR2Excel2RDF (Brand Niemann)
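One way to picture a spreadsheet-to-RDF step is the minimal sketch below. It assumes the spreadsheet has been exported as CSV with subject, predicate, and object columns. The base URI, the column names, and the sample rows are illustrative assumptions, not taken from the slides, and this is not the author's SR2Excel2RDF tooling.

```python
import csv
import io

# Hypothetical base URI; the slides do not specify one.
BASE = "http://example.org/"

# Illustrative sample rows standing in for a spreadsheet export.
SAMPLE_CSV = """subject,predicate,object
ArkansasNuclearOne,locatedIn,Arkansas
ArkansasNuclearOne,regulatedBy,NRC
"""

def to_ntriples(csv_text, base=BASE):
    """Turn each subject/predicate/object row into one N-Triples line."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        s, p, o = (row[k].strip() for k in ("subject", "predicate", "object"))
        # Every cell is treated as a resource under the assumed base URI.
        lines.append(f"<{base}{s}> <{base}{p}> <{base}{o}> .")
    return lines

triples = to_ntriples(SAMPLE_CSV)
for t in triples:
    print(t)
```

Treating every cell as a URI keeps the sketch short; a fuller version would probably emit literals for plain values such as names or numbers.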
Comment From Owen Ambur
• OMB's official guidance to agencies on implementation of section 10 of the GPRA Modernization Act (GPRAMA) says they may use XML, JSON, spreadsheets or CSVs in order to meet the requirement to publish their strategic and performance plans and reports in machine-readable format... but not PDF or HTML -- at least not without "enhanced structural elements".[1]
• I couldn't help but chuckle at how [1] is a PDF. I get your point however, which I think reinforces mine, that there is no US federal policy that prefers RDFa 1.1 over HTML Microdata for publishing metadata in HTML.
• [1] RDFa Lite 1.1, W3C Recommendation, June 7, 2012, Manu Sporny, editor, see http://www.w3.org/TR/rdfa-lite/
• Source: Owen Ambur, December 18, 2012, W3C eGov Mailing List. My Note: Former Co-Chair of the Federal XML Working Group.
International Linked Open Data Strategy: Linked Open Data Cloud Data
My Question: Is it easy to add columns for who links to who?
Answer: Not in a single table. SPARQL can't do cross-tabulation (Richard Cyganiak).
http://semanticommunity.info/@api/deki/files/8824/=VIVO.xlsx
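For comparison, the kind of "who links to who" cross-tabulation asked about above can be produced outside SPARQL once the links are in a two-column table. The sketch below is only an illustration: the dataset names "A", "B", and "C" and the link records are hypothetical, not real Linked Open Data Cloud data.

```python
from collections import Counter

# Hypothetical (source, target) link records; not real LOD Cloud data.
links = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "B")]

def cross_tab(pairs):
    """Pivot a list of (source, target) pairs into a square count table."""
    counts = Counter(pairs)
    names = sorted({name for pair in pairs for name in pair})
    header = [""] + names
    # One row per source, one column per target, cell = number of links.
    rows = [[src] + [counts[(src, dst)] for dst in names] for src in names]
    return [header] + rows

table = cross_tab(links)
for row in table:
    print("\t".join(str(cell) for cell in row))
```

The same pivot is what a spreadsheet pivot table or a BI tool would do with the two-column link table.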
International Linked Open Data: Comments to David Wood
• The Linked Open Data Cloud is not actually “linked data”.
• RDF at Data.gov is not linked data.
• The analytical and statistical communities view Data.gov and Linked Open Data as “IT projects”.
• Former Census Bureau Director Robert Groves.
• Conventional tools can do linked data and data integration.
• Spotfire Information Designer, Informatica, Information Builders, etc.
http://manning.com/dwood/LinkedData_MEAP_ch1.pdf
http://semanticommunity.info/AOL_Government/Exploiting_Linked_Data_with_BI_Tools
Our Semantic Web Strategy for Data: Simple Explanation
• One Table:
  • Two Columns
    • Example: Column 1: Section and Column 2: URL
    • Note: A Column 3: Description could be in the URL
    • Example: See Slide 18
  • Three Columns:
    • Example: Column 1: Subject, Column 2: Object, and Column 3: Predicate
    • Note: This is the Semantic Web’s Linked Open Data Cloud as Linked Open Data for Network Analytics!
    • Example: See Slide 18
  • Four Columns:
    • Examples: Column 1: Subject, Column 2: Attribute, Column 3: From, and Column 4: To, or Column 1: City, Column 2: Country, Column 3: Longitude, and Column 4: Latitude
    • Note: This is the format for Spotfire’s Network Analytics Module developed for the CIA
    • Example: See Next Slide and Semantic Medline
Our Semantic Web Strategy for Data: Spotfire Network Analytics
http://semanticommunity.info/AOL_Government/Social_Media_-_Six_Degrees_of_Separation_and_Now_Even_Less
Edge and Node Tables
To create a new network visualization it is necessary to provide an edge data table. It is optional to add a node data table since the application can generate a node table from your edge table as soon as you have made the necessary settings for the edges. The edge table must contain at least two columns, but usually more than two columns are needed for the network graph to give any useful insight into the data. The table should also contain a meaningful relation between the columns, for example persons travelling to or from cities, or friendship relationships.
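The idea of generating a node table from an edge table can be sketched as follows. This is a generic illustration, not Spotfire's actual procedure: the column names "from" and "to", the sample travellers and cities, and the added degree column are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical edge table: each row links a person to a city they travel to.
edges = [
    {"from": "Alice", "to": "Paris"},
    {"from": "Bob", "to": "Paris"},
    {"from": "Alice", "to": "Rome"},
]

def build_node_table(edge_rows, src="from", dst="to"):
    """Collect every distinct endpoint and count the edges touching it."""
    degree = defaultdict(int)
    for row in edge_rows:
        degree[row[src]] += 1
        degree[row[dst]] += 1
    # One row per node, sorted by name for a stable result.
    return [{"node": name, "degree": d} for name, d in sorted(degree.items())]

nodes = build_node_table(edges)
for node in nodes:
    print(node["node"], node["degree"])
```

An edge table with only two columns is enough to run this; extra edge columns (dates, weights) would be what gives the resulting graph more analytical value.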
My Process
• Linked Data Web Sites to MindTouch Knowledge Base and to an Excel Spreadsheet
• Linked Data Nuclear Power Plants Demo Application to MindTouch Knowledge Base and to an Excel Spreadsheet
• Other Nuclear Power Plant Data Sources (2) to an Excel Spreadsheet
• Import the Above (5) Into Spotfire
• Get Visualizations and Beginning of a Unified Big Data Architecture and Ecosystem for Big Data Integration
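The integration step in this process amounts to lining up rows from several tables on a shared key. The sketch below shows that idea in plain Python rather than in Spotfire. The key column "plant", the other column names, and all values are hypothetical placeholders, not data from the spreadsheets.

```python
# Hypothetical rows standing in for two of the spreadsheets; values are placeholders.
demo_rows = [{"plant": "Arkansas Nuclear One", "source_url": "http://example.org/ano"}]
status_rows = [{"plant": "Arkansas Nuclear One", "status": "placeholder"}]

def merge_on_key(tables, key):
    """Combine rows from several tables that share the same key value."""
    merged = {}
    for table in tables:
        for row in table:
            # Start a record for an unseen key, then fold in this row's columns.
            merged.setdefault(row[key], {key: row[key]}).update(row)
    return list(merged.values())

combined = merge_on_key([demo_rows, status_rows], key="plant")
print(combined)
```

Matching on a plain name only works when every source spells the key the same way, which is one reason a shared identifier or URL per entity helps.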
Linked Data Book Web Site http://manning.com/dwood/ and http://manning.com/dwood/LinkedData_MEAP_ch1.pdf
Linked Data Book in MindTouch My Note: Every Section, Figure, and Code Listing Has a well-defined URL! http://semanticommunity.info/3_Round_Stones#Book
Knowledge Base Attachments My Note: This is similar to Callimachus attachments. http://semanticommunity.info/3_Round_Stones
Callimachus Linked Open Data Demonstrations http://demo.3roundstones.net/rdf/2012/nuclear/schema/index.xhtml?view
Callimachus jQuery Data Tables Example of Nuclear Power Plants http://demo.3roundstones.net/rdf/2012/datatable/index.xhtml?view
Arkansas Nuclear One http://demo.3roundstones.net/diverted;http://usepa.3roundstones.net/facilities/110028034721?view
Knowledge Base in MindTouch to Excel Spreadsheet
Entity Extraction in Progress From MindTouch Mashup to Excel Spreadsheet in Triple Format (Recall Slide 8) to Build Strong Relationships.
http://semanticommunity.info/@api/deki/files/23420/3RoundStonesLODDemos.xlsx
Use Other Nuclear Power Plant Data Sources Data.gov: Appa (Operating Rx- data.gov).xls PowerReactorStatusForLast365Days.xls http://www.nrc.gov/info-finder/reactor/ano1.html
3 Round Stones: Five Excel Spreadsheets in Spotfire
My Note: See Beginning of Unified Data Architecture & Ecosystem. Also Photo Images Linked Data.
https://silverspotfire.tibco.com/ViewAnalysis.aspx?file=/users/bniemann/Public/3RoundStones-Spotfire
Summary
• The New Digital Government Strategy of treating all content as data has been applied to the 3 Round Stones Web content and Callimachus Demo.
• The Callimachus Demo has been turned into data in spreadsheets and statistical visualizations in Spotfire 5.
• This simplifies the complex Callimachus interface, which requires lots of extra mouse clicks and provides no faceted search.
• There are other nuclear power plant data and metadata sources that should be, and have been, included.
• This process provides the beginning of a Unified Data Architecture and Ecosystem for Data Integration using the View Data function in Spotfire 5.
Post Meetup Comments
• US EPA’s data problems are systemic and not technological (I know because I was there for 30 years and was their first data architect and data scientist).
• I have produced over 50 EPA Data Science Products and used Spotfire 5 to integrate 30 or so of EPA’s major data sets for the 2011 EPA Apps for the Environment Challenge.
• I helped design Data.gov, implemented a more semantic version while on detail to them, and helped the Japan METI start Open Government Data.
• Be Informed is the most advanced semantic technology (ontology & rules) in the world, but they do not call it that for business reasons.
• Semantic Medline is the “killer semantic web app” for the Federal Government that our Data Science Team is moving to the new Cray Graph Computer.
• At the Health Datapalooza 2012, Dr. Bill Frist (Eminent Heart Surgeon and Former Senate Majority Leader) described the exciting work that he is involved in to improve the outcomes of heart transplant surgery by individualizing the treatment of patients who reject the normal organ transplant medications due to genetic factors.
• I volunteer to show how to make 3 Round Stones: All Content As Big Data using the new Digital Government Strategy and our Semantic Web Strategy for Data.