Open Government Knowledge: A Radical DBMS Vendor’s Perspective
Chris Biow, Public Sector CTO
AAAI, Fall 2011
Changing the world with ideas
Impact of Tim Berners-Lee
• HTTP and WWW
  • TimBL: 1989-90
  • World exploded through 1990s
  • Netscape IPO 1995
  • Applied minimal structure to text
• RDF and SEMWEB
  • TimBL, Hendler, Lassila: 2001
  • We are still struggling
  • Bex Huff, 2009: “this pile of vaporware called "The Semantic Web" is really starting to tick me off... So what has the W3C got to show for their decade of effort? A bunch of bloated XML formats that nobody uses... because we apparently needed more of those.” [http://goo.gl/OFwxn]
Vision: The Semantic Web [TimBL et al., SciAm May 2001]
At the doctor's office, Lucy instructed her Semantic Web agent through her handheld Web browser. The agent promptly retrieved information about Mom's prescribed treatment from the doctor's agent, looked up several lists of providers, and checked for the ones in-plan for Mom's insurance within a 20-mile radius of her home and with a rating of excellent or very good on trusted rating services. It then began trying to find a match between available appointment times (supplied by the agents of individual providers through their Web sites) and Pete's and Lucy's busy schedules. (The emphasized keywords indicate terms whose semantics, or meaning, were defined for the agent through the Semantic Web.)
Open Government / Business Opportunities
Need a confluence of value
• Political value
  • Democracy requires visibility
  • Visibility may preserve the office
  • Obama transparency initiative
  • Do the easy work first
  • “No bucks, no Buck Rogers” [Tom Wolfe]
  • Where are the jobs?
• Business value
  • Open Source Intelligence
  • Broad interest in SemTech
  • Killer app
  • Fairfax County Land Development System
  • Data sharing (ICD 501: discovery and request)
The Gap
• 80% unstructured [Apoc., ~true]
• Structure is given to the most important
• Enterprise value lies across the spectrum of structure
  • Unstructured
  • Semi-structured
  • Poly-structured
  • Relational
• Methods vary by normalization
  • RDF
  • SQL RDBMS
  • Document DBMS
  • CMS
  • Search is 80%?
MarkLogic in Four Concepts
• Internet Scale
• Unstructured, Polystructured Information
• Commodity Hardware
• Real-Time Speed
Who Uses MarkLogic?
• Media Customers
• Government Customers
• Financial Services and Other Customers
Why Did We Do SemTech?
• MarkLogic Server is a document-centric, metadata-sensitive database for unstructured information
• MarkLogic is schema agnostic, and can query your content and metadata in whatever form it presents itself
• Document-centric information and linked information are not mutually exclusive
  • RDFa: GoodRelations, rNews, Open Graph Protocol
• Semantic data is increasing in government, enterprise, and financial services
• Our customers are asking for a single solution
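The point that document-centric and linked information can live in the same document can be illustrated with a small sketch. It is not MarkLogic code and not a full RDFa processor: it only assumes a well-formed XHTML page whose `<meta>` elements carry Open Graph style `property`/`content` pairs. The sample document, the `extract_triples` helper, and the `_:doc` fallback subject are all hypothetical names chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical well-formed XHTML page with Open Graph style metadata.
DOC = """<html>
  <head>
    <meta property="og:title" content="Land Development Update"/>
    <meta property="og:type" content="article"/>
    <meta property="og:url" content="http://example.org/ldu"/>
  </head>
  <body><p>Full text of the document stays searchable as text.</p></body>
</html>"""


def extract_triples(xhtml, default_subject="_:doc"):
    """Return (subject, property, value) tuples from <meta property=... content=...>.

    Simplification (assumption): the subject is the og:url value when present,
    otherwise a placeholder. Real RDFa subject resolution is more involved.
    """
    root = ET.fromstring(xhtml)
    pairs = [
        (meta.get("property"), meta.get("content"))
        for meta in root.iter("meta")
        if meta.get("property") and meta.get("content") is not None
    ]
    subject = dict(pairs).get("og:url", default_subject)
    return [(subject, prop, value) for prop, value in pairs]


def body_text(xhtml):
    """Return the plain text of <body>, i.e. the document-centric view."""
    body = ET.fromstring(xhtml).find("body")
    return "".join(body.itertext()).strip()


if __name__ == "__main__":
    for triple in extract_triples(DOC):
        print(triple)
    print(body_text(DOC))
```

The same source yields both a searchable text body and a handful of statements about it, which is the sense in which the two views are not mutually exclusive.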
Open Government in the Trenches
Fairfax County LDS
• Multi-source synthesis
• ETL
• Oracle XML PL/SQL (broken XML)
• DataDirect XQuery
• Mainframe flat-file parsing
• RTF input
• Geotagging
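As a rough illustration of the mainframe flat-file parsing step, the sketch below slices fixed-width records into fields and emits XML suitable for loading into a document database. The slides do not describe the actual Fairfax County record layout or tooling, so the `LAYOUT` columns, field names, and sample records are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical fixed-width layout: (field name, start column, end column).
LAYOUT = [
    ("permit_id", 0, 8),
    ("status", 8, 10),
    ("address", 10, 40),
]


def parse_record(line):
    """Slice one fixed-width line into a dict of whitespace-stripped fields."""
    return {name: line[start:end].strip() for name, start, end in LAYOUT}


def records_to_xml(lines):
    """Build a <permits> element with one <permit> child per non-blank line."""
    root = ET.Element("permits")
    for line in lines:
        if not line.strip():
            continue  # skip blank padding lines
        permit = ET.SubElement(root, "permit")
        for name, value in parse_record(line).items():
            ET.SubElement(permit, name).text = value
    return root


# Invented sample records, padded to the layout widths above.
SAMPLE = [
    "P0001234" + "OK" + "100 Example St".ljust(30),
    "",
    "P0005678" + "PN" + "200 Sample Ave".ljust(30),
]

if __name__ == "__main__":
    print(ET.tostring(records_to_xml(SAMPLE), encoding="unicode"))
```

Building the output through an XML library rather than string concatenation means special characters are escaped on serialization, which avoids the kind of malformed XML noted in the list above.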