An RDF and XML Database. John Snelson, Lead Engineer 23rd October 2013. MarkLogic. DATABASE. SEARCH. APPLICATION SERVICES. Data ≠ Information. Data + Context = Information. Dynamic Semantic Publishing BBC Sports. The Challenge. Goals. Size and Complexity: # of athletes # of teams
An RDF and XML Database
John Snelson, Lead Engineer
23rd October 2013
MarkLogic
• Database
• Search
• Application Services
Data ≠ Information
Data + Context = Information
Dynamic Semantic Publishing: BBC Sports
The Challenge and Goals
• Size and complexity:
  • # of athletes
  • # of teams
  • # of assets (match reports, statistics, etc.)
  • # of relations (facts)
• Rich user experience
  • See information in context
  • Personalize content
  • Easy navigation
  • Intelligently serve ads (outside of UK)
• Manageable
  • Static pages? Too many, changing too fast
  • Limited number of journalists
  • Automate as much as possible
Dynamic Semantic Publishing: A Solution
XML Database
• Store, manage documents
  • Stories
  • Blogs
  • Feeds
  • Profiles
• Store, manage values
  • Statistics
• Full-text search
• Performance, scalability
• Robustness
Triple Store
• Metadata about documents
  • Tagged by journalists
  • Added (semi-)automatically
  • Inferred
• Facts reported by journalists
• Linked Open Data for real-world facts
Dynamic Semantic Publishing: Understanding Data
[Diagram: entities linked by the relations "played in", "plays for", and "plays in"]
What is RDF?
[Diagram: RDF graph. Nodes: "John", :person4, :place5, :person5, :person20. Edge labels: :first-name, :birth-place, :spouse, :has-child, :has-parent]
What is RDF?
• Schema-less
• Triple granularity
• Open world assumption
• Joins: the cost of granularity
What is Semantics?
Data stored in triples
• Expressed as Subject : Predicate : Object
Example:
  "John Smith" : livesIn : "London"
  "London" : isIn : "England"
Rules tell us something about the triples
Example:
  If (A livesIn X) AND (X isIn Y) then (A livesIn Y)
Inference:
  "John Smith" : livesIn : "England"
What is Semantics? (diagram)
[Diagram: nodes "John Smith", "London", and "England", joined by edges labeled livesIn, isIn, and livesIn]
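The rule on this slide can be sketched as a small forward-chaining loop. This is a minimal in-memory illustration in Python, not MarkLogic's inference mechanism; the function name `infer_lives_in` is hypothetical, and triples are modeled as plain tuples.

```python
def infer_lives_in(triples):
    """Apply (A livesIn X) AND (X isIn Y) => (A livesIn Y) until nothing new appears."""
    facts = set(triples)
    while True:
        # Join every livesIn triple with every isIn triple on the shared term X.
        new = {
            (a, "livesIn", y)
            for (a, p1, x) in facts if p1 == "livesIn"
            for (x2, p2, y) in facts if p2 == "isIn" and x2 == x
        } - facts
        if not new:
            return facts
        facts |= new


# The two triples stated on the slide.
data = {
    ("John Smith", "livesIn", "London"),
    ("London", "isIn", "England"),
}

# Only the triples that the rule added.
inferred = infer_lives_in(data) - data
print(inferred)
```

The loop repeats until a pass adds nothing, so chains such as city, country, continent would also be followed.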
Semantics Architecture
[Diagram: components labeled GRAPH, SPARQL, TRIPLE, SPARQL, XQY, XSLT, SQL]
Triple Index
• 3 triple orders
• Cached for performance
• Works seamlessly with other indexes
• Security
• 150 bytes per triple on disk
• Billions of triples per host
• Scaling out horizontally
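To illustrate why an index might keep three triple orders, here is a minimal Python sketch. It assumes the orders are the permutations subject-predicate-object, predicate-object-subject, and object-subject-predicate; the slide does not say which orders MarkLogic uses, and the class `TripleIndex` is hypothetical. With these three orders, any combination of bound terms forms a prefix of one sorted copy, so a lookup becomes a range scan.

```python
from bisect import bisect_left

# Assumed permutations; the slide only says "3 triple orders".
ORDERS = {"spo": (0, 1, 2), "pos": (1, 2, 0), "osp": (2, 0, 1)}


class TripleIndex:
    def __init__(self, triples):
        # Keep one sorted copy of the data per order.
        self.sorted = {
            name: sorted(tuple(t[i] for i in perm) for t in triples)
            for name, perm in ORDERS.items()
        }

    def match(self, s=None, p=None, o=None):
        """Return triples matching the pattern; None means 'any value'."""
        pattern = (s, p, o)

        def bound_prefix(perm):
            # Number of leading positions in this order that are bound.
            n = 0
            for i in perm:
                if pattern[i] is None:
                    break
                n += 1
            return n

        # Choose the order whose leading positions cover the most bound terms.
        name, perm = max(ORDERS.items(), key=lambda kv: bound_prefix(kv[1]))
        prefix = tuple(pattern[i] for i in perm)[:bound_prefix(perm)]
        rows = self.sorted[name]
        results = []
        # Range scan: start at the first row >= prefix, stop when the prefix changes.
        for row in rows[bisect_left(rows, prefix):]:
            if row[:len(prefix)] != prefix:
                break
            triple = [None] * 3
            for pos, i in enumerate(perm):
                triple[i] = row[pos]
            # Residual check; with these three orders the prefix already
            # covers every bound term, so this never rejects a row.
            if all(want is None or want == got for want, got in zip(pattern, triple)):
                results.append(tuple(triple))
        return results


index = TripleIndex([
    (":person4", ":first-name", "John"),
    (":person4", ":birth-place", ":place5"),
    (":person5", ":birth-place", ":place5"),
])
born_in_place5 = index.match(p=":birth-place", o=":place5")
print(born_in_place5)
```

The trade-off is storage: each triple is held once per order in exchange for avoiding full scans.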
RDF Loading
Triples Embedded in Documents
  …
  <sem:triple>
    <sem:subject>http://example.org/kennedy/person12</sem:subject>
    <sem:predicate>http://example.org/kennedy/last-name</sem:predicate>
    <sem:object datatype="http://www.w3.org/2001/XMLSchema#string">Lawford</sem:object>
  </sem:triple>
  …
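As a rough illustration of what "embedded in documents" means, the following Python sketch pulls sem:triple elements out of a surrounding XML document using only the standard library. It is not how MarkLogic indexes triples. The wrapper element `article` and the helper `extract_triples` are hypothetical, and the namespace URI bound to `sem:` is an assumption, since the slide does not show it.

```python
import xml.etree.ElementTree as ET

# Assumption: the sem: prefix is bound to this namespace URI.
SEM_NS = "http://marklogic.com/semantics"

# Hypothetical wrapper document; only the sem:triple element comes from the slide.
DOC = f"""
<article xmlns:sem="{SEM_NS}">
  <p>Some surrounding content.</p>
  <sem:triple>
    <sem:subject>http://example.org/kennedy/person12</sem:subject>
    <sem:predicate>http://example.org/kennedy/last-name</sem:predicate>
    <sem:object datatype="http://www.w3.org/2001/XMLSchema#string">Lawford</sem:object>
  </sem:triple>
</article>
"""


def extract_triples(xml_text):
    """Return (subject, predicate, object, datatype) for each embedded triple."""
    root = ET.fromstring(xml_text.strip())
    found = []
    for node in root.iter(f"{{{SEM_NS}}}triple"):
        parts = [
            node.find(f"{{{SEM_NS}}}{name}")
            for name in ("subject", "predicate", "object")
        ]
        subject, predicate, obj = (part.text.strip() for part in parts)
        # The datatype attribute lives on the object element and may be absent.
        found.append((subject, predicate, obj, parts[2].get("datatype")))
    return found


triples = extract_triples(DOC)
print(triples)
```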
Content, Data, and Semantics
  <SAR>
    <title>Suspicious vehicle near airport</title>
    <date>2012-11-12Z</date>
    <type>observation/surveillance</type>
    <threat>
      <type>suspicious activity</type>
      <category>suspicious vehicle</category>
    </threat>
    <location>
      <lat>37.497075</lat>
      <long>-122.363319</long>
    </location>
    <description>
      A blue van with license plate ABC 123 was observed parked behind the airport sign…
      <triple>
        <subject>IRIID</subject>
        <predicate>isa</predicate>
        <object>license-plate</object>
      </triple>
      <triple>
        <subject>IRIID</subject>
        <predicate>value</predicate>
        <object>ABC 123</object>
      </triple>
    </description>
  </SAR>
Content, Data, and Semantics (annotated)
[Diagram: the SAR document from the previous slide, with regions labeled Unstructured full-text, Geospatial Data, and Semantic (RDF) Triples]
RDF Values
  <http://example.org/kennedy/person4>
  _:blank1
  "simple"
  "string value"^^xs:string
  "bonjour"@fr
  "2013-04-09"^^xs:date
  "987"^^xs:double
Datatype Mapping
• IRI: <http://example.com> maps to sem:iri("http://example.com")
• Blank Node: _:blank1 maps to sem:blank("…")
• Simple Literal: "simple" maps to xs:string("simple")
• Language Tagged Literal: "bonjour"@fr maps to rdf:langString("bonjour", "fr")
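The kinds of RDF value listed above can be told apart from their written form alone. The sketch below is a hypothetical Python classifier for the simple lexical forms shown on these slides; it is not a full Turtle or N-Triples parser (for example, it does not handle escaped quotes), and the function name `classify` is an assumption.

```python
import re

# One alternative per kind of term: <iri>, _:blank, or "literal" with an
# optional @language tag or ^^datatype suffix.
TERM = re.compile(
    r'<(?P<iri>[^>]*)>'
    r'|_:(?P<blank>\w+)'
    r'|"(?P<lex>[^"]*)"(?:@(?P<lang>[A-Za-z-]+)|\^\^(?P<dt>\S+))?'
)


def classify(term):
    """Return a tuple describing the kind of RDF term and its parts."""
    m = TERM.fullmatch(term)
    if m is None:
        raise ValueError(f"unrecognized term: {term!r}")
    if m.group("iri") is not None:
        return ("iri", m.group("iri"))
    if m.group("blank") is not None:
        return ("blank", m.group("blank"))
    if m.group("lang"):
        return ("lang-literal", m.group("lex"), m.group("lang"))
    if m.group("dt"):
        return ("typed-literal", m.group("lex"), m.group("dt"))
    return ("simple-literal", m.group("lex"))


for example in ['<http://example.com>', '_:blank1', '"simple"',
                '"bonjour"@fr', '"987"^^xs:double']:
    print(example, classify(example))
```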
SPARQL
  select * where {
    ?person :birth-place ?place;
            :first-name "John"
  }
• Executed using the triple index
• SPARQL 1.0 + much of SPARQL 1.1
• Cost-based optimization
• Join ordering and algorithms
Executing SPARQL
  sem:sparql("
      prefix : <http://example.org/kennedy/>
      select * {
        ?person :first-name ?first;
                :last-name ?last;
                :alma-mater [:ivy-league :true]
      }",
    map:entry("first", "John"),
    (),
    cts:collection-query("mycollection")
  )
Returning Binding Solutions
  select * where { ?person :birth-place :place5 }

  select * where {
    ?person :birth-place ?place;
            :first-name "John"
  }
Solution Results map:map
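A select query returns one set of variable bindings per solution, which the slide shows as map:map values. The Python sketch below mimics that with dictionaries, using a naive nested-loop join over an in-memory list; it does not reflect MarkLogic's cost-based optimizer. The function `match_patterns` and the sample data (including the name "Ethel" attached to :person5) are illustrative assumptions.

```python
def match_patterns(triples, patterns):
    """Return one binding dict per solution that satisfies every pattern."""
    solutions = [{}]
    for pattern in patterns:
        extended = []
        for binding in solutions:
            for triple in triples:
                candidate = dict(binding)
                for want, got in zip(pattern, triple):
                    if want.startswith("?"):
                        # Bind the variable, or check it agrees with an earlier binding.
                        if candidate.setdefault(want, got) != got:
                            break
                    elif want != got:
                        break
                else:
                    extended.append(candidate)
        solutions = extended
    return solutions


# Illustrative data; literals keep their quotes to distinguish them from IRIs.
triples = [
    (":person4", ":first-name", '"John"'),
    (":person4", ":birth-place", ":place5"),
    (":person5", ":first-name", '"Ethel"'),
    (":person5", ":birth-place", ":place5"),
]

# select * where { ?person :birth-place ?place; :first-name "John" }
solutions = match_patterns(triples, [
    ("?person", ":birth-place", "?place"),
    ("?person", ":first-name", '"John"'),
])
print(solutions)
```

The shared variable ?person is what turns two separate triple patterns into a join.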
SPARQL Query Results XML Format
  sem:query-result-serialize(
    sem:sparql("select * { … }"),
    "xml"
  )
Returning Triples
  describe :person4

  construct { ?bp :uses-name ?fn }
  where {
    ?person :birth-place ?bp;
            :first-name ?fn
  }
Triple Results
  :place0 :uses-name "Ethel", "Jeffrey", "Kara" .
  :place1 :uses-name "Edward", "James" .
  :place10 :uses-name "Robert", "Sheila", "Stephen" .
• Results are sem:triple values whose terms include sem:iri values
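A construct query fills a triple template once per binding solution. The Python sketch below shows that substitution step in isolation, assuming the solutions have already been computed; the function `construct` is hypothetical, and the two bindings are taken from the :place1 line of the results above.

```python
def construct(template, solutions):
    """Instantiate each template triple for each binding solution."""
    produced = set()
    for binding in solutions:
        for pattern in template:
            # Replace variables with their bound values; keep constants as-is.
            triple = tuple(binding.get(term, term) for term in pattern)
            # Skip triples that still contain an unbound variable.
            if any(term.startswith("?") for term in triple):
                continue
            produced.add(triple)
    return produced


# construct { ?bp :uses-name ?fn }
template = [("?bp", ":uses-name", "?fn")]

# Bindings corresponding to the :place1 results shown on the slide.
solutions = [
    {"?person": ":personA", "?bp": ":place1", "?fn": '"Edward"'},  # :personA is hypothetical
    {"?person": ":personB", "?bp": ":place1", "?fn": '"James"'},   # :personB is hypothetical
]

result = construct(template, solutions)
print(sorted(result))
```

Because the output is a set, two people sharing a birth place and a first name contribute a single triple.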
Querying Named Graphs
  select * from <http://my_graph> where { ?s ?p ?o }

  select * where { graph <http://my_graph> { ?s ?p ?o } }
• Annotation on the graph IRI: collection
Restricting The Datasets
  let $options := "properties"
  let $query := cts:and-query((
    cts:directory-query("/triples/"),
    cts:element-range-query(xs:QName("date"), ">", $date)
  ))
  return sem:sparql("…", (), (), $options, $query)
Creating Triples
Returning sem:triple values
• sem:triple()
• sem:rdf-parse()
• sem:rdf-get()
• sem:rdf-builder()
Inserting to a database
• sem:rdf-load()
• sem:rdf-insert()
Graph Store API
  declare function graph-insert(
    $graphname as sem:iri,
    $triples as sem:triple*,
    [$permissions as element(sec:permission)*,
     $collections as xs:string*,
     $quality as xs:int?,
     $forest-ids as xs:unsignedLong*]
  ) as xs:string*;

  declare function graph-delete(
    $graphname as sem:iri
  ) as empty-sequence();
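The idea behind a graph store, triples grouped under a graph name that can be inserted, queried, and deleted as a unit, can be sketched in a few lines of Python. This is an in-memory toy that only borrows the names graph-insert and graph-delete from the slide; the class `GraphStore`, its `select` method, and its return values are assumptions and do not reflect the MarkLogic functions' behavior (permissions, collections, quality, and forests are ignored).

```python
from collections import defaultdict


class GraphStore:
    """Toy store: each graph name maps to a set of (s, p, o) tuples."""

    def __init__(self):
        self.graphs = defaultdict(set)

    def graph_insert(self, graph_name, triples):
        # Add triples to the named graph, creating it if needed.
        self.graphs[graph_name].update(triples)

    def graph_delete(self, graph_name):
        # Remove the whole graph; deleting an unknown graph is a no-op here.
        self.graphs.pop(graph_name, None)

    def select(self, graph_name=None):
        # With a graph name: triples of that graph only.
        # Without: the union of all graphs.
        if graph_name is not None:
            return set(self.graphs.get(graph_name, set()))
        return set().union(*self.graphs.values())


store = GraphStore()
store.graph_insert("http://my_graph", [(":person4", ":first-name", '"John"')])
store.graph_insert("http://other_graph", [(":person5", ":birth-place", ":place5")])

in_my_graph = store.select("http://my_graph")
everything = store.select()
store.graph_delete("http://my_graph")
after_delete = store.select("http://my_graph")
print(in_my_graph, len(everything), after_delete)
```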
Conclusion
• Semantics can enhance your data-oriented and search applications.
• XQuery and SPARQL work well together.
• A combined RDF and XML database simplifies using the two technologies together.
• Try MarkLogic 7: http://www.marklogic.com/early-access/