About blank nodes 瞿裕忠 (Yuzhong Qu) yzqu@nju.edu.cn Department of Computer Science and Technology
Reading material • Aidan Hogan, Marcelo Arenas, Alejandro Mallea, Axel Polleres: Everything you always wanted to know about blank nodes. J. Web Sem. 27: 42-69 (2014) • Extended version of “On blank nodes” (ISWC 2011)
Introduction • Blank nodes: one of the core features of RDF • Often misunderstood, misinterpreted, or ignored • Inconsistency between the standard and its actual use • Are the semantics and the current definition of blank nodes appropriate for the needs of the Web community?
Preliminaries • U, B and L denote the sets of IRIs, blank nodes and literals; juxtaposition denotes union (e.g., UB = U ∪ B, UBL = U ∪ B ∪ L) • An RDF graph: a finite set of RDF triples (s, p, o) ∈ UB × U × UBL • terms(G): the set of elements of UBL occurring in G • voc(G) = terms(G) ∩ UL • A graph G is ground if terms(G) ∩ B = ∅ • A map is a partial function μ : UBL → UBL such that μ(u) = u for all u ∈ dom(μ) ∩ UL • μ(G) = {(μ(s), μ(p), μ(o)) | (s, p, o) ∈ G} • We write μ : G1 → G2 if dom(μ) = terms(G1) and μ(G1) ⊆ G2
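The definitions of maps and of μ : G1 → G2 can be checked mechanically on small graphs. The following is a minimal Python sketch under an encoding of our own choosing (not from the slides): terms are plain strings, blank nodes start with `_:`, literals start with `"`, and everything else counts as an IRI; the helper names `apply_map`, `is_map` and `is_hom` are hypothetical.

```python
# Assumed term encoding (ours, for illustration): blank nodes start with "_:",
# literals start with '"', anything else is treated as an IRI.
def is_blank(t: str) -> bool:
    return t.startswith("_:")

def terms(G: set) -> set:
    """terms(G): every term occurring in some triple of G."""
    return {x for triple in G for x in triple}

def apply_map(mu: dict, G: set) -> set:
    """mu(G) = {(mu(s), mu(p), mu(o)) | (s, p, o) in G}; terms outside dom(mu) are kept."""
    return {tuple(mu.get(x, x) for x in triple) for triple in G}

def is_map(mu: dict) -> bool:
    """A map must be the identity on IRIs and literals (UL)."""
    return all(k == v for k, v in mu.items() if not is_blank(k))

def is_hom(mu: dict, G1: set, G2: set) -> bool:
    """mu : G1 -> G2 iff dom(mu) = terms(G1) and mu(G1) is a subset of G2."""
    return is_map(mu) and set(mu) == terms(G1) and apply_map(mu, G1) <= G2

G1 = {("_:b", "ex:name", '"Alice"')}
G2 = {("ex:alice", "ex:name", '"Alice"'), ("ex:alice", "ex:age", '"30"')}
mu = {"_:b": "ex:alice", "ex:name": "ex:name", '"Alice"': '"Alice"'}
print(is_hom(mu, G1, G2))  # True
```

Because μ(G1) must land inside the RDF graph G2, any μ accepted by `is_hom` is automatically consistent with G1.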
Preliminaries • G1 and G2 are isomorphic if there is a map μ that maps blank nodes to blank nodes on a one-to-one basis such that (s, p, o) ∈ G1 if and only if (μ(s), μ(p), μ(o)) ∈ G2 • A map μ is consistent with G if μ(G) is an RDF graph, i.e., if s is the subject of a triple in G, then μ(s) ∈ UB; if p is the predicate of a triple in G, then μ(p) ∈ U; etc. • If μ is consistent with G, μ(G) is called an instance of G • An instance μ(G) of G is proper if μ(G) has fewer blank nodes than G • A merge of G1 and G2, denoted G1 + G2, is the union G′1 ∪ G′2, where G′1 and G′2 are isomorphic copies of G1 and G2, respectively, whose sets of blank nodes are disjoint
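The difference between a plain union and a merge is easy to miss, so here is a small Python sketch, assuming (our choice) that blank nodes are strings starting with `_:` and that renaming them with distinct prefixes is an acceptable way to build the isomorphic copies G′1 and G′2; `rename_blanks` and `merge` are hypothetical names.

```python
def is_blank(t: str) -> bool:
    return t.startswith("_:")

def blank_nodes(G: set) -> set:
    return {x for triple in G for x in triple if is_blank(x)}

def rename_blanks(G: set, prefix: str) -> set:
    """An isomorphic copy of G in which every blank node gets a new label."""
    mu = {b: f"_:{prefix}{i}" for i, b in enumerate(sorted(blank_nodes(G)))}
    return {tuple(mu.get(x, x) for x in triple) for triple in G}

def merge(G1: set, G2: set) -> set:
    """G1 + G2: union of isomorphic copies whose blank nodes are disjoint
    (guaranteed here by the distinct prefixes)."""
    return rename_blanks(G1, "m1_") | rename_blanks(G2, "m2_")

G1 = {("_:b", "ex:knows", "ex:bob")}
G2 = {("_:b", "ex:knows", "ex:carol")}
print(len(blank_nodes(G1 | G2)))        # 1: plain union conflates the two _:b
print(len(blank_nodes(merge(G1, G2))))  # 2: merge keeps them apart
```

Since blank-node labels are local to a graph, the union would wrongly assert that the same unknown individual knows both ex:bob and ex:carol; the merge does not.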
Preliminaries • G simple-entails H, denoted G ⊨ H, if every model of G is also a model of H • The simple entailment G ⊨ H holds if and only if there is a map μ : H → G • The complexity of deciding whether an RDF graph G simple-entails a graph H (NP-complete in general) depends only on the structure of the subgraph of H induced by its blank nodes • An RDF graph G is lean if there is no map μ such that μ(G) is a proper subgraph of G
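The characterisation "G ⊨ H iff there is a map μ : H → G" suggests a direct, if exponential, decision procedure. This Python sketch simply enumerates candidate maps; it uses our own string encoding (blank nodes start with `_:`) and the hypothetical name `simple_entails`, and it is not the treewidth-based algorithm from the paper.

```python
from itertools import product

def is_blank(t: str) -> bool:
    return t.startswith("_:")

def simple_entails(G: set, H: set) -> bool:
    """G |= H iff some map mu : H -> G exists.
    Brute force: try every assignment of H's blank nodes to terms of G,
    so the cost is exponential in the number of blank nodes in H."""
    blanks = sorted({x for t in H for x in t if is_blank(x)})
    candidates = sorted({x for t in G for x in t})
    for image in product(candidates, repeat=len(blanks)):
        mu = dict(zip(blanks, image))  # identity on all other terms
        if all(tuple(mu.get(x, x) for x in t) in G for t in H):
            return True
    return False

G = {("ex:alice", "ex:knows", "ex:bob")}
H = {("_:x", "ex:knows", "_:y")}
print(simple_entails(G, H))  # True: map _:x -> ex:alice, _:y -> ex:bob
print(simple_entails(H, G))  # False
```

When H is ground the loop runs once with an empty map, so the check reduces to H ⊆ G.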
Existential variables in first-order logic • RDF graphs correspond to existential first-order formulas without negation or disjunction • ρ : UBL → UVL is a one-to-one map that is the identity on UL (V: a set of variables) • ρ(t) denotes the atom triple(ρ(s), ρ(p), ρ(o)) for t = (s, p, o) • Skolemisation: replace existentially quantified variables by “fresh” constants (or by function terms over the enclosing universally quantified variables) • ∃x∀y R(x, y) becomes ∀y R(c, y) • ∀x∃y (P(x) → Q(y)) becomes ∀x (P(x) → Q(f(x)))
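For RDF, Skolemisation amounts to replacing each blank node by a fresh constant, i.e., a fresh IRI. RDF 1.1 suggests minting such Skolem IRIs under a `/.well-known/genid/` path; the `example.org` base below is a placeholder, and the encoding (blank nodes start with `_:`) and function name `skolemise` are our own assumptions.

```python
import uuid

def is_blank(t: str) -> bool:
    return t.startswith("_:")

def skolemise(G: set, base: str = "http://example.org/.well-known/genid/") -> set:
    """Replace every blank node by a fresh IRI (a Skolem constant).
    All occurrences of the same blank node get the same IRI."""
    fresh = {}

    def sk(t: str) -> str:
        if not is_blank(t):
            return t
        if t not in fresh:
            fresh[t] = base + uuid.uuid4().hex
        return fresh[t]

    return {tuple(sk(x) for x in triple) for triple in G}

G = {("_:b", "ex:loves", "_:b"), ("_:b", "ex:knows", "_:c")}
print(skolemise(G))
```

The Skolemised graph is ground and simple-entails the original, but not the other way round: like ∀y R(c, y), it commits to specific constants.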
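Simple entailment checks • The simple entailment check G ⊨ H can be decided in time O(n² + m·n^(2k)), where n = |G|, m = |H| and k = tw(blank(H)) + 1 • blank(H): the blank graph of H (its blank nodes, with an edge between blank nodes that co-occur in a triple) • tw(·): the treewidth of a given graph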
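Plugging small treewidths into the bound shows why the structure of blank(H) matters. This is a worked instance of the formula only, taking n = |G| and m = |H|:

```latex
\begin{aligned}
&k = \operatorname{tw}(\mathrm{blank}(H)) + 1, && \text{time } O\!\left(n^{2} + m\,n^{2k}\right)\\
&\operatorname{tw} = 0 \text{ (no edges)} \Rightarrow k = 1: && O\!\left(n^{2} + m\,n^{2}\right)\\
&\operatorname{tw} = 1 \text{ (forest with at least one edge)} \Rightarrow k = 2: && O\!\left(n^{2} + m\,n^{4}\right)\\
&\operatorname{tw} = 2 \Rightarrow k = 3: && O\!\left(n^{2} + m\,n^{6}\right)
\end{aligned}
```

For any fixed treewidth the check is polynomial; only the exponent grows with tw(blank(H)).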
Blank nodes in the standards • Turtle's abbreviated syntax (e.g., [ ] and collection notation) can avoid explicitly labelled auxiliary blank nodes in some cases • The JSON-LD specification permits blank nodes in the predicate position; to map such data to RDF, IRIs must first be minted for those predicate terms • Jena offers sound and complete methods for checking whether two RDF graphs are isomorphic
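To make "checking the isomorphism of two RDF graphs" concrete, here is a naive Python sketch that tries every bijection between blank nodes. It is not Jena's algorithm (which we do not reproduce here); the string encoding (blank nodes start with `_:`) and the function name `isomorphic` are our assumptions, and the factorial cost limits it to toy graphs.

```python
from itertools import permutations

def is_blank(t: str) -> bool:
    return t.startswith("_:")

def blank_nodes(G: set) -> list:
    return sorted({x for triple in G for x in triple if is_blank(x)})

def isomorphic(G1: set, G2: set) -> bool:
    """Brute-force RDF graph isomorphism: search for a bijection between the
    blank nodes of G1 and G2 (identity elsewhere) that maps G1 exactly onto G2.
    Factorial in the number of blank nodes; for illustration only."""
    b1, b2 = blank_nodes(G1), blank_nodes(G2)
    if len(G1) != len(G2) or len(b1) != len(b2):
        return False
    for perm in permutations(b2):
        mu = dict(zip(b1, perm))
        if {tuple(mu.get(x, x) for x in triple) for triple in G1} == G2:
            return True
    return False

G1 = {("_:a", "ex:p", "_:b"), ("_:b", "ex:p", "ex:c")}
G2 = {("_:x", "ex:p", "_:y"), ("_:y", "ex:p", "ex:c")}
G3 = {("_:x", "ex:p", "_:y"), ("_:x", "ex:p", "ex:c")}
print(isomorphic(G1, G2), isomorphic(G1, G3))  # True False
```

Practical implementations prune this search, e.g., by first partitioning blank nodes by their local neighbourhood, but the acceptance condition is the same.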
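Blank nodes in the standards • The incompleteness of the RDFS entailment rules • The rules cannot infer the triple :Federer rdf:type :Competitor, even though it is RDFS-entailed (deriving it would require an intermediate triple with a blank node in the predicate position)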
SPARQL Support for blank nodes • When querying over blank nodes in the dataset, SPARQL treats them as constants that are local to the scoping graph in which they appear, e.g., yielding solutions such as {{(?X, _:b1)}, {(?X, _:b3)}}
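SPARQL Support for blank nodes • In the WHERE clause, SPARQL uses blank nodes to represent non-distinguished variables, i.e., variables that cannot be returned in the results • A second use of blank nodes is within CONSTRUCT templates, which generate RDF data from solution mappings; each blank node in the template yields a fresh blank node for every solution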
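The CONSTRUCT behaviour can be sketched in a few lines of Python. This is a simplified model of template instantiation, not a SPARQL engine: variables are strings starting with `?`, blank nodes start with `_:`, the generated labels `_:g0`, `_:g1`, … are our own choice (they could clash with blank nodes in the bindings, which a real engine avoids), and `instantiate` is a hypothetical name.

```python
from itertools import count

def instantiate(template: list, solutions: list) -> set:
    """Instantiate a CONSTRUCT-style template.
    ?vars are replaced by their bindings; each blank-node label in the template
    is replaced by a fresh blank node, one per solution mapping.
    Triples with an unbound variable are skipped."""
    fresh_ids = count()
    out = set()
    for mu in solutions:
        fresh = {}  # template blank label -> fresh label, for this solution only
        for triple in template:
            if any(t.startswith("?") and t not in mu for t in triple):
                continue
            new = []
            for t in triple:
                if t.startswith("?"):
                    new.append(mu[t])
                elif t.startswith("_:"):
                    if t not in fresh:
                        fresh[t] = f"_:g{next(fresh_ids)}"
                    new.append(fresh[t])
                else:
                    new.append(t)
            out.add(tuple(new))
    return out

template = [("_:p", "ex:name", "?name"), ("_:p", "ex:age", "?age")]
solutions = [{"?name": '"Alice"', "?age": '"30"'}, {"?name": '"Bob"'}]
for triple in sorted(instantiate(template, solutions)):
    print(triple)
```

Within one solution both template triples share the same fresh subject; across solutions the subjects differ, so Alice's and Bob's data are not conflated.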
Blank nodes in publishing • BTC-2012 corpus • 1.230 billion unique quadruples • 8.373 million RDF documents • 829 different pay-level domains (PLDs) • Prevalence of blank nodes in Web data • 274 M (22.3%) triples had a blank node in the subject position and 94 M (7.7%) triples had a blank node in the object position • 88 M (25.9%) of all RDF terms were blank nodes • 3.758 M (44.9%) documents featured at least one blank node • 549 (66.2%) PLDs used at least one blank node
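Statistics of this kind can be computed from quadruples in a single pass. The sketch below is our own illustration, not the authors' tooling; the one subtle point it encodes is that blank-node labels are scoped to their document, so a distinct blank node is counted as a (document, label) pair. The `_:` encoding and the name `blank_node_stats` are assumptions.

```python
def is_blank(t: str) -> bool:
    return t.startswith("_:")

def blank_node_stats(quads) -> dict:
    """Blank-node prevalence over (s, p, o, doc) quadruples.
    Blank-node labels are scoped to their document, so a blank node is
    identified by the pair (doc, label)."""
    quads = set(quads)  # unique quadruples only
    total = len(quads)
    bn_subj = sum(1 for s, _, _, _ in quads if is_blank(s))
    bn_obj = sum(1 for _, _, o, _ in quads if is_blank(o))
    blank_terms = {(d, t) for s, _, o, d in quads for t in (s, o) if is_blank(t)}
    docs = {d for *_, d in quads}
    docs_bn = {d for s, _, o, d in quads if is_blank(s) or is_blank(o)}
    return {
        "subject_share": bn_subj / total if total else 0.0,
        "object_share": bn_obj / total if total else 0.0,
        "distinct_blank_nodes": len(blank_terms),
        "doc_share": len(docs_bn) / len(docs) if docs else 0.0,
    }

quads = [
    ("_:b", "ex:p", "ex:o", "d1"),
    ("ex:s", "ex:p", "_:b", "d1"),
    ("_:b", "ex:p", "ex:o", "d2"),  # same label, different document: a different blank node
    ("ex:s", "ex:p", "ex:o", "d3"),
]
print(blank_node_stats(quads))
```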
Structure of blank nodes in Web data • 1.477 M (39.3%) of the documents with blank nodes contained connected blank nodes • 3.334 M non-singleton components contained 62.938 M blank nodes, i.e., each component contained on average 18.8 blank nodes • 71.0% of blank nodes were connected (i.e., in a non-singleton component) • 37.7% of blank-node components contained cycles • 17 domains published blank-node components with cycles
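Components and cycles of the blank graph can be found with a union-find pass. In this Python sketch we model blank(G) as a simple undirected graph (self-loops and parallel edges ignored); that modelling choice, the `_:` encoding, and the names `blank_graph` and `components` are our assumptions. A component has treewidth 2 or more exactly when it contains a cycle, so the cyclic components are the ones that push the simple-entailment bound beyond k = 2.

```python
from collections import defaultdict

def is_blank(t: str) -> bool:
    return t.startswith("_:")

def blank_graph(G: set):
    """blank(G) as a simple undirected graph: vertices are the blank nodes of G;
    an edge joins two distinct blank nodes that occur in the same triple."""
    nodes = {x for triple in G for x in triple if is_blank(x)}
    edges = {frozenset((s, o)) for s, _, o in G
             if is_blank(s) and is_blank(o) and s != o}
    return nodes, edges

def components(G: set):
    """Return (vertex set, has_cycle) for each connected component of blank(G)."""
    nodes, edges = blank_graph(G)
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for e in edges:
        u, v = tuple(e)
        parent[find(u)] = find(v)

    comp_nodes, comp_edges = defaultdict(set), defaultdict(int)
    for v in nodes:
        comp_nodes[find(v)].add(v)
    for e in edges:
        comp_edges[find(next(iter(e)))] += 1
    # A connected simple graph is a tree iff |E| = |V| - 1; any extra edge closes a cycle.
    return [(vs, comp_edges[r] > len(vs) - 1) for r, vs in comp_nodes.items()]

G = {("_:a", "ex:p", "_:b"), ("_:b", "ex:p", "_:c"), ("_:c", "ex:p", "_:a"),
     ("_:d", "ex:p", "_:e"), ("_:f", "ex:p", "ex:x")}
for vs, cyclic in sorted(components(G), key=lambda c: len(c[0])):
    print(sorted(vs), "cyclic" if cyclic else "acyclic")
```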
Structure of blank nodes in Web data • Of the 1,258,774 components with a treewidth of 2, 1,257,229 (99.9%) originated from data.gov.uk • Only 19 components have a treewidth of three or more
Structure of blank nodes in Web data • [Figure: distribution of the degree of connected blank nodes in directed blank graphs (log/log scale)]
(Non-)lean blank nodes in Web data • 5.378 M (6.07%) blank nodes are non-lean • The vast majority are isomorphic cases; the main reasons are: • copies of documents • blank nodes left “underspecified” and thus referentially ambiguous, where we would conjecture that the intent is often to refer to different real-world things with each blank node
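Leanness can be tested directly from the definition by searching for a map μ with μ(G) ⊊ G. The Python sketch below does this by brute force (exponential in the number of blank nodes), using our own `_:` encoding and the hypothetical names `non_lean_witness` and `is_lean`. The last example mirrors the "isomorphic cases" above: two copies of the same blank-node description.

```python
from itertools import product

def is_blank(t: str) -> bool:
    return t.startswith("_:")

def non_lean_witness(G: set):
    """Return a map mu with mu(G) a proper subgraph of G, or None if G is lean.
    Brute force over assignments of G's blank nodes to terms of G."""
    blanks = sorted({x for t in G for x in t if is_blank(x)})
    candidates = sorted({x for t in G for x in t})
    for image in product(candidates, repeat=len(blanks)):
        mu = dict(zip(blanks, image))
        img = {tuple(mu.get(x, x) for x in t) for t in G}
        if img < G:  # proper subset
            return mu
    return None

def is_lean(G: set) -> bool:
    return non_lean_witness(G) is None

G1 = {("ex:alice", "ex:knows", "_:b"), ("ex:alice", "ex:knows", "ex:bob")}
G2 = {("ex:alice", "ex:knows", "_:b")}
G3 = {("_:a", "ex:name", '"x"'), ("_:b", "ex:name", '"x"')}  # isomorphic duplicates
print(is_lean(G1), is_lean(G2), is_lean(G3))  # False True False
```

In G1 the blank node says nothing beyond the ground triple, so mapping it to ex:bob removes a triple while keeping an equivalent graph.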
Alternatives for blank nodes • Deprecate/disallow blank nodes • Discourage the “unnecessary” use of blank nodes • Ground semantics • Well-behaved RDF • Acyclic blank nodes • No change
Summary • Semantics of blank nodes • Simple entailment and leanness checking • Treewidth • Experimental analysis
Research issues • Sentence patterns: subtree mining • The subtree isomorphism problem • Inexact/approximate graph matching
Related readings • J.-F. Baget: RDF entailment as a graph homomorphism. ISWC 2005: 82–96 • J. J. Carroll: Signing RDF graphs. ISWC 2003: 369–384 • Y. Tzitzikas, C. Lantzaki, D. Zeginis: Blank node matching and RDF/S comparison functions. ISWC 2012: 591–607 • J. de Bruijn, S. Heymans: Logical foundations of (e)RDF(S): complexity and reasoning. ISWC/ASWC 2007: 86–99 • H. J. ter Horst: Completeness, decidability and complexity of entailment for RDF Schema and a semantic extension involving the OWL vocabulary. Web Semantics: Science, Services and Agents on the World Wide Web 3(2-3): 79–115 (2005) • P. Hayes: RDF Semantics. W3C Recommendation, February 2004
Related readings—Mining Subtree • Yun Chi, Richard R. Muntz, Siegfried Nijssen, Joost N. Kok: Frequent Subtree Mining - An Overview. Fundam. Inform. 66(1-2): 161-198 (2005) • Yun Chi, Yi Xia, Yirong Yang, Richard R. Muntz: Mining Closed and Maximal Frequent Subtrees from Databases of Labeled Rooted Trees. IEEE Trans. Knowl. Data Eng. 17(2): 190-202 (2005) • Yun Chi, Yirong Yang, Richard R. Muntz: Canonical forms for labelled trees and their applications in frequent subtree mining. Knowl. Inf. Syst. 8(2): 203-234 (2005)
Related readings—Inexact Graph Matching • Jason Tsong-Li Wang, Kaizhong Zhang, Gung-Wei Chirn: Algorithms for Approximate Graph Matching. Inf. Sci. 82(1-2): 45-74 (1995) • Kaspar Riesen, Xiaoyi Jiang, Horst Bunke: Exact and Inexact Graph Matching: Methodology and Applications. Advances in Database Systems 40: 217-247 (2010) • Yuanyuan Tian, Jignesh M. Patel: TALE: A Tool for Approximate Large Graph Matching. ICDE 2008: 963-972
Related readings—Inexact Graph Matching • Endika Bengoetxea: Inexact Graph Matching Using Estimation of Distribution Algorithms. PhD thesis (2002) • M. Mongiovì, R. Di Natale, R. Giugno, A. Pulvirenti, A. Ferro, R. Sharan: SIGMA: a set-cover-based inexact graph matching algorithm. J. Bioinform. Comput. Biol. 8(2): 199-218 (2010) • Laura A. Zager, George C. Verghese: Graph similarity scoring and matching. Applied Mathematics Letters 21(1): 86-94 (2008)
Q&A • Discussion welcome