Toward Scalable Reasoning over Annotated RDF Data Using MapReduce

Chang Liu (Shanghai Jiao Tong University), Guilin Qi (Southeast University, China)

Motivation • Growing interest in representing additional information on top of RDF • Time, uncertainty, trust, and provenance • => Annotated RDF • Large amounts of data • YAGO2 • Problem: Large Scale Reasoning

Motivation (cont’d) • Recent work on scalable reasoning using MapReduce • WebPIE (ISWC ’09, ESWC ’10) • Fuzzy pD* (ISWC ’11) • Our idea • A large-scale annotated RDF reasoner using MapReduce

Background: Annotated RDF • Syntax: • Deductive rules: • Subproperty, Subclass, Domain, Range, Generalization • Example: • Subproperty (a) • Zimmermann et al.: A general framework for representing, reasoning and querying with annotated Semantic Web data. Journal of Web Semantics 11, 72-95 (2012)

Background: MapReduce

Naïve Implementation • Subproperty (a) • Inputs: annotated triples (P, sp, Q) and (X, P, Y) • Mappers emit records, reducers combine them • Output: annotated triple (X, Q, Y)
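A minimal in-memory sketch of such a naïve job, written in Python rather than Hadoop. It assumes a fuzzy annotation domain (degrees in [0, 1], combined with `min`), which is only one possible instance of annotated RDF; the function names and the tags `"schema"` and `"data"` are hypothetical.

```python
from collections import defaultdict


def map_subproperty(triple, annotation):
    """Emit each annotated triple under the property it can be joined on."""
    s, p, o = triple
    if p == "sp":
        # Schema triple (P, sp, Q): key on P, carry Q and the annotation.
        yield s, ("schema", o, annotation)
    else:
        # Data triple (X, P, Y): key on P, carry (X, Y) and the annotation.
        yield p, ("data", (s, o), annotation)


def reduce_subproperty(values, combine=min):
    """Join schema and data records that share a key and derive (X, Q, Y)."""
    values = list(values)
    schema = [(q, a) for tag, q, a in values if tag == "schema"]
    data = [(xy, a) for tag, xy, a in values if tag == "data"]
    for q, a1 in schema:
        for (x, y), a2 in data:
            # Assumption: annotations are combined with min (fuzzy domain).
            yield (x, q, y), combine(a1, a2)


def run_job(records):
    """Simulate the shuffle phase with a dictionary of lists."""
    groups = defaultdict(list)
    for triple, annotation in records:
        for key, value in map_subproperty(triple, annotation):
            groups[key].append(value)
    derived = []
    for values in groups.values():
        derived.extend(reduce_subproperty(values))
    return derived
```

Calling `run_job` on one schema triple and one data triple that share a property yields a single derived triple whose annotation is the smaller of the two input degrees.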

Challenges and solutions • Generalization Rule • Deletes triples from the data set • Large data reconstruction cost • Solution • Only perform it at the beginning and at the end • Combine the Generalization Rule with other rules • E.g. when a reducer generates the same triple with two different annotations, it generates a single triple with the combined annotation instead.
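One way to fold the generalization step into a reducer is to merge duplicates before anything is written out. The sketch below assumes fuzzy annotations merged with `max`; the helper name `reduce_with_generalization` is hypothetical and not taken from the talk.

```python
def reduce_with_generalization(derived, join=max):
    """Merge derived triples that differ only in their annotation.

    `derived` is an iterable of (triple, annotation) pairs produced inside
    one reducer. Assumption: annotations are fuzzy degrees merged with max.
    """
    best = {}
    for triple, annotation in derived:
        if triple in best:
            best[triple] = join(best[triple], annotation)
        else:
            best[triple] = annotation
    # Each triple is emitted exactly once, so no later deletion pass is needed.
    return sorted(best.items())
```

Because the merge happens in memory inside the reducer, the output never contains two copies of the same triple from that reducer, which is what avoids rewriting the data set after every rule application.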

Challenges and solutions (cont’d) • Unnecessary Derivation • E.g. • Wastes a lot of computation time • Solution • Incorporate the annotation into the mapped key • E.g. • Map to ((t1, p), (1, s, o, [1,2])) • Map to ((t3, p), (2, q, [3,4])) • They will not be grouped together!
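The keying idea can be sketched as follows, assuming a temporal annotation domain with closed integer intervals and one key per covered time point. That bucketing scheme is an assumption made for illustration; the slide only shows keys of the form (t, p) and values tagged 1 for data and 2 for schema.

```python
from collections import defaultdict


def map_with_time_key(triple, interval):
    """Emit one record per time point covered by the triple's interval."""
    s, p, o = triple
    lo, hi = interval
    for t in range(lo, hi + 1):
        if p == "sp":
            # Schema triple (p, sp, q): tag 2, keyed on (t, p).
            yield (t, s), (2, o, interval)
        else:
            # Data triple (s, p, o): tag 1, keyed on (t, p).
            yield (t, p), (1, s, o, interval)


def reduce_with_time_key(values):
    """Join data and schema records that met under the same (t, p) key."""
    data = [v[1:] for v in values if v[0] == 1]
    schema = [v[1:] for v in values if v[0] == 2]
    for q, (lo2, hi2) in schema:
        for s, o, (lo1, hi1) in data:
            # Assumption: temporal annotations combine by intersection.
            lo, hi = max(lo1, lo2), min(hi1, hi2)
            if lo <= hi:
                yield (s, q, o), (lo, hi)


def derive(records):
    """Simulate the job; a set removes duplicates from shared time points."""
    groups = defaultdict(list)
    for triple, interval in records:
        for key, value in map_with_time_key(triple, interval):
            groups[key].append(value)
    out = set()
    for values in groups.values():
        out.update(reduce_with_time_key(values))
    return out
```

With intervals [1,2] and [3,4] the two records never share a key, so no reducer sees both and no empty-annotation triple is derived. The trade-off in this sketch is that long intervals are replicated once per time point, so a coarser bucketing may be preferable in practice.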

Challenges and solutions (cont’d) • Fixpoint Calculation • Subproperty/subclass rules require fixpoint iteration • Solution • Load subproperty/subclass schema triples into memory • Calculate the closure • Shortest path calculation: Floyd-Warshall style algorithm … “Shortest” path
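A Floyd-Warshall style closure over in-memory schema triples might look like the sketch below. It assumes fuzzy annotations, where a path is scored by the `min` of its edges and alternative paths are merged with `max`, so the "shortest" path is the one with the best combined annotation. The function name and the dictionary layout are hypothetical.

```python
def subproperty_closure(edges, meet=min, join=max):
    """Compute the annotated transitive closure of subproperty edges.

    `edges` maps (p, q) to the annotation of the triple (p, sp, q).
    Assumption: fuzzy degrees, with meet=min along a path and join=max
    across alternative paths.
    """
    nodes = sorted({n for pair in edges for n in pair})
    best = dict(edges)
    for k in nodes:  # intermediate property
        for i in nodes:
            if (i, k) not in best:
                continue
            for j in nodes:
                if (k, j) not in best:
                    continue
                via = meet(best[(i, k)], best[(k, j)])
                if (i, j) in best:
                    best[(i, j)] = join(best[(i, j)], via)
                else:
                    best[(i, j)] = via
    return best
```

Once the closure is available in memory, each data triple can be rewritten against it in a single pass instead of iterating the subproperty rule until a fixpoint is reached.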

Experiment setup • Dataset • FuzzifiedDBPedia core ontology • fpdLUBM1000, 2000, 4000, 8000 • Cluster • 25 machines with 75 mapper/reducer slots • Liu et al.: Reasoning with Large Scale Ontologies in Fuzzy pD* Using MapReduce. Computational Intelligence Magazine, IEEE 7(2), 54-66 (2012)

Experiment result - fuzzy DBPedia Dataset: fuzzifiedDBPedia core ontology Results:

Experiment result – fpdLUBM Experimental results of FuzzyPD and WebPIE

Experiment result – fpdLUBM (cont’d) Scalability over number of units

Experiment result – fpdLUBM (cont’d) Scalability over data volume

Conclusion and Future work • We show how to design MapReduce algorithms to achieve scalable annotated RDFS reasoning • Several challenges along with solutions • Future work • More experiments on annotated RDFS ontologies • Annotated OWL 2 RL

Q&A
