Toward Scalable Reasoning over Annotated RDF Data Using MapReduce
Chang Liu (1), Guilin Qi (2)
(1) Shanghai Jiao Tong University, (2) Southeast University, China
Motivation
• Growing interest in representing additional information on top of RDF
  • Time, uncertainty, trust, and provenance
  • => Annotated RDF
• Large amounts of data
  • YAGO2
• Problem: large-scale reasoning
Motivation (cont'd)
• Recent work on scalable reasoning using MapReduce
  • WebPIE (ISWC '09, ESWC '10)
  • Fuzzy pD* (ISWC '11)
• Our idea
  • A large-scale annotated RDF reasoner using MapReduce
Background: Annotated RDF
• Syntax:
• Deductive rules:
  • Subproperty, Subclass, Domain, Range, Generalization
• Example:
  • Subproperty (a)
• Zimmermann et al.: A general framework for representing, reasoning and querying with annotated Semantic Web data. Journal of Web Semantics 11, 72-95 (2012)
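As a hedged sketch in the style of the cited framework, an annotated triple and two of the listed rules can be written as follows. The symbols are assumptions introduced here for illustration: λ stands for an annotation value, ⊗ for the annotation domain's combination operator, and ⊕ for its join operator.

```latex
% An annotated triple: an RDF triple \tau carrying an annotation value \lambda
\tau : \lambda

% Subproperty rule applied to an instance triple
\frac{(P, \mathsf{sp}, Q) : \lambda_1 \qquad (X, P, Y) : \lambda_2}
     {(X, Q, Y) : \lambda_1 \otimes \lambda_2}

% Generalization rule: two annotations of the same triple are joined
\frac{\tau : \lambda_1 \qquad \tau : \lambda_2}
     {\tau : \lambda_1 \oplus \lambda_2}
```

In a fuzzy annotation domain, for example, ⊗ would typically be the minimum and ⊕ the maximum of two degrees in [0, 1].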
Naïve Implementation
• Subproperty (a): the annotated triples (P, sp, Q) and (X, P, Y) are read by the mappers and grouped at the reducers, which emit the annotated triple (X, Q, Y)
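The naïve join above can be simulated in memory as a rough sketch. This is not the authors' code: the function names are hypothetical, the shuffle phase is imitated with a dictionary, and the annotations are assumed to be fuzzy degrees combined with `min`.

```python
from collections import defaultdict


def combine(a, b):
    """Assumed combination operator for a fuzzy annotation domain."""
    return min(a, b)


def map_triple(triple, annotation):
    """Emit (key, value) pairs so that both rule premises meet on P."""
    s, p, o = triple
    if p == "sp":
        # Schema triple (P, sp, Q): key on the subproperty P.
        yield s, ("schema", o, annotation)
    else:
        # Instance triple (X, P, Y): key on its predicate P.
        yield p, ("instance", s, o, annotation)


def reduce_subproperty(key, values):
    """Join every schema triple with every instance triple sharing the key."""
    schema = [(v[1], v[2]) for v in values if v[0] == "schema"]
    instances = [(v[1], v[2], v[3]) for v in values if v[0] == "instance"]
    for q, l1 in schema:
        for x, y, l2 in instances:
            yield (x, q, y), combine(l1, l2)


def run_job(annotated_triples):
    """Simulate one map/shuffle/reduce round in a single process."""
    groups = defaultdict(list)
    for triple, annotation in annotated_triples:
        for key, value in map_triple(triple, annotation):
            groups[key].append(value)
    derived = []
    for key, values in groups.items():
        derived.extend(reduce_subproperty(key, values))
    return derived
```

For instance, given `("advises", "sp", "knows")` with degree 0.9 and `("alice", "advises", "bob")` with degree 0.7, `run_job` derives `("alice", "knows", "bob")` with degree 0.7.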
Challenges and solutions
• Generalization Rule
  • Requires deleting triples from the data set
  • Large data reconstruction cost
• Solution
  • Perform it as a separate step only at the beginning and at the end
  • Combine the Generalization Rule with other rules
  • E.g. when a reducer generates the same triple with two different annotations, it generates a single triple with the combined annotation instead.
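Folding the Generalization Rule into a reducer can be sketched as a merge over the reducer's output. This is a minimal illustration rather than the authors' implementation: `merge_annotations` is a hypothetical name, and the join operator is assumed to be `max`, as it would be for fuzzy degrees.

```python
def merge_annotations(derived, join=max):
    """Keep one annotated triple per distinct triple.

    `derived` is an iterable of (triple, annotation) pairs produced by a
    reducer. Annotations of identical triples are merged with `join`, so
    no later pass is needed to delete the redundant copies.
    """
    merged = {}
    for triple, annotation in derived:
        if triple in merged:
            merged[triple] = join(merged[triple], annotation)
        else:
            merged[triple] = annotation
    return merged
```

Because the merge happens before anything is written out, the data set never contains the redundant triples that a separate generalization pass would have to delete.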
Challenges and solutions (cont'd)
• Unnecessary Derivation
  • E.g.
  • Wastes a lot of computation time
• Solution
  • Incorporate the annotation into the mapped key
  • E.g.
    • Map to ((t1, p), (1, s, o, [1,2]))
    • Map to ((t3, p), (2, q, [3,4]))
    • They will not be grouped together!
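One possible way to put the annotation into the key is sketched below. It rests on assumptions that the slide does not state: annotations are closed intervals over integer time points, and each triple is emitted once per time point it covers. The function name and the tags 1 (instance triple) and 2 (schema triple) follow the slide's example only loosely.

```python
def map_with_time_key(triple, interval):
    """Emit one (key, value) pair per time point covered by `interval`.

    The key pairs a time point with the join property, so two triples are
    grouped at a reducer only if their intervals share a time point.
    """
    s, p, o = triple
    start, end = interval
    for t in range(start, end + 1):
        if p == "sp":
            # Schema triple (P, sp, Q): join property is the subject P.
            yield (t, s), (2, o, interval)
        else:
            # Instance triple (X, P, Y): join property is the predicate P.
            yield (t, p), (1, s, o, interval)
```

Under these assumptions, an instance triple valid over [1,2] and a schema triple valid over [3,4] produce disjoint key sets, so no reducer ever attempts the join whose result would carry an empty interval. Triples whose intervals overlap may meet under several keys, and the resulting duplicates would then need to be merged.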
Challenges and solutions (cont'd)
• Fixpoint Calculation
  • Subproperty/subclass rules require fixpoint iteration
• Solution
  • Load subproperty/subclass schema triples into memory
  • Calculate the closure
    • Shortest path calculation
    • Floyd-Warshall style algorithm
    • … "Shortest" path
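The in-memory closure can be sketched with a Floyd-Warshall style triple loop in which path composition uses the combination operator and alternative paths are merged with the join operator. The sketch assumes fuzzy degrees (`min` and `max`); `annotated_closure` and its parameters are hypothetical names, not taken from the slides.

```python
def annotated_closure(edges, combine=min, join=max):
    """Compute the annotated transitive closure of schema edges.

    `edges` maps (sub, super) pairs to annotation values. A path's
    annotation is the `combine` of its edges, and the closure keeps the
    `join` over all paths between two nodes. With min/max this is the
    "widest path" analogue of a shortest-path computation.
    """
    nodes = sorted({n for pair in edges for n in pair})
    closure = dict(edges)
    for k in nodes:
        for i in nodes:
            if (i, k) not in closure:
                continue
            for j in nodes:
                if (k, j) not in closure:
                    continue
                via = combine(closure[(i, k)], closure[(k, j)])
                if (i, j) in closure:
                    closure[(i, j)] = join(closure[(i, j)], via)
                else:
                    closure[(i, j)] = via
    return closure
```

With edges a→b (0.8), b→c (0.6) and a→c (0.5), the closure raises a→c to 0.6, because the path through b is "better" than the direct edge. Since the schema is usually small compared with the instance data, computing this once in memory avoids repeated MapReduce iterations for the subproperty and subclass hierarchies.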
Experiment setup
• Dataset
  • Fuzzified DBPedia core ontology
  • fpdLUBM1000, 2000, 4000, 8000
• Cluster
  • 25 machines with 75 mapper/reducer slots
• Liu et al.: Reasoning with Large Scale Ontologies in Fuzzy pD* Using MapReduce. IEEE Computational Intelligence Magazine 7(2), 54-66 (2012)
Experiment result – fuzzy DBPedia
• Dataset: fuzzified DBPedia core ontology
• Results:
Experiment result – fpdLUBM
• Experimental results of FuzzyPD and WebPIE
Experiment result – fpdLUBM (cont'd)
• Scalability over number of units
Experiment result – fpdLUBM (cont'd)
• Scalability over data volume
Conclusion and Future work
• We show how to design MapReduce algorithms to achieve scalable annotated RDFS reasoning
• We identify several challenges and present solutions to them
• Future work
  • More experiments on annotated RDFS ontologies
  • Annotated OWL 2 RL