Benchmarking Reasoners for Multi-Ontology Applications
Ameet N Chitnis, Abir Qasem and Jeff Heflin
11 November 2007
Talk Organization
• Motivation (a.k.a. why yet another benchmark?) and Influences
• The Workload
  • Domain ontologies, map ontologies, data sources, queries
• The Metrics
• How do we generate things?
  • Domain ontology generation
  • Map ontology generation
    • Parameters & relationships
    • Map generator algorithm
  • Data source generation
  • Query generation
• Sample Workload
• Conclusion & Future Work
Motivation
As the Semantic Web matures …
• OWL ontologies and data from various organizations will gain commercial value
• Aligning different ontologies and integrating the data that commit to them will be a viable business enterprise
• Quite possibly we will have post-development alignments between ontologies (alignment tools, third parties, etc.)
• Currently DBpedia and Hawkeye provide some form of third-party alignment (non-commercial)
• We wanted to develop a benchmark that reflects the above reality
Influences
• Lehigh University Benchmark (LUBM) by Y. Guo, Z. Pan, and J. Heflin (ISWC 2004)
• Extended LUBM (supports both OWL Lite and OWL DL) by L. Ma, Y. Yang, Z. Qiu, G. Xie and Y. Pan (ESWC 2006)
• Statistical analysis of the available Semantic Web ontologies by C. Tempich and R. Volz (ISWC 2003)
• Benchmarking DL systems by I. Horrocks and P. Patel-Schneider (DL Workshop 1998)
• Internet topology generator by J. Winick and S. Jamin (University of Michigan)
The Workload (1)
• Domain ontologies
  • “Simple” ontologies. We can control the number of classes, the number of properties, and the branching factor of the hierarchies
• Data sources
  • We can control the number of data sources that commit to a given ontology, the number of classes that will have individuals, the number of properties that will connect those individuals, and the number of triples
• Queries
  • Extensional queries in SPARQL
  • We can control the mix of classes, properties, and individuals
  • We can control selectivity
The Workload (2)
• Map ontologies: the main focus of this work
  • In our work a map ontology consists solely of “mapping” axioms that establish an alignment between two domain ontologies
  • This is just for convenience of generation and analysis. Semantically, map ontologies are not much different from domain ontologies
• Macro level:
  • We generate a directed acyclic graph of domain ontologies
  • Every edge represents a map ontology
• Micro level:
  • We can control the types of axioms that are used to map two domain ontologies
Domain Ontology Generation
• Simple taxonomy
• The number of terms to generate varies according to a normal distribution with a user-supplied mean
• Given a branching factor and a number of terms, we generate a balanced tree
• Complex axioms are left for map ontologies
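The taxonomy step can be sketched in a few lines of Python. This is a minimal illustration and not the authors' generator: the function names, the `C<i>` naming scheme, and the standard deviation parameter `sd` are assumptions (the slide only mentions a user-supplied mean), and "balanced" is read here as a tree filled level by level.

```python
import random

def sample_term_count(mean, sd, rng):
    """Draw a term count from a normal distribution (sd is an assumed parameter)."""
    return max(1, round(rng.gauss(mean, sd)))

def balanced_taxonomy(n_terms, branching):
    """Return {child: parent} for a tree with n_terms nodes, filled level by level.

    Node i (i >= 1) gets parent (i - 1) // branching, the usual array layout
    of a complete k-ary tree; C0 is the root and has no entry.
    """
    if n_terms < 1 or branching < 1:
        raise ValueError("n_terms and branching must be positive")
    return {f"C{i}": f"C{(i - 1) // branching}" for i in range(1, n_terms)}

if __name__ == "__main__":
    rng = random.Random(42)
    n = sample_term_count(mean=20, sd=4, rng=rng)
    for child, parent in balanced_taxonomy(n, branching=3).items():
        print(f"{child} rdfs:subClassOf {parent}")
```

The same function could be called once for the class hierarchy and once for the property hierarchy, with `rdfs:subPropertyOf` in place of `rdfs:subClassOf`.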
Map Ontology Generation
Inputs
• Number of ontologies we want in the workload (onts)
• Average out-degree (referred to as out below)
• Diameter
The number of maps created is approximately
  maps ≈ (onts − terminal onts) × out
However, we do not have the number of terminal ontologies as a parameter. A reasonable approximation is
  terminal onts ≈ (onts × out) / (diameter + out)
Substituting, we have
  maps ≈ (onts × out × diameter) / (diameter + out)
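The two approximations can be checked with a small worked example. The function name and the sample values (100 ontologies, out-degree 2, diameter 8) are illustrative assumptions, not figures from the talk.

```python
def estimate_maps(onts, out, diameter):
    """Return (terminal_onts, maps) using the approximations on the slide."""
    terminal = onts * out / (diameter + out)   # estimated terminal ontologies
    maps = (onts - terminal) * out             # each non-terminal emits ~out maps
    return terminal, maps

if __name__ == "__main__":
    terminal, maps = estimate_maps(onts=100, out=2, diameter=8)
    closed_form = 100 * 2 * 8 / (8 + 2)        # onts*out*diameter/(diameter+out)
    print(terminal, maps, closed_form)
```

With these inputs the estimate is 20 terminal ontologies and 160 maps, and the closed form gives the same 160, since onts − onts·out/(diameter + out) = onts·diameter/(diameter + out).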
Map Generator Algorithm
1. Determine and mark the number of terminal nodes
2. Create a path of diameter length
3. Choose targets for every non-terminal ontology. Constraints:
  • No cycles
  • No path longer than the diameter
  • Non-terminal nodes should not become terminal
4. Create the corresponding map ontologies by generating mapping axioms
  • Update the parameters of the source and the target
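One way to satisfy the three constraints in steps 1 to 3 is sketched below. It swaps in a level-assignment technique that the slides do not describe: every ontology gets a level between 0 and the diameter, and maps only point from a lower level to a higher one, which rules out cycles and over-long paths by construction. All names are hypothetical, the average out-degree is treated as a fixed integer cap, and map ontology creation (step 4) is not covered.

```python
import random

def generate_map_graph(n_onts, out_degree, diameter, seed=0):
    """Return (edges, level, terminals) for a DAG over ontologies 0..n_onts-1."""
    if diameter < 1 or out_degree < 1 or n_onts < diameter + 1:
        raise ValueError("need diameter >= 1, out_degree >= 1, n_onts >= diameter + 1")
    rng = random.Random(seed)

    # Step 1: estimate and mark the terminal nodes (no outgoing maps).
    n_terminal = round(n_onts * out_degree / (diameter + out_degree))
    n_terminal = max(1, min(n_terminal, n_onts - diameter))
    terminals = {diameter} | set(range(n_onts - (n_terminal - 1), n_onts))

    # Step 2: a backbone path 0 -> 1 -> ... -> diameter of exactly diameter edges.
    level = {i: i for i in range(diameter + 1)}
    edges = {(i, i + 1) for i in range(diameter)}
    for node in range(diameter + 1, n_onts):
        if node in terminals:
            level[node] = rng.randint(1, diameter)   # terminals sit at level >= 1
        else:
            level[node] = rng.randrange(diameter)    # non-terminals stay below the top

    # Step 3: choose targets for every non-terminal ontology.
    # Targets always have a strictly higher level than the source.
    for src in range(n_onts):
        if src in terminals:
            continue
        current = {d for s, d in edges if s == src}
        pool = [v for v in range(n_onts)
                if level[v] > level[src] and v not in current]
        need = min(out_degree - len(current), len(pool))
        for dst in rng.sample(pool, max(0, need)):
            edges.add((src, dst))
    return edges, level, terminals

if __name__ == "__main__":
    edges, level, terminals = generate_map_graph(20, 2, 4, seed=1)
    print(len(edges), "maps,", len(terminals), "terminal ontologies")
```

Because the end of the backbone sits at the top level, every non-terminal ontology always has at least one candidate target, so it cannot end up terminal. A terminal ontology may receive no incoming map in this sketch; a fuller generator would probably want to guard against that.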
Mapping axioms
• Inputs: two domain ontologies and a desired distribution of OWL constructors and restrictions
• We choose terms from the domain ontologies and create an axiom that connects them
• We can generate fairly complex axioms
  • E.g. O1:A ⊔ O1:B ⊑ ∃O2:P.O2:C ⊓ ∀O2:Q.O2:D
• Currently the algorithm is restricted to generating axioms that keep the ontology within OWLII (a subset of OWL used by OBII; Qasem et al. 2007, ISWC NFR workshop)
• But this is NOT a limitation of our approach
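As a rough illustration of drawing constructors from a distribution, the sketch below builds one subsumption axiom as a description-logic string shaped like the example above. The function name, the three constructor labels (`class`, `some`, `all`), and the limit of two terms per side are assumptions made for the example, and the sketch does not check that the result stays within OWLII.

```python
import random

def random_mapping_axiom(src_classes, tgt_classes, tgt_props, weights, rng):
    """Build one axiom 'LHS ⊑ RHS' connecting a source and a target ontology.

    weights maps the hypothetical constructor labels 'class', 'some', 'all'
    to their relative frequencies.
    """
    kinds = ["class", "some", "all"]
    freqs = [weights.get(k, 0) for k in kinds]

    def rhs_term():
        kind = rng.choices(kinds, weights=freqs)[0]
        if kind == "class":
            return rng.choice(tgt_classes)
        quantifier = "∃" if kind == "some" else "∀"
        return f"{quantifier}{rng.choice(tgt_props)}.{rng.choice(tgt_classes)}"

    # Left side: a union of one or two classes from the source ontology.
    n_lhs = rng.randint(1, min(2, len(src_classes)))
    lhs = " ⊔ ".join(rng.sample(src_classes, n_lhs))
    # Right side: an intersection of one or two terms over the target ontology.
    rhs = " ⊓ ".join(rhs_term() for _ in range(rng.randint(1, 2)))
    return f"{lhs} ⊑ {rhs}"

if __name__ == "__main__":
    rng = random.Random(7)
    print(random_mapping_axiom(["O1:A", "O1:B"], ["O2:C", "O2:D"],
                               ["O2:P", "O2:Q"],
                               {"class": 1, "some": 2, "all": 2}, rng))
```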
Source Generation
• Choose an ontology
• Choose the number of classes for which to create individuals
• Generate triples
  • We can either generate random individuals, or
  • Use the domain and range information to connect the individuals with properties
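The steps above might look as follows for a single data source. This is a sketch under stated assumptions: the function name, the `ind<i>` naming, the `(domain, range)` dictionary for properties, and the even split between new individuals and property triples are all illustrative choices rather than details from the talk.

```python
import random

def generate_source(classes, properties, n_classes, n_triples, rng):
    """Generate triples for one data source committing to one ontology.

    classes:    list of class names
    properties: {property: (domain_class, range_class)}
    """
    if not 1 <= n_classes <= min(len(classes), n_triples):
        raise ValueError("need 1 <= n_classes <= min(len(classes), n_triples)")
    chosen = rng.sample(classes, n_classes)      # classes that get individuals
    members = {c: [] for c in chosen}
    triples = []

    def new_individual(cls):
        ind = f"ind{sum(len(m) for m in members.values())}"
        members[cls].append(ind)
        triples.append((ind, "rdf:type", cls))

    for cls in chosen:                           # at least one individual per class
        new_individual(cls)

    # Only properties whose domain and range both have individuals are usable.
    usable = [(p, d, r) for p, (d, r) in properties.items()
              if d in members and r in members]
    while len(triples) < n_triples:
        if usable and rng.random() < 0.5:
            p, d, r = rng.choice(usable)
            triples.append((rng.choice(members[d]), p, rng.choice(members[r])))
        else:
            new_individual(rng.choice(chosen))
    return triples

if __name__ == "__main__":
    rng = random.Random(3)
    for t in generate_source(["A", "B", "C"], {"p": ("A", "B")}, 2, 8, rng):
        print(t)
```

Property triples are drawn with replacement, so the returned list may contain duplicates; deduplicating would change the triple count.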
Query Generation
SPARQL queries (SELECT)
• Choose the first predicate from the classes of an ontology
• We bias the next predicate with a 75% chance of being one of the properties from the ontology
• We use shared variables to implement “joins”. A shared variable is equally likely to be in the subject or the object position
• For single-predicate queries all the variables are distinguished. For others, on average two thirds of the variables are distinguished and the rest are non-distinguished
• There is a 10% chance of a constant
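The probabilities above can be turned into a small generator. The sketch reflects one reading of the slide rather than the authors' code: the 10% constant is applied to the non-shared position of a property pattern, class patterns always put the shared variable in the subject position, and classes, properties, and constants are assumed to be passed as full IRIs in angle brackets so no PREFIX declarations are needed. The function name and the `?v<i>` naming are hypothetical.

```python
import random

def generate_query(classes, properties, constants, n_patterns, rng):
    """Return a SPARQL SELECT query with n_patterns triple patterns."""
    variables = []

    def fresh():
        variables.append(f"?v{len(variables)}")
        return variables[-1]

    # The first predicate is a class membership pattern.
    patterns = [(fresh(), "a", rng.choice(classes))]
    for _ in range(n_patterns - 1):
        shared = rng.choice(variables)               # join on an existing variable
        if rng.random() < 0.75:                      # 75%: a property pattern
            other = rng.choice(constants) if rng.random() < 0.10 else fresh()
            if rng.random() < 0.5:                   # subject or object, equally likely
                patterns.append((shared, rng.choice(properties), other))
            else:
                patterns.append((other, rng.choice(properties), shared))
        else:                                        # 25%: another class pattern
            patterns.append((shared, "a", rng.choice(classes)))

    if n_patterns == 1:
        distinguished = list(variables)              # all variables distinguished
    else:
        # Each variable is kept with probability 2/3; keep at least one.
        distinguished = [v for v in variables if rng.random() < 2 / 3] or variables[:1]
    body = " . ".join(f"{s} {p} {o}" for s, p, o in patterns)
    return f"SELECT {' '.join(distinguished)} WHERE {{ {body} }}"

if __name__ == "__main__":
    rng = random.Random(5)
    print(generate_query(["<http://example.org/A>", "<http://example.org/B>"],
                         ["<http://example.org/p>"],
                         ["<http://example.org/ind0>"], 3, rng))
```

Selectivity control, which the workload slides mention, is not modelled here.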
A Sample Workload
• We used the benchmark to evaluate OBII, a distributed query answering system
• We compared it with a “baseline” system that was essentially a KAON2 wrapper
• Some characteristics of the workload
  • 50% of classes had individuals
  • On average we generated 75 triples per source
  • We generated configurations as large as 100 domain ontologies with about 1000 data sources
Conclusion and Future Work
• A focus on a workload that accounts for post-development alignments
  • Micro level: controlling mapping axioms
  • Macro level: controlling how ontologies are mapped
• Domain ontology synthesis can be expanded to support complex axioms
• Experiment with different characteristics
  • Hubs and authorities (different in-degree / out-degree patterns)