Test Sets: LUBM, SP2B, Barton Set, Billion Triple Challenge. Varsha Dubey. Agenda. General Introduction of all the test sets LUBM Introduction LUBM for OWL LUBM Benchmark Summary SP2B Introduction A SPARQL Performance Benchmark Summary Barton Data Set Introduction
Test Sets: LUBM, SP2B, Barton Set, Billion Triple Challenge Varsha Dubey
Agenda • General Introduction of all the test sets • LUBM • Introduction • LUBM for OWL • LUBM Benchmark • Summary • SP2B • Introduction • A SPARQL Performance Benchmark • Summary • Barton Data Set • Introduction • Barton Data Set as RDF Benchmark • Summary • Billion Triple Challenge • Introduction • Solutions Proposed • Summary
LUBM - Introduction • LUBM: Lehigh University Benchmark - A Benchmark for OWL Knowledge Base Systems • Need – • Issue - How to choose an appropriate KBS for a large OWL application • 2 basic requirements for such an application – • An enormous amount of data, where scalability and efficiency become crucial • Sufficient reasoning capabilities to support the semantic requirements of the system • Need Semantic Web data that cover a large range and commit to semantically rich ontologies. • Increased reasoning capability usually means increased processing time/query response time • Best Approach - Extensional queries over a large dataset that commits to a single ontology of moderate complexity and size - LUBM
LUBM For OWL • University Benchmark for OWL • LUBM Design Goals – • Support extensional queries – • Extensional queries are queries about the instance data over ontologies. • They contrast with intensional queries (i.e., queries about classes and properties). • The majority of Semantic Web applications will want to use data to answer questions, and reasoning about subsumption will typically be a means to an end, not an end in itself. • It is important to have benchmarks that focus on this kind of query.
LUBM For OWL • Arbitrary scaling of data – • In order to evaluate the ability of systems to handle large data, we need to be able to vary the size of the data and see how the system scales. • Ontology of moderate size and complexity – • Existing DL benchmarks have looked at reasoning with large and complex ontologies, while various RDF systems have been evaluated with regard to various RDF Schemas. • It is important to have a benchmark that falls between these two extremes. • Furthermore, since the focus is on data, the ontology should not be too large.
LUBM For OWL • LUBM Overview – • This benchmark is based on an ontology for the university domain. • Its test data are synthetically generated instance data over that ontology; they are random and repeatable and can be scaled to an arbitrary size. • It offers fourteen test queries over the data. • It also provides a set of performance metrics used to evaluate the system.
LUBM Benchmark • Benchmark Ontology - Univ-Bench • Univ-Bench describes universities and departments and the activities that occur at them. • The ontology is expressed in OWL Lite, the simplest sublanguage of OWL. • The ontology currently defines 43 classes and 32 properties (including 25 object properties and 7 datatype properties). • It uses OWL Lite language features including inverseOf, TransitiveProperty, someValuesFrom restrictions, and intersectionOf.
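As a rough illustration of what two of these features imply for a reasoner, the sketch below derives the extra facts that a transitive property and an inverse property would entail. It is a toy under stated assumptions, not part of the LUBM tooling: the function names and the sample individuals (Group0, Dept0, Univ0, alice) are hypothetical.

```python
def transitive_closure(pairs):
    """Facts entailed when the property holding these pairs is an owl:TransitiveProperty."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                # (a, b) and (b, d) together entail (a, d)
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def inverse(pairs):
    """Facts entailed for a property declared owl:inverseOf the given one."""
    return {(b, a) for a, b in pairs}


# Hypothetical sample data: a sub-organization chain and one degree assertion.
sub_organization_of = {("Group0", "Dept0"), ("Dept0", "Univ0")}
degree_from = {("alice", "Univ0")}

entailed_sub_org = transitive_closure(sub_organization_of)
entailed_alumni = inverse(degree_from)
```

A system without such inference would store only the two explicit sub-organization facts and would miss the derived pair linking Group0 to Univ0.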
LUBM Benchmark • Data Generation and OWL Datasets – • Test data of the LUBM are extensional data created over the Univ-Bench ontology. • Data generation is carried out by UBA (Univ-Bench Artificial data generator), a tool we have developed for the benchmark. • The generator features random and repeatable data generation. • Instances of both classes and properties are randomly decided. • To make the data as realistic as possible, some restrictions are applied based on common sense and domain investigation. • Example Restrictions – • “a minimum of 15 and a maximum of 25 departments in each university”, • “an undergraduate student/faculty ratio between 8 and 14 inclusive”, • “each graduate student takes at least 1 but at most 3 courses”, and so forth. • The generator identifies universities by assigning them zero-based indexes, i.e., the first university is named University0, and so on.
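The idea of "random and repeatable" generation under restrictions can be sketched as follows. This is a minimal illustration, not the UBA tool: the dictionary layout, the seed formula, the faculty range and the number of graduate students are assumptions; only the three quoted restrictions and the University0 naming come from the slide.

```python
import random


def generate_university(index, seed=0):
    """Return a toy description of one university (hypothetical structure)."""
    # Derive the random stream from seed and index so that the same
    # arguments always reproduce exactly the same data.
    rng = random.Random(seed * 1_000_003 + index)
    departments = []
    for d in range(rng.randint(15, 25)):  # 15 to 25 departments per university
        faculty = rng.randint(30, 42)     # assumed range, not taken from the slides
        ratio = rng.randint(8, 14)        # undergraduate/faculty ratio, inclusive
        grad_students = rng.randint(5, 10)  # assumed count, for illustration only
        departments.append({
            "name": f"Department{d}",
            "faculty": faculty,
            "undergraduates": faculty * ratio,
            # each graduate student takes at least 1 and at most 3 courses
            "grad_course_counts": [rng.randint(1, 3) for _ in range(grad_students)],
        })
    # Zero-based naming: the first university is University0.
    return {"name": f"University{index}", "departments": departments}
```

Calling `generate_university(0, seed=42)` twice yields identical results, which is what allows different systems to be compared on the same data.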
LUBM Benchmark • Test Queries – • The LUBM currently offers fourteen test queries, one more than when it was originally developed. • They are written in SPARQL [28], the query language that is poised to become the standard for RDF. • Factors taken into consideration – • Input size – This is measured as the proportion of the class instances involved in the query to the total class instances in the benchmark data. • Selectivity – This is measured as the estimated proportion of the class instances involved in the query that satisfy the query criteria. • Whether the selectivity is high or low for a query may depend on the dataset used. • Complexity - We use the number of classes and properties that are involved in the query as an indication of complexity. • Since we do not assume any specific implementation of the repository, the real degree of complexity may vary across systems and schemata. • Assumed hierarchy information – This considers whether information from the class hierarchy or property hierarchy is required to achieve the complete answer. • Assumed logical inference - This considers whether logical inference is required to achieve the completeness of the answer. • Features used in the test queries include subsumption, i.e., inference of implicit subclass relationships, owl:TransitiveProperty, owl:inverseOf, and realization, i.e., inference of the most specific concepts that an individual is an instance of.
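To make "assumed hierarchy information" concrete, the sketch below answers a class query with and without following subclass links. The class hierarchy, the individuals and the function names are hypothetical toy data, not the actual Univ-Bench axioms.

```python
def superclasses(cls, subclass_of):
    """All classes that cls is (transitively) a subclass of, including cls itself."""
    seen = {cls}
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in subclass_of.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def instances_of(target, type_assertions, subclass_of):
    """Individuals whose asserted class is the target or one of its subclasses."""
    return {ind for ind, cls in type_assertions
            if target in superclasses(cls, subclass_of)}


# Toy hierarchy and data (hypothetical).
subclass_of = {
    "GraduateStudent": ["Student"],
    "UndergraduateStudent": ["Student"],
}
type_assertions = [
    ("alice", "GraduateStudent"),
    ("bob", "UndergraduateStudent"),
    ("carol", "Student"),
]

with_hierarchy = instances_of("Student", type_assertions, subclass_of)
without_hierarchy = instances_of("Student", type_assertions, {})
```

Here `with_hierarchy` contains all three individuals, while `without_hierarchy` contains only carol, so a system that ignores the class hierarchy returns an incomplete answer to this kind of query.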
LUBM Benchmark • Performance Metrics – • Load Time – • In a LUBM dataset, every university contains 15 to 25 departments, each described by a separate OWL file. These files are loaded into the target system in an incremental fashion. • We measure the load time as the stand-alone elapsed time for storing the specified dataset in the system. This also counts the time spent in any processing of the ontology and source files, such as parsing and reasoning. • Repository Size - • Repository size is the resulting size of the repository after loading the specified benchmark data into the system. Size is only measured for systems with persistent storage and is calculated as the total size of all files that constitute the repository. • Query Response Time - • Query response time is measured based on the process used in database benchmarks. • To account for caching, each query is executed ten times consecutively and the average time is computed. • Query Completeness and Soundness – • We also examine query completeness and soundness of each system. • We measure the degree of completeness of each query answer as the percentage of the entailed answers that are returned by the system. Note that we request that the result set contain unique answers.
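Two of these metrics can be sketched in a few lines. This is an illustrative sketch rather than the benchmark's own test module: the function names are hypothetical, and the soundness formula (share of returned answers that are entailed) is an assumption, since the slide spells out only completeness.

```python
import time


def average_response_time(run_query, repetitions=10):
    """Run the query several times in a row and return the mean elapsed seconds."""
    total = 0.0
    for _ in range(repetitions):
        start = time.perf_counter()
        run_query()
        total += time.perf_counter() - start
    return total / repetitions


def completeness(returned, entailed):
    """Fraction of the entailed answers that the system returned (duplicates ignored)."""
    returned, entailed = set(returned), set(entailed)
    if not entailed:
        return 1.0
    return len(returned & entailed) / len(entailed)


def soundness(returned, entailed):
    """Assumed definition: fraction of the returned answers that are entailed."""
    returned, entailed = set(returned), set(entailed)
    if not returned:
        return 1.0
    return len(returned & entailed) / len(returned)
```

For example, a system that returns two of four entailed answers has a completeness of 0.5 for that query, regardless of how many times each answer is repeated in the result.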
LUBM Benchmark • Benchmark Architecture – • The test module requests operations on the repository (open/close), launches the loading process, issues queries and obtains results through the interface as shown. • Target systems and test queries are defined in the KBS specification and query definition files. • Queries are translated into the target system's query language before they are issued, so that translation does not add to the query response time. • Translated queries are fed to the tester through the query definition file. • The tester reads each line from the query definition file and passes it to the system.
LUBM Summary • Summary- • In the LUBM, the Univ-Bench ontology models the university domain in the OWL language and offers necessary features for evaluation purposes. • The OWL datasets are synthetically created over the ontology. • The data generated are random and repeatable, and can scale to an arbitrary size. • Fourteen test queries are chosen to represent a variety of properties including input size, selectivity, complexity, assumed hierarchy information, assumed logical inference, amongst others. • A set of performance metrics are provided, which include load time and repository size, query response time, query completeness and soundness, and a combined metric for evaluating the overall query performance. • The LUBM is intended to be used to evaluate Semantic Web KBSs with respect to extensional queries over a large dataset that commits to a single realistic ontology. • Conclusion - • The LUBM is not meant to be an overall Semantic Web KBS benchmark. • It is a benchmark limited to a particular domain represented by the ontology it uses.
SP2B Introduction • SP2B - A SPARQL Performance Benchmark • Need – • Recently, the SPARQL query language for RDF has reached W3C recommendation status. • In response to this emerging standard, the database community is currently exploring efficient storage techniques for RDF data and evaluation strategies for SPARQL queries. • A meaningful analysis and comparison of these approaches necessitates a comprehensive and universal benchmark platform. • The Lehigh University Benchmark (LUBM) was designed with a focus on the inference and reasoning capabilities of RDF engines. However, the SPARQL specification disregards the semantics of RDF and RDFS, i.e. it does not involve automated reasoning on top of RDFS constructs such as subclass and subproperty relations. • In this regard, LUBM does not constitute an adequate scenario for SPARQL performance evaluation. This is underlined by the fact that central SPARQL operators, such as UNION and OPTIONAL, are not addressed in LUBM.
SP2B Introduction • SP2B Overview – • A language-specific benchmark framework specifically designed to test the most common SPARQL constructs, operator constellations, and a broad range of RDF data access patterns. • SP2Bench aims at a comprehensive performance evaluation, rather than assessing the behavior of engines in an application-driven scenario. • It allows one to assess the generality of optimization approaches and to compare them in a universal, application-independent setting.
SP2B – SPARQL Performance Benchmark • Benchmarking – • The Barton Library benchmark [19] queries implement a user browsing session through the RDF Barton online catalog. • By design, the benchmark is application-oriented. • All queries are encoded in SQL, assuming that the RDF data is stored in a relational DB. • Due to missing language support for aggregation, most queries cannot be translated into SPARQL. • On the other hand, central SPARQL features like left outer joins (the relational equivalent of the SPARQL operator OPTIONAL) and solution modifiers are missing. • In summary, the benchmark offers only limited support for testing native SPARQL engines. • Benchmark Queries
SP2B – SPARQL Performance Benchmark • Design Principles – • Relevant - the benchmark should test typical operations within its specific domain. • This means the benchmark should not focus on correctness verification, but on common operator constellations that impose particular challenges. • Portable - i.e. it should be executable on different platforms. • Scalable - e.g. it should be possible to run the benchmark on both small and very large data sets. • The data generator is deterministic, platform-independent, and accurate w.r.t. the desired size of generated documents. • Moreover, it is very efficient and gets by with a constant amount of main memory, and hence supports the generation of arbitrarily large RDF documents. • Understandable - • It is important to keep queries simple and understandable. At the same time, they should leave room for diverse optimizations. In this regard, the queries are designed in such a way that they are amenable to a wide range of optimization strategies.
SP2B • DBLP - • Digital Bibliography & Library Project • DBLP is a computer science bibliography website hosted at Universität Trier, in Germany. • It was originally a database and logic programming bibliography site, and has existed at least since the 1980s. • DBLP listed more than one million articles on computer science in March 2008. Journals tracked on this site include VLDB, a journal for very large databases, the IEEE Transactions and the ACM Transactions. Conference proceedings papers are also tracked. It is mirrored at five sites across the Internet. • For his work on maintaining DBLP, Michael Ley received an award from the Association for Computing Machinery and the VLDB Endowment Special Recognition Award in 1997. • DBLP originally stood for DataBase systems and Logic Programming but is now taken to stand for Digital Bibliography & Library Project.
SP2B • DBLP RDF Scheme in SP2B – • XML-to-RDF mapping of the original DBLP data set. • However, as we want to generate arbitrarily-sized documents, we provide lists of first and last names, publishers and random words to our data generator. • Conference and journal names are always of the form “Conference $i ($year)” and “Journal $i ($year)”, where $i is a unique conference (respectively journal) number in the year $year. • Borrowed vocabulary from FOAF, SWRC, and Dublin Core (DC) to describe persons and scientific resources. • Additionally, we introduce a namespace bench, which defines DBLP-specific document classes, such as bench:Book and bench:Article.
SP2B Data Generator – Data generation is incremental, i.e. small documents are always contained in larger documents. The generator is implemented in C++ and offers two parameters, to fix either a triple count limit or the year up to which data will be generated. When the triple count limit is set, we make sure to end up in a “consistent” state, e.g. when proceedings are written, their conference will also be included. All random functions (which, for example, are used to assign the attributes according to Table I) are based on a fixed seed, which makes data generation deterministic. Moreover, the implementation is platform-independent, so we ensure that experimental results from different machines are comparable.
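The three properties described here (fixed seed, incremental output, stopping in a consistent state) can be illustrated with a small sketch. It is written in Python for brevity and is not the actual C++ generator: the start year, the number of articles per journal, the resource identifiers and the predicate names are assumptions made for the example.

```python
import random


def generate(triple_limit, seed=0):
    """Toy generator: emits whole journal units until the next one would exceed the limit."""
    rng = random.Random(seed)  # fixed seed, so every run sees the same random stream
    triples = []
    year = 1940                # hypothetical start year
    while True:
        journal = f"Journal 1 ({year})"
        # One journal plus its articles forms a unit that is written as a whole,
        # so the output never contains an article without its journal.
        unit = [(journal, "rdf:type", "bench:Journal")]
        for a in range(rng.randint(1, 3)):
            article = f"article/{year}/{a}"  # hypothetical identifier scheme
            unit.append((article, "rdf:type", "bench:Article"))
            unit.append((article, "swrc:journal", journal))
        if len(triples) + len(unit) > triple_limit:
            return triples     # stop in a consistent state, possibly below the limit
        triples.extend(unit)
        year += 1


small = generate(20)
large = generate(50)
```

Because both calls consume the same random stream in the same order, `small` is a prefix of `large`, which mirrors the statement that small documents are always contained in larger ones.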
SP2B – Summary • We have presented the SP2Bench performance benchmark for SPARQL, which constitutes the first methodical approach for testing the performance of SPARQL engines w.r.t. different operator constellations, RDF access paths, typical RDF constructs, and a variety of possible optimization approaches. • Our data generator relies on a deep study of DBLP. • Although it is not possible to mirror all correlations found in the original DBLP data, many aspects are modeled in faithful detail and the queries are designed in such a way that they build on exactly those aspects, which makes them realistic, understandable, and predictable.
Barton Data Set - Introduction • What is Barton Data Set – • The dataset we work with is taken from the publicly available Barton Libraries dataset. • This data is provided by the Simile Project, which develops tools for library data management and interoperability. • The data contains records acquired from an RDF-formatted dump of the MIT Libraries Barton catalog, converted from raw data stored in an old library format standard called MARC (Machine Readable Catalog). • Because of the multiple sources the data was derived from and the diverse nature of the data that is cataloged, the structure of the data is quite irregular. • At the time of publication of this report, there are slightly more than 50 million triples in the dataset, with a total of 221 unique properties, of which the vast majority appear infrequently. • Of these properties, 82 (37%) are multi-valued, meaning that they appear more than once for a given subject; however, these properties appear more often (77% of the triples have a multi-valued property).
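The two statistics quoted for multi-valued properties can be computed as in the sketch below. The function names and the four sample triples are hypothetical; the definition of "multi-valued" follows the slide (a property that appears more than once for some subject).

```python
from collections import Counter


def multi_valued_properties(triples):
    """Properties that occur more than once for at least one subject."""
    counts = Counter((s, p) for s, p, _ in triples)
    return {p for (_, p), n in counts.items() if n > 1}


def share_of_triples_with(properties, triples):
    """Fraction of all triples whose property belongs to the given set."""
    return sum(1 for _, p, _ in triples if p in properties) / len(triples)


# Tiny hypothetical catalog: book1 has two authors, so "author" is multi-valued.
sample = [
    ("book1", "author", "A. Writer"),
    ("book1", "author", "B. Writer"),
    ("book1", "title", "Some Title"),
    ("book2", "author", "C. Writer"),
]
multi = multi_valued_properties(sample)
share = share_of_triples_with(multi, sample)
```

In this sample only one of two properties is multi-valued, yet it accounts for three of the four triples, the same kind of skew the slide reports for the Barton data.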
Barton Data Set as RDF Benchmark • Barton Dataset Use – • As RDF Benchmark • This data set can be converted to RDF/XML or triple formats using purpose-built tools and then used as a performance benchmark for KBSs. • Example Use – • Scalable Semantic Web Data Management Using Vertical Partitioning - • The Barton data set was converted to triples and used as a performance benchmark to evaluate vertical partitioning of Semantic Web data.
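Vertical partitioning stores one two-column (subject, object) table per property instead of a single three-column triples table. A minimal in-memory sketch of that layout follows; the function name and the sample triples are hypothetical, and a real system would keep these tables in a relational or column-oriented store.

```python
from collections import defaultdict


def vertically_partition(triples):
    """Split a list of (subject, property, object) triples into one table per property."""
    tables = defaultdict(list)
    for s, p, o in triples:
        tables[p].append((s, o))  # the property name becomes the table name
    for rows in tables.values():
        rows.sort()               # keep each table ordered by subject
    return dict(tables)


# Hypothetical sample triples.
sample = [
    ("book2", "title", "Second Title"),
    ("book1", "author", "A. Writer"),
    ("book1", "title", "First Title"),
]
tables = vertically_partition(sample)
```

A query that touches only titles now reads `tables["title"]` and never scans rows belonging to other properties, which is the intuition behind using this layout for property-heavy data such as Barton.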
Barton Dataset • Summary – • The Barton data set is a huge data set from the MIT Libraries catalog which can be used as a performance benchmark for Semantic Web data systems. • The dataset provides a good demonstration of the relatively unstructured nature of Semantic Web data.
Billion Triple Challenge - Introduction • Introduction - • What is billion triple challenge? • Peter Mika (Yahoo!) and Jim Hendler (RPI) have initiated the Billion Triples Challenge at the 7th Int. Semantic Web Conference. • They constructed the challenge of managing a huge amount of over one billion ill-structured facts harvested from public sources such as Wikipedia and semantic home pages and making this information and its relationships available for easy access and intuitive interaction by the lay user.
Billion Triple Challenge – Solutions • Billion Triple Challenge - problem • Managing a huge amount of over one billion ill-structured facts harvested from public sources such as Wikipedia and semantic home pages and making this information and its relationships available for easy access and intuitive interaction by the lay user. • Billion Triple Challenge – solution • General solution overview/requirements – • Huge Data and Less Memory • Efficient data store – no redundancies • Efficient access – easy and fast
Billion Triple Challenge • Semantics Scales in the Cloud: • University of Koblenz wins the challenge. • SemaPlorer - an application – • SemaPlorer is an easy-to-use application that allows end users to interactively explore and visualize a very large, mixed-quality and semantically heterogeneous distributed semantic data set in real time. • Its purpose is to let users acquaint themselves with a city, tourist area, or other area of interest. • By visualizing the data using a map, media, and different context views, we clearly go beyond simple storage and retrieval of large numbers of triples. • The interaction with the large data set is driven by the user. • SemaPlorer leverages different semantic data sources such as DBpedia, GeoNames, WordNet, and personal FOAF files. These make up a significant portion of the data provided for the billion triple challenge. • More info @ http://www.uni-koblenz-landau.de/koblenz/fb4/institute/IFI/AGStaab/Research/systeme/semap
Summary • Test sets – • Huge data sets, in RDF/triple or any other format, that a KBS acting as an RDF store can use to check its performance while loading and querying/accessing such data. • LUBM and SP2B are more like benchmarking standards • Barton Data Set - a huge library data set • Billion Triple Challenge – a challenge to design and develop efficient KBSs that can handle a billion triples.