LORE (Lightweight Object REpository) by Othman Chhoul, CSC5370, Fall 2003
Outline • Introduction • What is Lore? • History • Lore’s Forensic • Conclusion • Questions • Demo
Introduction • Limitations faced by traditional databases: • They force all data to adhere to an explicitly specified schema • Data elements may change • Structures may change along the execution path of an application • Deciding on a fixed schema for irregular or unstable data is a headache
Semistructured Data • Semistructured data is widespread: • “Self-describing” • “Schemaless” • Examples: • Data from the web • Overall site structure may change often • It would be nice to be able to query a web site • Data integrated from multiple, heterogeneous data sources • Information sources change, or new sources are added
What is Lore? • Lore is a DBMS designed specifically for managing semistructured information, such as XML • One of the pioneers in this domain
History • Built from scratch by the DB Group at Stanford University, with research funding from DARPA, NASA, and others • Introduced in 1995, with the first version of its query language, Lorel, and with OEM as its data model • A lightweight system, because it was designed for single-user, read-only access • 1999: extended to support XML
Lore’s Forensic • Lore’s Data model • Lore’s Query Language • Lore’s General Architecture • When XML gets into action
OEM (Object Exchange Model) • A simple, self-describing, nested object model for semistructured data (similar in spirit to XML) • Data in this model can be thought of as a labeled directed graph • Vertices in the graph are objects • Each object has a unique object identifier (oid), such as &5 • Atomic objects have no outgoing edges and have types such as int, real, string, gif, etc. • All other objects, which have outgoing edges, are called complex objects
OEM (Summary) • An OEM object has: • Label: a character string, object aliases • OID: a unique object identifier • Type: atomic (int, real, string, …) or complex • Value: for a complex object, a list of OIDs; for an atomic object, an atomic value of type int, real, string, …
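A minimal Python sketch of an OEM object store, assuming labels sit on the edges as in the graph view of the previous slide (a complex value is a list of (label, oid) pairs) and that the type is derived from the stored Python value. The class name `OEMObject`, the oids, and the data are illustrative, not Lore's internals.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

Atomic = Union[int, float, str]          # int, real, string (gif etc. omitted here)
Edges = List[Tuple[str, str]]            # outgoing (label, child oid) pairs

@dataclass
class OEMObject:
    oid: str                             # unique object identifier, e.g. "&5"
    value: Union[Atomic, Edges]          # atomic value, or list of labeled edges

    @property
    def is_complex(self) -> bool:
        # Complex objects have outgoing edges; atomic objects have none.
        return isinstance(self.value, list)

    @property
    def oem_type(self) -> str:
        if self.is_complex:
            return "complex"
        return {int: "int", float: "real", str: "string"}[type(self.value)]

# Hypothetical data: DBGroup (&1) has one Member (&2) with a Name, an Age and an Office.
db = {
    "&1": OEMObject("&1", [("Member", "&2")]),
    "&2": OEMObject("&2", [("Name", "&3"), ("Age", "&4"), ("Office", "&5")]),
    "&3": OEMObject("&3", "Smith"),
    "&4": OEMObject("&4", 35),
    "&5": OEMObject("&5", "Gates252"),
}

for obj in db.values():
    print(obj.oid, obj.oem_type, obj.value)
```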
Lorel (Lore’s Query Language) • Lorel is an extension of OQL • Lorel supports path expressions for traversing graph data • A simple path expression is a name followed by a sequence of labels • DBGroup.Member.Office: the set of objects that can be reached starting from the DBGroup object and following edges labeled Member and then Office
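The evaluation of a simple path expression such as DBGroup.Member.Office can be sketched as a label-by-label walk over the graph. This is a toy evaluator under stated assumptions: the database dict, the `NAMES` entry-point table, and `eval_path` are hypothetical, not Lore's engine.

```python
from typing import Dict, List, Set, Tuple, Union

# Hypothetical toy database: oid -> atomic value or list of (label, child oid) edges.
DB: Dict[str, Union[str, List[Tuple[str, str]]]] = {
    "&1": [("Member", "&2"), ("Member", "&3")],
    "&2": [("Name", "&4"), ("Office", "&5")],
    "&3": [("Name", "&6"), ("Office", "&7")],
    "&4": "Smith", "&5": "Gates252",
    "&6": "Jones", "&7": [("Building", "&8"), ("Room", "&9")],
    "&8": "CIS", "&9": "411",
}
NAMES = {"DBGroup": "&1"}  # named entry points into the graph

def eval_path(path: str) -> Set[str]:
    """Evaluate a simple path expression like 'DBGroup.Member.Office'."""
    name, *labels = path.split(".")
    current = {NAMES[name]}
    for label in labels:
        nxt: Set[str] = set()
        for oid in current:
            value = DB[oid]
            if isinstance(value, list):      # only complex objects have outgoing edges
                nxt.update(child for lbl, child in value if lbl == label)
        current = nxt
    return current

print(sorted(eval_path("DBGroup.Member.Office")))
```

Note that the result mixes an atomic Office (&5) and a complex one (&7); nothing in the walk requires the objects reached by a path to share a type.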
Lorel • Range variables can be assigned to path expressions • Path expressions are used directly in queries, in SQL style: select DBGroup.Member.Office where DBGroup.Member.Age > 30
Lorel • Result: • Office “Gates252” • Office (complex): Building “CIS”, Room “411”
Lorel (Behind the scenes) • The previous query is rewritten in OQL style: • select O from DBGroup.Member M, M.Office O where exists y in M.Age : y > 30 • The comparison on Age is transformed into an existential condition: • A user can ask DBGroup.Member.Age < 30 regardless of whether Age is single-valued, set-valued, or unknown
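The existential rewrite can be sketched in a few lines: treat whatever M.Age reaches as a set and ask whether any element satisfies the predicate. A minimal sketch, assuming each member is a hypothetical dict from label to the list of values reachable via that label; `exists` and `offices_where_age_gt` are illustrative names, not Lorel's API.

```python
from typing import Callable, Dict, Iterable, List

def exists(values: Iterable[object], pred: Callable[[object], bool]) -> bool:
    """'exists y in M.Age : pred(y)': true if ANY reachable value qualifies.
    A missing Age is just an empty set, so the test is False rather than an error."""
    return any(pred(v) for v in values)

# Hypothetical members: each label maps to the list of values reachable via that label.
members: List[Dict[str, List[object]]] = [
    {"Name": ["Smith"], "Age": [35], "Office": ["Gates252"]},    # single-valued Age
    {"Name": ["Jones"], "Age": [28, 41], "Office": ["CIS411"]},  # set-valued Age
    {"Name": ["Clark"], "Office": ["Gates250"]},                 # Age unknown
]

def offices_where_age_gt(limit: float) -> List[object]:
    # select O from DBGroup.Member M, M.Office O where exists y in M.Age : y > limit
    result: List[object] = []
    for m in members:
        if exists(m.get("Age", []), lambda y: y > limit):
            result.extend(m.get("Office", []))
    return result

print(offices_where_age_gt(30))
```

The same query text works for all three members, which is the point of the rewrite: the user never has to know whether Age is single-valued, set-valued, or absent.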
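Lorel (More examples) • select DBGroup.Member.Name where DBGroup.Member.Office(.Room%)? like “%252” • The optional step (.Room%)? matches either the Office object itself or a subobject whose label starts with Room • Result: Name “Jones”, Name “Smith” • Update (adds Clark as a Member of the Lore and Tsimmis projects): update P.Member += (select DBGroup.Member where DBGroup.Member.Name = "Clark") from DBGroup.Project P where P.Title = "Lore" or P.Title = "Tsimmis"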
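A toy evaluator for the first query above, assuming Lorel's reading of `%` as a wildcard inside a label and `?` as "zero or one occurrence" of a path step, and applying `like` only to atomic string values. The data and the helpers `like`, `step`, and `names_with_office_like` are hypothetical.

```python
import re
from typing import List, Set

# Hypothetical toy database: oid -> atomic string or list of (label, child oid) edges.
DB = {
    "&1": [("Member", "&2"), ("Member", "&3"), ("Member", "&4")],
    "&2": [("Name", "&5"), ("Office", "&6")],
    "&3": [("Name", "&7"), ("Office", "&8")],
    "&4": [("Name", "&11"), ("Office", "&12")],
    "&5": "Smith", "&6": "Gates252",                              # atomic Office
    "&7": "Jones", "&8": [("Building", "&9"), ("Room", "&10")],   # complex Office
    "&9": "CIS", "&10": "252",
    "&11": "Clark", "&12": "Gates411",
}

def like(value: str, pattern: str) -> bool:
    """SQL-style LIKE: % matches any run of characters, _ matches one character."""
    regex = "".join(".*" if c == "%" else "." if c == "_" else re.escape(c) for c in pattern)
    return re.fullmatch(regex, value, re.DOTALL) is not None

def step(oids: Set[str], label_pattern: str) -> Set[str]:
    """Follow one edge whose label matches label_pattern (a % wildcard is allowed)."""
    out: Set[str] = set()
    for oid in oids:
        value = DB[oid]
        if isinstance(value, list):
            out.update(child for label, child in value if like(label, label_pattern))
    return out

def names_with_office_like(pattern: str) -> List[str]:
    # select DBGroup.Member.Name where DBGroup.Member.Office(.Room%)? like pattern
    names: List[str] = []
    for member in step({"&1"}, "Member"):
        offices = step({member}, "Office")
        candidates = offices | step(offices, "Room%")   # (.Room%)? : zero or one step
        if any(isinstance(DB[o], str) and like(DB[o], pattern) for o in candidates):
            names.extend(DB[n] for n in step({member}, "Name"))
    return sorted(names)

print(names_with_office_like("%252"))
```

Smith matches through the atomic Office directly, Jones only through the optional Room step, which is why the query needs the `?`.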
Lore’s General Architecture • Query and Update Processing • External Data • DataGuides
Query and Update Processing • [Diagram: Queries → Data Engine → Results (a set of OEM objects)]
Query Plan Generator • select O from DBGroup.Member M, M.Office O where exists y in M.Age : y > 30
Query Iterators • Lore uses a recursive iterator approach: • Execution begins at the top of the query plan • Each node in the plan requests one tuple at a time from its children and performs some operation on the tuple(s) • Result tuples are passed up to the parent
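Python generators give the same pull-based behaviour, so the iterator model can be sketched with them. This is an illustration of the technique, not Lore's code: the rows, the `log` list, and the operator functions are hypothetical. The log shows that the leaf produces a tuple only when the top of the plan asks for one.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]

def scan(rows: Iterable[Row], log: List[str]) -> Iterator[Row]:
    # Leaf: produces a tuple only when its parent asks for the next one.
    for row in rows:
        log.append(str(row["Name"]))     # record when a tuple is actually produced
        yield row

def select(child: Iterator[Row], pred: Callable[[Row], bool]) -> Iterator[Row]:
    for row in child:                    # request one tuple at a time from the child
        if pred(row):
            yield row                    # pass qualifying tuples up to the parent

def project(child: Iterator[Row], slot: str) -> Iterator[object]:
    for row in child:
        yield row[slot]

rows = [
    {"Name": "Smith", "Age": 35, "Office": "Gates252"},
    {"Name": "Jones", "Age": 28, "Office": "CIS411"},
    {"Name": "Clark", "Age": 41, "Office": "Gates250"},
]
pulled: List[str] = []
plan = project(select(scan(rows, pulled), lambda r: r["Age"] > 30), "Office")

first = next(plan)        # execution starts at the top of the plan
print(first, pulled)      # only Smith has been produced so far
```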
Tuples (Object Assignments) • An OA is a data structure containing slots for range variables, with additional slots depending on the query • Each slot within an OA holds the oid of a vertex on a path being considered by the query engine • At the end of query execution, we should end up with complete OAs
Query Operators • The Scan operator returns all oids that are sub-objects of a given object, following a specified path expression: • Scan(StartingOASlot, Path_expression, TargetOASlot) • For the oid in StartingOASlot, Scan follows Path_expression and places the oid of each object reached into TargetOASlot • For each OA returned by its left child, the Join operator calls its right child exhaustively, until no more OAs are returned
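A minimal sketch of OAs flowing through Scan and a nested-loop Join for `select O from DBGroup.Member M, M.Office O`. Assumptions: an OA is a plain dict from slot name to oid, each Scan follows a single label rather than a full path expression, and the data and function names are hypothetical.

```python
from typing import Callable, Dict, Iterator, List, Tuple, Union

# Hypothetical toy database: oid -> atomic value or list of (label, child oid) edges.
DB: Dict[str, Union[str, List[Tuple[str, str]]]] = {
    "&1": [("Member", "&2"), ("Member", "&3")],
    "&2": [("Office", "&4")],
    "&3": [("Office", "&5"), ("Office", "&6")],
    "&4": "Gates252", "&5": "CIS411", "&6": "Gates250",
}

OA = Dict[str, str]        # Object Assignment: slot name -> oid

def scan(oa: OA, start_slot: str, label: str, target_slot: str) -> Iterator[OA]:
    """Scan(StartingOASlot, label, TargetOASlot), with a one-label path for brevity."""
    value = DB[oa[start_slot]]
    if isinstance(value, list):
        for edge_label, child in value:
            if edge_label == label:
                yield {**oa, target_slot: child}      # copy the OA and fill the target slot

def join(left: Iterator[OA], right: Callable[[OA], Iterator[OA]]) -> Iterator[OA]:
    """Nested-loop Join: for each OA from the left child, exhaust the right child."""
    for oa in left:
        yield from right(oa)

# Plan for: select O from DBGroup.Member M, M.Office O
start: OA = {"DBGroup": "&1"}
plan = join(scan(start, "DBGroup", "Member", "M"),
            lambda oa: scan(oa, "M", "Office", "O"))
complete = list(plan)
for oa in complete:
    print(oa)
```

Each printed OA has every slot filled, which is what "complete OAs" means at the end of the query.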
Query Operators (cont) • The aggregation operator (Aggr) adds the result of the aggregation to the target slot • The Join, Project, and Select operators are almost identical to their relational counterparts • Other operators: CreateSet, GroupBy, ArithOp
Query Optimizer • Performs only a few optimizations: • Pushes selection operators down the query tree • Eliminates or combines redundant query operators • Explores query plans that use indexes when possible • Two kinds of indexes: • Lindex (link index): returns all parent OIDs of a given OID via a given label; implemented with hashing • Vindex (value index): returns all atomic objects with a given label whose values satisfy a condition; implemented as B+-trees
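A sketch of both index kinds over a toy database. The Lindex is a Python dict keyed by (child oid, label), matching "hashing" above; for the Vindex, a sorted list searched with the standard `bisect` module is swapped in for the B+-tree, and only numeric values are indexed. All names and data are hypothetical.

```python
import bisect
from collections import defaultdict
from typing import DefaultDict, Dict, List, Set, Tuple, Union

# Hypothetical toy database: oid -> atomic value or list of (label, child oid) edges.
DB: Dict[str, Union[int, List[Tuple[str, str]]]] = {
    "&1": [("Member", "&2"), ("Member", "&3"), ("Member", "&4")],
    "&2": [("Age", "&5")], "&3": [("Age", "&6")], "&4": [("Age", "&7")],
    "&5": 35, "&6": 28, "&7": 41,
}

def build_lindex(db) -> DefaultDict[Tuple[str, str], Set[str]]:
    """Lindex: (child oid, label) -> parent oids, kept in a hash table (dict)."""
    lindex: DefaultDict[Tuple[str, str], Set[str]] = defaultdict(set)
    for parent, value in db.items():
        if isinstance(value, list):
            for label, child in value:
                lindex[(child, label)].add(parent)
    return lindex

class Vindex:
    """Value index for one label. A sorted list searched with bisect stands in
    for the B+-tree; only numeric atomic values are indexed in this sketch."""
    def __init__(self, db, label: str):
        entries = sorted(
            (db[child], child)
            for value in db.values() if isinstance(value, list)
            for edge_label, child in value
            if edge_label == label and isinstance(db[child], (int, float)))
        self.keys = [k for k, _ in entries]
        self.oids = [oid for _, oid in entries]

    def greater_than(self, bound: float) -> List[str]:
        start = bisect.bisect_right(self.keys, bound)   # first key strictly > bound
        return self.oids[start:]

lindex = build_lindex(DB)
age_index = Vindex(DB, "Age")
old = age_index.greater_than(30)                         # atomic Age objects > 30
parents = sorted(p for oid in old for p in lindex[(oid, "Age")])
print(old, parents)
```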
Vindexes • Because of Lore's non-strict typing system, there are a String Vindex, a Real Vindex, and a String-coerced-to-real Vindex • Separate B+-trees of each type are constructed for each label • Using a Vindex for a comparison: • If the comparison value is a string, do a lookup in the String Vindex • If it can be converted to a real, then also do a lookup in the String-coerced-to-real Vindex • If the value is a real or int, do almost the same thing
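The string case of the lookup rules above can be sketched with three sorted lists standing in for the B+-trees, restricted to equality lookups for brevity. The real/int branch is an assumption: the slide only says "almost the same thing", and this sketch reads that as the Real Vindex plus the String-coerced-to-real Vindex. `TypedVindexes`, `to_real`, and the entries are hypothetical.

```python
import bisect
from typing import List, Optional, Set, Tuple, Union

Value = Union[int, float, str]

def to_real(s: str) -> Optional[float]:
    """Return s as a real if it can be coerced (e.g. "30" -> 30.0), else None."""
    try:
        return float(s)
    except ValueError:
        return None

def _equal(index: List[Tuple], key) -> Set[str]:
    # Binary search for all entries whose key equals `key`.
    keys = [k for k, _ in index]
    lo, hi = bisect.bisect_left(keys, key), bisect.bisect_right(keys, key)
    return {oid for _, oid in index[lo:hi]}

class TypedVindexes:
    """The three Vindexes for one label, as sorted lists (B+-tree stand-ins)."""
    def __init__(self, entries: List[Tuple[Value, str]]):
        self.string = sorted((v, o) for v, o in entries if isinstance(v, str))
        self.real = sorted((float(v), o) for v, o in entries if not isinstance(v, str))
        self.coerced = []
        for v, o in entries:
            r = to_real(v) if isinstance(v, str) else None
            if r is not None:                          # only coercible strings
                self.coerced.append((r, o))
        self.coerced.sort()

    def lookup_equal(self, value: Value) -> Set[str]:
        if isinstance(value, str):
            hits = _equal(self.string, value)          # String Vindex
            real = to_real(value)
            if real is not None:                       # also String-coerced-to-real
                hits |= _equal(self.coerced, real)
            return hits
        # Assumption: for int/real values, "almost the same thing" is taken to mean
        # the Real Vindex plus the String-coerced-to-real Vindex.
        return _equal(self.real, float(value)) | _equal(self.coerced, float(value))

idx = TypedVindexes([("30", "&1"), ("30.0", "&2"), (30, "&3"), ("thirty", "&4"), (31.5, "&5")])
print(sorted(idx.lookup_equal("30")), sorted(idx.lookup_equal(30)))
```

The coerced index is what lets the string "30.0" match a lookup for "30", which a plain String Vindex would miss.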
Index Query Plans • If the user’s query contains a comparison between a path expression and a value, and the appropriate Vindex and Lindex exist, the optimizer generates an index query plan • Previous query: select O from DBGroup.Member M, M.Office O where exists y in M.Age : y > 30
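One plausible shape of an index query plan for the previous query, evaluated bottom-up: start from the Age values in the Vindex, climb to the owning Members with the Lindex, check that they hang off DBGroup, and only then step forward to Office. The prebuilt index contents and the `index_plan` name are hypothetical; Lore's real plan operators differ.

```python
import bisect
from typing import Dict, List, Set, Tuple

# Hypothetical prebuilt indexes over a tiny database (DBGroup is &1, Members &2-&4).
AGE_VINDEX: List[Tuple[int, str]] = [(28, "&6"), (35, "&5"), (41, "&7")]  # sorted by value
LINDEX: Dict[Tuple[str, str], Set[str]] = {                               # (child, label) -> parents
    ("&5", "Age"): {"&2"}, ("&6", "Age"): {"&3"}, ("&7", "Age"): {"&4"},
    ("&2", "Member"): {"&1"}, ("&3", "Member"): {"&1"}, ("&4", "Member"): {"&1"},
}
EDGES: Dict[str, List[Tuple[str, str]]] = {
    "&2": [("Office", "&8")], "&3": [("Office", "&9")], "&4": [("Office", "&10")],
}
DBGROUP = "&1"

def index_plan(bound: float) -> List[str]:
    """select O from DBGroup.Member M, M.Office O where exists y in M.Age : y > bound,
    evaluated bottom-up instead of scanning every Member from DBGroup."""
    keys = [k for k, _ in AGE_VINDEX]
    start = bisect.bisect_right(keys, bound)                 # Vindex: Age values > bound
    members: Set[str] = set()
    for _, age_oid in AGE_VINDEX[start:]:
        for m in LINDEX.get((age_oid, "Age"), set()):        # Lindex: up the Age edge
            if DBGROUP in LINDEX.get((m, "Member"), set()):  # Lindex: up the Member edge
                members.add(m)                               # a set, so "exists" holds once
    # Forward step: M.Office O for each qualifying member.
    return [o for m in sorted(members) for label, o in EDGES.get(m, []) if label == "Office"]

print(index_plan(30))
```

Such a plan pays off when few Age values satisfy the condition, since the Members that fail it are never visited.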
Update Query Plans • update P.Member += (select DBGroup.Member where DBGroup.Member.Name = "Clark") from DBGroup.Project P where P.Title = "Lore" or P.Title = "Tsimmis"
External Data • Enables retrieval of information from other data sources, transparently to the user • An external object in Lore is a “placeholder” for the external data and specifies how Lore interacts with the external data source
External Data • During query processing, the Scan operator notifies the external data manager whenever an external object is encountered • The specification of an external object includes: • The location of a wrapper program that fetches the data and converts it to OEM • A timeout interval • A set of arguments used to limit the information fetched from the external source
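A sketch of an external-object placeholder and of what an external data manager might do when Scan reaches one, using the standard `concurrent.futures` module to enforce the timeout. Assumptions: the wrapper "location" is modelled as a Python callable, a timeout is treated as "no data", and `ExternalObject`, `fetch`, and both wrappers are hypothetical.

```python
import concurrent.futures
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple

OEMPairs = List[Tuple[str, Any]]     # (label, atomic value) pairs returned by a wrapper

@dataclass
class ExternalObject:
    """Placeholder for external data; field names are illustrative, not Lore's."""
    wrapper: Callable[..., OEMPairs]   # stands in for the wrapper program's location
    timeout: float                     # seconds to wait for the external source
    args: Dict[str, Any] = field(default_factory=dict)  # limit what is fetched

def fetch(ext: ExternalObject) -> OEMPairs:
    """Roughly what an external data manager could do when Scan reaches `ext`."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(ext.wrapper, **ext.args)
    try:
        return future.result(timeout=ext.timeout)
    except concurrent.futures.TimeoutError:
        return []                      # assumption: a timeout yields no data
    finally:
        pool.shutdown(wait=False)      # do not block the query on a slow source

# Hypothetical wrappers standing in for external sources.
def weather_wrapper(city: str) -> OEMPairs:
    return [("City", city), ("Forecast", "sunny")]

def slow_wrapper() -> OEMPairs:
    time.sleep(0.5)
    return [("Late", True)]

print(fetch(ExternalObject(weather_wrapper, timeout=1.0, args={"city": "Stanford"})))
print(fetch(ExternalObject(slow_wrapper, timeout=0.05)))
```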
DataGuides • A DataGuide is a concise and accurate summary of the structure of an OEM database (stored as an OEM database itself, somewhat like a system catalog) • Very helpful: • With no explicit database schema, it is difficult to formulate meaningful queries • With no knowledge of the database structure, the query processor may perform unnecessary work • For example, evaluating a path expression that does not exist in the database is wasted effort • Each possible path expression is encoded exactly once
DataGuides (cont) • DataGuides are dynamically generated and maintained over an existing database • Statistics can be stored in the DataGuide, for example the number of atomic objects of each type reachable by a path p
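A DataGuide can be sketched with a subset construction: each DataGuide node stands for the set of database objects reachable by the label paths leading to it, so each label path appears once and paths with identical target sets share a node. This is a simplified sketch over a hypothetical acyclic toy database; `build_dataguide`, `label_paths`, `lookup`, and `type_counts` are illustrative names, not Lore's.

```python
from collections import Counter, deque
from typing import Dict, FrozenSet, Iterator, Optional, Tuple

# Hypothetical toy database: oid -> atomic value or list of (label, child oid) edges.
DB = {
    "&1": [("Member", "&2"), ("Member", "&3"), ("Project", "&4")],
    "&2": [("Name", "&5"), ("Age", "&6")],
    "&3": [("Name", "&7"), ("Office", "&8")],
    "&4": [("Title", "&9"), ("Member", "&2")],
    "&5": "Smith", "&6": 35, "&7": "Jones", "&8": "Gates252", "&9": "Lore",
}

def build_dataguide(db, root: str):
    """Subset construction: one DataGuide node per distinct target set."""
    start = frozenset([root])
    nodes: Dict[FrozenSet[str], int] = {start: 0}     # target set -> node id
    edges: Dict[int, Dict[str, int]] = {0: {}}        # node id -> label -> node id
    queue = deque([start])
    while queue:
        targets = queue.popleft()
        by_label: Dict[str, set] = {}
        for oid in targets:
            if isinstance(db[oid], list):
                for label, child in db[oid]:
                    by_label.setdefault(label, set()).add(child)
        for label, children in sorted(by_label.items()):
            key = frozenset(children)
            if key not in nodes:                       # new target set -> new node
                nodes[key] = len(nodes)
                edges[nodes[key]] = {}
                queue.append(key)
            edges[nodes[targets]][label] = nodes[key]
    return nodes, edges

def label_paths(edges, node: int = 0, prefix: Tuple[str, ...] = ()) -> Iterator[Tuple[str, ...]]:
    """Every label path, once each (assumes an acyclic DataGuide)."""
    for label, child in sorted(edges[node].items()):
        yield prefix + (label,)
        yield from label_paths(edges, child, prefix + (label,))

def lookup(edges, path: Tuple[str, ...]) -> Optional[int]:
    """Return the DataGuide node for `path`, or None if no object has that path."""
    node: Optional[int] = 0
    for label in path:
        node = edges[node].get(label)
        if node is None:
            return None            # the query can stop here: the result is empty
    return node

nodes, edges = build_dataguide(DB, "&1")
target_sets = {node_id: ts for ts, node_id in nodes.items()}

def type_counts(path: Tuple[str, ...]) -> Counter:
    """A per-path statistic: atomic objects of each type reachable by an existing path."""
    ts = target_sets[lookup(edges, path)]
    return Counter(type(DB[o]).__name__ for o in ts if not isinstance(DB[o], list))

for p in label_paths(edges):
    print(".".join(p))
```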
When XML gets into Action • A little reminder: • Lore was first proposed in 1995 • XML: the new standard for data representation and data exchange over the WWW • public class XML_data extends Semi_structured_data • Lore was among the pioneers in integrating XML into a DBMS architecture
From Semistructured Data to XML • Data Model • Query Language • DataGuides
Changes in The Data Model • Similar to an OEM object, an XML element in Lore is a pair <EID, VALUE> • EID: a unique element identifier • VALUE: either an atomic text string or a complex value containing: • A string value: the XML tag • An ordered list of attribute-name/atomic-value pairs • An ordered list of crosslink subelements of the form <label, EID>, reachable via IDREF or IDREFS • An ordered list of subelements of the form <label, EID>
Changes in The Data Model (cont) • XML comments are ignored • When an XML document is mapped into this new data model, it can be seen as a directed labeled graph
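A sketch of mapping an XML document into the <EID, VALUE> model above, using the standard `xml.etree.ElementTree` parser (which already drops comments). Assumptions: without a DTD, the ID and IDREF(S) attributes are declared by hand in `ID_ATTR` and `IDREF_ATTRS`; a text-only element becomes an atomic value; and text inside complex elements is ignored. `Element` and `load` are hypothetical names.

```python
import itertools
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Dict, List, Tuple, Union

@dataclass
class Element:
    """Complex VALUE of a Lore-style XML element (field names are illustrative)."""
    tag: str
    attributes: List[Tuple[str, str]] = field(default_factory=list)   # name/atomic value
    crosslinks: List[Tuple[str, str]] = field(default_factory=list)   # (label, EID) via IDREF(S)
    subelements: List[Tuple[str, str]] = field(default_factory=list)  # (label, EID)

# Assumption: without a DTD we simply declare which attributes are ID / IDREF(S).
ID_ATTR = "id"
IDREF_ATTRS = {"project"}

def load(xml_text: str) -> Dict[str, Union[Element, str]]:
    """Map an XML document to {EID: VALUE}, where VALUE is text or an Element."""
    store: Dict[str, Union[Element, str]] = {}
    eids = (f"&{n}" for n in itertools.count(1))
    id_to_eid: Dict[str, str] = {}
    refs: List[Tuple[Element, str, str]] = []         # resolved once all IDs are known

    def visit(node: ET.Element) -> str:
        eid = next(eids)
        if not node.attrib and len(node) == 0:         # text-only element: atomic value
            store[eid] = (node.text or "").strip()
            return eid
        elem = Element(node.tag)
        store[eid] = elem
        for name, value in node.attrib.items():
            if name == ID_ATTR:
                id_to_eid[value] = eid
            if name in IDREF_ATTRS:                    # IDREFS: space-separated IDs
                refs.extend((elem, name, ref) for ref in value.split())
            else:
                elem.attributes.append((name, value))
        for child in node:                             # comments were already dropped
            elem.subelements.append((child.tag, visit(child)))
        return eid

    visit(ET.fromstring(xml_text))
    for elem, label, ref in refs:
        elem.crosslinks.append((label, id_to_eid[ref]))
    return store

doc = """<DBGroup>
  <!-- comments are ignored -->
  <Member Name="Smith" project="p1"><Name>Smith</Name><Office>Gates252</Office></Member>
  <Project id="p1"><Title>Lore</Title></Project>
</DBGroup>"""

for eid, value in load(doc).items():
    print(eid, value)
```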
Query Language • Path expressions are extended with qualifiers to distinguish between subelements and attributes: • DBGroup.Member.>Name returns &6: the > qualifier explicitly specifies a subelement • DBGroup.Member.@Name returns “Smith”: the @ qualifier explicitly specifies an attribute • DBGroup.Member.Name returns &6 and “Smith”: when no @ or > qualifier is used, both attributes and subelements are matched
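The qualifier rules can be sketched as one path step that consults subelements, attributes, or both. A toy evaluator under stated assumptions: the `STORE` layout, the ("eid", …) and ("value", …) result tags, and the function names are hypothetical; the data mirrors the slide's example, with Member carrying both an attribute Name and a subelement Name (&6).

```python
from typing import Dict, List, Tuple, Union

# Hypothetical toy store mirroring the slide's example: Member &2 has an
# attribute Name="Smith" and a subelement Name (&6) whose text is "Smith".
STORE: Dict[str, Union[str, dict]] = {
    "&1": {"attributes": [], "subelements": [("Member", "&2")]},
    "&2": {"attributes": [("Name", "Smith")], "subelements": [("Name", "&6")]},
    "&6": "Smith",
}
ROOTS = {"DBGroup": "&1"}

Result = Tuple[str, str]   # ("eid", "&6") for elements, ("value", "Smith") for attributes

def step(results: List[Result], qualified: str) -> List[Result]:
    """One path step: '>' = subelements only, '@' = attributes only, none = both."""
    if qualified[0] in ">@":
        qualifier, label = qualified[0], qualified[1:]
    else:
        qualifier, label = "", qualified
    out: List[Result] = []
    for kind, item in results:
        elem = STORE[item] if kind == "eid" else None
        if not isinstance(elem, dict):
            continue                      # attribute values and text have no children
        if qualifier in ("", ">"):
            out += [("eid", e) for name, e in elem["subelements"] if name == label]
        if qualifier in ("", "@"):
            out += [("value", v) for name, v in elem["attributes"] if name == label]
    return out

def eval_path(path: str) -> List[Result]:
    root, *steps = path.split(".")
    results: List[Result] = [("eid", ROOTS[root])]
    for s in steps:
        results = step(results, s)
    return results

for p in ("DBGroup.Member.>Name", "DBGroup.Member.@Name", "DBGroup.Member.Name"):
    print(p, eval_path(p))
```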
DataGuides • If a DTD is provided, Lore builds the corresponding DataGuide from it • Otherwise, if no DTD is provided, a DataGuide is generated from the XML document itself • Problems when updating: • When a DTD is provided, validity is assured • With no DTD, the DataGuide is updated as the XML document is updated
Conclusion • Lore was originally developed for the OEM data model, starting in 1995; XML support was integrated later, in 1999 • Lore provided a clear and robust solution for storing, querying, and updating semistructured data (XML came after) • The Stanford Database Group declared the Lore project essentially out of business in 2000