330 likes | 495 Views
OEM and LORE Query Language. Sanjay Madria Department of Computer Science University of Missouri-Rolla madrias@umr.edu. Source : http://WWW-DB.Stanford.EDU/lore/. Semistructured Data (SSD). No explicit schema; Irregular and incomplete data Schema may be hidden or mixed with data Examples:
E N D
OEM and LORE Query Language Sanjay Madria Department of Computer Science University of Missouri-Rolla madrias@umr.edu
Semistructured Data (SSD) • No explicit schema; Irregular and incomplete data • Schema may be hidden or mixed with data • Examples: • Semi-structured data arises mainly from the integration of heterogeneous data sources; both structured and non-rigid structured • Information sources change, or new sources added. • semantic discrepancies among heterogeneous data sources • Data from the web • Overall site structure may change often. • Biological data
Characteristics of SSD • Missing or additional attributes • Multiple attributes • Different types in different objects • Heterogeneous collections • Self-describing • Irregular, no priori structure
Object Exchange Model (OEM) Motivation • Self-describing data model • information exchange and extraction • Handle incomplete and irregular data • Why a new data model? … it not a new model.
LORE • Lore : Lightweight Object Repository • Lightweight because • Object Model supported is lightweight • No multiuser or heavyweight DBMS features
Lore - motivation • Relational data model has null values, and OO models have inheritance and complex objects. Both have difficulties in designing schemas to incorporate irregular data. • To manage semi-structured data, as in such environment : • Difficult to decide in advance on a single, correct schema as • structure of the data may evolve rapidly, or • data elements may change types, or • data not conforming to previous structure may be added
Thus: • Need for management of semi-structured data! • Data managed by Lore is not confined to a schema and it may be irregular or incomplete. • OEM is the Lore’s data model. • Lorel is Lore’s query language.
Object Exchange Model (OEM) • Data in this model can be thought of as a labeled directed graph. • Schema-less and self-describing. • nodes are objects and • edges are labeled with attribute names and, • leaf nodes have atomic values • Object nesting. • Vertices in graph are objects. • Each object has a unique object identifier (oid), such as &5. • Atomic objects have no outgoing edges and are of types such as int, real, string, gif, java, etc. • All other objects that have outgoing edges are called complex objects.
OEM (Cont.) • Examples: • Object &3 is complex, and its subobjects are &8, &9, &10, and &11. • Object &7 is atomic and has value “Clark”. • DBGroup is a name that denotes object &1.(Names are entry points into the database). • Type and structure heterogeneity : Observe that members may have 0, 1 or more offices, office is a string and sometimes, also complex object, a room may be string and integer • DbGroup.Member denotes all member-labeled subobjects
An OEM Database DBGroup &1 Member Project Member Member Project Member &2 &3 &4 &5 &6 Name Project Name Office Project Name Age Age Office Office &9 &7 &8 &10 &11 &12 &13 &14 &15 &16 “Clark” “Smith” 46 “Gates 252” “Lore” “Tsimmis” “Jones” 28 Building Room Room Building &17 &18 &19 &20 “CIS” “411” “CIS” 252
Object Exchange Model - OEM • Each value exchanged is given an explicit label. Object temp-in-Fahrenheit, integer, 80 • “temp-in-Fahrenheit” is the label. • Each object is self-describing, with a label, type and value. set-of-temps, set, {cmpnt1, cmpnt2} cmpnt1 is temp-in-Fahrenheit, integer, 80 cmpnt2 is temp-in-Celsius, integer, 20
Labels • Lables Play two roles • identifying an object (component) • identifying the meaning of an object (component) person-record, set, {cmpnt1, cmpnt2, cmpnt3} cmpnt1 is person-name, string, ``Fred’’ cmpnt2 is office-num-in-bldg-5, integer, 333 cmpnt3 is department, string, ``toy’’ • Person-name both identifies cmpnt1 and coveys its meaning. • In relational data this corresponds to ….
Labels - Issues • Labels are relative (more specific) to the source of the data object. • Similar labels from different sources need to be resolved. • Labels provide the flexibility in representing object structure
Label Type Value Object-ID OEM - Specification • Each object in OEM has the following structure: • Label: A variable character string describing what the object represents. • Type: The data type of the object’s value. Each is either an atom type, or type set. • Value: A variable-length value of the object. • Object-ID: A unique variable-length identifier for the object or null.
OEM - Summary • OEM is an information exchange model. It does not specify how objects are stored at source. • OEM does specify how objects are received at a client, but after objects are received they can be stored in any way the client likes. • Each source has a distinguished object with lexical identifier ``root’’.
<biblio,set,{doc1,doc2,…,docn}> • doc1 is <doc, set, {auths1, topic1, call-no1}> • auths1 is <auth-set,set {auth11}> • auth11 is <auth-ln, string, ``Ullman’’> • topic1 is <topic, string,``Databases’’> • call-no1 is <internal-call-no, integer, 25> • doc2 is <doc, set, {auths2, topic2, call-no2}> • auths2 is <auth-set,set {auth21, auth22, auth23}> • auth21 is <auth-ln, string, ``Aho’’> • auth22 is <auth-ln, string, ``Hopcroft’’> • auth23 is <auth-ln, string, ``Ullman’’> Example • topic2 is <topic, string,``Algorithms’’> • call-no1 is <dewey-decimal, string, ``BR273’’> • docn is <doc, set, {authsn, topicn, call-non}> • authsn is <auth,string, ``Crichton’’> • topic1 is <topic, string,``Dinosaurs’’> • call-no1 is <fictional-call-no, integer, 95> • biblio is the root object.
OEM - QL SELECT Fetch-expression FROM Object WHERE Condition • The result of this query is itself an object, with special label ``answer’’: answer, set, {obj1, obj2, …, objn} • Each returned obji is a component of object specified in the From clause of the query, where the component is located by the Fetch-expression and satisfies the Condition.
Path • The notion of path is used in both Fetch-Expression in the Select clause and the condition in the Where clause. • Path describes traversals through an object using subobject structure and labels. • Example: ``biblio.doc.auth’’ • Paths are used in Fetch-Expression to specify which components are are returned in the answer object. • Paths are used in the condition to qualify the fetched objects or other (related) components in the same object structure.
Queries - Simple • Retrieve the topic of each document for which ``Ullman’’ is one of the authors: SELECT biblio.doc.topic FROM root WHERE biblio.doc.auth-set.auth-ln = ``Ullman’’ • Intuitively, the query’s where clause finds all paths through subobject structure with the sequence of labels [biblio,doc,auth-set,auth-ln] such that the object at the end of the path has value ``Ullman.’’ <answer, set, {obj1, obj2}> obj1 is <topic, string, ``Databases’’> obj2 is <topic, string, “Algorithms”>
Queries - ``wild-cards’’ • Retrieve all documents with internal call number: SELECT biblio.?.topic FROM root WHERE biblio.?.internal-call-no • ``?’’ label matches any label. For this query, the doc labels can be replaced by any other strings and query would produce the same result. By convention, two occurrences of ? In the same query must match the same label unless variables are used. <answer, set, {obj1}> obj1 is <topic, string, ``Databases’’>
Queries - ``wild-paths’’ • Retrieve all documents with internal call number: SELECT *.topic FROM root WHERE *.internal-call-no • Symbol ``*’’ matches any path of length one or more. The use of * followed by a single label is a convenient and common way to locate objects with a certain label in complex structure. Similar to ?, two occurrences of * in the same query must match the same sequence of labels, unless variables are used. <answer, set, {obj1}> obj1 is <topic, string, ``Databases’’>
Queries - variables • Retrieve each document for which both ``Hopcroft’’ and ``Aho’’ are co-authors: SELECT biblio.doc FROM root WHERE biblio.doc.auth-set.auth-ln=``Aho’’ and biblio.doc.auth-set.auth-ln=``Hopcroft’’ • Here, the query finds all the paths with structure [biblio, doc, auth-set], and with two distinct path completions with label auth with values ``Aho’’ and ``Hopcroft’’ <answer, set, {obj1}> obj1 is the complete doc2
Lorel Query Language • Need query language that supports path expressions for traversing graph data and handling of ‘typeless’ data. • A simple path expression is a name followed by a sequence of labels. • DBGroup.Member.Office. • Set of objects that can be reached starting with the DBGroup object, following edges labeled as member and then office.
Lorel (cont.) • Example: • select DBGroup.Member.Officewhere DBGroup.Member.Age < 30 • Result: • Office “Gates 252” • Office Building “CIS” Room “411”
Lorel Query Rewrite • Previous query rewritten to (OQL style) • select Ofrom DBGroup.Member M, M.Office Owhere exists y in M.Age : y < 30
Lorel Query Features • Explicitly handle coercion. • Automatic type coercion 0.5 < “0.9” should return true • Comparison on age transformed to existential condition. • Since all properties are set-valued in OEM. • A user can ask DBGroup.Member.Age < 30 regardless of whether Age is single valued, set valued, or unknown.
Path expression queries -specification for a set of possible paths through the graph • Example - * is a path expression that matches any number of labels • Use of Data guides – Structural summary of the database
Lorel (cont.) • General path expressions are loosely specified patterns for labels in the database.(‘|’ disjunction, ‘?’ label pattern optional) • Example: • select DBGroup.Member.Namewhere DBGroup.Member.Office(.Room%|.Cubicle)? like “%252” • Result: • Name “Jones”Name “Smith”
Lorel Queries - Simple Path Expression • Retrieve the offices of members with age greater than 30 years: Query SELECT DBGroup.Member.Office WHERE DBGroup.Member.Age > 30 Result Office “Gates 252” Office Building “CIS” Room “411”
Queries - General Path Expression Query SELECT DBGroup.Member.Name WHERE DBGroup.Member.Office(.Room%|.Cubicle)? Like “%252” Result Name “Jones” Name “Smith” • Room% matches all labels starting from Room, like Room68. “|” stands for disjunction. “?” indicates that the label pattern is optional. “like %252” specifies that the data value should end with string “252”.
Queries - SubQueries Retrieve Lore project members who work on other projects Query SELECT M.Name, ( SELECT M.Project.Title WHERE M.Project.Title != “Lore”) FROM DBGroup.Member M WHERE M.Project.Title = “Lore” Result Member Name “Jones” Title “Tsimmis”
Lore - Summary • Lore does facilitate query and updates on semi-structural databases • There has been more work done on optimization using: data guides (vldb97). • How is this related to WWW? • XML-QL and related work provides the answer.