A Z Approach in Validating ORA-SS Data Models
Scott Uk-Jin Lee, Jing Sun, Gillian Dobbie, Yuan Fang Li
Introduction
• Semistructured data
  • Rapid growth in its usage
    • Through the World Wide Web, Web Services, and other Web-based applications.
    • Due to the introduction of XML and its related technologies.
  • Requires a well-designed semistructured data structure
    • Especially if the data is stored in a database.
  • Requires a good schema definition
    • ORA-SS can be used.
• ORA-SS
  • Provides schema definitions for semistructured data.
  • Restricted to a diagrammatic notation with semantics written in English.
  • Requires formal mathematical semantics for wider use.
Motivation
• Benefits of having formal semantics for ORA-SS
  • Remove ambiguity that may arise from a diagrammatic representation.
  • Enable the use of ORA-SS in other applications and tools.
  • Reveal inconsistencies in a design at the schema and instance levels.
  • Increase quality of the software system through semantics checking.
  • Improve quality of the software system by providing deep semantic checking for the semistructured data used.
ORA-SS Object
• Object class
  • similar to an entity type in an ER diagram, a class in an object-oriented diagram, or an element in an XML document.
• Relationship type
  • represents a nesting relationship among object classes.
  • is optionally represented by a labelled diamond and can be described by name, n, p and c.
    • name : name of the relationship type
    • n : integer indicating the degree of the relationship type
    • p : participation constraint of the parent object class in the relationship type
    • c : participation constraint of the child object class in the relationship type
• Attribute
  • represents a property of an object or a property of a relationship.
  • can be a key attribute, which has a unique value.
• Reference
  • models recursive and symmetric relationships.
  • reduces redundancy, especially for many-to-many relationships.
  • represents disjunction of objects and attributes.
[Figure: ORA-SS diagram notation showing object classes and a relationship type labelled with name, n, p, c.]
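As a rough illustration of the constructs listed above, the sketch below models an attribute, an object class, and a relationship type described by name, n, p and c. All Python names (`Attribute`, `ObjectClass`, `RelationshipType`, `Participation`) are assumptions made for this example, and the participation values are illustrative only; ORA-SS itself defines these constructs diagrammatically, not in code.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical encoding: a 'min:max' pair, where max=None means unbounded.
Participation = Tuple[int, Optional[int]]


@dataclass(frozen=True)
class Attribute:
    name: str
    is_key: bool = False  # True for a key attribute with a unique value


@dataclass(frozen=True)
class ObjectClass:
    name: str
    attributes: Tuple[Attribute, ...] = ()


@dataclass(frozen=True)
class RelationshipType:
    name: str              # name of the relationship type
    degree: int            # n
    parent: Participation  # p
    child: Participation   # c


student = ObjectClass(
    "student",
    (Attribute("ID number", is_key=True), Attribute("name"), Attribute("email")),
)
# Illustrative constraint values, not taken from the original diagram.
cs = RelationshipType("cs", degree=2, parent=(4, None), child=(1, None))
```

Frozen dataclasses are used so that schema elements behave like immutable values, which is close in spirit to a declarative description.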
ORA-SS Example
• The diagram presents an ORA-SS schema that represents the structure of particular semistructured data.
• The schema should consist of the following:
  • a relationship between the ‘course’ object class and the ‘student’ object class, with a single-valued attribute ‘mark’.
  • an object class ‘course’ with an identifier ‘code’, a single-valued attribute ‘title’ and a multi-valued attribute ‘ANY’.
  • an object class ‘student’ with an identifier ‘ID number’ and single-valued attributes ‘name’ and ‘email’.
• The schema diagram is syntactically correct, but it contains three semantic errors:
  • The degree of the relationship ‘cs’ is 3, which denotes a ternary relationship, although it is actually a binary relationship, since the object class ‘course’ is not related to any object class other than ‘student’.
  • The object class ‘student’ has two primary keys: two attributes are selected as primary keys, where there should be only one primary key per object class.
  • The primary key ‘ID number’ is represented as an attribute of the relationship ‘cs’, where it is really an attribute of the object class ‘student’.
• A validation process is required to detect this kind of error during the design process.
Z & Z/EVES
• Z
  • formal specification language
  • developed at the Programming Research Group at Oxford University.
  • based on set theory and first-order predicate logic.
  • declarative language with a number of language constructs, including given type, abbreviation type, axiomatic definition, and state and operation schema definitions.
  • widely used for providing formal semantics and verification in various application domains.
• Z/EVES
  • an interactive system for composing, checking, and analyzing Z specifications.
  • supports general theorem proving of Z specifications.
Formal Semantics of ORA-SS (Basic Type & Relationship Type)
• Basic Types
  • The basic types used in the ORA-SS data modeling language have been identified and defined prior to constructing the formal representation.
  • The defined object types and attribute types represent the sets of object classes, object instances, attributes and attribute values, respectively, in the ORA-SS language.
• Relationship Type
  • A relationship type in the ORA-SS data modeling language is defined as a function with a set of object classes as its domain and a sequence of sets of object classes as its range.
  • The predicate of the function uses a recursive definition and states that object classes can be related to other object classes as well as to other relationships.
Formal Semantics of ORA-SS (Relationship Type)
• The definition includes two types of relationship in an ORA-SS schema diagram.
  • a normal relationship, where the child participant is a single object class.
  • a disjunctive relationship, where the child participant is a set of disjunctive object classes.
• The first predicate of the function prevents cyclic definitions in the relationship structure.
• The second predicate allows the representation of a binary relationship as well as a relationship of degree 3 or more.
Formal Semantics of ORA-SS (Degree of a Relationship Type)
• Every relationship in an ORA-SS schema diagram has an associated degree, represented as a natural number.
• Degree is defined as a function whose first argument is the relationship and whose second argument is the natural number giving the degree of that relationship.
• The predicate of the function defines the degree of any relationship as the number of object classes involved in the relationship.
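The recursive structure of relationship types and the degree function can be sketched as follows. This is only an illustration under stated assumptions, not the Z definition from the slides: the class `Rel`, the function `degree`, and the object class ‘tutor’ are hypothetical, and a disjunctive child is not modelled (it would count as a single participant).

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Rel:
    name: str
    # The parent is either an object-class name (binary relationship)
    # or a sub-relationship (relationship of degree 3 or more).
    parent: Union[str, "Rel"]
    child: str


def degree(rel: Rel) -> int:
    """Number of object classes involved in the relationship."""
    if isinstance(rel.parent, Rel):
        return degree(rel.parent) + 1  # one more class than the sub-relationship
    return 2  # parent object class plus child object class


cs = Rel("cs", "course", "student")
cst = Rel("cst", cs, "tutor")  # hypothetical ternary relationship built on 'cs'
```

Because each `Rel` value is immutable and built from already existing values, a cyclic relationship structure cannot be constructed in this sketch.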
Formal Semantics of ORA-SS (Instances of Object Classes & Attribute)
• In the ORA-SS data model, an object class has instances, which are objects.
• Instances of object classes are defined as a function whose first argument is an object class and whose second argument is the set of objects that are the instances of that object class.
• The predicate of the function specifies that an object cannot be an instance of multiple object classes.
• Instances of attributes are defined similarly: just as an object class has instances, an attribute has values.
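The disjointness predicate can be illustrated with a small check. This is a minimal sketch, assuming instances are given as a mapping from object-class names to sets of object identifiers; the function name and the sample data are hypothetical.

```python
from itertools import combinations
from typing import Dict, Set


def instances_are_disjoint(instances: Dict[str, Set[str]]) -> bool:
    """True if no object is an instance of more than one object class."""
    return all(a.isdisjoint(b) for a, b in combinations(instances.values(), 2))


ok = {"course": {"c1", "c2"}, "student": {"s1"}}
bad = {"course": {"c1", "x"}, "student": {"s1", "x"}}  # 'x' belongs to two classes
```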
Formal Semantics of ORA-SS (Instances of a Relationship Type)
• A relationship type also has instances, which represent the participating instances of the corresponding object classes in the relationship.
Formal Semantics of ORA-SS (Instances of a Relationship Type)
• Relationship instances are defined as a function whose first argument is a relationship and whose second argument is the instance of the relationship.
• An instance of the relationship is represented as an object related to a sequence of objects that conforms to the relationship definition.
• The first predicate of the function states that the degree of a relationship instance must be the same as the degree of the relationship type.
• The second predicate states that the child object instance must be an instance of the selected child object class in the relationship. In the case of a disjunctive relationship, it also states that only objects of a single object class are related to a given parent object or sub-relationship instance.
• The third predicate consists of two cases:
  • If the relationship is binary, the parent object instance must be an instance of the parent object class.
  • If the relationship is ternary or of higher degree, the second part of the predicate recursively states that the sub-relationship instance sequence is an instance of the sub-relationship type.
• The last predicate states that any two relationship types have disjoint sets of instances, so a relationship instance cannot be an instance of multiple relationship types.
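The first three predicates can be approximated by a conformance check. The sketch below makes a simplifying assumption: the recursive relationship structure is flattened into a sequence of object-class names, and an instance is a sequence of object identifiers in the same order. The function `conforms` and the sample data are hypothetical, and disjunctive relationships are not covered.

```python
from typing import Dict, Sequence, Set


def conforms(
    rel_classes: Sequence[str],
    instance: Sequence[str],
    instances_of: Dict[str, Set[str]],
) -> bool:
    """Check a relationship instance against a flattened relationship type."""
    # The degree of the instance must equal the degree of the type.
    if len(instance) != len(rel_classes):
        return False
    # Each participating object must be an instance of the class at its position.
    return all(
        obj in instances_of.get(cls, set())
        for cls, obj in zip(rel_classes, instance)
    )


instances_of = {"course": {"c1"}, "student": {"s1", "s2"}}
cs_classes = ("course", "student")
```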
Formal Semantics of ORA-SS (Participation Constraints on Object in a Relationship Type)
• Every relationship type in an ORA-SS schema diagram has associated constraints on its participating objects, represented by the ‘min:max’ notation. They constrain the number of child objects that a parent object can relate to, and vice versa.
Formal Semantics of ORA-SS (Participation Constraints on Object in a Relationship Type)
• The participation constraint on objects in a relationship type is defined as a function whose first argument is a relationship and whose second argument is a Cartesian product of multiplicities, that is, a ‘min:max’ pair.
• The predicate of the function states that the number of relationship instances in which each object of the parent object class, or each relationship instance of the sub-relationship type, participates must be within the multiplicities defined in the relationship.
• It specifies that the parent constraint bounds the number of child objects that a single parent object or sub-relationship instance can have.
• The child constraints of the relationship are defined in a similar way.
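A parent participation constraint can be checked by counting, for each parent object, the relationship instances it takes part in. This is a minimal sketch for the binary case, assuming relationship instances are (parent, child) pairs; `parent_violations` and the sample data are hypothetical names and values.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple


def parent_violations(
    parents: Iterable[str],
    rel_instances: Iterable[Tuple[str, str]],
    minimum: int,
    maximum: Optional[int] = None,  # None means no upper bound
) -> List[str]:
    """Return the parent objects whose child count falls outside min:max."""
    counts = Counter(parent for parent, _child in rel_instances)
    return sorted(
        p for p in parents
        if counts[p] < minimum or (maximum is not None and counts[p] > maximum)
    )


pairs = [("c1", "s1"), ("c1", "s2"), ("c2", "s1")]
```

The child constraint could be checked the same way by counting on the second component of each pair.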
Formal Semantics of ORA-SS (Candidate Key of an Object Class)
• An object class can have an attribute, or a set of attributes, whose value is unique for each instance of the object class; this is called a candidate key.
  • a candidate key is a single attribute with a unique value.
  • a composite candidate key is a set of attributes with a unique combined value.
Formal Semantics of ORA-SS (Candidate Key and Primary Key of an Object Class)
• The candidate key is defined as a relationship in which object classes are related to the sets of attributes that make up all the candidate keys belonging to the object class.
• The first predicate of the function states that candidate keys belong to the set of attributes that the object class has.
• The second predicate of the function states two facts:
  • two objects are different when their candidate key values are different.
  • two objects are the same when their candidate key values are the same.
• The predicate also specifies that the candidate key value of each object of an object class must uniquely identify an object instance.
• The primary key is defined as a total function with the same arguments as the candidate key definition, and its predicate specifies that the primary key is selected from the set of candidate keys.
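The uniqueness requirement on candidate keys can be sketched as a check over object instances. The example assumes each object is a dictionary from attribute names to values; the function `is_candidate_key` and the sample records are hypothetical.

```python
from typing import Dict, Iterable, Sequence


def is_candidate_key(objects: Iterable[Dict[str, str]], key_attrs: Sequence[str]) -> bool:
    """True if the combined value of key_attrs is unique across all objects."""
    seen = set()
    for obj in objects:
        value = tuple(obj[attr] for attr in key_attrs)
        if value in seen:
            return False  # two distinct objects share the same key value
        seen.add(value)
    return True


students = [
    {"ID number": "1", "name": "Ann", "email": "ann@example.org"},
    {"ID number": "2", "name": "Ann", "email": "ann2@example.org"},
]
```

A single-attribute key and a composite key are handled uniformly, since a single attribute is just a sequence of length one.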
Formal Semantics of ORA-SS (Other definitions)
• Object class, attribute pair and their instances
  • The object class and attribute pair is defined as a simple total function, similar to the primary key definition but with no predicates.
  • Instances of the object class and attribute pair are defined similarly to the instances of a relationship.
• Cardinality of attribute values associated with an object
  • The cardinality of attribute values associated with an object is defined similarly to the participation constraints in a relationship type.
Validation (Schema Diagram)
• Guidelines for validating an ORA-SS schema diagram
  • In a relationship type, the child object class must be related either to a parent object class, forming a binary relationship, or to a sub-relationship type, forming a relationship type of degree 3 or more.
  • The degree of a binary relationship is 2, of a ternary relationship 3, and of an n-ary relationship n.
  • In a disjunctive relationship type, the child participant is a set of disjunctive object classes.
  • A composite attribute or disjunctive attribute has an attribute that is related to two or more sub-attributes.
  • A candidate key of an object class is selected from the set of attributes of the object class.
  • A composite key is selected from 2 or more attributes of an object class.
  • There can be only one primary key per object class, and it can be either a candidate key or a composite candidate key.
  • Relationship attributes have to relate to an existing relationship.
  • An object class can reference one object class only, but an object class can be referenced by multiple object classes.
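The key-related guidelines above can be sketched as a schema-level check. This is an illustration only, not the Z encoding: keys are modelled as frozen sets of attribute names, and the function `key_errors`, the object class ‘lecturer’ and its attributes are all hypothetical.

```python
from typing import Dict, FrozenSet, List, Set

Key = FrozenSet[str]  # a single or composite key as a set of attribute names


def key_errors(
    attributes: Dict[str, Set[str]],
    candidate_keys: Dict[str, List[Key]],
    primary_keys: Dict[str, List[Key]],
) -> List[str]:
    """Collect violations of the candidate-key and primary-key guidelines."""
    errors = []
    for cls, keys in candidate_keys.items():
        for key in keys:
            # A candidate key must be drawn from the attributes of its class.
            if not key <= attributes.get(cls, set()):
                errors.append(f"{cls}: candidate key uses attributes outside the class")
    for cls, keys in primary_keys.items():
        if len(keys) > 1:
            errors.append(f"{cls}: more than one primary key")
        for key in keys:
            # A primary key must be one of the candidate keys.
            if key not in candidate_keys.get(cls, []):
                errors.append(f"{cls}: primary key is not a candidate key")
    return errors


attributes = {"lecturer": {"staff_id", "email", "name"}}
candidates = {"lecturer": [frozenset({"staff_id"}), frozenset({"email"})]}
two_primary = {"lecturer": [frozenset({"staff_id"}), frozenset({"email"})]}
one_primary = {"lecturer": [frozenset({"staff_id"})]}
```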
Validation (Schema Diagram)
• Most of the guidelines have been encoded into the Z semantics of ORA-SS.
• When a schema diagram is represented in Z, its correctness can be validated against the ORA-SS Z semantics.
• The previous ORA-SS schema diagram example, represented in Z
• Validating the degree of the ‘cs’ relationship
  • The validation proves that the definition degree cs = 3 is invalid.
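The idea behind the degree validation can be mimicked outside Z by comparing the declared degree with the number of participating object classes. The sketch below is a plain runtime check under that assumption, not a Z/EVES proof; the function name `degree_is_valid` is hypothetical.

```python
from typing import Sequence


def degree_is_valid(declared: int, participants: Sequence[str]) -> bool:
    """True if the declared degree equals the number of participating object classes."""
    return declared == len(participants)


# 'cs' relates only 'course' and 'student', so a declared degree of 3 is rejected.
cs_participants = ["course", "student"]
```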
Validation (XML)
• Guidelines for validating an XML instance
  • Relationship instances must conform to the participation constraints.
  • In a disjunctive relationship, only one object class can be selected from the disjunctive object class set and associated with a particular parent instance.
  • For a candidate key (single or composite), its value should uniquely identify the object that the key attribute belongs to.
  • Each object can have one and only one primary key.
  • All attributes have their own cardinality, and the number of attribute values that belong to an object must lie between the minimum and maximum cardinality of the attribute.
  • For a set of disjunctive attributes, only one of the attribute choices can be selected and associated with an object instance.
• A given XML instance can be validated against these guidelines by checking the consistency of the document's content with its ORA-SS schema definitions.
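The disjunctive-relationship guideline can be sketched as follows, assuming the children of each parent instance and the object class of each child are already known. The function `disjunctive_violations` and the classes ‘lecturer’ and ‘tutor’ are hypothetical.

```python
from typing import Dict, List, Set


def disjunctive_violations(
    children_of: Dict[str, List[str]],
    class_of: Dict[str, str],
    disjunctive: Set[str],
) -> List[str]:
    """Return parents whose children come from more than one disjunctive class."""
    bad = []
    for parent, children in children_of.items():
        chosen = {class_of[c] for c in children if class_of[c] in disjunctive}
        if len(chosen) > 1:
            bad.append(parent)
    return sorted(bad)


class_of = {"l1": "lecturer", "t1": "tutor", "t2": "tutor"}
children_of = {"c1": ["t1", "t2"], "c2": ["l1", "t1"]}
violations = disjunctive_violations(children_of, class_of, {"lecturer", "tutor"})
```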
Validation (XML)
• XML document conforming to the corrected ORA-SS schema example
• Validating the parent participation constraints of the relationship type ‘cs’
  • The validation shows that ‘Course8’ and ‘Course9’ do not satisfy the parent participation constraint of a minimum of 4 students per course.
  • Similarly, we can check the child participation constraint of the relationship ‘cs’.
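A comparable check can be run directly on an XML document. The sketch below uses Python's standard `xml.etree.ElementTree`; the document, its element and attribute names, and the function `courses_below_minimum` are made up for illustration and do not reproduce the XML instance from the slides.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical document: element and attribute names are assumptions.
DOC = """
<courses>
  <course code="Course1">
    <student id="s1"/><student id="s2"/><student id="s3"/><student id="s4"/>
  </course>
  <course code="Course8">
    <student id="s1"/><student id="s2"/>
  </course>
</courses>
"""


def courses_below_minimum(xml_text: str, minimum: int) -> List[str]:
    """Return the codes of courses with fewer than `minimum` student children."""
    root = ET.fromstring(xml_text.strip())
    return [
        course.get("code")
        for course in root.findall("course")
        if len(course.findall("student")) < minimum
    ]


violations = courses_below_minimum(DOC, minimum=4)
```

Here only the minimum bound is checked, since the constraint discussed is a minimum of 4 students per course.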
Conclusion
• Contribution of this work
  • Definition of a formal mathematical semantics for the ORA-SS diagrammatic data modeling notation.
    • It provides a rigorous formal foundation for the ORA-SS language.
    • It can be adopted by many semistructured data applications which use the ORA-SS data model.
  • Definition of some guidelines for validating ORA-SS data models at both the schema diagram level and the XML instance level.
    • These can be used as a template for applications that implement the validation algorithm of ORA-SS semistructured data.
  • Demonstration of some reasoning steps using the Z ORA-SS semantics in validating customized ORA-SS schema diagrams and XML instances.
    • Proof steps are presented through a simple ORA-SS data model.
    • More complicated proofs can also be constructed for validating large semistructured documents.
Conclusion
• Future Work
  • Extending the work, with a focus on the automatic validation of semistructured data in Z.
  • Developing a translation program that automatically transforms an XML instance into its corresponding Z ORA-SS instance representation for machine validation.
  • Extending the current Z semantics of the ORA-SS language to model normalization problems in semistructured data design.