210 likes | 339 Views
From UML to ROLAP multidimensional databases using a pivot model. Nicolas PRAT, ESSEC Business School Jacky AKOKA, CNAM Paris. BDA 2002, INT, Evry, October 2002. Overview. 1. Introduction 2. Unified multidimensional metamodel 3. Design method 4. Conclusion. 1. Introduction. Introduction.
E N D
From UML to ROLAP multidimensional databases using a pivot model Nicolas PRAT, ESSEC Business School Jacky AKOKA, CNAM Paris BDA 2002, INT, Evry, October 2002
Overview • 1. Introduction • 2. Unified multidimensional metamodel • 3. Design method • 4. Conclusion
1. Introduction Introduction • Data warehousing and OLAP market growing rapidly • => need for systematic, tool-supported method for data warehouse/multidimensional database design. • Difficulty of data warehouse design often underestimated by OLAP tool vendors. However, crucial phase. • => • Data warehouse design should follow the conceptual/logical/physical design phases (as in transactional database design).
1. Introduction State of the art • Many papers proposing multidimensional data models (sometimes with associated algebra/query language). • Only a few data warehouse design methods (Akoka 97, Akoka 01, Golfarelli 98, Cabibbo 98, Moody 00). • Distinction between conceptual/logical/physical: • Often unclear and/or missing phases. • Our contribution: • Data warehousedesign method based on UML, spanning the three design phases. • Metamodels for each design step (including unified multidimensional metamodel=>pivot model). • Transformations operating on the concepts of the metamodels. • Specification of the transformations in OCL (Object Constraint Language).
2. Unified multidimensional metamodel Multidimensional metamodel • Problem with the multidimensional metamodel: • No agreement on the concepts of this model (e.g. facts). • No agreement on the level of this model: physical, logical, or conceptual. • We consider the multidimensional metamodel to be at the logical level: • It exists independently of implementation. • Its concepts (e.g. dimension) are not as close to reality as concepts like the object or the entity. • Strong parallel with the relational model. • We have defined a unified multidimensional model.
2. Unified multidimensional metamodel Quantity sold 4 6 8 11 5 9 9 3 March 99 6 4 March 99 9 5 March 99 9 6 March 99 DAY YEAR MONTH QUARTER 3 7 March 99 Bordeaux 1 8 March 99 Brest Lyon 12 9 March 99 Nantes Paris CITY P1 P2 P3 P4 P5 P6 P7 REGION PRODUCT product name unit price LEGEND Measure DIMENSION CATEGORY Attribute Hierarchy Multidimensional modeling :
2. Unified multidimensional metamodel ModelElement ModelElement MultidimensionalModel MultidimensionalModel name name : : Name Name 1 1 1 + + ownedElement 1..* 1..* 1..* MultidimensionalModelElement MultidimensionalModelElement + owner + attribute DimensionLink DimensionLink 1 0..* Dimension Dimension DimensionAttribute DimensionAttribute DimensionHierarchy DimensionHierarchy 2..* 2..* 2..* 2..* 1..* 1..* 1..* 1..* + + dimensionLink dimensionLink 1..* 1..* 1..* 1..* + + dimensionHierarchy dimensionHierarchy 1 +source + dimensionLink 0..* { { ordered ordered } } +dimension +dimension 0..* 1 + target dimensionLink + + measure measure level level : : Integer Integer 0..* 0..* 0..* 0..* Dimensioning Dimensioning Measure Measure strong strong : : Boolean Boolean dummyMeasure dummyMeasure : : Boolean Boolean 0..* 0..* 0..* 0..* AggregateFunction AggregateFunction name name : : FunctionName FunctionName 1..* 1..* 1..* 1..* restrictionLevel restrictionLevel : : Integer Integer Unified multidimensional metamodel
3. Design method Universe Universe of of discourse discourse conceptual conceptual modeling modeling UML UML schema model CONCEPTUAL CONCEPTUAL DESIGN DESIGN enrichment enrichment /transformation /transformation Enriched Enriched / / transformed transformed UML UML model schema Logical mapping Logical mapping LOGICAL LOGICAL DESIGN DESIGN Unified multidimensional schema Unified multidimensional model Physical mapping Physical mapping PHYSICAL PHYSICAL DESIGN DESIGN MOLAP ROLAP star ROLAP snowflake schema schema schema Source confrontation Source confrontation DATA DATA CONFRON CONFRON - - TATION TATION Data Data Warehouse Warehouse Metadata Metadata Overview
3. Design method Conceptual design • Multidimensional representation of data (OLAP). • Conceptual phase necessary (vs. direct representation of data in ROLAP stars/snowflakes or MOLAP cubes). • Choice of UML for the conceptual phase: • Standard and well-known formalism • Simple and powerful constructs to represent data at a high level of abstraction • “Easy” mapping to relational and multidimensional systems. • 2-step conceptual design: • Definition of a UML model (class diagram without operations) • Enrichment/transformation of this model to facilitate further automatic mapping to a unified multidimensional model. • =>need to enrich UML metamodel.
3. Design method Enriched UML metamodel UMLModel UMLModel ModelElement ModelElement name name : : Name Name 1 1 1 1 1..* 1..* 1..* 1..* + + ownedElement ownedElement + + constraint constraint + + constrainedElement constrainedElement UMLModelElement UMLModelElement 0..* 0..* 0..* 0..* { { ordered ordered } } 0..* 0..* 0..* 0..* Constraint Constraint Relationship Relationship AssociationEnd AssociationEnd Attribute Attribute + + attribute attribute + + owner owner aggregation aggregation : : AggregationKind AggregationKind Class Class measure measure : : Boolean Boolean GeneralizationConstraint GeneralizationConstraint multiplicity multiplicity : : Multiplicity Multiplicity 0..* 0..* 0..* 0..* 1 1 1 1 { { ordered ordered } } + + connection connection 1 1 1 1 0..* 0..* 0..* 0..* 2..* 2..* { { ordered ordered } } +participant +participant +association +association +association +association {disjoint, {disjoint, complete complete } } 1 1 1 1 AttributeOfOrdinaryClass AttributeOfOrdinaryClass AttributeOfAssociationClass AttributeOfAssociationClass Association Association Generalization Generalization identifyingAttribute identifyingAttribute : : Boolean Boolean 0..* 0..* 0..* 0..* 0..* 0..* 0..* 0..* 1 1 1 1 {disjoint, {disjoint, complete complete } } + + generalization generalization + + child child {disjoint, {disjoint, complete complete } } OrdinaryClass OrdinaryClass AssociationClass AssociationClass OrdinaryAssociation OrdinaryAssociation + + specialization specialization 1 1 1 1 +parent +parent
3. Design method Product Product _type _type Media_type Media_type may may _ _ be be _ _ advertised advertised _in _in product product _type _type media_type media_type Conceptual design (step1) product product _unit _unit insertion insertion 1..* 1..* 1..* 1..* * * * * Region Region region region 1 1 1 1 1 1 1 1 1..* 1..* 1..* 1..* number number _of_ _of_ inhabitants inhabitants gets gets * * * * * * * * Media Media * * * * 1 1 1 1 exposure exposure Product Product media_ media_ name name percentage percentage _of_ _of_ region region media_ media_ exposure exposure advertising advertising _ _ price price product product _code _code product product _ _ name name * * * * * * * * * * * * 1..* 1..* 1..* 1..* 1..* 1..* 1..* 1..* 1 1 1 1 main_ main_ Target Target shareholder shareholder * * * * target target _code _code consumption consumption status status * * * * * * for for product product _ _ consumption consumption minimum_age minimum_age 1 1 1 1 maximum_age maximum_age Shareholder Shareholder sex sex * * * * * * is is _ _ strongly strongly _ _ influenced influenced _by _by shareholder shareholder _ _ name name * * * * * * * * * * Advertising Advertising _ _ campaign campaign Year Year campaign campaign _code _code {overlapping, complete} year year * * * * * * * * 1 1 1 1 Private Private _ _ shareholder shareholder Public_ Public_ shareholder shareholder 1..* 1..* 1..* 1..* during during public_ public_ shareholder shareholder _ _ level level Quarter Quarter 1 1 1 1 quarter quarter {disjoint, complete} 1 1 1 1 in in Person Person Company Company 1..* 1..* 1..* 1..* manager_ manager_ name name Date Date dd_mm_ dd_mm_ yy yy
3. Design method Conceptual design (step 2) • Enrichment/transformation of the UML model with 4 types of successive transformations: • Determination of identifying attributes • Determination of attributes representing measures • Migration of association attributes • Transformation of generalizations. • Determination of identifying attributes: • Identifier=not a standard concept in UML • Necessary in order to define dimensions in the logical phase • Necessary for ordinary classes only • Use of the tagged value {id}.
3. Design method Conceptual design (step 2) • Determination of attributes representing measures: • Measures vs. qualitative values • Distinction cannot based performed automatically based on types • Not necessary for identifiers (defined previously) • Use of the tagged value {meas}. • Migration of 1-1 and 1-N association attributes: • Check validity of representation first. • Transformation Tcc3 : Each attribute belonging to a 1-1 association is transferred to one of the classes involved in the association. • Transformation Tcc4 : Each attribute belonging to a 1-N association is transferred to the N-class, i.e. the class involved several times in the association. • Transformation of generalizations: • No direct mapping of UML generalizations to multidimensional hierarchies. • UML generalizations transformed into aggregations and classes.
3. Design method Shareholder Shareholder shareholder shareholder _ _ name name {id} {id} public_ public_ shareholder shareholder _ _ level level Transformation Tcc5 Shareholder Shareholder _type manager_ manager_ name name shareholder shareholder _name {id} _type {id} {overlapping, complete} * * * * * * * * 1 1 1 1 Shareholder Shareholder _type _type Private Private _ _ shareholder shareholder Public_ Public_ shareholder shareholder shareholder shareholder _type {id} _type {id} public_ public_ shareholder shareholder _ _ level level {disjoint, complete} 1 1 1 1 Private Private _ _ shareholder shareholder _type _type Person Person Company Company private private _ _ shareholder shareholder _type {id} _type {id} manager_ manager_ name name Occurrences of shareholder_type: {private,public,both} Occurrences of private_shareholder_type: {person,company,others} Conceptual design (step 2) • Transformation of generalizations (cont’d): • Transformation Tcc5 : For each level i of specialization of a class C, a class named Type-C-i is created. The occurrences of these classes define all the specializations of C. In case of overlapping between specializations, a special value is created for each overlapping between two or more sub-classes of C. In case of incomplete specialization, the special value “others” is created. A N-1 aggregation is created between the classes C and Type-C-i.
3. Design method Logical design • From enriched/transformed UML model to unified multidimensional model. Mapping of: • Ordinary classes and their attributes (transformations Tcl1 to Tcl3) • Associations and their attributes (transformations Tcl4 to Tcl6). • Mapping ordinary classes and their attributes : • Transformation Tcl1: The identifying attribute of each ordinary class is mapped into a dimension in the multidimensional model. • Transformation Tcl2: The non-identifying attributes of each ordinary class are mapped into dimension attributes in the multidimensional model if these non-identifying attributes are not measures of interest. • Transformation Tcl3: The non-identifying attributes of each ordinary class are mapped into measures in the multidimensional model if these non-identifying attributes are measures of interest.
3. Design method Logical design • Specifying transformation Tcl3 with OCL : Context UMLModel::Tcl3(nonIdentifier:Attribute, multidimensionalModel:MultidimensionalModel) :Measure pre: nonIdentifier.owner.oclIsTypeOf(OrdinaryClass) =true and nonIdentifier.identifyingAttribute=false and nonIdentifier.measure=true post:result.name=nonIdentifier.name post:nonIdentifier.owner.attribute-> forall(a1:Attribute| if a1.identifyingAttribute=true then result.dimension=Tcl1(a1) endif) post:multimensionalModel->includes(result)
3. Design method Media_type Media_type media_type {id} media_type {id} insertion insertion Region Region region region {id} {id} 1 1 1 1 number number _of_ _of_ inhabitants inhabitants { { meas meas } } 1..* 1..* 1..* 1..* gets gets * * * * * * * * Media Media exposure exposure media_ media_ name name {id} {id} advertising advertising _ _ price price media_ media_ exposure exposure { { meas meas } } * * * * Year Year * * * * year year {id} {id} Target Target * * target target _code {id} _code {id} 1 1 1 1 status status 1..* 1..* 1..* 1..* minimum_age minimum_age Quarter Quarter maximum_age maximum_age quarter {id} quarter {id} sex sex percentage percentage _of_ _of_ region region { { meas meas } } Enriched/transformed UML model Logical design Transformation Tcl1 dimension target_code dimension quarter dimension year dimension region dimension media_name dimension media_type attribute status [target_code] attribute minimum_age [target_code] attribute maximum_age [target_code] attribute sex [target_code] attribute insertion [media_type] attribute advertising_price [media_name] measure percentage_of_region [target_code] measure number_of_inhabitants [region] • Mapping ordinary classes and their attributes (example): Transformation Tcl2 Transformation Tcl3 Unified multidimensional model
3. Design method Logical design • Mapping associations and their attributes : • Transformation Tcl4: The attributes of each association class are mapped into measures, associated with dimensions obtained by mapping the identifying attributes of the ordinary classes directly or indirectly participating in the association class (transformation Tcl1). • Transformation Tcl5: A path formed by N-1 associations is mapped into a hierarchy in the multidimensional model. • Transformation Tcl6: Every N-M or N-ary association without at least one attribute that is always defined is mapped into a dummy measure, associated with dimensions obtained by mapping the identifying attributes of the ordinary classes directly or indirectly participating in the association (transformation Tcl1).
3. Design method Logical design dimension target_code dimension quarter dimension year dimension region dimension media_name dimension media_type attribute status [target_code] attribute minimum_age [target_code] attribute maximum_age [target_code] attribute sex [target_code] attribute insertion [media_type] attribute advertising_price [media_name] measure percentage_of_region [target_code] measure number_of_inhabitants [region] measure media_exposure [media_name,target_code,quarter] hierarchy time quarter->year hierarchy media_type media_name->media_type dummy measure gets [region,media_name] Media_type Media_type media_type {id} media_type {id} insertion insertion • Mapping associations and their attributes (example): Region Region region region {id} {id} 1 1 1 1 number number _of_ _of_ inhabitants inhabitants { { meas meas } } 1..* 1..* 1..* gets * * * * * * Media Media exposure exposure media_ media_ name name {id} {id} advertising advertising _ _ price price media_ media_ exposure exposure { { meas meas } } * * * * * * * * Year Year Target Target Transformation Tcl4 * * year year {id} {id} target target _code {id} _code {id} status status Transformation Tcl5 minimum_age minimum_age 1 maximum_age maximum_age 1..* sex sex Transformation Tcl6 Quarter Quarter percentage percentage _of_ _of_ region region { { meas meas } } quarter {id} quarter {id} Unified multidimensional model Enriched/transformed UML model
3. Design method Physical design • For each type of target system: metamodel + associated transformations (elaborates on/completes OMG’s Common Warehouse Metamodel). • Example transformation (ROLAP star) : • Transformation Tls4: Every hierarchy D1->D2->…->Dn of the logical model is mapped by considering all the sub-hierarchies Dj->Dj+1…->Dn where 1<=j<n and Dj dimensions at least one measure. A sub-hierarchy Dj->Dj+1…->Dn is mapped in the physical model by defining in the dimension table identified by Dj a column corresponding to each of the Di (where j<i<=n).
4. Conclusion Conclusion • Data warehouse design method based on UML: • Spans the conceptual/logical/physical levels • Each step: metamodels + associated transformations • Unified multidimensional metamodel at the logical level (pivot metamodel). • Tool support (prototype developed). • Future works: • Complete/specialise set of transformations • Further experimentation • Reverse engineering.