490 likes | 839 Views
Contents of this slideshow :. What is a datawarehouse? Multi-dimensional data modelling Data warehouse architecture. The hidden slides of this slideshow may be important. However, I will focus on leaning by exercises and therefore, rattling off new concepts are often done in hidden slides.
E N D
Contents of this slideshow: • What is a datawarehouse? • Multi-dimensional data modelling • Data warehouse architecture The hidden slides of this slideshow may be important. However, I will focus on leaning by exercises and therefore, rattling off new concepts are often done in hidden slides.
OLTP versus OLAPOLTP = On Line Transaction ProcessingOLAP = On Line Analytical Processing
A star shema datawarehouse has a central table (theFact table) surrouded by dimension tables with on-to-many relationships towards the fact table. An example of a Datawarehouse: The fixed data base structure implies that application programs (drilling functions/aggregates) can be generated automatically!
A dimension hierarchy is a set of tables connected by one-to-many relationships towards the fact table: Dimension hierarchies: In a dimension hierarchiy it is possible to aggregate data from the fact table to the different levels of the hierachy. Roll-up= aggregate along one or more dimensions. Drill-down= “de-aggregate” = break an aggregate into its constituents.
Two different types of drilling: • -Drilling in dimension hierarchies • -Drilling between dimensions.
Which star schemas or data marts can be build by using the illustrated integrated E-commerce/ERP data model?Which star schema would you recommend to be implemented first?
A galaxy is a set of star fact tableswith conformed (fælles tilpassede) dimensions: The value chain
Conceptual Modeling of Data Warehouses • Star schema: A fact table in the middle connected to a set of dimension tables • Snowflake schema: A refinement of star schema where some dimensional hierarchy is normalized into a set of smaller dimension tables, forming a shape similar to snowflake • Galaxy schema: Multiple fact tables share dimension tables (Conformed dimensions), viewed as a collection of stars, therefore called galaxy schema or fact constellation
SELECT Product#, SUM(Qty*Price) AS TurnoverFROM Orderdetails JOIN ProductsGROUP BY Product# The aggregating level is the argument to the GROUP BY statement:
SELECT Product#, Salesman#, SUM(Qty*Price) AS TurnoverFROM Orderdetails JOIN Products JOIN Salesmen GROUP BY Product#, Salesman#; Drill down to the Product per Salesman level: Where should the Price be stored?
A Snowflake schema may have branches in the dimension hierarchies: Snowflake schema with branches: Are Customers related to the Regions?
Roll up can be executed by removing one or more argument to the GROUP BY statement. Roll up to the top level: Roll up to the product level. Roll up to the top level.
The aggregation level is the argument to the GROUP BY statement.
A dimension hierarchi is a set of tables connected by one-to-many relationships towards the fact table: Dimension hierarchies: A Snowflake schema may in contrast to star schemas have dimension hierarchies. Describe the advantage/disadvantage of using dimension hierarchies or Snowflake schema?
The figure illustrates an ER-diagram of a car rental company like Hertz or Avis. Exercise: Question 1.Design a star schema or Galaxy for the car rental company. Question 2.Is there advantages by storing suppliers as customers in e.g. an e-commerce data warehouse?
Contents of this slideshow: • What is a datawarehouse? • Multi-dimensional data modelling • Data warehouse architecture
Data Models • Relational models/ER-diagram used for OLTP databases • Stars, snowflakes and galaxies used for OLAP databases • Cubes used for OLAP databases
Et star schema DW can be illustrated as a multidimensinal cube:
Describe advantages/disadvantages of storing data in a cube in memory?
OLAP Cube operations: • OLAP operations: • Roll Up = Aggregatin to a higer level. For example from month to year) • Drill Down = recalculation with more details • Slice = Selecting a subset by using a fixed dimension value. • Drill Across = Join of fact data across conformed dimensions • Drill Through = Accessing related data from a OLTP system. • Aggregating • Pivoting = See next slide!
Pivoting =Transforming SQL query output to user friendly two dimensional screen layout day 2 day 1 Fact table view: Multi-dimensional cube:
OLAP Server Architectures • Relational OLAP (ROLAP) • Use relational or extended-relational DBMS to store and manage warehouse data • Include optimization of DBMS backend, implementation of aggregation navigation logic, and additional tools and services • greater scalability • Multidimensional OLAP (MOLAP) • Array-based multidimensional storage engine (sparse matrix techniques) • fast indexing to pre-computed summarized data • Hybrid OLAP (HOLAP) • Storage flexibility with mix of ROLAP and MOLAP • POLAP personel HOLAP
Contents of this slideshow: • What is a datawarehouse? • Multi-dimensional data modeling • Data warehouse design/implementation architectures 1. Kimball has a bottom-up architecture2. Inmonhas a top-down architecture3. Data Vault architecture is normalized tables extended with historic data tables. That is, the Data Vault can be used togenerate any data mart when needed.
Kimball’s Bottom-Up DW architecture: Kimball’s architecture uses conformed dimensions and conformed facts. Conformed dimensions makes it possible to drill across from one data mart to another to present data from different marts in the same view. Only the conformed data have top-down design.
Kimball’s Data Warehousing Architecture Query side ETL side Metadata Data sources Reporting Tools Query Services Presentation servers Desktop Data -Warehouse Browsing Access Tools - Extract -Accessand Security Data marts with aggregate-only data - Transform -Query Management - Load Data mining - Standard Reporting Conformed dimensions and facts Data Warehouse Bus -Activity Monitor Data Staging Area Data marts with atomic data Surrogate key (Surrogatnøgle) = A sequense number used as primary key.
William Inmon’s DATA WAREHOUSE architecture from 1990 has top-down design without conformed data. and: EDS = Enterprise Data Warehouse. Department dataware-houses The DSA (Data Staging Area) where transformation takes place is not illustrated.
The DATA VAULT architecture from 2002-2005 has full top-down design and buttom up implementation: Normalized Data Vault with historic data In the Data Vault database with historic information only the Extract activity has taken place. Therefore, the Data Vault architecture is not drowned in the design phase.
Classical Data warehousing 1 2 3 Source DSA EDW DM OLTP Aggregate Extraction Delta Detection Business Rules Filter Error handling Cleansing Trans- formation DSA = Data Staging Area EDS = Enterprise Data Warehouse
Classical Data warehousing 1 2 3 Source DSA EDW DM OLTP Aggregate Extraction Delta Detection Business Rules Filter Error handling Cleansing Trans- formation HANA from SAP is an In memory Data Warehouse product 1 Source DSA OLTP Aggregate Delta Detection Business Rules Error handling Cleansing Trans- formation Extraction Filter
Classical Data warehousing 1 2 3 Source DSA EDW DM Aggregate Extraction Delta Detection Business Rules Filter Error handling Cleansing Trans- formation In memory Data warehousing 1 How can OLTP and OLAP be integrated in a common In Memory database? OLTP Source DSA Aggregate Delta Detection Business Rules Error handling Cleansing Trans- formation Extraction Filter
Exercise: Transform the OLTP database to a Star schema DW for a Hospital. ER-diagram for a hospital.
End of session Thank you !!!
Inmon versus Kimball’s DW definitions: Kimball and Inmon agree in that OLAP datawarehouses do not use the OLTP databases. However, what is the difference in the architectures? Why do you think Kimball’s DW architecture is used most in practice?
Dates may be stored in different formats.As an example the First purchase date may be stored as a FK to a hierachical time dimension and Birth date as a SQL time stamp.Why is different Date formats used in the Customer table?
OLAP • OLAP = On-Line Analytical Processing • Interaktiv analyse • Eksplorativ opdagelse • Kræver hurtige svartider • Data kan vises som multidimensionelle terninger • Terninger/kuber kan have et vilkårligt antal dimensioner • Dimensioner har hierarkier, f.eks. dag-måned-år • OLAP operationer • Aggregering = Sammentælling af data, f.eks. med SUM, AVG, COUNT… • Startniveau, (Kvartal, Produkt) • Roll Up: mindre detalje, Kvartal->År • Drill Down : mere detalje, Kvartal->Måned • Slice: Projektering/selektering, År=1999 • Drill Across: “join” på fælles dimensioner • Drill Through: Opsøgning af kildedataene i de operative systemer • Pivoting
The Business Dimensional Lifecycle = Specifikation af krav Specifikation af krav Design af teknisk arkitektur Design af teknisk arkitektur Valg af produkt og installation Valg af produkt og installation Dimensionelmodellering Dimensionelmodellering Fysiskdesign Fysiskdesign ETL:design ogudvikling ETL:design ogudvikling Projektplanlægning Projektplanlægning Ibrugtagning Ibrugtagning Vedligehold og vækst Vedligehold og vækst Specifikation af applikationer Specifikation af applikationer Udvikling af applikationer Udvikling af applikationer Projektledelse Projektledelse Kimball’s activity model for DATAWAREHOUSE devellopment has three parallel tracks:
The Data Warehouse Bus Architecture = Arkitektur for design af en række data marts som tilsammen udgør virksomhedens data warehouse med fælles conformed dimensions og conformed facts. Data marts = afdelings data warehouse. Kimball bruger ordet mere generelt om en enkelt multidimensional database. Conformed dimensions = Fælles dimensioner, som er tilpasset kravere fra flere data marts. Stovepipe (kakkelovnsrør) = Skældsord for et data warehouse uden conformed dimensions.
Kimball’s datawarehouse concepts: Query side ETL side Metadata Data sources Reporting Tools Query Services Presentation servers Desktop Data -Warehouse Browsing Access Tools - Extract -Accessand Security Data marts with aggregate-only data - Transform -Query Management - Load Data mining - Standard Reporting Conformed dimensions and facts Data Warehouse Bus -Activity Monitor Data Staging Area Operationel systems Data marts with atomic data Service Data Element Inmonndoes not use the conformed facts and dimension tableconcepts!
In the DATA VAULT Architecture the data marts are loaded from a normalized database with historic information. Existing databases and systems (OLTP) New databases and systems (OLAP) DM DM DM Data Vault Appl. Data mining Appl. Appl. Visua- lization OLAP Appl. Appl. DB DB DB DB ETL … DB
In the future the DATA VAULT may be the only database and stored In-Memory. DM DM DM Data Vault Appl. Appl. OLAP Visua- lization Appl. Appl. Appl. Data mining DB DB DB DB ETL … DB SAP has already developed an In-Memory OLAP database called HANA