1 / 14

Amirkabir University

Explore the different architectures for designing a data warehouse, including single DDS, NDS+DDS, ODS+DDS, and more. Learn about ETL processes, metadata management, and data profiling.

dbabin
Download Presentation

Amirkabir University

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Amirkabir University Data Warehouse Design Architectures Morteza Zaker Supervisor : Prof . Abbdolahzadeh

  2. Presentation plan • Introduction • Data Warehouse Architecture • Concepts of dimensional model • History of Data Warehouse • Modeling issues • Conclusions

  3. DW and OLAP – general concepts • Data Warehouses – contain historical data for supporting decision-making process • On-Line Analytical Processing systems - facilitate manipulation of DW data • DW and OLAP require clear definition of facts, dimensions, and hierarchies • DW logical level design based on star/snowflake schema

  4. Data Warehouse Architecture • Data Flow Architectures • Single DDS • NDS+DDS • ODS+DDS • System Architecture • Federated Architectures • ETL Architectures

  5. Extract and transform and Load (ETL) then bring data from various source system into a stage area. ETL Integrate and transform stage’s data then load it to dimensional data store (DDS). when loading data into DDS ,(DQETL) do various rules to check data then bad data push into DQ data base for reporting and correcting .Bad data Automatically be corrected or tolerated if it can be needed. User Can get data via several front-end tools and applications. Some Application operate on Multidimensional format. So data in the DDS is Loaded to Multidimensional database( MDB ). Multidimensional is a form of database that data is stored like a cub. Cells of cube represent number of variable which is called Dimensions. Value of dimension show when and where business event happened. ETL system is managed by Control system based on the rules in the metadata. Metadata is a database that contain information about the data structure data usage-quality rules and other information about data. Audit system is used for understanding what happen during ETL process and then logs system oprenation into Metadata database. The Data is examined to realize characteristic of data by data Profiler. Data profiler analyze data to find out that for example how many row does Table has? And which one is Null and so on. Spreadsheet Source system are OLTP system that contain data which is loaded to DW . OLTP : Capture and store Business Transaction online. Pivot tables Ad Hoc query Metadata Control system+Audit Data Profiler reports Source System stage DDS ETL DQ ETL MDB analytic DQ reports correction Data Mining Other BI Application reports

  6. Extract data from several source system • Push it in stage area. Stage area could be a database or files system. • Stage is necessary because of lacking memory space and so on . Single DDS Application Metadata Control system+Audit s1 DDS Source System ETL+DQ Advantage of Single DDS issimpleto design , because the data from the stage is loaded straight into the dimensional datastore, without going to any kind ofnormalizedstore.It is good for system which has just one source or just has one dimension. The mainِDisadvantage is that it ismore difficult, in this architecture, to create a secondDDS. The DDS in the single DDS architectureis the master data store. s2 Stage MDB • Control system + Audit manage ETL system concurrently . • Log ETL process to Metadata file or database • Metadata contain Data Structure and data processing within data warehouse • Second ETL package pick up data from Stage and Integrates them. • Apply some Data Quality rules • Puts consolidated data into a DDS Application

  7. NDS + DDS Control system+Audit Metadata Application • Data storage = Stage, NDS & DDS • Core DW Store = Normalized & Dimensional Format • Data Marts = 1 to N Data Marts in each DDS • ETL Engine = 4 ETL Package • NDS Contain Master table and transaction Table • Master Table  Dimensions in DDS • Transaction table  Facts in DDS s1 DDS s2 NDS-ETL +DQ NDS DDS-ETL DDS s2 Stage Application MDB • NDS is the in front of DDS and NDS is our master data .Master data contain all historical nadstructral data . • DDS is our Transactional data and just could contain Single years of data . Application

  8. We have got • Data storage = Stage, ODS & DDS • Core DW Store = Normalized & Dimensional Format • Data Marts = 1 to N Data Marts in each DDS • ETL Engine = 4 ETL Package • ODS Contain Master table and transaction Table but it is not Master data store • Master Table  Dimensions in DDS • Transaction table  Facts in DDS The advantage of this architecture is that The third normal form is slimmer than the NDS because it contains only current values. In this architecture we have a central place to integrate, maintain, and publish master data. The normalized relational store is updatable by the user application. The mainِDisadvantage is that it ismore difficult, in this architecture, to create a secondDDS. The DDS in the single DDS architectureis the master data store. ODS + DDS Control system+Audit Metadata Application s1 DDS s2 ODS-ETL +DQ ODS DDS-ETL DDS s2 Stage Application MDB Application ODS is hybrid data store so User can access data from ODS

  9. Federated DW FDW FDW FDW Application Application Application ETL EII ETL DW3 DM3 DW2 DW1 DW2 DW3 DM2 DW1 DM1 Relational DW • EII(Extract Information Integration) • is a method to integrate data by accessing different source systems online and aggregating the outputs on the fly before bringing the end result to the user. • All 3 DWs must be standardized as the same structure. Third normalized DW • Data marts in the same Data warehouse is nonintegrated data marts. • They can be dimensional, normalized, or neither Dimensional DW • The FDW ETL needs to match the Updating time frequency of the source DWs. • The FDW ETL needs to integrate the data from source DWs based on business rules. • Duplicate records need to be merged. • Subject area in here is very narrow that the source DWs.

  10. System Architecture

  11. ETL Architectures

  12. Main Issues that must be considered There are two different types of database software • Symmetric multiprocessing(SMP) It is a databassystem that runs on one or more machines with several identical processors sharing thesame disk storage. The database is physically located in a single disk storage system.Examples of SMPdatabase systems are SQL Server, Oracle, DB/2, Informix, and Sybase 2. Massively parallel processing (MPP) It is a database system that 20uns on more than one machine where eachmachine has its own disk storage. The database is physically located in several disk storagesystems that are interconnected to each other. An Examples of MPP database systems are Teradata, Neoview, Netezza. MPP database systemis faster and more scalable than an SMP database system. In an MPP database system, atable is physically located in several nodes, each with its own storage.

  13. Research challenges (1) • Spatial measure aggregations considering • Their types • Distributive – reuse of aggregates, e.g., spatial union • Algebraic – additional treatments for reusing aggregates, e.g., center of n points • Holistic - new calculation with a row data, e.g., equi-partition • Topological relationships between hierarchy levels • Types of hierarchies

More Related