6011SUMCOM Advanced e Technology Work Shop. Lecture 02 Introducing Business Intelligence. 2. Cube Architecture. BI supports different types of data storage. Some are based on a database called a Cube.
An Image/Link below is provided (as is) to download presentationDownload Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.Content is provided to you AS IS for your information and personal use only. Download presentation by click this link.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.During download, if you can't get a presentation, the file might be deleted by the publisher.
E N D
Presentation Transcript
6011SUMCOM Advanced eTechnology Work Shop
Lecture 02Introducing Business Intelligence
2
Cube Architecture BI supports different types of data storage. Some are based on a database called a Cube. A Cube is a subset of data from a Data WareHouse that is organized into multi-dimensional structures. To define a Cube. Select a Fact Table and identify the numerical columns (measures) of interest within it. Select Dimension Tables that provide descriptions for the set of data to be analyzed.
Cube (Models,Regions,Quarters) Regions US Europe Asia Models Elise Exige Evora Cars_Sold Total_Sales Costs Quarters 1st 2nd 3rd
Cube Architecture ...cont Each dimension has discrete members. i.e. Region may contain US, Europe & Asia. Additionally, each Cube Dimension can have a hierarchy of levels that allow users to ask questions at a more specific level. i.e. Region may include Country & City... ...similarly, Quarters may include Week & Day.
Cube Architecture ...cont Cubes & multi-dimensional structures are managed by a special database system called an MDBMS (Multi-DimensionalDatabaseManagementSystem). SQL Server calls this AnalysisServices.
Aggregations Data is stored in the Fact Table in its most detailed form so reports can make use of it. However, a typical query on a Fact Table fetches many, perhaps millions of rows. The only practical operation is to apply an aggregate (sum, max, avg). This different use of data can reduce ad hoc query performance if executed on low level data, because resource intensive calculations are required.
Aggregations ...cont As a result, low level data from the Fact Table should be summarized and stored in advance. Such tables are called Aggregate Tables and the whole process is called aggregation. Take an ad hoc query that displays a month’s total sales. We must sum every sale for every day in the month. If there are 500 sales per day in 500 stores, this query = 7,500,000 (500 x 500 x 30) rows. If the data is aggregated using monthly sales by store, the table will have only 500 rows (the monthly total for each of 500 stores).
Aggregation Amount There are two extreme solutions to this. No aggregation at all. Exhaustive aggregation for every possible combination of queries that users will require. No aggregation at all is out of the question due to the massive performance problems. A Data WareHouse without any aggregation is probably unusable as a production data store.
Aggregation Amount ...cont The opposite solution is also not acceptable. The disk space to store Aggregate Data. The maintenance of Aggregate Tables. The initial Data Load is too lengthy. Thus, Aggregate Tables should be carefully planned and constructed. During the planning phase, consider... Where is the data concentrated? Which aggregates most improve performance?
Aggregation Amount ...cont The planning of Aggregate Tables depends on the concentration of data in the columns of the Fact Table. In a Data WareHouse, when there is no activity on a given day, the rows are not stored at all. When the system loads many rows compared to the total rows that can be loaded, aggregating those columns will improve performance. In contrast, when the system loads few rows compared to the total rows that can be loaded, aggregating those columns is not efficient.
Aggregation Amount ...cont The choice of aggregates that would most improve performance depends on end users. Thus, at the start of a BI project, end users should be consulted to gain an insight on how data will be queried and how many rows will be retrieved by those queries.
Physical Cube Storage OLAP (OnLineAnalyticalProcessing) systems use one of these architectures to store data. Relational OLAP (ROLAP) Multi-Dimensional OLAP (MOLAP) Hybrid OLAP (HOLAP) Generally, these three vary the way in which they store the cube measure groups (non-key columns in the Fact Table) and aggregations. Despite this they are logically equivalent.
Physical Cube Storage ...cont In ROLAP pre-computed data is not stored. Queries access the Relational Database in order to bring back the data required to answer the end users questions. No duplicate data which saves storage space. In MOLAP pre-computed data is stored in a Cube. Queries do not have to calculate the end users questions each time they are required so the query response may be faster than in ROLAP.
Physical Cube Storage ...cont HOLAP storage is a combination of the MOLAP and ROLAP storage architectures. Any aggregations are stored using MOLAP. Thus, for queries that use aggregations there is no difference in performance with MOLAP. The cube measure groups remain in the Relational Database as with ROLAP.