1 / 10

Data Warehousing & OLAP

Data Warehousing & OLAP. Nuosang Du Jon B. Arnason CSCI 5707 November 19, 2013. 1 st affect

tiara
Download Presentation

Data Warehousing & OLAP

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Data Warehousing & OLAP Nuosang Du Jon B. Arnason CSCI 5707 November 19, 2013

  2. 1st affect The on-line analytical processing has some advantages such as flexible analysis function, intuitive data manipulation and the visualization to analysis results which makes easy and efficient to the analysis based on a complex large number of data. Users can make the right judgments quickly. It can be used to confirm the proposed complex hypothesis, the result is in the form of graphics or tables. It doesn’t mark the abnormal information. OLAP should be based on a large amount of historical data on different time points and the complex analysis of multidimensional and integrated information. OLAP requires that the user has subjective information requirements definition, so the system efficiency is better. 2nd origin Is the concept of on-line analytical processing (OLAP), as the farther of the relational database E.F. Codd proposed in 1993, he also put forward 12 rules of OLAP, which caused great repercussions. Rule 1: OLAP model must provide multidimensional conceptual view Rule 2: Transparency Rule 3: Access ability Rule 4: Stable report ability Rule 5: Client/server architecture Rule 6: Equal dimension Rule 7: Dynamic sparse matrix processing Rule 8: Multi-user support ability Rule 9: Unlimited operating across dimensions Rule 10: Direct manipulation of data

  3. Rule 11: Flexible report generation Rule 12: Unlimited dimension and gathering II_ID Des J Prod_ID B_ID PSC_ID J PSC_ID PSC_Des PSC_ID PC_ID J PC_ID PC_Des J Cus_ID Name CCat_ID J CCat_ID CCat_Des J City_ID Cname Pro_ID J Pro_ID PName J Region_ID Reg_Des City_ID J Week_No Jjjjj jjjjj Provience Week Snow-flake Schema Time_ID Workday Week_No City Sales ID Region_ID Time_ID Product_ID Customer_ID Revenue Quantity_Sold Day Sales Region Fact Table Time Dimension Geography Dimension Brand Customer Dimension Product Dimension Customer Product Product Sub_Category Customer Category Product Category

  4. 3rd Category Today’s data processing can be divided into two categories: online transaction processing (OLTP), on-line Analytical processing (OLAP). OLTP is traditional relational database application for basic and daily transaction processing, such as bank transactions. OLAP is the main application of data warehouse system which supporting complex analysis operation, focusing on the decision supporting, it provides straightforward query results. The following table lists the comparison between OLTP and OLAP. user function database design data access work for size of database

  5. Customer_ID Customer Name Customer Category 4th Development background (You can find them on the internet easily.) 5thFunction In general, data warehouse system is a comprehensive enterprise database which can carry on the fast and accurate analysis to a large number of data for making better business decisions. It consists of three parts:

  6. 1. The data layer: the implementation of enterprise operating data extraction, transformation, cleaning and summary, forming the information, and storing in the database. 2. The application layer: through the on-line analytical processing, and even data mining application processing, realize the analysis of the data. 3. The presentation layer: through the front desk analysis tool, the query statements, statistical analysis, and the conclusion of multi-dimensional online analysis and data mining, it can be shown in front of the user. 6 Concepts OLAP is shown in front of the user its multidimensional view. D (Dimension): is the observation data of a particular perspective, it can be considered as a class attribute, the attribute sets from a D (time Dimension, etc.). D Level (Level): is the observation data of a particular perspective (i.e., a D), can also be different in every detail description field (time D: date, month, quarter). D (Member): a dimension value is the description of the data item in one dimensional position. (“yy/mm/dd” is a description of the position on the time dimension). M (Measure): values of multidimensional array. (Sep 2013, Minneapolis, 5707, database principle). Operation of OLAP multidimensional analysis have Drill (Drill up and Drill down), Slice, Dice and Pivot, etc. Drill: is to change the dimensional hierarchy and the analysis granularity. It includes the drill down and drill up. Drill up is the low level of detail data will be summarized to a high level of summary data, or reducing dimension; and drill down, on the other hand, transforms from the summary data into the detail data to observe or add new dimensions.

  7. Slice and dice: Given selected values in a part of D, we are concerned with measurement data in the distribution of the remaining dimensions. If the rest only has two dimensions it is called sliced; if it has three or more, it is called cut. Pivot: The transformation of dimension in the direction, reschedule the placement of D in the tables (for example, the swap of columns and rows). 7th Architecture The architecture of a data warehouse and OLAP are a complementary relationship. The modern OLAP system is based on data warehousing, in other words, generally drawn for a subset of the detailed data from the data warehouse and after the necessary gathering of the storage in the OLAP memory for the reading of the front-end analysis tools. A typical OLAP system architecture as shown below: OLAP system according to the data storage format can be divided into relational-OLAP(ROLAP), multidimensional OLAP(MOLAP) and hybrid-OLAP(HOLAP) three types. ROLAP MOLAP R: Use existing relational database technology M: Especially designed for the on-line analytical processing R: The response speed is slower than the MOLAP; M: Existing relational database for OLAP already do a lot of optimization, including parallel storage, parallel query, parallel data management, query optimization based on the cost, the bitmap index, resulting an improved performance. R: Data loading speed is fast M: Data loading speed is slow

  8. R: Storage cost is small, there is no limit to the dimension M: Need to calculate, may lead to data explosion, dimension is limited; failing to support the dynamic change of dimension R: Use RDBMS stored data, there is no file size limit M: The file size be limited in the operating system platform (only 10~20 g) R: Can be realized through SQL detailed data and summary data storage M: Lacking the standards of data model and data access The realization of the on-line analytical processing has three different methods: 1. ROLAP 2. MOLAP 3. Front-end display, on-line analytical processing (Desktop OLAP) Among these, Desktop OLAP needs to download all the data to the client, and then take report format/data structure reorganization on the client. The user can realize the dynamic analysis in the native. The method is more flexible, but it can support the very limited amount of data which seriously affect the scope and efficiency of use. So it has been eliminated now. You can find some examples below: Hyperion Oracle Cognos MicroStrategy IBM Brio

  9. Widget: Silverlight Winforms Based on a CUBE calculation and analysis.

  10. Bibliography Mailvaganam, Hari. “Introduction to OLAP”. www.dwreview.com/OLAP/Introduction_OLAP.html Chaudhuri, Surajit; Dayal, Umeshwar; Narasayya, Vivek. “An Overview of Business Intelligence”. Communications Of The ACM, Vol. 54, No. 8, August 2011.

More Related