1 / 52

Understanding OLAP in Data Warehouse

Learn about OLAP features, dimensional analysis, models, and implementation in a data warehouse environment. Discover how OLAP supports strategic decision-making and multidimensional analysis.

ccorley
Download Presentation

Understanding OLAP in Data Warehouse

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. OLAP IN THE DATA WAREHOUSE T.ROKAYAH BAYAN

  2. CHAPTER OBJECTIVES • Review the major features and functions of OLAP in detail • Grasp the intricacies of dimensional analysis and learn the meanings of hypercubes,drill-down and roll-up, and slice-and-dice • Examine the different OLAP models and determine which model is suitable for your environment • Consider OLAP implementation by studying the steps and the tools

  3. Introduction • OLAP stand for online analytical processing. • The term OLAP or online analytical processing was introduced in a paper entitled “Providing On-Line Analytical Processing to User Analysts,” by Dr. E. F. Codd, the acknowledged “father” of the relational database model. • The paper, published in 1993, defined 12 rules or guidelines for an OLAP system. • As the name implies,OLAP has to do with the processing of data as it is manipulated for analysis. • data warehouse provides the best opportunity for analysis and OLAP is the vehicle for carrying out involved analysis. • In today’s data warehousing environment, with such huge progress in analysis tools from various vendors, you cannot have a data warehouse without OLAP.

  4. OLAP definition: On-Line Analytical Processing (OLAP) is a category of software technology that enables analysts, managers and executives to gain insight into data through fast, consistent, interactive access in a wide variety of possible views of information that has been transformed from raw data to reflect the real dimensionality of the enterprise as understood by the user.

  5. DEMAND FOR ONLINE ANALYTICAL PROCESSING • data warehouse is meant for performing substantial analysis using the available data. • The analysis leads to strategic decisions that are the major reasons for building data warehouses in the first place. • For performing meaningful analysis, data must be cast in a way suitable for analysis of the values of key indicators over time along business dimensions. • the traditional methods of analysis provided in a data warehouse are not sufficient and perceive what exactly is demanded by the users to stay competitive and to expand. • Need for Multidimensional Analysis

  6. Example : • Imagine a business analyst looking for reasons why profitability dipped sharply in the recent months in the entire enterprise. The analyst starts this analysis by querying for the overall sales for the last five months for the entire company, broken down by individual months. The analyst notices that although the sales do not show a drop, there is a sharp reduction in profitability for the last three months. The analysis proceeds further when the analyst wants to find out which countries show reductions. The analyst requests a breakdown of sales by major worldwide regions and notes that the European region is responsible for the reduction in profitability. Now the analyst senses that clues are becoming more pronounced and looks for a breakdown of the European sales by individual countries. The analyst finds that the profitability has increased for a few countries, decreased sharply for some other countries, and been stable for the rest.

  7. Figure 15-1 showing the steps through the single analysis session.

  8. Example : discuss • How many steps are there? • Many steps, but a single analysis session and train of thought. • Each step in this train of thought constitutes a query. • The analyst formulates each query, executes it, waits for the result set to appear on the screen, and studies the result set. • Each query is interactive because the result set from one query forms the basis for the next query. • Did you notice that none of the queries in the above analysis session included any serious calculations? • This is not typical. In a real-world analysis session, many of the queries require calculations, sometimes complex calculations.

  9. OLAP is the Answer • the tools being used in the OLTP and basic data warehouse environments do not match up to the task. • We need different set of tools and products that are specifically meant for serious analysis. We need OLAP in the data warehouse.

  10. guidelines for an OLAP system The initial twelve guidelines for an OLAP system: • Multidimensional Conceptual View: Provide a multidimensional data model that is intuitively analytical and easy to use. Business users’ view of an enterprise is multidimensional in nature. Therefore, a multidimensional data model conforms to how the users perceive business problems. • Transparency. Make the technology, underlying data repository, computing architecture, and the diverse nature of source data totally transparent to users. Such transparency, supporting a true open system approach, helps to enhance the efficiency and productivity of the users through front-end tools that are familiar to them.

  11. guidelines for an OLAP system • Accessibility. Provide access only to the data that is actually needed to perform the specific analysis, presenting a single, coherent, and consistent view to the users. The OLAP system must map its own logical schema to the heterogeneous physical data stores and perform any necessary transformations. • Consistent Reporting Performance. Ensure that the users do not experience any significant degradation in reporting performance as the number of dimensions or the size of the database increases. Users must perceive consistent run time, response time, or machine utilization every time a given query is run.

  12. guidelines for an OLAP system • Client/Server Architecture. Conform the system to the principles of client/server architecture for better performance, flexibility, adaptability, and interoperability. Make the server component sufficiently intelligent to enable various clients to be attached with a minimum of effort and integration programming. • Generic Dimensionality. Ensure that every data dimension is equivalent in both structure and operational capabilities. Have one logical structure for all dimensions. The basic data structure or the access techniques must not be biased toward any single data dimension.

  13. guidelines for an OLAP system • Unrestricted Cross-dimensional Operations. Provide ability for the system to recognize dimensional hierarchies and automatically perform roll-up and drill-down operations within a dimension or across dimensions. Have the interface language allow calculations and data manipulations across any number of data dimensions, without restricting any relations between data cells, regardless of the number of common data attributes each cell contains. • Intuitive Data Manipulation. Enable consolidation path reorientation (pivoting),drill-down and roll-up, and other manipulations to be accomplished intuitively and directly via point-and-click and drag-and-drop actions on the cells of the analytical model. Avoid the use of a menu or multiple trips to a user interface.

  14. guidelines for an OLAP system • Dynamic Sparse Matrix Handling. Adapt the physical schema to the specific analytical model being created and loaded that optimizes sparse matrix handling. When encountering a sparse matrix, the system must be able to dynamically deduce the distribution of the data and adjust the storage and access to achieve and maintain consistent level of performance. • Multiuser Support. Provide support for end users to work concurrently with either the same analytical model or to create different models from the same data. In short, provide concurrent data access, data integrity, and access security.

  15. OLAP advantages • Enables analysts, executives, and managers to gain useful insights from the presentation • of data. • Can reorganize metrics along several dimensions and allow data to be viewed from • different perspectives. • Supports multidimensional analysis. • Is able to drill down or roll up within each dimension. • Is capable of applying mathematical formulas and calculations to measures. • Provides fast response, facilitating speed-of-thought analysis. • Complements the use of other information delivery techniques such as data mining. • Improves the comprehension of result sets through visual presentations using • graphs and charts. • Can be implemented on the Web. • Designed for highly interactive analysis

  16. OLAP Applications • Finance: Budgeting, activity-based costing, financial performance analysis, and financial modeling. • Sales: Sales analysis and sales forecasting. • Marketing: Market research analysis, sales forecasting, promotions analysis, customer analysis, and market/customer segmentation. • Manufacturing: Production planning and defect analysis.

  17. OLAP Key Features • Multi-dimensional views of data. • Support for complex calculations. • Time Intelligence.

  18. Representation of Multi-Dimensional Data • OLAP database servers use multi-dimensional structures to store data and relationships between data. • Multi-dimensional structures are best-visualized as cubes of data, and cubes within cubes of data. Each side of a cube is a dimension.

  19. Representation of Multi-Dimensional Data • Multi-dimensional databases are a compact and easy-to-understand way of visualizing and manipulating data elements that have many inter-relationships. • The cube can be expanded to include another dimension, for example, the number of sales staff in each city. • The response time of a multi-dimensional query depends on how many cells have to be added on-the-fly. • As the number of dimensions increases, the number of cube’s cells increases exponentially.

  20. Lattice of Cuboids all 0-D(apex) cuboid time item location supplier 1-D cuboids time,item time,location item,location location,supplier 2-D cuboids time,supplier item,supplier time,location,supplier time,item,location 3-D cuboids item,location,supplier time,item,supplier 4-D(base) cuboid time, item, location, supplier

  21. CUBE day 2 day 1 Multi-dimensional cube: Fact table view: dimensions = 3

  22. Aggregates • Add up amounts for day 1 • In SQL: SELECT sum(amt) FROM SALE • WHERE date = 1 81

  23. Aggregates • Add up amounts by day • In SQL: SELECT date, sum(amt) FROM SALE • GROUP BY date

  24. Aggregates • Operators: sum, count, max, min, median, avg • “Having” clause • Using dimension hierarchy • average by region (within store) • maximum by month (within date)

  25. Cube Aggregation rollup drill-down Example: computing sums day 2 . . . day 1 129

  26. Cube Operators day 2 . . . day 1 sale(c1,*,*) 129 sale(c2,p2,*) sale(*,*,*) sale(*,p1,*)

  27. Extended Cube * day 2 sale(*,p2,*) day 1

  28. Aggregation Using Hierarchies day 2 day 1 customer region country (customer c1 in Region A; customers c2, c3 in Region B)

  29. Pivoting day 2 day 1 Fact table view: Multi-dimensional cube:

  30. Cube Aggregates Lattice day 2 day 1 129 all city product date city, product city, date product, date use greedy algorithm to decide what to materialize city, product, date

  31. Dimension Hierarchies all state city

  32. Dimension Hierarchies all product city date product, date city, product city, date state city, product, date state, date state, product state, product, date not all arcs shown...

  33. Interesting Hierarchy all years weeks quarters conceptual dimension table months days

  34. SAMPLE CUBE Date 2Qtr 1Qtr sum 3Qtr 4Qtr TV Product U.S.A PC VCR sum Canada Country Mexico sum All, All, All Total annual sales of TV in U.S.A. Total annual sales of PC in U.S.A. Total sales In U.S.A Total annual sales of VCR in U.S.A. Total Q1 sales In U.S.A Total sales In Canada Total Q1 sales In Canada Total sales In Mexico Total Q1 sales In Mexico Total Q2 sales In all countries TOTAL SALES Total Q1 sales In all countries

  35. Typical OLAP Operations • Roll up (drill-up): summarize data • by climbing up hierarchy or by dimension reduction • Drill down (roll down): reverse of roll-up • from higher level summary to lower level summary or detailed data, or introducing new dimensions • Slice and dice: • project and select • Pivot (rotate): • reorient the cube, visualization, 3D to series of 2D planes. • Other operations • drill through: through the bottom level of the cube to its back-end relational tables (using SQL)

  36. SampleOLAP Drill down onlinereport

  37. Cube Operation • Cube definition and computation in OLAP • define cube sales[item, city, year]: sum(sales_in_dollars) • compute cube sales • Transform it into a SQL-like language (with a new operator cube by) SELECT item, city, year, SUM (amount) FROM SALES CUBE BY item, city, year • Need compute the following Group-Bys (date, product, customer), (date,product),(date, customer), (product, customer), (date), (product), (customer) () () (city) (item) (year) (city, item) (city, year) (item, year) (city, item, year)

  38. Roll-up and Drill-down The roll-up operation performs aggregation on a data cube, either by climbing up a concept hierarchy for a dimension or by dimension reduction such that one or more dimensions are removed from the given cube. Drill-down is the reverse of roll-up. It navigates from less detailed data to more detailed data. Drill-down can be realized by either stepping down a concept hierarchy for a dimension or introducing additional dimensions.

  39. Slice and dice The slice operation performs a selection on one dimension of the given cube, resulting in a sub_cube. The dice operation defines a sub_cube by performing a selection on two or more dimensions.

  40. Drill-Down

  41. Roll-Up

  42. Slicing

  43. Dicing (Sub-cube)

  44. Other OLAP Operations • Drill-Across: Queries involving more than one fact table • Drill-Through: Makes use of SQL to drill through the bottom level of a data cube down to its back-end relational tables • Pivot (rotate): Pivot (also called "rotate") is a • visualization operation which rotates the data axes in • view in order to provide an alternative presentation of • the data. Other examples include rotating the axes in a • 3-D cube, or transforming a 3-D cube into a series of 2- • D planes.

  45. Other OLAP Operations • Moving Averages • Growth Rates • Depreciation • Currency Conversion • Statistical Functions • Top N or Bottom N queries

More Related