840 likes | 1.04k Views
Chapter 34. OLAP Transparencies. Chapter 34 - Objectives. The purpose of Online Analytical Processing (OLAP). The relationship between OLAP and data warehousing. The key features of OLAP applications. Chapter 34 - Objectives. How to represent multi-dimensional data.
E N D
Chapter 34 OLAP Transparencies
Chapter 34 - Objectives • The purpose of Online Analytical Processing (OLAP). • The relationship between OLAP and data warehousing. • The key features of OLAP applications.
Chapter 34 - Objectives • How to represent multi-dimensional data. • The rules for OLAP tools. • The main categories of OLAP tools. • OLAP extensions to the SQL standard. • How Oracle supports OLAP.
Business Intelligence Technologies • Accompanying the growth in data warehousing is an ever-increasing demand by users for more powerful access tools that provide advanced analytical capabilities. • There are two main types of access tools available to meet this demand, namely Online Analytical Processing (OLAP) and data mining.
Business Intelligence Technologies • OLAP and Data Mining differ in what they offer the user and because of this they are complementary technologies. • An environment that includes a data warehouse (or more commonly one or more data marts) together with tools such as OLAP and /or data mining are collectively referred to as Business Intelligence (BI) technologies.
Online Analytical Processing (OLAP) • Original definition - The dynamic synthesis, analysis, and consolidation of large volumes of multi-dimensional data, Codd (1993). • Describes a technology that is designed to optimize the storing and querying of large volumes of multi-dimensional data that is aggregated (summarized) to various levels of detail to support the analysis of this data.
Online Analytical Processing (OLAP) • Enables users to gain a deeper understanding and knowledge about various aspects of their corporate data through fast, consistent, interactive access to a wide variety of possible views of the data. • Allows users to view corporate data in such a way that it is a better model of the true dimensionality of the enterprise.
Online Analytical Processing (OLAP) • Can easily answer ‘who?’ and ‘what?’ questions, however, ability to answer ‘why?’ type questions distinguishes OLAP from general-purpose query tools. • Types of analysis ranges from basic navigation and browsing (slicing and dicing) to calculations, to more complex analyses such as time series and complex modeling.
OLAP Benchmarks • OLAP Council published an analytical processing benchmark referred to as the APB-1 (OLAP Council, 1998). • Aim is to measure a server’s overall OLAP performance rather than the performance of individual tasks.
OLAP Benchmarks • APB-1 assesses the most common business operations including: • bulk loading of data from internal or external data sources • incremental loading of data from operational systems; • aggregation of input level data along hierarchies;
OLAP Benchmarks • APB-1 assesses the most common business operations including (continued): • calculation of new data based on business models; • time series analysis; • queries with a high degree of complexity; • drill-down through hierarchies; • ad hoc queries; • multiple online sessions.
OLAP Benchmarks • OLAP applications are judged on their ability to provide just-in-time (JIT) information, a core requirement of supporting effective decision-making. • This requirement is more than measuring processing performance but includes its abilities to model complex business relationships and to respond to changing business requirements.
OLAP Benchmarks • APB-1 uses a standard benchmark metric called AQM (Analytical Queries per Minute). • AQM represents the number of analytical queries processed per minute including data loading and computation time. Thus, the AQM incorporates data loading performance, calculation performance, and query performance into a singe metric.
OLAP Benchmarks • Publication of APB-1 benchmark results must include both the database schema and all code required for executing the benchmark. • An essential requirement of all OLAP applications is the ability to provide users with JIT information, which is necessary to make effective decisions about an organization's strategic directions.
OLAP Applications • JIT information is computed data that usually reflects complex relationships and is often calculated on the fly. Also as data relationships may not be known in advance, the data model must be flexible.
OLAP Applications • Although OLAP applications are found in widely divergent functional areas, they all have the following key features: • multi-dimensional views of data • support for complex calculations • time intelligence
OLAP Applications - multi-dimensional views of data • Core requirement of building a ‘realistic’ business model. • Provides basis for analytical processing through flexible access to corporate data. • The underlying database design that provides the multi-dimensional view of data should treat all dimensions equally.
OLAP Applications - support for complex calculations • Must provide a range of powerful computational methods such as that required by sales forecasting, which uses trend algorithms such as moving averages and percentage growth. • Mechanisms for implementing computational methods should be clear and non-procedural.
OLAP Applications – time intelligence • Key feature of almost any analytical application as performance is almost always judged over time. • Time hierarchy is not always used in the same manner as other hierarchies. • Concepts such as year-to-date and period-over-period comparisons should be easily defined.
Multi-dimensional Data and OLAP cubes • Multi-dimensional data is facts (numeric measurements) such as property sales revenue data and the association of this data with dimensions such as location (of the property) and time (of the property sale). • Which is the best representation of multi-dimensional data: relational table, matrix or data cube?
Multi-dimensional data and OLAP cubes • We consider cubes as solid 3-D structures with equal sides. However, the OLAP cube is n-dimensional structure (with sides that need not be equal). • Alternative representation for n-dimensional data is to consider a data cube as a lattice of cuboids. Each cuboid represents a subset of the given dimensions.
0-D cuboid (highest-level) all 1-D cuboid time location type office 2-D cuboid time, location time, type time, office location, type location, office type, office 3-D cuboid time, location, type time, location, office time, type, office location, type, office 4-D cuboid (lowest-level) time, location, type, office Multi-dimensional data and OLAP cubes
Dimensionality Hierarchy • The lattice of cuboids does not show the hierarchies that are commonly associated with dimensions. • A dimensional hierarchy defines mappings from a set of lower-level concepts to higher level concepts.
country year region 2-D data quarter season city area month week zipCode day Dimensionality Hierarchy
Dimensional Operations • The analytical operations that can be performed on data cubes include: • Roll-up • Drill-down • Slice and Dice • Pivot
Dimensional Operations • Roll-up performs aggregations on the data by moving up the dimensional hierarchy or by dimensional reduction e.g. 4-D sales data to 3-D sales data. • Drill-down is the reverse of roll-up and involves revealing the detailed data that forms the aggregated data. Drill-down can be performed by moving down the dimensional hierarchy or by dimensional introduction e.g. 3-D sales data to 4-D sales data.
Dimensional Operations • Slice and dice - ability to look at data from different viewpoints. The slice operation performs a selection on one dimension of the data whereas dice uses two or more dimensions. For example a slice of sales revenue (type = ‘Flat’) and a dice (type = ‘Flat’ and time = ‘Q1’).
Dimensional Operations • Pivot - ability to rotate the data to provide an alternative view of the same data e.g. sales revenue data displayed using the location (city) as x-axis against time (quarter) as the y-axis can be rotated so that time (quarter) is the x-axis against location (city) is the y-axis.
OLAP Tools • There are many varieties of OLAP tools available in the marketplace. • This choice has resulted in some confusion with much debate regarding what OLAP actually means to a potential buyer and in particular what are the available architectures for OLAP tools.
Codd’s Rules for OLAP Systems • In 1993, E.F. Codd formulated twelve rules as the basis for selecting OLAP tools.
Codd’s Rules for OLAP Systems • Multi-dimensional conceptual view • Transparency • Accessibility • Consistent reporting performance • Client-server architecture • Generic dimensionality
Codd’s rules for OLAP • Dynamic sparse matrix handling • Multi-user support • Unrestricted cross-dimensional operations • Intuitive data manipulation • Flexible reporting • Unlimited dimensions and aggregation levels
Codd’s Rules for OLAP Systems • There are proposals to re-defined or extended the rules. For example to also include • Comprehensive database management tools • Ability to drill down to detail (source record) level • Incremental database refresh • SQL interface to the existing enterprise environment
Categories of OLAP Tools • OLAP tools are categorized according to the architecture used to store and process multi-dimensional data. • There are three main categories: • Multi-dimensional OLAP (MOLAP) • Relational OLAP (ROLAP) • Hybrid OLAP (HOLAP)
Multi-dimensional OLAP (MOLAP) • Use specialized data structures and multi-dimensional Database Management Systems (MDDBMSs) to organize, navigate, and analyze data. • Data is typically aggregated and stored according to predicted usage to enhance query performance.
Multi-dimensional OLAP (MOLAP) • Use array technology and efficient storage techniques that minimize the disk space requirements through sparse data management. • Provides excellent performance when data is used as designed, and the focus is on data for a specific decision-support application.
Multi-dimensional OLAP (MOLAP) • Traditionally, require a tight coupling with the application layer and presentation layer. • Recent trends segregate the OLAP from the data structures through the use of published application programming interfaces (APIs).
MOLAP Tools - Development Issues • Underlying data structures are limited in their ability to support multiple subject areas and to provide access to detailed data. • Navigation and analysis of data is limited because the data is designed according to previously determined requirements.
MOLAP Tools - Development Issues • MOLAP products require a different set of skills and tools to build and maintain the database, thus increasing the cost and complexity of support.
Relational OLAP (ROLAP) • Fastest-growing style of OLAP technology due to requirements to analyze ever-increasing amounts of data and the realization that users cannot store all the data they require in MOLAP databases.
Relational OLAP (ROLAP) • Supports RDBMS products using a metadata layer - avoids need to create a static multi-dimensional data structure - facilitates the creation of multiple multi-dimensional views of the two-dimensional relation.
Relational OLAP (ROLAP) • To improve performance, some products use SQL engines to support the complexity of multi-dimensional analysis, while others recommend, or require, the use of highly denormalized database designs such as the star schema.
ROLAP Tools - Development Issues • Performance problems associated with the processing of complex queries that require multiple passes through the relational data. • Middleware to facilitate the development of multi-dimensional applications. (Software that converts the two-dimensional relation into a multi-dimensional structure).
ROLAP Tools - Development Issues • Development of an option to create persistent, multi-dimensional structures with facilities to assist in the administration of these structures.