330 likes | 553 Views
Data Warehouse [ Example ]. J. Han and M. Kamber , Data Mining: Concepts and Techniques, Morgan Kaufmann, 2001, ISBN 1558604898 . OLTP. it design for optimal transaction. OLAP. It design to give overview analysis of what happened! It is uses to built report answer the following :
E N D
Data Warehouse [ Example ] J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann, 2001, ISBN 1558604898 Data Mining: Concepts and Techniques
OLTP it design for optimal transaction Data Mining: Concepts and Techniques
OLAP • It design to give overview analysis of what happened! • It is uses to built report answer the following : • Q1: who the supervisor that gave most discount? • Q2:in which Zip code did product a sell the most? To answer the questions, OLAP Cube are created. Data Mining: Concepts and Techniques
Example Assume we have made a record of the weather conditions during a two-week period, along with the decisions of a tennis player whether or not to play tennis on each particular day. We have values of four independent variables (outlook, temperature, humidity, windy) and one dependent variable (play) Consider our data stored in a relational table as follows: Data Mining: Concepts and Techniques
Example (cont.) Data Mining: Concepts and Techniques
Example (cont.) • By querying a DBMS containing the above table we may answer questions like: • What was the temperature in the sunny days? {85, 80, 72, 69, 75} • Which days the humidity was less than 75? {6, 7, 9, 11} • Which days the temperature was greater than 70? {1, 2, 3, 8, 10, 11, 12, 13, 14} • Which days the temperature was greater than 70 and the humidity was less than 75? The intersection of the above two: {11} Data Mining: Concepts and Techniques
Example (cont.) OLAP: Using OLAP we can create a Multidimensional Model of our data (Data Cube). For example using the dimensions: time, outlook and playwe can create the following model. Data Mining: Concepts and Techniques
Example (cont.) Obviously here timerepresents the days grouped in weeks (week 1 - days 1, 2, 3, 4, 5, 6, 7; week 2 - days 8, 9, 10, 11, 12, 13, 14) over the vertical axis. The outlookis shown along the horizontal axis and the third dimension playis shown in each individual cell as a pair of values corresponding to the two values along this dimension - yes / no. Thus in the upper left corner of the cube we have the total over all weeks and all outlook values. Data Mining: Concepts and Techniques
Example (cont.) By apply "Drill-down" to our data cube over the time dimension. This assumes the existence of a concept hierarchy for this attribute. We can show this as a horizontal tree as follows: Data Mining: Concepts and Techniques
Example (cont.) Data Mining: Concepts and Techniques
Example (cont.) The drill-down operation is based on climbing down the concept hierarchy, so that we get the following data cube: Data Mining: Concepts and Techniques
Multidimensional data model By using same example and change some values: • play has just two values - yes and no, it can replace them by 1 and 0 • This will allows us to add up values and thus get the total number of days when tennis was played and at the same time the number of days tennis was not played • Rename the day attribute into time, which is more general and will allow us to use other time units (e.g. weeks). Thus we get the following relational table: Data Mining: Concepts and Techniques
Multidimensional data model (cont.) Data Mining: Concepts and Techniques
Concept hierarchies 1- attributes day, temperature and humidity we can group values in subsets and name these subsets as following : Day: all ______|_________ | | week 1 week 2 _____|_____ _______|_______ | | | | | | | | | | | | | | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 Data Mining: Concepts and Techniques
Concept hierarchies (cont.) Temperature: all ____________|_____________ | | | hot mild cool _ |___ __|____ ___|____ | | | | | | | | | | | | 80 81 83 85 70 71 72 75 64 65 68 69 Data Mining: Concepts and Techniques
Concept hierarchies (cont.) Humidity: all ___|___________ | | high normal ______|_______ ___|____ | | | | | | | | | | 85 86 90 91 95 96 65 70 75 80 Data Mining: Concepts and Techniques
Concept hierarchies (cont.) We may also extend the sets of numbers or replace them with intervals, which will make the hierarchy complete (covering all possible values). For example, humiditymay look like this: all ____|____ | | high normal | | [85,96] [65,84] Data Mining: Concepts and Techniques
Concept hierarchies (cont.) 2- For the nominal (non numeric) attributes outlook and windy we define one-level hierarchies, as their values cannot be ordered or grouped. outlook: all _______|________ | | | sunny rainy overcast Data Mining: Concepts and Techniques
Concept hierarchies (cont.) windy: all ___|____ | | true false Data Mining: Concepts and Techniques
Data cube • The number of dimensions define the total number of data cubes that can be created. number of elements is 2N elements; N is an number attributes Data Mining: Concepts and Techniques
Data cube (cont.) To create a data cube we have to: 1- Select dimensions, that is select a subset of attributes. • For example, select time and temperature. Thus we will create a two-dimensional data cube. 2- Select levels in the concept hierarchies. • For example, let us select weeks for time and degrees for temperature. 3- Select a measure to populate the cube. This is the attribute whose values will be aggregated across the dimensions (obviously it has to be numeric). • For example, Let us select play. Data Mining: Concepts and Techniques
Data cube (cont.) • The numbers in the internal cells are obtained by adding up the values of the play attribute, where the time and the temperature attribute are equal to the values in the corresponding row and column • For example the value 2 (row 2, column 8) means that tennis was played two days during week 2 when the temperature was 75. By placing the time values in the rows and the temperature values in the columns we get the following cube: Data Mining: Concepts and Techniques
OLAP operations Rollup: assume we want to change the level that we selected for the temperature hierarchy to the intermediate level (hot, mild, cool). Roll up produces the following cube: Data Mining: Concepts and Techniques
OLAP operations (cont.) Drill-down the drill down of the pervious data cube over the time dimension produces the following: Data Mining: Concepts and Techniques
OLAP operations (cont.) Data Mining: Concepts and Techniques
Lattice of cubes, slice and dice operations Lattice : there are five dimension: Time, outlook, temperature, humidity, windy. Data Mining: Concepts and Techniques
Lattice of cubes, slice and dice operations (cont.) • 0-D (apex) cuboids : { all} • 1-D cuboids:{ Time}, {Outlook}, {Temperature}, {Humidity}, { Windy} • 2-D cuboids: { {Time, Outlook}, {Time, Temperature}, { Time, Humidity}, {Time, Windy}, {Outlook, Temperature}, {Outlook, Humidity}, {Outlook, Windy}, {Temperature, Humidity}, { Temperature, Windy}, {Humidity, Windy} } Data Mining: Concepts and Techniques
Lattice of cubes, slice and dice operations (cont.) • 3- D Cuboids :{ { Time, Outlook, Temperature}, {Time, Outlook, Humidity}, {Time, Outlook, Windy}, {Time, Temperature, Humidity}, {Time, Temperature, Windy}, {Time, Humidity, Windy} {Outlook, Temperature, Humidity}, { Outlook, Temperature, Windy}, {Outlook, Humidity, Windy} {Temperature, Humidity, Windy} } Data Mining: Concepts and Techniques
Lattice of cubes, slice and dice operations (cont.) • 4-D cuboids: { { Time, Outlook, Temperature, Humidity}, {Time, Outlook, Temperature, Windy}, {Time, Outlook, Humidity, Windy}, {Time, Temperature, Humidity, Windy} {Outlook, Temperature, Humidity, Windy} } • 5- D cuboids { Time, Outlook, Temperature, Humidity, Windy} Data Mining: Concepts and Techniques
Lattice of cubes, slice and dice operations (cont.) • There are two other OLAP operations that are related to the selection of a cube - slice and dice. Slice : performs a selection on one dimension of the given cube, thus resulting in a subcube. • For example, if we make the selection (temperature=cool) we will reduce the dimensions of the cube from two to one, resulting in just a single column from the pervious tables. So, the result will be as following: Data Mining: Concepts and Techniques
Lattice of cubes, slice and dice operations (cont.) Data Mining: Concepts and Techniques
Lattice of cubes, slice and dice operations (cont.) • The dice operation works similarly and performs a selection on two or more dimensions. • For example, applying the selection (time = day 3 OR time = day 4) AND (temperature = cool OR temperature = hot) to the original cube we get the following subcube (still two-dimensional): Data Mining: Concepts and Techniques
The End Data Mining: Concepts and Techniques