400 likes | 556 Views
Chapter 8. Accessing Organizational Information – Data Warehouse. Learning Outcomes. 8.1 Describe the roles and purposes of data warehouses and data marts in an organization
E N D
Chapter 8 Accessing Organizational Information – Data Warehouse
Learning Outcomes 8.1 Describe the roles and purposes of data warehouses and data marts in an organization 8.2 Compare the multidimensional nature of data warehouses (and data marts) with the two-dimensional nature of databases
Learning Outcomes 8.3 Identify the importance of ensuring the cleanliness of information throughout an organization 8.4 Explain the relationship between business intelligence and a data warehouse
Intro to Data Warehousing This video must be watched Data warehouse demystified
History of Data Warehousing • Data warehouses extend the transformation of data into information • In the 1990’s executives became less concerned with the day-to-day business operations and more concerned with overall business functions • The data warehouse provided the ability to support decision making without disrupting the day-to-day operations
Data Warehouse Fundamentals • Data warehouse – a logical collection of information – gathered from many different operational databases – that supports business analysis activities and decision-making tasks • The primary purpose of a data warehouse is to aggregate information throughout an organization into a single repository for decision-making purposes
Data Warehouse – Definitions • A data warehouse is a type of computer based information system developed to provide an organization with business intelligence to support decision making and to monitor the operations in a company. • Integrates data from many different sources and makes it available to end users in a what they can understand and use in a business context in a timely manner.
Primary difference between a database and data warehouse • database stores information for a single application, whereas a data warehouse stores information from multiple databases, or multiple applications, and external information such as industry information
Data Warehouse Fundamentals • Extraction, transformation, and loading (ETL) – a process that extracts information from internal and external databases, transforms the information using a common set of enterprise definitions, and loads the information into a data warehouse • Data mart – contains a subset of data warehouse information. The ETL process also gathers data from the data warehouse and passes it to the data marts
Data Warehouse: Extended Definition … … … Analysis Spreadsheets Charts Reporting Data Warehouse ExternalData OperationData ExternalData OperationData “Sales”
Components of a Data Warehouse Metadata means data about data
Multidimensional Analysis and Data Mining • Databases contain information in a series of two-dimensional tables • In a data warehouse and data mart, information is multidimensional, it contains layers of columns and rows
Dimensions • Dimensions could include such things as: • Products • Promotions • Stores • Category • Region • Stock price • Date • Time • Weather
slicing-and-dicing • The ability to look at information from different dimensions can add tremendous business insight • By slicing-and-dicing the information a business can uncover great unexpected insights
Multidimensional Analysis and Data Mining • Cube – common term for the representation of multidimensional information
Data Warehouses Are Multidimensional A Multidimensional Data Warehouse with Information from Multiple Operational Databases
Online analytical processing (OLAP) • OLAP is is an approach to quickly answer multi-dimensional analytical queries • The typical applications of OLAP are in business reporting for sales, marketing, management reporting
OLAP • The output of an OLAP query is typically displayed in a matrix (or pivot) format. • Excel pivot table is a simple analogy to OLAP.
Data Marts – Smaller Data Warehouses • Data mart - a subset of a data warehouse in which only a focused portion of the data warehouse information is kept.
Multidimensional Analysis and Data Mining • Data mining – the process of analyzing data to extract information not offered by the raw data alone • To perform data mining users need data-mining tools • Data-mining tool – uses a variety of techniques to find patterns and relationships in large volumes of information and infers rules that predict future behavior and guide decision making
Data-Mining Systems • Process data using statistical techniques like regression analysis and decision tree analysis • Look for patterns and relationships to anticipate events or predict outcomes • Example: Market-basket analysis – computes correlation of items on past orders to determine items that are frequently purchased together.
Decision Tree Figure CE14-3
Example of Data Mining • Several years ago, a large retailer implemented a data warehouse to analyze sales. • Loaded huge volumes (Terabytes) of Point of Sale data into the warehouse • Built an application, based on specialized ‘data mining’ software to perform ‘market basket analysis’ • What items are purchased with other items in the same in the same transactions
“Diapers and Beer” • Noticed some unusual correlations, one was many transactions where beer in same market basket as diapers • Analysis identified a ‘micro-segment’ of customer base – young fathers, buying diapers, deciding to get beer at same time • Based on information, retailer reorganized diaper aisle – placed beer at end on aisle • Beer sales increased.
Problems with Operational Data for BI • Raw data usually unsuitable for sophisticated reporting or data mining • Non-Standard • Dirty data ( gender, age, phone #, misspelling, email address) • Values may be missing ( gender….) • Inconsistent data ( time zone) • Data can be too fine ( clickstream) or too coarse (totals)
Information Cleansing or Scrubbing • An organization must maintain high-quality data in the data warehouse • Information cleansing or scrubbing – a process that weeds out and fixes or discards inconsistent, incorrect, or incomplete information
Information Cleansing or Scrubbing • Contact information in an operational system
Information Cleansing or Scrubbing • Standardizing Customer name from Operational Systems
Cleaning of raw data • Cleaning is a process used to remove unwanted ‘noise’ from data and to make data more presentable • Consider the following shipping address: • Address 1: 123 Main * deliver to rear entrance * • City: Antigonish • Postal Code: B2G 2A1 • Province: NS • Cleaning process would remove the *comment * in the address field
Standardization of data • Consider the following addresses: • These should be standardized as: • 123 Main, Antigonish, Nova Scotia, B2G 2A1
Information Cleansing or Scrubbing • Information cleansing activities
Information Cleansing or Scrubbing • Accurate and complete information
Business Intelligence (BI) Systems • Provide information for improving decision making hence competitive advantage • Primary systems: • Reporting systems • Data-mining systems
What is a report? • It’s a program that retrieves data from a database, formats and presents it to a user • Can allow filtering, sorting, aggregation • A report can be: • displayed online • exported to excel or other software • Emailed as an attachment • printed Data Report program Lists, charts, spreadsheets
Reporting Systems • Integrate data from multiple sources • Process data by sorting, grouping, summing, averaging, and comparing • Results formatted into reports • Improve decision making by providing right information to right user at right time
Tools for looking at data Digital Dashboard • Instead of paper reports, warehouses give users tools for looking at data differently. They also allow those users to manipulate their data. • There are times when a color coded ‘speedometer’ speaks volumes over a simple paper report. • An interactive table that allows the user to drill down into detail data with the click of a mouse can answer questions that might take months to answer in a traditional system.
Freedom from IS department resource limitations • One of the problems with computer systems is that they usually require computer experts to use them. • When a report is needed, the requesting manager calls the IS department. IS then assigns a programmer to write a program to produce the report. The report can be created in a few days or, in extreme cases, in over a year. • With a warehouse, users create most of their reports themselves. Thus, if a manager needs a report for a meeting in half an hour, they, or their assistant, can create that report in a matter of minutes.