310 likes | 547 Views
Data Warehouse Concepts. Alan Schneider. Overview. Organizational Operating Levels Operational Systems Database Focus Transition What is a Data Warehouse (DW)? Why is a DW Required Purposes of a DW Data Marts Data Mining Cost-Benefit Analysis. Organizational Operating Levels.
E N D
Data Warehouse Concepts Alan Schneider
Overview • Organizational Operating Levels • Operational Systems • Database Focus Transition • What is a Data Warehouse (DW)? • Why is a DW Required • Purposes of a DW • Data Marts • Data Mining • Cost-Benefit Analysis
Organizational Operating Levels • The Business Cycle below shows us that any enterprise must operate at two levels • Operational (i.e., the day-to-day running of the business); • Strategic (i.e., the definition of organization's vision, goals and objectives).
Operational Systems • Meet the “Data” requirements of organizations • With OLTP applications organizations: • Could meet customer needs while enforcing business policies consistently. • Could grow without significantly expanding its manpower base.
Operational Systems • Due to the dynamic nature of business operational systems are unable to provide decision-makers with the “Information” they needed when they needed it
Failure of Operational Systems • With only OLTP systems • The data needed to produce the report are typically scattered throughout different operational systems • Users have to go through reports/output manually in an attempt to integrate and then derive and the information they really need
Failure of Operational Systems • With only OLTP systems • The processing required to extract the data from each operational system demands so much of the system resources that the user must wait until non-operational hours before running the queries/report. • These delays are dangerous for the organization because when the report is finally produced, the data may be inconsistent, inaccurate, or obsolete.
Database Focus Transition • Once the day-to-day “Data” needs were being met by the OLTP systems the focus shifted to meeting the Decisional business requirements of an enterprise. • Note: Some enterprises have actually succeeded in developing and deploying data warehouses within their respective organizations, long before the term data warehouse even became fashionable.
What is a Data Warehouse? • “A collection of integrated, subject-oriented databases designed to supply the information required for decision-making" • A data warehouse brings together data from the various operational systems to provide an integrated view of the “customer” and the full scope of their relationship with the organization
What is a Data Warehouse? • Integrated • A financial institution has the following systems to support the process: • Deposit systems • Loan systems • General ledger • Each of these operational systems records different types of business transactions and enforces the policies of the enterprise regarding these transactions. • If each of the operational systems has been custom built or an integrated system was not implemented as a solution, then it is unlikely that these systems are integrated.
What is a Data Warehouse? • Subject Oriented • A data warehouse goes beyond traditional data requirements and provides information views by: • Focusing on enterprise-wide subjects such as customers, sales, and profits. • These subjects span both organizational and process boundaries and require information from multiple sources to provide a complete picture.
What is a Data Warehouse? • Databases • Although the term data warehousing technologies is used to refer to the gamut of technology components that are required to plan, develop, manage, implement, and use a data warehouse, the term data warehouse itself refers to a large, read-only repository of data.
What is a Data Warehouse? • Databases • At the very heart of every data warehouse lie the large databases that store the integrated data of the enterprise, obtained from both internal and external data sources. • The term internal data refers to all data that are extracted from the operational systems of the enterprise. • External data are data provided by third-party organizations, including business partners, customers, government bodies, and organizations that choose to make a profit by selling their data (e.g., credit bureaus). • Also stored in the databases are the metadata that describe the contents of the data warehouse.
What is a Data Warehouse? • Atomic and Summarized Data • Data warehouses hold data at different levels of detail. • Data at the most detailed level, i.e., the atomic level, are used to derive the summarized of aggregated values. • Aggregates (pre-summarized data) are stored in the warehouse to speed up responses to queries at higher levels of granularity. • If the data warehouse stores data only at summarized levels, its users will not be able to drill down on data items to get more detailed information. However, the storage of very detailed data results in larger space requirements.
Why is a DW Required • Users are better able to examine, derive, summarize, and analyze data at various levels of detail, over different periods of time • Unlike operational systems which are normalized to preserve and maintain data integrity, a data warehouse is designed in a de-normalized manner to better support the usability of the data warehouse
Why is a DW Required • The database is de-normalized to mimic a business user's dimensional view of the business. • For example, • A finance manager is interested in the profitability of the various products of a company • A product manager will be more interested in the sales of the product in the various sales regions.
Why is a DW Required • Each Unit of Data Is Relevant to a Point in Time • The time-stamping of each fact makes it possible for decision-makers to recognize trends and patterns in customer or market behavior over time
Why is a DW Required • By providing business users with the ability to dynamically view more or less of the data on an ad hoc, as-needed basis, the data warehouse eliminates delays in getting information and removes the IT professional from the report-creation loop • A decision-maker starts with a short report that summarizes the performance of the enterprise. • When the summary calls attention to an area that bears closer inspection, the decision-maker should be able to point to that portion of the report, then obtain greater detail on it dynamically, on an as-needed basis, with no further programming.
Purposes of a DW 1. To Provide Business Users with Access to Data • The data warehouse provides access to integrated enterprise data previously locked away in unfriendly, difficult-to-access environments. • Because of its integrated nature, a data warehouse spares business users from the need to learn, understand, or access operational data in their native environments and data structures.
Purposes of a DW 2. Provide One Version of the Truth • The data in the data warehouse are consistent and quality assured before being released to business users. • Note that "one version of the truth" is often possible only after much discussion and debate about the terms used within the organization.
Purposes of a DW 3. Record the Past Accurately • Many of the figures and numbers that managers receive have little meaning unless compared to historical figures. For example • Comparing the company's performance now against its performance last year are quite common. • Reports that show the company's performance for the same month over the past three years are likewise of interest to decision-makers. • A data warehouse is used to record the past accurately; the OLTP systems is free to focus on correctly recording current transactions and balances. • Actual historical values are not stored on the operational system nor derived by adding or subtracting transaction values against the latest balance. • Historical data are loaded and integrated with other data in the warehouse for quick access.
Purposes of a DW 4. Slice and Dice Through Data • Dynamic reports allow users to view warehouse data from different angles, at different levels of detail. • Business users with the means and the ability to slice and dice through warehouse data can actively meet their own information needs. • The ready availability of different data views also improves business analysis by reducing the time and effort required to collect, format, and distill information from data.
Purposes of a DW 5. Separate Analytical and Operational Processing • Decisional processing and operational information processing have totally divergent architectural requirements. • Attempts to meet both decisional and operational information needs through the same system or through the same system architecture merely increase the brittleness of the IT architecture and will create system maintenance nightmares. • Data warehousing disentangles analytical from operational processing by providing a separate system architecture for decisional implementations. This makes the overall IT architecture of the enterprise more resilient to changing requirements.
Data Marts • Unlike data warehouses, a data mart typically contains only a subset of the data that would have been stored in an enterprise data warehouse. • Data mart data are selected to meet the specific needs of a subset of the organization. • It is not unusual to find a data mart developed and implemented for a department, a division, or a geographical location. • Data marts are often preferred by enterprises as a first step to building a data warehouse, since these can be used as a "proof of concept." • Initial success with the data mart can be used to convince skeptics in the enterprise and loosen the enterprise's purse
Data Mining • Looks for meaningful patterns and relationships among data. • Helps identify potential customers based on their prior purchases • Focuses on methods of obtaining valuable business knowledge from a data warehouse
Cost-Benefit Analysis / Return on Investment • Senior management typically requires a cost-benefit analysis (CBA) or a study of return on investment (ROI) prior to embarking on a data warehousing initiative. • Although the task of calculating ROI for data warehousing initiatives is unique to each enterprise, it is possible to classify the type of benefits and costs that are associated with data warehousing.
Cost-Benefit Analysis / Return on InvestmentCosts • Data warehousing costs typically fall into one of four categories. These are: • Hardware. • This item refers to the costs associated with setting up the hardware and operating environment required by the data warehouse. • Software. • This item refers to the costs of purchasing the licenses to use software products that automate the extraction, cleansing, loading, retrieval, and presentation of warehouse data. • Services. • This item refers to services provided by systems integrators, consultants, and trainers during the course of a data warehouse project. • Internal staff costs. • This item refers to costs incurred by assigning internal staff to the data warehousing effort, as well as costs associated with training internal staff on new technologies and techniques.
Cost-Benefit Analysis / Return on InvestmentBenefits • Redeployment of staff assigned to old decisional systems. • The quantification of such costs in terms of staff hours and erroneous data may yield surprising results. • Benefits of this nature, however, are typically minimal, since warehouse maintenance and enhancements require staff as well. • At best, staff will be redeployed to more productive tasks.
Cost-Benefit Analysis / Return on InvestmentBenefits • Improved productivity of analytical staff due to availability of data • Analysts go through several steps in their day-to-day work: • Locating data • Retrieving data • Analyzing data to yield information • Presenting information • Recommending a course of action. • Unfortunately, much of the time (sometimes up to 40 percent) spent by enterprise analysts on a typical day is devoted to locating and retrieving data • The availability of integrated, readily accessible data in the data warehouse will significantly reduce the time that analysts spend with data collection tasks and increase the time they have available to actually analyze the data they have collected. • This leads either to shorter decision cycle times or improvements in the quality of the analysis.
Cost-Benefit Analysis / Return on InvestmentBenefits • Business improvements resulting from analysis of warehouse data. • The most significant business improvements in warehousing result from the analysis of warehouse data, especially if the easy availability of information yields insights heretofore unknown to the enterprise. • The goal of the data warehouse is to meet decisional information needs; it therefore follows naturally that the greatest benefits of warehousing are obtained when decisional information needs are actually met and sound business decisions are made both at the tactical and strategic level.
Cost-Benefit Analysis / Return on InvestmentROI Considerations • The costs and benefits associated with data warehousing vary significantly from one enterprise to another. • The differences are influenced by • The current state of technology within the enterprise • The culture of the organization in terms of decision-making styles and attitudes towards technology • The company's position in its chosen market vs. its competitors. • It’s important to note that data warehouse justification is often complicated by the fact that much of the benefit may take some time to realize and therefore is difficult to quantify in advance.