390 likes | 694 Views
Chapter 6: Designing and Building the Data Warehouse. Modern Data Warehousing, Mining, and Visualization: Core Concepts by George M. Marakas. 6-1: The Enterprise Model Approach To DW Design. The first step in planning is to define the information requirements of the enterprise.
E N D
Chapter 6: Designing and Building the Data Warehouse Modern Data Warehousing, Mining, and Visualization: Core Concepts by George M. Marakas © 2003, Prentice-Hall
6-1: The Enterprise Model Approach To DW Design • The first step in planning is to define the information requirements of the enterprise. • Enterprise model – a number of separate models that, when properly integrated, provide a coherent picture of the enterprise. • One important element of the EM is the corporate process model, a highly-structured description of the elementary processes in the enterprise. • Another is the corporate data model, which describes the entities generated by or required by the processes. © 2003, Prentice-Hall
Design Strategies • Top-down DW design – the data warehouse design is based on the enterprise model itself. It implies a strategic, rather than operational, perspective of the data. • Bottom-up DW design – focuses more on making use of the data available in the current system. This is less effort than the top-down approach, but may end up with a DW that does not satisfy all of the organization’s information needs. © 2003, Prentice-Hall
The Concept of Enterprise Integration Integration must occur on three distinct levels: • Horizontal integration: each application is fully integrated within its own boundaries • Vertical integration: the application is in harmony with its stated business requirements. • Enterprise integration: the data and processes throughout the organization, and all interactions between them, will be standardized. © 2003, Prentice-Hall
Entity identification – what are the building blocks? ER diagramming – how do entities relate to others? Process identification – what is the workflow? Entity life cycle analysis – does work modify entities? Event analysis – how do events cluster to processes? Association diagramming – how do objects interact? Critical factor analysis – what is needed for success? SWOT analysis – what are strengths, weaknesses? Information needs analysis – what data does each business area need most? Systems evaluation – are current systems sufficient? Activities Associated With EM Construction © 2003, Prentice-Hall
6-2: The Data Warehouse Project Plan First on the agenda is a readiness assessment. Readiness has five key factors: • Strong management sponsorship – perhaps the most important factor • Compelling business motivation – the value of a DW is what it does for the business • Quality of the IS-business partnership – the IS group needs to be a central part of the business © 2003, Prentice-Hall
6-2: The Data Warehouse Project Plan (cont.) First on the agenda is a readiness assessment. Readiness has five key factors: • Culture of the organization – the organization must place a high value on information and analysis • DW feasibility – are the current data accurate and standardized enough to make integration feasible? © 2003, Prentice-Hall
A Litmus Test for DW Readiness Kimball, et al developed some tests that make it relatively easy to determine readiness. Table 12-2 contains full details, but some tests are: For management sponsorship: low-level readiness is signaled by not being able to access top execs or their lack of understanding for what DW development entails. For the IS-business partnership: low-level readiness is signaled by a firm using too many outside IS consultants, or if the DW team was formed without any IS participation. © 2003, Prentice-Hall
Addressing DW Readiness Shortfall • If the litmus test identifies only a few shortfalls, there are several techniques that can be applied to address lack of readiness. • A business requirements analysis performed at a high level can identify the areas to target. • Another approach is to do a business requirements prioritization which will help us determine in what order to proceed. • We can do a pilot study or isolated rollout of technology as a proof of concept. © 2003, Prentice-Hall
Example of a Project Prioritization Matrix Project 6 would be the first choice for implementation, then Project 7. The two lowest in rank are 1 and 8. © 2003, Prentice-Hall
Defining the DW Project Scope • The scope establishes a boundary for the project and defines both what it is and what it is not. • One of the first things to do is decide “what we are scoping”. Near term? Three years? • An unwritten rule is that the project manager wants to make as few promises as possible. • You want to avoid “scope creep”, where the project continues to expand. © 2003, Prentice-Hall
Guidelines for Developing Initial Scope • It should be a joint project between IS and other business functions. • The initial scope should be meaningful and manageable. • The initial focus should be on a single business requirement from a single process. • The number of initial users should be limited. • Success criteria should be developed at the start, along with the scope. © 2003, Prentice-Hall
Defining the Business Justification • Once the scope is established, the next step is to develop the business case. • This is really no mystery, just an identification of project (1) anticipated benefits and (2) likely costs. • If a project cannot be justified before it begins, no such justification will be coming after money and time are invested in it. © 2003, Prentice-Hall
Issues in Making the Business Case • The best way to view the DW project is as a major capital investment by the corporation. • There are three perspectives from which to argue for the project: facts, faith and fear. • During the 1980s, when companies made heavy IS investments, there really was no increase in productivity. On the other hand, studies have shown that IS staffers were the company’s most productive. • This productivity paradox is perhaps due to the fact that many IS investments were not justifiable in the first place. © 2003, Prentice-Hall
Three bases of project Justification • “The data warehouse project will have a net present value of $753,000.” • “The data warehouse will yield a minimum reduction in operating cost of $193,000 annually.” • “The estimated increase in market share is 14.7 percent within the first 24 months of operation.” • “A data warehouse is part of the infrastructure, we can’t cost justify it like a new fleet of trucks.” • “It seems reasonable to assume that this data warehouse will reduce our costs of servicing this market segment.” • “Trust me. This is why you hired me as IT director.” • “Our competitors are doing this even as we speak.” • “Our shareholders will view us as technologically behind if we don’t do this now.” • “We have a small window of opportunity here and we are wasting precious time trying to decide.” Facts Faith Fear © 2003, Prentice-Hall
Economic Feasibility Analysis • This activity is a type of cost-benefit analysis, complicated by the fact that many benefits are intangible and hard to measure. • Tangible benefits are either cost reductions or revenue increases. • Intangible benefits include improved morale, increased product quality, decrease in time to market, reduction in turnover, increased competitive advantage and more timely information. © 2003, Prentice-Hall
Tangible and Intangible Costs • Tangible development costs include salaries, consultant fees, hardware and software purchases and data conversion. • Tangible operating costs include annual licenses, upgrades, repairs, user training and depreciation. • Intangible costs include disruption to environment, loss of goodwill, reduction in morale and diversion of attention to daily responsibilities. © 2003, Prentice-Hall
An Example of Measuring Intangibles Suppose we believe that there is about a 60 percent chance that one out of every 10 workers will be distressed during the initial implementation. Our normal profits are $2,000,000, and we believe they would be zero if the entire workforce was distressed. Profit reduction is: (.1)(2000000) = $200,000 and it has a 60% chance of occurring, so we estimate the “distress” cost at (.6)(200000) = $120,000 Is this right? Probably not! But, isn’t it better to have an estimate based on reasonable assumption than making no estimate at all? © 2003, Prentice-Hall
Economic Feasibility Measures • Assessment of feasibility has to recognize the time value of money since many costs occur immediately while benefits accrue over several years. • On the next two slides we show a project that returns a net present value of $890,806 over five years. The internal rate of return is 38% and breakeven is estimated at 2 years and 7 months. © 2003, Prentice-Hall
Spreadsheet Model of NPV Analysis © 2003, Prentice-Hall
Example of Graphical Breakeven Analysis © 2003, Prentice-Hall
Developing the DW Project Plan • Staffing issues: Table 12-8 lists 14 different staffing roles in a typical DW project. Some key roles are project manager, business systems analyst, database manager, application developer and quality assurance analyst. © 2003, Prentice-Hall
Developing the DW Project Plan (cont.) • The Formal Plan:two key characteristics of a well-designed DW project plan are high integration and high detail. Although each one is different, the plan should contain a list of required resources, an estimate of project schedule, estimated start and end dates, a clear list of task dependencies and a set of flags that will be signals that the project is late. © 2003, Prentice-Hall
6-3: Specifying the Analysis and Design Tools • While an entity-relationship modeling approach from relational database design could be used, the dimensional modeling approach to logical design is more often used for a data warehouse. • One reason is that an enterprise-level ERM would be too complex to understand. • There are three basic elements in the dimensional model (1) a central table called the fact table, (2) two or more smaller tables called the dimensional tables, and (3) and a set of relationships called the star join. © 2003, Prentice-Hall
Dimensional Model for a Retail Sales Forecast © 2003, Prentice-Hall
ERM versus DM • The relationship of the two is that an ER diagram can be decomposed into several DM diagrams. • The ERM represents multiple processes while a DM just one. • The star join is flexible and scalable. New data rows can be added to existing tables, and entirely new tables can be accommodated. © 2003, Prentice-Hall
Designing a DW Fact Table In general, there are four basic steps: • Choosing the data mart for the small group of end users we deal with. • Fact table granularity (the smallest defined level of data in the table) is determined. • Fact table dimensions are selected. • Determine the facts for the table. In most cases, the granularity is at the transaction level, so the fact is the amount. © 2003, Prentice-Hall
6-4: Warehouse Architecture Specification and Development The next phase is the design of the technical architecture. Certain elements need to be included: • Common sources: a data mart needs to be stored in a central location so others can use it. • Common dimensions: these can be shared as well. • Common business rules: the DW team needs to identify as many of these as possible. © 2003, Prentice-Hall
6-4: Warehouse Architecture Specification and Development (cont.) The next phase is the design of the technical architecture. Certain elements need to be included: • Common semantics: there should be enterprise-wide definition of terms. • Common metrics: things should be measured the same way across the enterprise. © 2003, Prentice-Hall
DW Architecture Key Component Areas • Data architecture – each area in a business draws on different dimensions. Where they intersect they need to be defined the same (the customer that buys is the same we bill). • Infrastructure architecture – issues of size, scalability, and capacity need to be addresses. • Technical architecture – this is driven by the metadata catalog. Services should draw parameters from tables, not hard-code them. © 2003, Prentice-Hall
The Data Staging Process An important part of the technical architecture is the data staging process, which covers five major areas: • Extraction of data from multiple sources • Transformation and cleansing • Loading the data, often to multiple targets • Security access and encryption policies • Job control, which includes scheduling, monitoring and notification. © 2003, Prentice-Hall
A Word About Metadata • In the sequence of steps from specifying the warehouse model, to implementing it, to using it for queries, few steps actually involve “real” data. All the rest involve metadata. • For architectural definition, the major elements of the metadata catalog should include business rules, transformation information, operations information and tool-specific information. © 2003, Prentice-Hall
Developing a DW Technical Architecture • An architecture requirements document is drafted first. Next to each business requirement are listed the architectural implications. • Developing a DW architecture is a difficult task and needs to be faced with the proper approach. © 2003, Prentice-Hall
Developing a DW Technical Architecture (cont.) • There are numerous prescriptions for this, but a simple four-layer process is often sufficient: • business requirements • technical architecture • standards definition • application product development. © 2003, Prentice-Hall
Benefits of a Sound Architecture • Provides an organizing framework – shows where the components are and how they fit • Improved flexibility and maintenance – allows plug and play, permits quick addition of new data sources • Faster development and reuse – developers are better able to understand the DW process and content © 2003, Prentice-Hall
Benefits of a Sound Architecture (cont.) • Management and communications tool – sets our expectations and defines our responsibilities • Coordinate parallel efforts – multiple independent efforts have a better chance to converge © 2003, Prentice-Hall
6-5: Data Warehouse Project Success Factors Building a DW is much like building any other complex information system in terms of what makes it a success. Some guidelines for a successful project are: • Have a strong project sponsor • Generate user buy-in • Identify business needs • Start with a narrow scope • Understand the organizational culture • Create clear requirements documents • Define all data and communication requirements • Prototype often © 2003, Prentice-Hall