390 likes | 817 Views
CHAPTER 6: REQUIREMENTS AS THE DRIVING FORCE FOR DATA WAREHOUSING. CHAPTER OBJECTIVES Understand why business requirements are the driving force Discuss how requirements drive every development phase Specifically learn how requirements influence data design
E N D
CHAPTER 6: REQUIREMENTS AS THE DRIVING FORCE FOR DATA WAREHOUSING
CHAPTER OBJECTIVES • Understand why business requirements are the driving force • Discuss how requirements drive every development phase • Specifically learn how requirements influence data design • Review the impact of requirements on architecture • Note the special considerations for ETL and metadata • Examine how requirements shape information delivery
DATA DESIGN • Structure for Business Dimensions • Structure for Key Measurements • THE ARCHITECTURAL PLAN • Composition of the Components • Tools and Products • DATA STORAGE SPECIFICATIONS • DBMS Selection • Storage Sizing • INFORMATION DELIVERY STRATEGY • Queries and Reports • Types of Analysis • Information Distribution • Real Time Information Delivery
Fundamental principle Figure 6-1 Business requirements as the driving force.
DATA DESIGN • In the data design phase you come up with the data model for the following data repositories: • The staging area where you transform, cleanse, and integrate the data from the source systems in preparation for loading into the data warehouse repository. • The data warehouse repository itself.
Figure 6-2 Requirements driving the data model. Figure 6-2. The base of the pyramid represents the data model for the enterprise-wide data repository and the top half of the pyramid denotes the dimensional data model for the data marts. What do you need in the requirements definition to build the two halves of the pyramid? Two basic pieces of information are needed: the source system data models and the information package diagrams.
If you are adopting the practical approach of building your data warehouse as a conglomeration of conformed data marts, your data model at this point will consist of the dimensional data model for your first set of data marts. On the other hand, your company may decide to build the large corporate-wide data warehouse first along with the initial data mart fed by the large data warehouse. In this case, your data model will include both the data model for the large data warehouse and the data model for the initial data mart.
Structure for Business Dimensions In the data models for the data marts, the business dimensions along which the users analyze the business metrics must be featured prominently. In an information package diagram, the business dimensions are listed as column headings. For example, look at the business dimensions for Automaker Sales in Figure 6-3 Figure 6-3 Business dimensions in the data model.
Structure for Key Measurements Key measurements are the measures that are used for business analysis and monitoring. Users measure performance by using and comparing key measurements. For example, if an information package diagram has product, customer, time, and location as the business dimensions, these four dimensions will be four distinct components in the structure of the data model.
THE ARCHITECTURAL PLAN Planning the architecture: Involves reviewing each of the major architectural components Involves the interfaces among the various components How can the management and control module be designed to coordinate and control the functions of the different components? What is the information you need to do the planning? How will you know to size up each component and provide the appropriate infrastructure to support it? Of course, the answer is business requirements. All the information you need to plan the architecture must come from the requirements definition.
Composition of the Components Let us review each component and ascertain what exactly is needed in the requirements definition to plan for the data warehouse architecture. In the following list, the points under each component indicate the type of information that must be contained in the requirements definition to drive the architectural plan.
Figure 6-4 provides a useful summary of the architectural components driven by requirements. The figure indicates the impact of business requirements on the data warehouse architecture. Figure 6-4 Impact of requirements on architecture
Data Pollution Sources • System conversions and migrations • Heterogeneous systems integration • Inadequate database design of source systems • Data aging • Incomplete information from customers • Input errors • Internationalization/localization of systems • Lack of data management policies/procedures • Types of Data Quality Problems • Dummy values in source system fields • Absence of data in source system fields • Multipurpose fields • Cryptic data • Contradicting data • Improper use of name and address lines • Violation of business rules • Reused primary keys • Non-unique identifiers
Tools and Products In general, tools are available for the following functions: • Data Extraction and Transformation • Middleware • Data extraction • Data transformation • Data quality assurance • Load image creation • Warehouse Storage • Data marts • Meta data • Information Access/Delivery • Report writers • Query processors • Alert systems • Dashboards • Scorecards • DSS applications • Data mining
DATA STORAGE SPECIFICATIONS DBMS Selection Whatever your choice of the database management system may be that system will have to interact with back-end and front-end tools. The back-end tools are the products for data transformation, data cleansing, and data loading. The front-end tools relate to information delivery to the users. If you are trying to find the best tools to suit your environment, the chances are these tools may not be from the same vendors who supplied the database products. Therefore, one important criterion for the database management system is that the system must be open. It must be compatible with the chosen back-end and front-end tools. Broadly, the following elements of business requirements affect the choice of the DBMS:
Level of User Experience. If the users are totally inexperienced with database systems, the DBMS must have features to monitor and control runaway queries. If many of your users are power users, then they will be formulating their own queries. Types of Queries. The DBMS must have a powerful optimizer if most of the queries are complex and produce large result sets. Alternatively, if there is an even mix of simpleand complex queries, there must be some sort of query management in the databasesoftware to balance the query execution. Data Loads. The data volumes and load frequencies determine the strengths in the areas of data loading, recovery, and restart. Metadata Management. If your metadata component does not have to be elaborate, then a DBMS with an active data dictionary may be sufficient. Data Repository Locations. Is your data warehouse going to reside in one central location, or is it going to be distributed? The answer to this question will establish whether the selected DBMS must support distributed databases. Data Warehouse Growth. Your business requirements definition must contain information on the estimated growth in the number of users, and in the number and complexity of queries.
Storage Sizing How big will your data warehouse be? How much storage will be needed for all the data repositories? What is the total storage size? Answers to these questions will impact the type and size of the storage medium. How do you find answers to these questions? Let us summarize. You need to estimate the storage sizes for the following in the requirements definition phase: Data Staging Area. Calculate storage estimates for the data staging area of the overall corporate data warehouse from the sizes of the source system data structures for each business subject. Overall Corporate Data Warehouse. Estimate the storage size based on the data structures for each business subject. You know that data in the data warehouse is stored by business subjects. For each business subject, list the various attributes, estimate their field lengths, and arrive at the calculation for the storage needed for that subject.
Data Marts—Conformed, Independent, Dependent, or Federated. While defining requirements, you create information diagrams. A set of these diagrams constitutes a data mart. Each information diagram contains business dimensions and their attributes. The information diagram also holds the metrics or business measurements that are meant for analysis. Use the details of the business dimensions and business measures found in the information diagrams to estimate the storage size for the data marts. Multidimensional Databases. These databases support OLAP or multidimensional analysis. How much online analytical processing (OLAP) is necessary for your users? The corporate data warehouse or the individual conformed or dependent data mart supplies the data for the multidimensional databases.
INFORMATION DELIVERY STRATEGY • The impact of business requirements on the information delivery mechanism in a data warehouse is straightforward. During the requirements definition phase, users tell you what information they want to retrieve from the data warehouse. . • The broad areas of the information delivery component directly impacted by business requirements are: • Queries and reports • Types of analysis • Information distribution • Real time information delivery • Decision support applications • Growth and expansion
The following subsections contain some valuable tips for requirements definition in order to make the highly significant information delivery component effective and useful. Study these carefully. Queries and Reports • Find out who will be using predefined queries and preformatted reports. • Get the specifications for the production and distribution frequency for the reports. How many users will be running the predefined queries? • The second type of queries is the users formulate their own queries and they themselves run the queries. • The set of reports in which the users supply the report parameters and print fairly sophisticated reports themselves. • Get as many details of this type of queries and this type of report sets as you can.
Types of Analysis Most data warehouse and business intelligence environments provide several features to run interactive sessions and perform complex data analysis. Analysis encompassing drill-down and roll-up methods is fairly common. Review with your users all the types of analysis they would like to perform. Get information on the anticipated complexity of the types of analysis. In addition to the analysis performed directly on the data marts, most of today’s data warehouse and business intelligence environments equip users with OLAP. Using the OLAP facilities, users can perform multidimensional analysis and obtain multiple views of the data from multidimensional databases. This type of analysis is called slicing and dicing.
CHAPTER SUMMARY • Accurate requirements definition in a data warehouse project is many times more important than in other types of projects. Clearly understand the impact of business requirements on every development phase. • Business requirements condition the outcome of the data design phase. • Every component of the data warehouse architecture is strongly influenced by the business requirements. • In order to provide data quality, identify the data pollution sources, the prevalent types of quality problems, and the means to eliminate data corruption early in the requirements definition phase itself. • Data storage specifications, especially the selection of the DBMS, are determined by business requirements. Make sure you collect enough relevant details during the requirements phase. • Business requirements strongly influence the information delivery mechanism. • Requirements define how, when, and where the users will receive information from the data warehouse.