570 likes | 584 Views
This chapter explores the essential steps and concepts in defining and recording business requirements for a data warehouse, including the impact on architecture and data design.
E N D
Data Warehouse Fundamentals Chapter 5 Analyzing and Defining Business Requirements for a Data Warehouse Paul K Chen 1
Chapter 5- Objectives • Learn what is the definition of the business requirements • Understand the role of business dimensions related to DW business requirements • Learn specifically the steps in defining and recording DW business requirements • Review methods for gathering requirements (JAD, Interviews and Sampling) • Discuss briefly architecture concepts impacted by business requirements
Definition of The Business Requirements • The definition of requirements is the user’s statement of how he or she wants to do business, and the information required to support his or her new methods of operations.
Definition of The Business Requirements The requirements can be broadly divided into two areas: 1 Functional requirements—written in user terminology since it is user operations that are being described. 2 Non-functional requirements –these are the limitations and demands imposed upon the computing solutions; such as architectural plan, data storage specifications and information system performance expectations.
Requirements As the Driving Force for Data Warehousing • Understand why business requirements are the driving force • Discuss how requirements drive every development phase • Specifically Learn how requirements influence data design • Review the impact of requirements on architecture • Note the special considerations for ETL and metadata • Examine how requirements shape information delivery
Business Requirements As the Driving Force Business Requirements Maintenance Planning & Management Deployment Design Architecture Infrastructure Data Acquisition Data Storage Information Delivery Construction Architecture Infrastructure Data Acquisition Data Storage Information Delivery
Dimensional Nature of Business Data The business data of sales units (fact) is measured and analyzed in three dimensional. Product Geography Time
Examples of Business Facts and Dimensions Manufacturing Company Supermarket Chain Supermarket Chain Cust-ship-to Time Time Promotion Ship from Sale Unit Shipment Ship Mode Store Product Deal Product Airline Company Insurance Business Customer Time Agent Time Flight Claim Frequent Flyer Flights Claims Fare Class Insured Party Status Airport Status Policy
Defining and Recording Information Requirements for a Data Warehouse • Nine-Step Methodology includes the following steps: Step 1: Choosing the process Step 2: Choosing the grain Step 3: Identifying and conforming the dimensions Step 4: Choosing the facts Step 5: Storing pre-calculations in the fact table Step 6: Rounding out the dimension tables Step 7: Choosing the duration of the database Step 8: Tracking slowly changing dimensions Step 9: Deciding the query priorities and the query modes.
Step 1: Choosing The Process (Subject Area) • The process (function) refers to the subject matter of a particular data mart. • First data mart built should be the one that is most likely to be delivered on time, within budget, and to answer the most commercially important business questions.
Subject Area • Selecting the first subject area or areas to be populated Use the enterprise level data model in selecting appropriate subject area(s) • Three Options: -- Implement a single subject area (best option) -- Implement a subset of a subject area -- Implement a subset of several subject areas (most common) • Determine how much data should be loaded and its variety
Step 2: Choosing The Grain • Decide what a record of the fact table is to represent. • Identify dimensions of the fact table. The grain decision for the fact table also determines the grain of each dimension table. • Also include time as a core dimension, which is always present in star schemas. Due to disk space constraint, data selected must be time relevant in terms of trend, predictability, and profitability for the enterprise.
Step 3: Identifying And Conforming The Dimensions • Dimensions set the context for asking questions about the facts in the fact table. • If any dimension occurs in two data marts, they must be exactly the same dimension, or one must be a mathematical subset of the other. • A dimension used in more than one data mart is referred to as being conformed (shared).
Step 4: Choosing The Facts • The grain of the fact table determines which facts can be used in the data mart. • Facts should be numeric and additive. • Unusable facts include: • non-numeric facts • non-additive facts • fact at different granularity from other facts in table.
Fact Criteria Weight the Fact attributes based upon the following criteria: • They exhibit measurable results to the Users and Management. • They are visible within the business and through management. • They are manageable.
Subject Area Subject areas are collections of like data that support analysis of the major subjects in a business. Election criteria: • They consist of two or more attributes. • They are essential to the successful operation of the target system or business area to meet client objectives. • They can be defined by governing business rules.
Step 5: Storing Pre-Calculations In The Fact Table • Once the facts have been selected each should be re-examined to determine whether there are opportunities to use pre-calculations.
ADD DERIVED DATA • Benefits Less space used Enhanced performance Breaking_lease Percentage_of_breaking_lease (< 3_months) Percentage_of_breaking_lease (>3 but < 6 months) Percentage_of_breaking_lease (>6 but <9 months) Percentage_of_breaking_lease (>9 but <12 months) Percentage_of_breaking_lease (> 12 months)
Add Summarization Schemes • Simple summation • Summation by group • Aggregation • Vertical summarization
Simple Summation --Add Summarization Schema Individual Daily sales Date Product Qty Sales $ Jan 1 nuts 100 300 Jan 1 nuts 200 600 Jan 2 nuts 300 900 Jan 2 nuts 100 300 Jan 3 Nuts 50 150 Jan 3 Nuts 40 120 Daily Sales Summary Date Product Qty Sales $ Jan 1 Nuts 300 900 Jan 2 Nuts 400 1,200 Jan 3 Nuts 90 20
Summation By Group Group data attributes based on usage and stability. • Group stable and slowly changing data all in one table • Group unstable and frequently changing data all in another table
Aggregation Aggregation is used to create data marts. For instance, a group of users frequently perform analysis comparing sales across geographic regions, broken by product line. If a data mart were created that stores the sales data already aggregated to the desired level, the users’ queries would be simpler.
Aggregation Add up amounts by day in sql: SELECT date, sum (amt) FROM SALE GROUP BY date p sale Store date amt • ans date sum p1 p2 p1 p2 1 1 2 1 1 2 4 3 c1 c2 c3 c1 6 4 1 2 Roll Up Drill Down
Vertical Summarization Summarization building upon a single dimensional theme: Monthly renters • Total # of all renters • Total # of new renters • Total rental income Monthly sales • Staff name • Total sales $ • Total houses sold
Step 6: Rounding Out The Dimension Tables • Text descriptions are added to the dimension tables. • Text descriptions should be as intuitive and understandable to the users as possible. • Usefulness of a data mart is determined by the scope and nature of the attributes of the dimension tables.
Step 7: Choosing The Duration Of The Database • Duration measures how far back in time the fact table goes. For ex. Insurance &Tax Considerations. • Very large fact tables raise at least two very significant data warehouse design issues. • Often difficult to source increasing old data. • It is mandatory that the old versions of the important dimensions be used, not the most current versions, known as the ‘Slowly Changing Dimension’ problem.
Step 8: Tracking Slowly Changing Dimensions • Slowly changing dimension problem means that the proper description of the old dimension data must be used with old fact data. • Often, a generalized key must be assigned to important dimensions in order to distinguish multiple snapshots of dimensions over a period of time.
Step 9: Deciding The Query Priorities And The Query Modes • Most critical physical design issues affecting the end-user’s perception includes: • physical sort order of the fact table on disk • presence of pre-stored summaries or aggregations. • Additional physical design issues include administration, backup, indexing performance, and security.
Criteria For Assessing The Dimensionality Of A Data Warehouse • Criteria proposed by Ralph Kimball to measure the extent to which a system supports the dimensional view of data warehousing. • Twenty criteria divided into three broad groups: architecture, administration, and expression.
Architectural Criteria Architectural criteria describes way the entire system is organized. • Explicit declaration • Conformed dimensions and facts • Dimensional integrity • Open aggregate navigation • Dimensional symmetry • Dimensional scalability • Sparsity tolerance
Administration Criteria Administration criteria are considered to be essential to the ‘smooth running’ of a dimensionally-oriented data warehouse. • Graceful modification • Dimensional replication • Changed dimension notification • Surrogate key administration • International consistency
Expression Criteria Expression criteria are mostly analytic capabilities that are needed in real-life situations. • Multiple-dimension hierarchies • Ragged-dimension hierarchies • Multiple valued dimensions • Slowly changing dimensions • Roles of a dimension • Hot-swappable dimensions • On-the-fly fact range dimension • On-the-fly behaviour dimension
Business Requirements (Use Automaker Sales as an example) In order to get an idea of the data to be used by the sales and Inventory department, a facilitation session was held with 15 key end users and the IT data warehouse team. The following business questions were generated from that meeting: • What is the sales trend in quantity and dollar amounts sold each Make, Model, Series and Color for a specific dealer, for each
Matching User Requirements to DW Data Requirements (Develop Fact Table) Primary Key • dealer_id • month_year • sales_area_id • make • model • series
Determine Dimensions & Attributes Dimensions • sales_area_dim • sales_time_dim • dealer_dim Attributes • dealer_mms_sales_qty • dealer_mms_sales_dollar_amt • dealer_ytd_mms_sales_qty • dealer_ytd_mms_sales_amt • dealer_inventory_qty
Collecting The Business Requirements via JAD Sessions JAD (Joint Application Development) vs. Traditional Way of Gathering Requirements • JAD sessions (also called facilitated session) are used to gather information and feedback and confirm the results of requirements gathering. • JAD sessions replace the traditional way of conducting a series of interviews on a one-to-one basis with the users. Advantages: Achieving consensus during the session when multiple sources of information exist, raising and addressing issues or assigning them for resolution, and immediately confirming information.
JAD Session • JAD sessions are used to scope the project. Each session should last two to three day. They are very focused and fast-paced. • JAD sessions can be very formal and follow strict guidelines or be informal group sessions.
JAD - Roles Whether they are formal or informal, there are four necessary roles to be filled: • Facilitator The Facilitator is the session leader. It is the facilitator’s responsibility to ensure that the objectives of the sessions are met. • Scribes(s) Scribes are responsible for recording the minutes of the session and optionally constructing deliverables using an automated tool as the session progresses.
JAD - Roles • User The users provide knowledge specific to the scope of the project. • Developers Developers are the team members who will be building the system.
JAD Session The session is divided into three segments: • Introduction: Welcoming remarks; description of the facilities such as rest room locations, messages, reviewing the agenda and setting expectations. • Conducting the session: To confirm deliverables set out in the session objectives. • Wrapping up the session: By summarizing progress towards the objectives; reviewing the agenda for the next one and obtaining feedback from the participants.
JAD Session Potential drawbacks • The commitment of a large block of time for all participants • Requirements collected could be less than satisfactory due to unpredictability of the JAD session or organizational culture not sufficiently developed to enable the concerted efforts required to be productive in a JAD setting.
Five Steps in Interview Preparation • Reading background material • Establishing interview objectives • Deciding when to interview • Preparing the interviewee • Deciding on question type and structure
Two Types of Questions – Open-End Questions vs. Closed Questions Open-ended interview questions Open describes the interviewee’s options for responding. They are open. Advantages: • Putting the interviewee at ease • Allowing more spontaneity Disadvantages: • Possibly losing control of the interview • May not get the types of answers you want
Two Types of Questions – Open-End Questions vs. Closed Questions Closed interview questions Such as “ How many subordinates do you have? Benefits: • Getting to relevant data • Keeping control over the interview Drawbacks: • Failing to obtain rich detail • Intimidating the interviewee
Three Basic Ways of Structuring Interviews • Pyramid Structure: Starting from closed questions, then gradually expand into open territory. • Funnel Structure: The reverse of pyramid structure approach. • Diamond-Shaped: A combination of the two above structures.
The Needs for Sampling • Containing costs • Speeding up the data gathering • Improving effectiveness • Reducing bias
Sampling Design Four steps: • Determine the data to be collected or described • Determine the population to be sampled • Choose the type of sample • Decide on the sample size
Kinds of Information Sought in Investigation Type of hard data (other than interviewing and observation) - Quantitative Data • Reports for decision making • Performance reports • Records • Data capture forms
Kinds of Information Sought in Investigation Qualitative Data • Memos • Signs in bulletin boards and in work areas • Corporate websites • Manuals • Policy handbooks