240 likes | 340 Views
Class Agenda: 03/26 – 4/02. Review Test 1 Discuss Analytical Reports (essay questions and class reports) Contrast transactional vs. data warehouse design Discuss process of data warehouse design Contrast different approaches to data warehouse design Design a data warehouse.
E N D
Class Agenda: 03/26 – 4/02 • Review Test 1 • Discuss Analytical Reports (essay questions and class reports) • Contrast transactional vs. data warehouse design • Discuss process of data warehouse design • Contrast different approaches to data warehouse design • Design a data warehouse
Written Analysis • An analysis requires: • Thesis statement; • Key points that are used to analyze the thesis statement: • Facts or • Fact-like data (examples, cases, logical reasoning); • Interpretation of the key points; and • Recommendations.
What is a thesis statement? • A one or two sentence presentation of the analysis. • A strong thesis statement: • Takes a stand. • Justifies why there is a controversy and presents the key argument. • Expresses one main idea. • Is specific.
Strong and weak thesis statements • Weak • There are benefits and drawbacks to implementing business intelligence in XYZ organization. • This paper analyzes the need for initial and ongoing governance of business intelligence within XYZ. • Organizations such as XYZ should enhance their customer relationship management by using a business intelligence system. • Strong • Implementing an enterprise data warehouse may be costly and time-consuming, but XYZ needs integrated data and business intelligence analytical tools to improve market share. • While it may be difficult to gain agreement on the data composition of a data warehouse, that initial implementation effort pales in comparison to the long-term support required to maintain a comprehensive business intelligence system. This paper analyzes how XYZ supports long-term use of their BI system with an effective governance structure.
Question #1: How does big data differ from not-so-big-data? Is an organization that is competing on analytics using big data? Is big data required for an organization to be using business intelligence effectively? What is the essence of these questions? How do we differentiate the terms of our field: big data, business intelligence, competing on analytics? Thesis statement possibilities: Organizations use data to support decision making at differing levels; some organizations use complex algorithms that require sophisticated data sets while others are content to view simple reports. An organization may make data driven decisions without incorporating the use of big data required to support complex statistical algorithms. The concept of “big data” is in its infancy and its definition is based on the context of its use. What one organization might regard as “big data” another would view as “regular old data.” Both organizations could use their relative data sets to support effective decision making. There is a clear difference between organizations who are using “big data” and those that are simply making decisions supported by data sets. Both types of organizations could be using business intelligence systems. However, organizations with big data use not only a large volume of data, but incorporate both internal and external data, have data sets that change quickly, and gather data from many diverse data sources.
Goals for Transaction Database Design • Make required data available to support business processes. • Protect the integrity of the data. • Reduce data redundancy. • Prevent data anomalies. • Provide for change. • Prevent inflexible data structures. • Anticipate changes.
How do we achieve those goals? • Effective systems analysis and design techniques. • Relational DBMS. • Normalization.
Goals for data warehouse design • Make complete and accurate information easily accessible. • Present information consistently. • Be adaptive and flexible to change. • Provide reasonable and expected performance for information to support decision making. • Protect/secure information.
How do we achieve those goals? • More effective systems analysis and design techniques. • Knowledge of required decision support systems. • Appropriate DBMS. • Appropriate use (or non-use) of normalization.
Three different data models • Transaction (operational) data model: Contains current data required by separate and/or integrated operational systems. Supports the transactional processing of the organization. Is frequently used to support day-to-day decision making. 3rd normal form. • Reconciled (enterprise data warehouse) data model: Contains detailed, current data intended to be the single, authoritative source for all decision support applications. Usually in 3rd normal form. • Derived (data mart) data model: Contains data that are selected, formatted and aggregated for end-user decision support applications. Star or snowflake schema. Probably not normalized.
Reconciled (EDW) Independent of specific decisions Centralized control; usually owned by IT Historical Not usually summarized Normalized Flexible Many data sources Long life Starts large, becomes larger Derived (Data Mart) Specific decisions One central subject Usually accessed directly by users; usually decentralized into user area Closely defined subject area Detailed and/or summarized Usually denormalized Restrictive – few sources Short life span Starts small, becomes large Reconciled and Derived Data Models
Enterprise Data Warehouse (Bill Inmon) Focus is on enterprise subjects that will be needed to support comprehensive decision making. Emphasis on creating design that is consistent among subject areas. Implementation is of a data mart. Uses ERD for modeling. Relies on comprehensive blueprint for interrelation of data. Interrelated Data Marts (Ralph Kimball) Focus is on business subject area for data warehouse. Emphasis on creating simple design that can be implemented quickly. Implementation is of a data mart. Uses “dimensional model” for modeling. Kind of like an ERD with UML-type aspects. Relies on consistent interrelation of data by integration of existing data models. Two general approaches to design
Compare/Contrast Approaches • Similarities: • Both focus on subject areas for development of data model. • Both require extensive input from data warehouse stakeholders. • Both produce a subject-oriented, non-volatile, time-related data warehouse. • Both try to quickly implement a prototype data mart. • Differences: • Inmon creates a more integrated and consistent data warehouse by attempting to design an enterprise-wide warehouse at the beginning of the first data warehouse project. This is called a “reconciled” DW design. • Kimball relies on future project teams referencing existing data warehouse models for new projects.
What do both approaches yield? • A design for a data mart. • The design for a data mart is based on the concept of a data warehouse “cube.” • A cube is a logical construct containing a “fact” table that is accessed on multiple “dimension” tables. • A fact table contains values that a manager uses to make decisions. • A dimension table is used as a reference for the values in the fact table.
Steps of data warehouse design • Identify the stakeholders that need data to support their decisions. • Define and describe the data needs of those stakeholders. • Define the subject area. • Choose (EDW and data mart) or just data mart. • Select the data of interest. May be internal, external. May be purchased. May be stored in a transaction database – may not. May be generated just for the data warehouse. • Add element of time. • Add derived data. • Determine granularity level. • Summarize data. • Identify and attempt to solve potential performance issues.
How do you identify those people within an organization who require data to support their decision making processes?
Define and describe the data needs • Usually termed “stakeholder analysis”. • Differing levels of decision making require differing sets of data. • Internal vs. external data. • Integrated vs. non-integrated data. • Detailed vs. summarized data. • Different stakeholders require different access mechanisms. • Online vs. reports. • Pre-formatted vs. ad-hoc availability of data. • Different stakeholders require different timing. • Online, real time vs. delay. • Relative size of delay/timeliness is always an issue.
Define the subject area • Potential subject areas in common to many businesses: • Customers: people and organizations who acquire and/or use the company’s products. • Equipment: Machinery, devices, tools and their components. • Facilities: Real estate and their components. • Sales: Transactions that move a product from company to a customer. • Suppliers: Entities that provide a company with goods and services. • Products: Goods and services that the company, or its competitors, provide to customers. • Materials: Goods and services that the company uses to produce its products. • Financials: Information about money that is received, retained, expended, invested or in any way tracked by the company. • Human resources: Individuals who perform work for the company – may be employees, contracts, or simply positions.
Select the data of interest • Use the existing transaction database model. • Identify and understand the necessary business decisions. • Identify external data that could help support decisions. • Use tables to help sort available attributes. • Look at appendix 1 of the TEC exercise.
Add element of time • Data warehouse is a historical model rather than a current “point in time” model. • Must have a way to incorporate changes that occur over time. • Important issues: • Fact table must include a time component. • Ranges of time vs. effective period in time • Time also relates to dimension tables • May have to deal with differing time periods. Examples are fiscal years, “holiday rush,” billing cycle, etc.
Add derived data • Derived data includes any kind of calculated field. • Examples: total sales; net sales amount; total funds raised; total cost of products. • Issues: • Must be identified, defined and agreed upon by data warehouse stakeholders. • Must be documented in metadata. • Must be consistent.
Determine granularity level • What are the benefits and drawbacks of a low level of granularity? • What are the benefits and drawbacks of a high level of granularity? • What factors should be considered when determining the level of granularity in the data warehouse?
Summarize (aggregate) data • What is summarized data? • How is data summarized? • Does summarized data save disk space? • Why summarize data?