270 likes | 417 Views
Agenda: 02/05/2013. Explore the components of a BI System Definition Need Characteristics Architectures Development. Purpose of the readings. Introduce: Need for technology architecture. Hardware/software architecture. Issues related to the hardware/software architecture.
E N D
Agenda: 02/05/2013 • Explore the components of a BI System • Definition • Need • Characteristics • Architectures • Development
Purpose of the readings • Introduce: • Need for technology architecture. • Hardware/software architecture. • Issues related to the hardware/software architecture. • Reading #6 discusses both architecture and data warehouse design. We will not focus on data warehouse design until next Tuesday. • Reading #7 discusses the key research issues being discussed in the computer science literature regarding hardware/software architecture necessary to support BI.
A Business Intelligence “System” • A business intelligence system encompasses all processes, hardware and software necessary to extract data, transform it, integrate it, save it, and provide information made accessible by users to support decision making.
Why develop a BI system? • Most organizations already have data available through computer technology. • Why design, create and implement a specialized system for business intelligence?
Many systems…much data • Operational Systems. • Description and purpose. • Examples. • Informational Systems. • Description and purpose. • Examples. • How are they differentiated?
What are the components of BI? • Data warehouse • Structured, unstructured, internal, external, transaction-level, and derived data. • Data storage repository. • Extract, transform and load methods • Methods of loading accurate and consistent data into the data warehouse. • Methods of integrating data from disparate sources. • Metadata repository • Data definitions and meanings. • Business rules and process decisions. • Analytical tools • OLAP: Online Analytical Processing • Statistical analysis. • Data Mining. • Data Visualization/End-User Presentation Tools • Dashboards. • Graphics, tables, pictures.
What is a data warehouse? • A data warehouse is a collection of data gathered into a database specifically designed to support decision making. • Types of decisions supported by data warehouses: • Operational • Short-term • Long-term • Strategic • An organization may have one or may have multiple data warehouses designed to suit multiple applications and/or decisions.
Characteristics of a data warehouse • Subject-oriented. • Integrated. • Time-variant. • Non-volatile.
Other potential characteristics of DW • Summarized (or not...) • Not normalized (or normalized...) • Web based (or not...) • Real-time (or batch...) • Single version of truth (or one of many...)
What is a data mart? • A data mart stores data for a limited number of subject areas. • An “independent” data mart is loaded directly from operational data. • A “dependent” data mart is loaded from an enterprise data warehouse. • Usually not normalized.
Data warehouse architecture • Continuum of choices. • Basic architectural issues: • Where is the data stored? • Within the operational data stores and then concatenated on the fly? • Within a centralized data warehouse that is optimized for decision making? • Within a series of data marts? • When is the data cleaned (or whether the data is cleaned at all)? • How is the data accessed?
What is an enterprise data warehouse? • A data warehouse that is created to encompass multiple subject areas. • Is usually normalized. • Can be used for decision making in multiple areas.
What is an operational data store? • An operational data store consolidates data from multiple source systems and provides a near real-time, integrated view of volatile, current data. • Its purpose is to provide integrated data for operational purposes. It has add, change, and delete functionality. • Sometimes they are created to avoid a full blown ERP implementation.
Factors that may affect the decision • Information interdependence among organizational units • Upper management’s information needs • Urgency of need for a data warehouse • Nature of end-user tasks • Identified role of the data warehouse prior to implementation • Compatibility with existing systems • Perceived ability of the in-house IT staff • Social/political factors
Metadata repository • A metadata repository contains information about all data objects stored in the data warehouse. • Contains the following segments: • Business segment: describes the business definition of the data element. • Technical segment: describes the computer-related technical properties of each element (size, data type, unit of measure, etc.). • Process segment: describes how the element is processed to be placed in the data warehouse. • Usage segment: describes the relative usage of the element including who accesses it and how often and in what manner. Used for performance tuning.
Extract, Transform, Load (ETL) • Extract • Take data from source systems. • May require middleware to gather all necessary data. • Transformation • Put data into consistent format and content. • Validate and fix data – check for accuracy, consistency using pre-defined and agreed-upon business rules. • Convert data as necessary. • Load • Use a batch (bulk) update operation that keeps track of what is loaded, where, when and how. • Keep a detailed load log to audit updates to the data warehouse.
Online analytical processing tools • Provides multi-dimensional data analysis techniques. • Works primarily with data aggregation. • Provides advanced statistical analysis. • Provides advanced graphical output. • Supports access to very large databases. • Provides enhanced query optimization algorithms. • Can be add-ons to existing products, example is Excel. Can have their own user interfaces.
Data mining tools • Data mining tools: • analyze the data; • uncover patterns hidden in the data; • form computer models based on the findings; and • use the models to predict business behavior. • Proactive tools, used for prediction and discovery of behavior. • Some are based on standard statistical tools of correlation, regression, factor analysis and structural equation modeling. • Most are based on artificial intelligence software such as decision trees, neural networks, fuzzy logic systems, inductive nets and classification networking.
Contrast between OLAP and Data Mining Decision Support Questions
Data visualization/End-User Presentation • Text portrayal of data. • Tables. • Graphical portrayal of data. • Graphics include: • Standard graphics (bar chart, pie chart, line chart, etc.) • Pictures • Scatter diagrams combined with pictures. • Animation. • Cool 3-D images… • Audio. • Video.
Tools • Excel!! • Query generators • Report writers • Dashboards • A very long list of possibilities from a very long list of ever-evolving vendors!