610 likes | 789 Views
Data Warehousing & Business Intelligence Fundamentals. Agenda. Evolution of DSS Business Intelligence. Evolution of Decision Support Systems. Overview. Most organizations began information processing on a small scale, automating one application at a time.
E N D
Agenda • Evolution of DSS • Business Intelligence
Overview • Most organizations began information processing on a small scale, automating one application at a time. • Systems tended to grow independently to support defined functional areas. • Each functional area tended to plan and develop systems in isolation from other areas.
Evolution of Information Processing • File Transaction Processing • Data Based Management Systems (DBMS) • 4GLs & MIS • Extract Processing • Decision Support Systems (DSS) • Data Warehouses • OLAP • Data Mining
File Transaction Processing • The traditional approach to file processing encouraged each functional area to develop and maintain specialized applications. • Individual applications ran on unique master files housed on magnetic tapes.
Problems With Traditional File Processing: • Data redundancy • Lack of data integrity • Program-data dependency • Lack of flexibility • Poor security • Lack of data sharing and availability
Traditional File Processing Data redundancy and inconsistency across files: Registration Application Library Application Financial aids Application Credit Records Application File A Student ID Name Address ZIP Code Phone Number File B Student ID First Name Last Name Address & ZIP Phone Number File C Student ID First & Last Name Address ZIP Code Phone Number File D Social Security Name Address ZIP Code Phone Number
Evolution of Information Processing • File Transaction Processing • Data Based Management Systems (DBMS) • 4GLs & MIS • Extract Processing • Decision Support Systems (DSS) • Data Warehouses • OLAP • Data Mining
Data Base Management Systema single source for all processing Registration IS application Library IS application Financial aids application Credit Records IS application Common Data Dictionary Data Definition Language DBMS- database management system Data Manipulation Language INTEGRATED STUDENTS DATABASE Students: Name Address Credit Records: Number of credits Course number Books: Title Author
Evolution of Information Processing • File Transaction Processing • Data Based Management Systems (DBMS) • 4GLs & MIS • Extract Processing • Decision Support Systems (DSS) • Data Warehouses • OLAP • Data Mining
PCs, 4GLs, MIS • PCs and Fourth-Generation Languages brought access to data and users at the level of the end-user. • MIS – focus on providing pre-specified reports to drive management decisions
Evolution of Information Processing • File Transaction Processing • Data Based Management Systems (DBMS) • 4GLs & MIS • Extract Processing • Decision Support Systems (DSS) • Data Warehouses • OLAP • Data Mining
Extract Programs Program extracting data from files and databases based on criteria. Reasons for Extract Programs • Accessibility • move data out of online processing systems • Performance • perform analytical functions separate from online processing functions • Control • shift in control of the data • the end-user ends up “owing” it
Problems with Natural Evolution of Data Extraction • Credibility of data • Inconsistent conclusions as a result of extracting data at different times, using different criteria, using data from different unsynchronized data sources • Low Productivity • Locating data, multiple storage technologies, multiple communication technologies, different data definitions • Inability to transform data into information • Difficult to integrate extracted data, lack of historical data
Evolution of Information Processing • File Transaction Processing • Data Based Management Systems (DBMS) • 4GLs & MIS • Extract Processing • Decision Support Systems (DSS) • Data Warehouses • OLAP • Data Mining
Decision Support Systems • Computer system at the management level of an organization that combines data, sophisticated analytical models, and user-friendly software to support semi-structured and unstructured decision making. • DSS often tend to be stand-alone systems, developed by end-user groups not under central IS control
Components of DSS • DSS Data base • A collection of current and historical, internal & external data • DSS Model base • A collection of analytical (math, statistical) models that can easily be made accessible to the DSS user. • DSS software system • User interface and development environment that permits easy interaction between the users of the system and the DSS data & model base.
Which are our lowest/highest margin customers ? Who are my customers and what products are they buying? What is the most effective distribution channel? What product promo-tions have the biggest impact on revenue? Which customers are most likely to go to the competition ? What impact will new products/services have on revenue and margins? A producer wants to know….
Data, Data everywhereyet ... • I can’t find the data I need • data is scattered over the network • many versions, subtle differences • I can’t get the data I need • need an expert to get the data • I can’t understand the data I found • available data poorly documented • I can’t use the data I found • results are unexpected • data needs to be transformed from one form to other
Problem: Heterogeneous Information Sources “Heterogeneities are everywhere” Personal Databases World Wide Web Scientific Databases Digital Libraries • Different interfaces • Different data representations • Duplicate and inconsistent information
Problem: Data Management in Large Enterprises • Vertical fragmentation of informational systems (vertical stove pipes) • Result of application (user)-driven development of operational systems Sales Planning Suppliers Num. Control Stock Mngmt Debt Mngmt Inventory ... ... ... Sales Administration Finance Manufacturing ...
What are the users saying... • Data should be integrated across the enterprise • Summary data has a real value to the organization • Historical data holds the key to understanding data over time • What-if capabilities are required
Dilemma: Most of the business analysts’ time is not spent in true data analysis • These logistic factors can negatively impact and slow down efficiency and effectiveness of business analysis: • Growing Volume of Data • Data stored in many different systems and formats • The criticality of quick decision making • Introduction to new products and market dynamics • Change in organizational strategies
Evolution of Information Processing • File Transaction Processing • Data Based Management Systems (DBMS) • 4GLs & MIS • Extract Processing • Decision Support Systems (DSS) • Data Warehouses • OLAP • Data Mining
Information Data What is Data Warehousing? A process of transforming data into information and making it available to users in a timely enough manner to make a difference [Forrester Research]
Data Warehousing -- It is a process • Technique for assembling and managing data from various sources for the purpose of answering business questions. Thus making decisions that were not previous possible • A decision support database maintained separately from the organization’s operational database
What is a Data Warehouse?A practitioners view “A data warehouse is simply a single, complete, and consistent store of data obtained from a variety of sources and made available to end users in a way they can understand and use it in a business context.” Barry Devlin, IBM Consultant
What is a Data Warehouse? • Defined in many different ways, but not rigorously. • A decision support database that is maintained separately from the organization’s operational database • Support information processing by providing a solid platform of consolidated, historical data for analysis. • “A data warehouse is asubject-oriented, integrated, time-variant, and nonvolatilecollection of data in support of management’s decision-making process.”—W. H. Inmon • Multidimensional database with reporting and query tools, that stores current and historical data extracted from various operational systems and consolidated for management reporting and analysis. • Addresses the problem of integrating key operational data from around the company in a form that is consistent, reliable, and easily available for reporting.
RelationalDatabases ExtractionCleansing Optimized Loader ERP Systems Data Warehouse Engine AnalyzeQuery Purchased Data LegacyData Metadata Repository Data Warehouse Architecture
Warehousing data outside the operational systems • The primary concept of data warehousing is that the data stored for business analysis can most effectively be accessed by separating it from the data in the operational systems. • Fundamental differences between operational and informational (DW) environment: • Nature of the data • Development cycle • Supporting technology • User community • Processing characteristics
Evolution of Information Processing • File Transaction Processing • Data Based Management Systems (DBMS) • 4GLs & MIS • Extract Processing • Decision Support Systems (DSS) • Data Warehouses • OLAP • Data Mining
What Is OLAP? • Online Analytical Processing - coined by EF Codd in 1994 • “A group of technologies and applications that collect, manage, process, and present multidimensional data for analysis and management purposes.” • “A category of database software which provides an interface such that users can transform or limit raw data according to user-defined or pre-defined functions, and quickly and interactively examine the results in various dimensions of the data.”
OLTP database applications are developed to meet the day-to-day and operational data retrieval needs of end-users Provide read-write capability Data Warehouses along with OLAP tools are being developed to meet information exploration and historical trend analysis management needs Provides read-only capability OLTP vs. OLAP
Evolution of Information Processing • File Transaction Processing • Data Based Management Systems (DBMS) • 4GLs & MIS • Extract Processing • Decision Support Systems (DSS) • Data Warehouses • OLAP • Data Mining
Data Mining • The exploration and analysis, by automatic or semiautomatic means, of large quantities of data in order to discover valid, meaningful patterns and rules to assist with business decisions.
Data Mining works with Warehouse Data • Data Warehousing provides the Enterprise with a memory • Data Mining provides the Enterprise with intelligence
We want to know ... • Given a database of 100,000 names, which persons are the least likely to default on their credit cards? • Which types of transactions are likely to be fraudulent given the demographics and transactional history of a particular customer? • If I emphasize ease-of-use of the product as opposed to its technical capabilities, what will be the net effect on my revenues? • Which of my customers are likely to be the most loyal? Data Mining helps extract such information
Data Mining Application Areas Industry Application Finance Credit Card Analysis Insurance Claims, Fraud Analysis Telecommunication Call record analysis Transport Logistics management Consumer goods promotion analysis Data Service providers Value added data Utilities Power usage analysis
Data Mining in Use • The US Government uses Data Mining to track fraud • A supermarket becomes an information broker • Basketball teams use it to track game strategy • Cross selling • Holding on to good customers • Weeding out bad customers
What makes data mining possible? • Advances in the following areas are making data mining deployable: • data warehousing • better and more data (i.e., operational, behavioral, and demographic) • the emergence of easily deployed data mining tools and • the advent of new data mining techniques. • -- Gartner Group
Objectives • Identify potential solutions to business problems using the organization’s data. • Describe alternatives for distributing business intelligence to the organization. • Describe methods to measure the effectiveness of a business solution.
Business Case Study Sterling Airlines is a small, commercial airline. For several years, quarterly reports have shown gross revenues unchanged, while operating costs have continued to rise.
Business Case Study In recent years, the airline industry has experienced increased costs in areas such as • operations • personnel • greater security measures • equipment maintenance and upgrades.
Business Case Study Sterling Airlines developed questions surrounding its business processes.
Questions about Business Processes 1. What are the time trends in flight delay frequency by hub, location, and day of the week? 2. How much does it cost to offer a compensation package for flight delays? 3. How will costs be affected if the compensation package is changed? 4. What is the impact of delays on repeat business?
Building a Data Warehouse To answer these questions,Sterling Airlines must turnits business data intobusiness intelligence.
Knowledge as Intellectual Capital The data warehouse exists to increase the value of the corporate body of knowledge: • Human capital — potentially valuable staff knowledge relevant to the business • Structural capital — underlying assets for capture, storage, and exchange of information • Relationship capital — with clients, channels, and so on.
Guidelines for Building a Data Warehouse 1. Plan from the top down. 2. Implement from the bottom up. 3. Build your warehouse one subject area at a time. 4. Define, load, and test warehouse data sources and data stores as you go.