480 likes | 632 Views
Final Exam. Thursday Dec. 9, 19:30, BSB B154 Lecture notes 20 True/False and 20 multiple Choice questions: 40 points 4 Short answer questions (application and conceptual): 40 points. Project Presentation. Email me Power Point presentation the day before
E N D
Final Exam • Thursday Dec. 9, 19:30, BSB B154 • Lecture notes • 20 True/False and 20 multiple Choice questions: 40 points • 4 Short answer questions (application and conceptual): 40 points
Project Presentation • Email me Power Point presentation the day before • Presentation by all members: 15 minutes • Question and answer: 5 minutes • Peer evaluation: 1)Objective and the value of the project 2)Data modeling 3)Functions of the application (with demo) 4)Quality of presentation 5) Overall
Data Warehouse • The idea of a data warehouse is to put a wide range of operational data from internal and external sources into one place so it can be better utilized by executives, line of business managers and other business analysts. • Once the information is gathered, OLAP (on-line analytical processing ) software comes into play by providing the desktop analysis tools for querying, manipulating and reporting the data from the data warehouse.
Data Warehouse environment • the source systems from which data is extracted • the tools used to extract data for loading the data warehouse • the data warehouse database itself where the data is stored • the desktop query and reporting tools used for decision support
The Data Warehouse • The Data Warehouse is an integrated, subject-oriented, time-variant, non-volatile database that provides support for decision making.
Creating A Data Warehouse Figure 13.3
The Data Warehouse • Integrated • The Data Warehouse is a centralized, consolidated database that integrates data retrieved from the entire organization. • Subject-Oriented • The Data Warehouse data is arranged and optimized to provide answers to questions coming from diverse functional areas within a company.
The Data Warehouse • Time Variant • The Warehouse data represent the flow of data through time. It can even contain projected data. • Non-Volatile • Once data enter the Data Warehouse, they are never removed. • The Data Warehouse is always growing.
Operational DB Similar data can have different representations or meanings Functional or process orientation Current transaction Frequent updating Data Warehouse Unified view of all data elements Subject orientation for decision support Historical information with time dimension Data are added without change Operational Database vs. Data warehouse
Data Mart • A data mart is a small, single-subject data warehouse subset that provides decision support to a small group of people.
Data Mart • Data Marts can serve as a test vehicle for companies exploring the potential benefits of Data Warehouses. • Data Marts address local or departmental problems, while a Data Warehouse involves a company-wide effort to support decision making at all levels in the organization.
Star Schema • The star schema is a data modeling technique used to map multidimensional decision support into a relational database. • Star schemas yield an easily implemented model for multidimensional data analysis while still preserving the relational structure of the operational database.
Star Schema • Four Components: • Facts • Dimensions • Attributes • Attribute hierarchies
Figure 13.17 Attribute Hierarchies in Multidimensional Analysis
Star Schema Representation • Facts and dimensions are normally represented by physical tables in the data warehouse database. • The fact table is related to each dimension table in a many-to-one (M:1) relationship. • Fact and dimension tables are related by foreign keys and are subject to the primary/foreign key constraints.
Star Schema • Performance-Improving Techniques • Normalization of dimensional tables • Multiple fact tables representing different aggregation levels • Denormalization of fact tables • Table partitioning and replication
Data Warehouse Implementation • The Data Warehouse as an Active Decision Support Network • A Company-Wide Effort that Requires User Involvement and Commitment at All Levels • Satisfy the Trilogy: Data, Analysis, and Users • Apply Database Design Procedures
On-Line Analytical Processing • On-Line Analytical Processing (OLAP) is an advanced data analysis environment that supports decision making, business modeling, and operations research activities. • Four Main Characteristics of OLAP • Use multidimensional data analysis techniques. • Provide advanced database support. • Provide easy-to-use end user interfaces. • Support client/server architecture.
Data Mining • The data warehouse that enterprises are building until now have largely ignored • Factors make data mining feasible • organizations are gathering more data from on-line TPS with lower storage cost • high computation power allows using complex data mining algorithm
Data Mining • With data mining, it is possible to better manage product warranties, predict purchases of retail stock, unearth fraud, determine credit risk, and define new products and services.
Four Phases of Data Mining 1. Data Preparation • Identify and cleanse data sets. • Data Warehouse is usually used for data mining operations. 2. Data Analysis and Classification • Identify common data characteristics or patterns using • Data groupings, classifications, clusters, or sequences. • Data dependencies, links, or relationships. • Data patterns, trends, and deviations.
Four Phases of Data Mining 3. Knowledge Acquisition • Select the appropriate modeling or knowledge acquisition algorithms. • Examples: neural networks, decision trees, rules induction, genetic algorithms, classification and regression tree, memory-based reasoning, or nearest neighbor and data visualization). 4. Prognosis • Predict future behavior and forecast business outcomes using the data mining findings.
Data Mining • Data mining yields five basic type of information: • Association - occurrences are linked to a single event. “beer purchasers also buy peanuts 70% of the time” • Sequences - events are linked over time. “a new carpet purchase linked to new curtains” • Classification - patterns are recognized that describe the characteristics of a group, such as customers who cancel credit cards
Data Mining • Clustering - discovers undiscovered groupings ``Buyers of expensive sport cars are typically young urban professionals whereas luxury sedans are bought by elderly wealthy persons.'' • Forecasting - estimates future value such as inventory turnover
Database Marketing • It seems a lot of companies are taking a friendly interest in your life these days • Companies are collecting mountains of information about you, crunching it to predict how likely you are to buy a products, and using that knowledge to craft a marketing message precisely calibrated to get you to do so.
Database Marketing • The trend: • Mass marketing • Marketing segmentation • Individual marketing • Nothing is more powerful than knowledge about customers’ individual practice and preferences.
Database Marketing • Gathering massive quantity of data about consumers from multiple sources • Data are combined and analyzed using powerful tools • Has a primary goal of better understanding current and potential customers in order to boost sales and build customer loyalty • American Express • Reader’s Digest
Pioneers of New Marketing • General Motors surveys 12 million GM Card holders on their car preferences • Blockbuster has a database of 36 million households and 2 million daily transactions. It is testing a system that will recommend movies based on a customer’s past rentals
Pioneers of New Marketing • Kraft amassed a list of 30 million users from coupons and survey questions. It regularly send them tips on nutrition and recipes, as well as coupons for specific brands • 56% of manufacturers and retailers are currently building a database for marketing • 85% believe they will do database marketing in 2000.
Some concerns • Private intelligence-gathering gives some people the creeps • Targeted marketing efforts are intrusive and annoying • The collection, manipulation, and combination of lists of personal information amount to an ominous invasion of privacy
A Sample Of Current Data Warehousing And Data Mining Vendors Table 13.10
Deploying Data Mining for Competitive Advantage • The act of building data-mining models does not, by itself, guarantee any business value • To be used as competitive weapon, data mining must be part of a larger process that ensures that the information learned by data mining is transformed into actionable results
A process of deploying data mining for competitive advantage • Problem definition • Discovery • Implementation • Taking action • Monitoring the results
Anderson 2001 survey: Data mining in retailer industry • Using data mining: 52.5% in total • 75% of very large retailers (>$500 million) • 46.4% of large ($200-499 million) • 34.8% of medium ($50-199 million) • 20% of small (<$50 million) • Effect (contribution to the bottom line) • 52.5% said "no contribution" • 19.8% said "very little." • 17.8% said "somewhat” • 8.9% said "very much."
Hunt for terrorists • Banks probe credit, debit card records in hunt for terrorists: 'Data mining': Any suspicious transactions handed over to RCMP • Canada's major financial institutions are reviewing thousands of their customers' confidential transactions as part of a probe into terrorist funding that has gone beyond a list of 27 suspects provided by U.S. law enforcement agencies. Source: Financial Post (National Post) September 27, 2001