Decision Support Systems for E-commerce
Working Definition of DSS • A DSS is an integrated, interactive computer system, consisting of analytical tools and information management capabilities, designed to aid decision makers in solving relatively large, unstructured problems • Sample decision-making questions: • What were the sales volumes by region and product category for the last year? • How did the share price of computer manufacturers correlate with quarterly profits over the past 10 years? • Central issue in DSS: support and improvement of decision making
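The first sample question above can be sketched as an aggregate query. This is only an illustration: the `sales` table, its columns, and all figures are assumptions made up for the example, not data from the slides.

```python
import sqlite3

# Hypothetical sales table; names and figures are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (year INTEGER, region TEXT, category TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        (2023, "East", "PC", 120.0),
        (2023, "East", "PC", 80.0),
        (2023, "East", "TV", 50.0),
        (2023, "West", "PC", 70.0),
        (2022, "West", "TV", 999.0),  # older year, filtered out below
    ],
)

def sales_by_region_and_category(conn, year):
    """Total sales volume per (region, category) for one year."""
    rows = conn.execute(
        "SELECT region, category, SUM(amount) FROM sales "
        "WHERE year = ? GROUP BY region, category "
        "ORDER BY region, category",
        (year,),
    )
    return rows.fetchall()

report = sales_by_region_and_category(conn, 2023)
```

Each row of `report` is one (region, category, total) triple, which is the kind of summary a decision maker would read directly.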
Management Decision Making • Strategic: CEO, board of directors, top executives; develop the overall strategies of the organization • Tactical: regional managers, plant managers, division supervisors; carry out the strategic managers’ plans • Operational: direct managers, team leaders; carry out the tactical managers’ plans • Different technologies are invented to meet different decision-making goals!
The Big Picture: DBs, Data Warehouse, OLAP, and Data Mining [Figure: operational DBs and other sources are extracted, transformed, loaded, and refreshed into the data warehouse (data storage), which serves the OLAP server (OLAP engine); front-end tools provide analysis, query, reports, and data mining]
Why Build a Data Warehouse? • Separate transactional and analysis systems, so that regional managers or CEOs can make tactical or even strategic decisions • Easy formulation of complex queries • Access to historical data (not kept in operational systems) • Improved data quality (fewer errors and missing values) • Access to data from multiple sources, giving a comprehensive data collection
Potential Applications of Data Warehousing and Mining in EC • Analysis of user access patterns and buying patterns • Customer segmentation and target marketing • Cross selling and improved Web advertisement • Personalization • Association (link) analysis • Customer classification and prediction • Time-series analysis • Typical event sequence and user behavior pattern analysis • Transition and trend analysis
Data Warehousing • The term “data warehouse” was coined by William Inmon in 1990 • A data warehouse is a decision support database that is maintained separately from the organization’s operational database • Definition: A DW is a repository of integrated information from distributed, autonomous, and possibly heterogeneous information sources for query, analysis, decision support, and data mining purposes
Characteristics (cont’d) • Integrated • Application-oriented data from different legacy systems and heterogeneous data sources has no consistent encoding, naming conventions, … • When data is moved to the warehouse, it is consolidated, converted, and encoded
Characteristics (cont’d) • Non-volatile • New data is always appended to the database rather than replacing existing data • The database continually absorbs new data, integrating it with the previous data • In contrast, operational data is regularly accessed and manipulated one record at a time, and updates are applied in place in the operational environment
Characteristics (cont’d) • Time-variant • Operational databases contain current-value data • Operational data is valid only at the moment of access, capturing a single moment in time • The time horizon for the data warehouse is significantly longer than that of operational systems • Data warehouse data is nothing more than a sophisticated series of snapshots, each taken as of some moment in time
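The contrast between in-place operational updates and append-only warehouse snapshots can be sketched as follows. The keys, dates, and the helper names `load_snapshot` and `value_as_of` are hypothetical, chosen only for this illustration.

```python
from datetime import date

# Operational store: one current value per key, overwritten in place.
operational = {"price:PC": 900}
operational["price:PC"] = 850  # the old value is lost

# Warehouse: append-only list of (snapshot_date, key, value) rows.
warehouse = []

def load_snapshot(snapshot_date, key, value):
    """Append a new snapshot; earlier rows are never modified."""
    warehouse.append((snapshot_date, key, value))

def value_as_of(key, when):
    """Return the latest value recorded on or before `when`, or None."""
    candidates = [(d, v) for d, k, v in warehouse if k == key and d <= when]
    return max(candidates)[1] if candidates else None

load_snapshot(date(2024, 1, 1), "price:PC", 900)
load_snapshot(date(2024, 4, 1), "price:PC", 850)

february_price = value_as_of("price:PC", date(2024, 2, 15))
```

Because every snapshot is kept, a historical question such as "what was the price in February?" can still be answered after the operational value has changed.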
System Architecture [Figure: data sources (legacy, flat-file, RDBMS, OODBMS, …), each with a detector, at the bottom; end-user analysis, query reports, and data mining at the top]
Data Warehouse Back-End Tools and Utilities • Data extraction: • Extract data from multiple, heterogeneous, and external sources • Data cleaning (scrubbing): • Detect errors in the data and rectify them when possible • Data converting: • Convert data from legacy or host format to warehouse format • Transforming: • Sort, summarize, compute views, check integrity, and build indices • Refresh: • Propagate the updates from the data sources to the warehouse
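A minimal sketch of the back-end steps listed above (extract, clean, convert, transform), assuming two made-up sources that encode the same facts differently. All source layouts, field names, and values are assumptions for illustration, not part of any real tool.

```python
import csv
import io

# Two hypothetical sources with inconsistent encodings of the same facts.
LEGACY_CSV = "cust,gender,amount\nA1,M,10.5\nA2,F,\nA3,m,4.5\n"
WEB_ROWS = [
    {"customer": "A1", "sex": "male", "total": "20"},
    {"customer": "A4", "sex": "female", "total": "7"},
]
GENDER_CODES = {"m": "M", "male": "M", "f": "F", "female": "F"}

def extract():
    """Pull raw rows from both sources into one common field layout."""
    for row in csv.DictReader(io.StringIO(LEGACY_CSV)):
        yield {"customer": row["cust"], "gender": row["gender"],
               "amount": row["amount"]}
    for row in WEB_ROWS:
        yield {"customer": row["customer"], "gender": row["sex"],
               "amount": row["total"]}

def clean_and_convert(rows):
    """Drop rows with a missing amount; normalize gender codes and types."""
    for row in rows:
        if not row["amount"]:
            continue  # cannot be rectified here, so the row is skipped
        yield {"customer": row["customer"],
               "gender": GENDER_CODES[row["gender"].lower()],
               "amount": float(row["amount"])}

def transform(rows):
    """Summarize: total amount per customer, sorted by customer id."""
    totals = {}
    for row in rows:
        totals[row["customer"]] = totals.get(row["customer"], 0.0) + row["amount"]
    return dict(sorted(totals.items()))

cleaned = list(clean_and_convert(extract()))
warehouse = transform(cleaned)
```

A refresh step would rerun the same pipeline on new source rows and append the results; it is left out here to keep the sketch short.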
On-Line Analytical Processing (OLAP) • Front end to the data warehouse • Allows easy data manipulation • Allows inquiries over the data at various levels of abstraction • Fast and easy because some aggregations are computed in advance • No need to formulate the entire query
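OLAP: Data Cube • OLAP uses data in multidimensional format (e.g., data cubes) to simplify queries and shorten response time [Figure: sales data cube with dimensions Date (1Qtr, 2Qtr, 3Qtr, 4Qtr, sum), Product (TV, PC, VCR, sum), and Country (U.S.A, Canada, Mexico, sum); the corner cell (All, All, All) holds the grand total; highlighted cell: overall sales of TVs in the US in the 3rd quarter]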
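One way to picture a data cube with its "sum" cells is a dictionary keyed by (quarter, product, country), where the marker `ALL` stands for "summed over this dimension". This is a toy sketch with invented sales figures, not how any particular OLAP server stores its data.

```python
from itertools import product

ALL = "All"  # marker for "summed over this dimension"

# Hypothetical fact rows: (quarter, product, country, sales)
FACTS = [
    ("3Qtr", "TV", "U.S.A", 100),
    ("3Qtr", "TV", "Canada", 40),
    ("3Qtr", "PC", "U.S.A", 70),
    ("4Qtr", "TV", "U.S.A", 60),
]

def build_cube(facts):
    """Precompute the sum for every combination of concrete value or ALL."""
    cube = {}
    for quarter, prod, country, sales in facts:
        # Each fact contributes to 2 * 2 * 2 = 8 cells of the cube.
        for key in product((quarter, ALL), (prod, ALL), (country, ALL)):
            cube[key] = cube.get(key, 0) + sales
    return cube

cube = build_cube(FACTS)
tv_us_q3 = cube[("3Qtr", "TV", "U.S.A")]      # a single detailed cell
tv_us_all_year = cube[(ALL, "TV", "U.S.A")]   # summed over quarters
grand_total = cube[(ALL, ALL, ALL)]           # the (All, All, All) cell
```

Because the aggregates are computed once when the cube is built, each later lookup is a single dictionary access, which is the reason precomputation makes OLAP queries fast.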
OLAP: Data Cube Operations • Slicing: Selecting the dimensions of the cube to be viewed • Example: View “Sales volume” as a function of “Product” by “Country” by “Quarter” • Dicing: Specifying the values along one or more dimensions • Example: View “Sales volume” for “Product=PC” by “Country” by “Quarter”
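Following the definitions on this slide, slicing and dicing can be sketched with one small function: `by` chooses the dimensions to view and `where` fixes values along some dimensions. The function name `view` and the fact rows are assumptions for illustration.

```python
from collections import defaultdict

DIMENSIONS = ("product", "country", "quarter")

# Hypothetical fact rows; the measure is "sales".
FACTS = [
    {"product": "PC", "country": "U.S.A", "quarter": "1Qtr", "sales": 10},
    {"product": "PC", "country": "Canada", "quarter": "1Qtr", "sales": 5},
    {"product": "TV", "country": "U.S.A", "quarter": "1Qtr", "sales": 8},
    {"product": "PC", "country": "U.S.A", "quarter": "2Qtr", "sales": 12},
]

def view(facts, by=DIMENSIONS, where=None):
    """Sum sales grouped by the dimensions in `by`, restricted by `where`.

    `by` plays the role of slicing (which dimensions to view) and
    `where` the role of dicing (fixed values along some dimensions).
    """
    where = where or {}
    totals = defaultdict(int)
    for row in facts:
        if all(row[dim] == value for dim, value in where.items()):
            totals[tuple(row[dim] for dim in by)] += row["sales"]
    return dict(totals)

# Slicing: sales volume as a function of product by country.
by_product_country = view(FACTS, by=("product", "country"))

# Dicing: sales volume for Product=PC by country by quarter.
pc_only = view(FACTS, by=("country", "quarter"), where={"product": "PC"})
```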
OLAP: Data Cube Operations • Drilling down: Moving from higher-level aggregation to lower-level aggregation or detailed data (e.g., viewing by “state” after viewing by “region”) • Rolling up: Summarizing data by climbing up a hierarchy or by dimension reduction (e.g., viewing by “region” instead of by “state”)
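The state and region example can be sketched with a small location hierarchy. The mapping `REGION_OF` and the sales figures are hypothetical.

```python
from collections import defaultdict

# Hypothetical location hierarchy: state -> region.
REGION_OF = {"CA": "West", "OR": "West", "NY": "East"}

# Detailed (state-level) sales.
SALES_BY_STATE = {"CA": 50, "OR": 20, "NY": 30}

def roll_up(sales_by_state):
    """Climb the hierarchy: aggregate state-level sales to region level."""
    totals = defaultdict(int)
    for state, sales in sales_by_state.items():
        totals[REGION_OF[state]] += sales
    return dict(totals)

def drill_down(region, sales_by_state):
    """Descend the hierarchy: show the state-level detail behind a region."""
    return {state: sales for state, sales in sales_by_state.items()
            if REGION_OF[state] == region}

sales_by_region = roll_up(SALES_BY_STATE)
west_detail = drill_down("West", SALES_BY_STATE)
```

Rolling up loses no information as long as the detailed data is kept, so drilling down is simply a return to the finer level.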
Cube Operations Illustrated [Figure: drilling down and rolling up]
Actual Application Com.1 • Query: “overall and detailed production performance” • Manufacturer: Com.1 • Products: all products • Date interval: 01-Jan-1994 until 01-Jan-1999 • Source: USDA
[Figure: results for Com.1 by contract number (Contract Number 1, 2, 3) and lot (Lot#1, Lot#2, Lot#3)]
Data Mining “Data Mining is the exploration and analysis by automatic or semi-automatic means, of large or small quantities of data in order to discover meaningful patterns, trends and rules.”
Data Mining [Figure: data mining and data analysis in relation to databases, statistics, AI & ML, data warehouses, and OLAP]
Data Analysis • Classification • Regression • Clustering • Association • Sequence Analysis
Data Analysis (cont.) • Modeling: a model f maps inputs X1, X2, X3 to outputs Y1, Y2, Y3 • Input variables, also called independent variables, attributes, or descriptors • Output variables, also called dependent variables, classes, or targets • Variable types: numeric (3, 4.5, 102, …), categorical (hot, cold, high, low, …), crisp (0, 1, yes, no, …) • Numeric output: regression • Categorical output: classification • f can be a linear model, a non-linear model, or a set of rules
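The difference between the two kinds of model f can be sketched with two tiny examples: a least-squares line for a numeric output and a nearest-neighbor rule for a categorical output. Both functions and all data points are illustrative choices, not methods prescribed by the slides.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return mean_y - slope * mean_x, slope

def nearest_neighbor(train, x):
    """Return the class label of the training point closest to x."""
    return min(train, key=lambda pair: abs(pair[0] - x))[1]

# Regression: numeric input -> numeric output.
intercept, slope = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
predicted = intercept + slope * 5

# Classification: numeric input -> categorical output.
TEMPERATURES = [(2, "cold"), (5, "cold"), (25, "hot"), (30, "hot")]
label = nearest_neighbor(TEMPERATURES, 22)
```

The regression model returns a number for an unseen input, while the classifier returns one of a fixed set of labels.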
Data Analysis (cont.) • Clustering [Figure: data points plotted by Income and Age] • Association: transactions 1: chips, coke, chocolate; 2: gum, chips; 3: chips, coke; 4: …; Probability (chips, coke)? Probability (chips, gum)? • Sequence Analysis: …ATCTTTAAGGGACTAAAATGCCATAAAAATCCATGGGAGAGACCCAAAAAA… [Figure: Xt-1, Xt, T]
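The two probability questions in the association example can be worked out directly from the three transactions that are listed. The fourth transaction is elided in the source and is left out here. The function names `support` and `confidence` follow common association-rule terminology and are a choice made for this sketch.

```python
from fractions import Fraction

# The three transactions shown on the slide (the fourth is elided there).
BASKETS = [
    {"chips", "coke", "chocolate"},
    {"gum", "chips"},
    {"chips", "coke"},
]

def support(items, baskets):
    """Fraction of baskets that contain every item in `items`."""
    items = set(items)
    return Fraction(sum(items <= basket for basket in baskets), len(baskets))

def confidence(antecedent, consequent, baskets):
    """Estimated probability of `consequent` given `antecedent`."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)

p_chips_coke = support({"chips", "coke"}, BASKETS)
p_chips_gum = support({"chips", "gum"}, BASKETS)
p_coke_given_chips = confidence({"chips"}, {"coke"}, BASKETS)
```

On these three baskets, chips and coke occur together in two of three transactions, and chips and gum in one of three.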
Data Analysis (cont.) • Classification: Linear Discriminant Analysis, Naïve Bayes / Bayesian Network, OneR, Neural Networks, Decision Tree (ID3, C4.5, …), K-Nearest Neighbors (IB), Support Vector Machines (SVM), … • Regression: Multiple Linear Regression, Principal Components Regression, Partial Least Squares, Neural Networks, Regression Tree (CART, MARS, …), K-Nearest Neighbors (LWR), Support Vector Machines (SVR), … • Clustering: K-Means Clustering, Self-Organizing Map, Bayesian Clustering, COBWEB, … • Association & Sequence Analysis: Apriori, Markov Chain, Hidden Markov Models, …
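As one example from the sequence-analysis group, a first-order Markov chain can be estimated by counting how often each symbol follows another. The short input string is a toy sequence invented for this sketch.

```python
from collections import Counter, defaultdict

def transition_probabilities(sequence):
    """Estimate P(current symbol | previous symbol) from adjacent pairs."""
    counts = defaultdict(Counter)
    for prev, cur in zip(sequence, sequence[1:]):
        counts[prev][cur] += 1
    return {
        prev: {cur: n / sum(followers.values())
               for cur, n in followers.items()}
        for prev, followers in counts.items()
    }

# Toy sequence: the adjacent pairs are A->A, A->A, A->T.
probs = transition_probabilities("AAAT")
```

Here `probs["A"]` assigns two thirds to "A" and one third to "T"; a real analysis would use a much longer sequence.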
Challenges • Faster, more accurate and more scalable techniques • Incremental, on-line and real-time learning algorithms • Parallel and distributed data processing techniques
Opportunities • Data mining is a ‘top ten’ emerging technology • Data mining is finding increasing acceptance in areas of science and business that need to analyze large amounts of data to discover trends and patterns they could not otherwise find • Data mining is an exciting and challenging field with the ability to solve many complex scientific and business problems