440 likes | 461 Views
Explore computer systems that automate decision steps, utilizing tools like DSSs, data mining, EISs, and more. Learn about DSS architecture and the history of decision support systems. Discover how DSS aims to assist decision makers and the taxonomy of DSS. Dive into Executive Information Systems (EIS) and understand data mining and its motivations. Uncover the knowledge discovery process in data mining and its components, including data cleaning and integration.
E N D
Introduction (2) • Most computer systems support decision making because all software programs involve automating decision steps that people would take • Decision making is a process that involves a variety of activities, most of which handle information • A wide variety of computer-based tools and approaches can be used to confront the problem at hand and work through its solution
Introduction (3) • Computer technologies that support decision making • Decision support system (DSSs) • Data mining • Executive information systems (EISs) • Expert systems (ESs) • Agent-based modeling • Multidisciplinary foundations for DS technologies • Database research, artificial intelligence, statistical inference, human-computer interaction, simulation methods, software engineering etc.
Case Example---A Problem-Solving Scenario • Using an EIS to discover a sales shortfall in one region • Investigate several possible causes • Economic conditions • Competitive analysis • Written sales reports • A data mining analysis • Result: no clear problems revealed
Decision Support Systems---History • Two contributing areas of research in 1950s-1960s • Organizational decision making in CMU • Interactive computer systems in MIT • Middle 1970s: single user and model-oriented DSS • Middle and late 1980s: EIS, GDSS, ODSS • 1990s: Data warehousing and OLAP • Late 1990s-2000s • Data mining • Web-based analytical applications
What is a DSS? • A DSS aims to use IT to relieve humans of some decision making or help us make more informed decisions • Systems that support, not replace, managers in their decision-making activities • DSSs are defined as: • Computer-based systems • That help decision makers • Confront ill-structured problems • Through direct interaction • With data and analysis models
DSS Architecture (2) • The Dialog Component • Linking the user to the system • The Data Component • Data sources --- use all the important data sources within and outside the organization in the form of summarized data (DW & DM) • The Model Component • Models provide the analysis capabilities for a DSS • Using a mathematical representation of the problem, algorithmic processes are employed to generate information to support decision making
A Taxonomy of DSS • Using the mode of assistance as the criterion • A model-driven DSS • A communication-driven DSS • A data-driven DSS or data-oriented DSS • A document-driven DSS • A knowledge-driven DSS
Executive Information System (1) • The emphasis of EIS is on graphical displays and easy-to-use user interfaces • EIS can be viewed as a DSS that: • Provides access to summary performance data • Uses graphics to display and visualize the data in an easy-to-use fashion, and • Has a minimum of analysis for modeling beyond the capability to "drill down" in summary data to examine components
Executive Information System (2) • EISs aim to provide both internal and external information relevant to meeting the strategic goals of the organization • Gauge company performance • Scan the environment • EIS and data warehousing technologies are converging in the marketplace • The term EIS has lost popularity in favor of Business Intelligence
Data Mining: Motivations • The explosive growth of data: from TB to PB • Data collection and data availability • Automated data collection tools, database systems, Web, computerized society • Major sources of abundant data • Business: Web, e-commerce, transactions, stocks, … • Science: remote sensing, bioinformatics, … • Society and everyone: news, digital cameras, YouTube • We are drowning in data, but starving for knowledge! • “Necessity is the mother of invention”—Data mining—Automated analysis of massive data sets
What Is Data Mining? • Data mining (knowledge discovery from data) • Extraction of interesting patterns or knowledge from huge amount of data • Alternative names • Knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archeology, data dredging, information harvesting, business intelligence, etc. • Watch out: Is everything “data mining”? • Simple search and query processing • (Deductive) expert systems
Knowledge Discovery (KDD) Process Knowledge • Data mining—core of knowledge discovery process Pattern Evaluation Data Mining Task-relevant Data Selection Data Warehouse Data Cleaning Data Integration Databases
Graphical User Interface Pattern Evaluation Knowledge-Base Data Mining Engine Database or Data Warehouse Server data cleaning, integration, and selection Data Warehouse World-Wide Web Other Info Repositories Database Architecture: A Typical Data Mining System
Database Technology Statistics Data Mining Visualization Machine Learning Pattern Recognition Other Disciplines Algorithm Data Mining: Confluence of Multiple Disciplines
Why Not Traditional Data Analysis? • Tremendous amount of data • Algorithms must be highly scalable to handle TB of data • High-dimensionality of data • Micro-array may have tens of thousands of dimensions • High complexity of data • Data streams and sensor data • Time-series data, temporal data, sequence data • Structure data, graphs, social networks and multi-linked data • Heterogeneous databases and legacy databases • Spatial, spatiotemporal, multimedia, text and Web data • Software programs, scientific simulations • New and sophisticated applications
Multi-Dimensional View of Data Mining (1) • Data to be mined • Relational, data warehouse, transactional, stream, object-oriented/relational, active, spatial, time-series, text, multi-media, heterogeneous, legacy, WWW • Knowledge to be mined • Characterization, discrimination, association, classification, clustering, trend/deviation, outlier analysis, etc. • Multiple/integrated functions and mining at multiple levels
Multi-Dimensional View of Data Mining (2) • Techniques utilized • Database-oriented, data warehouse (OLAP), machine learning, statistics, visualization, etc. • Applications adapted • Retail, telecommunication, banking, fraud analysis, bio-data mining, stock market analysis, text mining, Web mining, etc.
Data Mining Functionalities (1) • Multidimensional concept description: characterization and discrimination • Generalize, summarize, and contrast data characteristics, e.g., dry VS. wet regions • Frequent patterns, association, correlation vs. causality • Diaper Beer [0.5%, 75%] • Classification and prediction • Construct models (functions) that describe and distinguish classes or concepts for future prediction • E.g., classify countries based on (climate), or classify cars based on (gas mileage) • Predict some unknown or missing numerical values
Data Mining Functionalities (2) • Cluster analysis • Class label is unknown: Group data to form new classes, e.g., cluster houses to find distribution patterns • Maximizing intra-class similarity & minimizing interclass similarity • Outlier analysis • Outlier: Data object that does not comply with the general behavior of the data • Noise or exception? Useful in fraud detection, rare events analysis • Trend and evolution analysis • Trend and deviation: e.g., regression analysis • Periodicity analysis
Major Issues in Data Mining (1) • Mining methodology • Mining different kinds of knowledge from diverse data types, e.g., bio, stream, Web • Performance: efficiency, effectiveness, and scalability • Pattern evaluation: the interestingness problem • Incorporation of background knowledge • Handling noise and incomplete data • Parallel, distributed and incremental mining methods • Integration of the discovered knowledge with existing one: knowledge fusion
Major Issues in Data Mining (2) • User interaction • Data mining query languages and ad-hoc mining • Expression and visualization of data mining results • Interactive mining of knowledge at multiple levels of abstraction • Applications and social impacts • Domain-specific data mining & invisible data mining • Protection of data security, integrity, and privacy
Artificial Intelligence (1) • AI is a group of technologies that attempts to mimic our senses and emulate certain aspects of human behavior such as reasoning and communication • 1956, a conference in Dartmouth College • John McCarthy, Marvin Minsky, Allen Newell and Herbert Simon ( MIT, CMU and Stanford) • 1965, H. A. Simon: "machines will be capable, within twenty years, of doing any work a man can do" • 1967, Marvin Minsky: "Within a generation ... the problem of creating 'artificial intelligence' will substantially be solved" • Heavily funded by DARPA
Artificial Intelligence (2) • They had failed to recognize the difficulty of some of the problems they faced: • The lack of raw computing power • The intractable combinatorial explosion of their algorithms, • The difficulty of representing commonsense knowledge and doing commonsense reasoning, • The incredible difficulty of perception and motion • The failings of logic • First AI Winter • In 1974, DARPA cut off all undirected, exploratory research in AI
Artificial Intelligence (3) • In the early 80s, the field was revived by the commercial success of expert systems • By 1985 the market for AI had reached more than a billion dollars. • Minsky and others warned the community that enthusiasm for AI had spiraled out of control and that disappointment was sure to follow • Second AI Winter • The collapse of the Lisp Machine market in 1987
Artificial Intelligence (4) • In the 90s AI achieved its greatest successes • Artificial intelligence was adopted throughout the technology industry, providing the heavy lifting for • Data mining • Logistics • Medical diagnosis • …
Expert System • An expert system is an automated type of analysis or problem-solving model that deals with a problem the way an "expert" does • The process involves consulting a base of knowledge or expertise to reason out an answer based on the characteristics of the problem
Architecture of an ES User Interface Inference Engine Description of a problem User Knowledge Base Advice and explanation
Knowledge Representation • In AI, the primary aim of knowledge representation is to store knowledge so that programs can process it and achieve the verisimilitude of human intelligence • The representation theory has its origin in cognitive science • Knowledge can be represented in a number of ways • Case-based reasoning • Artificial neural networks • Stored as rules
Case-based Reasoning (1) • Case-based reasoning • The process of solving new problems based on the solutions of similar past problems • A case consists of a problem, its solution, and, typically, annotations about how the solution was derived
Case-based Reasoning (2) • Case-based reasoning as a four-step process • Retrieve: given a target problem, retrieve cases from memory that are relevant to solving it • Reuse: map the solution from the previous case to the target problem • Revise: test the new solution, if necessary, revise it. • Retain: After the solution has been successfully adapted to the target problem, store the resulting experience as a new case in memory
Supervised vs. Unsupervised Learning • Supervised learning • Supervision: The training data (observations, measurements, etc.) are accompanied by labels indicating the class of the observations • New data is classified based on the training set • Unsupervised learning • The class labels of training data is unknown • Given a set of measurements, observations, etc. with the aim of establishing the existence of classes or clusters in the data
Artificial Neural Network (1) • An interconnected group of artificial neurons • Using a mathematical or computational model for information processing based on a connectionistic approach to computation. • An adaptive system that changes its structure based on external or internal information that flows through the network. • ANNs can be used to model complex relationships between inputs and outputs or to find patterns in data • Non-linear statistical data modeling or decision making tools
Artificial Neural Network (2) Training set: (1) high salary, owns a house, has a dog, [profitable customer] (2) less than 3 years on job, prior bankruptcy, owns a dog, [deadbeat] ......
Rule-based Systems (1) • Knowledge stored as rules • The most commonly used form of rules is the if-then statement • e.g. IF some condition THEN some action • A rule-based inference model: decision tree • Each internal node (non-leaf node) denotes a test on an attribute • Each branch represents an outcome of the test • Each leaf node holds a class label
Rule-based Systems (2) Training dataset for decision tree buys_computer
age? <=30 overcast >40 31..40 student? credit rating? yes excellent fair no yes no yes no yes Rule-based Systems (3) Decision tree buys_computer
Agent-based Modeling • Simulate the behavior that emerges from the decisions of a large number of distinct individuals • Computer generated agents, each making decisions typical of the decisions an individual would make in the real world • Trying to understand the mysteries of why businesses, markets, consumers, and other complex systems behave as they do
Toward the Real-Time Enterprise • The essence of the phrase real-time enterprise is that organizations can know how they are doing at the moment • Digitization and automation of some crucial enterprise activities traditionally completed by people • Esp. information analysis • Better sense-and-response
Real-time Reporting • Real-time reporting is occurring on a whole host of fronts including: • Enterprise nervous systems • A network that connects people, applications and devices • To coordinate company operations • Straight-through processing • To reduce distortion in supply chains • Real-time CRM • To automate decision making relating to customers, and • Communicating objects • To gain real-time data about the physical world • E.g. radio frequency identification device (RFID)
The Dark Side of Real Time • Object-to-object communication could compromise privacy • Knowing the exact location of a company truck every minute of the day is an invasion the driver's privacy • In the era of speed, a situation can become very bad very fast • E.g. "circuit breaker" to stop deep dives in NYSE