Chapter 1: Why & What Is Data Mining? Note: This slide set includes both Chapter 1 material and additional material from the instructor. • Data Mining is a subset of Business Intelligence (BI).
Topics to Discuss in Session #1 • What is Data Mining (DM)? • Who uses DM? • Why DM? • Where DM? • When DM? • How DM? • Why study DM?
What, Who Data Mining – Definition & Goal • Definition • DM is the exploration and analysis of large quantities of data in order to discover meaningful patterns and rules. • Goal • To allow an “enterprise”* to IMPROVE its ______ through better understanding of its ______ . • Potential for Competitive Advantage. * Synonyms include: corporation, firm, non-profit organization, government agency
Foundations of Data Mining • Data mining is the process of using “raw” data to infer important “business” relationships. • Despite a consensus on the value of data mining, a great deal of confusion exists about what it is. • Data Mining is a collection of powerful techniques for analyzing large amounts of data. • There is no single data mining approach, but rather a set of techniques that can be used alone or in combination with each other.
Why, Where, When Data Mining – Why now? • Data are being produced • Data are being warehoused • Computing power is more affordable • Competitive pressures are enormous • Data Mining software is available
How Customer Relationship Management (CRM) In order to form a learning relationship with its customers, an enterprise (firm) must be able to: • Notice – what its customers are doing • Remember – what it and its customers have done over time • Learn – from what it has remembered • Act On – what it has learned to make customers more profitable
How Based on “Transaction” Data
How Identifying and Remembering Relationships is the Key!
Group Exercise #1 • Time Box = 15 minutes • Teams of 4 or fewer • Discuss DM situations among yourselves and pick one to report to the class • What to report (verbally – 5-minute max): • Describe the DM situation • How does it help the enterprise? • Presentations…another 15 to 30 minutes
Topics to Discuss in Session #2 • Data Mining History • Data Warehouse • Data Mart
Data Mining History • The approach has roots in practice dating back over 40 years. • In the 1960s and 1970s, what is now called data mining was known as statistical analysis, and the pioneers were statistical software packages such as SAS and SPSS. • By the late 1980s, the traditional techniques had been augmented by new methods such as fuzzy logic, heuristics and neural networks.
Definitions of a Data Warehouse • “A subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision making process” – W.H. Inmon • “A copy of transaction data, specifically structured for query and analysis” – Ralph Kimball
Data Warehouse • For organizational learning to take place, data from many sources must be gathered together and organized in a consistent and useful way – hence, Data Warehousing (DW) • DW allows an organization (enterprise) to remember what it has noticed about its data • Data Mining techniques make use of the data in a DW
Data Warehouse [Diagram: the enterprise “database” (customers, orders, transactions, vendors, etc.) is copied, organized, and summarized into the Data Warehouse, which in turn supports Data Mining.] • Data Miners: • “Farmers” – they know • “Explorers” – unpredictable
Data Warehouse • A data warehouse is a copy of transaction data specifically structured for querying, analysis and reporting – and hence for data mining. • Note that the data warehouse contains a copy of the transactions; the transaction system does not update or change that copy later. • Also note that this data is specially structured, and may have been transformed when it was copied into the data warehouse.
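The copy-and-transform idea on this slide can be sketched in a few lines. This is only an illustration, assuming an in-memory SQLite database; the table names (`orders`, `dw_orders`), the columns, the sample rows, and the load date are all hypothetical and do not come from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical operational table and hypothetical warehouse table.
conn.executescript("""
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer TEXT,
        amount_cents INTEGER
    );
    CREATE TABLE dw_orders (
        order_id INTEGER,
        customer TEXT,
        amount_dollars REAL,
        load_date TEXT
    );
""")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "Ann", 1250), (2, "Bob", 4000)],
)

# Load step: copy the transactions and transform them on the way in
# (cents become dollars, and a load date is stamped on each row).
conn.execute("""
    INSERT INTO dw_orders
    SELECT order_id, customer, amount_cents / 100.0, '2024-01-31'
    FROM orders
""")

# The transaction system keeps changing its own rows afterwards...
conn.execute("UPDATE orders SET amount_cents = 9999 WHERE order_id = 1")

# ...but the warehouse copy still holds what was loaded.
warehouse_rows = conn.execute(
    "SELECT order_id, amount_dollars FROM dw_orders ORDER BY order_id"
).fetchall()
print(warehouse_rows)
```

The later update to `orders` leaves `dw_orders` untouched, which is the non-volatile behavior the slide describes.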
Data Mart • A Data Mart is a smaller, more focused Data Warehouse – a mini-warehouse. • A Data Mart typically reflects the business rules of a specific business unit within an enterprise.
Data Warehouse to Data Mart [Diagram: a single Data Warehouse feeds three Data Marts, each providing Decision Support Information.]
Data Warehouse & Mart • Set of “Tables” – 2 or more dimensions • Designed for Aggregation
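A minimal sketch of what "designed for aggregation" can mean in practice: each row carries dimension values plus a measure, and the measure is summed along whichever dimensions the question calls for. The dimensions (region, product, quarter), the `aggregate` helper, and the sales figures are invented for illustration.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, product, quarter, sales_amount).
# Region, product, and quarter are dimensions; sales_amount is the measure.
SALES = [
    ("West", "Widgets", "Q1", 100.0),
    ("West", "Widgets", "Q2", 150.0),
    ("West", "Gadgets", "Q1", 80.0),
    ("East", "Widgets", "Q1", 120.0),
    ("East", "Gadgets", "Q2", 60.0),
]

DIMENSIONS = ("region", "product", "quarter")


def aggregate(rows, by):
    """Sum the sales measure, grouped by the named dimensions."""
    positions = [DIMENSIONS.index(name) for name in by]
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[i] for i in positions)
        totals[key] += row[-1]
    return dict(totals)


# Roll up along one dimension, then along two.
by_region = aggregate(SALES, by=("region",))
by_region_product = aggregate(SALES, by=("region", "product"))
print(by_region)
print(by_region_product)
```

A data mart for one business unit could be sketched the same way by filtering `SALES` to that unit's rows before aggregating.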
Group Exercise #2 • Time Box = 15 minutes • Teams of 4 or fewer • Discuss Data Warehouse to Data Mart situations among yourselves and pick one to report to the class • What to report (verbally – 5-minute max): • Describe the DW to Data Mart situation • How does it help the enterprise’s “business” unit? • Presentations…another 15 to 30 minutes
Topics to Discuss in Session #3 • Data Mining Flavors • Data Mining Examples • Data Mining Tasks • Data Mining’s Biggest Challenge • What does all of this mean?
Data Mining Flavors • Directed – Attempts to explain or categorize some particular target field such as income or response. • Undirected – Attempts to find patterns or similarities among groups of records without the use of a particular target field or collection of predefined classes.
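The two flavors can be contrasted on one small data set. The sketch below is an assumption-laden toy, not a method from the slides: the records, the `responded` target field, the nearest-class-mean rule, and the tiny one-dimensional two-cluster routine are all chosen only to show that directed mining uses a target field while undirected mining does not.

```python
# Hypothetical records: (monthly_spend, responded_to_offer).
RECORDS = [
    (20, "no"), (25, "no"), (30, "no"),
    (80, "yes"), (90, "yes"), (100, "yes"),
]


def mean(values):
    return sum(values) / len(values)


# Directed: the target field (responded_to_offer) guides the analysis.
def class_means(records):
    """Average spend for each value of the target field."""
    groups = {}
    for spend, label in records:
        groups.setdefault(label, []).append(spend)
    return {label: mean(vals) for label, vals in groups.items()}


def classify(spend, means):
    """Assign the label whose average spend is closest."""
    return min(means, key=lambda label: abs(spend - means[label]))


# Undirected: no target field; just look for groups of similar records.
def two_means(values, rounds=10):
    """Tiny 1-D k-means with k=2, seeded at the min and max values."""
    centers = [min(values), max(values)]
    clusters = [[], []]
    for _ in range(rounds):
        clusters = [[], []]
        for v in values:
            nearest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            clusters[nearest].append(v)
        centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
    return clusters


means = class_means(RECORDS)
prediction = classify(85, means)                        # uses the labels
clusters = two_means([spend for spend, _ in RECORDS])   # ignores the labels
print(means, prediction, clusters)
```

Here the undirected pass happens to recover the same two groups the labels describe, but nothing guarantees that on real data.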
Data Mining Examples in Enterprises For Illustration Purposes Only • US Government • FBI – track down criminals (SD Police also) • Treasury Dept – suspicious international funds transfers • Phone companies • Supermarkets & Superstores (Vons, Albertsons, Wal-Mart, Costco) • Mail-Order, On-Line Order (L.L. Bean, Victoria’s Secret, Lands’ End) • Financial Institutions (BofA, Wells Fargo, Charles Schwab) • Insurance Companies (USAA, Allstate, State Farm) • Tons of others…
Data Mining Tasks • Classification – assign each record to one of a set of predefined classes; example: freshman, sophomore, junior, senior • Estimation – example: household income • Prediction – example: predict the average credit card balance transfer amount • Affinity Grouping – example: people who buy X often also buy Y, with probability Z% • Clustering – similar to classification, but with no predefined classes • Description and Profiling – describe behavior in a way that invites an explanation, such as “More guys prefer In-N-Out Burger than do gals.”
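For the affinity grouping task, the "probability Z%" can be estimated directly from transaction data as the share of baskets containing X that also contain Y. The sketch below assumes a handful of made-up baskets; `support` and `confidence` are illustrative helper names, not functions from any particular data mining package.

```python
# Hypothetical market baskets; each set is one customer's transaction.
BASKETS = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "milk", "butter"},
]


def support(baskets, items):
    """Fraction of all baskets that contain every item in `items`."""
    items = set(items)
    return sum(1 for basket in baskets if items <= basket) / len(baskets)


def confidence(baskets, x, y):
    """Estimated probability that a basket containing x also contains y."""
    with_x = [basket for basket in baskets if x in basket]
    if not with_x:
        return 0.0
    return sum(1 for basket in with_x if y in basket) / len(with_x)


z = confidence(BASKETS, "bread", "milk")
print(f"People who buy bread also buy milk {z:.0%} of the time")
```

Support is worth checking alongside confidence: a rule seen in only a few baskets can show a high Z% by chance.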
Data Mining’s Biggest Challenge • The largest challenge a data miner may face is the sheer volume of data in the data warehouse. • A major problem is that this sheer volume may mask the important relationships the data miner is interested in. • It is therefore important that summary data also be available to get the analysis started. • The ability to overcome the volume and interpret the data is essential.
What Does All of This Mean? • On a regular basis, “farmers” and “explorers” use their data warehouses to guide decisions and answer a limitless variety of questions. • Nothing is free, however: the benefits come with a cost. • The value of a data warehouse and subsequent data mining comes from the new and changed business processes it enables, and from the competitive advantage they bring. • There are limitations, though: a Data Warehouse cannot correct problems with its data, although it may help to identify them more clearly.