Data Mining for Fun and Profit: How it works and how to work it
Why do it… • “The half-life of BI is typically shorter than the life of the project needed for its implementation.” --Industry whitepaper (see references)
Predicting is Hard • “Predicting is hard… especially about the future” --Yogi Berra
Why are we here? A recent Gartner Group Advanced Technology Research Note listed data mining at the top of the five key technology areas that "will clearly have a major impact across a wide range of industries within the next 3 to 5 years."
What it is… • Data Mining finds patterns in data • Using Machine Learning Algorithms • Don’t worry: the hard yards are done • A lot at Microsoft Research • Uses these patterns to make predictions
What it’s not • SSAS ≠ Cube
Look, Ma, No Cube! • Dimensional Modelling: Build a Cube → Learn MDX → Construct Analyses …of the PAST • Data Mining: Build Structure → Use Model → Make Predictions …about the Future
Why No Cube? • Cubes summarize facts: • For Example: • Sums of Sales in all regions for all months • Aggregated by Gender and Age • For each Product • … • Data mining finds patterns in data • Cubes abstract away much of the interesting information • Facts that form the patterns are lost in the Cube’s summations
Demo: Excel Data Mining Add-In • Connect to Data Source • Highlight Exceptions • Forecasting • Key Influencers
But is it Respectable? • Is it all just smoke and mirrors??? • “Excel data mining add-in was invented to make astrology look respectable!” --Donald Data, industry pundit
Physical Architecture • Jargon: • ADO = ActiveX Data Objects • ADO MD = ADO Multidimensional • AMO = Analysis Management Objects • DSO = Decision Support Objects • XMLA = XML for Analysis
Data Mining Tutorials • Books Online: Contents, or… • Search for “Data Mining Tutorials”
Data Mining Designer • Business Intelligence Development Studio • Demo: Key Influencers • Models and Model Viewers • Decision Tree • Cluster • Naïve Bayes • Neural Network
Decision Tree Algorithm • [Diagram: a correlation tree, with nodes splitting on input attributes]
Decision Tree Algorithm • Hybrid: linear regression & association & classification • Algorithm highlights • Remove rare attributes (“Feature Selection”) • Group values into bins for performance • Correlate input attributes with outcomes • Find the attribute separating outcomes with maximum information gain • Split the tree on it and re-apply to each branch (a sketch of the split step follows)
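A minimal sketch of the split-selection step in plain Python. This is illustrative only, not the SSAS implementation; the toy data, attribute names, and helper functions are all made up:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of outcome labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(rows, labels, attribute):
    """Entropy reduction from splitting the rows on one attribute."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def best_split(rows, labels):
    """Pick the attribute whose split yields maximum information gain."""
    return max(rows[0], key=lambda a: information_gain(rows, labels, a))

# Toy data: does a customer buy a bike?
rows = [{"age": "young", "cars": "0"}, {"age": "young", "cars": "1"},
        {"age": "old", "cars": "0"}, {"age": "old", "cars": "2"}]
labels = ["buy", "buy", "no", "no"]
print(best_split(rows, labels))  # -> "age" (separates the outcomes perfectly)
```

Here splitting on "age" separates buyers from non-buyers completely (gain of 1.0 bit), so the tree splits there first and the same procedure is re-applied to each branch.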
Cluster Algorithm • Algorithm options: • Non-scalable (all records) • Scalable (50,000 records + 50,000 more if needed); 3x faster than non-scalable • K-means (hard clustering) • Expectation Maximization (soft clustering; the default) • Form initial clusters • Assign a probability for each attribute-value in each cluster • Iterate until the model converges on the likelihood of the data (see the EM sketch below)
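A minimal 1-D sketch of soft (Expectation Maximization) clustering, assuming Gaussian clusters. The scalable SSAS variant works over samples of records, but the E-step/M-step loop is the same idea; all names and data here are illustrative:

```python
import math

def em_1d(xs, k=2, iters=25):
    """Soft (EM) clustering of 1-D points into k Gaussian clusters."""
    lo, hi = min(xs), max(xs)
    means = [lo + (j + 0.5) * (hi - lo) / k for j in range(k)]  # spread initial means
    var = [1.0] * k
    weight = [1.0 / k] * k
    for _ in range(iters):
        # E-step: probability that each point belongs to each cluster
        resp = []
        for x in xs:
            p = [weight[j] * math.exp(-(x - means[j]) ** 2 / (2 * var[j]))
                 / math.sqrt(2 * math.pi * var[j]) for j in range(k)]
            total = sum(p)
            resp.append([pj / total for pj in p])
        # M-step: re-estimate each cluster from its probability-weighted points
        for j in range(k):
            nj = sum(r[j] for r in resp)
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var[j] = sum(r[j] * (x - means[j]) ** 2
                         for r, x in zip(resp, xs)) / nj + 1e-6  # avoid collapse
            weight[j] = nj / len(xs)
    return means

print(em_1d([1.0, 1.2, 0.8, 5.0, 5.3, 4.9]))  # cluster means near 1.0 and 5.1
```

The "soft" part is the E-step: each record gets a probability of membership in every cluster, rather than the all-or-nothing assignment K-means makes.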
Naïve Bayes Algorithm • Simple, fast, surprisingly accurate • “Naïve”: attributes assumed to be independent of each other • Pervasive use throughout Data Mining • Bayes’ rule: P(Result | Data) = P(Data | Result) * P(Result) / P(Data)
Naïve Bayes Algorithm • Example: of 100 students, 40 are girls, 80 wear trousers, and 20 of the 40 girls wear trousers • P(Girl | Trousers) = ? • P(Trousers | Girl) = 20/40 • P(Girl) = 40/100 • P(Trousers) = 80/100 • P(Girl | Trousers) = P(Trousers | Girl) P(Girl) / P(Trousers) = (20/40)(40/100)/(80/100) = 20/80 = 0.25 (in runnable form below)
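The same arithmetic in runnable form (the 100-student population is the one implied by the slide's fractions):

```python
# The trousers example as straight Bayes'-rule arithmetic.
p_trousers_given_girl = 20 / 40   # 20 of the 40 girls wear trousers
p_girl = 40 / 100                 # 40 of the 100 students are girls
p_trousers = 80 / 100             # 80 of the 100 students wear trousers

p_girl_given_trousers = p_trousers_given_girl * p_girl / p_trousers
print(p_girl_given_trousers)      # 0.25
```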
Neural Network Algorithm • [Diagram: input neurons (Age, Cars, Weight) connect through weighted links to hidden neurons, which connect to the output neurons (Buy, No)]
Neural Network Algorithm • Multilayer Perceptron Network: layers of neurons joined by trained weights (a minimal forward-pass sketch follows)
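A minimal forward-pass sketch of a multilayer perceptron, with made-up weights and sizes; the real algorithm also learns these weights during training (e.g. by back-propagation), which is omitted here:

```python
import math

def mlp_forward(inputs, hidden_w, output_w):
    """One forward pass: inputs -> hidden neurons -> output neurons."""
    sigmoid = lambda z: 1 / (1 + math.exp(-z))
    # Each neuron takes a weighted sum of the previous layer, then squashes it
    hidden = [sigmoid(sum(w * x for w, x in zip(ws, inputs))) for ws in hidden_w]
    return [sigmoid(sum(w * h for w, h in zip(ws, hidden))) for ws in output_w]

# Made-up weights: 3 inputs (Age, Cars, Weight) -> 2 hidden -> 2 outputs (Buy, No)
hidden_w = [[0.4, -0.2, 0.1], [-0.3, 0.5, 0.2]]
output_w = [[0.7, -0.6], [-0.5, 0.8]]
print(mlp_forward([0.3, 1.0, 0.5], hidden_w, output_w))  # two activations in (0, 1)
```

The two output activations play the role of the Buy / No output neurons in the diagram: whichever is higher is the predicted outcome.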