Big Data Analytics
Module 4 – Data Mining and Predictive Analytics Including Mahout
Saptak Sen, Microsoft
Bill Ramos, Advaiya
Agenda
• Overview of predictive analytics & data mining
• How Microsoft supports predictive analytics
• How Mahout fits into the picture
• Demos
Predicting future performance from historical data
Predictive analytics should address the likelihood of something happening in the future, even if it is just an instant later.*
• Legal discovery and document archiving
• IT infrastructure and web app optimization
• Social network analysis
• Recommendation engines
• Churn analysis
• Weather forecasting for business planning
• Location-based tracking and services
• Personalized insurance
• Advertising analysis
• Fraud detection
• Equipment monitoring
• Pricing analysis
*Source: Ventana Research, Predictive Analytics Benchmark Research Report, March 2012.
Data mining tool in SQL Server Analysis Services
• Rich data mining algorithms for clustering, classification, forecasting through time series analysis, and more
• Rich developer experience
Analysis Services Data Mining Algorithms
• Classify: Decision Trees, Logistic Regression, Naïve Bayes, Neural Networks
• Estimate: Decision Trees, Linear Regression, Logistic Regression, Neural Networks
• Cluster: Clustering
• Forecast: Time Series
• Associate: Association Rules, Decision Trees
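To make the "Estimate" task concrete, here is a minimal sketch of a linear regression fit in plain Python. It is not the Analysis Services implementation; the function names `fit_line` and `predict` and the monthly sales figures are hypothetical and chosen only for illustration.

```python
from statistics import mean


def fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of squared deviations of x; zero means every x is identical.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("xs must contain at least two distinct values")
    # Co-deviation of x and y.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def predict(model, x):
    """Estimate y for a new x from a fitted (slope, intercept) pair."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical history: month number and units sold.
months = [1, 2, 3, 4, 5]
sales = [12.0, 14.0, 16.0, 18.0, 20.0]

model = fit_line(months, sales)
next_month_estimate = predict(model, 6)
```

The same pattern (fit on historical rows, then score a new row) is what a mining model does at larger scale, with the algorithm chosen from the list above.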
Data mining add-in for Excel
• Ease of use through Excel
• Rich data mining algorithms for clustering, prediction, forecasting, market basket analysis, and more
• Scalable through integration with SSAS
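Market basket analysis, mentioned above, boils down to counting which items appear together. The sketch below is an illustration in plain Python, not the add-in's algorithm: it computes support, confidence, and lift for item pairs. The function name `pair_rules`, the threshold, and the sample baskets are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.5):
    """Return (lhs, rhs, support, confidence, lift) for frequent item pairs."""
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # de-duplicate and fix pair ordering
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of baskets containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]        # P(rhs | lhs)
            lift = confidence / (item_counts[rhs] / n)   # vs. P(rhs) alone
            rules.append((lhs, rhs, support, confidence, lift))
    return rules


# Hypothetical shopping baskets.
baskets = [
    ["bread", "milk"],
    ["bread", "butter"],
    ["bread", "milk", "butter"],
    ["milk"],
]
rules = pair_rules(baskets)
```

A lift above 1 suggests the two items are bought together more often than their individual popularity would predict.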
Demo 1: Excel Data Mining Add-In
[Diagram: Batch Layer, Speed Layer, and Serving Layer, with Windows Azure HDInsight, flat files (.txt, .dat, .xlsx, etc.), Microsoft Excel, and the Excel Data Mining Add-in]
Mahout Applications
• Scalable machine learning algorithms on the Hadoop platform
• Algorithms for clustering, classification, and batch-based collaborative filtering using the map/reduce paradigm
• Supports a wide range of use cases, from email spam filtering, to fraud detection, to recommendations for books or movies
Examples
• Regression
• Genetic
• Dimension Reduction
• Matrices
• Pattern Mining
• Classification
• Collocations
• Vector Similarity
• Recommenders
• Clustering
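The idea behind collaborative filtering can be shown on a single machine. The sketch below is plain Python rather than Mahout, and it runs in memory instead of as map/reduce jobs: it scores unseen items for a user by the cosine similarity between item rating vectors. The names `cosine` and `recommend` and the ratings data are hypothetical.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {key: value} dicts."""
    dot = sum(u[k] * v[k] for k in set(u) & set(v))
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def recommend(ratings, user, top_n=2):
    """Rank items the user has not rated by similarity to items they have."""
    # Invert {user: {item: rating}} into {item: {user: rating}}.
    by_item = {}
    for u, items in ratings.items():
        for item, rating in items.items():
            by_item.setdefault(item, {})[u] = rating

    seen = ratings[user]
    scores = {}
    for candidate, vector in by_item.items():
        if candidate in seen:
            continue
        # Weight each similarity by the user's own rating of the seen item.
        scores[candidate] = sum(
            cosine(vector, by_item[item]) * rating for item, rating in seen.items()
        )
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda i: (-scores[i], i))[:top_n]


# Hypothetical book ratings on a 1-5 scale.
ratings = {
    "alice": {"book_a": 5, "book_b": 4},
    "bob": {"book_a": 5, "book_b": 5, "book_c": 4},
    "carol": {"book_b": 2, "book_d": 5},
}
suggestions = recommend(ratings, "alice")
```

On a cluster, the item-by-item similarity computation is the expensive part, which is why it is a natural fit for batch processing.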
Demo 2: Mahout
[Diagram: Batch Layer, Speed Layer, and Serving Layer, with Windows Azure HDInsight and the HDInsight consoles. Steps shown: flat files (.txt, .dat, .xlsx, etc.), convert to Mahout input, run the Mahout job in the Hadoop Command Window, use the Command Window to get the output file]
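The "convert to Mahout input" step might look like the sketch below. It assumes the job is a Mahout recommender, which reads comma-separated `userID,itemID,preference` lines with numeric IDs; the slides do not say which job the demo runs. The tab-separated source layout (user name, item name, rating) and the function name `to_mahout_csv` are hypothetical.

```python
import csv
import io


def to_mahout_csv(lines, delimiter="\t"):
    """Convert 'user<TAB>item<TAB>rating' lines to 'userID,itemID,preference' CSV.

    Names are mapped to sequential integer IDs, and the two lookup tables are
    returned so that results can be translated back to names afterwards.
    """
    user_ids, item_ids = {}, {}
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the flat file
        user, item, rating = line.split(delimiter)
        # Assign the next free ID the first time a name is seen.
        uid = user_ids.setdefault(user, len(user_ids) + 1)
        iid = item_ids.setdefault(item, len(item_ids) + 1)
        writer.writerow([uid, iid, float(rating)])
    return out.getvalue(), user_ids, item_ids


# Hypothetical flat-file content.
flat_file = [
    "alice\tbook_a\t5",
    "alice\tbook_b\t3",
    "",
    "bob\tbook_a\t4",
]
mahout_input, user_ids, item_ids = to_mahout_csv(flat_file)
```

The resulting text would then be uploaded to the cluster's storage before the job is started from the Hadoop Command Window.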
Learn more
• Data Mining (SSAS): http://msdn.microsoft.com/en-us/library/bb510516.aspx
• Microsoft SQL Server 2012 SP1 Data Mining Add-ins for Microsoft Office 2013: http://www.microsoft.com/en-us/download/details.aspx?id=35578
• Mahout on Windows Azure - Machine Learning Using Microsoft HDInsight: http://social.technet.microsoft.com/wiki/contents/articles/15102.mahout-on-windows-azure-machine-learning-using-microsoft-hdinsight.aspx