Business Intelligence Fundamentals: Data Mining Ola Ekdahl IT Mentors
Agenda
• Introducing Data Mining
• Integration with SQL Server 2008 Components
• Data Mining Programmability
Where Are We?
[Diagram of the BI data flow. Labels: Data Sources, Staging Area, Manual Cleansing, Data Warehouse, Data Marts]
Module Overview
• Introducing Data Mining
• Integration with SQL Server 2008 Components
• Data Mining Programmability
Introducing Data Mining
• Purpose of Data Mining
• Business Scenarios
• SQL Server 2008 Data Mining
• Data Preparation
• Data Mining Process
• Data Mining Visualization
Purpose of Data Mining
• Addresses the problem of too much data and not enough information
• Enables data exploration, pattern discovery, and pattern prediction, which lead to knowledge discovery
• Forms a key part of a BI solution
Business Scenarios
• Identifying responsive customers/unresponsive customers (also known as churn analysis)
• Detecting fraud
• Targeting promotions
• Managing risk
• Forecasting sales
• Cross-selling
• Segmenting customers
SQL Server 2008 Data Mining
• Hides the complexity of an advanced technology
• Includes a full suite of algorithms to automatically extract information from data
• Handles large volumes of data and complex data
• Data can be sourced from relational and OLAP databases
• Uses standard programming interfaces
  • XMLA
  • DMX
• Delivers a complete framework for building and deploying intelligent applications
SQL Server 2008 Algorithms
• Decision Trees
  • The most popular data mining technique
  • Used for classification
• Clustering
  • Finds natural groupings inside data
• Sequence Clustering
  • Groups a sequence of discrete events into natural groups based on similarity
  • Use this algorithm to understand how visitors use your Web site
SQL Server 2008 Algorithms (continued)
• Naïve Bayes
  • Used for classification in similar scenarios to Decision Trees
• Linear Regression
  • Finds the best possible straight line through a series of points
  • Used for prediction analysis
• Logistic Regression
  • Fits to an exponential factor
  • Used for prediction analysis
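To make "the best possible straight line through a series of points" concrete, here is a minimal Python sketch of an ordinary least-squares fit. It illustrates the general technique only and is not the SQL Server implementation; the function name `fit_line` and the sample points are assumptions made up for this example.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x around its mean, and co-variation of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical sample points that happen to lie on y = 2x + 1.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
slope, intercept = fit_line(xs, ys)

# "Prediction analysis": estimate y for an x value not seen before.
predicted = slope * 5 + intercept
```

Once the line is fitted, prediction is just evaluating it at a new input, which is why the slide groups regression under prediction analysis.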
SQL Server 2008 Algorithms (continued)
• Association Rules
  • Supports market basket analysis to learn what products are purchased together
• Time Series
  • Forecasting algorithm used for short-term or long-term predictions of future values from a time series
  • Use multiple series to predict “what if” scenarios
• Neural Network
  • Used for classification and regression tasks
  • More sophisticated than Decision Trees and Naïve Bayes, this algorithm can explore extremely complex scenarios
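The market basket idea behind Association Rules can be sketched in a few lines of Python. This is a simplified illustration restricted to pairs of items, not the Microsoft Association Rules algorithm; the function `pair_rules`, the `min_support` threshold, and the sample baskets are all assumptions for this example. Support is the share of baskets containing both items, and confidence is how often the second item appears given the first.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.5):
    """Return (antecedent, consequent, support, confidence) for frequent item pairs."""
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # ignore duplicates within one basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # pair is too rare to be interesting
        # Each frequent pair yields a rule in both directions.
        rules.append((a, b, support, count / item_counts[a]))
        rules.append((b, a, support, count / item_counts[b]))
    return rules


# Hypothetical purchase history.
baskets = [
    ["bike", "helmet"],
    ["bike", "helmet", "gloves"],
    ["bike", "gloves"],
    ["helmet"],
]
rules = pair_rules(baskets)
```

In this sample every basket with gloves also contains a bike, so the rule gloves → bike has confidence 1.0, while bike → gloves is weaker because many bike baskets contain no gloves.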
Data Preparation
• Often significant amounts of effort are required to prepare data for mining
• Transforming for cleaning and reformatting
• Isolating and flagging abnormal data
• Appropriately substituting missing values
• Discretizing continuous values into ranges
• Normalizing values between 0 and 1
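Three of the preparation steps listed above (substituting missing values, normalizing to the range 0 to 1, and discretizing into ranges) can be sketched in plain Python. The choices here are assumptions for illustration: missing values are filled with the mean, normalization is min-max scaling, and discretization uses equal-width ranges. Real projects may well need different strategies, and the helper names are hypothetical.

```python
from statistics import mean


def fill_missing(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def normalize(values):
    """Min-max scale values into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # all values equal: nothing to scale
    return [(v - lo) / (hi - lo) for v in values]


def discretize(values, bins):
    """Map each value to the 0-based index of one of `bins` equal-width ranges."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / bins
    # The maximum value would land in bin `bins`, so clamp it into the last bin.
    return [min(int((v - lo) / width), bins - 1) for v in values]


# Hypothetical customer ages with one missing entry.
ages = [20, None, 40, 60]
filled = fill_missing(ages)
scaled = normalize(filled)
buckets = discretize(filled, 2)
```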
Data Mining Process
[Diagram, built up over three slides. Phases: Design time, Process time, Query time. Design time: Mining Model. Process time: Training Data, Data Mining Engine, Mining Model. Query time: Data to Predict, Data Mining Engine, Mining Model, Predicted Data.]
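The three phases named in the diagram can be illustrated with a toy Python class. This is only a sketch of the workflow, assuming a deliberately trivial "algorithm" (predict the most common outcome seen for each input value); it does not reflect how the SSAS Data Mining Engine works internally, and the class, column names, and rows are hypothetical.

```python
from collections import Counter, defaultdict


class MiningModel:
    """Toy model: predicts the most common target seen for each input value."""

    def __init__(self, input_column, target_column):
        # Design time: only the structure is declared; nothing is learned yet.
        self.input_column = input_column
        self.target_column = target_column
        self.patterns = {}
        self.default = None

    def process(self, training_rows):
        # Process time: training data is read and patterns are stored in the model.
        by_input = defaultdict(Counter)
        overall = Counter()
        for row in training_rows:
            by_input[row[self.input_column]][row[self.target_column]] += 1
            overall[row[self.target_column]] += 1
        self.patterns = {k: c.most_common(1)[0][0] for k, c in by_input.items()}
        self.default = overall.most_common(1)[0][0]

    def predict(self, rows):
        # Query time: new rows go in, predicted values come out.
        return [self.patterns.get(r[self.input_column], self.default) for r in rows]


model = MiningModel("commute", "buys_bike")  # design time
model.process([                               # process time
    {"commute": "short", "buys_bike": True},
    {"commute": "short", "buys_bike": True},
    {"commute": "long", "buys_bike": False},
    {"commute": "long", "buys_bike": False},
    {"commute": "short", "buys_bike": False},
])
predictions = model.predict([                 # query time
    {"commute": "short"},
    {"commute": "long"},
    {"commute": "medium"},
])
```

The unseen value "medium" falls back to the overall most common outcome, which shows why the quality of predictions at query time depends on what the training data covered at process time.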
Data Mining Visualization
• In contrast to OLTP and OLAP queries, data mining queries typically extract previously unknown information
• Visualizations can effectively present data discoveries
• SQL Server 2008 provides algorithm-specific visualizations that you can use to
  • Test and explore models in Business Intelligence Development Studio
  • Embed into Windows Forms applications
• Developers can construct and plug in custom data mining viewers
Integration with SQL Server 2008 Components
• Integration with SSIS
• Integration with SSAS
• Integration with SSRS
Integration with SSIS
• Perform data mining directly in the control flow or the data flow pipeline
• Configure “intelligent” packages based on data mining query results
(Enterprise Edition only)
Integration with SSAS
• Create data mining models directly from OLAP stores
• Create dimensions from data mining models to slice cubes using discovered patterns
  • Decision Trees
  • Clustering
  • Association Rules
Integration with SSRS
• Present data mining results in SSRS reports
  • Prediction queries
  • Content queries
  • Parameterized queries
• Use a data mining query builder to easily select results
• Apply grouping and aggregation to summarize results
• Distribute data mining results by using subscriptions
Data Mining Programmability
• SSAS Data Mining Programmability Overview
• Programming Interfaces
• Embedding SSAS Data Mining
• Extending SSAS Data Mining
SSAS Data Mining Programmability Overview
[Architecture diagram. Client side: C++ App, VB App, .NET App, Any App (Any Platform, Any Device, over a WAN). Client interfaces: OLE DB, ADO, ADOMD.NET, AMO. Transport: XMLA over TCP/IP, XMLA over HTTP. Server side: Analysis Server with OLAP and Data Mining; Server ADOMD.NET; Data Mining Interfaces; .NET Stored Procedures; Microsoft Algorithms; Third-Party Algorithms.]
Programming Interfaces
• AMO (Analysis Management Objects)
  • Administer database objects
  • Apply security
  • Manage processing
• ADOMD.NET
  • Connect to SSAS databases
  • Retrieve and manipulate data
• Server ADOMD.NET
  • Extend DMX by using .NET stored procedures
Embedding SSAS Data Mining
• Validate or repair user entry
• Integrate predictions
  • Targeted advertising
  • “Those that bought this book also purchased these books”
• Embed custom visualizations into Windows Forms applications to allow users to explore and understand model patterns
(SSAS Data Mining ships with custom visualizations)
Extending SSAS Data Mining
• Stored procedures
• Enhanced Visual Studio data mining tools
• Plug-in algorithms
• Plug-in data mining viewers
DEMO: Classifying Customers Likely to Purchase a Bicycle
Resources
• www.microsoft.com/sql/technologies/dm
  • Links to technical resources, case studies, news, and reviews
• www.sqlserverdatamining.com
  • Site designed and maintained by the SQL Server Data Mining team
  • Live samples
  • Tutorials
  • Webcasts
  • Tips and tricks
  • FAQ
• Data Mining with SQL Server 2005, by ZhaoHui Tang and Jamie MacLennan