230 likes | 346 Views
SharePoint 2010 Business Intelligence. Module 6: Analysis Services. Overview. Analysis Services. Lesson: Analysis Services. Introduction ETL OLAP Terms Storage Modes Queries Tools Mining Models. Introduction. Analysis Services provides access to large data sets
E N D
SharePoint 2010 Business Intelligence Module 6: Analysis Services
Overview • Analysis Services
Lesson: Analysis Services • Introduction • ETL • OLAP Terms • Storage Modes • Queries • Tools • Mining Models
Introduction • Analysis Services provides access to large data sets • Running SQL queries against a 100 million row table just doesn’t work • When your data sets get large, you need a better way of handling the load • Online analytical processing (OLAP) provides all your answers with speed! • Analysis services is an OLAP implementation
ETL • The Extract, Transform and Load (ETL) process is vital to the OLAP results • If you input junk, you get junk back out! • All data that enters the OLAP database must be valid or your results could be exponentially wrong! • Make sure that all possible means are employed when ensuring only valid data is entered into the system and that it is entered only once! • Common issue is data inserted more than once which causes invalid results
OLAP Terms • Data Source • A source or destination of data • Fact table • A table that contains numerical information with keys that map to the dimension keys • Measure • A single numerical value in a fact table • Dimension • A set of labels/attributes that describe the measures in a fact table • Cube • A set of aggregations of all the dimensions and facts complied together to produce valuable information • Perspective • A subset of dimensions and measures specific to some group of users • Data Modeling • Using statistical analysis to determine patterns in large sets of data
Tools • Visual Studio / BI Workbench • Several project templates to facilitate the creation of cubes, dimensions and work with data sources/views • IntelliCube • A heuristic analysis tool for automatic generation of a cube based on its data and relationships • External Viewer for • Outliers • Candidate keys • Value distributions • Patterns
Business Intelligence Development Studio • BI Studio is just Visual Studio with project templates installed • Project templates provide item templates and wizards • New 2008 Wizards are much easier to use and more powerful • Context sensitive functionality keeps toolbars and menus trimmed to what elements you need based on what you are looking at • Because it is Visual Studio it is fully customizable • 3rd party add-ons can make it even more powerful
Designing an Analysis Services Database • BI Studio will be used to create new AS Databases • Steps include: • Define data source • Define data view • Create a new cube • Define fact tables • Define dimensions (data and time) • Define measures • Aggregate/Run the cube
Key Performance Indicators (KPI) • KPIs are used to show very simply whether a target is being accomplished • Components include the Goal, Value, Status, and Trend • Example: Sales and Quality targets • Analysis Services allows you to build MDX expressions off of Cube data to build KPIs • KPI values can be queried from client applications • Allows for visual display of meaningful data
Actions • Actions • Allows client application users to be able to interact with what the data means • Example: Browse to a customer or product via URL • Types of Actions: • CommandLine, DataSet, Drillthrough, Html, Custom, Report, URL • Define and assign to objects in the Cube
Perspectives • Perspectives work similar to views in a relational database • Shows different users the data they need to see for a particular role they may be in • Used to reduce complexity of cube data • Not meant to be security mechanism for data
Storage Modes • Data in an Analysis Services database is stored differently than a relational database • Optimized storage provides the OLAP query performance • Partition • MOLAP – multidimensional OLAP (fact data and aggregations are stored in special format) • ROLAP – Relational OLAP (fact data and aggregations remain in relational database) • HOLAP – Hybrid OLAP (fact data is relational, aggregations are stored in special format) • Dimension (dimension attributes only) • MOLAP – stored in special format • ROLAP – stay in relational format
Querying Cubes • SQL is not used in OLAP databases • MDX (multidimensional queries) is used for querying cubes • DMX (Data Mining queries) is used for querying data mining models
MDX Queries • Multidimensional Expressions (MDX) are used to query multidimensional data • Some common terms are: • Cell – the space at an intersection of a measure and attribute • Tuple – a unique cell based on a set of attribute members • Set – an order set of tuples with same dimensionality
Calculated Members and Named Sets • Calculated Members are used when you need to determine something at query time • Can be query or session scoped • Value are only stored in memory not on disk • Named Sets are basically predefined MDX queries that can be reused in other queries • Used to group dimension members
Analyzing Data with Data Mining Algorithms • Out of the box, Analysis services provides five algorithms: • Classification (Decision Tree) • Predict one or more discrete variables, based on the other attributes in the dataset • Regression (Time Series) • Predict one or more continuous variables, such as profit or loss, based on other attributes in the dataset • Segmentation (Clustering) • Divide data into groups, or clusters, of items that have similar properties. • Association (Association) • Find correlations between different attributes in a dataset • Sequence analysis (Sequence Clustering) • Summarize frequent sequences or episodes in data, such as a Web path flow
What could go wrong? • Cubes really are simple things to build and utilize • As simple as they are, it is easy to create a cube that has the wrong data • Always validate that the data that is displayed in the Cube is valid and accurate • Never hurts to be overly aggressive when testing Cube data
Lab 1: Analysis Services • Explore Analysis Services
Lab 2: Building a Cube • Explore Cubes and Dimensions
Lab 3: Data Mining Algorithms • Explore Data Mining Models
Review • Your instructor will ask a series of questions on this module
Summary • Extra Large databases are not easily queried for data • Analysis Services is an OLAP tool to manage large databases • Ensure that your ETL process is accurate • Data Mining Algorithms can help you find patterns you didn’t know about before