Chapter 18 Ali Parandian Ashira Khera
OLAP • Stands for On-Line Analytical Processing. • A set of techniques and tools used mainly for business reporting. • Using OLAP, businesses can analyze data in many different ways, including budgeting, planning, simulation, data warehouse reporting, and trend analysis. • Provides a multidimensional view of the data, allowing a manager to pull data from an OLAP database in broad or specific terms.
Data Warehousing • A data warehouse is a repository of information gathered from multiple sources, stored under a unified schema, at a single site. • A single data model and query language can be used to retrieve data from the data warehouse. • Access to information for decision support is kept separate from the organization's operational systems, so retrieval is fast and does not slow those systems down. • Once gathered, data is stored for a long time, providing access to historical data.
Data-Warehouse Architecture • Data Sources (operational systems and flat files) • Staging Area (where data from the sources goes before the warehouse) • Warehouse (metadata, summary data, and raw data) • Users (analysis, reporting, and mining)
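Data Warehouse Schema (Star Schema) • Fact table: Item_id, Store_id, Customer_id, date, Number, price • Descriptor table keyed by Store_id: Store_id, City, State, country • Descriptor table keyed by Item_id: Item_id, Itemname, Color, size • Descriptor table keyed by Customer_id: Customer_id, Name, Street, State, zip • Descriptor table keyed by date: date, Month, Quarter, year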
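A minimal sketch of how a star schema like this might be queried, using Python's built-in sqlite3 module. The table names (store, item, sales) and every data row are assumptions invented for illustration; only the column names come from the slide, and only two of the descriptor tables are shown to keep it short.

```python
import sqlite3

# In-memory database; table names and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE store (Store_id INTEGER PRIMARY KEY, City TEXT, State TEXT, country TEXT);
CREATE TABLE item  (Item_id INTEGER PRIMARY KEY, Itemname TEXT, Color TEXT, size TEXT);
CREATE TABLE sales (Item_id INTEGER, Store_id INTEGER, Customer_id INTEGER,
                    date TEXT, Number INTEGER, price REAL);
""")
conn.executemany("INSERT INTO store VALUES (?, ?, ?, ?)",
                 [(1, "Oakland", "CA", "USA"), (2, "Reno", "NV", "USA")])
conn.executemany("INSERT INTO item VALUES (?, ?, ?, ?)",
                 [(10, "shirt", "blue", "M"), (11, "hat", "red", "L")])
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?, ?)", [
    (10, 1, 100, "2024-01-05", 2, 20.0),
    (11, 1, 101, "2024-01-06", 1, 15.0),
    (10, 2, 100, "2024-02-01", 3, 20.0),
    (10, 1, 102, "2024-02-03", 4, 20.0),
])

# The fact table joins to each descriptor table through its key,
# then the measure (Number) is aggregated by descriptive attributes.
totals = conn.execute("""
    SELECT s.City, i.Itemname, SUM(f.Number)
    FROM sales AS f
    JOIN store AS s ON f.Store_id = s.Store_id
    JOIN item  AS i ON f.Item_id  = i.Item_id
    GROUP BY s.City, i.Itemname
    ORDER BY s.City, i.Itemname
""").fetchall()

for city, itemname, number in totals:
    print(city, itemname, number)
```

The fact table holds only keys and measures, so each descriptive attribute (City, Itemname) is stored once in its descriptor table and reached through a join.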
Cube Example (Extended Aggregation) SELECT Type, Store, SUM(Number) AS Number FROM Pets GROUP BY Type, Store WITH CUBE
ROLLUP Example SELECT Time, Region, Department, SUM(Profit) AS Profit FROM sales GROUP BY ROLLUP(Time, Region, Department)
Cube and Rollup in a Nutshell • ROLLUP enables a SELECT statement to calculate multiple levels of subtotals across a specified group of dimensions. It also calculates a grand total. • CUBE enables a SELECT statement to calculate subtotals for all possible combinations of a group of dimensions. It also calculates a grand total.
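The difference between the two can be sketched in plain Python. This is an emulation for illustration, not how a database engine implements these operators; the rows of (Type, Store, Number) are invented, and None plays the role of the NULL that SQL puts in a rolled-up column.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical (Type, Store, Number) rows, invented for illustration.
pets = [
    ("Dog", "Miami", 12),
    ("Cat", "Miami", 18),
    ("Dog", "Tampa", 14),
    ("Cat", "Tampa", 9),
]

def aggregate(rows, keep):
    """Sum Number grouped by the dimension positions in `keep`.
    Dimensions not in `keep` are replaced by None (like NULL in SQL)."""
    totals = defaultdict(int)
    for *dims, number in rows:
        key = tuple(d if i in keep else None for i, d in enumerate(dims))
        totals[key] += number
    return dict(totals)

def rollup(rows, n_dims):
    """Group by each prefix of the dimension list: (Type, Store), (Type), ()."""
    result = {}
    for k in range(n_dims, -1, -1):
        result.update(aggregate(rows, set(range(k))))
    return result

def cube(rows, n_dims):
    """Group by every subset of the dimensions."""
    result = {}
    for k in range(n_dims, -1, -1):
        for keep in combinations(range(n_dims), k):
            result.update(aggregate(rows, set(keep)))
    return result

rollup_rows = rollup(pets, 2)
cube_rows = cube(pets, 2)
print(len(rollup_rows), len(cube_rows))
```

With two dimensions, the rollup yields per-Type subtotals and a grand total but no per-Store subtotals; the cube adds those as well.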
Data Mining • The term data mining refers loosely to the process of semi-automatically analyzing large databases to find useful patterns. • Like knowledge discovery in AI, data mining attempts to discover rules and patterns from data. • Difference between Data Mining and AI: data mining deals with large volumes of data stored primarily on disk; that is, it deals with knowledge discovery in databases.
Data Mining Continued • Data mining consists of five major elements: • Extract, transform, and load transaction data onto the data warehouse system. • Store and manage the data in a multidimensional database system. • Provide data access to business analysts and information technology professionals. • Analyze the data by application software. • Present the data in a useful format, such as a graph or table.
Applications of Data Mining • Prediction: for example, when a person applies for a credit card, the credit card company uses known attributes such as age, income, and credit history to predict credit risk. • Association: for example, a customer purchasing books online tends to buy related merchandise at the same time. • Associations, clusters, classes, and sequential patterns are examples of descriptive patterns.
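One common way to quantify an association such as "customers who buy X also buy Y" is with support and confidence. These two measures are not named on the slide, and the baskets below are invented, so treat this as an illustrative sketch only.

```python
# Hypothetical online-bookstore baskets, invented for illustration.
transactions = [
    {"novel", "bookmark"},
    {"novel", "bookmark", "lamp"},
    {"novel"},
    {"cookbook", "bookmark"},
    {"novel", "lamp"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Of the baskets containing `antecedent`, the fraction that
    also contain `consequent`."""
    return support(antecedent | consequent, baskets) / support(antecedent, baskets)

rule_support = support({"novel", "bookmark"}, transactions)
rule_confidence = confidence({"novel"}, {"bookmark"}, transactions)
print(rule_support, rule_confidence)
```

Here the rule "novel implies bookmark" holds in 2 of the 5 baskets overall, and in 2 of the 4 baskets that contain a novel.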
Weaknesses of Data Mining • Preprocessing and postprocessing of data are extremely time consuming. • Data dredging: scanning the data for any relationship and then, when one is found, coming up with an interesting explanation. For example, if we test 100 random patterns, we expect about one of them to look "interesting" at the 0.01 significance level purely by chance. • There is no cross-industry standard practice by which classification functions deal with ties in the data.
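The arithmetic behind the data-dredging example can be checked in a few lines. The sketch assumes the 100 tests are independent, which the slide does not state.

```python
# Testing many random patterns at a fixed significance level.
n_tests = 100   # number of random patterns tested
alpha = 0.01    # significance level of each test

# Expected number of patterns that look "interesting" by chance alone.
expected_false_hits = n_tests * alpha

# Probability that at least one test is a false hit,
# assuming the tests are independent (an assumption of this sketch).
p_at_least_one = 1 - (1 - alpha) ** n_tests

print(expected_false_hits)
print(round(p_at_least_one, 3))
```

So even when no real pattern exists, a spurious "finding" is more likely than not under these assumptions.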
Other Types of Mining • Data visualization: a system for examining large volumes of data and detecting patterns visually, using maps, charts, and other graphical representations. • Data visualization systems do not detect patterns automatically, but provide system support for users to detect patterns.
Decision Trees • In operations research, specifically in decision analysis, a decision tree (or tree diagram) is a decision support tool that uses a graph or model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility. • A decision tree is used to identify the strategy most likely to reach a goal. • In data mining and machine learning, a decision tree is a predictive model. • A classification tree is one example of a decision tree.
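As a rough illustration of a classification tree used as a predictive model, the credit card example from the applications slide can be written as nested tests. The split attributes follow that example (age, income, credit history), but the thresholds, the dictionary keys, and the risk labels are all invented here; a real tree would be learned from data.

```python
def classify_credit_risk(applicant):
    """Walk a small hand-built classification tree.
    Each `if` is an internal node; each `return` is a leaf."""
    if applicant["credit_history"] == "bad":
        return "high"
    if applicant["income"] < 30000:       # hypothetical threshold
        if applicant["age"] < 25:         # hypothetical threshold
            return "high"
        return "medium"
    return "low"

# Hypothetical applicants.
applicants = [
    {"age": 40, "income": 80000, "credit_history": "good"},
    {"age": 22, "income": 18000, "credit_history": "good"},
    {"age": 35, "income": 25000, "credit_history": "good"},
    {"age": 50, "income": 90000, "credit_history": "bad"},
]
predictions = [classify_credit_risk(a) for a in applicants]
print(predictions)
```

Because every prediction can be traced along a single path of tests, the reasoning behind it is easy to read off the tree.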
Advantages of Decision Trees • They are simple to understand and interpret. • They have value even with little hard data. • They use a white-box model. • They can be combined with other decision techniques.
References 1. http://www.anderson.ucla.edu/faculty/jason.frand/teacher/technologies/palace/datamining.htm