200 likes | 507 Views
Data Wharehousing OLAP Data Mining. S. Costantini Università degli Studi di L’Aquila stefcost@di.univaq.it. Ringraziamenti (Acknowledgment).
E N D
Data Wharehousing OLAP Data Mining S. Costantini Università degli Studi di L’Aquila stefcost@di.univaq.it
Ringraziamenti (Acknowledgment) • Part of this material is taken from: Database Systems: The Complete Book, by Hector Garcia-Molina, Jeff Ullman, and Jennifer Widom, edited by Prentice-Hall. • URL: http://www-db.stanford.edu/~ullman/dscb.html S. Costantini / Data Wharehousing
Cos’è in sostanza un Data Wharehouse? • E’ una vista materializzata • Aggiornata a intervalli stabiliti (a seconda dell’applicazione) • E’ un cosiddetto “sistema di integrazione di dati” perché può contenere dati provenienti da vari database (detti “sorgenti”) S. Costantini / Data Wharehousing
Perché i Data Warehouse? • Perché le query di analisi statistica ed esame dei dati per estrarne varie informazioni (dette query “OLAP”, vedi seguito) sono pesanti e diminuiscono troppo la performance del sistema. Però non necessitano della versione più aggiornata dei dati. S. Costantini / Data Wharehousing
Perché i Data Wharehouse • Allora conviene separare le query usuali dalle query OLAP, creando per queste ultime un Data Wharehouse • Per le query OLAP il modello relazionale non è ottimale, quindi nel creare un Data Wharehouse il modello dei dati viene modificato. S. Costantini / Data Wharehousing
Observation • Traditional database systems are tuned to many, small, simple queries. • Some new applications use fewer, more time-consuming, complex queries. • New architectures have been developed to handle complex “analytic” queries efficiently. S. Costantini / Data Wharehousing
The Data Warehouse • The most common form of data integration. • Copy sources into a single DB (warehouse) and try to keep it up-to-date. • Usual method: periodic reconstruction of the warehouse, perhaps overnight. • Frequently essential for analytic queries. S. Costantini / Data Wharehousing
OLTP • Most database operations involve On-Line Transaction Processing (OTLP). • Short, simple, frequent queries and/or modifications, each involving a small number of tuples. • Examples: Answering queries from a Web interface, sales at cash registers, selling airline tickets. S. Costantini / Data Wharehousing
OLAP • Of increasing importance are On-Line Application Processing (OLAP) queries. • Few, but complex queries --- may run for hours. • Queries do not depend on having an absolutely up-to-date database. S. Costantini / Data Wharehousing
OLAP Examples • Amazon analyzes purchases by its customers to come up with an individual screen with products of likely interest to the customer. • Analysts at Wal-Mart look for items with increasing sales in some region. S. Costantini / Data Wharehousing
Data Warehouses • Doing OLTP and OLAP in the same database system is often impractical • Different performance requirements • Analysis queries require data from many sources • Solution: Build a “data warehouse” • Copy data from various OLTP systems • Optimize data organization, system tuning for OLAP • Transactions aren’t slowed by big analysis queries • Periodically refresh the data in the warehouse S. Costantini / Data Wharehousing
Common Architecture • Relational Databases handle OLTP. • Local databases copied to a central warehouse overnight. • Analysts use the warehouse for OLAP. S. Costantini / Data Wharehousing
Definition of data warehousing A data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management’s decision making process. S. Costantini / Data Wharehousing
Loading the Data Warehouse Data is periodically extracted Data is cleansed and transformed Data Staging Area Users query the data warehouse Source Systems (OLTP) Data Warehouse S. Costantini / Data Wharehousing
Data Mining • Data mining is a popular term for queries that summarize big data sets in useful ways. • Examples: • Clustering all Web pages by topic. • Finding characteristics of fraudulent credit-card use. S. Costantini / Data Wharehousing
Data Warehouse Enterprise “Database” Customers Orders Transactions Vendors Etc… Etc… • Data Miners: • “Farmers” – they know • “Explorers” - unpredictable Copied, organized summarized Data Warehouse Data Mining S. Costantini / Data Wharehousing
Market-Basket Data • An important form of mining from relational data involves market baskets = sets of “items” that are purchased together as a customer leaves a store. • Summary of basket data is frequent itemsets = sets of items that often appear together in baskets. S. Costantini / Data Wharehousing
Data Mining Flavors • Directed – Attempts to explain or categorize some particular target field such as income or response. • Undirected – Attempts to find patterns or similarities among groups of records without the use of a particular target field or collection of predefined classes. S. Costantini / Data Wharehousing
Data Mining Examples in Enterprises • Government • Track down criminals (Police also) • Treasury Dept – suspicious int’l funds transfer • Phone companies • Supermarkets & Superstores • Mail-Order, On-Line Order S. Costantini / Data Wharehousing
Data Mining Examples in Enterprises • Financial Institutions • Insurance Companies • Web sites • Many others… S. Costantini / Data Wharehousing