130 likes | 240 Views
Data Warehousing. Alex Ostrovsky CS157B Spring 2007. Introduction. Data warehouse is a main repository of corporate data Multiple databases are employed per specific purpose
E N D
Data Warehousing Alex Ostrovsky CS157B Spring 2007
Introduction • Data warehouse is a main repository of corporate data • Multiple databases are employed per specific purpose • Contains raw events and unprocessed data, although separate tables might exist for processed information displaying meaningful data
What is it used for? • Data analysis • Data mining • Complex queries with multiple table join • Forecasting • Historical reporting • OLAP (Online Analytical Processing)
Key Concepts and Features • Data is not required to be heavily normalized • Transaction Processing is done mostly offline, thus processing time is not very critical. Although, this might depend on amount of data, normalization, query complexity, and application specifications.
Key Concepts and Features (cont.) • Unlike regular OLTP real-time databases data is subject-oriented • Non-volatile, i.e. data is essentially stored forever without being pruned or deleted. • Heavily integrated: contains data from majority of organization’s applications • Time-variant: most of the data has some time reference for the purpose of producing the reports
Types of data warehousing DBs • Offline operational database: similar to regular data replication. Used to minimize the impact of queries on a running primary operational system • Offline data warehouse: heavily integrated, reporting-oriented warehouse databases which are updated with data from operational databases on regular time intervals
Types of data warehousing DBs (cont) • Real-time data warehouse: database data is updated instantaneously as soon as transaction happens • Integrated data warehouse: database is integrated with primary operational system for immediate decision making and reporting.
Benefits of Data Warehousing • No need to stress operational database with complex queries • Separation of processing and business logic • Very flexible, multiple distinct relations can be defined from a set of data • Can be customer or object specific • Persistent – once result is computed from the raw events, it doesn’t need to be recomputed again, giving faster response time on subsequent queries.
Dangers of Data Warehousing • Heavy processing requires physically separate database machines for warehousing and OLTP • Must be optimized for novice users, complex queries might take a very long time • Much more complex multidimensional design compared to regular relational databases • Errors in computational logic can cause serious financial losses and computational recalculations. • Data representation • Relatively difficult to perform data migration
Database Design • Data warehousing databases mostly utilize complex multidimensional design • Relationships must be meaningful and represent clear patterns and trends of unprocessed data. More data and relationships you have more dimensions database will have. • Information is viewed along one common dimensional position. Can be thought of as intersection of a few planes.
References • http://en.wikipedia.org/wiki/Data_warehouse • http://en.wikipedia.org/wiki/OLAP • http://dmoz.org/Computers/Software/Databases/Data_Warehousing/ • http://dmoz.org/Computers/Software/Databases/Data_Warehousing/Articles/ • http://en.wikipedia.org/wiki/Multidimensional_database • http://www.olapreport.com/market.htm