210 likes | 337 Views
Planning Warehouse Storage. Chapter 9. Data Partitioning. Ease of: - Restructuring - Reorganization - Removal - Recovery - Monitoring - Management - Archiving - Indexing. Breaking up a data into separate physical units that can be handled
E N D
Planning Warehouse Storage Chapter 9
Data Partitioning • Ease of: - Restructuring - Reorganization - Removal - Recovery - Monitoring - Management - Archiving - Indexing • Breaking up a data into separate physical units that can be handled independently Add Order table Drop Other data is not affected
Objects to Partition • Tables: - Fact - Dimension • Indexes
Horizontal Partitioning • Table and index data are split by: - Time - Sales region or person - Geography - Organization - Line of business • Candidate columns appear in WHERE clause • Analysis determines requirement
Vertical Partitioning You may use vertical partitioning when: • Speed of query and update actions is improved by it • Users require access to specific columns • Some data is changed infrequently • Descriptive dimension text may be better moved away from the dimension itself
Partitioning Methods • Range partitioning (Oracle8 and Oracle8i) • Hash partitioning (Oracle8i) • Composite partitioning (Oracle8i)
Star Query Optimization Optimum performance with star schema models 1. Dimensions are queried to create a 2. Cartesian product, computed against 3. Smaller reference table. 4. The result is joined to 5. A fact table to produce a query result.
Star Transformation Market_Table Time_Table Fact_Table Product_Table
Indexing Indexing is used because: • It is huge cost saving, greatly improving performance and scalability • Can replace a full table scan by a quick read of the index followed by a read of only those disk blocks that contain the rows needed
B-Tree Index • Most common type of indexing • Used for high cardinality columns • Designed for few rows returned
Bitmap Indexes • Provide performance benefits and storage savings • Store values as 1s and 0s • Use instead of B-tree indexes when: - Tables are large - Columns have relatively low cardinality
Oracle8 and Oracle8i Index Enhancements • Oracle8 index enhancements: - Partitioned index - Index-organized tables • Oracle8i index enhancements: - Function-based index - New bitmap index improvements - Online index build and rebuild - Descending index - Statistics can be collected when an index is created
Protecting the Database • RAID is essential with large databases • RAID improves: - Reliability - Storage management • There are different levels of RAID • You can eliminate disk contention with disk striping
RAID 0: Striping Disk array controller The file is written to a four-drive disk array: • Block 1 on Drive 1 • Block 2 on Drive 2… • Block 5 in another sector on Drive 1
RAID 0: Striping • Benefits: - Good for simultaneous reads and writes - No redundancy - Scalable • Limitations: - Not recommended for mission-critical systems - No recovery from data loss - One bad sector affects entire disk of data
RAID 1: Mirrored Disk Disk array controller Disk 1 Disk 1 Mirror Disk 2 Disk 2 Mirror Copy of files stored on mirror disk
RAID 1: Mirrored Disk • Benefits: - Complete data redundancy - No performance penalty - Improves reads - Scalability • Limitations: - Highest cost of all RAID configurations
RAID 5: Independent Disk Array Disk array controller Disk 1 Disk 2 Disk 3 Disk 4 Data striped with parity across array
RAID 5: Independent Disk Array • Benefits: - Efficient data integrity - Data reconstruction - Multiple concurrent seeks across array - Scalable • Limitations: - Disk overhead - Data write rate
Backup • Plan at the design stage • Use hot backups for VLDBs • Back up necessary components: - Fact and dimension data - Warehouse schema - Metadata schema - Metadata • Export/Import utility - Disk space - Time
Summary This lesson discussed the following topics: • Explaining vertical partitioning and horizontal partitioning • Distinguishing the different types of partitioning methods • Distinguishing between B-tree index and bitmap index • Understanding why warehouse typically uses RAID 0, or 5 to protect the database