200 likes | 314 Views
Designing VLDBs. Praveen Srivatsa Director | AstrhaSoft Consulting blogs.asthrasoft.com/praveens | praveens@asthra.net. Table and Index Partitioning. Designed for … VLDB w/very large tables (100’s GB) Large machines with 8, 16, 32, and more ‘real’ CPUs
E N D
Designing VLDBs Praveen Srivatsa Director|AstrhaSoft Consulting blogs.asthrasoft.com/praveens|praveens@asthra.net
Table and Index Partitioning • Designed for … • VLDB w/very large tables (100’s GB) • Large machines with 8, 16, 32, and more ‘real’ CPUs • Replacing partitioned views where the partitions are in a single database • Improving large data set manageability • Loading • Efficient dropping of whole partitions
How do you do Partitioning? • Before a table is partitioned two things have to be created, a partition Function and Schema • Partition Function • Each row of an index/table is assigned to a partition (numbered 1, 2, 3, ...) by using a “Partition Function” • SQL Server 2005 supports Range partitioning onlyon a single column • User defines the key column, the number of partitions, and the partition boundary points • Partition Scheme • Each partition for a partition function is mapped to a physical storage location (Filegroup) through a “Partition Scheme”
Rules For Partitions • Restrictions are derived from the principles of no data movement and no scans • All target indexes must exist in the source • All indexes must be aligned (use the same partition function as was used on the table) • Corresponding target index is in the same filegroup as the source • There must be a constraint on the source to ensure all its data “fits” into the target partition
Example Code • CREATE PARTITION FUNCTIONtesttablepart_fn (int) • AS RANGE LEFT FOR VALUES (-1,10, 20, 30) • CREATE PARTITION SCHEMEtesttablepart_ps • AS PARTITION testtablepart_fn TO • ([Filegroup4], [Filegroup1], [Filegroup1], [Filegroup3], [Filegroup2]) • The order of filegroups being displayed is significant • CREATE TABLE Employees (EmpId int, EmpName varchar(50)) ON testtablepart_ps(EmpId)
Partitioning Key restrictions • Must be based on columns in the table • Restrictions similar to index key limitations • Only 1 column is permitted • No Text/NText/Image in partition key • No varchar(max) • No timestamp • Only “native types” – no user-defined types • Column values must be deterministic • Computed columns must be persisted (a new feature in 2005) • Maximum 1000 partitions per table in SQL Server 2005 • (practical limitation, not tested beyond 1000) • All partitions of a single table or index must be in a single database • Partitioned views can be used in conjunction with table partitioning to span databases and servers
Partitioning Benefits • Allow easy management of very large tables and indexes (data scale-up) • For example Fast Insert or Delete of large quantities of data (per-partition) • Index defragmentation or rebuild on one partition using ALTER INDEX … PARTITION (<num>) • In some cases may experience some performance improvements for some operators, e.g. hash join would be more efficient • take advantage of collocation • joins of large tables
Partitioning Benefits • Without interfering with access to the rest of a table: • Add 100's of millions of rows to a table that already has billions of rows • Delete 100's of millions of rows from that table • Solution … use partitions with ranges • Very fast (1 second), just metadata changes • Switch data in and out • Table and partition of another table • Two partitions of different tables • Bulk data loading (into separate table - then switch into main table) • Index maintenance per partition • Two key best practices to improve manageability • Index Alignment (partitioned like the table) • Storage Alignment (table partitions in the same filegroup\file)
Partitioning Benefits • When you have to BCP 100 million rows into a table that already has 600 million row, need to reindex • Table has indexes => load is slow; about 10+ times slower than loading into a heap and creating indexes afterwards • Reindexing made table unavailable • Using a partitioned table, • can load into the table, and index it • while everyone still using the rest of the partitions • When done loading and indexing, just switch it in • Delete is slow – deleting rows is orders of magnitude slower than truncating a multi-GB table • But you don’t want to truncate the table, just delete 20 or 30% of it • With partitioning, just switch out the unneeded partition • Cost of running a utility usually grows linearly with the table size • DBCC CHECKTABLE, use it on a partition at a time
Table Partitioning and Switching • Bring a ‘partition’ of new data in or take a ‘partition’ of old data out of a partitioned table very fast • A table has 12 billion rows divided into 12 months • You want to archive off the 13th month and then bring in the new month • You have partitioned by month • Data is ‘moved’ (not really) between tables by SWITCHing the pointers to physical data locations • DATA IS NOT MOVED or even scanned, so size doesn’t matter • It’s a metadata operation • Takes a second
Example Switch Code • ALTER TABLE table_name1 SWITCH [PARTITION <partition_number1>] TO table_name2 [PARTITION <partition_number2>] • table_name1 is called “source” and table_name2 “target” of the SWITCH • Target table (or partition_number2) is empty
SWITCH Partition • Performance – allows building new and removing old partition fast • Availability – allows adding new and removing old partition with minimal downtime Note: Schema modification lock is acquired for the duration of the ALTER TABLE … SWITCH • Per-partition manageability – enables taking single partition out and run utility on it • CREATE INDEX, ALTER INDEX … REBUILD • And other operations that are not intended to work on individual partition numbers, DBCC CHECKTABLE • Efficiently supports Sliding Window scenario
Indexes can also be Partitioned • Syntax is the same as for tables • “ON PartitionScheme(col)” • They may be partitioned differently from the base table • However, index alignment is the best practice • If an index uses a similar partition function and same partitioning key as the base table, then the index is “aligned” • Similar means the same number of partitions and same boundary points • One-to-one correspondence between data in table and index partition • All index entries in one partition map to data in a single partition of the base table
Can I Change a Partition? • Yes, but do you want to? Not if there are already 100’s of millions of rows • Plan partitioning very carefully and avoid a huge I/O problem later • But if you have to add or remove a partition: • SPLIT – adding a partition • Taking 1 partition and splitting it into 2 partitions • ALTER PARTITION FUNCTION pfname() SPLIT RANGE (new_boundary_value) • MERGE – removing a partition • Joining 2 partitions into 1 • ALTER PARTITION FUNCTION pfname() MERGE RANGE (old_boundary_value) • All affected tables are locked during the operation by EXCLUSIVE lock • Logging is in effect similarly as in INSERT INTO … SELECT FROM • No data moves in or out of the table; only rows of tables and indexes are moved from one partition to another • This is the I/O killer, so plan ahead of time to avoid doing this
Partitioning Best Practices • Plan Plan Plan ahead and save yourself from pain • Ensure minimal data movement for • SPLIT • MERGE • Have many partition functions so any alteration of one function will have limited impact • Use storage and index alignment • Empty partition should be the one that is “SPLIT” • Watch how filegroups are assigned to partitions • Spread the filegroups over many drives – let SQL Server handle the I/O distribution (e.g. each partition on a separate disk may not be the best choice)
Feedback / QnA • Your Feedback is Important! Please take a few moments to fill out our online feedback form For detailed feedback, use the form at http://www.connectwithlife.co.in/vtd/helpdesk.aspx Or email us at vtd@microsoft.com • Use the Question Manager on LiveMeeting to ask your questions now!