Designing VLDBs

Designing VLDBs Praveen Srivatsa Director|AstrhaSoft Consulting blogs.asthrasoft.com/praveens|praveens@asthra.net

Table and Index Partitioning • Designed for … • VLDB w/very large tables (100’s GB) • Large machines with 8, 16, 32, and more ‘real’ CPUs • Replacing partitioned views where the partitions are in a single database • Improving large data set manageability • Loading • Efficient dropping of whole partitions

How do you do Partitioning? • Before a table is partitioned two things have to be created, a partition Function and Schema • Partition Function • Each row of an index/table is assigned to a partition (numbered 1, 2, 3, ...) by using a “Partition Function” • SQL Server 2005 supports Range partitioning onlyon a single column • User defines the key column, the number of partitions, and the partition boundary points • Partition Scheme • Each partition for a partition function is mapped to a physical storage location (Filegroup) through a “Partition Scheme”

Rules For Partitions • Restrictions are derived from the principles of no data movement and no scans • All target indexes must exist in the source • All indexes must be aligned (use the same partition function as was used on the table) • Corresponding target index is in the same filegroup as the source • There must be a constraint on the source to ensure all its data “fits” into the target partition

Example Code • CREATE PARTITION FUNCTIONtesttablepart_fn (int) • AS RANGE LEFT FOR VALUES (-1,10, 20, 30) • CREATE PARTITION SCHEMEtesttablepart_ps • AS PARTITION testtablepart_fn TO • ([Filegroup4], [Filegroup1], [Filegroup1], [Filegroup3], [Filegroup2]) • The order of filegroups being displayed is significant • CREATE TABLE Employees (EmpId int, EmpName varchar(50)) ON testtablepart_ps(EmpId)

Partitioning Key restrictions • Must be based on columns in the table • Restrictions similar to index key limitations • Only 1 column is permitted • No Text/NText/Image in partition key • No varchar(max) • No timestamp • Only “native types” – no user-defined types • Column values must be deterministic • Computed columns must be persisted (a new feature in 2005) • Maximum 1000 partitions per table in SQL Server 2005 • (practical limitation, not tested beyond 1000) • All partitions of a single table or index must be in a single database • Partitioned views can be used in conjunction with table partitioning to span databases and servers

Partitioning Benefits • Allow easy management of very large tables and indexes (data scale-up) • For example Fast Insert or Delete of large quantities of data (per-partition) • Index defragmentation or rebuild on one partition using ALTER INDEX … PARTITION (<num>) • In some cases may experience some performance improvements for some operators, e.g. hash join would be more efficient • take advantage of collocation • joins of large tables

Partitioning Benefits • Without interfering with access to the rest of a table: • Add 100's of millions of rows to a table that already has billions of rows • Delete 100's of millions of rows from that table • Solution … use partitions with ranges • Very fast (1 second), just metadata changes • Switch data in and out • Table and partition of another table • Two partitions of different tables • Bulk data loading (into separate table - then switch into main table) • Index maintenance per partition • Two key best practices to improve manageability • Index Alignment (partitioned like the table) • Storage Alignment (table partitions in the same filegroup\file)

Partitioning Benefits • When you have to BCP 100 million rows into a table that already has 600 million row, need to reindex • Table has indexes => load is slow; about 10+ times slower than loading into a heap and creating indexes afterwards • Reindexing made table unavailable • Using a partitioned table, • can load into the table, and index it • while everyone still using the rest of the partitions • When done loading and indexing, just switch it in • Delete is slow – deleting rows is orders of magnitude slower than truncating a multi-GB table • But you don’t want to truncate the table, just delete 20 or 30% of it • With partitioning, just switch out the unneeded partition • Cost of running a utility usually grows linearly with the table size • DBCC CHECKTABLE, use it on a partition at a time

Table Partitioning and Switching • Bring a ‘partition’ of new data in or take a ‘partition’ of old data out of a partitioned table very fast • A table has 12 billion rows divided into 12 months • You want to archive off the 13th month and then bring in the new month • You have partitioned by month • Data is ‘moved’ (not really) between tables by SWITCHing the pointers to physical data locations • DATA IS NOT MOVED or even scanned, so size doesn’t matter • It’s a metadata operation • Takes a second

Example Switch Code • ALTER TABLE table_name1 SWITCH [PARTITION <partition_number1>] TO table_name2 [PARTITION <partition_number2>] • table_name1 is called “source” and table_name2 “target” of the SWITCH • Target table (or partition_number2) is empty

SWITCH Partition • Performance – allows building new and removing old partition fast • Availability – allows adding new and removing old partition with minimal downtime Note: Schema modification lock is acquired for the duration of the ALTER TABLE … SWITCH • Per-partition manageability – enables taking single partition out and run utility on it • CREATE INDEX, ALTER INDEX … REBUILD • And other operations that are not intended to work on individual partition numbers, DBCC CHECKTABLE • Efficiently supports Sliding Window scenario

Indexes can also be Partitioned • Syntax is the same as for tables • “ON PartitionScheme(col)” • They may be partitioned differently from the base table • However, index alignment is the best practice • If an index uses a similar partition function and same partitioning key as the base table, then the index is “aligned” • Similar means the same number of partitions and same boundary points • One-to-one correspondence between data in table and index partition • All index entries in one partition map to data in a single partition of the base table

Can I Change a Partition? • Yes, but do you want to? Not if there are already 100’s of millions of rows • Plan partitioning very carefully and avoid a huge I/O problem later • But if you have to add or remove a partition: • SPLIT – adding a partition • Taking 1 partition and splitting it into 2 partitions • ALTER PARTITION FUNCTION pfname() SPLIT RANGE (new_boundary_value) • MERGE – removing a partition • Joining 2 partitions into 1 • ALTER PARTITION FUNCTION pfname() MERGE RANGE (old_boundary_value) • All affected tables are locked during the operation by EXCLUSIVE lock • Logging is in effect similarly as in INSERT INTO … SELECT FROM • No data moves in or out of the table; only rows of tables and indexes are moved from one partition to another • This is the I/O killer, so plan ahead of time to avoid doing this

Partitioning Best Practices • Plan Plan Plan ahead and save yourself from pain • Ensure minimal data movement for • SPLIT • MERGE • Have many partition functions so any alteration of one function will have limited impact • Use storage and index alignment • Empty partition should be the one that is “SPLIT” • Watch how filegroups are assigned to partitions • Spread the filegroups over many drives – let SQL Server handle the I/O distribution (e.g. each partition on a separate disk may not be the best choice)

Feedback / QnA • Your Feedback is Important! Please take a few moments to fill out our online feedback form For detailed feedback, use the form at http://www.connectwithlife.co.in/vtd/helpdesk.aspx Or email us at vtd@microsoft.com • Use the Question Manager on LiveMeeting to ask your questions now!

Designing VLDBs

Designing VLDBs

Presentation Transcript

Designing

Designing Samples

Designing a

Designing Output

Designing Interaction

Designing Graphics

Graphic Designing Services.Logo Designing Services

Differentiating Web Designing From Print Designing

Web designing and Logo Designing

Web Designing

Designing Visuals

web designing,graphic designing

Designing

Designing Experiments

Software Designing

WEBSITE DESIGNING

SQL Server 2005 features for VLDBs

Creative designing services – Graphic Designing Services

Website Designing - Website Designing Company

Catalog Designing Graphic Designing

Web designing and UI designing

Designing