460 likes | 809 Views
Big Data Working with Terabytes in SQL Server. Andrew Novick www.NovickSoftware.com. Agenda. What’s Big? Concerns ETL/Load Performance Query Performance Backup/Restore Performance Architecture Solutions. Introduction. Andrew Novick – Novick Software, Inc.
E N D
Big DataWorking with Terabytes in SQL Server Andrew Novick www.NovickSoftware.com
Agenda • What’s Big? • Concerns • ETL/Load Performance • Query Performance • Backup/Restore Performance • Architecture • Solutions
Introduction • Andrew Novick – Novick Software, Inc. • Business Application Consulting • SQL Server • .Net • www.NovickSoftware.com • Books: • Transact-SQL UDFs • SQL 2000 XML Distilled
SQL Pass 2008 • November 18-21 – Seattle
What’s Big? • 100’s of gigabytes and up to 10’s of terabytes • 100,000,000 rows an up to 100’s of Billions of rows
Big Scenarios • Data Warehouse • Very Large OLTP databases (usually with reporting functions)
Big Hardware • Multi-core 8-64 • RAM 16 GB to 256 GB • SAN’s or direct attach RAID • 64 Bit SQL Server
Concerns • Load Speed (ETL) • Query Speed • Data Management • Backup / Restore • DBCC CHECKDB, remove Fragmentation
Architecture What do we have to work with?
Logical Disk System – Windows Drives Drive C: Drive D: Drive E: Physical IO - subsystem Disk Disk Disk Disk Disk Disk SQL Server Storage Architecture SQL Server Storage Table1 Table2 FileGroupA FileGroupB FileA1 FileB1 FileB2
Solution to what? • Load Speed (ETL) • Query Speed • Data Management • Backup / Restore • DBCC CHECKDB, remove Fragmentation
Solutions • Use Multiple FileGroups/Files • Spread Data to maximize resource use • Sliding Window if there is a time dimension • Partitioned Tables and/or Views • ETL – Insert into empty unindexed tables • Use READ_ONLY FileGroups to minimize maintenance needs.
I/O Performance • Little has changed in 50 years • Watch out for bottlenecks in the I/O Path • Memory reduces the need for I/O Disks can only do so many I/O operations per second The more disk heads you have the higher the I/O throughput.
At 3 PM on the 1st of the month:Where do you want your data to be?
Sliding Window Always There Data Temporal Data 2008-01 Temporal Data 2008-02 Temporal Data 2008-03 Temporal Data 2008-04 Temporal Data 2008-05
Read_Only FileGroups • Require only one Backup • Don’t require page or row locks • Don’t require maintenance • The ALTER requires exclusive access to the database before SQL 2008 ALTER DATABASE <database> MODIFY FILEGROUP <filegroup> SET READ_ONLY
Concern - Load Performance (ETL) • 4 Hour maximum window for any load • Load into large indexed tables is unacceptably long. • Example: 2 million row insert into 400 million row table with 10 indexes took 12 hours.
Concern – Query Performance • Users have little patience • Data warehouse Queries • Frequent small to medium to support UI • Less frequent large queries on fact tables may access 10’s of GB
Partitioned Views CREATE VIEW Fact AS SELECT * FROM Fact_20080405 UNION ALL SELECT * FROM Fact_20080406 ALTER TABLE Fact_20080405 ADD CONSTRAINT CK_FACT_20080405_Date CHECK (FactDate >= ‘2008-04-05’ and FactDate < ‘2008-04-06’ Available in SQL Server Standard Available in SQL Server 2000 Created like any view Check constraints tell SQL Server which data is in which table
Partitioned View - 2 • Looks to a query like any table or view • Can take advantage of parallel execution. • Limited to 256 tables • Can cross servers (Performance Warning) SELECT FactDate, ….. FROM Fact WHERE CustID=334343 AND FactDate = ‘2008-04-05’
Logical Disk System – Windows Drives Drive C: Drive D: Drive E: Physical IO - subsystem Disk Disk Disk Disk Disk Disk Partitioned View SQL Server Storage View Fact Table1 Table2 Fact_20080330 Fact_20080401 Fact_20080401 Fact_20080331 FGF1 FGF1 FGF2 FGF2 FGF3 FGF3 FGF4 FGF4 FileGroupA FileGroupB FileA1 FileB1 FileB2 F1 F1 F2 F2 F3 F3 F4 F4
Partition Elimination • The query compiler can eliminate partitions from consideration in the plan • Partition elimination happens at query compile time. • Values matching the partitioning column must be constants to allow partition elimination.
Partitioned Tables SQL Server Enterprise SQL Server 2005 and Above Require a non-null partitioning column Check constraints tell SQL Server what data is in each parturition All tables are partitioned!
Partitioned Tables 2 • Partition Function • Defines how to split data • Partition Scheme • Defines where to store each range of data CREATE Partitioned View Fact_PF(smalldatetime) RANGE RIGHT FOR VALUES (‘2001-07-01’, ‘2001-07-02’) CREATE PARTITION SCHEME Fact_PF AS PARTITION Fact_pf TO(PRIMARY, FG_20010701, FG_20010702)
Logical Disk System – Windows Drives Drive C: Drive D: Drive E: Physical IO - subsystem Disk Disk Disk Disk Disk Disk Partitioned Table SQL Server Storage Table Fact Table1 Table2 Fact.$Partition=1 Fact.$Partitoin=3 Fact.$Partition=4 Fact.$Partition=2 FileGroupA FileGroupB FGF1 FGF2 FGF3 FGF4 FileA1 FileB1 FileB2 F1 F2 F3 F4
Partitioning Goals • Adequate Import Speed • Maximize Query Performance • Make use of all available resources • Data Management • Migrate data to cheaper resources • Delete old data easily
Achieving Load Speed • Insert into empty tables • Index and add foreign keys after the insert • Add the Slices to • Partitioned Views • Partitioned Tables
Achieving Query Speed • Eliminate access to partitions during query compile • All disk resources should be used • Parallel access • All available memory should be used • All available CPUs should be used • Parallel query
Solution • Partition at a sufficiently high grain • Spread dimension data to all useable disks • Separate Data and Index FileGroups • Multiple files per FileGroup • Spread Fact data by partition key to all useable disks • Rotate file locations to maximize dispersion
Concern – Data Management (Backup) • Let’s say you have a 10 TB database. • Now back that up.
Backup Calculation • 10 TB = 10000 GB • Typical Backup speed • Low end 1 GB per minute • High end 10 GB per minute • At 10 GB/Minute Who’s got 1000 minutes?
Achieving Backup Performance • Backup less! • Maintain data in a READ_ONLY state • Compress Backups
Partial Backup • Partial Base • Backs up read_write filegroups • Partial Differential • Differential backup of read_write filegroups BACKUP DATABASE <db name> READ_WRITE_FILEGROUPS ….. BACKUP DATABASE <db name> READ_WRITE_FILEGROUPS WITH DIFFERENTIAL ….
Maintenance Operations • Maintain only READ_WRITE data • DBCC CHECKFILEGROUP • ALTER INDEX • REBUILD PARTITION = • REORGANIZE PARTITION = • Avoid SHRINK
SQL Server 2008 – What’s New • Row, page, and backup compression • Filtered Indexes • Optimization for star joins • MERGE T-SQL DML • Resource Governor • Fewer operations require exclusive access to the database
New England Visual Basic Pro • Focused on VB.Net development • Meetings @ MS Waltham – MPR C • 1st Thursday - 6:15 to 8:30 PM • Sept 4 – Jim O’Neil – ASP.Net Dynamic Data • Sept 25 – Chris Hammond – DotNetNuke • Oct 2 – Kathleen Dollard – XML Litterals in VB 9 • Nov 6 – Joe Stagner – Stupid Hacker Tricks and How 2 Defend • Feb 5 ’09 – Joe Hill – Novell – Mono/VB/etc…. • www.NEVB.com
Thanks for Coming Andrew Novick • anovick@NovickSoftware.com www.NovickSoftware.com