170 likes | 315 Views
SQL Server Sri Lanka User Group Meeting Oct 2013. Best Practices in Loading Large Datasets. Asanka Padmakumara ( BSc,MCTS ). Agenda. Definition of “Large Dataset” Database and table structures for large data sets. Loading data using T-SQL Loading data using SSIS Q & A. A Large Dataset….
E N D
SQL Server Sri Lanka User Group Meeting Oct 2013 Best Practices in Loading Large Datasets Asanka Padmakumara (BSc,MCTS)
Agenda • Definition of “Large Dataset” • Database and table structures for large data sets. • Loading data using T-SQL • Loading data using SSIS • Q & A
A Large Dataset…. • A Collection of data sets: • large • complex • difficult to process~ Wikipedia • Large data set to you is depend on your hardware configuration.
A Large table…. • A large table is one that does not perform as desired or one where the maintenance costs have gone beyond pre-defined maintenance periods. • if one user’s activities significantly affect another or if maintenance operations affect other user’s abilities. In effect, this even limits availability. ~Microsoft White Paper (Partitioned Tables and Indexes)
Database and table structures • Try to have multiple file group across multiple disks. • Can backup only the file group. • To get Performance • Data on a separate I/O path. • Index on a separate I/O path • Partition the tables and Indexes • Partition means Break large table into parts. • Easy to insert-Switch partition in to table • Easy to delete- Move partition from table • Can rebuild index only of the partition
Database and table structures • Have Clustered index for the most using column. • Have non Clustered indexes for other most using columns. • Try to have correct data types. • Int, smallint makes a different when no of rows are high.
Loading Data with T-SQL • TSQL Commands and Utilities • BCP • BULK INSERT • OPENROWSET
Loading Data with T-SQL • BCP • bulk copy is an utility program: bcp.exe • copies data between Microsoft SQL Server and a data file in a user-specified format • Can generate format file for data. • performance is improved if the data being imported is sorted according to the clustered index on the table • IN – Insert to table, Out- Export to file • Syntax: bcp AdventureWorks2012.sales.CurrencyRate out F:\DemoData\Currency.dat –c –usa –SSLLAPTOP266\SQL2012
Loading Data with T-SQL • Bulk Insert • Transact-SQL statement.faster than the BCP utility • Unlike BCP, Can’t export data to files. Only insert • you can specify up to 1024 fields only, If more than that use BCP. • Syntax : BULK INSERT AdventureWorks2012.Sales.SalesOrderDetail FROM 'f:\orders\lineitem.tbl’
Loading Data with T-SQL • OPENROWSET • Alternative to accessing tables in a linked server and is a one-time • SELECT a.* FROM OPENROWSET('SQLNCLI', 'Server=Seattle1;Trusted_Connection=yes;','SELECT Name FROM Department AS a;) • Use BULK keyword for use OPENROWSETfor bulk loading • SELECT a.* FROM OPENROWSET( BULK 'c:\testvalues.txt',FORMATFILE = 'c:\test\values.fmt') AS a; • IGNORE_CONSTRAINTS, IGNORE_TRIGGERS
Loading Data with T-SQL • Considerations…. • Disable Index • Alter index [IXYourIndex] ON YourTable DISABLE • Disable constrain Do not disable clustered index. Table become read-only. • If you disable it have to rebuild index. • Enable Index • ALTER INDEX [IXYourIndex] ON YourTable REBUILD
Deleting Data with T-SQL • Truncate instead of delete.(if possible) • (if not) Move the Partition to new table, then truncate it.-New table should be in same file group • Delete batch by batch avoid growing log file • delete is a single auto-commit transaction • Disable Triggers • Simple recovery mode to minimize growing log
Loading data using SSIS • Buffer size • Can increase buffer size of dataflow task • Default size=10MB, can set up to=100MB • Cachelookup • If same table is looked up more than once inside SSIS package, cache that lookup • Vender specific providers will give you more performance. • Ex: Oracle provider instead of Microsoft Oledb
Loading data using SSIS • Use raw file instead on temp table when using staging table • Avoid using Slowly changing dimension control. • Use merge statement instead of SCD control • Use parallel flow inside dataflow task to multi threading • Minimize usage of blocking ,Partiallyblocking transformations • Ex: Aggregate, Sort, Merge, Merge Join, and Union All
Demo • Demo # 1 • Delete data from partitioned table. • Demo # 2 • Import/Export data using BCP