Improve SSIS delta loads using hashing techniques: How to spot the unique
What is a hash function?
• A hash function is any algorithm that maps data of arbitrary length to data of a fixed length
• Example: SELECT HASHBYTES('MD5', 'The quick brown fox jumps over the lazy dog') returns 0x9E107D9D372BB6826BD81D3542A419D6 (16 bytes)
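The HASHBYTES value on this slide can be cross-checked outside SQL Server. The following is a minimal Python sketch, assuming the literal is a non-Unicode (varchar) string so that its bytes match the ASCII encoding. The helper name md5_hex is made up for illustration.

```python
import hashlib

def md5_hex(text: str) -> str:
    """Return the MD5 digest of an ASCII string as uppercase hex with a 0x prefix."""
    digest = hashlib.md5(text.encode("ascii")).digest()  # MD5 always yields 16 bytes
    return "0x" + digest.hex().upper()

# Inputs of different lengths map to outputs of the same fixed length.
short_hash = md5_hex("fox")
long_hash = md5_hex("The quick brown fox jumps over the lazy dog")

print(long_hash)  # 0x9E107D9D372BB6826BD81D3542A419D6
print(len(short_hash) == len(long_hash))  # True
```

If the SQL literal were written as N'...' (nvarchar), SQL Server would hash the UTF-16 bytes instead and the result would differ from the value above.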
Why is hashing important in SSIS?
• Identification of uniqueness in data that has no keys
• Identification of changes in large string data
• Ability to minimise the buffer usage in lookup transformations
• Can be applied against any data source
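The points above can be sketched as a small delta-detection routine. This is an illustration in Python rather than an SSIS package: the function names, the column names, the "|" delimiter and the treatment of NULL as an empty string are all assumptions of the sketch, not something the deck prescribes. The stored dictionary plays the role of a lookup cache that holds only a key and a 16-byte hash instead of the wide string columns.

```python
import hashlib

def row_hash(row, columns):
    """Hash the chosen columns of a row into one 16-byte MD5 value."""
    # Join with a delimiter so ("ab", "c") and ("a", "bc") hash differently.
    # None is rendered as an empty string (an assumption of this sketch).
    joined = "|".join("" if row[c] is None else str(row[c]) for c in columns)
    return hashlib.md5(joined.encode("utf-8")).digest()

def classify(source_rows, stored_hashes, key, columns):
    """Split source rows into new, changed and unchanged by comparing hashes."""
    result = {"new": [], "changed": [], "unchanged": []}
    for row in source_rows:
        current = row_hash(row, columns)
        previous = stored_hashes.get(row[key])
        if previous is None:
            result["new"].append(row[key])
        elif previous != current:
            result["changed"].append(row[key])
        else:
            result["unchanged"].append(row[key])
    return result

cols = ["name", "city"]
# Hashes kept from the previous load (hypothetical data).
stored = {
    1: row_hash({"name": "Ann", "city": "Oslo"}, cols),
    2: row_hash({"name": "Bob", "city": "Rome"}, cols),
}
incoming = [
    {"id": 1, "name": "Ann", "city": "Oslo"},
    {"id": 2, "name": "Bob", "city": "Paris"},
    {"id": 3, "name": "Cy", "city": "Lima"},
]
print(classify(incoming, stored, "id", cols))
# {'new': [3], 'changed': [2], 'unchanged': [1]}
```

For data with no keys at all, the hash of every column can itself serve as the value to look up, at the cost of treating any changed row as a new one.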
Known problems and obstacles
• Hash collisions
• Additional development competency
• Additional evaluation of data sources
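To get a feel for how likely an accidental hash collision is, the standard birthday-bound approximation can be evaluated for a given row count and hash width. This is a rough sketch that assumes hash values are uniformly distributed and ignores deliberately crafted collisions. The function name is made up for illustration.

```python
import math

def collision_probability(rows: int, bits: int) -> float:
    """Birthday-bound approximation: p = 1 - exp(-n(n-1) / 2^(bits+1))."""
    exponent = -rows * (rows - 1) / 2 ** (bits + 1)
    return -math.expm1(exponent)  # 1 - e^x, computed accurately for tiny x

for rows in (10**6, 10**9):
    for bits in (64, 128):
        p = collision_probability(rows, bits)
        print(f"{rows:>13,} rows, {bits:>3}-bit hash: p = {p:.3e}")
```

Under these assumptions, a billion rows with a 64-bit hash give a probability of roughly 2.7%, while a 128-bit hash keeps it negligibly small.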
Can we make it smaller?
• Conversion to BIGINT, which keeps only the rightmost 8 bytes of the 16-byte hash
• SELECT CONVERT(BIGINT, HASHBYTES('MD5', 'The quick brown fox jumps over the lazy dog')) returns 7770993271616313814 (8 bytes)
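The BIGINT value on this slide can be reproduced by taking the last 8 bytes of the MD5 digest and reading them as a signed big-endian integer, which matches how SQL Server truncates an over-long binary value when converting it to an integer type. This is a minimal Python sketch. The function name is made up for illustration, and the input is assumed to be a non-Unicode (varchar) string.

```python
import hashlib

def md5_as_bigint(text: str) -> int:
    """Mimic CONVERT(BIGINT, HASHBYTES('MD5', text)) for an ASCII string."""
    digest = hashlib.md5(text.encode("ascii")).digest()  # 16 bytes
    # Keep the rightmost 8 bytes and interpret them as a signed 64-bit integer.
    return int.from_bytes(digest[-8:], byteorder="big", signed=True)

print(md5_as_bigint("The quick brown fox jumps over the lazy dog"))
# 7770993271616313814
```

The result can be negative for other inputs, because BIGINT is signed. Halving the hash to 64 bits also makes accidental collisions more likely, so the smaller footprint is a trade-off.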