1 / 24

SAS Hash Object: My New Best Friend

SAS Hash Object: My New Best Friend. Demonstration Of Time Savings Using A Hash Object By Denise A. Kruse SAS Contractor. Program Objectives. disposition 97 obs. campaign 11,346 obs. dec_offers 6,145,029 obs. program 446 obs. disposition category 6 obs. Matching Datasets.

zoltin
Download Presentation

SAS Hash Object: My New Best Friend

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. SAS Hash Object: My New Best Friend Demonstration Of Time Savings Using A Hash Object By Denise A. Kruse SAS Contractor

  2. Program Objectives disposition 97 obs campaign 11,346 obs dec_offers 6,145,029 obs program 446 obs disposition category 6 obs

  3. Matching Datasets What is the best way to get the fields from the 4 small datasets into the main population of 6.1 million observations? • PROC merge • HASH

  4. PROC merge • Both datasets need to be sorted prior to the merge • Merge datasets • Sort again • Merge again

  5. Sort / Merge Code procsort data=oms_prod.disposition out=disp ; by disposition_id ; run ; procsort data=dec_offers ; by disposition_id ; run ; data dec_match ; merge dec_offers (in=a) disp(keep=disposition_id description touched disposition_category_code in=b) ; by disposition_id ; if a and b ; run ;

  6. Log For Sort / Merge NOTE: Sorting was performed by the data source. NOTE: There were 97 observations read from the data set OMS_PROD.DISPOSITION. NOTE: The data set WORK.DISP has 97 observations and 10 variables. NOTE: Compressing data set WORK.DISP decreased size by 0.00 percent. Compressed is 2 pages; un-compressed would require 2 pages. NOTE: PROCEDURE SORT used (Total process time): real time 0.23 seconds cpu time 0.00 seconds NOTE: There were 6145029 observations read from the data set WORK.DEC_OFFERS. NOTE: The data set WORK.DEC_OFFERS has 6145029 observations and 4 variables. NOTE: Compressing data set WORK.DEC_OFFERS increased size by 58.15 percent. Compressed is 38412 pages; un-compressed would require 24289 pages. NOTE: PROCEDURE SORT used (Total process time): real time 28.44 seconds cpu time 39.81 seconds NOTE: There were 6145029 observations read from the data set WORK.DEC_OFFERS. NOTE: There were 97 observations read from the data set WORK.DISP. NOTE: The data set WORK.DEC_MATCH has 6145029 observations and 7 variables. NOTE: Compressing data set WORK.DEC_MATCH decreased size by 74.94 percent. Compressed is 27499 pages; un-compressed would require 109733 pages. NOTE: DATA statement used (Total process time): real time 42.81 seconds cpu time 42.58 seconds

  7. Sort / Merge Code Continued procsort data=oms_prod.campaign out=camp ; by campaign_id ; run ; procsort data=dec_match ; by campaign_id ; run ; data dec_match2 ; merge dec_match (in=a) camp(keep=campaign_id program_id campaign_code description in=b) ; by campaign_id ; if a and b ; run ;

  8. Log For Sort / Merge NOTE: Sorting was performed by the data source. NOTE: There were 11346 observations read from the data set OMS_PROD.CAMPAIGN. NOTE: The data set WORK.CAMP has 11346 observations and 19 variables. NOTE: Compressing data set WORK.CAMP decreased size by 43.48 percent. Compressed is 143 pages; un-compressed would require 253 pages. NOTE: PROCEDURE SORT used (Total process time): real time 0.67 seconds cpu time 0.43 seconds NOTE: There were 6145029 observations read from the data set WORK.DEC_MATCH. NOTE: The data set WORK.DEC_MATCH has 6145029 observations and 7 variables. NOTE: Compressing data set WORK.DEC_MATCH decreased size by 74.94 percent. Compressed is 27496 pages; un-compressed would require 109733 pages. NOTE: PROCEDURE SORT used (Total process time): real time 1:09.07 cpu time 1:59.52 NOTE: There were 6145029 observations read from the data set WORK.DEC_MATCH. NOTE: There were 11346 observations read from the data set WORK.CAMP. NOTE: The data set WORK.DEC_MATCH2 has 6145029 observations and 9 variables. NOTE: Compressing data set WORK.DEC_MATCH2 decreased size by 71.53 percent. Compressed is 34306 pages; un-compressed would require 120491 pages. NOTE: DATA statement used (Total process time): real time 51.29 seconds cpu time 51.05 seconds

  9. Sort / Merge Code Continued proc sort data=oms_prod.program out=pgm ; by program_id ; run ; proc sort data=dec_match2 ; by program_id ; run ; data dec_match3 ; merge dec_match (in=a) pgm(keep=program_id name in=b) ; by program_id ; if a and b ; run ;

  10. Log For Sort / Merge NOTE: Sorting was performed by the data source. NOTE: There were 446 observations read from the data set OMS_PROD.PROGRAM. NOTE: The data set WORK.PGM has 446 observations and 16 variables. NOTE: Compressing data set WORK.PGM decreased size by 40.00 percent. Compressed is 6 pages; un-compressed would require 10 pages. NOTE: PROCEDURE SORT used (Total process time): real time 0.25 seconds cpu time 0.03 seconds NOTE: There were 6145029 observations read from the data set WORK.DEC_MATCH2. NOTE: The data set WORK.DEC_MATCH2 has 6145029 observations and 9 variables. NOTE: Compressing data set WORK.DEC_MATCH2 decreased size by 71.53 percent. Compressed is 34306 pages; un-compressed would require 120491 pages. NOTE: PROCEDURE SORT used (Total process time): real time 1:17.37 cpu time 2:02.37 NOTE: There were 6145029 observations read from the data set WORK.DEC_MATCH2. NOTE: There were 446 observations read from the data set WORK.PGM. NOTE: The data set WORK.DEC_MATCH3 has 6145029 observations and 10 variables. NOTE: Compressing data set WORK.DEC_MATCH3 decreased size by 72.06 percent. Compressed is 26016 pages; un-compressed would require 93107 pages. NOTE: DATA statement used (Total process time): real time 59.00 seconds cpu time 58.97 seconds

  11. Sort / Merge Code proc sort data= oms_prod.disposition_category out=disp_cat(rename=(description=disp_desc)) ; by disposition_category_code ; run ; proc sort data=dec_match3 ; by disposition_category_code ; run ; data dec_match4 ; merge dec_match3 (in=a) disp_cat(keep=disposition_category_code disp_desc in=b) ; by disposition_category_code ; if a and b ; run ;

  12. Log For Sort / Merge NOTE: Sorting was performed by the data source. NOTE: There were 6 observations read from the data set OMS_PROD.DISPOSITION_CATEGORY. NOTE: The data set WORK.DISP_CAT has 6 observations and 2 variables. NOTE: Compressing data set WORK.DISP_CAT increased size by 100.00 percent. Compressed is 2 pages; un-compressed would require 1 pages. NOTE: PROCEDURE SORT used (Total process time): real time 0.03 seconds cpu time 0.02 seconds NOTE: There were 6145029 observations read from the data set WORK.DEC_MATCH3. NOTE: The data set WORK.DEC_MATCH3 has 6145029 observations and 10 variables. NOTE: Compressing data set WORK.DEC_MATCH3 decreased size by 72.06 percent. Compressed is 26017 pages; un-compressed would require 93107 pages. NOTE: PROCEDURE SORT used (Total process time): real time 1:26.08 cpu time 2:14.65 NOTE: There were 6145029 observations read from the data set WORK.DEC_MATCH3. NOTE: There were 6 observations read from the data set WORK.DISP_CAT. NOTE: The data set WORK.DEC_MATCH4 has 6145029 observations and 11 variables. NOTE: Compressing data set WORK.DEC_MATCH4 decreased size by 71.05 percent. Compressed is 31209 pages; un-compressed would require 107808 pages. NOTE: DATA statement used (Total process time): real time 1:03.35 cpu time 1:03.28

  13. HASH code data dec_match ; if _n_ = 1 then do ; IF 0 then set oms_prod.disposition(keep=disposition_id description touched disposition_category_code ) ; declare hash ht(dataset: "oms_prod.disposition") ; ht.defineKEY("disposition_id ") ; ht.defineData("disposition_id ", "description “ “touched","disposition_category_code") ; ht.defineDone() ; end ; set dec_offers ; if ht.find()=0 ; run ; No sorting !!

  14. HASH Log NOTE: There were 97 observations read from the data set OMS_PROD.DISPOSITION. NOTE: There were 6145029 observations read from the data set WORK.DEC_OFFERS. NOTE: The data set WORK.DEC_MATCH has 6145029 observations and 7 variables. NOTE: Compressing data set WORK.DEC_MATCH decreased size by 74.94 percent. Compressed is 27499 pages; un-compressed would require 109733 pages. NOTE: DATA statement used (Total process time): real time 48.38 seconds cpu time 48.14 seconds

  15. HASH Code data dec_match2 ; if _n_ = 1 then do ; IF 0 then set oms_prod.campaign(keep=campaign_id program_id campaign_code description ) ; declare hash ht(dataset: "oms_prod.campaign") ; ht.defineKEY("campaign_id") ; ht.defineData("campaign_id", "program_id", "campaign_code", "description") ; ht.defineDone() ; end ; set dec_match ; if ht.find()=0 ; run ;

  16. HASH Log NOTE: There were 11346 observations read from the data set OMS_PROD.CAMPAIGN. NOTE: There were 6145029 observations read from the data set WORK.DEC_MATCH. NOTE: The data set WORK.DEC_MATCH2 has 6145029 observations and 9 variables. NOTE: Compressing data set WORK.DEC_MATCH2 decreased size by 38.33 percent. Compressed is 39071 pages; un-compressed would require 63352 pages. NOTE: DATA statement used (Total process time): real time 55.35 seconds cpu time 55.21 seconds

  17. HASH Code data dec_match3; if _n_ = 1 then do; IF 0 then set oms_prod.program(keep=program_id name ); declare hash ht(dataset: "oms_prod.program"); ht.defineKEY("program_id"); ht.defineData("program_id", "name"); ht.defineDone(); end; set dec_match2; if ht.find()=0; run;

  18. HASH Log NOTE: There were 446 observations read from the data set OMS_PROD.PROGRAM. NOTE: There were 6145029 observations read from the data set WORK.DEC_MATCH2. NOTE: The data set WORK.DEC_MATCH3 has 6145029 observations and 10 variables. NOTE: Compressing data set WORK.DEC_MATCH3 decreased size by 48.53 percent. Compressed is 43928 pages; un-compressed would require 85348 pages. NOTE: DATA statement used (Total process time): real time 1:00.38 cpu time 1:00.17

  19. HASH Code data disposition_category (rename=(description=disp_desc)); set oms_prod.disposition_category; run; data dec_match4; if _n_ = 1 then do; IF 0 then set disposition_category(keep=disposition_category_code disp_desc); declare hash ht(dataset: "disposition_category"); ht.defineKEY("disposition_category_code"); ht.defineData("disposition_category_code", "disp_desc"); ht.defineDone(); end; set dec_match3; if ht.find()=0; run;

  20. HASH Log NOTE: There were 6 observations read from the data set OMS_PROD.DISPOSITION_CATEGORY. NOTE: The data set WORK.DISPOSITION_CATEGORY has 6 observations and 2 variables. NOTE: Compressing data set WORK.DISPOSITION_CATEGORY increased size by 100.00 percent. Compressed is 2 pages; un-compressed would require 1 pages. NOTE: DATA statement used (Total process time): real time 0.02 seconds cpu time 0.01 seconds NOTE: There were 6 observations read from the data set WORK.DISPOSITION_CATEGORY. NOTE: There were 6145029 observations read from the data set WORK.DEC_MATCH3. NOTE: The data set WORK.DEC_MATCH4 has 6145029 observations and 11 variables. NOTE: Compressing data set WORK.DEC_MATCH4 decreased size by 49.47 percent. Compressed is 51750 pages; un-compressed would require 102418 pages. NOTE: DATA statement used (Total process time): real time 1:02.45 cpu time 1:02.30

  21. Comparison Of Processing Time

  22. Conclusion When looking for efficiencies HASH objects are definitely worth considering. In larger programs, HASH objects can save valuable processing time.

  23. References Linda Jolley – Using Table Lookup Techniques Efficiently Jason Secosky – The DATA Step In Version 9: What’s New? Paul Dorfman- DATA Step HASH Objects As Programming Tools

  24. Contact Information Denise A. Kruse SAS Contractor DeniseAKruse@gmail.com

More Related