390 likes | 532 Views
High-Performance Calculations. Simple tricks to make some Tableau calculations execute hundreds of times faster. PRESENTED BY. Overview. Why do we need fast calculations? Real-life examples: Computing dates from Unix time stamps Computing dates from “yyyymmdd” columns
E N D
High-Performance Calculations Simple tricks to make some Tableau calculations execute hundreds of times faster PRESENTED BY
Overview • Why do we need fast calculations? • Real-life examples: • Computing dates from Unix time stamps • Computing dates from “yyyymmdd” columns • Displaying numbers as text in a viz • Computing a nested set (combined field) • Coming attractions • How version 8 speeds up some common calculations
Your organisation can respond faster You can stay in a “flow” state longer Overview: The Need for Speed
All calculations run against 1 billion rows Amplifies differences to human scale Range is from 6 seconds to 5 hours Overview: One…Billion…Rows!
Computers Compute! A case study in calculation performance
Customer Task: Convert a column of Unix timestamps to dates Timestamps are 64-bit integers Contain the number of milliseconds since 1970-01-01 Need to convert timestamps to dates for analysis Human-style units like years, months, days (“binning”) Unix Times: The Problem
Meaning: Convert the number to a string Take the left 10 characters Change that to an integer (10s of seconds) Divide by 8640 to get days Add to the “zero date” Computing one billion values takes 3 hours and 45 minutes! Version 8 still takes about 30 minutes. DATE("1/1/1970") + INT( INT( LEFT( STR( [unix]),10 ) ) / 8640 ) Unix Times: Original Version
Meaning: Convert the number to seconds by dividing Add those seconds to the zero date Remove the time part Computing one billion values takes 45 seconds That is 13,000x faster than version 7! Still 40x faster than version 8. DATE( DATEADD( 'seconds', INT([unix] / 1000), #1970-01-01# ) ) Unix Times: Numeric Version
Need to look at each character Need to figure out how many characters there are Need to find space for the answer Need to copy each character …and so on Often takes 10-100 instructions per value Unix Times: Strings are Slow
Computers are good at arithmetic They “compute”! Many arithmetic operations take only one instruction 2.66 GHz processor = 2.66 billion instructions / second The more arithmetic you use, the faster they will be Unix Times: Numbers are Fast
These numbers are for the Tableau Data Engine Should also work for most analytic databases: Vertica, ParAccel, VectorWise, etc. May not help on slow databases MySQL, Text Files, Excel If you extract, it should help Unix Times: When Does it Help?
Creating Dates Use date arithmetic instead of string parsing
Customer Task: Convert a column of numbers to dates The numbers are in the form yyyymmdd Need to convert them to dates for analysis Time series Binning (weeks, quarters) Creating Dates: The Problem
Answer taken from an old in-house training manual Meaning: Build a string in “mm/dd/yyyy” format Cast it to a date Problems: One billion values takes 5 hours Only works in the U.S. DATE( MID( STR( [yyyymmdd] ), 4, 2 ) + “/” + RIGHT( STR( [yyyymmdd] ), 2 ) + “/” + LEFT( STR( [yyyymmdd] ), 4 ) ) Creating Dates: U.S.A. Strings
Meaning: Build a string in “yyyy-mm-dd” format Cast it to a date Works in any country Data engine tries this format first One billion values still takes 5 hours DATE( LEFT( STR( [yyyymmdd] ), 4 ) + “-” + MID( STR( [yyyymmdd] ), 4, 2 ) + “-” + RIGHT( STR( [yyyymmdd] ), 2 ) ) Creating Dates: ISO Strings
Meaning: Get date parts using division (/) and remainder (%) Use date arithmetic to add the parts to a date constant Division gives real numbers, INT fixes this. One billion values takes 64 seconds 280x faster This is the difference between stretching your legs and coming back in the morning! DATEADD( 'day', [yyyymmdd] % 100 - 1, DATEADD( 'month', INT( ( [yyyymmdd] % 10000 ) / 100 ) - 1, DATEADD( 'year', INT( [yyyymmdd] / 10000 ) - 1900, #1900-01-01# ) ) ) Creating Dates: Date Arithmetic
The original calculation has four concatenations Each one needs different amounts of memory Each one needs to copy the characters …and so on Changing a number to a string has similar problems Reading dates from text is tricky What country are we in? 5/3/1983 Creating Dates: Strings are Slow
Numbers all have the same size Copying numbers is fast Date arithmetic is still arithmetic Not as simple as addition BUT Still only a few instructions Computers are good at arithmetic! Creating Dates: Numbers are Fast
LOG How many digits are there? ABS / SIGN Remove / extract the sign of the number MIN / MAX Smallest / largest of two values Creating Dates: Useful Numeric Functions
Presenting Numbers Move display formatting out of your calculations
Customer Task: Data only has the day of the quarter User wants to group data by the week of a quarter Weeks should be labeled nicely Need a calculation to convert the day to the week Need to format it for display Presenting Numbers: The Problem
Meaning: Check all 14 possible ranges one after another Label out of bounds values as “Other” Lots of typing means lots of mistakes A billion rows takes several minutes ~7 minutes in 7.0 ~4 minutes in 8.0 IF [Day Of Quarter] < 7 THEN "Week #1" ELSEIF [Day Of Quarter] < 14 THEN "Week #2" … ELSEIF [Day Of Quarter] < 91THEN "Week #13" ELSE "Other" END Presenting Numbers: If Then Else
Solution 1: Use aliases Rewrite calculation to return numbers Create aliases for the values: “Week #1” etc. Only takes 36s on a billion rows 12x faster than 7.0 6x faster in 8.0 Problems: Typing aliases is still error prone Dialogue is slow because Tableau must find all the values INT( [DayOfQuarter] /7 ) + 1 Presenting Numbers: Aliases
Solution 2: Use column formatting Rewrite the calculation to return numbers Apply number formatting to the column: “Week #”0 Still only takes 36s on a billion rows Viz updates live as you edit! Much easier to correct mistakes Formatting editor doesn’t need to run queries INT( [DayOfQuarter] /7 ) + 1 Presenting Numbers: Formatting
Databases can format output for historical reasons Remember teletypes? Line printers? Database formatting has to be done on every row Grouping by string calculations can be much slower than grouping by numbers Need to compare entire strings instead of string identifiers Presenting Numbers: Strings are Slow
Grouping by numbers only compares numbers 10x-100x faster than strings Grouping reduces the number of rows returned Aliases and formatting are applied after the query Changing the formatting in Tableau does not run queries Presenting Numbers: Presentation is Fast
Combined Fields Use Sets instead of concatenated strings
Customer Task: User wants to create a multi-column set from two or more string columns The user may want to change the column separator Combined Fields: The Problem
Taken from in-house training as an alternative to nested fields (called “combined fields” in version 8) Meaning: Concatenate two strings together Use a separator string that can be changed Problems: One billion rows takes almost 9 minutes Changing the separator requires re-running the query [Month] + “, “ + [Weekday] Combined Fields: Concatenation
Using “Combine Fields…” menu item, create a field that shows both fields with a user specified separator “Create Set…” in v7 Changing the separator does not run a new query Performance is extremely fast 6 seconds on one billion rows 90x faster Combined Fields: Set
String concatenation is very hard to make fast Must build all the combinations from every row Grouping by calculated strings is slow Calculations don’t have string identifiers Combined Fields: Strings are Slow
Unmodified string columns are really numbers One number per unique string Grouping by them is like grouping by numbers Tableau formats combined fields after the query Changing the formatting doesn’t run another query Combined Fields: Numbers are Fast
Coming Attractions Some things we have made faster in the version 8 data engine
Version 7 evaluated both sides of IFs Computed the ELSE side even when true Computed the THEN side even when false Especially bad when Tableau nested many of them Fixed! Coming Attractions: If/Then/Else
Version 7 did not have a CASE statement Made us build huge if/then/else statements If the nesting was deep enough, we would crash Fixed! Also computes only the outputs it needs THEN “string” much faster too Coming Attractions: Case
Version 7 strings were computed one at a time Version 8, many functions have been “chunked” Compute 1000 values at a time Converting to/from strings is much faster in Version 8 Coming Attractions: String Functions
Version 7 computed values on only one processor Version 8 tries to spread calculations across processors If you have 4 cores, calculations can be 4x faster Coming Attractions: Parallel Execution
Version 7 could not edit the column order or the separator Version 8 lets you edit the column order and the separator Coming Attractions: Combined Fields