Version 1.1 - September 5, 2013
rand() Considered Harmful
Stephan T. Lavavej ("Steh-fin Lah-wah-wade")
Senior Developer - Visual C++ Libraries
stl@microsoft.com

What's Wrong With This Code?

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    int main() {
        srand(time(NULL));
        for (int i = 0; i < 16; ++i) {
            printf("%d ", rand() % 100);
        }
        printf("\n");
    }

What's Right With This Code?

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    int main() {
        srand(time(NULL));
        for (int i = 0; i < 16; ++i) {
            printf("%d ", rand() % 100);
        }
        printf("\n");
    }

• All required headers are included!
• All included headers are required!
• Headers are sorted!
• One True Brace Style!
• %d is correct for int!
• Unnecessary argc, argv, return 0; omitted!

What's Wrong With This Code? ABOMINATION!

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    int main() {
        srand(time(NULL));
        for (int i = 0; i < 16; ++i) {
            printf("%d ", rand() % 100);
        }
        printf("\n");
    }

warning C4244: 'argument' : conversion from 'time_t' to 'unsigned int', possible loss of data
• time(NULL): Frequency: 1 Hz!
• srand(): 32-bit seed!
• rand(): Range: [0, 32767]
• rand(): Linear congruential, low quality!
• % 100: Non-uniform distribution!

Modulo Non-Uniform Distribution

    int src = rand();    // Assume uniform [0, 32767]
    int dst = src % 100; // Non-uniform [0, 99]

    // [0, 99] src         -> [0, 99] dst
    // [100, 199] src      -> [0, 99] dst
    // ...
    // [32700, 32767] src  -> [0, 67] dst

• This is modulo's fault, not rand()'s
• Trigger: input range isn't an exact multiple of the output range

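The bias can be counted exhaustively. The following is a minimal sketch, assuming rand() returns [0, 32767] as on the slide; `modulo_histogram` is a name made up for this illustration, not something from the talk.

```cpp
#include <array>
#include <cassert>

// Hypothetical helper (not from the talk): counts how many inputs in
// [0, input_max] land on each output of `src % 100`.
std::array<int, 100> modulo_histogram(int input_max) {
    assert(input_max >= 0);
    std::array<int, 100> counts{};  // value-initialized to all zeros
    for (int src = 0; src <= input_max; ++src) {
        ++counts[src % 100];
    }
    return counts;
}
```

With `input_max = 32767`, outputs 0 through 67 each receive 328 inputs and outputs 68 through 99 receive 327, because 32768 = 327 * 100 + 68.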
Floating-Point Treachery

    int src = rand(); // Assume uniform [0, 32767]
    int dst = static_cast<int>(        // As seen on
        (src * 1.0 / RAND_MAX) * 99    // StackOverflow
    ); // Hilariously non-uniform [0, 99]

• Only one input produces the output 99:

    static_cast<int>((32765 * 1.0 / 32767) * 99) == 98
    static_cast<int>((32766 * 1.0 / 32767) * 99) == 98
    static_cast<int>((32767 * 1.0 / 32767) * 99) == 99

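The same exhaustive count works for the scaling expression. This sketch hardcodes the slide's assumption RAND_MAX == 32767 (it is larger on some platforms); `scaled_histogram` and `kRandMax` are hypothetical names.

```cpp
#include <array>

// Assumption taken from the slide: RAND_MAX == 32767.
constexpr int kRandMax = 32767;

// Hypothetical helper: counts how many inputs in [0, kRandMax] land on
// each output of the slide's expression (src * 1.0 / RAND_MAX) * 99.
std::array<int, 100> scaled_histogram() {
    std::array<int, 100> counts{};
    for (int src = 0; src <= kRandMax; ++src) {
        const int dst = static_cast<int>((src * 1.0 / kRandMax) * 99);
        ++counts[dst];
    }
    return counts;
}
```

Output 99 is reached by exactly one of the 32768 inputs (src == 32767), while output 0 is reached by 331 of them.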
Floating-Point Double Treachery

    int src = rand(); // Assume uniform [0, 32767]
    int dst = static_cast<int>(
        (src * 1.0 / (RAND_MAX + 1)) * 100
    ); // Subtly non-uniform [0, 99]

• Less likely outputs (327/32768 vs. 328/32768): 3, 6, 9, 12, 15, 18, 21, 24, 28, 31, 34, 37, 40, 43, 46, 49, 53, 56, 59, 62, 65, 68, 71, 74, 78, 81, 84, 87, 90, 93, 96, 99
• Same problem as src % 100
• Nothing can uniformly map 32768 inputs to 100 outputs

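The list of less likely outputs can be reproduced by brute force. A sketch, again assuming RAND_MAX + 1 == 32768 as on the slide; `less_likely_outputs` and `kInputs` are illustrative names only.

```cpp
#include <algorithm>
#include <vector>

// Assumption taken from the slide: RAND_MAX + 1 == 32768.
constexpr int kInputs = 32768;

// Hypothetical helper: returns the outputs of
// (src * 1.0 / (RAND_MAX + 1)) * 100 that are hit by fewer inputs
// than the most frequently hit output.
std::vector<int> less_likely_outputs() {
    std::vector<int> counts(100, 0);
    for (int src = 0; src < kInputs; ++src) {
        const int dst = static_cast<int>((src * 1.0 / kInputs) * 100);
        ++counts[dst];
    }
    const int most = *std::max_element(counts.begin(), counts.end());
    std::vector<int> result;
    for (int dst = 0; dst < 100; ++dst) {
        if (counts[dst] < most) {
            result.push_back(dst);
        }
    }
    return result;
}
```

The result holds the 32 outputs listed on the slide: 32 * 327 + 68 * 328 = 32768.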
Floating-Point Triple Treachery
• What if the input is [0, 2^32) or [0, 2^64)?
    • Non-uniformity is reduced, but not eliminated, when the input is much larger than the output
• What if IEEE runs out of bits?
    • Example: [0, 2^64) input → [0, 10^18 ≈ 2^59.8) output
    • double has only 53 bits of significand precision
• Say you have a problem, so you use floating-point
    • Now you have 2.000001 problems
• DO NOT MESS WITH FLOATING-POINT

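The 53-bit limit is easy to observe directly. A minimal sketch, assuming double is IEEE 754 binary64 with round-to-nearest-even; `collides_as_double` is a hypothetical helper.

```cpp
#include <cstdint>

// Hypothetical helper: true if two distinct 64-bit integers convert to the
// same double, i.e. the conversion has lost information.
bool collides_as_double(std::uint64_t a, std::uint64_t b) {
    return a != b && static_cast<double>(a) == static_cast<double>(b);
}
```

For example, 2^53 and 2^53 + 1 collide, while 2^53 - 1 and 2^53 do not. Any mapping from a 64-bit engine through double therefore merges distinct inputs before the scaling even starts.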
<random> URNGs (Uniform Random Number Generators)
• Engine templates:
    • linear_congruential_engine
    • mersenne_twister_engine
    • subtract_with_carry_engine
• Engine adaptor templates:
    • discard_block_engine
    • independent_bits_engine
    • shuffle_order_engine
• Non-deterministic:
    • random_device
• Engine (adaptor) typedefs:
    • minstd_rand0
    • minstd_rand
    • mt19937
    • mt19937_64
    • ranlux24_base
    • ranlux48_base
    • ranlux24
    • ranlux48
    • knuth_b
    • default_random_engine

<random> Distributions
• Uniform distributions
    • uniform_int_distribution
    • uniform_real_distribution
• Bernoulli distributions
    • bernoulli_distribution
    • binomial_distribution
    • geometric_distribution
    • negative_binomial_distribution
• Poisson distributions
    • poisson_distribution
    • exponential_distribution
    • gamma_distribution
    • weibull_distribution
    • extreme_value_distribution
• Normal distributions
    • normal_distribution
    • lognormal_distribution
    • chi_squared_distribution
    • cauchy_distribution
    • fisher_f_distribution
    • student_t_distribution
• Sampling distributions
    • discrete_distribution
    • piecewise_constant_distribution
    • piecewise_linear_distribution

Hello, "Random" World!

    #include <iostream>
    #include <random>

    int main() {
        std::mt19937 mt(1729);
        std::uniform_int_distribution<int> dist(0, 99);
        for (int i = 0; i < 16; ++i) {
            std::cout << dist(mt) << " ";
        }
        std::cout << std::endl;
    }

• std::mt19937: Engine: [0, 2^32)
• mt(1729): Deterministic 32-bit seed
• dist(0, 99): Distribution: [0, 99]
    • Note: [inclusive, inclusive]
• dist(mt): Run engine, viewed through distribution

Hello, Random World!

    #include <iostream>
    #include <random>

    int main() {
        std::random_device rd;
        std::mt19937 mt(rd());
        std::uniform_int_distribution<int> dist(0, 99);
        for (int i = 0; i < 16; ++i) {
            std::cout << dist(mt) << " ";
        }
        std::cout << std::endl;
    }

• rd(): Non-deterministic 32-bit seed

mt19937 vs. random_device
• mt19937 is:
    • Fast (499 MB/s = 6.5 cycles/byte for me)
    • Extremely high quality, but not cryptographically secure
    • Seedable (with more than 32 bits if you want)
    • Reproducible (Standard-mandated algorithm)
• random_device is:
    • Possibly slow (1.93 MB/s = 1683 cycles/byte for me)
    • Strongly platform-dependent (GCC 4.8 can use Ivy Bridge's RDRAND)
    • Possibly crypto-secure (check documentation, true for VC)
    • Non-seedable, non-reproducible

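One common way to seed mt19937 with more than 32 bits is to feed several words through std::seed_seq. This is a sketch of that idiom, not something shown on the slides; the helper names and the choice of eight words are assumptions made for illustration.

```cpp
#include <array>
#include <cstdint>
#include <random>

// Hypothetical helper: seeds mt19937 from eight 32-bit words instead of one.
std::mt19937 make_engine(const std::array<std::uint32_t, 8>& words) {
    std::seed_seq seq(words.begin(), words.end());
    return std::mt19937(seq);
}

// Hypothetical helper: draws the eight seed words from random_device.
std::mt19937 make_random_engine() {
    std::random_device rd;
    std::array<std::uint32_t, 8> words{};
    for (auto& word : words) {
        word = rd();
    }
    return make_engine(words);
}
```

Passing the same words to `make_engine` twice gives engines that compare equal, so runs stay reproducible whenever the seed words are logged. `make_random_engine` inherits random_device's platform-dependent quality.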
uniform_int_distribution
• Takes any Uniform Random Number Generator
    • Usually [0, 2^32) or [0, 2^64), but [1701, 1729] works
    • If your URNG does that, you are bad and you should feel bad
• Emits any desired range of integers [low, high]
    • signed/unsigned short/int/long/long long
    • Why not char/signed char/unsigned char? Standard Says So(TM)
• Preserves perfect uniformity
    • Requires obsessive implementers
    • Uses bitwise/etc. magic, invokes URNG repeatedly (rare)
• Runs fairly quickly (34% raw speed for me)
• Deterministic, but not invariant
    • Will vary across platforms, may vary across versions

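Why a distribution may need to invoke the URNG more than once can be seen from rejection sampling. The sketch below is one textbook technique, not how any particular Standard Library implements uniform_int_distribution; `uniform_below` is a hypothetical name and it only accepts engines whose range is exactly [0, 2^32).

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Sketch: uniform integer in [0, n) from a 32-bit engine, by rejection.
// Raw values at or above the largest multiple of n below 2^32 are thrown
// away, so every output is backed by the same number of raw values.
template <typename Engine>
std::uint32_t uniform_below(Engine& engine, std::uint32_t n) {
    static_assert(Engine::min() == 0 && Engine::max() == 0xFFFFFFFFu,
                  "this sketch assumes an engine with range [0, 2^32)");
    assert(n > 0);
    const std::uint64_t range = std::uint64_t{1} << 32;
    const std::uint64_t limit = range - (range % n);  // multiple of n
    for (;;) {
        const std::uint64_t raw = engine();
        if (raw < limit) {
            return static_cast<std::uint32_t>(raw % n);
        }
        // Rare: raw fell in the leftover tail, so draw again.
    }
}
```

For n = 100 only 96 of the 2^32 raw values are rejected, which is why repeated invocation is rare. In real code, prefer `std::uniform_int_distribution`.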
random_shuffle() Considered Harmful

    template <typename RanIt>
    void random_shuffle(RanIt f, RanIt l);

• May call rand()
• C++ Standard Library, I trusted you!

    template <typename RanIt, typename RNG>
    void random_shuffle(RanIt f, RanIt l, RNG&& r);

• Not evil, but highly inconvenient
• Knuth shuffle needs r(n) to return [0, n)

shuffle() Considered Awesome

    template <typename RanIt, typename URNG>
    void shuffle(RanIt f, RanIt l, URNG&& g);

• Takes URNGs directly (e.g. mt19937)
• Shuffles perfectly
    • All permutations are equally likely
• Invokes the URNG in-place (can't copy)
    • Other algorithms can copy functors, like generate()
    • Special exception: for_each() moves functors

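Passing an engine straight to shuffle() looks like this. A minimal sketch; `shuffled_iota` is a hypothetical helper, and the fixed-seed parameter is there only to make the example repeatable.

```cpp
#include <algorithm>
#include <cstdint>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical helper: returns the values 0..n-1 in shuffled order.
std::vector<int> shuffled_iota(int n, std::uint32_t seed) {
    std::vector<int> values(n > 0 ? n : 0);
    std::iota(values.begin(), values.end(), 0);  // 0, 1, ..., n-1
    std::mt19937 mt(seed);
    std::shuffle(values.begin(), values.end(), mt);  // engine passed directly
    return values;
}
```

The result is always a permutation of 0..n-1. The exact order for a given seed may differ between Standard Library implementations, so do not hardcode it.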
Random <random> Notes
• Running mt19937 is fast, constructing/copying isn't
    • Constructing/copying engines often is already undesirable
• URNG/distribution function call ops are non-const
    • Multiple threads cannot simultaneously call a single object
• When is it safe to skip uniform_int_distribution?
    • mt19937's [0, 2^32) or mt19937_64's [0, 2^64) → [0, 2^N)
    • In this case, masking is safe, simple, and efficient
    • In all other cases, use uniform_int_distribution

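The masking case from the last bullet can be sketched as follows. `random_bits` is a hypothetical helper; it relies on mt19937 producing uniform values in [0, 2^32), so keeping the low N bits stays uniform over [0, 2^N).

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Hypothetical helper: uniform value in [0, 2^bits) taken from the low bits
// of one mt19937 output. Valid only because the target range is a power of 2.
std::uint32_t random_bits(std::mt19937& mt, unsigned bits) {
    assert(bits >= 1 && bits <= 32);
    const std::uint32_t mask =
        bits == 32 ? 0xFFFFFFFFu : ((std::uint32_t{1} << bits) - 1);
    return static_cast<std::uint32_t>(mt()) & mask;
}
```

For instance, `random_bits(mt, 4)` yields a value in [0, 16). For a range such as [0, 100) this shortcut does not apply, and uniform_int_distribution is the right tool.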