STL11: Magic && Secrets
Stephan T. Lavavej ("Steh-fin Lah-wah-wade")
Visual C++ Libraries Developer
stl@microsoft.com
Version 1.0 - February 2, 2012
Magic
Malevolent Architecture
• Space
  • How much RAM does your phone/tablet/console have?
  • How much RAM does your server have?
  • How many users does it handle? Is it virtualized?
  • How much L1/L2/L3 cache do you have?
• Time
  • How many cores do you have? How fast is each one?
  • How many milliseconds do you have at 60 FPS?
• Abstraction
  • How big are the problems you solve? Are they getting bigger?
  • How smart are you? Are you getting smarter?
  • How many programmers/end-users use the STL?
    • All the users, all of them
Visible Invisibility
    // shared_ptr, how do I construct thee? Let me count the ways:
    shared_ptr<T> sp1(new T(args));
    shared_ptr<T> sp2(new T(args), del);
    shared_ptr<T> sp3(new T(args), del, alloc);
    auto sp4 = make_shared<T>(args);
    auto sp5 = allocate_shared<T>(alloc, args);
    // Plus other "non-fundamental" ways
• Where did the deleter/allocator types go?
  • vector<T, MyAlloc<T>> versus shared_ptr<T>
• Magic: type erasure!
  • Less visibly, but more importantly, also powering make_shared<T>()
Exposition Diagram
[Diagram: a shared_ptr<int> holds _Ptr, which points to an int with value 1729, and _Rep, which points to a control block. Control-block types shown: _Ref_count_base (_Uses: 7, _Weaks: 4); _Ref_count<_Ty> (adds _Ptr); _Ref_count_del<_Ty, _Dx> (adds _Ptr, _Dtor); _Ref_count_del_alloc<_Ty, _Dx, _Alloc> (adds _Ptr, _Dtor, _Myal).]
Type erasure: _Ref_count_base * _Rep points to _Ref_count<_Ty>, etc.
A Wizard Did It
[Diagram: a shared_ptr<int> holds _Ptr and _Rep. Control-block types shown: _Ref_count_base (_Uses: 7, _Weaks: 4); _Ref_count<_Ty> (adds _Ptr); _Ref_count_obj<_Ty> (_Uses: 7, _Weaks: 4, _Storage: 1729); _Ref_count_obj_alloc<_Ty, _Alloc> (_Uses: 7, _Weaks: 4, _Storage: 1729, _Myal).]
No _Ptr! We know where you live.
When Things Spin, Science Happens
• Actual measurements on Win7
  • Vista+: Low Fragmentation Heap automatically enabled
  • Including allocator overhead
  • Excluding shared_ptr itself (x86: 8 bytes; x64: 16)
• Original recipe shared_ptr<int>
  • x86: 40 bytes (x64: 48) - Boost 1.48.0, GCC 4.6.1, VC10 SP1
• Extra crispy make_shared<int>()
  • x86: 32 bytes (x64: 48, unchanged) - Boost 1.48.0, GCC 4.6.1
  • x86: 24 bytes (x64: 32) - VC10 SP1
• Reducing 2 dynamic memory allocations to 1 is good
• Saving space is also good
• Two great tastes that taste great together
Take A Third Option
• Why is copying slower than moving?
  • Copying does more work
    • Allocating memory
    • Copying elements (recursively!)
    • _InterlockedIncrement()
• Is moving instantaneous?
  • Not quite; it does some work
    • Copying pointers
    • Nulling source pointers
    • Copying other Plain Old Data
  • Also, there are throwing move ctors (shock! horror!)
• New option: emplacement
Computers Are Fast
    vector<pair<string, int>> v;
    v.emplace_back("Carmichael", 3 * 11 * 17);
    v.emplace_back("Hardy", 1 * 1 * 1 + 12 * 12 * 12);
    v.emplace_back("Ramanujan", 9 * 9 * 9 + 10 * 10 * 10);
    // vector<T> member functions:
    void push_back(const T& t);
    void push_back(T&& t);
    template <typename... Args> void emplace_back(Args&&... args);
    iterator insert(const_iterator pos, const T& t);
    iterator insert(const_iterator pos, T&& t);
    template <typename... Args> iterator emplace(const_iterator pos, Args&&... args);
Nice Job Breaking It, Hero
• N3035 (Feb. 16, 2010) 20.3.4.2 [pairs.pair]/4-5:
    pair(const T1& x, const T2& y);
  Effects: The constructor initializes first with x and second with y.
    template<class U, class V> pair(U&& x, V&& y);
  Effects: The constructor initializes first with std::forward<U>(x) and second with std::forward<V>(y).
• VC10 SP1: (remember that NULL is 0)
    pair<X *, double> p1(0, 3.14); // BOOM
    error C2440: 'initializing' : cannot convert from 'int' to 'X *'
    Conversion from integral type to pointer type requires reinterpret_cast, C-style cast or function-style cast
    pair<X *, double> p2(nullptr, 3.14); // OK
Set Right What Once Went Wrong
• N3337 20.3.2 [pairs.pair]/5, 7-8:
    pair(const T1& x, const T2& y);
  Effects: The constructor initializes first with x and second with y.
    template<class U, class V> pair(U&& x, V&& y);
  Effects: The constructor initializes first with std::forward<U>(x) and second with std::forward<V>(y).
  Remarks: If U is not implicitly convertible to first_type or V is not implicitly convertible to second_type this constructor shall not participate in overload resolution.
• VC11:
    pair<X *, double> p1(      0, 3.14); // OK
    pair<X *, double> p2(nullptr, 3.14); // OK
• nullptr is still better; down with 0/NULL
  • Consider make_shared<T>(args)
SFINAE: a truly marvelous technique, which this slide is too small to explain
Benevolent Architecture
    void meow(const pair<int, int>& p) {
        cout << "int: " << p.first << ", " << p.second << endl;
    }
    void meow(const pair<string, string>& p) {
        cout << "string: " << p.first << ", " << p.second << endl;
    }
    meow(make_pair(8191, 65537));
    meow(make_pair("Mersenne", "Fermat"));
• Invalid C++98/03! VC9 SP1, VC10 SP1:
    error C2668: 'meow' : ambiguous call to overloaded function
• Valid C++11! VC11:
    int: 8191, 65537
    string: Mersenne, Fermat
Drop The Hammer
• N3337 20.3.2 [pairs.pair]/11, 14:
    template<class U, class V> pair(const pair<U, V>& p);
  Remark: This constructor shall not participate in overload resolution unless const U& is implicitly convertible to first_type and const V& is implicitly convertible to second_type.
    template<class U, class V> pair(pair<U, V>&& p);
  Remark: This constructor shall not participate in overload resolution unless U is implicitly convertible to first_type and V is implicitly convertible to second_type.
• const char * is not implicitly convertible to int
• const char * is implicitly convertible to string
• Not quite perfect: SFINAE is a blunt hammer
  • Consider Base *, Derived *, and MoreDerived *
Secrets
To The Future And Beyond
• VC11 Beta/RTM: <condition_variable>, <future>, and <mutex> are powered by the Concurrency Runtime
  • ConcRT is highly efficient; read about it on MSDN
  • Creating lots of futures won't create lots of physical threads, just enough to saturate your user's cores
• VC11 Beta/RTM: <atomic>, <condition_variable>, <future>, <mutex>, and <thread> all #error under /clr[:pure]
  • Same for ConcRT in VC10/VC11
Twenty Minutes Into The Future
    string flip(string s) {
        reverse(s.begin(), s.end());
        return s;
    }
    int main() {
        vector<future<string>> v;
        v.push_back(async([] { return flip(       " ,olleH"); }));
        v.push_back(async([] { return flip(" evitaNgnioG"); }));
        v.push_back(async([] { return flip(     "\n!2102"); }));
        for (auto& e : v) {
            cout << e.get();
        }
    }
• Callouts: move semantics, future, right angle brackets, lambdas, async, auto, range-based for-loop
• Output: Hello, GoingNative 2012!
The Reveal
• The range-based for-loop will be supported in...
  • ... VC11!
  • ... VC11 Beta!
• Developer: Jonathan Caves
• Tester: Stephan T. Lavavej
  • Fake tester, real tests
• Minions: Ryan Molden, Ahmed Charles, Pavel Minaev
• Reviewers: Ulzii Luvsanbat, Andy Rich
• Intellisense: VC11 RTM
  • In Beta, you'll get red squiggles, but it'll compile
All There In The Manual
• Condensing N3337 6.5.4 [stmt.ranged]/1:
    for (for-range-declaration : expression) statement
• Is equivalent to:
    {
        auto&& __range = (expression);
        for (auto __begin = begin-expr, __end = end-expr;
             __begin != __end; ++__begin) {
            for-range-declaration = *__begin;
            statement
        }
    }
  __range, __begin, and __end are "for exposition only"
• braced-init-lists are allowed instead of expressions
  • But not in VC11 Beta/RTM
Imported Alien Phlebotinum
    auto&& __range = (expression);
• auto is powered by template argument deduction
  • auto&& behaves like T&& in perfect forwarding
  • auto&& binds to everything
  • auto&& becomes X&, const X&, X&&, or const X&&
• __range is a named reference
  • Keeps temporaries alive
  • Doesn't copy anything
• This works perfectly:
    vector<int> func();
    for (int i : func()) { cout << i << endl; }
Sealed Evil In A Can
    auto&& __range = (expression);
    for (auto __begin = begin-expr, __end = end-expr;
• begin-expr and end-expr depend on the expression
• For arrays of N elements:
  • __range and __range + N
• For classes with begin/end members:
  • __range.begin() and __range.end()
• For everything else:
  • begin(__range) and end(__range)
  • begin/end are found by Argument-Dependent Lookup
Red Herring
• For generic programming, <iterator> provides:
    template <typename C> auto begin(      C& c) -> decltype(c.begin());
    template <typename C> auto begin(const C& c) -> decltype(c.begin());
    template <typename C> auto end(      C& c) -> decltype(c.end());
    template <typename C> auto end(const C& c) -> decltype(c.end());
    template <typename T, size_t N> T * begin(T (&array)[N]);
    template <typename T, size_t N> T * end(T (&array)[N]);
• But range-for never uses them!
  • Range-for works with, but doesn't require, the STL
• C++/CX: range-for requires <collection.h>, which provides begin()/end() for WFC::IVector<T>^, etc.
Pop Quiz
    for-range-declaration = *__begin;
• for (string s1 : v1)
  • s1 is a modifiable copy!
  • v1 can be modifiable/const
• for (const string s2 : v2)
  • s2 is a const copy!
  • v2 can be modifiable/const
• for (string& s3 : v3)
  • Observe/modify s3 in-place!
  • v3 must be modifiable
• for (const string& s4 : v4)
  • Observe s4 in-place!
  • v4 can be modifiable/const
Invisible Subtle Difference
    for-range-declaration = *__begin;
• for (auto e5 : v5)
  • Same: e5 is a modifiable copy, v5 can be modifiable/const
  • e5 is modifiable even when v5 is const; auto drops constness
• for (const auto e6 : v6)
  • Same: e6 is a const copy, v6 can be modifiable/const
• for (auto& e7 : v7)
  • Observe/modify e7 in-place, if v7 is modifiable!
  • Observe e7 in-place, if v7 is const!
  • e7 is const when v7 is const; auto& preserves constness
• for (const auto& e8 : v8)
  • Same: observe e8 in-place, v8 can be modifiable/const
• auto is powered by template argument deduction
  • C++98/03: foo(T5 t5) drops constness, bar(T7& t7) preserves it
I Will Only Slow You Down
    vector<string> v;
    // Which one is faster?
    for (const auto& s1 : v)
    for (const string& s2 : v)
    // They're identical!
    map<string, int> m;
    // Which one is faster?
    for (const auto& p1 : m)
    for (const pair<string, int>& p2 : m)
    // const auto& p1 is faster!
    // m's value_type is pair<const string, int>
    // Binding const pair<string, int>& to pair<const string, int>
    // constructs a temporary pair<string, int>,
    // which copies a string.
If My Calculations Are Correct
• 95% of the time, you should use:
  • for (auto& e : r)
  • for (const auto& e : r)
• Why would you want to use anything else?
  • When you actually want a copy
    • for (auto e : r) // intentional copy
  • When you actually want a conversion
    • for (uint64_t x : r) // uint32_t => uint64_t
  • When proxies might be involved
    • for (auto&& e : r) // binds to everything
    • Examples: vector<bool> and <collection.h>
That, Detective, Is The Right Question
• N3337, the first post-C++11 Working Paper
  • open-std.org/jtc1/sc22/wg21/docs/papers/2012/n3337.pdf
  • "The only changes since [C++11] are editorial."
• C++11 Features In VC11
  • blogs.msdn.com/b/vcblog/archive/2011/09/12/10209291.aspx
• VC10's ConcRT documentation
  • msdn.microsoft.com/en-us/library/dd504870.aspx
• My E-mail address
  • stl@microsoft.com
Bonus Slides
Color Coded For Your Convenience
    template <typename T> struct Vector {
        void Assign(size_t, const T&);
        template <typename InIt>
            typename enable_if<!is_integral<InIt>::value, void>::type
            Assign(InIt, InIt);
        Vector(size_t, const T&);
        template <typename InIt> Vector(InIt, InIt,
            typename enable_if<!is_integral<InIt>::value, void **>::type = nullptr);
    };
    int * p = nullptr;
    Vector<int> v1(17, 29);
    v1.Assign(17, 29);
    Vector<int> v2(p, p);
    v2.Assign(p, p);
Magical Incantation
    template <typename... Types> struct Tuple {
        explicit Tuple(const Types&...);
        template <typename... Other> explicit Tuple(Other&&...,
            typename enable_if<COND, void **>::type = nullptr); // BAD
    };
• N3337 14.8.2.1 [temp.deduct.call]/1: "For a function parameter pack that does not occur at the end of the parameter-declaration-list, the type of the parameter pack is a non-deduced context."
    template <typename... Other,
        typename Dummy = typename enable_if<COND, void>::type>
        explicit Tuple(Other&&...); // GOOD
Did Not Do The Research
• Core Issue 1442: "Normal ADL" or "pure ADL"?
• N3337 6.5.4 [stmt.ranged]/1: "begin and end are looked up with argument-dependent lookup"
• 3.4.2 [basic.lookup.argdep]/3: "Let X be the lookup set produced by unqualified lookup (3.4.1) and let Y be the lookup set produced by argument dependent lookup (defined as follows). If X contains
  • a declaration of a class member, or
  • a block-scope function declaration that is not a using-declaration, or
  • a declaration that is neither a function or a function template
  then Y is empty. Otherwise Y is the set of declarations found in the namespaces associated with the argument types as described below. The set of declarations found by the lookup of the name is the union of X and Y."
• Normal ADL: range-for would consider local using-directives, etc.
• Pure ADL: range-for's ADL would still work within member functions of classes that have their own begin/end members
• VC11 Beta: normal ADL; clang/EDG/GCC: pure ADL; VC11 RTM: ?