New NUMA Support with Windows Server 2008 R2 and Windows 7. Phil Pennington philpenn@microsoft.com Microsoft WSV317. What will you look for? Overall Solution Scalability. Agenda Windows Server 2008 R2. New NUMA APIs New User-Mode Scheduling APIs New C++ Concurrency Runtime.
New NUMA Support with Windows Server 2008 R2 and Windows 7 Phil Pennington philpenn@microsoft.com Microsoft WSV317
Agenda: Windows Server 2008 R2 • New NUMA APIs • New User-Mode Scheduling APIs • New C++ Concurrency Runtime
Example NUMA Hardware Today • A 256 Logical Processor System, HP SuperDome: 64 dual-core hyper-threaded “Montvale” 1.6 GHz Itanium2 • A 64 Logical Processor System, Unisys ES7000: 32 dual-core hyper-threaded “Tulsa” 3.4 GHz Xeon
NUMA Hardware Tomorrow: 2, 4, 8 Cores-per-Socket "Commodity" CPU Architectures • Expect systems with 128-256 logical processors • [Diagram: four Nehalem sockets, two I/O Hubs, PCI Express links]
NUMA Node Groups: New with Win7 and R2 • [Diagram: hierarchy of Group > NUMA Node > Socket > Core > LP]
NUMA Node Groups Example: 2 Groups, 4 Nodes, 8 Sockets, 32 Cores, 4 LPs/Core = 128 LPs • [Diagram: each Group contains 2 NUMA Nodes, each Node 2 Sockets, each Socket 4 Cores, each Core 4 LPs]
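The arithmetic on this slide can be checked with a short sketch. The `Topology` struct, the helper names, and the flat-index mapping below are illustrative assumptions and not part of any Windows API. The only outside fact relied on is that a processor group holds at most 64 logical processors, which is why 128 LPs split into 2 groups.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical model of the slide's hierarchy:
// Group > NUMA node > Socket > Core > Logical Processor (LP).
struct Topology {
    unsigned groups;
    unsigned nodes_per_group;
    unsigned sockets_per_node;
    unsigned cores_per_socket;
    unsigned lps_per_core;
};

// LPs inside one group (must not exceed 64 on Windows).
constexpr unsigned lps_per_group(const Topology& t) {
    return t.nodes_per_group * t.sockets_per_node *
           t.cores_per_socket * t.lps_per_core;
}

// LPs in the whole machine.
constexpr unsigned total_lps(const Topology& t) {
    return t.groups * lps_per_group(t);
}

// A group-relative processor address: (group, number within group).
struct GroupRelative {
    unsigned group;
    unsigned number;
};

// Map a machine-wide LP index onto a (group, number) pair.
// This flat numbering is an assumption made for illustration only.
constexpr GroupRelative locate(const Topology& t, unsigned flat_index) {
    return GroupRelative{flat_index / lps_per_group(t),
                         flat_index % lps_per_group(t)};
}

// The slide's example: 2 groups, 4 nodes, 8 sockets, 32 cores, 4 LPs/core.
constexpr Topology kSlideExample{2, 2, 2, 4, 4};
static_assert(total_lps(kSlideExample) == 128, "slide total is 128 LPs");
static_assert(lps_per_group(kSlideExample) == 64, "one group holds 64 LPs");

// Print where a given machine-wide LP index lands.
void print_location(unsigned flat_index) {
    const GroupRelative p = locate(kSlideExample, flat_index);
    std::printf("LP %u -> group %u, number %u\n", flat_index, p.group, p.number);
}
```

Calling `print_location(100)` reports group 1, number 36, since the first 64 LPs fill group 0.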
Sample SQL Server Scaling: 64P to 128P • [Chart: 64P versus 128P, with bar labels 1.7X and 1.3X] • Windows Server Performance team sample test lab results
Bad Case Disk Write: Software and Hardware Locality NOT Optimal • [Diagram: steps (0)-(7) of a disk write across the Node Interconnect; labels: I/O Initiator, ISR, DPC, P1-P4, Cache1-Cache4, I/O Buffer Home, MemA, MemB, DiskA, DiskB; two processors marked "Locked out for I/O Initiation"]
Windows Server 2008 R2: Optimization for NUMA Topology • [Diagram: the same disk write with steps (2) and (3) only; labels: Node Interconnect, I/O Initiator, ISR, DPC, P1-P4, Cache1-Cache4, MemA, MemB, DiskA, DiskB]
NUMA Aware Applications: Non-Uniform Memory Architecture • Minimize Contention, Maximize Locality • Apps scaling beyond even 8-16 logical processors should be NUMA aware • A process or thread can set a preferred NUMA node • Use the Node Group scheme for Task or Process partitioning • Performance-optimize within Node Groups
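As a rough illustration of the bullets above, the sketch below enumerates NUMA nodes, pins the calling thread to one node's processors, and requests memory with that node as the preferred home. The Win32 calls (`GetNumaHighestNodeNumber`, `GetNumaNodeProcessorMaskEx`, `SetThreadGroupAffinity`, `VirtualAllocExNuma`) are real APIs available from Windows 7 and Windows Server 2008 R2. The `NodeInfo` struct and the helper functions are hypothetical names introduced here, and on non-Windows builds the helpers deliberately do nothing so the file still compiles.

```cpp
#ifdef _WIN32
#ifndef _WIN32_WINNT
#define _WIN32_WINNT 0x0601  // Windows 7 / Server 2008 R2
#endif
#include <windows.h>
#endif

#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record: one NUMA node, its processor group, and its LP mask.
struct NodeInfo {
    unsigned node;
    unsigned group;
    std::uint64_t mask;
};

// Count the logical processors set in a node's affinity mask.
unsigned count_lps(std::uint64_t mask) {
    unsigned n = 0;
    for (; mask != 0; mask &= mask - 1) ++n;  // clears the lowest set bit
    return n;
}

// List every NUMA node that has at least one processor.
// Returns an empty list on non-Windows builds or if the query fails.
std::vector<NodeInfo> enumerate_numa_nodes() {
    std::vector<NodeInfo> nodes;
#ifdef _WIN32
    ULONG highest = 0;
    if (!GetNumaHighestNodeNumber(&highest)) return nodes;
    for (ULONG n = 0; n <= highest; ++n) {
        GROUP_AFFINITY ga = {};
        if (GetNumaNodeProcessorMaskEx(static_cast<USHORT>(n), &ga) && ga.Mask != 0) {
            nodes.push_back(NodeInfo{static_cast<unsigned>(n),
                                     static_cast<unsigned>(ga.Group),
                                     static_cast<std::uint64_t>(ga.Mask)});
        }
    }
#endif
    return nodes;
}

// Pin the calling thread to the node's processors, then allocate memory
// with that node as the preferred home. Returns nullptr on failure.
void* alloc_on_node(const NodeInfo& n, std::size_t bytes) {
#ifdef _WIN32
    GROUP_AFFINITY ga = {};
    ga.Group = static_cast<WORD>(n.group);
    ga.Mask = static_cast<KAFFINITY>(n.mask);
    if (!SetThreadGroupAffinity(GetCurrentThread(), &ga, nullptr)) return nullptr;
    return VirtualAllocExNuma(GetCurrentProcess(), nullptr, bytes,
                              MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE, n.node);
#else
    (void)n;
    (void)bytes;
    return nullptr;  // assumption: no NUMA support outside Windows in this sketch
#endif
}

// Release memory obtained from alloc_on_node.
void free_node_memory(void* p) {
#ifdef _WIN32
    if (p != nullptr) VirtualFree(p, 0, MEM_RELEASE);
#else
    (void)p;
#endif
}
```

A worker partitioned per node would typically call `alloc_on_node` once at startup and keep both its thread and its working set on that node, which is one way to read "Minimize Contention, Maximize Locality".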
Demo: NUMA APIs • “Minimize Contention and Maximize Locality”
Agenda: Windows Server 2008 R2 • New NUMA APIs • New User-Mode Scheduling APIs • New C++ Concurrency Runtime
User Mode Scheduling (UMS): System Call Servicing • [Diagram: primary threads KT(P1) and KT(P2) on Core 1 and Core 2, with user-mode parts UT(P1) and UT(P2); UMS backing kernel threads KT(1)-KT(4), paired with user threads UT(1)-UT(4); thread states Blocked, Parked, Running; USched ready list and UMS completion list] • On a syscall: migrate request to appropriate KT • Wake primary to regain core
User Mode Context Switch • Benefit: lower context switch time means scheduling finer-grained items (UMS-based yield: 370 cycles; signal-and-wait: 2600 cycles) • Direct impact: synchronization-heavy fine-grained work speeds up • Indirect impact: finer grains mean more workloads are candidates for parallelization
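To see why the cycle counts above matter for grain size, here is a small back-of-the-envelope sketch. The two constants come from the slide. The overhead model (one context switch per work item) and the function names are simplifying assumptions made for illustration, not measurements.

```cpp
#include <cassert>
#include <cmath>
#include <cstdio>

// Context switch costs quoted on the slide, in CPU cycles.
constexpr double kUmsYieldCycles = 370.0;
constexpr double kSignalAndWaitCycles = 2600.0;

// Fraction of time lost to switching if every work item of `work_cycles`
// pays exactly one switch of `switch_cycles` (a simplifying assumption).
constexpr double switch_overhead(double switch_cycles, double work_cycles) {
    return switch_cycles / (switch_cycles + work_cycles);
}

// Smallest work item (in cycles) that keeps switch overhead at or below
// `max_overhead`, derived by solving switch_overhead(...) == max_overhead.
constexpr double min_grain(double switch_cycles, double max_overhead) {
    return switch_cycles * (1.0 - max_overhead) / max_overhead;
}

// Print the comparison for a 10% overhead budget.
void print_grain_comparison() {
    std::printf("speedup of switch itself: %.2fx\n",
                kSignalAndWaitCycles / kUmsYieldCycles);
    std::printf("min grain at 10%% overhead, UMS yield:       %.0f cycles\n",
                min_grain(kUmsYieldCycles, 0.10));
    std::printf("min grain at 10%% overhead, signal-and-wait: %.0f cycles\n",
                min_grain(kSignalAndWaitCycles, 0.10));
}
```

Under this model a 10% overhead budget allows work items of roughly 3,330 cycles with a UMS yield, versus roughly 23,400 cycles with signal-and-wait, which is the sense in which finer grains become candidates for parallelization.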
Getting the Processor Back • Benefit: the scheduler keeps control of the processor when work blocks in the kernel • Direct impact: more deterministic scheduling and better use of a thread’s quantum • Indirect impact: better cache locality when algorithmic libraries take advantage of the determinism to manage available resources
Agenda: Windows Server 2008 R2 • New NUMA APIs • New User-Mode Scheduling APIs • New C++ Concurrency Runtime
Visual Studio 2010: Tools, Programming Models, Runtimes • Tools: Parallel Debugger; Profiler and concurrency analyzer • Programming models: PLINQ and Task Parallel Library (managed); Parallel Patterns Library and Agents library (native); Data structures for each • Runtimes: Concurrency runtime, Thread pool, each with a Task scheduler and a Resource manager • Operating system: Threads/UMS • Key: Managed library, Native library, Tools
Task Scheduling • Tasks are run by worker threads, which the scheduler controls • [Diagram: timelines for worker threads WT0-WT3; without UMS (signal-and-wait) the timeline shows a "Dead Zone"; with UMS (UMS yield) it does not]
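A minimal sketch of handing tasks to the Concurrency Runtime's scheduler follows. It uses the Parallel Patterns Library (`concurrency::parallel_for` and `concurrency::combinable` from `<ppl.h>`), which ships with Visual C++ 2010 and later. The function name `sum_of_squares` is hypothetical, and the serial fallback for other compilers is an assumption added only so the example builds everywhere.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

#if defined(_MSC_VER)
#include <functional>
#include <ppl.h>
#endif

// Sum the squares of all elements. With Visual C++ the loop body is split
// into tasks that the Concurrency Runtime runs on its worker threads.
std::uint64_t sum_of_squares(const std::vector<std::uint32_t>& values) {
#if defined(_MSC_VER)
    // One accumulator per worker thread, so iterations do not contend.
    concurrency::combinable<std::uint64_t> partial(
        [] { return std::uint64_t{0}; });

    concurrency::parallel_for(std::size_t{0}, values.size(),
        [&](std::size_t i) {
            const std::uint64_t v = values[i];
            partial.local() += v * v;
        });

    // Merge the per-thread accumulators into a single result.
    return partial.combine(std::plus<std::uint64_t>());
#else
    // Fallback (assumption): plain serial loop where PPL is unavailable.
    std::uint64_t total = 0;
    for (std::uint32_t x : values) {
        const std::uint64_t v = x;
        total += v * v;
    }
    return total;
#endif
}
```

The caller never creates or names a thread here; the scheduler decides how many worker threads run the iterations, which is the division of labor the slide describes.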
Demo: User-Mode Scheduling APIs and the C++ Concurrency Runtime • “Cooperative Thread-Scheduling”
Summary: Call to Action • Consider how your solution will scale on NUMA systems • Use the NUMA APIs to maximize node locality • Leverage UMS for custom user-mode thread scheduling • Use the C++ Concurrency Runtime for most native parallel computing scenarios and gain the benefits of NUMA/UMS implicitly
Resources • MSDN Concurrency Dev-Center • http://msdn.microsoft.com/concurrency • MSDN Channel9 • http://channel9.msdn.com/tags/w2k8r2 • MSDN Code Gallery • http://code.msdn.microsoft.com/w2k8r2 • MSDN Server Dev Center • http://msdn.microsoft.com/en-us/windowsserver • 64+ LP and NUMA API Support • http://code.msdn.microsoft.com/64plusLP • http://www.microsoft.com/whdc/system/Sysinternals/MoreThan64proc.mspx • Dev-Team Blogs • http://blogs.msdn.com/pfxteam • http://blogs.technet.com/winserverperformance
Resources • TechEd 2009 session recordings are available at TechEd Online • www.microsoft.com/teched: Sessions On-Demand & Community • www.microsoft.com/learning: Microsoft Certification & Training Resources • http://microsoft.com/technet: Resources for IT Professionals • http://microsoft.com/msdn: Resources for Developers
Related Content • DTL203 "The Manycore Shift: Making Parallel Computing Mainstream", Monday 5/11, 2:45-4:00, Room 404, Stephen Toub • DTL310 "Parallel Computing with Native C++ in Microsoft Visual Studio 2010", Friday 5/15, 2:45-4:00, Room 515A, Josh Phillips • DTL403 "Microsoft Visual C++ Library, Language, and IDE : Now and Next", Thursday 5/14, 4:30-5:45, Room 408A, Kate Gregory • DTL06-INT "Task-Based Parallel Programming with the Microsoft .NET Framework 4", Thursday 5/14, 1:00-2:15, Blue Thr 2, Stephen Toub
Windows Server Resources • Make sure you pick up your copy of Windows Server 2008 R2 RC from the Materials Distribution Counter • Learn More about Windows Server 2008 R2: www.microsoft.com/WindowsServer2008R2 • Technical Learning Center (Orange Section): Highlighting Windows Server 2008 and R2 technologies • Over 15 booths and experts from Microsoft and our partners
© 2009 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.