Windows XP Performance and Tuning: An Update. Demand Technology Software 1020 Eighth Avenue South, Suite 6, Naples, FL 34102 phone: (941) 261-8945 fax: (941) 261-5456 e-mail: markf@demandtech.com http://www.demandtech.com
Windows XP • Designed to provide an upgrade path from Windows (9x, ME) to Windows NT • Conceived with usability, not performance, in mind • Unified desktop permits MS apps to exploit native NT technology • Multithreading, file cache, NTFS, etc. • Another new UI to get used to… • A minor, maintenance release of Windows NT Server (internally, version 5.1) • Synchronized with Windows XP – 64 bit support
Windows XP • No major changes in the way the OS works! • New power management-related processor utilization counters • Many incremental improvements: • Several changes designed to enhance scalability • Prefetching to speed up program loading (including the boot process) • New volume shadow copy (snapshot) APIs for backing up open files with integrity
Windows NT evolution: new in WinXP • Boot and image file prefetch: (An attempt to answer criticisms leveled at OS designers by Jef Raskin in his influential book, The Humane Interface)
Windows NT evolution: new in WinXP • Volume shadow copy: IOCTL_VOLSNAP_FLUSH_AND_HOLD_WRITES, IOCTL_VOLSNAP_RELEASE_WRITES
Windows NT evolution: new in WinXP • New Networking tab in Taskman
Windows NT evolution: new in WinXP • Some new Counters:
Windows NT evolution: new in WinXP • New built-in security profiles • SYSTEM • LOCAL SERVICE • NETWORK SERVICE
Windows NT evolution: new in WinXP • Improved disk defragger
Windows XP – 64 bit support • Major new version of the OS to support new Intel 64-bit processors • P7 chips • 64-bit virtual addressing • Matches look-and-feel of Windows XP 32-bit desktop and applications
Intel 786 IA-64 architecture • EPIC: Explicitly Parallel Instruction Computing • Explicit parallelism • Predication • Speculation • Massive Resources • 10 GHz by 2010 • First generation Itanium chips: 800 MHz • 0.18 micron fabrication process • Second generation Itanium chips (McKinley): • Clocked at 1.2 GHz and higher • 400 MHz x 128-bit internal system bus
Intel 786 IA-64 architecture • Very difficult to compare performance of the P7 to the P6 • Significant architectural differences • New instruction set • Parallel programming model • Massive microprocessor designed for high-end applications • Currently, requires Intel compiler optimizations that exploit its major architectural features • Far superior Floating Point performance
Intel 786 IA-64 architecture • Parallel Execution Resources • 2 Memory Units • 2 Integer Units • 2 Floating Point Units • 3 Branch Units • Together, designed to execute up to six separate instructions in parallel
Intel 786 IA-64 architecture • Massive Resources: extended Register set • 64-bit Instruction Pointer (IP) • 128 64-bit GPRs, each with an associated Not a Thing (NaT) bit • Some GPRs have reserved meanings: • GR 0 is hardwired to always contain zero • GR 1 is a global data pointer (gp) for the currently addressable global data segment • Register stacking functions for loop optimization • 128 82-bit Floating Point Registers • 128 64-bit dedicated Application Registers • e.g., 8 dedicated Kernel registers (AR0-AR7) • 64 1-bit Predicate Registers • 8 64-bit Branch Registers
Intel 786 IA-64 architecture • VLIW: Very Long Instruction Word • 16-byte Instruction Bundles (aligned on 16-byte boundaries) • 5-bit template, followed by • 3 41-bit instruction slots • Can be filled out with No Ops • Compiler optimization: • Match Instruction Bundles to Execution Resources • Instruction dispersal
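The bundle arithmetic on this slide (5 + 3 x 41 = 128 bits) can be checked with a short sketch. It assumes the template occupies the low-order five bits of a little-endian 16-byte bundle, with the three slots following in order. The function names `pack_bundle` and `unpack_bundle` are hypothetical and exist only for this illustration.

```python
TEMPLATE_BITS = 5
SLOT_BITS = 41
SLOT_MASK = (1 << SLOT_BITS) - 1


def pack_bundle(template, slot0, slot1, slot2):
    """Pack a template and three 41-bit slots into a 16-byte bundle.

    Assumption: template in bits 0-4, slots in ascending bit order.
    """
    if not 0 <= template < (1 << TEMPLATE_BITS):
        raise ValueError("template must fit in 5 bits")
    value = template
    for index, slot in enumerate((slot0, slot1, slot2)):
        if not 0 <= slot <= SLOT_MASK:
            raise ValueError("each slot must fit in 41 bits")
        value |= slot << (TEMPLATE_BITS + index * SLOT_BITS)
    return value.to_bytes(16, "little")


def unpack_bundle(bundle):
    """Split a 16-byte bundle back into (template, (slot0, slot1, slot2))."""
    if len(bundle) != 16:
        raise ValueError("a bundle is exactly 16 bytes")
    value = int.from_bytes(bundle, "little")
    template = value & ((1 << TEMPLATE_BITS) - 1)
    slots = tuple(
        (value >> (TEMPLATE_BITS + index * SLOT_BITS)) & SLOT_MASK
        for index in range(3)
    )
    return template, slots
```

A slot the compiler cannot fill would simply carry a No Op encoding. The sketch treats slot contents as opaque 41-bit integers and does not decode them.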
Intel 786 IA-64 architecture • Memory Latency • Instructions executing in parallel all stall during memory waits
Intel 786 IA-64 architecture • Strategies to minimize memory latency • Instructions executing in parallel all stall during memory waits • Utilizes a Register stack for passing parameters to and from functions • Function arguments do not have to be loaded from memory • Register stack overflows into process virtual memory • Speculative Loads from memory
Intel 786 IA-64 architecture • Speculation • Data speculation • Advanced Load with an associated Check to ensure that there was no intervening store instruction • ld8.a r6=[r8] makes an entry in the ALAT (Advanced Load Address Table) • ld8.c r6=[r8] is a zero cycle Check instruction that must be issued prior to using the data loaded speculatively into r6 • A Store into memory at [r8] invalidates the ALAT entry, causing the Check to fail and the processor to redo the Load • Control speculation • Speculative Load moved ahead of a Branch instruction, with an associated Check at the Load's original location • ld8.s r6=[r8] • A deferred exception sets the NaT bit of the target Register, which the Check detects
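The advanced load and check sequence can be modeled in a few lines. This is a toy simulation, not a description of the hardware. The class name `Alat` and its methods are hypothetical, memory is a plain dictionary, and the table is keyed by register name for simplicity.

```python
class Alat:
    """Toy model of the Advanced Load Address Table (illustration only)."""

    def __init__(self, memory):
        self.memory = memory    # address -> value
        self.entries = {}       # register name -> address loaded in advance

    def advanced_load(self, reg, addr):
        """Model ld8.a: load early and record the address in the table."""
        self.entries[reg] = addr
        return self.memory[addr]

    def store(self, addr, value):
        """Model a store: write memory and drop any entry for that address."""
        self.memory[addr] = value
        self.entries = {r: a for r, a in self.entries.items() if a != addr}

    def check_load(self, reg, addr, speculative_value):
        """Model ld8.c: keep the early value if the entry survived, else reload."""
        if self.entries.get(reg) == addr:
            return speculative_value
        return self.memory[addr]
```

In use, the value from `advanced_load` is only trusted after `check_load` returns. If a store to the same address happened in between, the check returns the freshly stored value instead.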
Intel 786 IA-64 architecture • Predication • Conditional execution of an instruction based on a qualifying predicate value • Contained in a Predicate Register • Uses: • If conversion: remove branches from IF-THEN-ELSE constructs and execute in-line predicated instructions • Loop optimizations (control parallel execution)
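If conversion can be pictured with a small sketch. The first function below branches. The second computes both arms and lets a pair of complementary predicates decide which result is kept, which is roughly what a compiler does when it replaces a branch with predicated instructions. Both function names are hypothetical, and Python integers stand in for the 1-bit Predicate Registers.

```python
def branchy_abs_diff(a, b):
    """IF-THEN-ELSE form: one arm executes, chosen by a branch."""
    if a > b:
        return a - b
    else:
        return b - a


def predicated_abs_diff(a, b):
    """If-converted form: both arms are computed, predicates select one."""
    p1 = int(a > b)     # a compare sets a predicate...
    p2 = 1 - p1         # ...and its complement
    then_result = a - b     # would execute under predicate p1
    else_result = b - a     # would execute under predicate p2
    # Only the result whose predicate is 1 contributes to the final value.
    return p1 * then_result + p2 * else_result
```

The predicated form does more arithmetic but contains no branch to mispredict, which is the trade the slide describes.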
HP i2000 Itanium Workstation • Uses first generation Itanium chip: • 733 MHz • 4.2 GB/sec system bus • 1 GB RAM • DVD/CD drive • Ethernet port • Etc.
HP i2000 Itanium Workstation • Uses first generation Itanium chip: • Install Evaluation copy of Windows XP – 64 bit from bootable CD-ROM • Test Performance SeNTry collection agent
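64-bit Address Space • One uniform 64-bit Virtual address space • 7152 GB of user-mode address space per process (0 to 6fc 0000 0000) • Process address spaces are built on demand
Address space layout, from low to high addresses (hex boundary addresses alternate with region names, in the order shown on the slide): 0 • User Mode User Space • 6fc 0000 0000 • Kernel Mode User Space • 1fff ff00 0000 0000 • User Page Tables • 2000 0000 0000 0000 • Session Space • 3fff ff00 0000 0000 • Session Space Page Tables • e000 0000 0000 0000 • System Space • e000 0600 0000 0000 • System Space Page Tables • ffff ff00 0000 0000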
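The 7152 GB figure follows directly from the first boundary address on this slide, 6fc 0000 0000. A minimal check, with hypothetical helper names `parse_address` and `size_in_gb`:

```python
GB = 1 << 30


def parse_address(text):
    """Parse a hex address written in space-separated groups, as on the slide."""
    return int(text.replace(" ", ""), 16)


def size_in_gb(start, end):
    """Size of the half-open address range [start, end) in whole gigabytes."""
    return (end - start) // GB


user_space_limit = parse_address("6fc 0000 0000")
print(size_in_gb(0, user_space_limit))  # prints 7152
```

The result is 0x6fc (1788) multiples of 4 GB, or 1788 x 4 = 7152 GB.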
64-bit Windows Applications • WOW64 provides emulation services for 32-bit applications • Thunking in User mode is performed to extract arguments from the 32-bit stack, extend them to 64 bits, then make the native 64-bit system call to ntdll.dll. • WOW64.dll, WOW64cpu.dll, and WOW64win.dll increase the size of the application’s working set significantly • File system references to %systemroot%\System32 are redirected to %systemroot%\SysWOW64 for 32-bit DLLs
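The argument-widening step of a thunk can be sketched as follows. This illustrates the idea only and is not how WOW64 is implemented. The function `widen_args` and its `kinds` parameter are hypothetical, and the "stack" is just a little-endian byte string of 4-byte slots.

```python
import struct


def widen_args(stack32, kinds):
    """Read 32-bit arguments from a simulated stack and widen each to 64 bits.

    kinds holds one character per argument: 'I' for unsigned (zero-extended)
    or 'i' for signed (sign-extended).
    """
    if set(kinds) - {"i", "I"}:
        raise ValueError("kinds may contain only 'i' and 'I'")
    values = struct.unpack_from("<" + kinds, stack32)
    # Repacking with 64-bit formats performs the extension for us.
    wide_format = "<" + kinds.replace("i", "q").replace("I", "Q")
    return struct.pack(wide_format, *values)


stack = struct.pack("<Ii", 5, -1)        # two 4-byte arguments
wide = widen_args(stack, "Ii")           # two 8-byte arguments
print(len(stack), len(wide))             # prints 8 16
```

Whether an argument is zero-extended or sign-extended depends on its declared type, so a real thunk needs per-call knowledge of the argument list. Here the caller supplies that through `kinds`.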
64-bit Windows Programming • New data types • DWORD32 32-bit unsigned integer • DWORD64 64-bit unsigned integer • INT32 32-bit signed integer • INT64 64-bit signed integer • LONG32 32-bit signed integer • LONG64 64-bit signed integer • UINT32 Unsigned INT32 • UINT64 Unsigned INT64 • ULONG32 Unsigned LONG32 • ULONG64 Unsigned LONG64
64-bit Windows Programming • New Pointers: • POINTER_32 • A 32-bit pointer. On 32-bit Windows, this is a native pointer. On 64-bit Windows, this is a truncated 64-bit pointer. • POINTER_64 • A 64-bit pointer. On 64-bit Windows, this is a native pointer. On 32-bit Windows, this is a sign-extended 32-bit pointer.
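The two conversions named on this slide, truncation and sign extension, are easy to get wrong, so a small model may help. Python integers stand in for pointer values, and all three function names are hypothetical.

```python
MASK32 = 0xFFFF_FFFF
MASK64 = 0xFFFF_FFFF_FFFF_FFFF


def truncate_to_32(ptr64):
    """Keep only the low 32 bits of a 64-bit pointer value."""
    return ptr64 & MASK32


def sign_extend_to_64(ptr32):
    """Copy bit 31 of a 32-bit pointer value into bits 32-63."""
    ptr32 &= MASK32
    if ptr32 & 0x8000_0000:
        return ptr32 | (MASK64 ^ MASK32)
    return ptr32


def round_trips(ptr64):
    """True if truncating and re-extending recovers the original value."""
    return sign_extend_to_64(truncate_to_32(ptr64)) == ptr64
```

Truncation is lossy: `round_trips(0x1_0000_0000)` is false because the upper bits are discarded, while `round_trips(0x7FFF_0000)` is true. A truncated pointer is therefore only safe when the address is known to fit in 32 bits.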
64-bit Windows Programming • New 64-bit compiler • macros: • _WIN64 – 64-bit platform. • _WIN32 – 32-bit platform. This value is also defined by the 64-bit compiler for backward compatibility. • _WIN16 – 16-bit platform • Inline Helper functions to convert from one data type to another • E.g., UIntToPtr
Where to get more information • “Windows XP: Kernel improvements create a more robust, powerful, and scalable OS” by David Solomon and Mark Russinovich, MSDN Magazine, December 2001. • Itanium Processor Microarchitecture Reference, http://developer.intel.com/design/itanium/downloads/245474.htm • Programming Itanium-based Systems by Triebel, Bissell and Booth (Intel Press) • TechNet or the Microsoft Developer Network (MSDN) CD