OpenVMS Performance Tips & Tricks
Guy Peleg, President, Maklee Engineering
guy.peleg@maklee.com
Performance – Why should you care?
• Application Tuning
• Oracle Tuning
• System Tuning
• Java Tuning
The Golden Rules
Source: OpenVMS Information Desk – October 2004
• The best performing code is the code not being executed
• The fastest I/Os are those avoided
• Idle CPUs are the fastest CPUs
• Look at your code… be ready to be surprised
RMS
• RMS holds great potential for improving performance
• The C RTL uses RMS
• Most C applications would benefit from RMS tuning
RMS
• RMS parameters related to performance:
  • FAB/RAB parameters (if you have access to the source code):
    • ASY, RAH, WBH, DFW, SQO
    • ALQ & DEQ
    • MBC & MBF
    • NOSHR, NQL, NLK
  • SET RMS …
    • /SYSTEM | /PROCESS
    • /BUFFER_COUNT=n
    • /BLOCK_COUNT=n
  • SYSGEN> SET RMS_SEQFILE_WBH 1
• Don't be afraid of Global Buffers
FTP Performance & Simple RMS Tuning
FTP into IT13 and transfer the file:
Brutel> ftp it13
220 IT13.bruclass.com FTP Server (Version 5.6) Ready.
Connected to ALPH13.BRUCLASS.COM.
Name (ALPH13.BRUCLASS.COM:bru_guy): peleg
331 Username peleg requires a Password
Password:
230 User logged in.
FTP> cd $1$dga703:[000000]
250-CWD command successful.
250 New default directory is $1$DGA703:[000000]
FTP> put HP-I64VMS-JAVA150-V0105-1-1.PCSI_SFX_I64EXE
200 TYPE set to IMAGE.
200 PORT command successful.
150 Opening data connection for $1$DGA703:[000000]HP-I64VMS-JAVA150-V0105-1-1.PCSI_SFX_I64EXE; (192.168.1.7,49428)
226 Transfer complete.
local: SYS$SYSDEVICE:[BRU_GUY]HP-I64VMS-JAVA150-V0105-1-1.PCSI_SFX_I64EXE;1 remote: HP-I64VMS-JAVA150-V0105-1-1.PCSI_SFX_I64EXE
286026004 bytes sent in 00:00:49.92 seconds (5594.83 Kbytes/s)
200 TYPE set to ASCII.
FTP Performance & Simple RMS Tuning
$ set rms/sys/exte=60000/seq/block=127/buf=8
$ mc sysgen
SYSGEN> SET RMS_SEQ 1
SYSGEN> W A
SYSGEN> Exit
Throughput increased by more than 50%
FTP> put HP-I64VMS-JAVA150-V0105-1-1.PCSI_SFX_I64EXE
200 TYPE set to IMAGE.
200 PORT command successful.
150 Opening data connection for $1$DGA703:[000000]HP-I64VMS-JAVA150-V0105-1-1.PCSI_SFX_I64EXE; (192.168.1.7,49432)
226 Transfer complete.
local: SYS$SYSDEVICE:[BRU_GUY]HP-I64VMS-JAVA150-V0105-1-1.PCSI_SFX_I64EXE;1 remote: HP-I64VMS-JAVA150-V0105-1-1.PCSI_SFX_I64EXE
286026004 bytes sent in 00:00:31.83 seconds (8773.78 Kbytes/s)
200 TYPE set to ASCII.
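The "more than 50%" figure can be checked from the two transfers. A minimal sketch in C that recomputes both rates from the byte count and elapsed times; it assumes FTP's "Kbytes" means 1024 bytes, which reproduces the reported rates closely (the small difference comes from the rounded elapsed times). The function names are illustrative only.

```c
#include <assert.h>

/* Byte count and elapsed times copied from the two FTP runs. */
#define FTP_BYTES        286026004.0
#define FTP_SECS_DEFAULT 49.92   /* default RMS settings */
#define FTP_SECS_TUNED   31.83   /* after SET RMS ... and RMS_SEQFILE_WBH=1 */

/* Throughput in Kbytes/s, assuming 1 Kbyte = 1024 bytes. */
static double kbytes_per_sec(double bytes, double secs)
{
    assert(secs > 0.0);
    return bytes / secs / 1024.0;
}

/* Throughput gain in percent. For a fixed byte count this depends only
   on the ratio of the two elapsed times. */
static double percent_gain(double base_secs, double tuned_secs)
{
    assert(tuned_secs > 0.0);
    return (base_secs / tuned_secs - 1.0) * 100.0;
}
```

percent_gain(FTP_SECS_DEFAULT, FTP_SECS_TUNED) comes out at roughly 56.8%, consistent with the slide's "more than 50%".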
gZIP & RMS
• gZIP is written in C: its I/Os eventually reach RMS
• 1.6 GHz rx2600, MSA30, OpenVMS V8.3
• Test 1
  • Compress a 5.67 GB saveset
  • Decompress a 2.74 GB gZIP archive
  • Default O/S & RMS settings
• Test 2
  • Compress a 5.67 GB saveset
  • Decompress a 2.74 GB gZIP archive
  • SET RMS/BLOCK=127/EXTEN=60000/BUFFER=8, RMS_SEQFILE_WBH=1
gZIP & RMS
[Chart: elapsed time in minutes (less is better)]
Smaller MBC for Random Access
• Times to read 1,000,000 records randomly (the same sequence of records in every run; MBC is passed as the first parameter):
$ frand 32
Elapsed time == 42823ms
$ frand 64
Elapsed time == 54761ms
$ frand 96
Elapsed time == 66343ms
$ frand 124
Elapsed time == 80122ms
$ frand 1
Elapsed time == 31205ms
$ frand 1
Elapsed time == 31233ms
$ frand 2
Elapsed time == 31680ms
$ frand 4
Elapsed time == 32607ms
$ frand 8
Elapsed time == 33698ms
$ frand 16
Elapsed time == 36101ms
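One way to read these numbers uses a deliberately crude model. This is an assumption and not a description of RMS internals: if every random read misses the buffer, RMS transfers a whole multiblock buffer of MBC 512-byte blocks to return one record. The sketch below computes that transfer volume. The function name is hypothetical.

```c
#include <assert.h>

#define DISK_BLOCK_BYTES 512ULL   /* one OpenVMS disk block */

/* Crude model (assumption): each random read misses the buffer and
   RMS transfers a full multiblock buffer of mbc blocks to return a
   single record. Returns the total bytes moved from disk. */
static unsigned long long bytes_transferred(unsigned long long reads,
                                            unsigned mbc)
{
    assert(mbc >= 1 && mbc <= 127);   /* valid multiblock count range */
    return reads * mbc * DISK_BLOCK_BYTES;
}
```

For the 1,000,000 reads, the model gives 512 MB at MBC=1 versus about 63.5 GB at MBC=124. The measured times grow only about 2.6 times (31 s to 80 s), so fixed per-I/O overhead clearly dominates. The model explains only the direction of the trend: with random access, a large MBC mostly moves data that is then thrown away.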
RMS & fsync()
• Writing small amounts of data?
• Using fsync()?
• Slow!
• Setting MBC & MBF to 1 is (almost!) identical
• You still need to take care of EOF
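Here is a portable sketch of the small-write pattern the slide warns about. It uses the standard fflush() and the POSIX fileno() and fsync() calls, which the OpenVMS C RTL also provides. Each call forces the record through to disk, so writing many small records this way costs one synchronous I/O per record. The function name is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L   /* expose fileno() and fsync() */
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>   /* fsync() */

/* Append one small record and force it to stable storage before
   returning: fflush() moves it from the C RTL buffer to the OS, and
   fsync() waits for the OS to write it to disk. Returns 0 on success. */
static int append_durable(FILE *fp, const char *rec)
{
    size_t len;

    assert(fp != NULL && rec != NULL);
    len = strlen(rec);
    if (fwrite(rec, 1, len, fp) != len)
        return -1;
    if (fflush(fp) != 0)
        return -1;
    if (fsync(fileno(fp)) != 0)
        return -1;
    return 0;
}
```

When durability per record is not required, writing a batch of records and calling fflush()/fsync() once at the end avoids most of this cost.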
Sequential Writes
• Frequent file expansions are expensive
• Typically seen with:
  • BACKUP savesets
  • Database imports
  • FTP'ing large files
• The significant amount of time spent expanding files impacts performance
• If possible, preallocate files (container files)
• Limit the number of expansions on a volume:
  • $ SET VOLUME/EXTEND=65535
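The cost scales with the number of extensions, and counting them is ceiling arithmetic. Here is a minimal sketch. It ignores the initial allocation (ALQ) and assumes every extension adds the same number of blocks. The function names are hypothetical.

```c
#include <assert.h>

/* Disk blocks (512 bytes each) needed to hold 'bytes' of data. */
static unsigned long blocks_for_bytes(unsigned long long bytes)
{
    return (unsigned long)((bytes + 511ULL) / 512ULL);
}

/* Number of extensions needed to grow a file to total_blocks when each
   extension adds extend_blocks (initial allocation ignored). */
static unsigned long extensions_needed(unsigned long total_blocks,
                                       unsigned long extend_blocks)
{
    assert(extend_blocks > 0);
    return (total_blocks + extend_blocks - 1) / extend_blocks;  /* ceiling */
}
```

The 286,026,004-byte file from the FTP example occupies 558,645 blocks. With a 60,000-block extension it needs 10 extensions, and with 65,535 blocks it needs 9. A hypothetical 100-block extension would need 5,587.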
Black Magic…
• What would you say about improving system performance by 5% to 20%?
• A typical response would be: "What does it take?"
• Nothing! Just a small change to one SYSGEN parameter…
• …and some physical memory
• Sounds interesting?
Introducing the VHPT
• Each CPU contains a translation buffer (TB)
  • A special cache that holds recent virtual-to-physical address translations
• When a TB miss occurs, the O/S has to resolve the translation by walking the page tables
• Itanium provides an extra layer for resolving addresses: the Virtual Hash Page Table (VHPT)
• VHPT: a linear array of 32-byte entries
• Created by OpenVMS at boot time, but not accessed by it
VHPT
• Order of use:
  • CPU TB cache
  • VHPT
  • OpenVMS walks the page tables (3-level address translation)
• The VHPT is sized by a system parameter, VHPT_SIZE
• The default value of 1 means allocate 32KB per CPU for the VHPT
VHPT
• Default VHPT settings should be sufficient for small applications (up to 8MB of virtual address space)
• Large applications with poor locality would benefit from increasing the VHPT
• Generally speaking, an application that benefits from enabling HT would benefit from an increase to the VHPT
• YMMV!
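The 8MB guideline is consistent with the default size under two assumptions: an 8KB page (as on OpenVMS I64), and each 32-byte entry caching one page translation. Actual coverage is lower when hash collisions occur.

```latex
\frac{32\,\text{KB}}{32\,\text{B per entry}} = 1024 \text{ entries per CPU},
\qquad
1024 \times 8\,\text{KB} = 8\,\text{MB of virtual address space}
```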
VHPT Benchmark
• The following charts illustrate the impact that increasing the VHPT had on Oracle batch jobs
• rx6600, 8 cores
• OpenVMS V8.3-1H1
• EVA8000
• Oracle 10gR2
• HyperThreads enabled
• 64 GB of physical memory
• With VHPT_SIZE = 10000, 2.5GB of physical memory is allocated for the VHPT
Oracle Batch Job A: 23% performance increase
[Chart: elapsed time in minutes (less is better)]
Oracle Batch Job B: 22% performance increase
[Chart: elapsed time in minutes (less is better)]
CPU Power Management (IA64 only)
• CPUs may be placed in a "lower power mode" when idle
• Reduces energy costs for the system
• SYSGEN parameter CPU_POWER_MGMT turns this feature on/off
• May impact performance
• In a recent engagement we noted a 30% performance improvement on an rx6600 by turning power management off (set CPU_POWER_MGMT=0)
Shadowed RAM disk
• Shadowed RAM disk for applications that frequently read data from disk
• The Shadow server will read from memory and will write to both devices
• Forces data to remain resident in memory
• Significantly boosts performance when files are opened cluster-wide by multiple users
  • XFC will not help
• Beneficial if the file update rate is low compared to the read rate
• Included in the EOE & MCOE packages
Physical Disk vs. RAM Disk
• C application that processes records read from a sequential file
• Each I/O is 124 blocks
• rx2600, OpenVMS V8.3, HSG80
[Chart: elapsed time to read a 250MB file (less is better)]
V8.3-1H1
• When possible, upgrade to V8.3-1H1
  • Performance improvements
• Always aspire to stay current with the O/S version
• Relink applications using the V8.3-1H1 linker
  • The new linker produces smaller images
  • Size reduction of 2% to 18% (0% is also possible)
• Montvale-based systems: there is more than meets the eye…
V8.3-1H1 – Addendum kit
• EFICHK operation is performed during the patch installation
• Performance improvements
The following product will be installed to destination:
HP I64VMS VMS831H1I_ADDENDUM V1.0 DISK$SYS831H1:[VMS$COMMON.]
Portion done: 0%...10%...20%...30%...40%...50%...70%...80%...90%
%MOUNT-I-FATCHECK, volume created by EFI$CP version V5.2-5 checking for errors, repairing, and updating FAT information.
%EFICP-W-BADCCNT, FS0:\EFI\VMS\TOOLS\ACPIDUMP.EFI actual cluster count of 126 does not match the file allocation of 127.
Filesize of 258232 bytes, requires 508 blocks (rounded to the cluster factor of 4)
508 blocks shown allocated, but 126 actual clusters (504 blocks) counted in file
The disk storage (258048 bytes) is smaller than the file size (258232 bytes)
Truncating file! ***CHECK CONTENTS FOR VALIDITY***
%EFICP-I-FATCHECK, 1 errors found, 1 fixed. 18 files in 4 folders checked, 12095166 total bytes in 5913 clusters
%EFICP-I-FATCHECK, Updating the FAT EFI$CP version information to V6.0-1, FAT version 1
%EFI-I-COPIED, copied FS0:\EFI\VMS\IPB.EXE to PCSI$DESTINATION:[SYSEXE]FLAG_IPB.EXE
%EFI-I-COPIED, copied PCSI$DESTINATION:[SYSEXE]IPB.EXE to FS0:\EFI\VMS\
%EFI-I-COPIED, copied FS0:\EFI\VMS\IPB.EXE to PCSI$DESTINATION:[SYSEXE]CHECK_IPB.EXE
...100%
COPIED, copied FS0:\EFI\VMS\VMS_LOADER.EFI to PCSI$DESTINATION:[SYSEXE]CHECK_VMS_LOADER.EFI
Resident Images – a mystery
AlphaServer GS1280 7/1150
[Chart: elapsed time to execute a program (less is better)]
Resident Images
AlphaServer GS1280 7/1150
[Chart: elapsed time to execute a program (less is better)]
Resident Images
rx6600 4P/8C 1.6 GHz
[Chart: elapsed time to execute a program (less is better)]
Resident Images
• Alpha
  • The image activator has to apply the relocations, which causes page faults
  • Link using /section=code
  • Avoid /section=data
• IA64
  • Relocations are mapped into memory (the dynamic segment stays in paged pool)
SORTing
• HYPERSORT
  • Multi-threaded
  • $ define sortshr sys$library:hypersort.exe
• Spread work files among disks/controllers/adapters
  • Keep them apart from the input/output disks
  • Having input and output on the same disk is no problem
Sort 100,000,000 Records
• 100 bytes each
• 19,531,250 blocks
• 3 work files
• ~618,000 I/Os with Sort32
• ~922,000 I/Os with HyperSort
• No XFC file caching of input, output or work files
• HyperSort elapsed time < CPU time
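The block count follows directly from the record count and size, using 512-byte disk blocks:

```latex
100{,}000{,}000 \times 100\,\text{B} = 10^{10}\,\text{B},
\qquad
\frac{10^{10}\,\text{B}}{512\,\text{B per block}} = 19{,}531{,}250 \text{ blocks}
```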
PEDRIVER Data Compression
• OpenVMS V8.3
• Reduces traffic between nodes
• May be beneficial for Shadow copy and MSCP traffic
• Can be enabled system-wide or per virtual circuit (VC)
Turn on compression for one VC
SCACP> set vc it14/comp
SCACP> sh vc
IT13 PEA0 VC Summary 30-JAN-2007 07:43:28.02:
Remote VC Total Channels ECS MaxPkt ReXmt --XmtWindow-- Xmt Total ----------- Most Recent -----------
Node State Errors Xmt:TMO Open ECS Pri Size TMO(uSec) Cur Max Mgt Options Pkts(S+R) VC Opened Time VC Closed Time
------ ----- ------ --------- ---- --- --- ---- --------- ---- ---- ---- ------ --------- ------------------ ---------------
ALPH50 Open 4 115444 2 2 0 1426 672330.3 33 64 0 889107 21-JAN 13:34:25.78 (No time)
ALPH40 Open 0 Infinite 2 2 0 1426 516452.3 16 32 0 803545 21-JAN 13:34:25.72 (No time)
IT14 Open 1 790292 2 2 0 1426 223273.5 32 64 0 CMP 1242954 21-JAN 13:34:25.93 (No time)
IT13 Open 0 Infinite 1 1 0 1426 3000000.0 1 8 0 5 21-JAN 13:34:23.05 (No time)
PEDRIVER Data Compression
• Copy a 250MB file to an MSCP-served SCSI disk
• Both systems are rx2600s running OpenVMS V8.3
[Chart: elapsed time to copy the 250MB file (less is better)]
Alignment Faults
• No performance talk is complete without mentioning alignment faults
• Alignment faults on Itanium have a serious impact on performance
• They may be a performance issue on Alpha as well
What is an Alignment Fault?
An alignment fault is generated when an attempted:
• Word memory access is not aligned on a boundary that is divisible by 2
• Longword memory access is not aligned on a boundary that is divisible by 4
• Quadword memory access is not aligned on a boundary that is divisible by 8
Control is then transferred to code that completes the load/store by shifting, masking and setting bits.
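The three rules reduce to one test: an access of size n is aligned when its address is a multiple of n. Here is a minimal portable sketch with hypothetical helper names. load_u32() shows one common way to read a longword from an address that may be misaligned: memcpy() lets the compiler pick an access sequence that is safe for any alignment, typically instead of a longword load that would fault.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* True if addr is a multiple of size: 2, 4 or 8 for word, longword
   and quadword accesses respectively. */
static int is_naturally_aligned(const void *addr, size_t size)
{
    assert(size == 2 || size == 4 || size == 8);
    return ((uintptr_t)addr % size) == 0;
}

/* Read a longword from an address of unknown alignment without
   issuing a direct 32-bit load through a misaligned pointer. */
static uint32_t load_u32(const void *addr)
{
    uint32_t v;
    memcpy(&v, addr, sizeof v);
    return v;
}
```

Dereferencing a misaligned pointer directly, e.g. *(uint32_t *)(p + 1), is what produces the faults counted on the following slides.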
Why Worry?
OpenVMS Monitor Utility
ALIGNMENT FAULT STATISTICS on node DWARF
3-MAY-2007 14:26:56.27
                          CUR          AVE          MIN          MAX
Kernel Fault Rate         0.00         0.66         0.00         1.33
Exec Fault Rate           0.00         0.00         0.00         0.00
Super Fault Rate          0.00         0.00         0.00         0.00
User Fault Rate      640253.31    662505.00    640253.31    684756.68
Total Fault Rate     640253.31    662505.83    640253.31    684758.31
Why Worry?
TIME IN PROCESSOR MODES (CUR) on node DWARF, 3-MAY-2007 14:26:59.27, combined for 2 CPUs (scale 0 to 200):
MP Synchronization     9
Kernel Mode          172
User Mode             19
Interrupt State, Executive Mode, Supervisor Mode, Compatibility Mode and Idle Time show no bar.
Let the Compiler Warn You in Advance
$ cc/nomember/warning=enable=alignment align_test

int x;
................^
%CC-I-MISALGNDMEM, This member is at offset 1, which is not a multiple of the member's alignment of longword. Consider padding before this member, rearranging the order of member declarations, or using #pragma member_alignment.
at line number 10 in file SYS$SYSDEVICE:[test]ALIGN_TEST.C;7

int x;
................^
%CC-I-MISALGNDSTRCT, This member requires longword alignment for efficient access, but is contained in a struct containing byte alignment. Consider using #pragma nomember_alignment longword.
at line number 10 in file SYS$SYSDEVICE:[test]ALIGN_TEST.C;7

sub(&z[i].x,&z[i].a);
....................^
%CC-W-ALIGNCONFLICT, In this statement, the address "&z[i].x" has alignment of byte which is less than the alignment requirements of the destination pointer. Dereferencing the destination pointer may cause an alignment fault.
at line number 22 in file SYS$SYSDEVICE:[test]ALIGN_TEST.C;7
$
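The compiler's second suggestion, rearranging the member declarations, can be sketched portably with offsetof and C11 _Alignof. The struct names are hypothetical; this is not the ALIGN_TEST.C from the slide. The offsets in the comments assume a 4-byte int and default (natural) member alignment. Note that under /NOMEMBER_ALIGNMENT, as in the compile above, reordering alone does not keep the ints aligned in every element of an array such as z[]. That also requires the struct size to be a multiple of 4.

```c
#include <assert.h>
#include <stddef.h>   /* offsetof */

/* Declaration order that makes the compiler insert padding before
   each int (with default member alignment). */
struct padded {
    char a;
    int  x;   /* offset 4 with a 4-byte int */
    char b;
    int  y;   /* offset 12 with a 4-byte int */
};

/* Same members, largest first: the ints stay naturally aligned and
   the only padding is at the tail of the struct. */
struct reordered {
    int  x;
    int  y;
    char a;
    char b;
};

/* Returns 1 if a member at offset off is aligned for a type whose
   alignment requirement is align. */
static int member_aligned(size_t off, size_t align)
{
    assert(align > 0);
    return off % align == 0;
}
```

With a 4-byte int, struct padded typically occupies 16 bytes and struct reordered 12.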
Reporting Alignment Faults
• Analyze alignment faults on Alpha prior to a port
• Only works on the current process
  • sys$perm_report_align_fault
  • sys$perm_dis_align_fault_report
$ r align_test
Address of x == 10001
%SYSTEM-I-ALIGN, data alignment trap, virtual address=0000000000010001, function=00000000, PC=000000001DCF0202, PS=0000001B
%SYSTEM-I-ALIGN, data alignment trap, virtual address=0000000000010001, function=00000001, PC=000000001DCF0212, PS=0000001B
%SYSTEM-I-ALIGN, data alignment trap, virtual address=0000000000010006, function=00000000, PC=000000001DCF0202, PS=0000001B
%SYSTEM-I-ALIGN, data alignment trap, virtual address=0000000000010006, function=00000001, PC=000000001DCF0212, PS=0000001B
%SYSTEM-I-ALIGN, data alignment trap, virtual address=000000000001000B, function=00000000, PC=000000001DCF0202, PS=0000001B
%SYSTEM-I-ALIGN, data alignment trap, virtual address=000000000001000B, function=00000001, PC=000000001DCF0212, PS=0000001B
%SYSTEM-I-ALIGN, data alignment trap, virtual address=0000000000010015, function=00000000, PC=000000001DCF0202, PS=0000001B
Process Affinity
• Running on a large system with a low load?
• Running on a large system with a heavy load?
• Better utilize the CPU caches (data cache, instruction cache & TB) by affinitizing your process to a set of CPUs
• In an HT environment, affinitize to one core
• Up to 25% performance increase
Generating Primes
GS1280 7/1150 (the EV7 has an EV68 "core")
Free Hot File Tracking Utility
$ sh mem/cache=(volume=*,topqio)
System Memory Resources on 26-APR-2007 01:39:15.03
Extended File Cache Top QIO File Statistics:
_$1$DGA642: (DISK$ES40), Caching mode is VIOC Compatible
_$1$DGA642:[VMS$COMMON.SYSEXE]RIGHTSLIST.DAT;1 (open)
    Caching is enabled, active caching mode is Write Through
    Allocated pages        9     Total QIOs       107
    Read hits             92     Virtual reads    107
    Virtual writes         0     Hit rate          85 %
    Read aheads            0     Read throughs    107
    Write throughs         0     Read arounds       0
    Write arounds          0
_$1$DGA642:[VMS$COMMON.SYSEXE]VMS$OBJECTS.DAT;2 (open)
    Caching is enabled, active caching mode is Write Through
    Allocated pages        0     Total QIOs         9
    Read hits              0     Virtual reads      9
    Virtual writes         0     Hit rate           0 %
    Read aheads            0     Read throughs      9
    Write throughs         0     Read arounds       0
    Write arounds          0
_$1$DGA642:[VMS$COMMON.SYSEXE]VMS$AUDIT_SERVER.DAT;1 (open)
    Caching is enabled, active caching mode is Write Through
    Allocated pages        1     Total QIOs         4
    Read hits              0     Virtual reads      4
    Virtual writes         0     Hit rate           0 %
    Read aheads            0     Read throughs      4
    Write throughs         0     Read arounds       0
    Write arounds          0
Total of 3 files for this volume
Free Hot File Tracking Utility
_$1$DGA242: (DISK$ITANIUMVMS), Caching mode is VIOC Compatible
_$1$DGA242:[VMS$COMMON.SYSLIB]DECC$SHR.EXE;1 (open)
    Caching is enabled, active caching mode is Write Through
    Allocated pages      303     Total QIOs      1646
    Read hits           1561     Virtual reads   1646
    Virtual writes         0     Hit rate          94 %
    Read aheads            0     Read throughs   1642
    Write throughs         0     Read arounds       4
    Write arounds          0
_$1$DGA242:[VMS$COMMON.SYSLIB]LIBRTL.EXE;1 (open)
    Caching is enabled, active caching mode is Write Through
    Allocated pages      143     Total QIOs      1165
    Read hits           1123     Virtual reads   1165
    Virtual writes         0     Hit rate          96 %
    Read aheads            0     Read throughs   1164
    Write throughs         0     Read arounds       1
    Write arounds          0
_$1$DGA242:[VMS$COMMON.SYSLIB]CMA$TIS_SHR.EXE;1 (open)
    Caching is enabled, active caching mode is Write Through
    Allocated pages       12     Total QIOs       720
    Read hits            711     Virtual reads    720
    Virtual writes         0     Hit rate          98 %
    Read aheads            0     Read throughs    720
    Write throughs         0     Read arounds       0
    Write arounds          0
• Avoid caching files that pollute the cache
Elapsed time for I/Os
SDA> xfc show volume/brief
Summary of XFC Cached Volumes (CVBs)
------------------------------------
Volume Name      CVB               Open   Closed  Total     Read      Read      Write    ... Response (Milliseconds)...
                                   Files  Files   I/Os      Hits      Count     Count    Hits     disk     Average
DISK$CARFAX      FFFFFFFEE01895E0      0      0          0         0         0         0  (N/A)    (N/A)    (N/A)
DISK$UP          FFFFFFFEE0189380      0      0          0         0         0         0  (N/A)    (N/A)    (N/A)
DISK$ORADAT      FFFFFFFEE0189120     26      3    1872255         0         0   1872255  (N/A)    0.0000   0.0000
DISK$ORADSK      FFFFFFFEE0188EC0     73    177   22015701  14108183  21116834    898891  0.0232   0.5811   0.2236
DISK$IA64_V82    FFFFFFFEE0188C60      0      0          0         0         0         0  (N/A)    (N/A)    (N/A)
DISK$82SOURCE    FFFFFFFEE0188A00      0      0          1         0         1         0  (N/A)    (N/A)    (N/A)
DISK$IT14_10292  FFFFFFFEE01887A0      2      0          0         0         0         0  (N/A)    (N/A)    (N/A)
DISK$ES40        FFFFFFFEE0188540      4      3   27676052  27667501  27674665      1387  0.0118   0.4007   0.0120
DISK$IT14_DOSD   FFFFFFFEE01882E0      0      0          0         0         0         0  (N/A)    (N/A)    (N/A)
DISK$SYS831H1    FFFFFFFEE0188080    313    183    2736618   2668894   2713025     23594  0.0179   0.5425   0.0308
The XFC "overhead"
• RDB users: consider disabling caching of .RDA files
[Chart: elapsed time to copy a 150MB file; rx2600, HSG80, OpenVMS V8.3]
IBM MQ series
• MQ is a heavy user of pthreads
• Set MULTITHREAD to 1
  • Thread manager upcalls are enabled; the creation of multiple kernel threads is disabled
Sizing Working Sets
• Respect AUTOGEN, but don't trust it blindly
• AlphaServer ES47, 16GB RAM
• Maximum process count of 2500 processes
• AUTOGEN will set PQL_MWSDEFAULT to 17.38MB
• 17.38MB × 2500 = 43.45GB RAM
• Exceeds physical memory by almost 3 times
Sizing Working Sets
• It's not 1980 any more…
• Determine the size of the XFC cache + MPW_HILIMIT
• Subtract the sum from the number of fluid pages on the system (MMG$GQ_FLUID_PGCNT)
• Divide by the maximum number of processes that have ever been running on the system (PMS$GL_PROCCNTMAX)
• Multiply the result by 16 to translate from pages to pagelets
• If you are conservative, take 70% of the result and set the working set limit and quota to this value
• The working set extent should be 3 times the result
• Make sure PGFLQUOTA is properly sized
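The procedure is plain arithmetic. Here is a minimal sketch in C. It assumes that all inputs are expressed in pages (MMG$GQ_FLUID_PGCNT and PMS$GL_PROCCNTMAX read with SDA, as the slide describes) and that an 8KB page holds 16 pagelets. The slide does not say whether the 3x extent applies before or after the 70% margin. This sketch applies it to the unreduced value, which is an assumption. The struct and function names are hypothetical.

```c
#include <assert.h>

#define PAGELETS_PER_PAGE 16UL   /* 8KB page / 512-byte pagelet */

/* Results are in pagelets (hypothetical struct). */
struct ws_sizing {
    unsigned long per_process;     /* full share per process */
    unsigned long limit_and_quota; /* conservative 70% of the share */
    unsigned long extent;          /* 3 times the full share (assumption) */
};

/* All inputs in pages: fluid pages (MMG$GQ_FLUID_PGCNT), XFC cache
   size, MPW_HILIMIT, and peak process count (PMS$GL_PROCCNTMAX). */
static struct ws_sizing size_working_sets(unsigned long fluid_pages,
                                          unsigned long xfc_pages,
                                          unsigned long mpw_hilimit,
                                          unsigned long proccntmax)
{
    struct ws_sizing r;

    assert(proccntmax > 0);
    assert(fluid_pages > xfc_pages + mpw_hilimit);

    /* Pages left for processes, shared evenly, converted to pagelets. */
    r.per_process = (fluid_pages - (xfc_pages + mpw_hilimit)) / proccntmax
                    * PAGELETS_PER_PAGE;
    r.limit_and_quota = r.per_process * 70 / 100;
    r.extent = r.per_process * 3;
    return r;
}
```

With made-up inputs of 2,000,000 fluid pages, 500,000 XFC pages, an MPW_HILIMIT of 100,000 pages and 2,500 processes, this gives 8,960 pagelets per process, a limit/quota of 6,272 pagelets and an extent of 26,880 pagelets.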
TCP/IP & Gigabit Ethernet
• Using Gigabit Ethernet?
• Turn on Jumbo frames
  • Frames larger than 1518 bytes: more data per frame -> fewer frames -> fewer interrupts -> better performance
  • Must be supported by the switch
  • Must be configured before TCP/IP is started
    • mc lancp set dev ewa/jumbo
    • Bit 6 in SYSGEN parameter LAN_FLAGS
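Both the frame-count argument and the LAN_FLAGS setting can be made concrete. This minimal sketch assumes a standard 1500-byte MTU versus a 9000-byte jumbo MTU, with 40 bytes of IPv4 and TCP headers per frame (so 1460 and 8960 bytes of payload). The function and macro names are hypothetical.

```c
#include <assert.h>

#define STD_PAYLOAD   1460ULL   /* 1500-byte MTU minus 40 bytes IPv4+TCP headers */
#define JUMBO_PAYLOAD 8960ULL   /* assumed 9000-byte jumbo MTU minus 40 bytes */
#define LAN_FLAGS_JUMBO_BIT 6u  /* bit 6 of LAN_FLAGS, per the slide */

/* Frames needed to carry 'bytes' of payload (ceiling division). */
static unsigned long long frames_needed(unsigned long long bytes,
                                        unsigned long long payload_per_frame)
{
    assert(payload_per_frame > 0);
    return (bytes + payload_per_frame - 1) / payload_per_frame;
}

/* Return a LAN_FLAGS value with the jumbo-frame bit set. */
static unsigned lan_flags_with_jumbo(unsigned lan_flags)
{
    return lan_flags | (1u << LAN_FLAGS_JUMBO_BIT);
}
```

For the 286,026,004-byte file from the FTP example, frames_needed() gives 195,909 frames at 1460 bytes of payload but only 31,923 at 8960. That is roughly six times fewer frames, and therefore fewer interrupts. lan_flags_with_jumbo(0) is 64, the value of bit 6.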