570 likes | 703 Views
REP704: RepServer Performance And Tuning. Greg Carter Staff Engineer II greg.carter@sybase.com August 15-19, 2004. The Enterprise. Unwired. The Enterprise. Unwired. Industry and Cross Platform Solutions. Manage Information. Unwire Information. Unwire People.
E N D
REP704: RepServer Performance And Tuning Greg Carter Staff Engineer II greg.carter@sybase.com August 15-19, 2004
The Enterprise. Unwired. Industry and Cross Platform Solutions Manage Information Unwire Information Unwire People • Adaptive Server Enterprise • Adaptive Server Anywhere • Sybase IQ • Dynamic Archive • Dynamic ODS • Replication Server • OpenSwitch • Mirror Activator • PowerDesigner • Connectivity Options • EAServer • Industry Warehouse Studio • Unwired Accelerator • Unwired Orchestrator • Unwired Toolkit • Enterprise Portal • Real Time Data Services • SQL Anywhere Studio • M-Business Anywhere • Pylon Family (Mobile Email) • Mobile Sales • XcelleNet Frontline Solutions • PocketBuilder • PowerBuilder Family • AvantGo Sybase Workspace
Complexities Of Replication Performance • What’s the topology? • What machines are involved? • What security services are required? • What networking mechanisms are used? • What is the profile of replicated transactions? • How is the replicate data used?
Case Study • In a simple environment, observe performance with no tuning • Tune RepServer as intuition, experience or observations indicate • Observe any changes to performance and repeat • Use RepServer’s Monitor & Counter to make observations • Use TPC-C Transaction Clients at the Primary Database
Hardware Specifics • Manufacturer: Sun (Sun Microsystems) • System Model: Ultra 4 • Main Memory: 3.0 GB • Number of Processors: 4 • Processor Type: UltraSPARC-II • Processor Speed: 400MHz • Operating System: Solaris8 kernel version 108528-07 • Disk Devices: Twelve 18 GB IBM Ultrastar 36LZX DDYS-T18350 10,000 rpm with 3 physical Heads and a 4.9ms seek time • This machine was not dedicated to these tests
Software Specifics • Replication Server/12.6/EBF 11961 ESD#2/Sun_svr4/OS 5.8/1/OPT/Thu May 20 18:04:38 2004 • Adaptive Server Enterprise/12.5.2/EBF 11941 ESD#1/P/Sun_svr4/OS 5.8/ase1252/1838/32-bit/FBO/Sat May 29 09:42:35 2004
edsprs03 prs03 pds03 rds03 Host Topology
Topology 1> admin who 2> go Spid Name State Info ---- ---------- -------------------- ---------------------------------------- 20 DSI EXEC Awaiting Command 101(1) edsprs03.edbprs03 12 DSI Awaiting Message 101 edsprs03.edbprs03 8 SQM Awaiting Message 101:0 edsprs03.edbprs03 23 DSI EXEC Awaiting Command 102(1) pds03.tpcc 13 DSI Awaiting Message 102 pds03.tpcc 16 DIST Awaiting Wakeup 102 pds03.tpcc 19 SQT Awaiting Wakeup 102:1 DIST pds03.tpcc 10 SQM Awaiting Message 102:1 pds03.tpcc 9 SQM Awaiting Message 102:0 pds03.tpcc 24 REP AGENT Awaiting Command pds03.tpcc 22 DSI EXEC Awaiting Command 103(1) rds03.tpcc 14 DSI Awaiting Message 103 rds03.tpcc 11 SQM Awaiting Message 103:0 rds03.tpcc 15 dSUB Sleeping 6 dCM Awaiting Message 7 dAIO Awaiting Message 17 dREC Sleeping dREC 18 dSTATS Sleeping 26 USER Active sa 5 dALARM Awaiting Wakeup
Transaction Profile (Over 7 Minutes Period) • Command Proportions • Inserts: 31.6% • Updates: 65.8% • Deletes: 2.6% • Average commands per Tran: 15 • Average Memory Consumption per Group: 60,497bytes • Large Transactions: 0 • Text/image data: 0 • Replicated DDL: 0 • Tables Replicated: 9 • Average Number of Columns per Table: 9
Observation Setup 1> admin statistics,flush_status 2> go RepServer RSSD flushing status for prs03 ran on: 06/07/04 03:17:05 PM Rep Server Counters Flush Configuration. ---------------------------------------- Configuration_Name Run_Value ------------------------------ ---------- stats_daemon_sleep_time 600 stats_sampling on stats_flush_rssd off stats_reset_afterflush on stats_intrusive_counter off
Observation Setup Rep Server module flushing status. ---------------------------------- Module_Name Info Flush_Status ----------- ---------------------------------------- ------------ CM dCM off DIST 102 pds03.tpcc off DSI 103 rds03.tpcc off DSI 102 pds03.tpcc off DSI 101 edsprs03.edbprs03 off REPAGENT pds03.tpcc off SQM 101:0 edsprs03.edbprs03 off SQM 102:0 pds03.tpcc off SQM 102:1 pds03.tpcc off SQM 103:0 rds03.tpcc off SQT 103:0 DSI 103 rds03.tpcc off SQT 101:0 DSI 101 edsprs03.edbprs03 off SQT 102:1 DIST pds03.tpcc off SQT 102:0 DSI 102 pds03.tpcc off STS STS off SYNC SYNC_GLOB off
Observation Setup 1> configure replication server 2> set stats_daemon_sleep_time to '120' 3> go Config parameter 'stats_daemon_sleep_time' is modified. 1> admin stats_intrusive_counter, 'on' 2> go 1> admin stats_config_module, all_modules, 'on' 2> go These all take effect immediately. Only “stats_daemon_sleep_time” is persistent. Be careful that all modules for which stats are to be collected are up and running.
Observation Setup 1> admin statistics,flush_status 2> go RepServer RSSD flushing status for prs03 ran on: 06/09/04 12:49:26 PM Rep Server Counters Flush Configuration. ---------------------------------------- Configuration_Name Run_Value ------------------------------ ---------- stats_daemon_sleep_time 120 stats_sampling on stats_flush_rssd off stats_reset_afterflush on stats_intrusive_counter on
Observation Setup Rep Server module flushing status. ---------------------------------- Module_Name Info Flush_Status ----------- ---------------------------------------- ------------ CM dCM on DIST 102 pds03.tpcc on DSI 103 rds03.tpcc on DSI 102 pds03.tpcc on DSI 101 edsprs03.edbprs03 on REPAGENT pds03.tpcc on SQM 101:0 edsprs03.edbprs03 on SQM 102:0 pds03.tpcc on SQM 102:1 pds03.tpcc on SQM 103:0 rds03.tpcc on SQT 102:1 DIST pds03.tpcc on SQT 102:0 DSI 102 pds03.tpcc on SQT 101:0 DSI 101 edsprs03.edbprs03 on SQT 103:0 DSI 103 rds03.tpcc on STS STS on SYNC SYNC_GLOB on
Making Observation • Run transaction load for a period of longer than the Stats Daemon sleep time - 3 or 4 times longer at least • Execute “admin statistics,sysmon” for a period of time longer than the Stats Daemon sleep time, but short enough so that it will complete before the transaction load finishes
Making Observation 1> admin statistics,sysmon,420 2> go The session issuing this command will be blocked for 7 minutes. The transaction load for these tests lasts at least 15 minutes.
Making Observation Sybase Replication Server Statistics Report =============================================================================== RepServer Version: Replication Server/12.6/EBF 11961 ESD#2/Sun_svr4/OS 5.8/1/OPT/Thu May 20 18:04:38 2004 Open Server Version: Sybase Server-Library/12.5.1/P-EBF11759-11758 ESD #2/DRV.12.5.1.0/SPARC/Solaris 2.8 Native Threads/BUILD1251-002/OPT/Tue Mar 9 22:02:04 2004 Open CLient Version: Sybase Client-Library/12.5.1/P-EBF11758 ESD #2/DRV.12.5.1.0/SPARC/Solaris 2.8 Native Threads/BUILD1251-002/OPT/Tue Mar 9 20:37:42 2004 RepServer Name: prs03 Start Time: 06/21/04 09:04:43 PM End Time: 06/21/04 09:11:44 PM Sample Interval (secs): 421
Making Observation RepServer Runtime Configurations -------------------------------- memory_limit: 20 init_sqm_write_delay: 1000 init_sqm_write_max_delay: 10000 sqm_write_flush: on sqt_max_cache_size: 1048576 sqt_init_read_delay: 2000 sqt_max_read_delay: 10000 sts_cachesize: 100 sqm_recover_segs: 1 smp_enable: off
Making Observation DSI Statistics ----------------------------- Info Instance_ID Instance_Val TransTotal NgTransTotal TransSucceeded CmdGroups TransInCmdGroups CmdGroupsCommit CommitsInCmdGroup CmdsSucceed GroupsClosedBytes GroupsClosedNoneOrig GroupsClosedTranPartRule PartitioningWaits AllThreadsInUse AllLargeThreadsInUse ExecsCheckThrdLock TrueCheckThrdLock CommitChecksExceeded GroupsClosedTrans ---------------------------------------- ----------- ------------ ----------- 103 rds03.tpcc 103 -1 742 5952 742 742 5952 742 5953 0 741 0 0 0 5242 0 0 0 0 0
Making Observation Info Instance_ID Instance_Val TransApplied NgTrans CmdsApplied TransMaxGroup MemUsedMaxGroup SysTransRead InsertsRead UpdatesRead DeletesRead ExecsWritetext TPS_Average NgTPSaverage ErrsDeadlock ErrsOutofLock SendTimeAvg DSIEBPSaverage TransAvgGroup MemUsedAvgGroup SendRPCTimeAvg DSIEResultTimeAve DSIEBatch DSIEBatchTimeAve DSIEBatchSizeMax DSIEBatchSizeAve DSIEOCmdCountAve DSIEICmdCountAve DSIEBFCommitNext DSIEBFMaxCmds DSIEBFMaxBytes DSIEFSMapTimeAve DSIESCCTimeAve DSIESCBTimeAve DSIETranTimeMax DSIETranTimeAve ---------------------------------------- ----------- ------------ ------------ 103(1) rds03.tpcc 103 1 742 5952 90716 13 65536 0 24694 52023 2105 0 1 13 0 0 0 103623 8 60538 0 7 742 11 8192 6143 17 18 1484 0 3157 0 0 0 196 56
Analysis REPAGENT Statistics ----------------------------- Info Instance_ID Instance_Val CmdsTotal CmdsApplied CmdsRequest CmdsMiniAbort UpdsRslocater PacketsReceived BytesReceived RAYields RAYieldTimeAve RAWriteWaits RAWriteWaitsTimeAve ---------------------------------------- ----------- ------------ ----------- ----------- ----------- ------------- ------------- --------------- ------------- ----------- -------------- ------------ ------------------- pds03.tpcc 102 -1 146211 126210 0 1024 146 38294 70727869 23167 0 9617 2 About 3 of every 7 minutes the RepAgent spends waiting for writes to complete. No CPU issue for RepAgent.
Analysis SQM Statistics ----------------------------- Info Instance_ID Instance_Val CmdsWritten BlocksWritten BytesWritten SegsActive SegsAllocated SegsDeallocated TimeAveNewSeg TimeAveSeg UpdsRsoqid WritesTimerPop WriteRequests BlocksFullWrite BPSaverage CmdSizeAverage ---------------------------------------- ----------- ------------ ----------- ------------- ------------ ----------- ------------- --------------- 102:1 pds03.tpcc 102 1 146221 10040 157641940 1 157 157 268 369 157 0 146221 10040 378204 961 Info Instance_ID Instance_Val CmdsRead BlocksRead BlocksReadCached SleepsWriteQ ---------------------------------------- ----------- ------------ ----------- ----------- ---------------- ------------ 102:1 pds03.tpcc,0,SQT 102 11 146268 10043 6196 9526 Reads right on top of writes. Many from cache. Reader having to wait for blocks to be written.
Analysis SQT Statistics ----------------------------- Info Instance_ID Instance_Val CmdsTotal OpenTransTotal CmdsAveTran CacheMemUsed MemUsedMaxTran MemUsedAveTran TransRemoved TruncTransTotal ClosedTransTotal ReadTransTotal OpenTransRmTotal TruncTransRmTotal ClosedTransRmTotal ReadTransRmTotal ---------------------------------------- ----------- ------------ ----------- -------------- ----------- ------------ -------------- -------------- ------------ --------------- ---------------- -------------- ---------------- ----------------- ------------------ ---------------- 102:1 DIST pds03.tpcc 102 1 146268 9493 15 964096 46751 18541 0 9493 9493 9489 9493 9488 9489 9488 Recall sqt_max_cache_size set to 1048576. Cache is not quite full – 92%. No transactions flushed from cache. Cache is sized sufficiently.
Analysis DIST Statistics ----------------------------- Info Instance_ID Instance_Val CmdsTotal TransProcessed SREstmtsInsert SREstmtsUpdate SREstmtsDelete SREstmtsDiscard TDbegin TDclose ---------------------------------------- ----------- ------------ ----------- -------------- -------------- -------------- -------------- --------------- ----------- ----------- 102 pds03.tpcc 102 -1 144497 9488 40919 81519 3084 0 9487 9488
Analysis SQM Statistics ----------------------------- Info Instance_ID Instance_Val CmdsWritten BlocksWritten BytesWritten SegsActive SegsAllocated SegsDeallocated TimeAveNewSeg TimeAveSeg UpdsRsoqid WritesTimerPop WriteRequests BlocksFullWrite BPSaverage CmdSizeAverage ---------------------------------------- ----------- ------------ ----------- 103:0 rds03.tpcc 103 0 144434 3276 52311934 22 51 32 824 11868 51 32 144434 3244 126444 245 Info Instance_ID Instance_Val CmdsRead BlocksRead BlocksReadCached SleepsWriteQ ---------------------------------------- ----------- ------------ ----------- ----------- ---------------- ------------ 103:0 rds03.tpcc,0,DSI 103 10 90683 2050 0 0 Reader is far behind No reads from cache. No waiting for a write.
Analysis SQT Statistics ----------------------------- Info Instance_ID Instance_Val CmdsTotal OpenTransTotal CmdsAveTran CacheMemUsed MemUsedMaxTran MemUsedAveTran TransRemoved TruncTransTotal ClosedTransTotal ReadTransTotal OpenTransRmTotal TruncTransRmTotal ClosedTransRmTotal ReadTransRmTotal ---------------------------------------- ----------- ------------ ----------- -------------- ----------- ------------ -------------- -------------- ------------ --------------- ---------------- -------------- ---------------- ----------------- ------------------ ---------------- 103:0 DSI 103 rds03.tpcc 103 0 90683 5946 15 1048832 18215 7547 0 5946 5946 5947 5946 5953 5947 5953 Recall sqt_max_cache_size set to 1048576. Cache is full. No transactions flushed from cache. Cache is sized sufficiently.
Analysis DSI Statistics ----------------------------- Info Instance_ID Instance_Val TransTotal NgTransTotal TransSucceeded CmdGroups TransInCmdGroups CmdGroupsCommit CommitsInCmdGroup CmdsSucceed GroupsClosedBytes GroupsClosedNoneOrig GroupsClosedTranPartRule PartitioningWaits AllThreadsInUse AllLargeThreadsInUse ExecsCheckThrdLock TrueCheckThrdLock CommitChecksExceeded GroupsClosedTrans ---------------------------------------- ----------- ------------ ----------- 103 rds03.tpcc 103 -1 742 5952 742 742 5952 742 5953 0 741 0 0 0 5242 0 0 0 0 0 Very often, DSI/S is ready to schedule a transaction but all DSI/Es are occupied.
Analysis Info Instance_ID Instance_Val TransApplied NgTrans CmdsApplied TransMaxGroup MemUsedMaxGroup SysTransRead InsertsRead UpdatesRead DeletesRead ExecsWritetext TPS_Average NgTPSaverage ErrsDeadlock ErrsOutofLock SendTimeAvg DSIEBPSaverage TransAvgGroup MemUsedAvgGroup SendRPCTimeAvg DSIEResultTimeAve DSIEBatch DSIEBatchTimeAve DSIEBatchSizeMax DSIEBatchSizeAve DSIEOCmdCountAve DSIEICmdCountAve DSIEBFCommitNext DSIEBFMaxCmds DSIEBFMaxBytes DSIEFSMapTimeAve DSIESCCTimeAve DSIESCBTimeAve DSIETranTimeMax DSIETranTimeAve ---------------------------------------- ----------- ------------ ------------ 103(1) rds03.tpcc 103 1 742 5952 90716 13 65536 0 24694 52023 2105 0 1 13 0 0 0 103623 8 60538 0 7 742 11 8192 6143 17 18 1484 0 3157 0 0 0 196 56 On average, 11 1/100ths of a second spent per batch. On average, less than 1/100th of a second spent in FS map. On average, less than 1/100th of a second spent sending a batch. On average, 7 1/100ths of a second spent processing the results of a batch.
Analysis STS Statistics ----------------------------- Info Instance_ID Instance_Val SelectsTotal InsertsTotal UpdatesTotal DeletesTotal BeginsTotal STSCacheExceed ---------------------------------------- ----------- ------------ ------------ ------------ ------------ ------------ ----------- -------------- STS 11 -1 444 1337 208 1318 944 0 sts_cachesize sufficiently sized.
Analysis • The RepAgent is obviously capable of shoving commands in faster than RepServer is writing them to the IBQ • Downstream stats (particularly in the DSI) indicate that allowing it to do so would offer no benefit other than to drain the PDB transaction log faster
Analysis • From OBQ SQM stats the reader (DSI/S) is lagging considerably behind the writer (DIST). • However, OBQ SQT cache is full (and sufficiently sized). So DSI/S is not struggling to keep SQT cache full • Also consider that IBQ SQM stats indicate that about three times the number of blocks are being written/read than on the OBQ ( 10,040/10,043 vs. 3,276/2,050). (Note that 10,040 blocks on IBQ 3,276 blocks on OBQ due to message format differences.) So the Stable Device has no trouble in keeping up with OBQ I/O requests.
Analysis & Adjustment • Over 5,000 times in the 7 minute observation period, DSI/S wanted to schedule a transaction but could not find an available DSI/E • Observation: This implies that Parallel-DSI may help • Intuition: Before doing that, however, let’s see if we can take advantage of the multiple processors. Maybe the DSI/E just needs more processor time. 1> configure replication server 2> set smp_enable to 'on' 3> go Config parameter 'smp_enable' is modified. This change will not take effect until the Replication Server is restarted. 1> shutdown 2> go
Adjustment • After reboot, we must turn on intrusive counters and module flushing (Make sure the RepAgent reconnected) • Make another observation 1> admin stats_intrusive_counter, 'on' 2> go 1> admin stats_config_module, all_modules, 'on' 2> go 1> admin statistics,sysmon,420 2> go
Analysis • Observation: While DSI throughput has increased some 26%, we also see an increase of some 21% in the number of times DSI/S wanted to schedule transactions but the DSI/E was already busy with one. Parallel-DSI continues to be implicated as an area to pursue. • SMP enables the RepAgent to push through 27% more commands. However the RepAgent continues to wait on RepServer to write the commands as fast as they are being pushed. Still, the DSI is the limiting factor. • Observation: For both IBQ and OBQ, updates to rs_oqid increased about 28%. We can reduce this RSSD I/O by increasing sqm_recover_segs.
Analysis • Observation: For both IBQ and OBQ, writes due to the SQM’s alarm clock going off are nearly 0. Since in the case of the OBQ the SQT cache is full, there is no indication that adjustments to init_sqm_write_delay or init_sqm_max_write_delay are necessary. • Observation: And even though the SQT cache is not being kept full for the IBQ, there is no perceived delay in processing this queue. Therefore, decreasing init_sqm_write_delay and init_sqm_max_write_delay, while it may keep cache more full, would have no effect on overall throughput. • Observation: Since RepAgent stats indicate that commands are being pushed as fast as RepServer will allow as opposed to a “trickle”, no adjustments to sqt_init_read_delay and sqt_max_read_delay are indicated.
Analysis • Observation/Experience: There are 9 tables being replicated for which 9 repdefs have been created. There are about 100 columns in total. Given that sts_cachesize is set to 100 (by default) we may or may not see advantage to fully caching rs_columns. Aside from rs_columns, the other two tables generally recommended to be cached are rs_objects and rs_functions.
Adjustment 1> configure replication server 2> set sqm_recover_segs to '5000' 3> go Config parameter 'sqm_recover_segs' is modified. This change will not take effect until the Replication Server is restarted. 1> shutdown 2> go • Observation made
Adjustment 1> configure replication server 2> set sts_full_cache_rs_objects to 'on' 3> go Config parameter 'sts_full_cache_rs_objects' is modified. This change will not take effect until the Replication Server is restarted. 1> configure replication server 2> set sts_full_cache_rs_columns to 'on' 3> go Config parameter 'sts_full_cache_rs_columns' is modified. This change will not take effect until the Replication Server is restarted. 1> configure replication server 2> set sts_full_cache_rs_functions to 'on' 3> go Config parameter 'sts_full_cache_rs_functions' is modified. This change will not take effect until the Replication Server is restarted. 1> shutdown 2> go • Observation made
Analysis • Fully caching rs_objects, rs_columns and rs_functions AND increasing sqm_recover_segs made no impact on this RepServer’s performance • But don’t let that discourage you from trying them in your RepServer – particularly if you have a good bit more than 9 repdefs! • DSI continues to be the bottleneck as indicated by the OQB reads being slower than the writes – behind by nearly 60% after 7 minutes • Experience: Typical DSI tuning, aside from Parallel-DSI, includes increasing db_packet_size and dsi_cmd_batch_size
Adjustment • In replicate ASE 1> sp_configure 'max network packet size',8192 2> go Parameter Name Default Memory Used Config Value Run Value Unit Type ------------------------------ ----------- ----------- ------------ ----------- -------------------- ---------- max network packet size 512 0 8192 512 bytes static (1 row affected) Configuration option changed. Since the option is static, Adaptive Server must be rebooted in order for the change to take effect. Changing the value of 'max network packet size' does not increase the amount of memory Adaptive Server uses. (return status = 0) 1> shutdown 2> go
Adjustment • In RepServer • Make observation 1> alter connection to rds03.tpcc 2> set db_packet_size to '8192' 3> go Config parameter 'db_packet_size' is modified. This change will not take effect until the connection/route is restarted. A connection/route can be restarted with the suspend and resume commands. 1> shutdown 2> go