Best Practices for Domino Server and Application Tuning. Andy Pedisich Technotics. What We’ll Cover …. Tuning hardware and OS Optimizing Domino server performance Examining opportunities in on-disk structure (ODS) Keeping applications under control Mastering cluster replication
Best Practices for Domino Server and Application Tuning Andy Pedisich Technotics
What We’ll Cover … • Tuning hardware and OS • Optimizing Domino server performance • Examining opportunities in on-disk structure (ODS) • Keeping applications under control • Mastering cluster replication • Dealing with database corruption • Resolving specific problems with databases • Wrap-up
Keep Up with Domino Fixpacks and Releases • Use this link to find out what’s new • www-10.lotus.com/ldd/r5fixlist.nsf/WhatsNew • In some cases, this will take you to a “Top 20 Fixes” for a new release • Granted, reading all this material can cure anyone’s insomnia, but someone has to do it and it might as well be you • Lots of Domino shops like to lag a bit when it comes to fixpacks • Why do I need to keep up? • “I didn’t see anything that might affect us” • Here’s a good example of why you might want to keep up with the fixpacks, even if you didn’t see a problem in your environment
Running Domino on Windows 2008 64-Bit • Windows 64-bit introduced a new problem with Domino • Microsoft Windows 2008 64-bit servers sometimes have significantly increased CPU usage and I/O degradation when Lotus Domino opens or backs up large numbers of databases • www-01.ibm.com/support/docview.wss?uid=swg21449825 • I personally saw one case where we couldn’t seem to put enough RAM into the system • Server started running at 100% RAM util, and stayed that way • It wasn’t until we were in the 16GB range that the utilization dropped down to 85% • No users were on the Domino server at the time • Not everyone would even see this problem
Virtual Address Space Becomes Exhausted • The Virtual Address Space cache may be completely used up • Successive calls to the OS cache manager to get memory from the OS system cache result in mapping/un-mapping of views from the system cache • These operations take a lot of CPU time and, as a result, show as high OS CPU usage • In addition, the large OS system cache may now reside on the disk • RAM is not large enough to hold the OS system cache • The result is significant I/O on the system • This occurs with Domino 8.5.2
You Might Need a Hotfix and a Domino Parameter • Domino opens databases with a RANDOM flag FILE_FLAG_RANDOM_ACCESS • In Windows 2008 64-bit, this flag causes file blocks that are read to stay in the cache until the file is closed • Domino keeps files open in the Database Cache (dbcache) for performance reasons • It takes quite a long time until the cache is released
Parameter Needed for Release 8.5.2 FP2 and a Hotfix • SPR #KBRN899NF6 and a hotfix provide a notes.ini variable to disable the FILE_FLAG_RANDOM_ACCESS flag • Once you have installed the hotfix, use this parameter • Disable_Random_RW_File_ATTR=1 • It is fixed in Domino 8.5.2 FP3 and 8.5.3 • It’s another great reason to keep up with fixpacks and new releases • But you’re still going to need a lot more memory running on Windows 2008 (R2 also) • SPR# KBRN8AKKA9 – Fix to better improve performance when opening files on Windows 64-bit platform
Keep Disks Unfragmented • Many administrators falsely believe that Domino does not suffer from fragmented files on disk • Fun fact: Domino uses smaller allocations for new documents • This can cause files to be spread out across the disk, which can cause performance issues, especially during backups • The system has to hunt for all the sectors spread everywhere on the disk • Defragment once per week when the server is not busy • There are several Windows tools, such as: • Contig V 1.6 • It’s a free tool from Microsoft • http://technet.microsoft.com/en-us/sysinternals/bb897428
A Free Defrag Tool for Domino that Uses Contig 1.6 • Domino Defrag 3.2 OpenNTF Project • www.openntf.org/internal/home.nsf/project.xsp?action=openDocument&name=DominoDefrag • An open source solution consisting of an R8.5.3+ C API Lotus Domino server task (DominoDefrag.exe) and an R8.5.3+ Lotus Domino server XPages database called the DominoDefrag Administrator • DominoDefragAdmin.nsf – relies on http://extlib.openntf.org/ • Server task uses “contig.exe” (v1.6) to defrag Domino databases on all Windows Server 2003-2008 versions (32-bit and 64-bit) • It will also defrag a full-text index associated with a Notes database, the Domino server’s transaction log, and DAOS files
A NOTES.INI Parameter Improves the Product • DominoDefrag_EnterpriseSupport=1 (on) • Output is recorded to CSV files, and sent to the DominoDefrag Administrator for processing, attached to a summary email • Adds the following functionality: • Being able to compact a database prior to defragging • Supports multi-processing (can load multiple times to run concurrently) and use of an indirect file (.ind) for compact batch functionality • Read performance can also be tested using generated document collections • This will help to determine the “before and after” defrag millisecond read performance of databases and their associated full-text indexes
A General File System Recommendation for All OS • Keep at least 30% free space available on all drives • This allows the file system to optimize where to write data • Helps to reduce file fragmentation • Keep file systems below 1TB on all platforms • This helps performance, and makes disaster recovery faster and simpler • You might have to split your data up to fit the smaller volumes • The payback will come from better performance for mail and applications • Admittedly, it is harder to have smaller volumes with mail files than with applications • We like keeping all mail in one folder, don’t we?
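The 30% free-space guideline above is easy to check on a schedule. The following is a minimal sketch, not a Domino feature: the function name and the list of paths are hypothetical, and it relies only on Python's standard `shutil.disk_usage` call.

```python
import shutil


def low_space_volumes(paths, minimum_free=0.30):
    """Return (path, free_fraction) for each volume under the free-space target."""
    low = []
    for path in paths:
        usage = shutil.disk_usage(path)
        if usage.total == 0:
            continue  # skip pseudo file systems that report no capacity
        free_fraction = usage.free / usage.total
        if free_fraction < minimum_free:
            low.append((path, round(free_fraction, 3)))
    return low
```

Pass it the mount points or drive letters that hold Domino data, transaction logs, and DAOS (for example `low_space_volumes(["D:\\", "E:\\"])` on Windows) and alert on anything it returns.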
Working With the Server Availability Index (SAI) • Did you ever track an SAI and notice that a server never really seemed to be available? • Or maybe you never tracked an SAI before • You can, with our special Statrep database • TechnoticsR85Statrep.ntf • Download free from www.andypedisich.com
The Stats Are There, Now You Can See Them • It has all the views that are on the original Statrep • Plus over a dozen additional views to help you analyze the stats your servers generate
The SAI Is Fixed in R8.5 • It was broken for many years • SAI calculation on fast servers still might not work for you • There is a routine called LOADMON that runs on Domino and stores values in a LOADMON.NCF file on the server • It compares access times using microseconds • On a fast server, at off-peak times, transactions can take just a few microseconds • For normal servers, the SAI can sometimes look low
The Expansion Factor • Servers determine their workload based on the expansion factor • This is calculated based on response times for recent requests • Server compares recent response time to minimum response time that the server has completed • Example: Server currently averages 12ms for DBOpen requests; minimum time was 4ms • Expansion factor = 3 (current time/fastest time) • This is averaged over different types of transactions • Fastest time is stored in memory and in LOADMON.NCF • LOADMON.NCF is read each time server starts
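The arithmetic on this slide can be sketched in a few lines. This is an illustration only: the function names are hypothetical, and the plain mean across transaction types is an assumption, since the slide does not say how Domino weights them.

```python
def expansion_factor(current_ms: float, fastest_ms: float) -> float:
    """Ratio of the recent average response time to the fastest time on record."""
    if fastest_ms <= 0:
        raise ValueError("fastest_ms must be positive")
    return current_ms / fastest_ms


def average_expansion_factor(samples: dict) -> float:
    """Plain mean across transaction types (the real weighting is an assumption).

    `samples` maps a transaction type to a (current_ms, fastest_ms) pair.
    """
    factors = [expansion_factor(cur, fastest) for cur, fastest in samples.values()]
    return sum(factors) / len(factors)


# The slide's example: DBOpen averages 12 ms now, the fastest recorded time was 4 ms.
print(expansion_factor(12, 4))  # 3.0

# A very small stored minimum inflates the factor for the same current load.
print(expansion_factor(12, 0.5))  # 24.0
```

The second call shows why an unusually fast minimum, once recorded, makes a server look much busier than it is.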
Delete LOADMON.NCF When the Server Starts • Delete LOADMON.NCF when server is down to delete old minimum values • Do this with a scripted start under the Windows platform • Delete LOADMON.NCF before Domino starts • You can still do it on the Linux platform for free • Nash!Com has a start script for free • www.nashcom.de/nshweb/pages/startscript.htm • The link has a list of all changes • Plus a link where you can request the script from Daniel • Daniel is one of the smartest Domino administrators I have met in my entire career • Linux/Unix start script can delete LOADMON.NCF automatically
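A scripted start only needs one extra step before Domino is launched. The sketch below is an assumption-laden example rather than part of any start script: it assumes LOADMON.NCF sits in the directory you pass in (typically the Domino data directory), and the function name is hypothetical. It matches the file name case-insensitively so the same code works on Windows and Linux.

```python
from pathlib import Path


def remove_loadmon(data_dir: str) -> list:
    """Delete LOADMON.NCF (any letter case) from data_dir; return the names removed."""
    removed = []
    for entry in Path(data_dir).iterdir():
        if entry.is_file() and entry.name.lower() == "loadmon.ncf":
            entry.unlink()
            removed.append(entry.name)
    return removed
```

Call it only while the server is down, immediately before the step that starts Domino, so the old minimum values are gone when LOADMON initializes.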
The Expansion Factor (cont.) • But sometimes, Domino has a difficult time calculating the expansion factor • The result is that the Server_AvailabilityIndex is not a reliable measure of how busy the server is • This can happen with extremely high-performing servers • If you see a very low Server_AvailabilityIndex at a time you know servers are supposed to be idle and you are trying to load balance, there is something you can do to correct it • And Domino can help!
Changing Expansion Factor Calculation • Use this parameter to change how the Expansion Factor is calculated • SERVER_TRANSINFO_RANGE=n • To determine the optimal value for this variable: • After the server has experienced heavy usage, use this console command: • Show AI • This means, show the availability index calculation • It has nothing to do with that 2001 Steven Spielberg movie, about the robot that looks like a child and tries to become a real boy
An Easy Way to Find the Parameter Value • Show AI is a console command that has been around since Domino Release 6 • It runs some computations on the server • And suggests a SERVER_TRANSINFO_RANGE for you
Platform Disk Statistics • The disk specification will vary by server • Platform.LogicalDisk.1.AvgQueueLen • AvgQueueLen: The average number of both read and write requests that were queued for all logical disks on all physical disks during the sample interval • Should not consistently rise above 2 • Platform.LogicalDisk.1.PctUtil • PctUtil: Percent of time the drives are busy reading or writing • Watch for disks constantly hitting above 80% • Track both of these statistics in Notes with the new Statrep • Follow up with performance monitoring on the OS level
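Once the two statistics are exported from Statrep, the thresholds above can be applied mechanically. This is a minimal sketch with hypothetical function names; treating "consistently" as "at least 80% of the samples" is an assumption, so adjust the fraction to taste.

```python
def sustained_above(samples, limit, fraction=0.8):
    """True if at least `fraction` of the samples exceed `limit`."""
    if not samples:
        return False
    over = sum(1 for value in samples if value > limit)
    return over / len(samples) >= fraction


def disk_warnings(avg_queue_len, pct_util):
    """Apply the slide's thresholds: queue length above 2, utilization above 80%."""
    warnings = []
    if sustained_above(avg_queue_len, 2.0):
        warnings.append("AvgQueueLen consistently above 2")
    if sustained_above(pct_util, 80.0):
        warnings.append("PctUtil constantly above 80%")
    return warnings


# Made-up samples: the queue is always long, utilization is moderate.
print(disk_warnings([2.5, 3.1, 2.8, 2.2, 3.4], [45, 60, 52, 71, 58]))
```

A disk that triggers either warning is a candidate for the OS-level performance monitoring mentioned above.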
Change the View Temp File Default Folder • By default, Domino generates temp files in the server’s temporary folder when it rebuilds a view • Directory used by update/updall tasks for rebuilding indexes • The default is usually somewhere on the system drive C: when using Windows servers • If the system doesn’t have a temp folder, Domino puts the temp files in the Domino data folder • Because of the disk I/O and disk space required, you should change the location to a different drive • Not your Domino data drive, or your transaction log drive, or your OS drive, or your DAOS file system • For maximum performance, it should be on its own drive
Make Sure There Is Plenty of Space Available • Use this parameter: • VIEW_REBUILD_DIR=(drive and folder location) • Make sure you have plenty of space available • The performance increase is worth the trouble • If Domino calculates that there isn’t enough space on the temporary folder’s drive, it uses a slower method to rebuild the view • You’ll see the message below in the log and console • It’s best to remedy this with more disk space, or performance will actually drop • “Warning: Unable to use optimized view rebuild for view due to insufficient disk space at directory. Estimate may need x million bytes for this view. Using standard rebuild instead.”
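If you export console or log text to a file, a quick scan shows how often the server fell back to the slower rebuild. This is a rough sketch: the function name is hypothetical, and it matches only the fixed phrase quoted in the warning above, without assuming anything else about the log line layout.

```python
MARKER = "Unable to use optimized view rebuild"


def rebuild_fallbacks(log_lines):
    """Return the log lines that report a fallback to the standard view rebuild."""
    return [line.strip() for line in log_lines if MARKER.lower() in line.lower()]
```

For example, `rebuild_fallbacks(open("console_export.txt", encoding="utf-8"))` (the file name is a placeholder) returns every matching line; a non-empty result suggests the view rebuild drive needs more space.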
Anti-Virus Software on Domino Servers • I hate running AV software as a Domino task • Many shops have stopped using it because malicious software is caught with perimeter software or desktop software • If you must run OS platform AV software, remember to exclude: • Domino data directory • Transaction log drive • TMP directory • DAOS drive • View rebuild directory
Use Transaction Logging • Transaction logging can increase performance significantly • Enable transaction logging in the server document • T-Logs might already be in use in Archive logging style if servers are backed up incrementally • Otherwise, use the Circular logging style so that transaction logging reuses space • But be careful where you put the logs
Choices to be Made by Administrators • You’ll need to decide whether to configure the transaction logs to create more or fewer checkpoints • To record a recovery checkpoint, Domino evaluates each active logged database to determine how many transactions would be necessary to recover each database after a system failure • Then, it creates a recovery checkpoint record in the transaction log that lists each open database and the starting point transaction needed for recovery
Runtime/Restart Performance • Your choices are: • Standard (default and recommended) • To record checkpoints regularly • Favor runtime • To record fewer checkpoints • Requires fewer system resources and improves server run-time performance, but causes more of the log to be applied during restart • Favor restart recovery time • To record more checkpoints • This option improves restart recovery time because fewer transactions are required for recovery
Location of Transaction Logs • Transaction logs work best if placed on RAID 1 disks • These are mirrored drives • And should be local to the server • These logs should not be placed: • On the Wintel system drive C: • On the same drive as the Domino data • On a SAN drive
Disconnect Idle Users • An idle user stays connected to a server for 4 hours • This takes up valuable server resources • Use this parameter to drop idle users faster • SERVER_SESSION_TIMEOUT=(number of minutes) • Users will not have to re-enter a password if they become active after the time limit • The minimum recommended setting is 30-45 minutes • A lower setting may negatively impact server performance • IBM/Lotus says it’s not needed in R8 • But I like to use the parameter regardless • It gives you more realistic user concurrency stats
1,000 Users – Server_session_timeout=60 • Comparison of memory usage on a Domino server
650 Users – Server_session_timeout=30 • Domino server memory comparison with and without the parameter set to 30
650 Users – Server_session_timeout=30 (cont.) • CPU Utilization comparing with and without the parameter
Disable HTTP Server Logging • We’ve found many instances where DOMLOG.NSF was well over 2GB • And waiting for it to open was nearly impossible • Because it had never actually been opened before • If you don’t look at the logs, improve performance by disabling the HTTP server logging • It’s in the HTTP section of the server document • Under Enable Logging To, disable both Log files and Domlog.nsf
Don’t Maintain Unread Marks on All Databases • Replication of unread marks was primarily designed for mail databases • If you don’t need them, don’t replicate them, because it can significantly slow database performance • For example, keep them switched off in Help, LOG.NSF, NAMES.NSF, and any reference application • Work with your developers to develop standards for enabling or disabling the feature
Plan on a Monthly Restart for Domino Servers • Consider regular monthly restarts of Domino servers • Not just Wintel-based servers, all servers • Server memory allocation and shared memory fragmentation can occur over time • Plus, there could be undocumented memory leaks • Regular restarts will help ensure your Domino servers are running as efficiently as possible
Keep as Few Documents in Inbox as Possible • We all know large mail files are a problem, right? • This is true, if only from the perspective of disk space • But the issue is bigger than just disk space • And here’s the proof you can take back to your domain • IBM/Lotus did a study using Domino on the iSeries called: • Sizing Large-Scale Domino Workloads on iSeries • They found that reducing the number of documents kept in the inbox: • Reduces overall CPU usage • Improves response time • And can dramatically improve startup/recovery performance
It’s Very Logical When You Think About It • In terms of performance, the Inbox is the most “expensive” container in a mail file • The Inbox folder contains all new messages a mail file receives • It must be updated each time a user opens the file • Or clicks Refresh to see new mail • The more documents kept in the Inbox folder, the more expensive it is to refresh the view of it • Reducing the number of documents in the folder reduces the CPU and main storage required to update the view of it
What Can You Do About It? • Two things you can do about this problem • First, when a user calls and says that Notes is slow, ask this question: • How many messages are in your inbox? • This should be a standard part of your help desk response • Urge them to keep no more than 90 days in the inbox • Use NOTES.INI parameters on Notes client to demonstrate how indexing the inbox is a major problem • CLIENT_CLOCK=1 • Debug_Console=1
Use Release 8.x Inbox Manager • Second, control the number of messages in the inbox using settings in the AdminP section of the server document • AdminP can start an agent in the user’s mail file to remove messages from the Inbox • This can also be controlled from policies • The messages are not deleted • They are still in the All Documents view • Users need to know where the messages can be found
Control User Polling for New Mail • Some users want to know if they have new mail • They configure a user preference to check for new mail every couple of minutes • If there are a lot of users on a server, a setting like this can really hurt performance
Override the User Configuration for New Mail Polling • Add this parameter to mail server’s NOTES.INI to control how often a client can check for new mail • MinNewMailPoll= (number of minutes) • Experiment with this number, but 15 is safe • This parameter overrides the user’s selection in the Mail Setup dialog box • This can prevent frequent polling from affecting server performance • Parameters like this one should be in every server’s NOTES.INI • That’s why they belong in a server configuration document
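To confirm that every server really carries the parameters you expect, you can audit a copy of each NOTES.INI. The sketch below is an illustration with hypothetical function names; it assumes simple `Name=Value` lines and treats parameter names as case-insensitive. The expected values in the example come from the recommendations in this session.

```python
def parse_notes_ini(text):
    """Map lower-cased parameter names to their values from NOTES.INI text."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("[") or "=" not in line:
            continue  # skip blanks, the [Notes] header, and malformed lines
        key, _, value = line.partition("=")
        settings[key.strip().lower()] = value.strip()
    return settings


def audit(text, expected):
    """Return {name: actual_value_or_None} for parameters that differ from expected."""
    actual = parse_notes_ini(text)
    problems = {}
    for name, wanted in expected.items():
        found = actual.get(name.lower())
        if found != wanted:
            problems[name] = found
    return problems


# Made-up NOTES.INI content for illustration.
sample = "[Notes]\nMinNewMailPoll=5\nSERVER_SESSION_TIMEOUT=45\n"
print(audit(sample, {"MinNewMailPoll": "15", "SERVER_SESSION_TIMEOUT": "45"}))
# {'MinNewMailPoll': '5'}
```

An empty result means the file matches; anything else is a reason to push the setting through a server configuration document rather than editing the file by hand.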
Port Compression • Enable network port compression! • This is especially good for server-to-server communication • Must be enabled on server • Client should be enabled using policies • Up to 60% compression of data
There Is a New On-Disk Structure for Domino 8 • The term On-Disk Structure (ODS) describes the internal architecture of Notes databases • Each new release, except ND7, has included an update to the ODS to accommodate new features and functions • Domino 8 includes a new On-Disk Structure, ODS48
Design Compression Saves Space • Design compression reduces the size of databases by compressing design elements by up to 60% • It will shrink the standard Notes 8 mail template MAIL8.NTF from 25MB to 11MB • The compression percentage achieved will vary from database to database • This is based on the compression ratio achieved for each design element in each application
Enabling Design Note Compression • The design compression switch is available on the Advanced tab of the properties of applications with ODS43 and ODS48 • You must be using the Notes 8 client to see the option • However, the compression will not occur unless the application is subsequently upgraded to ODS48 • Once enabled, the Design Compression setting replicates to other replicas of the application • Keep in mind that the ODS itself does not replicate
Your ODS By Default Is 43 • When a new application is created in a Lotus Notes 7, 8, or 8.5 client or on a Lotus Domino 7, 8, or 8.5 server, the on-disk structure (ODS) remains at 43 • The on-disk structure has been upgraded in Notes/Domino 8.5 to the new ODS version of 51 • Add the following parameter to the NOTES.INI on the server or client to use ODS 51: • CREATE_R85_DATABASES=1
Use Compact -C to Upgrade to New ODS • Yes, it must be a compact -C; -B will not work • Makes it easy to plan the ODS upgrade • Low risk, no problems have been seen • Besides the “compress database design” option from ODS 48 in advanced properties, it gives you options to turn on: • Compression of non-summary data • Domino Attachment and Object Service (DAOS)
Making Applications Behave • You’re not a developer, you’re an administrator • What can you do to help applications stay under control? • The biggest complaints about agents that run applications are: • The agents run too long • The agents consume vast amounts of memory • The agents utilize too much CPU on the server • And all of these complaints are usually made anecdotally • They are in conversations heard in elevators or around water coolers • Are there still water coolers for people to stand around, gossiping?