Best Practices for Implementing Lotus Domino in a Storage Area Network (SAN) Environment

With storage area networks (SANs) becoming a standard configuration, this paper describes the technology involved and how best to implement a Lotus Domino server in this environment. Where applicable, the use of network attached storage (NAS) with a Domino server is covered as well.

Contents:
Introduction
What is NAS?
What is a SAN?
Differences between NAS and a SAN
DAS, NAS or SAN: Does Lotus Domino care?
IBM Lotus statement of support for Domino on SAN and NAS equipment
Configuring a SAN for Domino
Optimizing Domino performance
Transaction logging
References
Introduction
In the beginning there was direct-attached storage (DAS): the most basic storage model, in which storage devices are part of the host computer. As the first widely popular storage model, DAS products still make up a large majority of the installed base of storage systems in today's IT infrastructures. Direct-attached storage remains a viable option by virtue of being simple to deploy and having a lower initial cost than networked storage. When considering DAS, bear in mind that for clients on the network to access the storage device, they must be able to reach the server it is connected to. If that server is down or experiencing problems, users' ability to store and access data is directly affected.
Over time, ever-higher requirements for scalability and availability have driven disk storage technology to evolve beyond the external disk subsystem by introducing disk storage as a networked entity. Networked storage comes in two flavors: network attached storage (NAS) and storage area networks (SAN).
What is NAS?
Network attached storage (NAS) evolved to support the concept of network file serving: transferring small amounts of data to clients on a file-by-file basis. NAS connects directly to the network using TCP/IP. In most cases, no changes to the existing network infrastructure need to be made in order to install a NAS solution. The network attached storage device is attached to the local area network (typically an Ethernet network) and assigned an IP address just like any other network device.
[Figure: NAS devices attached to an IP network serving Unix, Linux, NetWare, and Windows servers]
Physically, NAS appliances are intelligent external disk subsystems with network cards. They contain a stack of disks, usually in some highly available RAID format, and should ideally sit on a self-contained part of the local area network (LAN). Workstations and servers on a network gain access to the NAS appliance through connectionless protocols such as Network File System (NFS) and connection-oriented protocols such as Common Internet File System (CIFS).
What is a SAN?
Like NAS, the storage on a storage area network (SAN) resides separately from the server. The difference is that storage devices on a SAN are connected to servers by a highly specialized and standardized disk storage networking protocol, the most common of which is Fibre Channel. Fibre Channel runs over fibre-optic or copper cables across a switched network called a fabric. The storage area network consists of file stores (intelligent disk subsystems) and specialized switches, collectively called the fabric, and is accessed by the servers over dedicated devices (host bus adapters, or HBAs). The host does not know that the SAN exists; it believes it is talking to local disks. Thus, end users attached to the host see these SAN-based disks simply as local storage on the server.
[Figure: servers on an IP network (Unix, Linux, NetWare, Windows) connected through SAN switches to a Fibre Channel fabric and SAN storage devices]
This configuration allows a SAN to transfer large amounts of data between servers and storage devices without creating a bottleneck at the storage device. SAN networks may include servers and disk arrays interconnected by the switching technology.
Differences between NAS and a SAN
Servers see SAN-attached volumes as locally attached disks, whereas NAS presents them as remote Network File System (NFS) or Common Internet File System (CIFS) file shares. With network attached storage, the NAS server itself understands the file and directory structures and does the handling just like any other file server attached across a network. That is, the NAS server understands what locking it can do (very limited, at the operating system level) and all the logical handling of the files (managing file handles). In contrast, a SAN simply deals in blocks, huge numbers of blocks. The SAN does not know about file names and directories; only the host needs to see that level. The SAN just sees all the disk traffic as pure streams of disk blocks.
SAN products run on Fibre Channel, iSCSI, and various other protocols that run over the SAN fabric. In contrast, NAS products can run over your existing TCP/IP network and, as such, are prone to latency and broadcast storms, and compete for bandwidth with users and other network devices. A better method is to isolate the NAS onto a private TCP/IP network that handles only the NAS traffic. This isolation has two advantages. First, network traffic and latencies become more predictable. The second, and often more significant, advantage is improved security.
A NAS product plugs into your existing IP network like any other device and looks like a normal file share on the network. So, while a NAS device can be dropped right into your existing IP network, a dedicated NAS network should be designed from the bottom up, with the same traffic calculations as for a SAN fabric. Ethernet is a stable and mature protocol, and most IT administrators are proficient in Ethernet and TCP/IP, so there is no steep learning curve compared with learning and understanding the SAN fabric protocols.
NAS security is typically implemented at the file-system level through traditional operating system access-control lists. The ability to use both hardware and software zoning security means SANs can provide a higher level of security than NAS.
DAS, NAS or SAN: Does Lotus Domino care?
In a word: no! The Domino server architecture assumes it is operating in an environment, provided by the underlying operating system (OS), that is reliable and tuned for fast and dependable storage access. That architecture has required very few provisions in the current Domino product for handling the unexpected loss of underlying storage. The Domino server interacts
only with the operating system supporting it and has limited knowledge of the underlying input/output (I/O) architecture. Therefore, Domino depends on the underlying operating system for fast and reliable I/O. If the operating system is not tuned to maximize I/O performance, Domino performance will not be maximized. It is important to monitor the operating system and storage system for performance bottlenecks. This monitoring needs to be done at the storage subsystem level, at the operating system level, and in Domino.
IBM Lotus statement of support for Domino on SAN and NAS equipment
In short, the use of a storage area network (SAN) attached to a supported operating system is a supported configuration for Lotus Domino servers. The SAN should be designed to provide dedicated storage systems for enterprise applications. This includes both Fibre Channel (FCP) and IP networks (iSCSI), provided that the network connecting the servers to storage is a SAN.
The Domino server is also supported on NAS equipment, provided the NAS is deployed in a dedicated storage network configuration. NAS deployed for general-purpose file serving, or in networks providing more than storage-related services, is not a supported environment. A supported configuration for Lotus Domino is a dedicated, private network between the NAS storage device and the Domino server, used only between the storage and the server.
For NAS, the use of NFS is recommended over CIFS because of the stateless nature of NFS. For example, if a Domino server with databases in a NAS/NFS configuration encounters a connection failure (such as a power failure in the NAS system), the Domino server threads that are accessing database files on the NAS will block until the NAS system comes back online. There is no data loss. If, however, this happened in a NAS/CIFS configuration, Domino would detect that the database was unexpectedly disconnected and left in an unknown state. There can be data loss at this point, and a transaction log playback or fixup must be performed before the database(s) can be used. Placing non-transaction-logged databases on NAS/CIFS is not recommended. Placing either transaction logs or transaction-logged databases on NAS/CIFS is not supported.
Configuring a SAN for Domino
Because Domino and the operating system are unaware of the underlying SAN or NAS hardware, there are no tuning parameters available in Domino specifically targeted at SAN implementations. Therefore, to get the best performance it is necessary for you to optimize the SAN environment for Domino. When you configure a SAN for use with Lotus Domino, follow these recommendations, and consult with your SAN, NAS, or drive vendor about how to implement them.
Disk performance
It may seem obvious that hard drive performance is a major contributor to overall I/O throughput, because faster drives complete disk I/O in less time. Although there are others, the most significant components of the time it takes a disk drive to execute and complete a user request are queuing time, then seek time, and finally rotational latency. Queuing time is the time from the I/O request being made by the application to the operating system until the disk subsystem receives that request and starts to act on it. Seek time is the time it takes to move the drive head from its current cylinder location to the target cylinder; average seek time is usually 3-5 ms for current drives in use today. Once the head is at the target cylinder, the time it takes for the target sector to rotate under the head is called the rotational latency. Average latency is half the time it takes the drive to complete one rotation, and so it is inversely proportional to the revolutions per minute (RPM) value of the drive:
- 15,000 RPM drives have a 2.0 ms average latency
- 10,000 RPM drives have a 3.0 ms average latency
- 7,200 RPM drives have a 4.2 ms average latency
- 5,400 RPM drives have a 5.6 ms average latency
Choosing drives with a combination of low seek time and high RPM will provide the best performance at the disk level.
RAID strategy
Consult with your SAN vendor about how to configure the physical drives in the SAN device. Depending on your vendor's implementation of the disk array, you may or may not need to consider RAID strategy. It is important that disks be configured for best performance, while still considering reliability and data integrity. What follows are our best practices for configuring local disks for Domino's use.
Your RAID strategy should be carefully selected because it significantly affects disk subsystem performance. For best performance for Domino data, use dedicated disks in RAID 1/0 (striped mirrors). (Although RAID 5 is also acceptable, it carries roughly a 20% overhead because it has to write an additional parity block as well as all the original data.) While using multiple logical drives on a single physical array may be convenient, it can significantly reduce server performance. The fastest configuration is a single logical drive for each physical RAID array. If you have a requirement to partition your data, you should configure multiple RAID arrays instead of configuring multiple logical drives in one RAID array.
The number of disk drives in an array also significantly affects performance, because each drive contributes to the total throughput. In practice it has been found that where the average queue depth for a logical drive is much greater than the number of drives in the array, adding drives is likely to improve performance.
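To make these two rules of thumb concrete, here is a minimal Python sketch (ours, not the paper's). The latency formula follows directly from the definition above; the 2x queue-depth threshold is an illustrative assumption, since the text says only "much greater than the number of drives":

    def avg_rotational_latency_ms(rpm):
        # Half of one full rotation, in milliseconds.
        return (60_000.0 / rpm) / 2.0

    def likely_to_benefit_from_more_drives(avg_queue_depth, drives_in_array):
        # Heuristic: a queue depth much greater than the drive count suggests
        # the array is saturated and more spindles would help.
        return avg_queue_depth > 2 * drives_in_array  # 2x threshold is our assumption

    for rpm in (15_000, 10_000, 7_200, 5_400):
        print(f"{rpm:>6} RPM -> {avg_rotational_latency_ms(rpm):.1f} ms average latency")

    print(likely_to_benefit_from_more_drives(avg_queue_depth=24, drives_in_array=6))  # True

Running the loop reproduces the table above (2.0, 3.0, 4.2, and 5.6 ms).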
With RAID 0 (striping) and RAID 5 (striping plus parity) technology, data is striped across an array of hard disk drives. Striping is the process of storing each data block across all the disk drives that are grouped in an array. The granularity at which data from one file is stored on one drive of the array before subsequent data is stored on the next drive is called the stripe unit, or interleave depth. The selection of stripe size affects performance: in general, the stripe size should be at least as large as the average disk I/O request size generated by the application. As a rule of thumb, the stripe size used for Domino data should be between 32 and 64 KB.
Cache memory
There are two types of cache memory used in the data path between the Domino server and the disk:
- Random access memory (RAM), such as in the host operating system (main RAM). RAM is volatile and will lose data on power loss or system crash.
- Non-volatile random access memory (NVRAM), which is typically used in the SAN or NAS disk storage device and, in some cases, in the host disk (RAID) controller. NVRAM content is preserved in the case of system crash or power loss.
In order to preserve the integrity of the Domino databases, Domino forces the flushing of the RAM cache maintained by the operating system at strategic points. This flushing ensures that all write operations to databases are committed to non-volatile memory or physical disk. Note that the write cache on many disk drives is configurable, but because it is a RAM cache it should not be used: the flush operation that Domino uses only flushes the operating system RAM caches and does not extend to the drive caches.
The SAN, NAS, or disk subsystem may have NVRAM cache memory in the write path to the underlying disk. The use of this cache is fully supported and recommended. The NVRAM cache will significantly improve Domino performance by returning a write-completed response once the data reaches the cache, rather than later, after it has actually been physically written to disk. The use of NVRAM actually makes disk subsystem writes faster because, once the blocks are cached, the disk subsystem's operating system can inspect the blocks waiting to be written and sort them into a more logical write order in terms of cylinders and heads, reducing head movement and preventing the early onset of disk thrashing under high I/O.
The cache within the SAN is one of the primary ways for a SAN to improve its performance. The SAN reads and writes directly to the cache, so it is important to monitor this and make sure you have enough. As a general rule of thumb you can never have too much of this cache, and you should configure the SAN with as much cache as you can afford. Dedicate as much of it as possible as write cache; this helps to prevent slow write requests, delays that can cause performance issues in the Domino environment.
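As a rough illustration of why acknowledging at the cache matters, here is a toy Python model (ours, with made-up but plausible numbers: roughly 0.1 ms to land a small write in NVRAM cache, versus seek time plus rotational latency for a physical write):

    SEEK_MS = 4.0        # average seek time (the paper cites 3-5 ms)
    ROTATION_MS = 3.0    # average rotational latency of a 10,000 RPM drive
    CACHE_ACK_MS = 0.1   # assumed time to land a 4 KB write in NVRAM cache

    def write_latency_ms(nvram_cache):
        if nvram_cache:
            return CACHE_ACK_MS          # ack once the data reaches the cache
        return SEEK_MS + ROTATION_MS     # ack only after the physical write

    for cached in (False, True):
        print(f"NVRAM cache={cached}: ~{write_latency_ms(cached):.1f} ms per write")

Under these assumed figures the cached write returns roughly 70 times sooner, which is the effect the write cache recommendation above is aiming for.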
Cache memory on the RAID controller is shared for read and write operations. Careful consideration should be given to the configuration of the caching capabilities of the controller, because they can substantially enhance the effective I/O capacity of the disk subsystem.
Read and write caching
Read caching can affect read performance, and an incorrect setting can have a large negative impact. The principle of read caching is to store additional sequential blocks in cache following a read request, on the assumption that they are likely to be required as well. For sequential workloads, this results in fewer but larger I/O transfers (between disk and cache) to handle the same amount of data, which leads to an increase in performance. If, however, the workload is random, then read-ahead caching should be disabled, because the data blocks that are pre-fetched with each read request are rarely needed, and performance suffers. The file I/O read/write pattern in Domino is database dependent, but in almost all cases it follows a random read/write pattern. Therefore, read-ahead caching should be disabled or set to a minimum.
Write caching means the data is not written straight to the disk drives but to the cache; it is then the responsibility of the cache controller to eventually flush the unwritten cache entries to the disk drives. Because the slowest operation a disk can perform is moving its heads, the write cache shows a greater improvement to overall performance for random writes to the file system than for sequential writes. The write caches of both types, RAM and NVRAM, allow the OS, SAN, NAS, and RAID controller to plan the order of cache-to-disk writes to minimize head movement. Because of the Domino cache flush operations, the RAM cache will show significantly less performance benefit than the NVRAM cache. However, a large OS I/O cache buffer allows blocks, once read by Domino, to be retained longer; if a block is dropped from the Domino buffer (which has limited capacity because Domino is a 32-bit process) and Domino later needs that block again, it can be handed back from the OS cache without a real I/O to disk.
For NAS and SAN systems, there can be latency between I/O request and I/O delivery because of the nature of NAS and SAN and their respective networks and devices. Improvements in SAN and NAS technology mean that newer devices generally perform better than older ones, so consult with your vendor for performance data for your system. To minimize the amount of physical I/O needed, it is highly recommended to configure a larger operating system I/O cache than would be required by a system with only locally attached disks.
It should be noted that transaction log files have the RAM cache disabled (at a file level) but will benefit from NVRAM caches. The NVRAM write cache is always able to improve performance over a system without one by optimizing the order of blocks written, minimizing unnecessary head movement and taking the best advantage of the disk's current rotational position at all times.
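To illustrate the reordering idea (our sketch, not Domino's or any vendor's actual algorithm), a simple elevator-style sort of pending writes by cylinder reduces total head travel compared with servicing them in arrival order; the cylinder addresses below are made up:

    def head_travel(start, cylinders):
        # Total cylinders traversed when visiting requests in the given order.
        travel, pos = 0, start
        for c in cylinders:
            travel += abs(c - pos)
            pos = c
        return travel

    pending = [830, 12, 405, 77, 610, 240]   # made-up cylinder addresses
    head = 300

    fifo = head_travel(head, pending)
    # One-directional sweep: ascending targets first, then the rest descending.
    up = sorted(c for c in pending if c >= head)
    down = sorted((c for c in pending if c < head), reverse=True)
    scan = head_travel(head, up + down)

    print(f"arrival order: {fifo} cylinders, elevator order: {scan} cylinders")

With these sample addresses the sweep covers 1,348 cylinders against 2,972 for arrival order, which is why a cache that can reorder writes reduces head movement.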
The use of operating system RAM write cache is not always a win. When Domino performs a flush operation against a cached file, the operating system must examine the RAM cache pages, both read and write, to determine which pages have been modified. If you have a 10 GB database on a system with a large amount of RAM cache, and the database resides largely in the RAM read cache, then finding which pages have been modified can be a costly operation. The Domino development team has worked with operating system vendors to develop more efficient algorithms; in one case, the time to flush the cache was reduced from 0.5 seconds to microseconds.
Sizing the cache correctly has two aspects. In all cases, NVRAM cache will help Domino performance; however, as the NVRAM cache grows there comes a point where adding more yields little additional benefit. On most storage systems the cache utilization can be monitored, which can indicate whether the cache is correctly sized. In most operating systems, you will have limited ability to control how much RAM is used as a cache. If your operating system can control the RAM cache (often called the file system cache) and has not yet implemented a fast dirty-page cache scan algorithm, then you may be able to improve performance by adjusting the cache size. Remember that adjusting the file system RAM I/O cache is system-wide in scope and will affect everything running on the system.
Device drivers
Device drivers and firmware play a major role in the performance of the subsystem with which the driver is associated. A device driver is software written to recognize a specific device; most device drivers are vendor-specific and are typically supplied by the hardware vendor. The firmware and configuration are stored on the disk controller itself. Setting the optimal configuration and using the most recent version of the vendor's microcode is often the source of significant performance improvements. Wherever practical, you should maintain your servers with the latest driver and firmware versions certified by your hardware and/or application vendors. Refer to your vendors' support sites for information about which driver or firmware is best to run, and seek specialist help from your vendor before changing configuration parameters. The default settings are usually optimal in the majority of situations; changes may have unexpected and adverse consequences, but when done correctly they can enhance performance and reliability. For drives that connect via the network, network card drivers and BIOS are also important. Check with your network card vendor to ensure you have the best, most reliable drivers and BIOS for your specific hardware.
The best scenario is a dedicated, non-switched Fibre Channel connection directly to the SAN for each Domino partition (DPAR). Where the number of Domino partitions is small (fewer than three), a host bus adapter (HBA) controller should be dedicated to each Domino partition. However, as the number of Domino partitions sharing the same
operating system increases, the ability to leverage scaling increases. For resiliency and high availability, in all cases a minimum of two separate Fibre Channel cards should be used, configured for full load balancing and set to fail over in either direction without a break in service should either host bus adapter (HBA) fail or a Fibre Channel cable break. Most modern Fibre Channel HBAs are dual port, giving four HBA paths to the data server, which at 2 Gb/s each should provide sufficient throughput for three busy Domino servers.
Mounting a NAS/NFS device
If the operating system supports mounting NFS devices with local locking, then this can be enabled and is supported with Domino. On UNIX, this is the mount -o llock setting; this flag can significantly reduce the read traffic to the NAS device.
Optimizing Domino performance
Databases that you create in Lotus Domino 6.5 perform considerably better than databases created in previous releases. In 6.5, database operations require less I/O and fewer CPU resources, view rebuilding and updating are quicker, and memory and disk space allocation is improved.
If your server has sufficient memory, you can improve the performance of the server by increasing the number of databases that Lotus Domino can cache in memory at one time. To do so, use the NSF_DbCache_Maxentries setting in the NOTES.INI file. The default value is 25 or the NSF_Buffer_Pool_Size divided by 300 KB, whichever is greater, with the maximum being approximately 10,000. To determine whether increasing this parameter will yield better performance, monitor the Database.DbCache.Hits statistic on your server. This statistic indicates the number of times a database open request was satisfied by finding the database in cache; a high value indicates that the database cache is working effectively. If the ratio of Database.DbCache.Hits to InitialDbOpen is low, consider increasing NSF_DbCache_Maxentries. To set the number of databases that a server can hold in its database cache at one time, set the NOTES.INI value as follows: NSF_DbCache_Maxentries=[number]
Transaction logging and SANs
Domino transaction logging captures all changes made to Notes databases (*.nsf files) and writes them to a transaction log before writing them to the actual .nsf file. The logged transactions are written to disk immediately, as fast serial writes to a series of sequential files of 64 MB in length, in 4 KB blocks. There will be a few larger blocks, but almost all will be 4 KB. Note also that the transaction log file is opened in a synchronous mode, whereas all other files used by Domino are opened in a buffered mode. Therefore, transaction log write operations do not use any RAM cache, but they do take advantage of NVRAM caches. The output to the transaction logging disk is almost entirely sequential writes, except during restart or recovery operations. Thus the write-to-disk performance profile and overall reliability are the keys to a successful configuration of transaction logging for a Domino server.
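As an illustration of this I/O profile (a minimal Python sketch under our own assumptions; Domino manages its own log extents, and the file name here is made up), synchronous, fixed-size sequential writes on a POSIX system look like this:

    import os

    EXTENT_SIZE = 64 * 1024 * 1024   # transaction log extents are 64 MB files
    BLOCK_SIZE = 4096                # almost all log writes are 4 KB blocks

    # O_SYNC opens the file synchronously, so each write bypasses the OS RAM
    # write cache and completes only once the storage stack accepts it --
    # analogous to the synchronous mode Domino uses for its log files.
    fd = os.open("example_extent.txn", os.O_WRONLY | os.O_CREAT | os.O_SYNC)
    block = b"\x00" * BLOCK_SIZE
    for _ in range(EXTENT_SIZE // BLOCK_SIZE):
        os.write(fd, block)          # strictly sequential, fixed-size writes
    os.close(fd)

A workload like this gains nothing from OS read-ahead or RAM write caching, but benefits directly from NVRAM cache and from fast sequential-write media, which is what the best practices below optimize for.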
Best practices for transaction logging
Because data loss or corruption in the transaction logs can affect all databases on the Domino server, the decision of where to place the logs is of paramount importance. Transaction logging has been measured to add up to 30% additional disk I/O, so it is extremely important that the transaction logs sit on the most reliable, highest-performing disk subsystem available on the system. In some cases this is local disk; in other cases it is a SAN device.
On NAS devices there is a difference between placing the transaction logs on a NAS/NFS or on a NAS/CIFS storage system. For NAS/NFS there is potential for performance problems unless you make sure the disks used for transaction logging perform as fast as possible; where NAS/NFS performance is slower than Domino requires, as seen in some older systems, we have recommended placing transaction logs on local disks. On NAS/CIFS there are potential problems with data loss or corruption as well as performance. Placing either transaction logs or transaction-logged databases on NAS/CIFS is not supported.
Most SAN or NAS systems have additional features that can augment backing up the transaction logs and databases by using snapshot-type operations. Taking advantage of this feature can reduce the cost of backups, reduce the time to perform a backup, and improve the granularity of the backups.
Transaction logging is a very different kind of disk I/O compared to regular database I/O: it primarily performs sequential writes, versus Domino's normally random I/O. For this reason, it is important to put the transaction logs on their own dedicated disks, with a dedicated path to those disks as well. The transaction logs should be on a separate physical drive to maximize the I/O write potential that transaction logging requires; it is not sufficient to simply redirect the logs to a separate partition or a separate logical drive. In general, if the transaction logs are on a separate drive, a 10-20% improvement should be seen; however, if the logs are put on the same drive, approximately 60% degradation is likely.
Important recommendations for using SAN or NAS for transaction logging:
- Use a separate file system, separate pathway, and separate disks for the transaction logs.
- Use RAID 1/0 (striped mirrors) or a mirrored pair (RAID 1), rather than RAID 5.
  o We recommend a mirrored disk set for transaction logs; that is, RAID 1 or RAID 1/0, because these provide the best performance and reliability when you must use a RAID strategy for sequential writes.
  o This depends on the vendor's recommendations, because some SAN/NAS hardware may do mirroring internally.
- Use the fastest, most reliable disks available.
- Configure the device with a hot spare available in case a disk physically fails.
- Do not share the disk controller (SAN and NAS) with any other users, if possible.
- If using a SAN/NAS or separate disk system, consider the following:
  o Use a larger disk block size and a matching stripe size (transaction logging writes fixed sequential 4 KB blocks to files of 64 MB or greater).
  o Because the transaction log files are opened in a synchronous mode, the OS file system cache is not used; NVRAM cache in the disk subsystem helps.
  o Use 2 Gb/s Fibre Channel rather than 1 Gb/s.
- Have dedicated channels and avoid using data switches.
- Make sure you have adequate I/O capacity for transaction logging.
Placement and type of transaction logging
If the logs are placed on a SAN or NAS, they should be placed on dedicated devices within the SAN or NAS. Each DPAR should have its own HBA connection to the SAN, and the use of switches should be avoided. The performance of data transfer from the server must be monitored closely to ensure optimal transfer of data into the SAN or NAS: the speed at which transactions are committed into the logs will largely determine the performance of the server as a whole.
If you have configured the Domino servers for failover and/or load balancing, we recommend the following configuration:
- Run circular or looping linear style transaction logging on user-facing servers for optimal performance and faster recovery after an outage.
- Run archival style transaction logging on the non-user-facing cluster mate to perform backup or restore activities.
- If possible, use a cluster member or a separate offline system for recovery or restore activities.
Summary
As described in this paper, you can use a Lotus Domino server in conjunction with storage area network (SAN) and network attached storage (NAS) technologies. Careful planning, with these best practices and recommendations in mind, can lead to a successful implementation.
Resources and references
- Storage Networking Industry Association (SNIA): http://www.snia.org
- Introduction to Storage Area Networks, IBM Redbook SG24-5470-01: http://www.redbooks.ibm.com/abstracts/sg245470.html
- Tuning IBM eServer xSeries Servers for Performance, IBM Redbook SG24-5287-03: http://www.redbooks.ibm.com/abstracts/sg245287.html
- IBM SAN Survival Guide, IBM Redbook SG24-6143-01: http://www.redbooks.ibm.com/abstracts/sg246143.html
- Using Lotus Domino with Network Appliance storage products: http://www.ibm.com/support/docview.wss?rs=463&uid=swg27005236
- IBM network attached storage (NAS) products home page: http://www.ibm.com/systems/storage/nas/
Copyright International Business Machines Corporation 2005, 2007. All rights reserved.