CommuniGate Pro White Paper
Dynamic Clustering Solution for Reliable and Scalable Messaging
Date: April 2002

Modern E-Mail Systems: Achieving Speed, Stability and Growth

E-mail becomes more important each day as a method of global communication. According to a report from technology analysts IDC, on an average day in 2000, 9.7 billion e-mail messages were sent worldwide. By the year 2005 that number is projected to grow to 35 billion. Along with the increase in message size and traffic comes an increase in users' expectations. People send e-mail to communicate critical information across continents and time zones: they expect their information to get through. E-mail glitches are high-profile events, generating negative press for the organization, frustrated users, loss of productivity, and, most importantly, loss of business.

To meet the increasing demands of their users, modern Service Providers and IT departments face these major issues:

- The complexity of e-mail access operations grows every day (POP, IMAP, Web, Wireless).
- The solution must provide carrier-grade availability, meeting the "five nines" (99.999%) uptime requirement.
- The solution must scale to support future growth.

In reality, none of these issues can be addressed with a single-server implementation: clustering solutions are a must for modern large-scale installations. A clustering architecture, loosely defined, is multiple computers linked together to handle variable workloads or to provide continuous operation. As IT executives are quick to point out, however, building e-mail server clusters is much more complicated than building Web server clusters because of the high data modification rate, complicated data processing, and variety of access methods (POP3, IMAP4, WebMail, etc.) that must be supported. This paper explores various clustering methods to help determine which work best to keep your large-scale e-mail system fast, accessible and ready for growth.

Static Clusters

One solution commonly used by older e-mail software (developed in the mid-1990s) offered scalability by partitioning user accounts.
This architecture distributed the load of e-mail processing by spreading users across multiple machines. While each server in a cluster had its accounts created on its local disks, a Directory server kept the location information for each account, and some sort of mail multiplexor, mail router, or front-end server directed user connections to the proper server based on the Directory information.

Load Balancing but No Uptime Guarantees

While this type of distributed solution, often called a Static Cluster, could scale
effectively (limited mostly by the performance and stability of the Directory server), it did not improve site availability. A failure of any server in the cluster would make all of its accounts unavailable. This architecture was also difficult to maintain: as the system grew, administrators had to schedule downtime to move mailboxes and rebalance the load. These systems also made no provision for efficient disk usage, such as a single-copy message store.

[Figure: Static Cluster Architecture]

Early NFS (Network File System) Clusters

Systems developed more recently tried to address the high-availability issue by using a Shared File System (such as NFS file servers) with several e-mail servers running in parallel. The multiple e-mail servers accessed account and domain data through the Shared File System, as opposed to static clustering, where accounts resided on specific servers. This configuration allowed sites to provide access to all account data even if some of the e-mail servers crashed.

Slow/Unreliable

While a Shared File System might offer uninterrupted access to data if one of the servers failed, the performance and reliability of the messaging system depended heavily on the performance of the storage. In systems like e-mail, with high I/O demands, file access protocols (like NFS) sometimes caused instability and slow performance. Traditional data synchronization methods were based on file system locks, which had several disadvantages:

- Slow: the file-locking procedure for each I/O operation produced very large overhead (up to 400%).
- Account corruption: when two processes were initiated simultaneously and one process did not release the lock in time, the mail account became corrupted.

Inefficient Use of Storage

In an attempt to decrease the negative effects of file locking, some mail servers supported only the MailDir (Mdir) mailbox format (one file per message), and then relied on the "atomic" nature of file directory operations rather than on file-level locks. This approach could theoretically solve some of the outlined problems, but it wasted much of the file server storage. Many high-end file servers used 64K blocks for files, while an average mail message is about 4K in size. Storing each message in a separate file wasted more than 90% of the file server disk space and overloaded internal file tables.

Difficult to Administer

Simple NFS clustering did not provide any additional features for administrators (like a Single Server Image), so administering a 10-server cluster would be much more difficult than administering 10 independent servers.

Dynamic Cluster Architecture - Solution for Speed, Uptime and Growth

Today, vendors of modern messaging software must develop solutions that support cluster architecture while improving the performance and stability of the storage system. CommuniGate Pro from Stalker Software Inc. is a product designed with the demands of dynamic, large-scale messaging systems in mind. In the CommuniGate Pro Dynamic Cluster architecture, the software itself helps to optimize the performance and stability of the system by removing the inefficiencies associated with file-locking methods. Because the CommuniGate Pro server does not use file-locking mechanisms, it works 3-5 times faster with file systems and does not suffer from the problems associated with file locks. A CommuniGate Pro Dynamic Cluster can run with NAS (Network Attached Storage) such as NFS file servers, or take advantage of CFS (Cluster File Systems).

What Is a Dynamic Cluster?

The Dynamic Cluster solution operates as a group of front-end and back-end servers that access the domain and account data on the Shared File System. The front-end servers perform mail relaying operations and SSL encryption/decryption, and protect the back-end servers from all possible types of TCP/IP attacks (including DoS attacks). The back-end servers provide access to the domain and account data.

Load Balancing

The two-tier architecture enables multi-level load balancing. Traditional traffic-based methods (such as Layer 4 switches) and DNS round robin are used to distribute incoming connections among front-end servers. The tunable load-based methods built into the Dynamic Cluster software distribute load among back-end servers.
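
The two levels of load balancing described above can be illustrated with a small model. This is only a sketch under stated assumptions: the class name, the server names, and the "fewest open sessions" policy for back-end servers are hypothetical, since the paper does not specify how the built-in load-based methods choose a back-end.

```python
from itertools import cycle


class TwoTierBalancer:
    """Toy model: round robin across front-ends, least-loaded across back-ends."""

    def __init__(self, frontends, backends):
        # Traffic-based tier: connections simply rotate over the front-ends.
        self._frontend_cycle = cycle(frontends)
        # Load-based tier: track open sessions per back-end (assumed metric).
        self.backend_load = {name: 0 for name in backends}

    def pick_frontend(self):
        # In a real site, DNS round robin or a Layer 4 switch makes this choice.
        return next(self._frontend_cycle)

    def pick_backend(self):
        # Assumed policy: choose the back-end with the fewest open sessions.
        name = min(self.backend_load, key=self.backend_load.get)
        self.backend_load[name] += 1
        return name

    def release_backend(self, name):
        # Called when a session on the given back-end ends.
        self.backend_load[name] -= 1


balancer = TwoTierBalancer(["fe1", "fe2"], ["be1", "be2", "be3"])
frontends = [balancer.pick_frontend() for _ in range(4)]
backends = [balancer.pick_backend() for _ in range(4)]
```

The point of the model is the separation of concerns: the front-end choice needs no knowledge of server state, while the back-end choice depends on a load figure that only the cluster software can see.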

Speed and Stability

No File Locks

The CommuniGate Pro Dynamic Cluster solution implements Account Level Synchronization with a Cluster Controller, using a special inter-server protocol to ensure that an account is used directly by only one cluster member at any given moment. If cluster server A has opened account X and some other server B tries to open the same account, the Controller instructs server B to connect to server A and use it as a proxy to access the account data. Synchronization within each cluster member is implemented using faster multi-threading techniques instead of file system locks.

Using this fast inter-server protocol, CommuniGate Pro controls multiple client access to any mailbox served by the cluster without requiring file locks. Every mailbox can be opened with several IMAP, POP, and WebMail sessions simultaneously, by the same user or by different users. This allows the cluster to support modern multi-session IMAP clients and provides shared mailboxes for group collaboration.

99.999% Uptime

The multi-level, multi-server design allows the system to survive the failure of any member. If the Cluster Controller server fails, some other cluster member assumes the role of the Controller. Such a Dynamic Cluster meets or exceeds 99.999% availability requirements. With CommuniGate Pro Dynamic Cluster technology, even if several servers fail, all users can access their e-mail as long as at least one front-end and one back-end server are operational.

Growth

The unique CommuniGate Pro Dynamic Cluster architecture provides almost unlimited scalability. Customers can easily increase capacity and services without interrupting end users. They simply add front-end and/or back-end servers to the cluster at any time, with no switch-off/recovery operations required. This enables support for a virtually unlimited number of users with 99.999% availability.
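
The Account Level Synchronization rule described under No File Locks (one cluster member uses an account directly, every other member goes through it as a proxy) can be sketched as a small in-memory model. Everything named here is hypothetical: the class, its methods, and the `server_failed` handling are assumptions made for illustration, not the actual CommuniGate Pro inter-server protocol.

```python
class ClusterController:
    """Toy controller: remembers which cluster member has each account open."""

    def __init__(self):
        self.owner = {}      # account -> server that opened it first
        self.sessions = {}   # account -> number of open sessions

    def open_account(self, server, account):
        """Return ("direct", server) or ("proxy", owning_server)."""
        current = self.owner.setdefault(account, server)
        self.sessions[account] = self.sessions.get(account, 0) + 1
        if current == server:
            return ("direct", server)
        return ("proxy", current)

    def close_account(self, account):
        """Close one session; forget the owner once no sessions remain."""
        self.sessions[account] -= 1
        if self.sessions[account] == 0:
            del self.sessions[account]
            del self.owner[account]

    def server_failed(self, server):
        """Assumed behavior: drop in-memory state held for a failed member."""
        for account in [a for a, s in self.owner.items() if s == server]:
            del self.owner[account]
            self.sessions.pop(account, None)


controller = ClusterController()
first = controller.open_account("A", "X")    # server A opens account X directly
second = controller.open_account("B", "X")   # server B is told to proxy through A
```

Because ownership lives only in the controller's memory in this model, nothing has to be cleaned up on disk when a member disappears; the next server to open the account simply becomes its owner.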

Flexible Storage Options

As previously mentioned, the Dynamic Cluster architecture requires a Shared File System so that cluster members can work with the same data files at the same time. The most popular and well-known implementation of a Shared File System is a file server, called NAS (Network Attached Storage). CommuniGate Pro Dynamic Cluster also works with an alternative architecture, called SAN (Storage Area Network), which a growing number of companies use to implement Shared File Systems.

Dynamic Cluster Architecture with NAS

The CommuniGate Pro Dynamic Cluster architecture allows all account and domain data to be stored on a network file server, without the drawbacks previously mentioned in the Early NFS Clusters section.
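
One of those drawbacks, the storage wasted by one-file-per-message formats, can be checked with simple arithmetic using the figures quoted earlier (64K blocks, 4K average message). The sketch below assumes each file occupies a whole number of blocks and ignores directory and metadata overhead; the function name is hypothetical.

```python
import math


def wasted_fraction(message_bytes, block_bytes):
    """Fraction of allocated space left unused when one message is stored per file."""
    # A file always occupies at least one block, rounded up to whole blocks.
    blocks = max(1, math.ceil(message_bytes / block_bytes))
    allocated = blocks * block_bytes
    return (allocated - message_bytes) / allocated


KB = 1024
waste = wasted_fraction(4 * KB, 64 * KB)
print(f"{waste:.2%}")  # 93.75%
```

A 4K message in a 64K block leaves 60K of the block unused, which is where the "more than 90%" figure comes from.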

Speed

Released from file-locking duties, the storage devices can handle a much larger number of accounts, increasing stability and performance. Unlike unreliable file locks, Account Level Synchronization does not leave a trace in the file system, so no cleanup is needed if any server fails.

Efficient Use of Disk

Because there is no longer the same risk of mailbox corruption, organizations can store e-mails in many formats. The single-file Mbox format requires much less storage than the one-file-per-message Mdir format, which can waste as much as 90% of the space on high-end file servers.

[Figure: Dynamic Cluster Architecture (using file server)]

Dynamic Cluster Architecture with CFS (Cluster File System)

As an alternative to NAS, many modern operating systems provide advanced clustering capabilities. A Cluster File System allows all servers in an OS cluster to mount and use the same file system(s) on shared devices. Unlike Network File Systems (NFS), Cluster File Systems do not require a dedicated server on the network. Cluster File Systems can utilize multiple SCSI connections provided with high-end SCSI storage devices, and they allow each server to exchange the data directly with storage
devices via a fiber SAN (Storage Area Network). To ensure file system integrity, Cluster File Systems use high-speed server interconnects.

Speed and Reliability

The CommuniGate Pro software and the Cluster File System work together to allow quick access to account and domain data. CommuniGate Pro's Account Level Synchronization ensures that only one server deals with an account at any given moment. When this server closes the account, the user may reconnect quickly and be routed through another server. The second server would then begin to access the account data. At that point it is essential that all changes the first server made to the account data be "visible" to the second server. This is where the Cluster File System does its part: by then it has synchronized all the caches and file allocation tables.

There are some advantages to running a Dynamic Cluster on an actual cluster OS compared to using a shared file server:

- Performance: the SAN protocols are very effective for file transfers.
- Reliability: Cluster File Systems provide better reliability than single-server NFS solutions (where the NFS server is a single point of failure).
- Price: even a low-end file server is an additional hardware expense. While cluster software is not free, it can be less expensive.

[Figure: Dynamic Cluster Architecture (using Cluster File System)]

Dynamic Cluster Administration

Whether you choose NAS or SAN storage for your CommuniGate Pro Dynamic Cluster, the architecture does not require additional administrative overhead. Because of the highly integrated nature of Dynamic Clusters, the Web-based, API, and SNMP
administration interfaces present the entire cluster as a Single Server Image. Cluster management is as simple as single-server management: administrators connect to any cluster member and view or modify cluster-wide settings.

Dynamic Cluster: Speed, Stability and Growth

CommuniGate Pro's unique Dynamic Clustering functionality sets it apart from other solutions by allowing organizations to provide fast, reliable, advanced messaging services. The multi-server architecture always appears as one seamless messaging system to end users and administrators. Currently, Stalker Software's CommuniGate Pro is the only e-mail solution on the market providing a Dynamic Cluster architecture based on Account Level Synchronization. For organizations requiring 99.999% uptime and unlimited growth, it is the best choice.

For more information, visit Stalker Software Inc. on the Web at http://www.stalker.com or contact them at (800) 262-4722 or (415) 383-7164.