A Performance Study on Internet Server Provider Mail Servers

Jun Wang
Computer Science & Engineering Department, University of Nebraska-Lincoln, Lincoln, NE

Yiming Hu
Department of Electrical & Computer Engineering and Computer Science, University of Cincinnati, Cincinnati, OH

Abstract

This paper presents a comprehensive performance study of an Internet Service Provider (ISP) mail server, which plays an important role in Internet-based distributed computing. By feeding the SPECmail2001 benchmark into an ISP mail server testbed built on a commercial mail server software system, MDaemon 5.0.1, we study both networking and I/O performance by varying its user population from to 10,000. The benchmark utilities are adopted for networking analysis, while file system traces are collected for I/O measurement. Based on the benchmark study and offline trace analysis, we arrive at several important conclusions for ISP mail servers and give corresponding technical suggestions to improve performance. First, we observed that in SMTP and POP sessions the initial network connection setup step usually takes a very long time. Second, I/O latencies typically account for 40 to 55% of the total data transfer time in mail requests, especially in a server supporting a large user population. Third, a group of messages directed to the same destination can easily overload a remote recipient server.

1. Introduction

Mail servers play an important role in today's information society. As the Internet user population continues to grow exponentially, there is an increasing demand for high-performance mail servers. Internet Service Providers (ISPs) provide a majority of such mail services. Unfortunately, few results have been published on studying and analyzing mail servers. One possible reason is that it is difficult, if not impossible, to explore real-life mail service systems because of privacy and security concerns.
In order to understand the impact of modern computer system architecture on ISP mail server performance, we set up an experimental framework and conducted a performance evaluation of a state-of-the-art ISP mail server, specifically in networking and I/O. For the experiments, we chose SPECmail2001 as a standard mail service benchmark and made some modifications to emulate today's real-life service environment. The enterprise version of MDaemon is installed as the ISP mail server under test. Mail systems can be evaluated in different dimensions, such as performance and scalability (in terms of the number of users and per-user data storage limits), data persistence, and fault tolerance. In this paper we emphasize the aspect of performance.

This paper makes the following contributions. First, we developed an experimental framework to study ISP mail server performance, which gives a detailed picture of the network behavior and I/O performance of ISP mail servers. Second, we conducted a comprehensive performance analysis of an ISP mail server and give technical suggestions on how to improve its performance. Specifically, we arrive at several major observations and propose some schemes for performance optimization.

2. Evaluation Methodology

2.1. The Mail Server and Benchmarks

There are many types of mail server systems on the market, from both academia and industry. We chose one of the most popular commercial products, MDaemon from Alt-N Technologies Inc., as the ISP mail server under test. MDaemon provides both ISP and enterprise services. It is a versatile mail server that supports the POP3, SMTP and IMAP protocols. Its friendly configuration interface allows us to flexibly vary system input parameters such as the maximum level of concurrency. As the first stage of the experiments, we installed MDaemon as an ISP mail server on the Windows 2000 operating system.

Currently there are very few standard mail server benchmarks.
We chose SPECmail2001 as the major mail server benchmark; it is developed by SPEC, one of the leading benchmark organizations. SPECmail2001 is a standard mail server benchmark designed to measure a system's ability to act as a mail server serving mail requests, based on the Internet standard protocols SMTP and POP3. This benchmark characterizes the throughput and response time of a mail server system under test based on realistic network connections, disk storage, and client workloads. The benchmark design focuses on ISP mail servers rather than the enterprise class of mail servers, with an overall user count in the range of approximately 100 to 1,000,000. The goal of SPECmail2001 is to enable objective comparisons among different kinds of mail server systems. We study only the POP3 and SMTP protocols in this paper, since the SPECmail2001 benchmark does not incorporate an IMAP workload at present.

To emulate today's real-life service in ISP mail servers, we modified the default values set in SPECmail2001 based on empirical data. For example, messages sent per user, per day is increased to 10 messages from the original 2, and messages received per user, per day is increased to 10 messages from the original 2. One reason for increasing the number of messages is spam/junk messages. The percentage of using 56 Kbit modems to establish Internet connection is reduced from 9 to. The detailed workload characteristics are shown in Table 1.

Parameter                                             Workgroup Estimate
messages sent per user, per day                       10
average recipients per message                        10
messages received per user, per day                   4
mailbox checks per user, per day                      10
% of mailbox checks which don't download messages     75%
average message size                                  25 KB
% of using modems (56 Kbit)
% of mail to/from remote addresses                    70%

Table 1. The Message Traffic Characteristics

Filemon/EE (i.e., Enterprise Edition), a commercial software package developed by System Internal Inc., is chosen as the file system trace collector; it can monitor file system activity in the Windows NT/2000 OS. It runs as a logging process in the background without affecting normal system behavior.
Every file system event is recorded in a log buffer and written to the disk shortly afterwards.

2.2. Experimental Framework

We set up an experimental framework in a switched Fast Ethernet (100 Mbps) local area network. Three machines from the same small-scale cluster are used to emulate servers and clients in the experiments. The isolated local area network, protected by a firewall, prevents disturbance from external network traffic. One Dell 4500 PC, with a Pentium IV (2 GHz) CPU and 1 GB SDRAM main memory running Windows 2000, acted as the ISP mail server under test. The other two PCs, with Pentium III (800 MHz) CPUs and 512 MB SDRAM memory, functioned as both local clients and remote clients. In addition, one of them worked as a remote mail server (also called the sink server). The experimental system architecture is shown in Figure 1.

[Figure 1. Experiment Framework. A Benchmark Manager / Load Generator (Intel Pentium III 800 MHz, 512 MB RAM, RedHat Linux 6.2) and an SMTP Sink Server exchange POP3 Retrieval, SMTP Store and SMTP Relay traffic, each at 100 Mb/s, with the MDaemon Mail Server 5.0 (Intel Pentium IV 2.0 GHz, 2 GB RAM, Windows 2000), which also runs the Filemon/EE Tracer.]

The MDaemon ISP mail server system runs continuously to provide mail service during the experiments. The two load machines run separate load generator processes and initialize clients to communicate with the ISP mail server. The benchmark management process, which synchronizes the two load generators based on pre-configured parameters, is started on either of the two load machines before the experiments begin. The major operations in the experiments are as follows.

POP Session (Uplink). Clients retrieve e-mails from the mail server.

SMTP Store (Downlink). Both internal users (i.e., from the local area network) and external users (i.e., from remote mail servers) send e-mails to the ISP mail server. These requests are generated by the two load generators. In SPECmail2001, we assume 9 of messages are sent from external users.

SMTP Forward (SMTP Relay). The sink server receives e-mails from the MDaemon ISP mail server.
In SPECmail2001, we assume that 9 of the messages to send are going to external users (i.e., are forwarded to remote mail servers).

The maximum number of users (the user population) a mail server can support is the best metric for studying the scalability of a mail server. This number determines how much load the SPECmail2001 benchmark and its transactions generate. During the experiments, we varied the user population
from to 10,000 and studied both the network and I/O performance of the ISP mail server under different user populations. For the performance study, we modified parts of the SPECmail2001 source code to produce a set of statistical results on the networking process. Since Windows 2000 does not provide tools to measure the performance of the I/O storage subsystem, we installed and ran a commercial software package called Filemon/EE on the mail server to collect file system traces. Analyzing these traces allows us to study the I/O performance of the mail server in detail. To guarantee that experiments ran correctly and traces were collected successfully, we carefully validated all benchmark results and file system traces, confirming that they satisfy all the requirements on quality of service, delivery time and remote delivery defined in SPECmail2001.

3. Performance Evaluation and Analysis

In this section, we analyze networking and I/O performance based on benchmark results and the analysis of file system traces.

Networking Performance

Networking performance is a very important factor for an ISP mail server. To study the network implications for ISP mail server performance, we analyze network traffic in two directions: between the ISP mail server and its users, and between a local mail server and remote mail servers.

Effects of The Number of Users

ISP mail server performance is very sensitive to the number of users. To study the service capacity of the ISP mail server system, we varied the number of users and collected the experimental results of every benchmark run. Table 2 shows the detailed results. In Table 2, we include results for two different kinds of POP sessions: those that download mail from the server and those that do not. The latter is a common case in which a mail client contacts the mail server to check for new mail but does not find any; it derives from a POP Retry session.
We can see that network traffic (the number of attempts) at the ISP mail server becomes heavier as the number of users increases. The average transaction latency also increases. Because the maximum number of concurrent threads in SMTP (or POP) outbound (or inbound) sessions in MDaemon is limited to 500, the ISP mail server is likely to run into network congestion when it serves more users at the same time, especially once the population exceeds a threshold (2,000 in this paper).

The Breakdown of SMTP sessions

Because SPECmail2001 reports only part of the breakdown results for SMTP and POP sessions (mean, minimum and maximum), we modified and added a few lines of Java source code (around 100 lines) in SPECmail2001 to collect timing information for each network transaction, which allows us to conduct an in-depth analysis. Based on our own tools and the benchmark utilities, we collected latency results for every phase of the SMTP store session. The SMTP relay session has similar results and is not included here.

Figure 2 shows the latency of each phase as well as its percentage of the total time in an SMTP session. The SMTP Data phase, in which the body text is transferred to the server, is the most time-consuming part. This phase takes almost 3 of the total latency. Surprisingly, the initial network connection steps, SMTP Connect and SMTP Hello, are also very expensive. Both steps together take more than 3 of the total session time. The remaining steps, including SMTP Mail From and SMTP Quit, have much shorter latencies than the other phases. A long SMTP Hello phase may be related to the expensive TCP/IP protocol, which handles TCP SYN and TCP ACK signals. The expensive SMTP Connect phase derives from the fact that e-mails involve a more complicated protocol (with more round trips) than a simple Web service (i.e., the HTTP protocol). Hence SMTP connections stay open longer than those of simple data transfers. One example is that each message may be sent to one or more recipients.
SPECmail2001 configures 86% of messages to be sent to a single recipient, while the others are sent to up to 30 recipients. The SMTP sender and SMTP receiver may even negotiate over several recipients if some messages get rejected. Such an intricate SMTP protocol results in more overhead in network communications.

We suggest that the initial network overhead of the mail server can be reduced by grouping and reorganizing techniques. Since e-mails are not real-time, we can adaptively schedule the sending sequence for messages within a time window based on their destinations and priorities. Messages from the same user, or sent to the same remote server (even if not to the same user), can be packed together into a super message. Messages with high priority are not delayed. When multiple small messages need to be sent to the same remote server, a single packed super message is formed and sent once to the remote server rather than in many separate transfers. This strategy saves much unnecessary initial overhead, even for different users. Implementation is also straightforward: we can either extend the SMTP protocol with a pair of message pack and unpack operations, or develop an application program to encapsulate and decapsulate messages in a predefined format.

During the experiments, we found a few error messages such as "the SMTP Sink Server is busy, try again later" when running the peak load benchmark. The reason is that, if all outbound SMTP e-mails are directed to the same sink server (simulating a remote server) within a short period,
the sink server easily becomes overloaded. This phenomenon is quite common in real-life services. Most outbound SMTP e-mails from an ISP mail server go to remote mail servers rather than to individual PCs, and can therefore overload a remote recipient server. A possible solution is to introduce a load negotiation and balancing protocol among mail servers in a wide-area network. This protocol helps reorder the sending sequence of outgoing e-mails in one or multiple mail servers if excessive messages would go to the same remote mail server in a short time. For example, since e-mail delivery can be delayed for a reasonable time (this does not include priority e-mails), we could arrange for the senders to send their messages to the remote mail server on a round-robin basis whenever a sender detects that its one-time outbound traffic is too heavy.

[Table 2. The Statistics of Network Transactions. Columns: User Number; SMTP Session (Attempts, Mean Time (ms), Mean Size (KB)); POP Session with Downloads (Attempts, Mean Time (ms), Mean Size (KB)); POP Session without Downloads (Attempts, Mean Time (ms)).]

[Figure 2. The Breakdown of an SMTP session. (a) SMTP Session Breakdown, latency (ms) per phase; (b) The Breakdown of SMTP sessions. Phases: SMTP Connect, SMTP Hello, SMTP Mail From, SMTP Rcpt to, SMTP Data, SMTP Quit.]

The Breakdown of POP sessions

Figure 3 and Figure 4 show the breakdown results for POP sessions. In POP sessions with download, the POP Retrieve phase takes the longest time (about 3) to transfer body text to clients. POP Delete, the second most expensive phase, takes about 20 to 23% of the total time. This is because message deletions are very expensive file I/Os in native file systems. Other phases have relatively short latencies. In POP sessions without download, the latencies of all phases are similar.
The reason is that POP is a much simpler mail retrieval protocol than SMTP. It uses the ASCII standard as the message transfer format, without any marshal/unmarshal operations at all.

I/O Performance

Since mail servers are I/O-intensive applications, it is important to understand what role I/O plays in ISP mail server systems. A typical ISP mail server performs frequent file I/O to maintain a large number of messages. Incoming new e-mails result in file creations, while retrieval requests generate file reads and file invalidations. Outbound (sending/forwarding) e-mails lead to file reads, and to file creations for saving message copies on disk. A large user population is therefore expected to result in more I/O traffic. In the experiments, the MDaemon server adopts NTFS as its mail store. It creates a specific directory for every user and a unique file to store each e-mail. SPECmail2001 generates workloads that are initialized by the number of
users. Such workloads obey a distribution reflecting an average message count of 3.43 messages per mailbox. The average message count per mailbox is a base figure for estimating the size of the mail store. In a mail server experiment supporting ten thousand users, the total message count in the mail store (averaged over time) is 34,300 and the overall raw message data volume is around 860 MB. For every user population from to 10,000, we collected a corresponding file system trace during the benchmark run. The total I/O traffic observed at the ISP mail server scales proportionally with the number of users. In addition, the mail server workload is both file-read and file-write intensive, with a read-to-write ratio of around 3:1. Small messages result in many small file write operations.

[Figure 3. The Breakdown of POP session without mail downloads. (a) POP Sessions Breakdown without Downloads, latency (ms) per phase; (b) The Breakdown of POP Sessions without Downloads. Phases: POP Connect, POP User ID, POP Password, POP Status, POP Retrieve, POP Delete.]

[Figure 4. The Breakdown of POP session with mail downloads. (a) POP Session Breakdown with Downloads, latency (ms) per phase; (b) The Breakdown of POP Sessions with Downloads. Phases: POP Connect, POP User ID, POP Password, POP Status, POP Retrieve, POP Delete.]

The Breakdown of Mail Server Response Time

In order to compare the roles that networking and I/O (i.e., file I/O in this paper) play in an ISP mail server, we explored the breakdown of the total response time of every request. To determine the average file I/O response time in both SMTP and POP sessions, we went through the file system traces and calculated the latency of file operations by checking their corresponding file events, such as file open, file write, file read and file close.
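The trace-based latency calculation described above can be illustrated with a minimal sketch. It assumes a simplified, hypothetical record layout of (timestamp in ms, operation, file path); the actual Filemon/EE log format is not given in this paper, and the names `TRACE`, `file_latencies` and `mean_latency` are illustrative only. Here the latency of a file operation is taken as the time between a file's open event and its matching close event.

```python
from collections import defaultdict

# Hypothetical trace records: (timestamp in ms, operation, file path).
# These values are made up for illustration and are not measured data.
TRACE = [
    (0.0, "OPEN", "user1/msg1.msg"),
    (1.5, "WRITE", "user1/msg1.msg"),
    (2.0, "OPEN", "user2/msg7.msg"),
    (4.0, "CLOSE", "user1/msg1.msg"),
    (5.0, "READ", "user2/msg7.msg"),
    (9.0, "CLOSE", "user2/msg7.msg"),
]


def file_latencies(trace):
    """Pair each OPEN with the next CLOSE of the same file.

    Returns a dict mapping file path to a list of open-to-close
    durations in milliseconds.
    """
    open_time = {}                 # path -> timestamp of the pending OPEN
    latencies = defaultdict(list)  # path -> list of completed durations
    for timestamp, operation, path in trace:
        if operation == "OPEN":
            open_time[path] = timestamp
        elif operation == "CLOSE" and path in open_time:
            latencies[path].append(timestamp - open_time.pop(path))
    return dict(latencies)


def mean_latency(trace):
    """Average open-to-close duration over all files in the trace."""
    spans = [d for durations in file_latencies(trace).values() for d in durations]
    return sum(spans) / len(spans) if spans else 0.0
```

Interleaved events for different files are handled because pending opens are tracked per path. A real trace would also need to distinguish concurrent handles on the same file, which this sketch does not attempt.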
Figure 5 presents the percentage of I/O latency within the data transfer portions of SMTP and POP sessions, namely SMTP Data, POP Retrieve and POP Delete. Other phases, such as SMTP Connect, SMTP Mail From and POP Connect, are not included because they involve no I/O operations. We can see that I/O has a growing impact on server performance as the user population increases. When the user population exceeds 5,000, I/O becomes the major performance bottleneck. For an ISP mail server supporting 10,000 users, I/O latency accounts for 40 to 55% of the data transfer time in both SMTP sessions and POP sessions with download. For POP sessions without download, since there is no file to be read or deleted, the I/O overhead remains relatively small.
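As a small worked illustration of how such an I/O percentage can be derived, the sketch below divides the summed I/O latency by the summed data transfer time for each user population. The function names and the sample numbers are assumptions made for this example; they are not measurements from the experiments.

```python
def io_share(samples):
    """Fraction of data transfer time spent in file I/O.

    samples: iterable of (io_ms, total_ms) pairs, one per request, where
    total_ms is the whole data transfer phase (network + CPU + I/O).
    """
    samples = list(samples)  # allow generators, since we iterate twice
    io_total = sum(io_ms for io_ms, _ in samples)
    time_total = sum(total_ms for _, total_ms in samples)
    if time_total <= 0:
        raise ValueError("total data transfer time must be positive")
    return io_total / time_total


def io_share_by_population(measurements):
    """Map each user population to its I/O share of data transfer time."""
    return {users: io_share(requests) for users, requests in measurements.items()}


# Made-up numbers for illustration only; not results from the paper.
EXAMPLE = {
    1000: [(10.0, 100.0), (30.0, 100.0)],
    10000: [(50.0, 100.0), (40.0, 100.0)],
}

if __name__ == "__main__":
    for users, share in sorted(io_share_by_population(EXAMPLE).items()):
        print(f"{users} users: I/O share {share:.0%}")
```

Summing before dividing weights each request by its duration, so long transfers count for more than short ones; averaging per-request ratios instead would be an equally defensible choice with a different meaning.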
[Figure 5. The Breakdown of Mail Server Work Sessions. (a) Percentage of I/O Overhead in SMTP Data Phase; (b) Percentage of I/O Overhead in POP Retrieve and POP Delete Phases with Downloads; (c) Percentage of I/O Overhead in POP Retrieve and POP Delete Phases without Downloads. Each panel compares Network + CPU Latency (ms) with I/O Latency (ms).]

4. Related Work

Only a limited number of research papers have been published on mail servers. Recently, Behren et al. described a work-in-progress project called NinjaMail, a high-performance clustered, distributed e-mail system. It used a collection of clusters distributed across a wide area to provide thousands of users with highly available and scalable services. Saito et al. described the motivation, design and performance of Porcupine, a scalable mail server. The goal of Porcupine was to provide a highly available and scalable electronic mail service using a large cluster of commodity PCs. Their focus was on dynamic load balancing, automatic configuration, and graceful degradation in the presence of failures. They used replicated services that were distributed homogeneously and dynamically across the nodes of a cluster. Christenson et al. presented a highly scalable electronic mail service using open systems at EarthLink. They used the Network Appliance family of servers as network file storage to achieve good I/O performance, high reliability and easy maintenance. The above papers all focused on the design and implementation of distributed mail servers. To the best of our knowledge, there are no published results on the performance evaluation of mail servers.

5. Conclusions

In this paper, we study the performance of an ISP mail server supporting a wide range of user populations, especially with respect to networking and I/O. We used SPECmail2001 as a standard benchmark to evaluate the performance of a medium-scale ISP mail server.
We also collected file system traces with the Filemon/EE tracer for I/O analysis. By analyzing the benchmark results and file system traces, we arrive at several important conclusions: 1) In SMTP and POP sessions, the initial network connection setup and the first phase (e.g., SMTP Hello) usually take a very long time. 2) I/O latencies contribute 40 to 55% of the data transfer time in mail requests, especially in a server supporting a large user population. Such a high I/O overhead would seriously limit the scalability of ISP mail servers. 3) A group of messages may easily overload a remote recipient server.

References

Alt-N Technologies, Inc. MDaemon versatile server, 2001.

Nick Christenson, Tim Bosserman, and David Beckemeyer. A highly scalable electronic mail service using open systems. In Proceedings of the USENIX Symposium on Internet Technologies and Systems (ITS-97), pages 37-48, Berkeley, December 1997. USENIX Association.

D. Hitz, J. Lau, and M. Malcolm. File system design for an NFS file server appliance. In Proceedings of the USENIX 1994 Winter Technical Conference, San Francisco, CA, Jan. 1994.

J. Postel. Simple Mail Transfer Protocol, Internet RFC 821, Aug. 1982.

M. Crispin. Internet Message Access Protocol, Internet RFC 2060, Dec. 1996.

J. Myers and M. Rose. Post Office Protocol Version 3, Internet RFC 1939, May 1996.

Yasushi Saito, Brian N. Bershad, and Henry M. Levy. Manageability, availability, and performance in Porcupine: a highly scalable, cluster-based mail service. ACM Transactions on Computer Systems, 18(3), 2000.

Standard Performance Evaluation Corporation. SPECmail2001, Jan. 2001.

System Internal Inc. Filemon/EE Manual, Sep.

J. R. von Behren, S. Czerwinski, A. D. Joseph, E. A. Brewer, and J. Kubiatowicz. NinjaMail: The design of a high-performance clustered, distributed e-mail system. In P. Sadayappan, editor, Proceedings of International Workshops on Parallel Processing 2000, Toronto, Canada, August 2000.