Computer Networks 55 (2011)

Understanding data center network architectures in virtualized environments: A view from multi-tier applications

Yueping Zhang a, Ao-Jan Su b, Guofei Jiang a
a NEC Laboratories America, Inc., Princeton, NJ 08540, USA
b Northwestern University, Evanston, IL 60208, USA

Article history: Received 27 April 2010; Received in revised form 23 December 2010; Accepted 2 March 2011; Available online 11 March 2011. Responsible Editor: H. De Meer.

Keywords: Data center networks; Multi-tier systems; Server virtualization; Application performance

Abstract

In recent years, data center network (DCN) architectures (e.g., DCell, FiConn, BCube, FatTree, and VL2) have received a surge of interest from both industry and academia. However, evaluation of these newly proposed DCN architectures has been limited to MapReduce- or scientific-computing-style traffic patterns, and none of it provides an in-depth understanding of their performance in conventional transaction systems under realistic workloads. Moreover, it is also unclear how these architectures are affected in virtualized environments. In this paper, we fill this void by conducting an experimental evaluation of FiConn and FatTree, representatives of the hierarchical and flat architectures, respectively, in a clustered three-tier transaction system using a virtualized deployment. We evaluate these two architectures from the perspective of application performance and explicitly consider the impact of server virtualization. Our experiments are conducted in two major testing scenarios, service fragmentation and failure resilience, from which we observe several fundamental characteristics that are embedded in both classes of network topologies and cast a new light on the implications of virtualization for DCN architectures.
Issues observed in this paper are generic and should be properly considered in any DCN design before actual deployment, especially in mission-critical real-time transaction systems.

© 2011 Elsevier B.V. All rights reserved.

1. Introduction

Driven by the recent proliferation of Cloud services and the trend of consolidating enterprise IT systems, data centers are experiencing rapid growth in both scale and complexity. At the same time, it has become widely recognized that the traditional tree-like structures of data center networks (DCNs) encounter a variety of challenges, such as limited server-to-server connectivity, vulnerability to single points of failure, lack of agility, insufficient scalability, and resource fragmentation. To address these problems, in the past two years several network architectures [1,8,10,11,14,16] have been proposed for large-scale data centers and have gained a significant amount of attention from both industry practitioners and the research community.

(A shorter version of this paper appeared in IEEE IWQoS 2010.)

In general, all existing DCN architectures can be classified into two categories, hierarchical and flat. The former is represented by the conventional tree topology (followed by recent proposals such as DCell, FiConn, and BCube) and employs a layered structure that is constructed recursively from lower-level components. The latter (e.g., FatTree and VL2), on the contrary, organizes all servers into the same level and interconnects L2 switches using certain topologies (e.g., Clos networks). All these newly proposed DCN architectures are shown to significantly improve scalability, failure resilience, and aggregate network bandwidth over the conventional tree structure. However, all existing work assumes general-purpose underlying systems and does not explicitly consider the performance of complex applications or the interactions between different application components. In contrast, data centers may have very different traffic patterns depending on the applications running on top, e.g., searching/indexing, media streaming, cloud computing, or transaction systems. Moreover, there does not exist an experimental comparison of these DCN architectures that would afford a better understanding of their characteristics in a common physical setting. Finally, an in-depth study of the impact of server virtualization on DCN architectures is also missing from the current picture.

In this paper, we seek to fill this void by conducting an experimental evaluation of two DCN architectures, FiConn and FatTree (which are representatives of the hierarchical and flat architectures, respectively; see Section 2), in a conventional mission-critical multi-tier transaction system. We first implement FiConn and FatTree in a fully virtualized testbed and then justify the correctness of our implementation by comparing the experimental results to those obtained from a non-virtualized testbed. We then deploy RUBiS, a clustered three-tier eBay-like on-line auction system, on both FiConn and FatTree and examine application performance in different scenarios. We focus on two test cases, service fragmentation and failure resilience. In the study of service fragmentation, we mix all servers together instead of grouping servers (i.e., VMs in our case) belonging to the same service tier into the same physical machine (PM). The reason is twofold. On one hand, this allows sufficient network traffic to be injected into the network, enabling us to study the networking properties of the underlying DCN architectures. On the other hand, this way we are also able to examine how application performance is affected by DCN architectures when service tiers are fragmented across various network locations.
Particularly, our results suggest that FiConn is subject to performance degradation under service fragmentation (especially when CPU- and network-intensive components are collocated on the same PM) due to its server-centric design, while FatTree demonstrates higher resilience to this situation as a result of its flat server layout. The second testing scenario is failure resilience, where we examine the performance of FiConn and FatTree after certain servers are brought down. Since both FiConn and FatTree have multiple routing paths for each pair of nodes, traffic is rerouted through other paths upon server failures. Our results show that application performance is more subject to degradation in FiConn than in FatTree due to FiConn's interference between routing and computing. However, FatTree's network-centric design does not come without expense. For instance, FatTree exhibits inefficient resource utilization in certain cases as a result of its routing behavior.

1.1. Contributions

The main contributions of this paper can be summarized in the following three aspects. First, we perform an experimental comparison of newly proposed DCN architectures in a common system setting. Second, this paper employs a fully virtualized implementation and explicitly examines the impact of server virtualization on DCN architectures. Third, all experiments are performed in a cluster-based three-tier system where application performance (e.g., request throughput and response latency) is the major measurement metric. At the time of this writing, we believe that none of the above three points has been properly studied in the existing literature, and this paper is the first to address all of them simultaneously. Even though our experiments focus on FiConn and FatTree, most of our results can be generalized and applied as guidelines for designing any DCN architecture intended for deployment in modern virtualized data centers hosting mission-critical real-time transaction systems.
However, we understand that this work is by no means a comprehensive experimental study of all existing DCN architectures. Instead, we encourage readers to consider it an initial step towards a deeper understanding of these architectures in a more realistic environment.

The rest of the paper is structured as follows. In Section 2, we provide a brief review of existing data center architectures. In Section 3, we discuss the motivation for conducting this work and the goals we seek to achieve. In Section 4, we provide details of the system setup for our experiments. In Section 5, we present the experimental results and our main findings. Finally, in Section 6, we conclude the paper and point out directions for future work.

2. Background

In this section, we briefly review the conventional and newly proposed data center network architectures. Based on how the system is constructed, we can classify existing DCN architectures into two categories, hierarchical and flat.

2.1. Hierarchical architectures

In a hierarchical architecture, servers are arranged in different levels. A higher-level network is composed of multiple lower-level components. Clearly, the most widely adopted hierarchical data center architecture is the conventional tree structure, in which the entire system can be constructed by recursive composition of sub-trees. Another hierarchical DCN architecture is DCell, which is proposed by Guo et al. and designed to improve a DCN's robustness to link/server failures and its scalability to large numbers of servers. Similarly to the tree structure, DCell employs a recursive design, in which a level-k component DCell_k is composed of (t_{k-1} + 1) level-(k-1) elements DCell_{k-1}, where t_{k-1} represents the number of servers in a single DCell_{k-1}. It is shown that the number of servers in DCell scales doubly exponentially with the level k.
In addition, DCell is proven to be highly robust to link failures in that it has a bisection width lower bounded by a 1/(2 t_k ln t_k) fraction of that of its embedded complete graph. A comprehensive study of DCell's structural and graph properties is available in the literature.
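The doubly exponential growth noted above follows directly from the recursive construction: a level-k DCell consists of (t_{k-1} + 1) copies of a level-(k-1) DCell. A minimal sketch, assuming for illustration that a level-0 cell is n servers attached to a single switch:

```python
def dcell_servers(n: int, k: int) -> int:
    """Number of servers t_k in a level-k DCell whose level-0 cell has n servers.

    Mirrors the recursion in the text: a DCell_k is composed of
    (t_{k-1} + 1) copies of DCell_{k-1}, so t_k = t_{k-1} * (t_{k-1} + 1).
    """
    t = n
    for _ in range(k):
        t = t * (t + 1)
    return t

# With n = 4 servers per level-0 cell:
# level 0 -> 4, level 1 -> 20, level 2 -> 420, level 3 -> 176820
```

Even for a modest n, a handful of levels already reaches data-center scale, which is the scalability argument made above.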
Guo et al. further propose BCube, which uses several layers of mini-switches to connect different levels of BCube nodes. For instance, when constructing a level-k BCube, one needs to connect the kth port of the ith server in the jth BCube_{k-1} to the jth port of the ith layer-k switch. This way, BCube provides multiple parallel paths between any pair of servers, thereby improving server-to-server network capacity and fault tolerance. In both DCell and BCube, routing is performed by each server, which is equipped with multiple network ports for forwarding traffic. Specifically, each server in a DCell_k or BCube_k requires (k + 1) network ports. Building on BCube-based containerized data centers, Wu et al. develop an inter-container architecture called MDCube. Similar to DCell and BCube, MDCube also employs a server-centric design and implements routing and load-balancing functionality in servers. The last hierarchical topology in our list is FiConn, which shares the same recursive design principle as DCell and BCube, but uses only two ports per server. Thus, FiConn benefits from reduced wiring cost, but as a direct consequence provides less aggregate network capacity and lower fault tolerance than DCell and BCube. FiConn also differs from DCell in that FiConn balances link usage across levels, whereas in DCell links at higher levels carry more traffic than those at lower levels. Like DCell and BCube, FiConn still achieves high scalability and good fault tolerance.

2.2. Flat architectures

In contrast to hierarchical architectures, another class of DCNs utilizes a flat organization of servers, which are placed in a single layer and interconnected with switches. Network traffic routing and forwarding is conducted purely in switches. One representative of this class is the FatTree architecture introduced by Vahdat et al.
FatTree applies the Clos topologies originally designed for telephone networks and organizes commodity switches into a k-ary fat-tree, which constitutes a richly connected backbone for inter-server network communication. Using redundant switches and interconnections, each pair of servers in FatTree enjoys a separate routing path and can therefore communicate at nearly the full speed of their local network interfaces. Based on the FatTree architecture, the authors further develop a network addressing scheme called PortLand, which separates the MAC address and physical location of each server to improve the agility of the entire system. Bearing the same design principle as FatTree, VL2 (or its predecessor Monsoon) is another example of a flat architecture. In VL2, the interconnection topology of switches follows a folded Clos network, where intermediate and aggregation switches are organized into the two sides of a bipartite graph. If each aggregation and intermediate switch has D_A and D_I ports, respectively, the aggregate capacity between these two layers is D_A * D_I / 2 times the link capacity. In the same vein as PortLand, VL2 employs a directory system to realize the separation of a server's network address and physical location.

3. Motivation

The data center market is currently experiencing exponential growth due to the consolidation of enterprise IT systems and the proliferation of Cloud services (e.g., Amazon Web Services, Google Apps, and Microsoft Azure). The core technology driving this trend is virtualization, owing to its appealing cost reduction, high agility, and excellent flexibility. Conversely, the cost benefits and operational efficiency brought by data center consolidation in turn stimulate the development and deployment of virtualization technologies. According to one industry estimate, more than half of the new servers shipped in 2009 were virtualized, and that number is expected to hit 80% by 2012.
At the same time, however, virtualization also blurs the boundaries between physical resources and introduces an added layer of complexity in the performance management of large-scale data centers. This is especially the case in mission-critical multi-tier transaction systems, where multiple layers of servers and a spectrum of services need to work together following an intricate application logic. However, all existing efforts on the experimental understanding of DCN architectures [1,8,10,11,14,16] focus only on general-purpose systems under synthetic workload patterns, with application contexts limited to MapReduce and scientific computing. None of them considers more realistic and complicated systems (such as multi-tier applications) or examines system performance from the application's perspective, both of which we believe are critical for understanding and developing any DCN architecture. Driven by this motivation, we conduct in this paper an experimental evaluation of DCN architectures in a fully virtualized testbed hosting a three-tier on-line auction system. Through this study, we attempt to answer questions such as: (1) Will application performance be affected by DCN architectures, and if so, how and why? (2) What are the fundamental differences between hierarchical and flat DCN architectures? (3) Are the DCN architectures of interest vulnerable to certain failure patterns? (4) Does server virtualization introduce any complications into the performance of DCN architectures?

4. System architecture

We start with an introduction of our system architecture and implementation methodologies.

4.1. Cluster-based three-tier transaction system

In this paper, we conduct experiments in a three-tier transaction system running an eBay-like auction service, RUBiS, as illustrated in Fig. 1. As seen from the figure, the system is composed of a workload generator and a three-tier processing system.
We set up the RUBiS workload generator on a separate physical machine to avoid performance interference with the back-end servers. The generated workload is represented by a number of concurrent user sessions with a series of interactions following a pre-defined Markov transition matrix. RUBiS client sessions are implemented in light-weight Java threads and
therefore are able to emulate a large set of concurrent end-users performing a rich variety of actions of an auction web client, such as registration, bidding, searching, browsing, and leaving feedback. These end-user requests are further converted by the three-tier servers into different operation sequences in the back-end according to the application logic.

Fig. 1. Three-tier Web service.

This three-tier architecture is a stereotype of many Web services that consist of a Web server layer, an application server layer, and a database layer. We implement each of the three tiers using a cluster of four virtual machines with certain load-balancing mechanisms. First, we set up the Web server cluster with Apache Web servers and the HAProxy load-balancing application. The master Web server dispatches HTTP requests to the other Web servers using a round-robin mechanism. Second, the application server cluster is implemented with Tomcat 4.1 running the RUBiS servlets. The front-end Web servers communicate with the application servers via the Apache JServ Protocol (AJP) load-balancing mechanism, and the application servers connect to the back-end database servers via the Java Database Connectivity (JDBC) protocol. Third, the database cluster consists of one MySQL database server and three storage nodes. Data are evenly partitioned and distributed across the three storage nodes to avoid database hotspots.

4.2. Virtualization

We employ a fully virtualized implementation of the DCN architectures. Specifically, on each physical machine we install a 64-bit CentOS operating system and the Xen 3.1 hypervisor. We note that the conclusions obtained in this paper are generic and should hold for other hypervisor implementations. All physical machines are equipped with an Intel Core 2 Duo 2.33 GHz CPU and 4 GB of RAM. In addition, we install in each physical machine an Intel quad-port Gigabit network interface card to provide abundant link capacity and isolated network channels for the VMs.
For the VMs, we obtain CentOS 5.3 VM images from stacklet.com and install the appropriate applications according to the role of the VMs in the three service tiers, as discussed in the previous section.

4.3. Monitoring system

RUBiS is equipped with a distributed monitoring system called Ganglia, which installs a daemon process called GMond on each server and streams real-time monitoring data (e.g., CPU, memory, network, and disk I/O) from the servers to a central visualization server. However, these data are rendered in XML and exported directly without maintaining a local copy on each server. There is no clear way for us to obtain the raw monitoring data and conduct relevant analysis, such as correlation between time series and deep inspection of network packets. Therefore, Ganglia cannot be directly used for our purpose. Instead, we instrument our own monitoring system by combining standard utilities such as tcpstat, vmstat, and tcpdump. For the client, we utilize the tools provided by RUBiS to monitor various application QoS metrics, such as response time, request throughput, and loss rate.

4.4. FiConn setup

We implement two DCN architectures, FiConn and FatTree, which are respectively representatives of the hierarchical and flat DCN topologies. Fig. 2(a) depicts a non-virtualized setup of a level-1 FiConn, which is composed
of three level-0 FiConns, each consisting of four servers with the same specifications as described in Section 4.2. These four servers are bridged together via a mini-switch, and two of them also serve as routers relaying network traffic traveling to/from the other two level-0 FiConns. Fig. 2(b) illustrates our implementation of this level-1 FiConn in a virtualized testbed. As seen from the figure, we use three physical machines (PMs) to implement the three level-0 FiConns. Each PM hosts four virtual machines and utilizes three Gigabit Ethernet ports, one of which serves as the network bridge interconnecting the four VMs, while the other two serve as router ports. The three PMs are directly connected to each other using CAT 6 cross-over cables. We implement FiConn's routing algorithm using a statically configured IP routing table in each VM.

Fig. 2. FiConn testbed setup.

4.5. FatTree setup

We employ a 12-server FatTree topology as shown in Fig. 3(a). The 12 servers are grouped into three Pods, each of which is connected to the Core layer via a group of four interconnected Pod switches. The Core layer also has four switches, each of which is directly connected to all three Pods. Our virtualized implementation of this FatTree topology is illustrated in Fig. 3(b) and is similar to FatTree's original implementation. Specifically, each Pod is contained in a single physical machine. The Pod switches and the corresponding two-level routing algorithm are implemented using the Click modular router with user-level configurations. The physical machines are equipped with four Gigabit Ethernet ports, each of which is assigned to one of the four up-facing links connecting the Pod and Core switches. The four Core switches are implemented as four separate VLANs on a single 24-port Gigabit switch.
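The static IP routing tables used in the FiConn setup above can be condensed into a simple next-hop rule: intra-cell traffic is delivered over the bridge, while inter-cell traffic must enter and leave through the routing node that owns the link between the two cells. The following is a minimal sketch of that rule for the level-1 testbed; the cell indices, VM labels, and the assignment of routing nodes to neighbour cells are illustrative assumptions, not the exact testbed configuration.

```python
# Three level-0 cells (0, 1, 2), four VMs each. In every cell, two VMs
# double as routing nodes, one per directly attached neighbour cell.
# routing_node[cell][neighbour] -> VM in `cell` owning the link to `neighbour`
# (hypothetical assignment for illustration).
ROUTING_NODE = {
    0: {1: 'A', 2: 'C'},
    1: {0: 'A', 2: 'C'},
    2: {0: 'A', 1: 'C'},
}

def path(src, dst):
    """VM-level path from src to dst, each given as a (cell, vm) pair."""
    (sc, sv), (dc, dv) = src, dst
    if sc == dc:
        return [src, dst]          # intra-cell: delivered over the bridge
    hops = [src]
    exit_vm = (sc, ROUTING_NODE[sc][dc])
    if exit_vm != src:
        hops.append(exit_vm)       # reach the exit routing node first
    entry_vm = (dc, ROUTING_NODE[dc][sc])
    hops.append(entry_vm)          # cross the inter-cell cable
    if entry_vm != dst:
        hops.append(dst)
    return hops
```

For example, path((0, 'B'), (1, 'D')) produces the four-hop route [(0, 'B'), (0, 'A'), (1, 'A'), (1, 'D')], matching the source, exit router, entry router, destination pattern described for FiConn in the text.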
Based on the above implementation, we next present an experimental evaluation of FiConn and FatTree in our virtualized testbed hosting the three-tier on-line auction system RUBiS.

5. Experimental evaluation

5.1. System validation

Before conducting further evaluation, we first need to ensure that our virtualized implementation does not introduce undesirable artifacts and is able to faithfully reveal the characteristics of the original DCN architectures. Towards this end, we implement FiConn and FatTree in a separate non-virtualized testbed. Specifically, our non-virtualized implementation of FiConn exactly follows the settings illustrated in Fig. 2(a), where each server corresponds to an actual physical machine. Similarly, our non-virtualized implementation of FatTree is identical to that shown in Fig. 3(a) except for the Pod switches (i.e., edge and aggregation switches), which in our case are implemented using the Click modular router due to the unavailability of programmable hardware switches. Then, we conduct on both the virtualized and physical testbeds the same set of seven experiments, with workloads generated by 16, 32, 64, 128, 256, 512, and 1024 clients, respectively. In each experiment, we measure the average aggregate request throughput of all clients and the average response time of all (around 27) types of runtime transactions, including Register, Browse, ViewItem, PutBid, and Search. We also calculate the standard deviation of throughput and response time to capture second-order dynamics of the measurements. Experimental results for FiConn and FatTree in both the virtualized and non-virtualized testbeds are shown in Fig. 4. As seen in Fig. 4(a), the throughput of RUBiS in virtualized FiConn increases linearly (we note the logarithmic scale of the x-axis) with the number of clients up to 512 clients and drops for 1024 clients.
This drop occurs because the machine running 1024 concurrent client threads experiences high CPU utilization and therefore cannot generate requests at a higher rate. This is corroborated by the similar behavior observed in the non-virtualized testbed, as shown in the figure. End-to-end response time exhibits a similar trend, with the response latency showing a nonlinear jump when the number of clients changes from 512 to 1024.

Fig. 3. FatTree testbed setup.

Fig. 4. Implementation validation.

Compared to throughput, the high standard deviation of response time is due to the nature of the different request types, some of which are simple front-end operations (e.g., Browse) while others require complicated back-end processing (e.g., Search). From Fig. 4(b), we can see that FatTree behaves similarly to FiConn under different numbers of clients. We note that under large numbers of clients (i.e., 512 and 1024), FatTree exhibits lower throughput and longer response time than FiConn. This phenomenon is mainly a result of a fundamental property of the FatTree architecture; we delay a detailed discussion of this issue until Section 5.3. Moreover, as seen from both figures, FiConn and FatTree produce consistent behavior in the virtualized and non-virtualized testbeds, demonstrating that our virtualized implementation does not introduce fundamental artifacts or performance overhead and is therefore able to faithfully capture the properties of DCN architectures relative to a non-virtualized implementation. As virtualization becomes the driving technology of today's data centers, we focus in this paper on evaluating DCN architectures in fully virtualized environments. Thus, in the following two subsections, we examine FiConn and FatTree in further testing scenarios using only the virtualized implementation. In Section 5.4, we summarize our findings with a discussion of the impact of server virtualization on DCN architectures.

5.2. FiConn experiments

We start with FiConn and evaluate it in two testing cases, service fragmentation and failure resilience.

1 We keep this naming for ease of reference and note that it may not be the ideal placement for other application settings or network architectures.

5.2.1. Service fragmentation

As mentioned in Section 4, a level-1 FiConn includes three level-0 FiConns, each consisting of four servers.
At the same time, from the perspective of application setup, we organize the 12 servers into three tiers (i.e., Web, application, and database), each of which also has four servers. Clearly, performance of the system is optimized if the three tiers are exactly mapped to the three level-0 FiConns, i.e., placing the clusters of Web servers, application servers, and database servers in FiConn_0, FiConn_1, and FiConn_2, respectively. This is because, this way, intra-tier network traffic (which is oftentimes data-intensive, such as in the database layer) is delivered by efficient memory swapping between VMs residing on the same physical machine. Experiments performed in the previous subsection (see Fig. 4) follow this VM placement scheme, which we call ideal placement 1 for ease of reference. In practice, however, it is very difficult to guarantee that servers with intensive communication are always placed close to each other. This is especially true in Cloud environments, where applications arrive and depart dynamically and resources are allocated and recycled on demand. In such environments, servers with intensive communication or belonging to the same service tier are very likely to be scattered across different network segments. We call this situation service fragmentation and are interested in understanding its impact on different DCN architectures. To achieve this goal, we mix together servers of different service tiers and rerun the experiments conducted in Fig. 4(a). We plot the results for 256, 512, and 1024 clients in Fig. 5(b). Comparing these results with those obtained from the ideal placement, illustrated in Fig. 5(a), it is clear that under heavy workloads (i.e., 512 and 1024 clients), throughput in FiConn exhibits severe fluctuations and a lower average value under service fragmentation than under the ideal placement. The underlying cause is twofold.
First, when VMs belonging to the same service layer are placed onto different physical machines, a significant amount of communication traffic is converted from low-overhead intra-PM memory copying to high-overhead inter-PM network I/O, which slows down the system's performance. Second, placing multiple resource-consuming VMs onto the same PM may lead to contention problems. Particularly, in our setting, two database servers and one Web server are placed on a single physical server. Both types of servers are very CPU-intensive and are subject to CPU competition under large numbers of concurrent clients. This situation is illustrated in Fig. 6, where under 512 clients both the Web and database servers experience CPU saturation, which slows down the processing of Web requests and database queries and in turn translates into the throughput fluctuation depicted in Fig. 5(b). Moreover, for 512 clients, the Web server and database server exhibit a transition of CPU utilization from around 10% to 100% at times 8 s and 12 s, respectively. This shift is a result of the behavior of RUBiS clients, which usually start with light actions (such as accessing the home page and browsing items) and then switch to operations involving heavy back-end processing (such as searching and bidding).

5.2.2. Failure resilience

A common feature of both FiConn and FatTree is that there exist multiple routing paths between every pair of nodes. As a result, both architectures provide a certain resilience to link/node failures. In this subsection, we examine FiConn's robustness to such failures. First, we take a closer look at how routing is realized in virtualized FiConn. Each level-0 FiConn has three active Ethernet interfaces, eth1, eth2, and eth3. Interface eth1 is a dummy port used as a network bridge in Xen's Dom0 to interconnect the four VMs. The other two ports, eth2 and eth3, are the actual routing ports for forwarding traffic to/from the other two level-0 FiConns.
Each routing port is associated with one VM, which we call a routing node. In FiConn_0, for instance, the two routing nodes are VMs A and C. Every time a packet generated by any VM in FiConn_0 needs to go to a server in FiConn_1, it first has to go to VM A, then to a routing node in FiConn_1 via eth2, and finally to its destination server. Based on this understanding, we examine FiConn's performance under two failure patterns, as demonstrated in Fig. 7. Specifically, we construct the first failure scenario (which we call Case I) by bringing down a non-routing node B (which is a Web server in our case) in FiConn_0. As a consequence of this action, Web requests will be evenly distributed to the other three Web servers; but traffic
between FiConn_0 and FiConn_1 will still go through VM A. In the second failure scenario (which we call Case II), we bring down the routing node A. As a consequence, traffic traveling from FiConn_0 to FiConn_1 is no longer able to go directly through interface eth2 due to the failure of its corresponding routing node. Instead, it has to take a detour by first going to VM C, then to VM D in FiConn_2 through eth3, then to the other routing node, and finally to FiConn_1.

Fig. 5. FiConn service fragmentation test.

The experimental results for both testing cases are given in Fig. 8. It can easily be seen from the figures that the throughput of FiConn under 512 and 1024 clients is much lower in Case II than in Case I. This is because, in Case II, the failure of node A leads to traffic redirection and significantly increases the load on the other routing node, C. Under larger numbers of concurrent clients, the increased traffic load saturates the routing node and slows down the performance of the entire system. The effect of this traffic redirection can be seen from Fig. 8(d), where the network traffic encountered by node C in Case II is almost four times that in Case I. We note that FiConn's performance degradation demonstrated in this section is due to its routing behavior; the impact of the application-level role (e.g., Web or database server) of the failed node is limited to the amount of network traffic associated with that node.

5.3. FatTree experiments

We next rerun the above experiments in FatTree.

5.3.1. Service fragmentation

Similar to FiConn, the ideal placement for FatTree is defined as a setting where servers belonging to the same service tier (i.e., Web, application, and database) are placed in the same Pod. Accordingly, service fragmentation occurs when these servers are blended into different Pods. Plots in Fig.
9 illustrate FatTree's performance under the ideal placement and service fragmentation with 256, 512, and 1024 clients. Two observations can be drawn from the plots. First, comparing Fig. 9(a) and (b), it is evident that the two server placement schemes do not have a significant impact on application performance in the FatTree architecture. This is in contrast to FiConn, shown in Fig. 5 (or the grey curves in Fig. 9), which exhibits
Fig. 6. Resource contention of FiConn under service fragmentation.
Fig. 7. Failure patterns of FiConn.

a noticeable performance degradation due to resource contention between VMs under service fragmentation. This difference arises because, compared with FiConn, FatTree servers are relatively lightly loaded thanks to the isolation of computing and routing, and are therefore less prone to resource contention. Second, under the ideal placement (Fig. 9(a)), FatTree yields lower throughput than FiConn with 512 and 1024 clients.
Fig. 8. Failure resilience of FiConn.

The reason for this throughput degradation is twofold. On the one hand, FatTree's two-level routing algorithm, implemented using Click-based emulation, incurs additional processing overhead, although this overhead is very small. On the other hand, the overhead is amplified by the routing behavior of FatTree, as explained below. In FatTree, all network traffic is first aggregated at and processed by Pod switches, which differentiate intra- and inter-pod traffic by maintaining a list of terminating prefixes for the subnets contained in the same Pod. If a packet belongs to intra-Pod traffic, it is sent directly to the destination by the Pod switches; otherwise, it is forwarded to Core switches for further processing. This scheme works well in physical environments but faces challenges in virtualized networks. For instance, if a packet is destined for another VM residing on the same PM, it has to travel up to the Pod switches and then come back, even though a much more efficient path (i.e., memory copying within the PM) exists. Thus, extra traffic is introduced into the network, resulting in increased network load, reduced goodput, and prolonged response latency. In comparison, in FiConn, VMs collocated on the same PM are connected via a network bridge and can communicate with each other directly. This explains the high utilization of the Click router and the reduced application throughput observed in FatTree.

5.3.2. Failure resilience

We next examine FatTree in two failure scenarios. Unlike FiConn, FatTree does not have the concept of a routing node and routes network traffic purely through switches. Thus, we cannot reproduce in FatTree the same two failure patterns shown in Fig. 7. Instead, we create two scenarios, each with the failure of a single node.
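Before turning to the failure experiments, the two-level Pod-switch lookup described in Section 5.3.1 can be made concrete with a minimal sketch. This is our own illustration, not the paper's Click implementation; the addresses, pod layout, and function names are all hypothetical:

```python
import ipaddress

# Simplified sketch of a Pod switch's forwarding decision: terminating
# prefixes list the subnets inside this Pod; matching traffic is delivered
# intra-Pod, everything else is pushed up to a Core switch.
POD_PREFIXES = [ipaddress.ip_network("10.0.0.0/24"),
                ipaddress.ip_network("10.0.1.0/24")]

def pod_switch_next_hop(dst_ip):
    dst = ipaddress.ip_address(dst_ip)
    if any(dst in prefix for prefix in POD_PREFIXES):
        return "deliver-intra-pod"      # terminating prefix matched
    return "forward-to-core"            # inter-Pod: hand off to Core switches

def vm_aware_next_hop(src_pm, dst_pm, dst_ip):
    # A virtualization-aware variant: traffic between VMs co-located on the
    # same PM could use the PM's internal bridge (as FiConn does) instead of
    # hairpinning through the Pod switch.
    if src_pm == dst_pm:
        return "local-bridge"
    return pod_switch_next_hop(dst_ip)

print(pod_switch_next_hop("10.0.1.7"))              # deliver-intra-pod
print(pod_switch_next_hop("10.2.0.3"))              # forward-to-core
print(vm_aware_next_hop("pm1", "pm1", "10.0.0.5"))  # local-bridge
```

The first function captures why co-located VMs in FatTree still incur a round trip to the Pod switch; the second shows the kind of short-circuit check that would avoid it.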
In the first case, we bring down one of the four Web servers. All Web requests are then evenly distributed to the other three Web servers. Since these Web servers are all lightly loaded, we expect application performance in this case not to deviate significantly from that illustrated in Fig. 9(a). The experimental results shown in Fig. 10(a) justify this expectation. In the second case, we bring down one database server, which is more data-intensive than a Web server and therefore more likely to introduce performance bottlenecks. As demonstrated in Fig. 10(b), application throughput in this case indeed exhibits more pronounced fluctuations. These fluctuations result from the increased workload on the other three database servers and from lookup failures of queries destined for records located on the failed database server. Despite this oscillatory behavior, the average throughput remains the same as in Figs. 9(a) and 10(a), demonstrating FatTree's robustness to link/node faults. Similar observations hold under failures of other nodes and are omitted for brevity.

5.4. Discussion

We conclude by summarizing the conclusions drawn from the above experiments, aiming to understand which phenomena stem from fundamental properties of the underlying DCN architectures and which result from virtualization.
Fig. 9. Service fragmentation test of FatTree (grey curves are results of FiConn illustrated in Fig. 5).

5.4.1. Server-centric vs. network-centric

Our earlier classification of DCN topologies into hierarchical and flat architectures is based on the way the network is constructed. As discussed in Section 2, based on how networking functionalities are realized, these two classes can also be mapped to server-centric and network-centric architectures, respectively. Specifically, in a server-centric design (e.g., FiConn, DCell, and BCube), each server is both a computing resource and a network traffic relay, whereas in network-centric architectures (e.g., FatTree and VL2), servers and network devices (e.g., switches and routers) are responsible for computing and traffic forwarding, respectively. Both methodologies have their own strengths and limitations. On the one hand, server-centric architectures benefit from the low cost of network equipment, good scalability through modular construction, and flexible routing mechanisms. On the other hand, they may suffer from interference between the routing and computing modules. Moreover, in a server-centric design, traffic load tends to be skewed towards the routing nodes. As a consequence, the system is more prone to performance degradation in the presence of faulty links or nodes and suffers from single points of failure. One such example is shown in Fig. 8, where the system throughput decreases significantly when a routing node is brought down. In contrast, FatTree and VL2 employ a network-centric design principle. In FatTree, for instance, all network traffic is routed and redistributed by switches. One benefit of such architectures is performance isolation between the routing and computing functionalities, which in turn leads to improved performance under fragmented resource allocation, as demonstrated in Fig. 9.
Another advantage of the network-centric design is that no specific servers are more important than others, which leads to robustness to node/link failures, as suggested by the FatTree example shown in Fig. 10. However, as discussed in Section 5.3.1, FatTree may result in inefficient network utilization because, regardless of location, all network traffic is first aggregated at the Pod switches before further distribution. This leads to the decreased throughput of FatTree observed in Fig. 9(a).

Fig. 10. Failure resilience of FatTree.

5.4.2. Impact of virtualization

Virtualization introduces an extra layer of complexity to the problems discussed above. For instance, the aforementioned interference problem of server-centric architectures becomes more pronounced in virtualized environments, where both the computing and network resources of physical machines are shared among guest VMs. This problem manifests itself in the FiConn experiments of Fig. 5(b), where the Web server VM stops processing HTTP requests because it competes for CPU with database servers co-residing on the same PM. Another issue introduced by virtualization is that the end nodes of the network architecture are extended from physical servers to virtual machines. Network communication between VMs on the same PM, conducted through intra-domain memory copying, is much more efficient than network I/O. Any new DCN architecture should therefore take this into account explicitly; otherwise, unnecessary traffic may be injected into the network even though more efficient paths exist. This is especially the case in today's data centers, where most VM consolidation mechanisms try to place VMs with data-intensive communication on the same PM to avoid propagating traffic to the higher-level network. As recently pointed out by Shenker et al., the impact of virtualization on networking is far less explored than its impact on computing. We believe that virtualization will play a non-negligible role in any future DCN architecture design.

6. Conclusion

In this paper we conducted an experimental evaluation of FiConn and FatTree in a virtualized environment in the context of a multi-tier transaction system. Even though it is still unclear which class of DCN architectures, hierarchical or flat, will prevail, there are certain fundamental issues that both classes should properly address.
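One of these issues is locality-aware VM placement. The consolidation behavior discussed in Section 5.4.2 (placing VMs with data-intensive mutual communication on the same PM) can be sketched as a simple greedy heuristic. This is our own illustration, not a mechanism from any system evaluated here; the function names and the traffic matrix are hypothetical:

```python
# Greedy co-location sketch: place the VM pairs that exchange the most
# traffic on the same PM, so their communication stays on the PM-internal
# bridge/memory path instead of the data center network.

def consolidate(traffic):
    """traffic: {(vm_a, vm_b): bytes exchanged}. Returns {pm: [vms]}."""
    placement, used, pm = {}, set(), 0
    for (a, b), _ in sorted(traffic.items(), key=lambda kv: -kv[1]):
        if a in used or b in used:
            continue                       # an endpoint is already placed
        placement[f"pm{pm}"] = [a, b]      # co-locate the chattiest free pair
        used.update((a, b))
        pm += 1
    # Any VM left over gets its own PM.
    for v in sorted({v for pair in traffic for v in pair} - used):
        placement[f"pm{pm}"] = [v]
        pm += 1
    return placement

demo = {("web1", "app1"): 900, ("app1", "db1"): 700, ("web2", "app2"): 500}
print(consolidate(demo))
# {'pm0': ['web1', 'app1'], 'pm1': ['web2', 'app2'], 'pm2': ['db1']}
```

Note that in practice such placement interacts with the resource-contention effects shown in Fig. 6: co-locating CPU-hungry tiers can hurt application performance even when it saves network traffic.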
On the one hand, the concept of locality is naturally embedded in hierarchical architectures, in that servers belonging to the same-level component (e.g., a FiConn_0 or DCell_0) are physically close to each other and can communicate efficiently. This property can be utilized by VM placement schemes to avoid propagating traffic to higher-level networks. On the other hand, due to the hierarchical layout of servers, the machines (e.g., routing nodes in FiConn) that aggregate traffic between components at the same or different levels of the hierarchy are more prone to overloading
and performance bottlenecks. This problem is mitigated in flat architectures, where all servers are treated equally and communication between any pair of servers can proceed at nearly full link speed. However, these architectures may not be able to take full advantage of locality (e.g., in FatTree) due to their flat layouts. Both problems become more pronounced when virtualization is introduced, which not only blurs the isolation between physical resources but also extends the edge of the network from physical servers to virtual machines. As a consequence, a variety of problems, such as resource contention, performance isolation, and inter-VM communication, emerge. We believe that even though these issues were identified in the context of a conventional transaction system, they are generic and should be properly addressed by any DCN architecture design. Our future work involves evaluating other DCN architectures in a larger-scale testbed and in an actual operational data center, examining the performance of DCN architectures under applications with different computing styles and workload mixes, and expanding the background environment with more multi-tier applications. In addition, designing new data center network architectures that explicitly address the issues raised in this paper forms another line of our future work.

References

[1] M. Al-Fares, A. Loukissas, A. Vahdat, A scalable, commodity data center network architecture, in: Proceedings of ACM SIGCOMM, August 2008.
[2] Apache. <http://www.apache.org/>.
[3] Amazon Web Services (AWS). <http://aws.amazon.com/>.
[4] J. Brodkin, Half of New Servers are Virtualized, Survey Finds.
[5] W.J. Dally, B. Towles, Principles and Practices of Interconnection Networks, Morgan Kaufmann Publishers, 2004.
[6] Forrester Consulting, How Server and Network Virtualization Make Data Centers More Dynamic. <http://www.cisco.com/en/us/solutions/collateral/ns34/ns856/ns872/virtualization_cForrester.pdf>.
[7] Google Apps. <http://www.google.com/a/help/intl/en/index.html>.
[8] A. Greenberg, J.R. Hamilton, N. Jain, S. Kandula, C. Kim, P. Lahiri, D.A. Maltz, P. Patel, S. Sengupta, VL2: a scalable and flexible data center network, in: Proceedings of ACM SIGCOMM, August 2009.
[9] A. Greenberg, P. Lahiri, D.A. Maltz, P. Patel, S. Sengupta, Towards a next generation data center architecture: scalability and commoditization, in: Proceedings of the ACM Workshop on Programmable Routers for Extensible Services of Tomorrow, August 2008.
[10] C. Guo, G. Lu, D. Li, H. Wu, X. Zhang, Y. Shi, C. Tian, Y. Zhang, S. Lu, BCube: a high performance, server-centric network architecture for modular data centers, in: Proceedings of ACM SIGCOMM, August 2009.
[11] C. Guo, H. Wu, K. Tan, L. Shi, Y. Zhang, S. Lu, DCell: a scalable and fault-tolerant network structure for data centers, in: Proceedings of ACM SIGCOMM, August 2008.
[12] HAProxy: The Reliable, High Performance TCP/HTTP Load Balancer. <http://haproxy.1wt.eu/>.
[13] M. Kliegl, J. Lee, J. Li, X. Zhang, D. Rincon, C. Guo, The Generalized DCell Network Structures and Their Graph Properties, Microsoft Research, Tech. Rep. MSR-TR-2009-140, October 2009.
[14] D. Li, C. Guo, H. Wu, K. Tan, Y. Zhang, S. Lu, FiConn: using backup port for server interconnection in data centers, in: Proceedings of IEEE INFOCOM, April 2009.
[15] Microsoft Windows Azure. <http://www.microsoft.com/windowsazure/>.
[16] R.N. Mysore, A. Pamboris, N. Farrington, N. Huang, P. Miri, S. Radhakrishnan, V. Subramanya, A. Vahdat, PortLand: a scalable fault-tolerant layer 2 data center network fabric, in: Proceedings of ACM SIGCOMM, August 2009.
[17] B. Pfaff, J. Pettit, T. Koponen, K. Amidon, M. Casado, S. Shenker, Extending networking into the virtualization layer, in: Proceedings of ACM HotNets, October 2009.
[18] The Click Modular Router Project. <http://read.cs.ucla.edu/click/>.
[19] Stacklet Virtual Servers. <http://www.stacklet.com>.
[20] Ganglia Monitoring System.
<http://ganglia.sourceforge.net/>.
[21] RUBiS: Rice University Bidding System. <http://rubis.ow2.org/>.
[22] Apache Tomcat. <http://tomcat.apache.org/>.
[23] H. Wu, G. Lu, D. Li, C. Guo, Y. Zhang, MDCube: a high performance network structure for modular data center interconnection, in: Proceedings of ACM CoNEXT, December 2009.
[24] Xen. <http://www.xen.org/>.
[25] Y. Zhang, A.-J. Su, G. Jiang, Evaluating the impact of data center network architectures on application performance in virtualized environments, in: Proceedings of IEEE IWQoS, June 2010.

Yueping Zhang received a B.S. degree in Computer Science from Beijing University of Aeronautics and Astronautics, Beijing, China, in 2001, and a Ph.D. degree in Computer Engineering from Texas A&M University, College Station, USA, in 2008. Since then, he has been a Research Staff Member at NEC Laboratories America in Princeton, New Jersey. His research interests include Internet congestion control, delayed stability analysis, network management, and data center networks.

Ao-Jan Su is a Ph.D. candidate in the Department of Electrical Engineering and Computer Science at Northwestern University. Before joining Northwestern University in 2005, he was a software architect at Yahoo! and a core developer of the NCTUns Network Simulator at Academia Sinica in Taiwan. His research interests include Internet measurement, content distribution networks, data center networks, and denial-of-service resiliency.

Guofei Jiang is a Department Head of the Robust and Secure Systems Department at NEC Laboratories America (NECLA) in Princeton, New Jersey. He leads a dozen researchers focusing on fundamental and applied research in the areas of distributed systems and networks, autonomic system management, system and software reliability, machine learning and data mining, and system and information theory. Dr.
Jiang was an associate editor of IEEE Security & Privacy magazine and has served on the program committees of many prestigious conferences. He has published nearly 90 technical papers and holds over 20 patents granted or pending. His inventions have been successfully commercialized as NEC products and have contributed significantly to NEC's business.