Data Center Switch Fabric Competitive Analysis

Introduction

This paper analyzes the Infinetics data center network architecture in the context of the best solutions available today from leading vendors such as Cisco, Juniper Networks, Arista Networks, and Force 10 Networks. The target audience is designers of network infrastructure hardware and control software used in large-scale data centers.

The document is organized as follows:
- Overview of Infinetics architecture
- Analysis of leading industry solutions
- Analysis of the current architecture of choice
- Comparison of Infinetics architecture
- Conclusion

Overview of Infinetics Architecture

Infinetics has developed a new way of connecting large numbers of nodes consisting of some combination of computation and data storage, with behaviors and features that are hard or impossible to achieve using current methods. We have developed software that runs on standard data center switches and hypervisors and supports any network topology. We have also determined that specific new topologies work far better than all those discovered to date, and have tuned our initial implementation to support one of them.

The essential difference between the Infinetics approach and all presently existing solutions is the flexible, practically unlimited radix of the networks that can be constructed. Although there are switches that can be upgraded from an initial configuration with a smaller radix to a configuration with a higher radix, the maximum radix is fixed in advance at no more than a few hundred to a few thousand ports. Further, the radix-multiplier switching fabric for the maximum configuration is hardwired into the switch design. For example, a typical commercial switch such as the Arista Networks 7500 can be expanded to 384 ports by adding 1-8 line cards, each providing 48 ports; but the switching fabric gluing the eight separate 48-port switches into one 384-port switch is rigidly fixed by the design and is included even in the basic unit.

In contrast, the Infinetics architecture has no upper limit, set either in advance or later, on the maximum number of ports it can provide. For any given type of switch with radix R, the upper limit for simple expansion without performance penalty is 2^(R-1) component switches. Since a typical R is at least 48, even this conditional limit of 2^47 ≈ 1.4 × 10^14 on the radix expansion is already far larger than the number of ports in the entire Internet, let alone in any existing or contemplated data center.

The Flexible Radix Switch does not require the very expensive core and fabric switches usually needed to control broadcast flooding and other adverse behaviors of large data center networks. Instead, it can be configured to run on basic commodity switches. It can also run on more powerful switches, but does not require complicated configuration to do so. Additionally, the architecture provides a significant performance edge over any existing or proposed data center Layer 2 network.
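As a quick illustration of the expansion limit quoted above, the short Python sketch below evaluates 2^(R-1) for a 48-port building block. The formula is taken directly from the text; the script itself is only a convenience check, and R = 48 is the typical commodity radix cited above.

    # Illustrative check of the expansion limit quoted above: up to
    # 2^(R-1) component switches for a building-block switch of radix R.

    def expansion_limit(radix: int) -> int:
        """Upper limit on component switches for simple expansion
        without performance penalty, per the formula in the text."""
        return 2 ** (radix - 1)

    if __name__ == "__main__":
        R = 48  # typical commodity switch radix
        print(f"R = {R}: 2^{R - 1} = {expansion_limit(R):.3e}")
        # Prints about 1.407e+14, i.e. roughly 1.4 x 10^14 as stated above.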
The Infinetics network architecture provides:

a) A nearly limitless number of nodes
b) Throughput that scales nearly linearly with the number of nodes, without bottlenecks or throughput restriction
c) Simple incremental expansion, whereby increasing the number of nodes requires only a proportional increase in the number of switching components while maintaining the throughput per node
d) Maximized parallel multipath use of available node interconnection paths to increase node-to-node bandwidth
e) Long-hop topology enhancements that simultaneously minimize latency (average and maximum path lengths) and maximize throughput for any given number of nodes
f) A fully unified and scalable control and management plane
g) Very simple connectivity: nodes connected to the interconnection fabric do not need any knowledge of topology or connection patterns
h) Streamlined interconnection paths: all dense interconnections have regular wiring patterns and use very short cables, while physically distant nodes have sparse connections, resulting in very simple and economical interconnection and wiring

Leading Industry Solutions

Many leading network hardware vendors have developed products that can be configured to provide the very high throughput required by modern data centers with large numbers of physical and virtual servers. While these solutions are a great improvement over traditional switch hardware, they have inherently high costs that result from the fundamental characteristics of the hardware and firmware within the switches. Vendors emphasize various attributes of their networks, sometimes focusing on the improvement over traditional networks, and sometimes on the relative merits of one vendor's solution over another's.

The following analysis of four vendors' hardware sets the scene for a direct, fact-based comparison with the behavior of the Infinetics network architecture. This approach removes the bias introduced by vendor self-promotion and focuses solely on what is physically possible based on each product's publicly disclosed operating characteristics.

Cisco Analysis

The analysis was performed on the Cisco Nexus 7018 in its FabricPath configuration, with a trunking factor of 32, meaning that each of the links from a top-layer switch to a bottom-layer switch uses 32 ports on each switch for interconnection. This results in a network that:
- Uses 6 Nexus 7018 switches
- Costs $3,300K (or $3,223 per port)
- Consumes 40 Watts per port
Arista Networks Analysis

The analysis was performed on the Arista 7500 series, with a trunking factor of 72. This results in a network that:
- Uses 8 Arista 7500 switches
- Costs $2,880K (or $2,813 per port)
- Consumes 40 Watts per port

Juniper Networks Analysis

The analysis was performed on the Juniper EX8216, with a trunking factor of 8. This results in a network that:
- Uses 24 Juniper EX8216 switches
- Costs $10,440K (or $10,195 per port)
- Consumes 141 Watts per port

Force 10 Networks Analysis

The analysis was performed on the Force 10 E1200i, with a trunking factor of 8. This results in a network that:
- Uses 24 Force 10 E1200i switches
- Costs $8,544K (or $8,344 per port)
- Consumes 110 Watts per port

Cost and Power Savings with the Infinetics Approach

The Infinetics architecture that provides an equivalent 1,024 available ports and equal available bandwidth to the industry vendor solutions described above:
- Uses 64 PICA8 Pronto 3780 switches
- Has an oversubscription ratio of 1.057 (effectively matching the 1:1 of the other vendor networks)
- Costs $768K
- Consumes 22 Watts per port

Table 1 shows the cost and power of the four industry solutions described above relative to the Infinetics network.

                  Cisco    Arista    Juniper    Force 10
  Relative cost   4.3x     3.8x      13.5x      11x
  Relative power  1.8x     1.8x      6.4x       5x

Table 1. Cost and power per port of each vendor solution relative to the Infinetics network.
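As a sanity check, the Python sketch below re-derives the per-port costs and the relative ratios of Table 1 from the totals quoted above, using the 1,024-port network size stated for the Infinetics configuration. The script is only illustrative; values may differ from the published table in the last decimal due to rounding.

    # Re-derive per-port cost, relative cost, and relative power from the
    # figures quoted in the vendor analyses above. The 1,024-port size is
    # taken from the Infinetics configuration described in the text.

    PORTS = 1024

    solutions = {
        # name: (total network cost in $, watts per port)
        "Cisco Nexus 7018":           (3_300_000, 40),
        "Arista 7500":                (2_880_000, 40),
        "Juniper EX8216":             (10_440_000, 141),
        "Force 10 E1200i":            (8_544_000, 110),
        "Infinetics (Pronto 3780)":   (768_000, 22),   # 1.0x baseline
    }

    inf_cost, inf_watts = solutions["Infinetics (Pronto 3780)"]

    for name, (cost, watts) in solutions.items():
        print(f"{name:28s} ${cost / PORTS:8,.0f}/port  "
              f"cost {cost / inf_cost:4.1f}x  "
              f"power {watts / inf_watts:3.1f}x")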
Current Architecture of Choice

First, some background. Folded Clos is a family of network topologies typically controlled by three parameters that select a specific member from a very wide range of possibilities. The Fat Tree is parametrized as FT(h, m, w), where h is the number of layers, each non-leaf node has m children, and each child has w parents. The simple tree is obtained by setting the number of parents w to 1. The equivalent of a hypercube (of dimension d = 2h) is obtained by FT(h, 4, 2h). Many other topologies are possible.

One partially scalable Folded Clos subclass (SFC) is of particular interest because of its use in the emerging TRILL standard and in Cisco's FabricPath-based networks. In this two-layer network, the top layer is called the "spine" and has no servers connected, and the bottom layer is called the "leaf" layer, where servers connect. This network is only partially scalable, since the maximum number of external ports (hence servers) is only P = R^2/2, where R is the radix (number of ports) of the switch used as the building block. A truly scalable network does not limit the maximum number of attached servers. In Figure 1 below, the B switches are the spine layer, the A switches are the leaf layer, Q represents the trunking factor, and M is the number of switches in the spine layer.

Figure 1. Two-layer SFC topology with spine (B) and leaf (A) switches.

SFC becomes scalable if an arbitrary number of layers is used instead of the usual two. With H layers, the number of external ports is P = 2·(R/2)^H. Hence, for any fixed radix R of the component switch, the number of ports P can grow arbitrarily large. The present generation of solutions offered by the major switch vendors achieves the behavior of SFC only with large-radix component switches. This scheme locks users into chasing an ever larger radix R as the network grows, while making the previous switches with smaller R obsolete. Therefore, we will consider below only the SFC topology, since it applies to the commercially available FabricPath, QFabric, Fulcrum's FocalPoint hardware implementation, and other solutions.
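The port-count formulas above can be checked with a short Python sketch. The functions simply evaluate P = R^2/2 and P = 2·(R/2)^H as stated in the text; R = 48 is chosen only as an illustrative commodity radix.

    # SFC external port counts, per the formulas quoted above.

    def sfc_ports_two_layer(radix: int) -> int:
        """Maximum external ports of a two-layer spine/leaf SFC: R^2 / 2."""
        return radix ** 2 // 2

    def sfc_ports(radix: int, layers: int) -> int:
        """Maximum external ports of an H-layer SFC: 2 * (R/2)^H."""
        return 2 * (radix // 2) ** layers

    if __name__ == "__main__":
        R = 48
        print("2-layer:", sfc_ports_two_layer(R))   # 1152
        for H in (2, 3, 4):
            print(f"{H}-layer:", sfc_ports(R, H))   # 1152, 27648, 663552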
The chief feature of SFC is that it produces a non-blocking network with maximum bisection for any given radix and number of component switches N, where N must be divisible by 3. Since the cost of such a network is N times the switch cost, this implies that no other network using the same component switches and having the same bisection can cost less than SFC. Namely, to reach the matching bisection, a would-be competitor of SFC would have to use at least as many switches of the given radix R as SFC, which means it would cost at least as much.

However, SFC pays a hidden price for the desirable feature of maximum bisection: it is over-optimized for worst-case traffic, the case in which each source is sending only to the farthest destination in the network, i.e. the singular case when all paths are precisely of the maximum length. As a result, SFC suffers a large throughput penalty on the remaining 99.99+ percent of all possible traffic patterns. The worst-case pattern is an exponentially small fraction of all traffic patterns, the magnitude of which is shown in Figure 2 below:

Figure 2. Buffering latency vs. relative traffic load for SFC, hypercube, and flattened butterfly topologies.

The vertical axis expresses latency due to buffering of the frames that cannot be forwarded, i.e. an indirect measure of network overload. The horizontal axis shows the traffic load relative to the 'all pipes full' capacity. At exactly 50% of the maximum load, the throughput of SFC tops out and all extra frames have to be buffered indefinitely. In contrast, the hypercube and flattened butterfly topologies undergo the same overload only when the traffic load reaches the actual full capacity, at twice the load of the SFC overload point.

Thus, for the sake of achieving the maximum bisection possible for a given number of switches with a given radix R, SFC squanders half of its switching capacity. For 99.99+ percent of traffic patterns it is therefore heavily underutilized, with a maximum of 50% utilization. As a result, its cost per Gb/s of throughput, which is dominated by the throughput on the non-worst-case traffic patterns, is double that of the regular hypercube or flattened butterfly.
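The saturation behavior described around Figure 2 can be illustrated with a toy model in Python. Following the statements above, SFC is taken to top out at 50% of the 'all pipes full' load for typical traffic, while the hypercube and flattened butterfly top out at 100%; this only illustrates the stated claim and is not a packet-level simulation.

    # Toy model of the saturation points described around Figure 2.

    def delivered_load(offered: float, saturation: float) -> float:
        """Fraction of full capacity actually delivered; the excess is
        assumed to be buffered rather than forwarded."""
        return min(offered, saturation)

    if __name__ == "__main__":
        for offered in (0.25, 0.50, 0.75, 1.00):
            sfc = delivered_load(offered, saturation=0.5)
            hc  = delivered_load(offered, saturation=1.0)
            print(f"offered {offered:4.2f}: SFC delivers {sfc:4.2f}, "
                  f"hypercube/flattened butterfly delivers {hc:4.2f}")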
Comparison of Infinetics Architecture

The Infinetics long-hop hypercube augments the bisection to almost the level of the folded Clos (mathematically the maximum possible), while simultaneously shortening the average path lengths by a factor of 2 or more compared to a plain hypercube, as illustrated in Table 2 below. Capacity is also boosted by the same factors over the hypercube for average traffic (random, all-to-all, etc.).

Table 2. Path-length and capacity comparison of the long-hop hypercube against the plain hypercube.

Therefore, the Infinetics long-hop hypercube augmentation achieves the best of both worlds: it handles the worst-case traffic (bottleneck capacity) nearly as well as the best that can be achieved by SFC, while simultaneously improving on the most common traffic case over a plain hypercube by a factor of 2 or more.
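For reference, the plain-hypercube baseline against which the "factor of 2 or more" improvement is measured can be computed directly. The long-hop augmentation itself is not detailed in this paper, so the sketch below only computes the unaugmented baseline, using the standard hypercube properties: diameter d and average shortest-path length d·2^(d-1)/(2^d - 1), i.e. approximately d/2.

    # Plain d-dimensional hypercube baseline: node count, diameter, and
    # average shortest-path length over all distinct node pairs.

    def hypercube_baseline(d: int):
        nodes = 2 ** d
        diameter = d
        # Average Hamming distance from any node to the other 2^d - 1 nodes.
        avg_path = d * 2 ** (d - 1) / (2 ** d - 1)
        return nodes, diameter, avg_path

    if __name__ == "__main__":
        for d in (6, 10, 14):
            n, diam, avg = hypercube_baseline(d)
            print(f"d={d:2d}: {n:6d} nodes, diameter {diam:2d}, "
                  f"average path {avg:.2f}")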
Conclusion

The basic conclusions on how well Infinetics compares against the Fat Tree, used by Cisco's FabricPath, Juniper's QFabric backbone, and other upcoming TRILL-enabled networks, are quite clear and simple:

1. With the same component switches as used in the competing topologies, the Infinetics cost per Gb/s of network throughput will be less than half of the cost for a Fat Tree based topology.

2. Using common off-the-shelf commodity switches with very low per-port cost, Infinetics can build a network that has approximately 10 times lower cost per port with equivalent available bandwidth, with no penalties imposed by the traffic patterns encountered in real-world data center usage scenarios.

3. The Infinetics bottleneck capacity for the worst-case traffic patterns (which also defines the oversubscription figure) will be practically identical to the mathematically best possible value, as shown in the Cisco example above, where the Infinetics oversubscription ratio is 1.057 compared to exactly 1 for the Fat Tree.

4. Unlike SFC, which limits the size of the flat Layer 2 network to R^2/2 ports for a component switch of radix R, the upper limit of the Infinetics flat Layer 2 network is exponential in R, which even with 48-port COTS switches is for all practical purposes unlimited. In contrast, any Fat Tree architecture requires the largest available switches to reach a flat Layer 2 size of even a few thousand 10GbE ports.

In other words, conclusions (1) and (2) above reflect the best that a Fat Tree can do, in the limited context of fairly small data center networks. To support larger networks, the Fat Tree approach has to rely on the current data center expand-up schemes, which vastly increases the cost per Gb/s.

Infinetics Technologies, Inc.
www.infinetics.com
info@infinetics.com
T: 877-438-1010

Notice: Infinetics and the Infinetics logo are trademarks of Infinetics Technologies, Incorporated. All other trademarks are the property of their respective owners.