DATACENTER SWITCHES IN THE CAMPUS BACKBONE
Dan Matthews
Manager, Network Engineering and Unified Communications, Case Western Reserve University
Using Fixed-Configuration Data Center Switches in the Campus Backbone

CONTENTS
- CC-NIE Cyberinfrastructure Challenge (Background)
- Evaluating Backbone Upgrade Options
  - Evaluating Upgrade Requirements
  - Evaluating What Is Available
  - Campus vs. Data Center: Features
- CWRU Implementation and Experiences
  - Deployment, Topology, Benefits
  - Performance Monitoring (SNMP vs. Splunk)
  - Buffer Monitoring and VXLAN
[ 2 ]
CC-NIE Cyberinfrastructure Challenge (Background)
- CWRU received a CC-NIE grant in October 2013.
  - Included a Science DMZ component (100GE to OARnet / Internet2).
  - Included network upgrades to extend 10GE to research-centric buildings and labs.
- The campus backbone is aged and not ready for 10GE to buildings.
  - Current network infrastructure dates from circa 2003.
  - Distribution routers are pairs of Cisco 6509-E with Sup720 (base) supervisors (L3 HSRP).
  - Core to distribution is 10GE (WS-X6704-10GE).
  - Distribution to building (access) is 1GE (WS-X6516A-GBIC).
  - Buildings are dual-connected to distribution pairs (a fairly typical campus design).
- Multiple PIs within the Bingham distribution collaborated on CC-NIE.
  - The Bingham distribution serves 20 buildings.
  - We need to upgrade three of them to 10GE; the more, the better, obviously.
[ 3 ]
CC-NIE Cyberinfrastructure Challenge (Background, Cont.)
- Solution 1: Status quo in the backbone (install line cards).
  - Install a Cisco WS-X6708 and X2-LR optics in each distribution 6509 (combined list price $149k).
  - Provides enough 10GE ports for 8 buildings (16 ports at $9,312.50 list each).
  - No other changes required; one of the benefits of chassis gear.
- Solution 2: Spend that money on something different, possibly even replacing the old equipment altogether.
  - Generate a list of requirements and nice-to-haves for this part of the network.
  - Survey the market to see what else may meet those requirements at that price point; seek better per-port value.
  - Look at feature sets that may provide more options for high-performance networking and ease of operations.
- We went with Solution 2. (The per-port math is sketched below.)
[ 4 ]
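The per-port arithmetic above is simple enough to sanity-check in a few lines. A minimal sketch in Python, using the list price and port count from this slide; the ToR comparison range is taken from the market survey later in the deck, not from any specific quote.

```python
# Per-port list cost of Solution 1 (figures from this slide).
def cost_per_port(total_list_price, ports):
    return total_list_price / ports

print(f"Solution 1: ${cost_per_port(149_000, 16):,.2f} per 10GE port")  # $9,312.50

# For comparison, the data center ToR switches surveyed on later slides land
# roughly in the $2,000-$5,000 per 10GE port range (list, with vendor optics).
```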
Step 1: Take Inventory (What Do We Really Need?)
- Interface count and type
  - Need 24 optical 1/10GE ports minimum (12 of them 10GE for a fair comparison to Solution 1).
  - Having 48 would be better for many reasons.
- L3 requirements (modest table sizes, standard protocols)
  - Must support OSPF, OSPFv3, IPv6, and a FHRP of some sort.
  - Must support policy-based routing, standard ACLs, and QoS.
  - Other standard campus routing needs: IPv4/IPv6 DHCP relay, RPF, PIM.
  - Must support 1,000 IPv4 routes (currently fewer than 500 in these routers).
  - Must support 1,000 IPv6 routes (currently fewer than 25 in these routers).
- L2 requirements
  - Must support Spanning Tree Protocol.
  - Must support a 10,000-entry CAM table (currently ~5,600 entries).
  - Must support a 10,000-entry ARP table.
- (These numbers can be checked mechanically against any candidate data sheet; see the sketch below.)
[ 5 ]
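Since the requirements reduce to a handful of numbers, a data-sheet check is mechanical. A minimal sketch; the candidate figures below are placeholders (they happen to match the 7280SE-64 numbers quoted later in the deck), not an endorsement of any particular product.

```python
# Scale requirements from the inventory above, checked against a candidate
# switch data sheet. Candidate values are placeholders to fill in per vendor.
required = {
    "sfp_plus_ports": 24,   # 24 optical 1/10GE ports minimum
    "ipv4_routes": 1_000,   # current table is under 500
    "ipv6_routes": 1_000,   # current table is under 25
    "mac_entries": 10_000,  # current CAM usage is ~5,600
    "arp_entries": 10_000,
}

candidate = {               # hypothetical data-sheet figures
    "sfp_plus_ports": 48,
    "ipv4_routes": 64_000,
    "ipv6_routes": 12_000,
    "mac_entries": 128_000,
    "arp_entries": 96_000,
}

shortfalls = {k: candidate[k] for k, need in required.items() if candidate[k] < need}
print("meets scale requirements" if not shortfalls else f"short on: {shortfalls}")
```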
Evaluating Backbone Upgrade Options (What's Available?)
- Traditional campus core and distribution options (overkill and expensive) -- gleaned from vendor web sites and reference designs
  - Cisco Catalyst / N7K, HP 5400/7500, Brocade ICX / MLX, Juniper EX6200/8200.
  - Most are chassis based; most have huge L2 and L3 table capacities (~256k+ IPv4 routes).
  - Most have power supplies ranging from 2,500 W to 6,000 W.
  - Cost per 10GE port ranges from $4,000 to $7,000+ list with vendor optics.
  - Cost per 40GE/100GE port is too high for this exercise, or simply not available.
- Data center ToR switches (the sweet spot for value and functionality?)
  - Cisco Nexus 9372PX, Arista 7050/7280, Dell S4048-ON, HP 5900AF-48XG, etc.
  - Many 1U fixed boxes with 48x SFP+ and 4 to 6 QSFP ports, albeit with smaller L2 and L3 tables.
  - Most have efficient power supplies ranging from 500 W to 1,200 W.
  - Cost per 10GE port is between $2,000 and $5,000 list with vendor optics.
  - Cost per 40GE/100GE port is still pricey but available, with flexible options (breakout cables).
[ 6 ]
Campus vs. Data Center: Features (Differentiators)
- Features are comparable, but not quite the same (see below). Data center switches offer some neat stuff, though.
- Traditional campus core and distribution features
  - Most offer a virtual chassis system (no FHRP, fewer configs, multi-chassis LAG).
  - Most offer full MPLS / VPLS implementations.
  - Some offer integrated security / NAC features.
  - Some offer services line cards (firewalls, load balancers, wireless controllers).
- Data center switch features
  - Most have some sort of fabric (if you are into that sort of thing) and multi-chassis LAG.
  - Most have VRF / VRF-Lite.
  - Most offer network telemetry and very low latency forwarding.
  - Most have API / OpenFlow integrations and automation tools (Puppet, Chef, XMPP).
  - Most offer VXLAN for extending L2 over L3 networks.
[ 7 ]
Campus vs. Data Center: Sanity Check
- Are ToR switches suitable for both IMIX and research flows?
- Data center pros: more ports, less power, less space, cool features
  - We get ~96 10GE-capable ports instead of 16, plus an upgrade path for all 20 buildings.
  - We get at least a 2x40GE EtherChannel between the pair, plus multi-chassis LAG.
  - We get a 40GE or 100GE upgrade path for core links.
  - We get features like advanced buffer monitoring, automation, and VXLAN.
  - We use far less power, generate less heat, and take up less space.
- Data center cons: longevity? More risk; shorter life span.
  - No easy (change-less) upgrade path to dense 40GE/100GE.
  - No operational experience with most of these devices and OSes.
  - Higher risk overall, since we would be replacing all L2 and L3 services with new equipment.
  - We won't be able to scale this OSPF area to 256k IPv4 routes. Bummer.
[ 8 ]
Data Center Switch Options Abound
- Many other data center ToR switches might be a good fit in campus backbones.
  - Some include the Dell S4048-ON, Cisco Nexus 9372PX, Brocade ICX 7750, HP 5900AF-48XG, and Juniper QFX5100-48S.
  - Choose your favorite vendor; I bet they have something to look at.
  - Most are based on merchant silicon. Software and support are key.
- Many campuses have already started using 1U switches like the Cisco Catalyst 4500-X and Juniper EX4550, as those are cross-marketed as both campus and data center switches. They lack some features of the data center offerings.
- Dense 100GE switches are now on the market or shipping soon.
  - Dell Z9100, Z6100
  - Arista 7060CX
  - Cisco Nexus 3232C
[ 9 ]
Let's Roll the Dice!
- We decided to take a shot. If it fails, we can always use the switches in, well, a data center.
- We settled on a really new switch at the time, the Arista 7280SE-64.
- Choosing Arista helped minimize some of the operational risk.
  - We had been using Arista in HPC for a while, so engineers were familiar with EOS.
  - We also chose the Arista 7500 for HPC / Science DMZ integration.
- The Arista 7280SE-64 specs exceeded our needs (table sizes, port count).
  - Based on the Broadcom Arad chipset.
  - 48x 1/10GE SFP+, 4x 40GE QSFP (typically ~4 W per 10GE port).
  - 64k IPv4 / 12k IPv6 LPM routes, 128k MACs, 96k ARP / host entries, PIM, VRRP.
  - Buffer monitoring, VXLAN, a Splunk app for network telemetry (we like Splunk), MLAG, etc.
[ 10 ]
Data Center Switches in the Campus Backbone: Outcomes
- The Arista 7280SE-64 pair is in production today and working really well.
  - No VoIP, QoS, or multicast issues. No packet loss, high CPU, or high latency that we have seen.
  - Five engineering buildings were upgraded to 10GE uplinks.
  - Cost was less than adding line cards and optics to the Catalyst 6509-E.
  - We deployed pairs of Arista 7150S-24 as building aggregators to terminate the other end of the links and provide 10GE ports within the buildings.
- Energy savings add up (nearly $5k/year per pair); the arithmetic is sketched below.
  - US average (all sectors) is 10.64 cents/kWh: http://www.eia.gov/electricity/monthly/epm_table_grapher.cfm?t=epmt_5_6_a
  - Old equipment costs $5,331.40/yr: ((4 * 1430 W) / 1000) * 0.1064 * 24 * 365
  - New equipment costs $354.18/yr: ((4 * 95 W) / 1000) * 0.1064 * 24 * 365
  - If only our budgets recognized this: the energy savings would pay for maintenance!
[ 11 ]
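A small sketch of the annual energy-cost arithmetic above, using the per-chassis power draws and the EIA 10.64 cents/kWh rate quoted on the slide:

```python
# Annual electricity cost: (total watts / 1000) * $/kWh * 24 h * 365 d.
RATE_PER_KWH = 0.1064        # EIA all-sector average cited above
HOURS_PER_YEAR = 24 * 365

def annual_cost(units, watts_each):
    return units * watts_each / 1000 * RATE_PER_KWH * HOURS_PER_YEAR

old = annual_cost(4, 1430)   # four chassis at ~1430 W each (slide figure)
new = annual_cost(4, 95)     # four 1U switches at ~95 W each (slide figure)
print(f"old: ${old:,.2f}/yr  new: ${new:,.2f}/yr  saved: ${old - new:,.2f}/yr")
# old: $5,331.41/yr  new: $354.18/yr  saved: $4,977.22/yr
```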
Data Center Switches in the Campus Backbone: Measurement
[ 12 ]
Actual perfSONAR Throughput Test Graph
- Hourly perfSONAR 10GE throughput tests from the Bingham distribution to the Science DMZ.
[ 13 ]
Traditional SNMP Obscures Traffic Bursts
- This shows only ~750 Mbps. Where are my spikes? (A worked example of the averaging effect follows.)
[ 14 ]
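A small worked example of why the bursts disappear, assuming a typical 5-minute SNMP counter poll and an hourly perfSONAR run that saturates the link for only a few tens of seconds (the rate and duration below are illustrative, not measured values):

```python
# Why averaging a counter delta over the polling interval hides short bursts.
POLL_INTERVAL_S = 300        # assumed 5-minute SNMP polling interval
BURST_RATE_BPS = 9.4e9       # assumed ~9.4 Gbps perfSONAR burst
BURST_DURATION_S = 25        # assumed ~25-second test run

# Octets accumulated in ifHCInOctets between two polls, due to the burst alone.
octets_delta = BURST_RATE_BPS / 8 * BURST_DURATION_S

# The poller reports delta / interval, not the instantaneous peak.
reported_bps = octets_delta * 8 / POLL_INTERVAL_S
print(f"graphed average: {reported_bps / 1e6:.0f} Mbps")   # ~783 Mbps
# A 9.4 Gbps burst lasting 25 s out of a 300 s interval graphs as well under
# 1 Gbps, which is consistent with the ~750 Mbps chart above.
```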
Splunk Network Telemetry App: Bandwidth Chart
- Wow! I can see the perfSONAR traffic bursts!
[ 15 ]
Buffer Monitoring (Also with the Splunk App)
- Looking at buffer (queue) utilization of Bingham Eth33 (uplink to core).
- Can you guess when I stopped the 10GE perfSONAR throughput tests?
[ 16 ]
Buffer Monitoring (No Splunk Required)
- You can see this via the CLI, too.
- Might be useful for identifying microburst congestion events that could cause packet loss.
- (A small eAPI polling sketch follows.)
[ 17 ]
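For ad hoc checks outside Splunk, the same queue-length data can also be pulled programmatically. A minimal sketch using Arista's JSON-RPC eAPI via jsonrpclib, assuming LANZ and eAPI (management api http-commands) are enabled on the switch; the hostname and credentials are placeholders.

```python
# Pull LANZ queue-length records over Arista eAPI (sketch; assumes LANZ and
# eAPI are enabled). Hostname and credentials below are placeholders.
from jsonrpclib import Server

switch = Server("https://admin:password@bingham-h0-e1.example.edu/command-api")

# Request text output so the sketch does not depend on the command's JSON schema.
result = switch.runCmds(1, ["show queue-monitor length"], "text")
print(result[0]["output"])
```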
Extending Your Science DMZ Using VXLAN
- No real bandwidth advantage, but it aids in applying consistent security controls and inspection.
- Make sure the VTEPs have a firewall-free path!
- [Diagram: CWRU Science DMZ deployment. Shows the Case backbone (129.22.0.0/16) with inetrouter0/inetrouter1 (WS-C6509E), the Bingham distribution pair bingham-h0-e1 / bingham-h0-e2 (MLAG), sciencerouter0 (Juniper MX480) toward Internet2, the Arista 7508E HPC core (hpc-rhk15-m1-e1) with 40GE trunks carrying the Science DMZ and private HPC nets, perfSONAR nodes (perfsonar-bing, perfsonar-dmz), DTN1, and a VXLAN tunnel carrying the Science DMZ VLAN trunked to building lab systems with a PBR-enabled firewall bypass. CC-NIE engineering buildings: Glennan, Olin, White, Bingham, Rockefeller, Nord, Crawford, KSL. Link speeds: 1GE, 10GE, 40GE, 100GE.]
[ 18 ]
Summary
- Data-center-class ToR L3 switches can work in campus backbone deployments. Thought must be given to current and mid-term requirements in terms of advanced features.
- The value proposition is compelling compared to traditional (or at least marketed-as-traditional) campus core and distribution options.
- Data center network equipment is designed with power, heat, and space efficiency in mind. Depending on the size of your backbone, this could make a difference for you.
- Data center network equipment seems to adopt new networking technology more rapidly than campus-centric offerings, and some of that technology can be helpful to cyberinfrastructure engineers.
- Data center network equipment has a robust set of API and automation tools that are not as mature in campus or enterprise offerings. (Didn't have time to cover this; next time.)
[ 19 ]
References
=== Brocade List Pricing ===
http://des.wa.gov/sitecollectiondocuments/contractingpurchasing/brocade/price_list_2014-03-28.pdf
=== Cisco List Pricing ===
http://ciscoprice.com/
=== Juniper List Pricing ===
http://www.juniper.net/us/en/partners/mississippi/juniper-pricelist-mississippi.pdf
=== HP List Pricing ===
http://z2z-hpcom-static2-prd-02.external.hp.com/us/en/networking/products/configurator/index.aspx#.vfwjxcbvhbc
http://www.kernelsoftware.com/products/catalog/hewlett-packard.html
=== Dell Campus Networking Reference ===
http://partnerdirect.dell.com/sites/channel/documents/dell-networking-campus-switching-and-mobility-reference-architecture.pdf
=== HP Campus Network Design Reference ===
http://www.hp.com/hpinfo/newsroom/press_kits/2011/interopny2011/fcra_architecture_guide.pdf
=== Cisco Campus Network Design Reference ===
http://www.cisco.com/c/en/us/td/docs/solutions/enterprise/campus/ha_campus_dg/hacampusdg.html
http://www.cisco.com/c/en/us/td/docs/solutions/enterprise/campus/campover.html
http://www.cisco.com/c/en/us/td/docs/solutions/enterprise/campus/borderless_campus_network_1-0/borderless_campus_1-0_design_guide.pdf
=== Juniper Campus Network Design Reference ===
http://www.juniper.net/us/en/local/pdf/design-guides/jnpr-horizontal-campus-validated-design.pdf
http://www.juniper.net/techpubs/en_us/release-independent/solutions/information-products/topic-collections/midsize-enterprise-campus-ref-arch.pdf
https://www-935.ibm.com/services/au/gts/pdf/905013.pdf
=== Brocade Campus Network Design Reference ===
http://community.brocade.com/t5/campus-networks/campus-network-solution-design-guide-bradford-networks-network/ta-p/37280
[ 20 ]