Best Practices in Planning and Implementing Data Center Networks

OVERVIEW

Financial enterprises are expanding geographically through growth and acquisitions at an unprecedented rate. Simultaneously, they're forced to comply with increasingly stringent government regulations. To meet these twin challenges, financial institutions are relying more heavily on any-to-any connectivity between sites and users, and on connectivity between large data centers. Such data center networks extend local area network (LAN) and storage fabric between geographically dispersed financial enterprise locations, from headquarters to data center to individual branches. To accomplish this extension of LAN and storage fabric, data center networks often aggregate several data traffic types onto a few (or one) transport technologies. Various data traffic types might be used in disparate locations, from Ethernet and physical layer (PHY), to time-division multiplexing (TDM), to Fibre Channel and beyond. This data traffic must be aggregated onto telecommunications services built on some combination of dark fiber, dense wavelength division multiplexing (DWDM) and SONET. This consolidation of financial business information into connected data centers can reduce costs, enhance security, simplify management, and enable the necessary infrastructure for complying with government regulations like the Sarbanes-Oxley Act and the Gramm-Leach-Bliley Act.
DATA CENTER PROTOCOLS
Ethernet: FastE / GigE / 10G LAN PHY
TDM: DS-X / OC-X
Storage: Fibre Channel / FICON / ESCON / InfiniBand / FCIP

TRANSPORT TECHNOLOGIES
Dark fiber
DWDM
SONET

To provide optimal benefit, the data center network should be private, stable and resilient. It should be able to grow with your business and provide a migration path to future technologies, and it should also adapt quickly to changing business requirements. Last, but definitely not least in the financial industry, is the issue of business continuity and disaster recovery (BCDR). Financial information, from account records to loan and investment information, may be stored in separate locations yet remain interdependent and tied to a single client. Data center networks enable not only the recovery of such information, but its interdependent usability for true business continuity.

Bandwidth as Driver for Data Center Networking

The business drivers toward data center networks can be boiled down to one technical need: more bandwidth. Infrastructure inside financial data centers is often at multi-gigabit levels, particularly with LAN and storage area network (SAN) fabrics. Emerging use of e-mail and instant messaging applications within the financial services industry, along with new standards and regulations, such as Regulation NMS, have dramatically increased storage requirements. Between e-mail, e-mail attachments, presentations and other image files, large binary objects abound and many must be retained indefinitely. From a bandwidth perspective, server I/O bus bandwidth is at N x 10G levels and rising, while PCIe 2.0 has doubled per-lane bus bandwidth from 250 MB/s to 500 MB/s. These types of throughput are driving the ever-increasing bandwidth requirements. The chart in Figure 1 illustrates the bandwidth that today's more robust technologies consume with respect to current PCI bandwidth limits.
[Figure 1: Throughput of common access and LAN technologies, from ADSL and cable broadband through 10-Gigabit Ethernet, plotted in megabytes per second against the PCI bandwidth limit] Ever-advancing technologies require ever-greater capacity, and hard disk size reflects the ongoing growth in financial data storage. If an enterprise has storage needs represented by X and purchases 2X worth of storage, it will fill it. There is no such thing as enough storage in any enterprise, as the chart in Figure 2 illustrates: the dots represent usage, and the line represents capacity. Over the years, every increase in capacity has been overwhelmed by increased consumption. Despite the dramatic increase in hard-drive capacity over the years, the steady state of hard disks in any given data center has remained full. [Figure 2]
Challenges for Data Center Networks

When developing a data center networking strategy to facilitate scalable, flexible transmission between continually growing data centers, there are five principal challenges that should be addressed:

Site selection: Where to move your data? This can often be the first challenge in a data center network build. Selecting an appropriate site depends upon many factors, including space required, cost of real estate and power, the skill set of the area's labor force, fiber and telecom infrastructure, location and distance from the primary data center, and latency requirements. Adequately assessing and weighing these factors can take weeks or even months.

Diversifying access: Once a data center network site has been chosen, you should look to establish sufficiently diverse telecom access into your facility. For financial enterprises with demanding bandwidth and redundancy requirements, it is generally recommended to have three or more carriers into the facility in order to find two that are diverse from each other on any given route. Diversification is sound practice throughout the financial services sector, and that strategy is no different when it comes to working with carriers. Your carriers should provide the flexibility to work within your locations and your requirements, and be able to build into facilities close to their network.

Revolving requirements baseline: Business needs constantly change. While the data center network is being planned and developed, in-house customers can often test and change their needs, equipment configurations and architectures. The long lead times typically associated with data center network configuration make shifting requirements practically a certainty, and the in-process data center network must be prepared to accommodate them.

Technology limitations: With the rapidly changing nature of technology, committing to a single technology for a long period of time should be carefully considered. It is important to provide a migration path for technological growth (e.g., TDM to Ethernet, 2.5G to 10G to 40G to 100G) and to have the flexibility to move to the next incremental transmission technology or level of bandwidth.

Multiple architectures: Both metro (synchronous) and long-haul (asynchronous) replication and BCDR may be necessary.
Solutions: Do-it-Yourself vs. Managed Services

There are a variety of approaches to data center networking, but they can be largely broken down into two categories: managed service through outside providers, and do-it-yourself, in-house approaches. These two approaches can then be separated by their use in metro networks and long-haul networks.

[Figure 3]
Metro networks, managed service: common approach, with lots of choices in fiber-rich markets; managed SONET, private and public rings.
Metro networks, do-it-yourself: common approach in the financial vertical; fiber lease with equipment purchase.
Long-haul networks, managed service: typical solution approach for servicing out-of-region data centers; usually combined with metro solutions at each data center.
Long-haul networks, do-it-yourself: hard to implement due to cost structure and complexity of logistics (reamplification, regeneration, co-location); the retail long-haul dark fiber market is almost extinct.

As outlined in Figure 3, the do-it-yourself approach is more common in the metro network space, typically due to its relative ease of operability and lower cost structure. However, as networking needs expand beyond a single metro market, the complexities of deploying and maintaining the network will often require carrier expertise. Historically, dark fiber has been integral to the do-it-yourself approach to long-haul networking, but long-haul dark fiber is rarely made available by facilities-based carriers for lease to anyone but large wholesale partners. In addition, evidence suggests that as many as half of the companies that buy dark fiber ultimately opt for managed service instead, because dark fiber does not meet their networking needs.

Solutions: Sizing the Network

Data center networks can range in size from optical carrier OC-3 all the way to OC-192 and multi-wavelength solutions, depending on the enterprise needs:
OC-3 to OC-12 solutions can deliver Fast Ethernet, fractional gigabit Ethernet (GigE) and TDM. The carrier edge device for this type of service is typically an on-premise multiplexer such as a Lucent Metropolis DMXtend Access Multiplexer or a TDM-Ethernet converter such as the Overture Networks ISG Series. The customer equipment is typically a router/switch such as the Cisco Catalyst 6500 Series Switches with TDM, packet over SONET (POS) and Ethernet cards. The carrier will almost always manage its edge device and usually will manage routers for an additional fee. However, this type of service provides limited flexibility in what you can take from the carrier and provide to your in-house customers. With these smaller aggregate circuits, you cannot provide true multi-service provisioning; your scalability is limited to upgrading to a larger service. This type of service is most appropriate to the small enterprise with lesser requirements (i.e., smaller databases to move from one location to another). Smaller organizations tend to rely on a single type of architecture, with one type of replication protocol and one simple pipe that's moving data to a bunker site or BCDR location. OC-48 to OC-192 solutions can deliver fractional and full line-rate GigE or storage over SONET (Fibre Channel, Fiber Connectivity [FICON]). The carrier edge device deployed at the customer premise for this type of service is usually a multiplexer such as a Lucent Metropolis DMX Access Multiplexer or Cisco ONS 15454 series. The customer equipment is typically a combination of routers, Ethernet switches, SAN switches (e.g., Brocade 5000 switch) and storage channel extension devices (e.g., Brocade M3000 Edge Storage Router and M3700 Storage Director-eXtended [USD-X] switches). This bandwidth range can be considered the sweet spot for deploying true multi-service platforms, which are typically multiplexers with a variety of TDM, Ethernet and storage tributary interfaces.
These systems provide greater flexibility through more choices in tributary handoffs and, depending on day-one utilization, can upgrade to a larger or additional service or to multi-wavelength services. Multi-wavelength solutions can deliver multiple TDM circuits, OC-X, GigE, Fibre Channel, FICON and others. Carriers deploy multi-wavelength services by placing one or more wavelength division multiplexers (WDM) on the customer premise. Carriers can deliver SONET, Ethernet and storage services directly from these platforms, and often will use the WDM in combination with a SONET multiplexer to provide
a full range of services to customers. The customer will connect to the carrier's edge equipment with the same kinds of devices that they would connect to a multi-service provisioning platform: routers, switches and storage gear. These multi-wavelength services provide flexibility and scalability in data center builds that approach or surpass an OC-192 of day-one capacity. Carriers often subtend one or more SONET rings over a DWDM infrastructure, thereby protecting any circuits handed off to the customer via SONET. Additional overlays can then provide an incremental growth path with additional flexibility and resilience (Automatic Protection Switching).

Solutions: Shaping the Network

Depending upon the needs of an enterprise, it is possible to construct a data center network that accommodates multiple network needs and levels of complexity. Two of the most common types are ring and linear topologies. A ring topology can be as simple as diverse paths between two data centers within a metro area. In practice, both enterprises and carriers often find it beneficial to include one or more carrier points of presence (POPs) or carrier-neutral facilities in the ring. This gives the enterprise diverse access to long-haul services, potentially with hand-off to other carriers in a carrier-neutral facility. This topology also allows project stakeholders to recoup the costs of a data center build by providing the enterprise with access to a wide array of products and services, including private-line, Internet access and colocation services. Figure 4 illustrates a logical view of a typical access-ring configuration: typically 2-4 nodes, 1-2 data centers and 1-2 service provider PoPs, with dual points of entry (POE) in all buildings. [Figure 4]
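The OC-N service tiers described in the sizing discussion above follow SONET's fixed hierarchy, in which every level runs at N x 51.84 Mb/s. A quick illustrative sketch of the rates behind that sizing guidance:

```python
# SONET optical-carrier line rates follow OC-N = N x 51.84 Mb/s (the OC-1 rate).
OC1_MBPS = 51.84

def oc_rate_mbps(n):
    """Line rate of an OC-N circuit in Mb/s."""
    return round(n * OC1_MBPS, 2)

for n in (3, 12, 48, 192):
    print(f"OC-{n}: {oc_rate_mbps(n):>8.2f} Mb/s")

# OC-3:    155.52 Mb/s -- entry tier: Fast Ethernet, fractional GigE, TDM
# OC-12:   622.08 Mb/s
# OC-48:  2488.32 Mb/s -- start of the multi-service "sweet spot"
# OC-192: 9953.28 Mb/s -- full line-rate GigE and storage services
```

The factor-of-four spacing between tiers is also why an enterprise outgrowing one tier usually faces a step upgrade rather than an incremental one, which is the scalability limitation noted for the smaller aggregate circuits.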
While linear topologies with diverse transmission facilities are sometimes used in metro applications to achieve resilience, they are more common in the long haul, where an enterprise needs to achieve route and POP diversity, often by using more than one service provider. Figure 5 illustrates two corporate data centers connected via diverse linear facilities, with diversity on several levels: POE, metro fiber, POP and long-haul routes. [Figure 5] Appendix A includes diagrams representing additional ring topologies, linear topologies, and finally more complex ringed metro with linear long-haul examples. Diversity, scalability and other characteristics of each design category are identified as they relate to an enterprise's data center connectivity requirements.

Trends

With the consistent growth in bandwidth requirements for data center networking, it's important to review the technologies being created to satisfy those requirements. Below are technologies that customers have been or soon will be requesting of their managed services providers to help address current and future bandwidth headroom needs: 10 GigE LAN PHY: The 10 GigE LAN PHY is the next quantum leap in Ethernet bandwidth beyond gigabit Ethernet. 10 GigE LAN PHY interfaces are readily available today on some switching equipment inside data centers. More economical than the SONET-equivalent OC-192 interface, 10 GigE LAN PHY has a far lower cost per port and, as a true Ethernet interface, behaves in the same way as its Ethernet predecessors. Requests to support 10 GigE LAN PHY are now common in large enterprise data center network requirements. Multi-Gigabit Fibre Channel: Enterprises are increasingly requiring support for multi-gigabit Fibre Channel interfaces in their metro networks beyond single-gigabit Fibre Channel.
Multi-gigabit Fibre Channel equipment and ports are now available on SAN
fabrics, channel extension gear, and SONET/DWDM platforms at the 2-gigabit (FC200) and 4-gigabit (FC400) levels, with 10G Fibre Channel soon to come. Multiplexers with G.709 capability: G.709 is an International Telecommunication Union (ITU) standard that's been incorporated into the latest generation of optical equipment. It provides in-band management and forward error correction that enables optical signals to travel longer distances without degradation. Carriers use these multiplexers to deliver next-generation optical services like 10 GigE LAN PHY. Next-generation platforms that incorporate G.709 capability typically also support features like virtual concatenation (VCAT) and generic framing procedure (GFP), which allow enterprises to use the bandwidth they purchase more efficiently. 40G/OC-768: The next quantum increase in broadband service in the SONET hierarchy is widely believed to be 40G, based on the past progression from OC-3 to OC-12 to OC-48 to OC-192 (always a factor of four). Some enterprises are already asking for scalability to 40G data center networks on projects in development. 100G Ethernet/100G optics: While SONET bandwidth has traditionally increased by factors of four, Ethernet's quantum increases have been by a factor of 10, and 100G Ethernet is already being planned in the Institute of Electrical and Electronics Engineers (IEEE) standards committees. There is compelling evidence that national service provider backbones may leapfrog from their current 10G infrastructures to a next generation that supports 100G services. Time will tell whether 40G is economically viable as a pervasive carrier infrastructure, or whether it will be overtaken by 100G technology before it is widely deployed.

Conclusion

Several conditions are driving extraordinary growth in bandwidth requirements for the financial services organization. Mergers and acquisitions, along with general business growth, are resulting in disparate sites with interdependent data.
Regulatory requirements mean such data must be stored longer and more securely. Competition, meanwhile, insists that financial data be quickly and securely accessible, and quickly recoverable in the event of a catastrophe. Data center networks can help financial enterprises meet these challenges more efficiently. Several critical issues should be considered when evaluating data center network options, including:
Site selection: ensuring that your site is appropriately located and provides enough space and the right technology.
Diversity of access: ensuring multiple carriers in a facility, so that you have two that are diverse from each other on any given route.
Evolving requirements: adapting to internal requirements that shift during the lengthy course of data center network development.
Technology limitations: seeing that your facility has appropriate technology to meet not just today's needs, but tomorrow's as well.

Multiple sizes and topologies can be adopted for financial data center networks, and they can meet almost any enterprise need while providing sufficient bandwidth to scale seamlessly. The most sophisticated enterprises will also have to look toward tomorrow's technologies today, such as 10 GigE and multi-gigabit Fibre Channel. Any-to-any connectivity is no longer a luxury for the financial enterprise; it is a requirement. For any-to-any connectivity between sites and users, data center networks must provide scalable, flexible and secure platforms from which to satisfy the financial industry's rapidly evolving needs.
Appendix A

Ring Topologies

NOTE: Most data center networks employ some form of automatic protection switching (APS) between diverse transmission facilities to attain the level of availability required to support critical business operations. In a ring topology, this means the data center network is using either Unidirectional Path Switched Ring (UPSR) or Bidirectional Line Switched Ring (BLSR) protection. While the details of these protection mechanisms are beyond the scope of this appendix, it is important to note that these technologies provide a failover mechanism that requires less than 50 milliseconds to complete. With this near-instantaneous failover in the event of an outage, higher-level protocols and applications continue to operate normally, with no impact to end users or business processes. When an enterprise or carrier has a forecast or a requirement for multiple protected services, or there is a requirement for considerable scalability beyond day-one requirements, data center network designers may choose to overlay one or more protected SONET systems over a DWDM infrastructure. Since current-generation metro DWDM systems are capable of implementing up to 40 individual wavelengths in a system, this technique provides scalability well beyond an initial OC-48 or OC-192 system without the need for disruptive day-two system overbuilds or SONET-to-DWDM conversions. The ring-topology architecture depicted in Figure A can service multiple data and storage requirements out of different service delivery platforms (SONET data and storage client-side interfaces, or service delivery directly from the DWDM platform). [Figure A: System Overlays on DWDM Ring (SONET ring or linear overlays over a DWDM ring)]
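The sub-50-millisecond figure above matters because it bounds how much data is exposed during a protection switch, and that quantity is small enough for TCP and storage-replication protocols to absorb through normal retransmission. As a rough illustration (the line rates are assumed for the example, not taken from the source):

```python
# How much traffic is "in flight" during a sub-50 ms protection switch?
# The amount is small enough that higher-layer protocols recover it via
# retransmission, which is why applications ride through the failover.

def data_at_risk_mbytes(line_rate_gbps, failover_ms=50.0):
    """Bytes transmitted during the failover window, expressed in MB."""
    bits = line_rate_gbps * 1e9 * (failover_ms / 1000)
    return bits / 8 / 1e6  # bits -> bytes -> MB

print(data_at_risk_mbytes(2.5))   # OC-48-class link: ~15.6 MB exposed
print(data_at_risk_mbytes(10.0))  # 10G link: 62.5 MB exposed
```

Even at 10 Gb/s, the exposure is tens of megabytes — a burst easily replayed by the transport layer, versus the minutes-long application outages that slower restoration schemes would cause.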
A variation on the above theme is depicted in Figure B. This is essentially a ring topology at the SONET level, but the underlying DWDM system consists of separate linear optronics systems implementing each span in the ring. This architecture offers increased resilience, as each optical span operates independently of the others, and one DWDM optical add-drop multiplexer (OADM) failure will only affect a single span. This approach is common in high-end designs where the enterprise is targeting greater than 99.99999% availability in the overall transport system, at an additional cost. [Figure B: Ring Topology with Linear DWDM Spans (linear spans with SONET ring overlay)]

Linear Topologies

An enterprise typically chooses the following type of architecture to achieve optimal latency performance over a primary route, while having a suboptimal backup route available between two facilities in the event of a failure. Often different carriers provide the diverse routes, with either one carrier or the enterprise performing protection switching between the two diverse facilities. Carriers would typically perform APS via a SONET multiplexer, while enterprises may employ their own multiplexer, IP switches or redundant SAN fabrics across the diverse facilities. Figure C illustrates both of these types of equipment subtending off of the DWDM equipment.
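To put the availability target above in perspective, an availability percentage translates directly into permitted downtime per year. A small sketch of that conversion:

```python
# Annual downtime implied by an availability target. "Five nines" (99.999%)
# allows roughly 5.3 minutes of outage per year; the seven-nines figure cited
# for high-end dual-span designs allows only a few seconds per year.

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 (ignoring leap years)

def downtime_minutes_per_year(availability_pct):
    """Permitted outage minutes per year for a given availability percentage."""
    return (1 - availability_pct / 100) * MINUTES_PER_YEAR

print(downtime_minutes_per_year(99.999))    # ~5.26 minutes/year
print(downtime_minutes_per_year(99.99999))  # ~0.053 minutes/year (~3 seconds)
```

The jump from five to seven nines shrinks the annual outage budget by a factor of 100, which is why the independent-span DWDM design commands its additional cost.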
[Figure C: Diverse Linear Service with SONET and Layer 3 APS (diverse DWDM optronics, with APS provided by the SONET mux or redundancy chosen in place of APS; diverse PoPs; long haul carried as one or more waves unless the aggregate is below OC-48)]

Ringed Metro, Linear Long Haul

In what many consider to be the most robust and scalable scenario, enterprises build ringed metro solutions that connect to carrier hotels so that they can implement large, long-haul circuits that are unprotected between carrier POPs, but protected in the metro. This architecture allows enterprises to plan data center networks with predictable and optimal latencies between geographically dispersed locations while having an element of protection in the metro, where most transport failures occur. When combined with the diverse services of multiple carriers, this topology can overcome most transport-failure scenarios between data centers. Figure D illustrates this basic topology: typically 2-4 nodes, 1-2 data centers and 1-2 service provider PoPs, with dual points of entry (POE) in all buildings and unprotected IOC/long-haul circuits between the POPs. [Figure D: Linear Long Haul with Protected Metro]
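Route diversity of the kind shown in Figures C and D is, at bottom, a set-intersection check: two circuit paths are diverse only if they share no intermediate POP or span. A minimal sketch of such a planning check, with hypothetical node names invented for the example:

```python
# Verifying route diversity between two candidate circuit paths.
# Paths are diverse only if they share no intermediate node (POP).
# All node names below are hypothetical, for illustration only.

def shared_risk(path_a, path_b):
    """Return intermediate nodes common to both paths (endpoints excluded)."""
    mid_a = set(path_a[1:-1])
    mid_b = set(path_b[1:-1])
    return mid_a & mid_b

primary = ["DC-East", "POP-NYC", "POP-CHI", "DC-West"]
backup  = ["DC-East", "POP-PHL", "POP-DAL", "DC-West"]
bad     = ["DC-East", "POP-NYC", "POP-DAL", "DC-West"]

print(shared_risk(primary, backup))  # empty set -> routes are diverse
print(shared_risk(primary, bad))     # {'POP-NYC'} -> shared POP, not diverse
```

A real audit would also compare fiber spans and conduit (shared-risk link groups), not just POPs, which is one reason the paper recommends bringing three or more carriers into a facility to find two truly diverse routes.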
The final type of data center build explored here provides a combination of resilient and scalable metro and long-haul transport solutions that enable an enterprise to implement a combination of synchronous and asynchronous replication strategies between more than two data center locations, while accommodating future growth and other (e.g., branch office to data center) networking requirements. [Figure E: Linear Long Haul with Synchronous and Asynchronous Replication Strategies (long-haul OC-X or wave service; hand-off into the carrier long-haul network via optical DXC or directly into the long-haul DWDM platform, depending on service and carrier)]