How Microsoft IT Developed a Private Cloud Infrastructure




Situation
Microsoft IT wanted to reduce lab-space server sprawl and introduce a new level of management and support efficiency. The facility needed to be both efficient and flexible enough to support the research and development needs of the different product groups.

Solution
MSIT built an energy-efficient, flexible, high-density facility that meets the needs of the research and development community at Microsoft and is able to host private clouds that provide infrastructure as a service (IaaS).

Benefits
- Reduced the footprint of on-campus lab space
- Efficiencies in scale and facility design provide power-consumption savings
- Virtualization reduced the number of required physical systems and the amount of resources required to manage them
- Offering IaaS through the private cloud shortened the time it takes to deploy systems and reduced variations in the deployment of systems

Products & Technologies
- Hyper-V
- Private cloud virtualization
- Infrastructure as a service (IaaS)
- Storage area networks (SANs)

Published: August 2011

Microsoft IT created an efficient, flexible research and development (R&D) facility that serves the development and test environments at Microsoft. Learn how Microsoft IT leveraged the flexibility and density of the facility, along with the supporting network, to develop a private cloud infrastructure that uses cutting-edge technology to provide infrastructure as a service and support the needs of the internal businesses.

To reduce lab-space server sprawl and introduce a new level of management and support efficiency, Microsoft built a flexible facility that reduced the footprint of private research, development, and test labs spread across the campus. This facility offered a flexible and efficient infrastructure able to meet the varying demands of all the individual product teams within the company.
The facility marks a milestone in the cultural shift under way at Microsoft from the traditional model, where product groups managed their own labs in an office building, to a centrally managed and more energy-efficient alternative. Many of the facility's benefits are realized through economies of scale, remote hosting methodologies, and an efficient infrastructure, as well as by stretching the distance between developers and physical systems. That physical separation reinforces a model of code development that is inherently remote and in the cloud.

Once the facility was built, Microsoft IT (MSIT) wanted to fully leverage its capabilities to offer a private cloud that provided an elastic, highly scalable infrastructure-as-a-service platform. This allowed the R&D community at Microsoft to dynamically scale their application development environments up or down, on demand, paying only for their actual consumption. By building a private cloud with a very large, centrally procured resource pool, MSIT was able to realize the benefits of higher density, fewer resources, and lower costs per resource. MSIT built a private cloud that provided the virtual infrastructures that the research, development, and test business customers at Microsoft would otherwise have built for themselves. In doing so, MSIT was able to reduce:
- The number of physical systems.
- The sprawl of the systems.
- The cost of procuring and managing the systems.
- The number of resources required to manage the systems.
- The time the systems took to be deployed.
- Variations in the deployment of the systems.
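As an aside, the pay-for-consumption model described above can be sketched with a toy chargeback calculation. All rates and usage figures below are hypothetical; the case study does not publish MSIT's actual chargeback rates:

```python
# Toy consumption-based chargeback: teams pay only for the VM capacity
# they actually use. All rates and usage figures are hypothetical.

RATE_PER_VM_HOUR = 0.05   # hypothetical $/VM-hour of compute
RATE_PER_GB_MONTH = 0.10  # hypothetical $/GB of storage per month

def monthly_charge(vm_hours: float, storage_gb: float) -> float:
    """Charge = compute consumed plus storage held over the month."""
    return vm_hours * RATE_PER_VM_HOUR + storage_gb * RATE_PER_GB_MONTH

# A team that scales up to 40 VMs for a two-week test pass, then back down,
# pays for those hours only -- not for idle servers the rest of the year.
burst = monthly_charge(vm_hours=40 * 24 * 14, storage_gb=500)
steady = monthly_charge(vm_hours=2 * 24 * 30, storage_gb=500)
print(round(burst, 2), round(steady, 2))
```

Under these made-up rates, the burst month costs more, but only while the capacity is actually consumed; a team owning 40 physical servers would carry that cost year-round.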

MSIT needed the systems to be available in a predictable amount of time. They needed the systems to look and behave the same way every time. And this had to be accomplished without a great deal of human intervention. This could have been done without virtualization, but it would have been more difficult and less cost-effective. Currently, the facility hosts tens of thousands of virtual machines, representing multiple private clouds. While private clouds represent only a fraction of what is physically hosted within the facility, the number of virtual machines well exceeds the number of physical devices.

This technical case study provides a high-level description of how the facility, the network and fabric, and the private cloud build upon each other's capabilities to provide an infrastructure-as-a-service offering. It also provides some best practices and insights that MSIT developed during the planning and deployment of the private cloud hosted within the R&D facility. This case study should not function as a procedural roadmap, however, because operational environments differ among organizations.

Facility Infrastructure
MSIT created an efficient, flexible R&D facility at Redmond Ridge that serves the development and test environments at Microsoft. The opening of this facility represented a transition point in the company culture, because in the past the vast majority of the development and test community at Microsoft developed products in private lab spaces on campus. With usable floor space of about 34,000 square feet, the facility replaces more than 180,000 square feet of office space that would otherwise have been converted, at large expense, to support R&D lab space within office buildings.
As opposed to other lab spaces within Microsoft, the facility was designed and purpose-built to be a research and development space, providing a common environment with uniform densities and cooling to provide resiliency across the environment without favoring one particular area of the lab (R&D space). It also features a robust fiber backbone for distribution to the entire environment. This design allows MSIT to deliver stable lab space and provide the flexibility to meet various clients' needs, including those required to host private clouds. The facility's design criteria fell into three major themes:
- Resiliency over redundancy
- High-density delivery
- Energy efficiency

Resiliency over Redundancy
This concept is present in the design of several important components within the facility. All of the pods are UPS-backed. The dynamic nature of the eight-generator system allows each pod within a cluster to be prioritized as required, based upon current business need, such as during a key development or test cycle. The operations staff can dynamically edit the load-shed procedure to provide coverage as required by the business. The local utility has provided some level of redundancy by offering dual 25-megawatt power connections into the building from a dedicated power substation.

Delivering High Density
MSIT delivered higher-than-industry-standard density through increased usable space within racks. The facility is fitted with 52 U racks with 51 U of usable space per rack. All racks are equipped with 15 kilowatts (kW) of usable power. There are 48 pods (8 rows of 6 pods) separated into two groups of 24 on either side of a central control/work area.

Energy Efficiency
The building was designed to have a Power Usage Effectiveness (PUE) of 1.3 or lower and has been operating at 1.2. Because MSIT maximized density, having a smaller footprint inherently contributed to power savings. A smaller facility requires less lighting, needs less wiring, and has less square footage to maintain with climate control systems. All of the space lighting is on sensor systems that detect the presence of individuals in the work areas and light an area only as required, based upon presence.

Climate control is a major power-consumption consideration when designing a data facility. MSIT decided to use evaporative coolers, fully contained hot aisles, and a shared cold supply air space that flows around all pods within the room. Positive pressure is kept within the supply aisles, with negative pressure inside the contained hot aisles, further facilitating air flow. Blanking of empty spaces is also used to maintain proper pressure differentials between supply and return spaces. Evaporative coolers are more energy efficient than conventional chillers, and fully contained hot and cold aisles allow for greater efficiency of the cooling system and heat removal from the devices.

Network and Fabric Infrastructure
In determining the best design for a network service for the facility, MSIT took into consideration the shift from one server-one IP address to private clouds that offer multiple virtual hosts per server. Emerging server technologies and Hyper-V capabilities have indicated an increasing need for MSIT to maintain enough network capacity to stay ahead of the curve.
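Looping back to the Energy Efficiency figures for a moment: PUE is simply total facility power divided by IT equipment power, so the numbers above can be sanity-checked in a few lines. The 15 kW per-rack figure and the 1.2 operating PUE come from the text; the derived per-rack overhead is illustrative:

```python
# PUE (Power Usage Effectiveness) = total facility power / IT equipment power.
# The 15 kW usable IT power per rack and the operating PUE of 1.2 are from
# the text; the facility overhead derived below is illustrative only.

def pue(total_facility_kw: float, it_kw: float) -> float:
    """Power Usage Effectiveness: total facility draw over IT-only draw."""
    return total_facility_kw / it_kw

it_load_kw = 15.0    # usable IT power per rack (from the text)
measured_pue = 1.2   # operating PUE reported for the facility

# At PUE 1.2, each 15 kW of IT load implies this much total facility draw:
facility_draw_kw = it_load_kw * measured_pue
overhead_kw = facility_draw_kw - it_load_kw  # cooling, lighting, losses

print(facility_draw_kw)  # total facility draw per fully loaded rack, in kW
print(overhead_kw)       # non-IT overhead per fully loaded rack, in kW
```

In other words, a fully loaded 15 kW rack costs the facility roughly 18 kW in total at PUE 1.2, versus 19.5 kW at the 1.3 design target, which is where the evaporative cooling and hot-aisle containment savings show up.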
As more services transition into the cloud, MSIT will need to design networks that accommodate the product groups' growing need to test their next generation of software products, which operate on ad hoc schedules. Accommodating network design in a progressive facility, for an industry in transition, required that MSIT focus on the costs of logical capacity, the future capabilities of routers and switches, and oversubscription rates. Other design criteria for the network infrastructure were that it should be easy to operate and that it include standards-based interfaces to automate against.

Selecting Devices
To keep costs down and simplify the network environment, MSIT made some tradeoffs and chose to use fewer devices. The design still had to provide all of the required capacity, so device selection became even more critical. MSIT made a conscious effort to select products that could deliver on today's requirements and scale up for future capabilities. When looking at the product data sheets that described the capabilities of individual hardware devices, MSIT had to understand the difference between a device's performance capabilities on its own and as part of a larger environment.

Designing the Network Fabric
To consolidate many separate groups into a single facility, MSIT needed to deliver access ports to servers with low oversubscription. The scale and density of the facility at Redmond Ridge impacted the way MSIT needed to plan for the three logical aspects of network equipment:
- Layer 2 MAC address tables
- Layer 3 Address Resolution Protocol/Neighbor Discovery (ARP/ND) cache
- Control plane protocols

MSIT designed and implemented a four-tier network with well-defined roles.

Tier   Role                  Responsibilities
One    Core                  Links; loopbacks; Label Distribution Protocol (LDP); high-speed Layer 3 transport
Two    Layer 3 Distribution  First hop (redundant); Multiprotocol Label Switching (MPLS) termination; Virtual Routing and Forwarding (VRF); Layer 2 virtual circuits (VC); Layer 3 virtual private networks (VPN); security policy
Three  Layer 2 Aggregation   High-speed Layer 2 switching; Layer 2 loop management: Spanning Tree Protocol (STP), virtual port channels (vPC)
Four   Layer 2 Host Access   802.1Q tagging to hosts

Table 1. Four-tier network roles and responsibilities

Starting from the host up to the core, each role has a predefined oversubscription rate to ensure consistency over the entire facility. With the exception of the Layer 2 Host Access tier, each device in the Core, Layer 3 Distribution, and Layer 2 Aggregation tiers is a modular chassis with enhancements. MSIT has been working with the network hardware vendor to develop ways to further increase capacity over time.

Arriving at Logical Constraints
With host virtualization, each server carries anywhere from 8 to 25 virtual hosts, and each virtual host drives the network like a stand-alone server (with the exception of logical ports). A single Layer 2 domain of a prior generation may have seen 8,000 to 24,000 hosts; today MSIT is seeing 40,000 to 100,000 in a fraction of the space. This, combined with the ongoing IPv6 transition, can triple memory requirements. It also significantly increases CPU load, with ARP/ND and control plane traffic scaling with the number of hosts on a network.
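The logical constraints above can be made concrete with a back-of-the-envelope estimate. The host densities come from the text; the per-entry sizes and entries-per-host counts below are illustrative assumptions, not measured values:

```python
# Back-of-the-envelope estimate of Layer 2/3 table pressure from
# virtualization and dual-stack IPv6. VM densities are from the text;
# per-entry byte sizes and entry counts are illustrative assumptions.

def l2_hosts(servers: int, vms_per_server: int) -> int:
    """Each VM appears to the network as a stand-alone host (MAC + IP)."""
    return servers * vms_per_server

# A prior-generation Layer 2 domain vs. the virtualized facility.
prior_gen = l2_hosts(servers=8000, vms_per_server=1)     # ~8,000 hosts
virtualized = l2_hosts(servers=4000, vms_per_server=25)  # 100,000 hosts

# Dual-stack roughly triples neighbor-table memory: one IPv4 ARP entry
# per host plus (assumed) two IPv6 ND entries, e.g. link-local + global.
ARP_ENTRY_BYTES = 32      # assumption
ND_ENTRIES_PER_HOST = 2   # assumption
ND_ENTRY_BYTES = 32       # assumption

ipv4_only = virtualized * ARP_ENTRY_BYTES
dual_stack = virtualized * (ARP_ENTRY_BYTES
                            + ND_ENTRIES_PER_HOST * ND_ENTRY_BYTES)
print(virtualized)              # hosts in one Layer 2 domain
print(dual_stack / ipv4_only)   # growth factor from dual-stack
```

Under these assumptions the dual-stack neighbor tables are three times the IPv4-only requirement, matching the "triple memory requirements" figure in the text, and the 100,000-host domain is more than ten times a prior-generation domain of 8,000 hosts.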
Delivering for Today While Planning for Tomorrow
MSIT's current business model buys the capacity that they require throughout the depreciation cycle and provides opportunities to scale up in the future. MSIT has formed close partnerships with their network vendors to drive feature enhancements and capacity gains within their respective products. MSIT made deliberate decisions to design a network with enough logical capacity and oversubscription headroom to realize the full value of the initial investment over its lifetime. They also designed it in such a way as to allow for a possible in-chassis upgrade in the future to further expand its logical capacity.

Building a Private Cloud
The private cloud provides a way for MSIT to supply the business with virtual machines for as long as they are needed, with the configurations that are required. The virtual machines are primarily used for development, test, and quality assurance (QA) activities, but sometimes people from other areas of the business also require a virtual machine, for example to host a file server for a couple of weeks or as a temporary location to store data because their facility is going offline. With a private cloud, internal customers no longer need to buy their own servers and contend with deploying and managing them. Alternatively, groups that still want to purchase and run their own systems only need to buy servers for their primary workloads, or steady-state capacity, because they can always get additional resources as required from the private cloud.

Determining Host Requirements
Before MSIT could build out host machines, they needed to define what it meant to be successful in the delivery of their workloads. Did all of the virtual machines need to look the same? Were they all going to run the same operating system? Did they all need the same amount of memory and storage? If the answers to those questions were yes, it would be easy to purchase one server configuration optimized for that workload. But if the answer was no, the underlying platform of the private cloud had to be flexible enough to provide variability in the amount of memory, cores, and storage, and to accommodate the different operating systems and networks that were going to be assigned to the resources. To get the most efficiency out of deploying in the facility, MSIT purchased fairly homogeneous hosts.
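The tension between homogeneous hosts and heterogeneous virtual machine demands can be illustrated with a minimal placement sketch. The host size, VM sizes, and first-fit policy below are all hypothetical; the case study does not describe MSIT's actual placement logic:

```python
# Minimal first-fit placement of heterogeneous VMs onto homogeneous hosts,
# constrained on memory. Host capacity, VM sizes, and the first-fit policy
# are hypothetical; this only illustrates why one uniform, sufficiently
# flexible host configuration can absorb widely varying requirements.

HOST_MEMORY_GB = 64  # hypothetical homogeneous host size

def place(vm_memory_gbs):
    """First-fit: put each VM on the first host with room; returns the
    free memory remaining on each provisioned host."""
    hosts = []  # free memory (GB) remaining per host
    for vm in vm_memory_gbs:
        for i, free in enumerate(hosts):
            if free >= vm:
                hosts[i] -= vm
                break
        else:
            hosts.append(HOST_MEMORY_GB - vm)  # provision a new host
    return hosts

# A mixed workload: many small dev/test VMs plus a few large ones.
demand = [2, 2, 4, 16, 8, 2, 32, 4, 8, 16]
remaining = place(demand)
print(len(remaining))  # hosts provisioned for this demand
```

Because every host is identical, the placement logic never has to ask "which kind of host fits this VM?", only "which host has room?", which is the management simplification the text describes.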
Having homogeneous systems simplifies the management of the cloud components, as long as the systems have sufficient flexibility (storage, network, and so on) to meet the varying virtual machine requirements. This resulted in lower overall operational cost.

Memory, Storage, and Processors
While building the private cloud, MSIT had to decide whether to constrain on memory, storage, or processors to keep costs down. Processors are not currently a limiting factor, but memory and storage can be. Memory is more expensive than storage, so MSIT maximized the usage of memory by putting as many virtual machines on a host as possible.

One important hardware selection criterion was whether to use hosts with onboard storage or blades with array-based storage. Typically, cost ($/raw GB) favors local storage, while flexibility and reliability favor array-based storage. With the high variability of storage demands from virtual machines in a private cloud, sizing local storage becomes more challenging: there is either a risk of stranding memory by not having enough storage, or of stranding storage by over-purchasing to meet peak demands. With local storage, each node must have a buffer to accommodate all potential requests, whereas on an array, only one buffer is needed.

To ensure that the purchased hosts could support memory and storage requirements, MSIT also had to address another challenge. How would they ensure that what they were buying would support, for example, a virtual machine with 2 gigabytes (GB) of memory and 50 GB of storage at the same time it is supporting a virtual machine with 16 GB of memory and a terabyte of storage? MSIT chose to solve that problem through the use of shared storage; in this case, the total buffer required on the arrays was much lower than the sum of the buffers on local storage, reducing the cost advantage of local storage.

Because MSIT was setting up the private cloud in a facility optimized for density, they deployed blade servers with Fibre Channel arrays providing shared storage via a storage area network (SAN). This allowed every host to serve as a compute node, with storage attached to it as needed. Some costs were associated with this fabric-based deployment, but there were advantages as well. One was the resulting flexibility: if MSIT needs 300 GB of storage for one host and 3 terabytes for another, a single host configuration can support both of these workloads without any wasted storage space. MSIT also identified a specific, very storage-intensive type of workload that leveraged discrete servers with a high density of low-cost local storage. Even with some wasted storage capacity, lower costs were associated with using local storage for those workloads.

Optimizing Pods to Host the Private Cloud
Once MSIT determined what their workloads required and decided on configurations for storage, network, and compute, they communicated the requirements to the facilities team. MSIT and the facilities team then worked together to determine what changes would be needed to support those requirements.

Higher density. When MSIT deployed the blade and SAN-based solution, they needed higher density, so they removed the keyboard, video, and mouse (KVM) switches and the top-of-rack (ToR) Ethernet switches, whose functions are included in the blade chassis.

Power needs. The private cloud also had higher power needs in its blade racks than a traditional server rack would.
Rather than the two 30-amp three-phase power strips that were in the rack, they moved up to two 50-amp three-phase strips. This was easily accomplished given the modularity of the bus duct system, because the system allows for easy replacement of circuit modules within pods.

Network. MSIT needed additional fiber cable between each rack and the networking rack to account for the Fibre Channel storage. Instead of using the building routers, MSIT put one within the pod with the servers and storage so that the pod owned its own network. MSIT did that for two reasons: complexity and isolation. MSIT needed to host multiple different networks, so rather than dealing with different uplinks going into different places, MSIT aggregated them in the pod and then distributed to the hosts from there. And at the scale at which they were going to deploy, it made more sense to dedicate the traffic rather than use a shared router.

Deploying Hosts
MSIT can quickly add capacity, enabled by the facility and supported by the choice of homogeneous hardware. Some back-end tooling for deploying, imaging, and configuring the servers needed to be automated. MSIT authored a number of scripts that automate all of the configuration steps and developed a robust process for quickly adding capacity and building out hosts. The scripts were written in Windows PowerShell and included error handling. Because they were written in PowerShell, MSIT will be able to leverage System Center Orchestrator 2012 for further levels of deployment automation without rewriting the scripts.

Best Practices

Enabled Self-Service
To address one of the challenges in driving developers to adopt virtual machines in the private cloud, MSIT used industry-standard KVM solutions to give developers a work experience similar to working from a physical server at their desk. Using KVM over IP, developers can start and stop services, access the BIOS, access DVD drives, shut down, restart, and do everything they typically did in a day except change hardware or replace cables.

Reducing Operational Costs
Homogeneity reduces variability in the environment, which lowers support costs. The more complicated a host is made, whether by increasing the number of supported configurations or by introducing elements such as shared storage or fabrics, the more operational costs can increase. Adding things like storage fabric and arrays does add a new operational role: the storage administrator. But those costs can be overcome through more efficient use of resources, higher resiliency with redundant fabrics, and the development of automated processes.

Use Any Host for Any Workload
In the homogeneous environment, it didn't make sense to dedicate certain groups of hosts to certain types of networks. If hosts dedicated to one network run out of capacity while hosts dedicated to other networks have capacity to spare, hosts have to be reconfigured. MSIT instead combined all of the networks into a single location. Storage was either pre-allocated or dynamically assigned to the host as needed; for all intents and purposes, each host is a pure compute node. Since every host has all of the networks on it, any host can be used for any workload.

Proximity Considerations
There are cases in the application space where it is necessary for a cloud to span more than one pod.
In these cases, network traffic is aggregated within the pod, and significant amounts of bandwidth are available between the pods. For example, if there is storage in one pod and compute in another, there needs to be a good network fabric between those pods for them to work together.

Conclusion
The flexibility of the facility and the network fabric infrastructure provided an ideal location to host the private cloud that MSIT developed to offer IaaS to the business. The design of the facility delivered high density and energy efficiency, and the network infrastructure was designed to meet both current and future capacity requirements. Those capabilities helped MSIT successfully build and run the private cloud in a flexible, cost-effective, and efficient environment. Virtualization reduced operational overhead by reducing the required number of physical systems. Depending on the server role, the ratio of virtual machines to physical hosts can be as high as 8:1. Offering a private cloud that provides virtual machine resources to the research, development, and test teams gave those teams a way to attain the resources they require in a shorter amount of time, with the configuration they require, and for as long as they require, at lower cost than procuring and managing physical systems in individual lab spaces.

For More Information
For more information about Microsoft products or services, call the Microsoft Sales Information Center at (800) 426-9400. In Canada, call the Microsoft Canada Order Centre at (800) 933-4750. Outside the 50 United States and Canada, please contact your local Microsoft subsidiary. To access information via the World Wide Web, go to:
www.microsoft.com
www.microsoft.com/technet/itshowcase

© 2011 Microsoft Corporation. All rights reserved. This document is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS SUMMARY. Microsoft, Hyper-V, and Windows PowerShell are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries. The names of actual companies and products mentioned herein may be the trademarks of their respective owners.