Huawei Service-Driven Distributed Cloud Data Center (SD-DC²) White Paper

Prepared by: Chen Jian (00224079), 2015-02-13
Reviewed by: Yang Min (00224058), Wang Jinjun (00257498), Li Ke (00246921), 2015-02-13
Approved by: Li Yang (00116887), 2015-02-18

Huawei Technologies Co., Ltd. All rights reserved
Contents

1 Huawei SD-DC² White Paper
1.1 Background
1.2 Huawei SD-DC²
1.2.1 Hardware Restructure
1.2.2 SDx
1.2.3 Service Driven
1.2.4 Data Innovation
1.3 Evolution Plan
1.3.1 Phase One: From Multiple Virtualization Chimneys to a Converged Resource Pool
1.3.2 Phase Two: From a Converged Resource Pool to an SD-DC²
1.4 Summary
1 Huawei SD-DC² White Paper

1.1 Background

Following the Industrial Revolution of the 19th century and the popularization of the Internet in the 20th, society is now entering the Information Age, and digitization is sweeping across every sector. Business processes and IT architecture are critical to a successful digital transformation for enterprises of all sizes. As the platform enabling that transformation, the IT system must keep up with the following pressing challenges.

Making services more agile

Unlike the conventional IT system, which carries only internal office and information management workloads and serves as a cost center, the IT system of the digital age is fully integrated with production systems and functions as the core platform supporting business development. An ideal IT infrastructure must fulfill two goals:

Shorten the rollout period of new services. The new IT system must provide the computing, storage, and network resources required for service rollout in the shortest possible time.

Respond quickly to innovative services and cope with Big Data challenges. Business operations generate a great deal of service and user data. An ideal IT system must rapidly consolidate and analyze this massive data to uncover hidden value, accelerating service innovation and business growth.

Streamlining management

Traditional IT infrastructure is tightly coupled to individual application systems, which causes IT silos, repeated construction, and wasted resources. An open IT platform that decouples services from resources is required to schedule resources on demand, increase resource utilization, and reduce costs. Because different service systems demand different levels of availability and performance, the new IT infrastructure must manage and analyze IT resources based on service requirements and provide differentiated services.

Creating more open platforms

IT infrastructure transformation cannot be completed in a day; traditional infrastructure and cloud-computing-based architecture may coexist for a certain period of time. An open IT infrastructure is therefore critical to this evolution for enterprises of all sizes.
The new IT infrastructure must allow services to be deployed in the cloud without recompiling applications, and seamless deployment of applications onto a cloud computing architecture remains a real challenge. An ideal IT infrastructure supports an enterprise's legacy IT devices, maximizes return on investment (ROI), and enables smooth evolution. IT technology evolves quickly and open-source software is diverse, so choosing the most appropriate technology is critical to avoid wasting time, manpower, and money.

The Internet cloud architecture offers resource pooling, scalability and elasticity, a distributed architecture, and centralized management. However, it cannot be widely adopted unless it also supports enterprises' core applications, legacy hardware, and existing services while remaining easy to manage and operate. Based on the concepts of agile services, efficient management, and open cooperation, Huawei has launched the service-driven distributed cloud data center (SD-DC²), which combines the strengths of the Internet cloud architecture and traditional IT architecture. This document describes how the SD-DC² integrates these core technologies and helps enterprises cope with service challenges in the digital economy era.

1.2 Huawei SD-DC²

Based on its deep understanding of IT and enterprise services and years of engagement in the industry, Huawei has launched the SD-DC², a reference architecture designed for next-generation IT in the digital era. The SD-DC² enables agile services, efficient management, and open cooperation through hardware restructuring, software-defined infrastructure, service-driven operation, and data innovation.
1.2.1 Hardware Restructure

Server

With the rapid development of technologies and applications, enterprise services place stringent requirements on the reliability, performance, maintainability, and cost of servers and storage devices. By leveraging its expertise in chip, accelerator, and machine design as well as the latest processor technology, Huawei offers a rich portfolio of server and storage products with high reliability, high performance, cost-effectiveness, and easy maintenance. To strengthen its overall competitiveness, Huawei focuses on vertically scaling (scale-up), horizontally scaling (scale-out), and converged servers for data center applications, and also holds advantages in chip R&D and SSD card design.

Scale up: servers with outstanding performance

In addition to mainstream servers, Huawei provides high-end servers, such as the RH8100, to meet the high performance, reliability, and availability required by enterprise core application systems such as databases. The RH8100 is an 8-socket rack server built on Intel Xeon E7-8800 v2 processors and numerous advanced Huawei technologies. Compared with servers of a similar class, the RH8100 offers higher reliability, better
performance, a more advanced architecture, and an open management platform. It is ideal for enterprise core services, in-memory databases, virtualization, and high-performance computing (HPC).

Scale out: servers allowing high scalability

Unlike conventional applications, applications in the digital economy era require a distributed architecture and massive data processing, which the architecture and design of conventional servers cannot deliver. Huawei provides distributed multi-node servers, such as the X8000, for Internet and Big Data applications. The X8000 is a high-density server with high energy efficiency and easy maintenance. Its innovative architecture accommodates a maximum of 80 computing nodes or 40 storage nodes: an X8000 fully configured with compute nodes supports up to 160 Intel Xeon processors, and fully configured with storage nodes it provides 2 PB of storage space.

Converged: servers with optimal integration and simplicity

Most application systems adopt a modular structure in which computing, storage, and network are loosely coupled. This modular structure is highly flexible and meets the needs of common applications. However, a growing number of applications must process large volumes of user data in a short period of time, and the loosely coupled architecture runs into bandwidth and latency bottlenecks. To address this problem, Huawei offers the E9000 blade server, which integrates computing, storage, switching, and management. The E9000 provides high availability, computing density, and energy efficiency, with distinct advantages in backplane bandwidth, intelligent control and services, flexible configuration and expansion of computing and storage resources, low network latency, and application acceleration.

Storage

Virtualization, cloud computing, Big Data technologies, and enterprise business expansion drive collaboration, which in turn creates new data management requirements. Heterogeneous resource consolidation demands unified data management, multi-service bearing requires a data sharing platform, and on-demand space provisioning calls for smooth, elastic expansion. Real-time data encapsulation and periodic archiving of historical data are mandatory for quick response, and managing massive data involves the transmission, storage, mining, and archiving of petabytes of data. To meet these requirements, Huawei offers the OceanStor V3 series storage, built on a converged data architecture, to help customers build future-proof cloud data management systems.

The OceanStor V3 uses an industry-leading hardware platform, highly efficient management software, and the OceanStor OS to implement convergence at multiple levels. The convergence of SAN and NAS increases IT management efficiency by 50%. The convergence of flash and conventional storage media offers an optimal balance between performance and capacity. The convergence of production and backup implements unified data lifecycle management and reduces costs. The convergence of low-end, mid-range, and high-end storage ensures free data flow, and the convergence of multi-vendor storage devices consolidates enterprises' legacy facilities and maximizes ROI. The OceanStor V3 uses innovative RAID 2.0+ and decentralized, distributed technologies to distribute data intelligently in cloud data centers and share it efficiently across the pool.
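To make the idea of scattered data slices concrete, the following minimal Python sketch illustrates hash-based data placement in general terms. It is not Huawei's RAID 2.0+ implementation; the pool size, redundancy level, and slice names are assumptions chosen for illustration.

import hashlib

DISKS = [f"disk-{i:02d}" for i in range(24)]   # assumed 24-disk resource pool
COPIES = 2                                      # assumed redundancy level

def place_slice(slice_id: str, disks=DISKS, copies=COPIES):
    """Map a data slice to several distinct disks using a stable hash."""
    digest = int(hashlib.md5(slice_id.encode()).hexdigest(), 16)
    start = digest % len(disks)
    return [disks[(start + k) % len(disks)] for k in range(copies)]

# Slices of one volume land on disks all over the pool, which is what lets
# many disks participate in a rebuild at the same time when one disk fails.
for s in ("vol1-slice-0001", "vol1-slice-0002", "vol1-slice-0003"):
    print(s, "->", place_slice(s))

Because every slice is placed independently, the loss of one disk affects only a fraction of each volume, and the surviving copies are spread across the whole pool rather than concentrated on a single mirror disk.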
Facility

The Huawei data center facility is built from intelligent micro modules. A single module includes cabinets, a power distribution system, a cooling system, and a monitoring system; each module can support services independently and be replicated easily elsewhere. The power supply and cooling systems work in concert with the IT devices to deliver power and cooling on demand. The facility also supports multiple natural cooling technologies, reducing the power usage effectiveness (PUE), the ratio of total facility power to IT equipment power, to 1.2, that is, roughly 0.2 W of cooling and distribution overhead for every watt delivered to the IT load.

1.2.2 SDx

Huawei FusionSphere is a cloud operating system based on OpenStack, with reliability and performance enhancements for enterprise applications. FusionSphere defines the computing, storage, and network resources of enterprise data centers as shared resource pools for applications and implements unified resource management and scheduling for services. By leveraging the open architecture of OpenStack, FusionSphere offers strong compatibility with enterprises' legacy IT resources.
Software-defined Computing

Huawei software-defined computing uses bare-metal virtualization to turn physical resources, such as server CPUs, memory, and I/O, into a pool of logical resources that can be uniformly managed, scheduled, and allocated. These logical resources are used to create multiple mutually isolated virtual machine (VM) environments on a single physical server, improving resource utilization. Dynamic resource allocation is implemented through seamless interaction with the OpenStack Nova service. The virtualization engine consumes less than 5% of physical CPU resources, improving server resource utilization by 80% and cutting IT deployment costs by more than 30%. In addition, technologies such as CPU binding, NUMA affinity scheduling, huge memory pages, and second-level hardware fault detection deliver a carrier-class computing virtualization platform with the performance and reliability demanded by enterprise core services that require real-time behavior and high stability. Huawei FusionSphere has demonstrated leading performance in the SPECvirt tests and provides latency of less than 20 μs, meeting the latency requirements for using virtualization in base station controllers (BSCs) in the telecom industry and far below the latency offered by mainstream virtualization products.

Software-defined Storage

Huawei software-defined storage uses a distributed storage engine that interworks seamlessly with the OpenStack Cinder service to provide a cost-efficient storage solution for enterprise applications. It pools the local hard disks of commodity servers into a virtual resource pool that supports the convergence of computing and storage and can replace external storage devices. The solution relies on distributed management clusters, a distributed hash-based routing algorithm, distributed stateless engines, and distributed intelligent caching. This distributed architecture eliminates single points of failure (SPOFs) and greatly improves IOPS and throughput. Because data slices are scattered across the resource pool, data can be rebuilt concurrently and automatically within the pool: rebuilding 1 TB of data on SSD storage takes less than 15 minutes. The distributed stateless engines allow horizontal expansion, so computing and storage nodes can be scaled out concurrently and smoothly.
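As a minimal sketch of how the Nova and Cinder integration can be driven programmatically, the following Python example uses the standard openstacksdk client. The cloud profile, image, flavor, and network names are assumptions, and the calls shown are generic OpenStack API usage rather than FusionSphere-specific code.

import openstack

# Connect using credentials from clouds.yaml (the profile name is assumed).
conn = openstack.connect(cloud="fusionsphere")

# Software-defined computing: request a VM from the shared pool via Nova.
image = conn.compute.find_image("cirros")        # assumed image name
flavor = conn.compute.find_flavor("m1.small")    # assumed flavor name
network = conn.network.find_network("private")   # assumed tenant network

server = conn.compute.create_server(
    name="demo-vm",
    image_id=image.id,
    flavor_id=flavor.id,
    networks=[{"uuid": network.id}],
)
server = conn.compute.wait_for_server(server)

# Software-defined storage: request a volume from the distributed pool via
# Cinder and attach it to the new VM.
volume = conn.block_storage.create_volume(name="demo-vol", size=10)  # size in GB
conn.compute.create_volume_attachment(server, volume_id=volume.id)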
Software-defined Network

Even when software-defined computing and storage are in place, data centers cannot implement automated service rollout, unified resource allocation, and correlated fault detection if the network devices remain in an isolated physical network. To address this problem, Huawei provides a software-defined networking solution. Using VXLAN, which tunnels Layer 2 traffic over a Layer 3 network, the Huawei Agile Controller implements automated configuration and deployment of the software-defined network (SDN), SLA-based service quality control, and multi-tenant isolation. Basic network services, such as load balancing, DHCP, routing, and firewalling, are also provided to reduce networking costs and improve network flexibility. The SDN offers high automation, intelligence, and application-oriented dynamic programmability.

Open OpenStack Architecture

Enterprise data centers evolve step by step. During this evolution, physical and virtual resources coexist, as do heterogeneous virtualization platforms. Because the existing IT infrastructure cannot be replaced outright as data centers become service driven, the cloud platform must remain compatible with existing IT resources as far as possible, which is why an open cloud platform architecture is needed. Through open southbound and northbound interfaces, the open-source, OpenStack-based Huawei FusionSphere is compatible with any third-party hardware, virtualization technology, or management system that supports the OpenStack industry standard.

In scenarios where ultra-large-scale data centers are distributed across regions, a single OpenStack instance becomes a bottleneck. To resolve this problem, Huawei provides the OpenStack cascading solution, which allows a data center to scale to 100,000 physical servers and 1,000,000 VMs. The cascading OpenStack can also interconnect with OpenStack versions from multiple vendors while exposing a unified OpenStack API to upper-layer applications, so applications do not need to be aware of vendor-specific differences. This protects ROI to the maximum extent, meets the data center scaling requirements driven by enterprise service growth, and enables smooth evolution to a cloud-based data center.
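As an example of the open northbound interfaces described above, the following hedged sketch uses the standard OpenStack Neutron API to create an isolated tenant network of the kind the SDN solution provides. The cloud profile, names, and addresses are illustrative assumptions; on a VXLAN-backed fabric, the controller maps each tenant network to its own overlay segment.

import openstack

conn = openstack.connect(cloud="fusionsphere")   # assumed cloud profile

# Create an isolated tenant network and subnet; on a VXLAN overlay each
# network corresponds to a separate segment, giving multi-tenant isolation.
net = conn.network.create_network(name="tenant-a-net")
subnet = conn.network.create_subnet(
    name="tenant-a-subnet",
    network_id=net.id,
    ip_version=4,
    cidr="10.10.0.0/24",
    gateway_ip="10.10.0.1",
)

# Basic network services such as routing are driven through the same API.
router = conn.network.create_router(name="tenant-a-router")
conn.network.add_interface_to_router(router, subnet_id=subnet.id)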
1.2.3 Service Driven

VDC

In different phases of enterprise service development, enterprises understand and choose technologies differently. As the support systems for that development, traditional data centers suffer from the problems of piecemeal construction: high investment, long construction periods, high energy consumption, low resource usage, isolated systems, slow service response, and operation modes with high operating expense (OPEX). For example, over two decades of development, one global enterprise, guided by the technological capabilities and service requirements of each phase, built more than 85 data centers of different scales, accumulated more than 5,000 servers of different specifications, and came to support more than 4,000 types of applications and more than 10 types of heterogeneous databases. Such a heterogeneous IT estate supports service development in each phase but can hinder future growth because of the high costs of management, upgrade, maintenance, and operation. Enterprises must spend heavily on staff to maintain their existing large-scale data centers and services while exploring new architectures at the same time.

Huawei developed the virtual data center (VDC) and ManageOne solutions to resolve these problems. VDCs consolidate enterprise data centers distributed all over the world: the resources in a VDC can come from a single physical data center or from multiple physical data centers located in different regions. De-regionalization, data center customization, automation, resource servitization, and service diversification are the main features of VDCs. Their biggest benefit is time to market (TTM): a VDC can be constructed in a day rather than months, and services can be initiated in minutes rather than days. VDCs can also be defined by data center requirements such as SLA level (gold, silver, and bronze), so they can be delivered according to the needs of the services running on them. VDC owners can manage and schedule the resources in their VDCs in a unified manner, as well as manage service users and approve their service requests. VDC users can manage their own IT resources online, including applying for, expanding, monitoring, and maintaining them. In addition, open APIs are provided so that customers with customization requirements can perform secondary development,
meeting personalized requirements (a hypothetical illustration of such a call follows the ManageOne overview below). This improves overall enterprise IT efficiency and allows VDCs to be flexibly created and released, satisfying the need for rapid provisioning of data center resources as enterprise services develop and change.

Unified Data Center Management (ManageOne)

As enterprise services grow, data centers spread across different regions and new service systems are added to existing data centers. Multiple service systems, management systems, and data centers make data center management increasingly complex. With the emergence of cloud computing, enterprise data center IT architecture evolves gradually, and cloud and non-cloud environments coexist: enterprises need to manage physical servers, VMs, and the costly midrange computers that remain in service, along with internal private clouds and a share of leased public cloud services. An efficient data center management system must streamline this heterogeneous IT environment and make complex IT management simple, systematic, manageable, and controllable, meeting enterprises' requirements for high resource usage, rapid service rollout, and simplified O&M.

Huawei ManageOne is an efficient, intelligent data center management solution that brings automation and intelligence to the management of cloud services and resources. Multiple data centers can be managed as one, implementing a physically decentralized but logically centralized management mode and improving maintenance efficiency. Conversely, one data center can be used as multiple data centers, giving different departments and services resource offerings that match their requirements, improving resource utilization, and accelerating service rollout. From the management platform, administrators use unified views to manage multiple data centers, cloud and non-cloud environments, heterogeneous virtualization platforms, and O&M in a unified way, greatly improving data center O&M efficiency.
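To show the kind of self-service call the open VDC APIs enable, here is a hypothetical sketch. The endpoint, resource path, and request fields are invented for illustration and are not the actual ManageOne or VDC API.

import json
import urllib.request

BASE_URL = "https://manageone.example.com/api/v1"   # hypothetical endpoint
TOKEN = "example-token"                              # placeholder credential

def request_vdc_resource(vdc_id: str, flavor: str, count: int, sla: str):
    """Submit a resource application that a VDC owner can later approve."""
    body = json.dumps({
        "flavor": flavor,        # e.g. "4vCPU-8GB" (hypothetical)
        "count": count,
        "sla_level": sla,        # e.g. "gold", "silver", "bronze"
    }).encode()
    req = urllib.request.Request(
        f"{BASE_URL}/vdcs/{vdc_id}/resource-requests",   # hypothetical path
        data=body,
        headers={"Content-Type": "application/json", "X-Auth-Token": TOKEN},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)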
1.2.4 Data Innovation

As enterprises develop their services and undergo digital transformation, they accumulate large amounts of customer data and operational data. According to the so-called New Moore's Law, the amount of data generated every 18 months equals the total amount generated in all of previous history; in other words, the volume of data doubles roughly every 18 months. The mobile Internet and the Internet of Things act as catalysts for this massive data growth. Enterprises have realized that Big Data analysis and governance are necessary for service innovation: the motivation for exploring Big Data is shifting away from the original cost-driven model toward value creation, enterprise data center IT systems are evolving from operation support systems into service innovation systems, and data is changing business models and improving customer experience.

The Huawei FusionInsight big data platform is a unified, enterprise-grade platform for data storage, query, and analysis that enables enterprises to quickly build systems for processing massive data. Through real-time and offline analysis and mining, FusionInsight helps enterprises extract value from massive data, discover risks, and act on opportunities in a timely manner. FusionInsight is a completely open big data platform: it runs on standard x86 servers and requires no dedicated hardware or storage devices. To meet the O&M and application development requirements of data-intensive industries such as finance and carriers, FusionInsight provides a reliable, secure, and easy-to-use O&M system and full-data modeling middleware, and offers a series of solutions for typical enterprise scenarios, including data center O&M log analysis, historical data query, real-time event handling, and customer profiling. FusionInsight enables enterprises to extract value from massive data quickly, accurately, and reliably.
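As a minimal sketch of the data center O&M log analysis scenario mentioned above, the following PySpark example counts log lines by severity on the kind of Hadoop/Spark stack that FusionInsight packages. The HDFS path and log format are assumptions.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("dc-om-log-analysis").getOrCreate()

# Assumed location and format of data center O&M logs on HDFS.
logs = spark.read.text("hdfs:///datacenter/om-logs/*.log")

# Count lines per severity level as a trivial stand-in for richer analysis
# such as anomaly detection or customer profiling.
error_count = logs.filter(logs.value.contains("ERROR")).count()
warn_count = logs.filter(logs.value.contains("WARN")).count()
print(f"ERROR lines: {error_count}, WARN lines: {warn_count}")

spark.stop()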
1.3 Evolution Plan

Traditional enterprise data centers are built step by step according to the service requirements of each phase. This construction mode lacks unified planning and operation, so many isolated nodes appear, making service interconnection and data sharing difficult. The growing scale of data centers also puts great pressure on subsequent construction and maintenance. In addition, traditional data centers require long construction periods and cannot meet urgent service requirements, slowing the pace of enterprise service innovation. Based on its own service development and IT construction experience, and on the target SD-DC² architecture, Huawei recommends that traditional data centers evolve to the SD-DC² in the following steps.

1.3.1 Phase One: From Multiple Virtualization Chimneys to a Converged Resource Pool

Data center transformation usually starts with virtualization. Resource virtualization enables enterprise application systems to share data center resources and improves resource usage. In this phase, differences between devices are hidden and a unified resource pool is formed that users draw on according to their requirements: what is presented is a unified, shared resource pool rather than independent devices, and the pool scales elastically to meet different service needs. However, virtualization is introduced gradually as enterprise services develop, and because different virtualization technologies are chosen to build different service systems in different phases, multiple virtualization chimneys emerge. These chimneys cannot be managed in a unified manner, resources cannot be shared between them, and new O&M problems result. Huawei's solution integrates the existing non-virtualized resources and heterogeneous virtualized resources into a genuinely converged resource pool.
1.3.2 Phase Two: From a Converged Resource Pool to an SD-DC²

As enterprise services develop further, they expand into multiple regions, and the data centers supporting them must likewise be distributed across regions. Once the resources within a single data center have been converged, unified management and resource sharing among the distributed data centers must also be implemented. Resource servitization and service differentiation are the main features of this phase. With orchestration and automation, resources in cross-region data centers can be applied for in a self-service manner and provisioned automatically once approved, realizing resource servitization. Using VDCs, resources with different SLA attributes in the converged pool can be abstracted and encapsulated with reliable security isolation, so that services with different resource-level requirements are satisfied and service differentiation is achieved. SDN networks are used to build cross-data-center VDCs, allowing resource sharing between cloud data centers distributed across regions. Driven by the requirements of different services or departments, enterprises can flexibly create or apply for VDCs, maintain the physical and virtual resources within them in a self-service manner, and orchestrate physical and virtual resources together. The data center becomes a cloud service (DCaaS) that agilely meets the needs of enterprise service development.
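As a minimal sketch of self-service provisioning across regional data centers exposed through a single OpenStack-compatible endpoint, the following Python example creates one VM in each of two regions. The cloud profile, region names, image, flavor, and network names are assumptions.

import openstack

for region in ("region-east", "region-west"):          # assumed region names
    conn = openstack.connect(cloud="sddc2", region_name=region)
    image = conn.compute.find_image("cirros")           # assumed image name
    flavor = conn.compute.find_flavor("m1.small")       # assumed flavor name
    net = conn.network.find_network("tenant-a-net")     # assumed network name
    server = conn.compute.create_server(
        name=f"app-node-{region}",
        image_id=image.id,
        flavor_id=flavor.id,
        networks=[{"uuid": net.id}],
    )
    print(region, "->", server.id)

In a cascaded deployment, the same calls go to the unified OpenStack API described earlier, and the cascading layer dispatches them to the appropriate regional OpenStack instance.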
1.4 Summary

As a user of data centers, Huawei has supported two decades of rapid service growth on the data centers it has built for itself. As a builder of data centers, Huawei has constructed more than 480 data centers around the world, including more than 160 cloud data centers. As a leader in the SD-DC², Huawei is eager to share this experience. For more information about the Huawei SD-DC², please visit the Huawei official website or contact a Huawei sales representative.