HDS UCP for Oracle: key differentiators and why it should be considered
Computacenter insight following intensive benchmarking tests
Background

Converged infrastructures are becoming a common sight in most IT departments. Their adoption increased by 50% year on year in the second quarter of 2013, surpassing $1.3 billion in value worldwide, according to IDC. Hitachi Data Systems (HDS) Unified Compute Platform (UCP) is the latest addition to a market containing a range of competitive converged infrastructure solutions. But how does UCP compare against these other offerings? Does it target a specific niche? Does HDS stand out, whether by reducing costs further or by adding innovative features?

This document details the results of a configuration and testing exercise undertaken by Computacenter consultants in a proof of concept environment in our world-class testing facility, the Solutions Centre. The aim of the testing was to validate UCP's management characteristics as well as the innovative features introduced by HDS in the x86 space. Although our testing focused on running Oracle workloads, UCP should be a suitable platform for any x86 workload, and should therefore not be considered a niche platform.

Computacenter carried out comprehensive testing in order to demonstrate the platform's functionality as well as the performance characteristics of a typical Oracle workload. While this paper provides a detailed description of the hardware and its unique features, we have produced a second paper that covers the performance benchmark results in more detail. That paper, entitled "HDS UCP for Oracle - how does it stack up?", is available for download at www.computacenter.com

Executive summary

HDS is approaching the x86 market with technologies originally reserved for the UNIX server market and mainframes. These technologies comprise the following components:
- Hardware virtualisation (LPAR)
- Symmetric multi-processing (SMP)

In addition, HDS has added further functionality through a PCIe expansion unit that can be used to provide large Flash capacity.

UCP constitutes a highly available, flexible and high-performing platform that offers a competitive set of capabilities in this segment of the market. HDS engineers designed the solution with Oracle databases and Oracle products generally in mind, as many of its features can assist in hosting Oracle products.
UCP hardware capabilities & features

- The blade chassis includes both compute blades and networking, with Hybrid I/O components that integrate a Storage Area Network (SAN) switch and an Ethernet switch directly into the chassis
- The blade chassis can be either the HDS CB500 or the CB2000
- An added benefit of UCP being a Reference Architecture, as opposed to a pre-built converged infrastructure solution (although it is delivered pre-built by HDS), is its ability to be customised as required
- The storage array can be built from different HDS SANs, from the HUS-110, HUS-130 and HUS-150 to the HUS-VM, depending on customer requirements
- PCIe expansion is a differentiator for this Reference Architecture and unique in the market, allowing large volumes of Flash storage to be connected directly to the backplanes of the blade servers
- UCP's management platform is a centralised web application allowing complete control over all components of the UCP, from SAN configuration to initial deployment of the operating system. The management platform can be integrated within the UCP rack using a 2U or 4U rack server from HDS or any other vendor. As of today, the management suite runs only on Windows; a Linux version is under development
LPAR (Logical Partition)

An LPAR is a logical partition that uses a subset of a physical server's resources. It is a hardware virtualisation technology mainly used in the IBM Mainframe and UNIX world; HDS is the first to bring it to the x86_64 space. An LPAR performs a similar function to a hypervisor (VMware, Hyper-V, Oracle VM), although it uses a different approach. Virtualisation with LPARs is physical and allows complete separation of workloads, as any resource can be dedicated to a specific LPAR. It relies on a lightweight hypervisor that is loaded within the blade's firmware.

An LPAR can have either dedicated or shared resources. In dedicated mode, the hardware resources are split among the LPARs, and each LPAR has complete access to and control over its resources. In shared mode, multiple LPARs can use and share the same resource (an Ethernet port, for example). Not all resources can be shared, as summarised in Table 1: CPU, RAM, Fibre Channel and Ethernet ports can be assigned as dedicated or shared resources, whereas PCIe devices can only be dedicated.

Table 1 - Resource management

Migrating a physical partition to an LPAR, and vice versa, can be performed at any time by reassigning the LUNs (Logical Unit Numbers) to the new LPAR and reconfiguring the network and hardware resources. HDS recently added online migration of an LPAR from one physical partition to another.

As with any technology, some limitations apply to LPARs. The most notable concerns Oracle licensing: Oracle applies its Soft Partitioning licensing rules to LPARs, so if an LPAR hosts an Oracle product, the whole physical partition needs to be licensed, i.e. all 20, 40 or 80 cores depending on the extent of SMP expansion deployed.
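To illustrate the licensing point, the short sketch below estimates the number of Oracle processor licenses required under soft partitioning, where every core in the physical partition counts, however few cores the LPAR itself uses. The 0.5 core factor commonly applied to Intel x86 processors is an assumption here, as are the example core counts; the sketch is illustrative only, not licensing advice.

```python
def processor_licenses(partition_cores: int, core_factor: float = 0.5) -> float:
    """Estimate Oracle processor licenses under soft-partitioning rules.

    Under soft partitioning, every core in the physical partition must be
    licensed, regardless of how many cores the LPAR actually uses. The 0.5
    core factor is the value commonly applied to Intel x86 CPUs and is an
    assumption here; always check the current Oracle core factor table.
    """
    return partition_cores * core_factor


# An LPAR using only a handful of cores still drives licensing for the whole partition:
for cores in (20, 40, 80):  # 1-, 2- and 4-blade SMP partitions quoted in the text
    print(f"{cores}-core partition -> {processor_licenses(cores):.0f} processor licenses")
```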
Hybrid I/O

HDS has integrated both Ethernet and Fibre Channel switches within the blade chassis in order to consolidate all physical network handling into a single location. This reduces cabling and configuration effort. HDS also provides PCIe capability to the blades, allowing the chassis and blades to offer the functionality of both blades and larger rack servers.

Figure 3 - Hybrid I/O (rear view: integrated LAN and SAN switches, two PCIe slots per blade)

Two PCIe slots per blade is a limiting factor if a customer is looking to build a large warehouse database with 20TB of data, or to consolidate databases, when today's largest Flash card is 3TB. HDS has overcome this by supporting PCIe expansion: instead of connecting the PCIe cards directly into the chassis, it is possible to add a PCIe expansion unit to the UCP. This gives each blade access to 16 PCIe slots, with a maximum of 64 Flash cards if four blades are joined up in an SMP configuration. At 3TB per card, this yields a maximum of 192TB of raw Flash capacity. As a reminder, PCIe is currently the fastest way to connect storage to compute.
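As a simple illustration of the raw capacity quoted above, the sketch below multiplies the slot count and card size given in this paper; both figures are taken from the text and will vary with the cards and configuration actually deployed.

```python
# Raw Flash capacity available through the PCIe expansion unit, using the
# figures quoted in this paper: 16 slots per blade and 3TB per Flash card.
SLOTS_PER_BLADE = 16
CARD_CAPACITY_TB = 3

for blades in (1, 2, 4):  # single blade, 2-way and 4-way SMP configurations
    cards = blades * SLOTS_PER_BLADE
    print(f"{blades} blade(s): {cards} cards -> {cards * CARD_CAPACITY_TB} TB raw Flash")
# 4 blades: 64 cards -> 192 TB, matching the maximum quoted above.
```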
Figure 4 - PCIe Expansion Unit (rear view)

The use of PCIe expansion units is a relatively new development in the market, especially when they are connected to a chassis over 5m cables. Fibre Channel remains the better option where greater distance between storage and compute is required (>100m), but PCIe offers three to four times the bandwidth. PCIe Flash cards are physically attached to a specific blade and cannot be migrated automatically during a blade Cold Failover; further details can be found in the Cold Failover section below.

Symmetric Multi-Processing (SMP)

The SMP feature offers the capability to scale from one blade to either two or four blades. By joining up the blades, all resources, including CPU, memory, I/O and PCIe, become part of one larger system. This is not a feature usually seen in the x86 market.

Figure 5 - SMP configurations
All hardware resources of the individual blades are controlled by the new physical partition, either as a bare-metal machine or in LPAR mode. The SMP feature requires a physical connector, as shown in Figure 5. Extending a one-blade physical partition to a two-blade configuration requires downtime of around 45 minutes, mainly because of the shutdown time and the full memory check performed at blade start-up.

Cold Failover

Otherwise referred to by HDS as Compute Blade N+M Cold Stand-by, the Cold Failover feature allows a blade to fail over automatically to another physical blade in the same chassis, or in another chassis, following a hardware failure. While software clustering technologies are available (for example, Oracle Clusterware or Veritas Cluster), they require a significant investment in licensing, design and implementation. HDS Cold Failover works with any application, without any modification to the OS or the application. However, it only detects hardware failures; application failures still require a traditional clustering technology.

Figure 6 - N+M Cold Failover

If a blade or SMP configuration fails, the error is trapped by the SVPs (the chassis management interfaces), which notify HCSM (Hitachi Compute Systems Manager). HCSM then triggers a failover within the same chassis (or to another chassis), as long as the blades share the same SAN and networking configuration. The operation can take anywhere between 10 and 45 minutes depending on the amount of memory, as a full memory check is performed at blade start-up. Because PCIe Flash cards are internal to a blade, any data held on a PCIe Flash card will be lost during a failover. The blade's operating system (OS) also needs to be recognised and certified for this feature to work: for instance, Red Hat 6.x is a certified OS, whereas version 5.6 is not. Please refer to the UCP manual for the list of certified operating systems.
Oracle Enterprise Manager Plug-Ins

Hitachi has created two Oracle Enterprise Manager (OEM) plug-ins in order to improve integration with Oracle databases. Both plug-ins require a license. They are:

- RMAN Plug-In - While this is a useful feature that can deliver important time savings for large databases, its pre-requisites will prevent its use in many cases. It can be replaced by a Data Guard architecture, where the production database is replicated to a secondary server using Oracle's data replication technology, Data Guard
- Hitachi Storage Plug-In (Storage Configuration Viewer) - The current limitations (OS and Enterprise Manager version) restrict the potential application of this tool. Once these are addressed, this plug-in will be a highly useful tool for DBAs, offering a comprehensive view of the storage layer

It should be noted that these plug-ins are not unique to UCP; they can be used with any HDS storage system. Unfortunately, at the time of writing (October 2013), both plug-ins are only supported with OEM 11g, even though OEM 12c has been available for a year and many customers are already switching to the latest version. The plug-in for the 12c version is a work in progress and should be released shortly. The RMAN plug-in can, however, be executed from the command line and is still usable.

Feature                  License             Comments
LPAR                     License required    Up to 2 (4 in some systems) LPARs can be run without a license
Hybrid I/O               Included
N+M Failover
Performance Management                       In HCSM
Power Gauge                                  In HCSM; controls the power consumption in the system

Table 3 - UCP features licensing
Typical Architectures

UCP offers a wide range of possible architectures that can be used for Oracle (or any other) workloads. This section covers the most common ones.

Flash in SAN - Oracle RAC

When Oracle RAC is required, the disks need to be shared among all nodes, which is not possible using the blades' internal PCIe Flash cards. In this case, it is possible to add SSDs or Flash cards directly into the SAN. The bandwidth will be lower and the response times higher than in a pure PCIe configuration. If Flash is required, an enterprise-class SAN with the tomahawk interface, such as the HUS-VM or USP storage arrays, will have to be used.

Figure 7 - Oracle RAC with UCP

Oracle RAC spare blade

One of the benefits of using Flash in the SAN is that an n-node RAC can be combined with N+M Cold Failover (or cluster failover) in order to fully utilise the licensed blades. RAC will fail over if there is an issue; however, the surviving nodes will then be over-utilised. In that case, it is useful to have N+M Failover (or cluster failover) relocate the failed node to a spare server, restoring the normal load across the RAC nodes.

Figure 8 - Oracle RAC failover with UCP
Flash in Compute

This architecture allows systems to be scaled vertically by using an n-way SMP configuration (aggregating blades) together with Flash PCIe cards in the expansion unit. It has some limitations, however, as the Flash cards are dedicated to a specific blade and are considered internal components rather than shared ones like a SAN. Figure 9 shows a single blade connected to an expansion unit with four Flash disks, and also connected to a SAN array (a Flash + SAN architecture).

It should be noted that hosting online redo log files and undo tablespaces on Flash disks can degrade performance, as the database uses them for sequential writes and Flash is not particularly well suited to sequential write operations. Ideally, a second ASM DiskGroup using hard disks or SSDs should be used for the online redo logs and undo. Please refer to the Computacenter whitepaper entitled "HDS UCP for Oracle - how does it stack up?" for further details.

Flash for Buffer Cache

Another way of using the available Flash disks is to increase the size of the buffer cache. When an Oracle instance starts, it creates memory structures for its own use. The basic memory structures associated with Oracle include:

- The System Global Area (SGA), which is shared by all sessions
- The Program Global Area (PGA), which is private to each session

The SGA is a group of shared memory structures, one of which is the Buffer Cache, used to speed up transactions by maintaining a copy of the data in memory. However, it is not always possible to provide an Oracle instance with enough memory to cover all requirements. Since version 11g, it has been possible to use Flash disks to extend the buffer cache.
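As an illustration of this feature (known as Database Smart Flash Cache from 11gR2 onwards), the sketch below reads the relevant initialisation parameters from a running instance using the python-oracledb driver. The connection details are placeholders, and the exact values set depend on the release and operating system in use; consult the Oracle documentation before enabling the feature.

```python
# Minimal sketch: inspect the Database Smart Flash Cache settings of an
# Oracle 11gR2 (or later) instance. The feature is configured through the
# db_flash_cache_file and db_flash_cache_size initialisation parameters.
# Connection details below are placeholders for illustration only.
import oracledb

with oracledb.connect(user="system", password="...", dsn="ucp-blade1/ORCL") as conn:
    with conn.cursor() as cur:
        cur.execute(
            "SELECT name, value FROM v$parameter "
            "WHERE name IN ('db_flash_cache_file', 'db_flash_cache_size')"
        )
        for name, value in cur:
            print(f"{name} = {value}")
```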
Figure 10 - Oracle Smart Flash Cache

High availability with Flash

As discussed previously, UCP can be used in different configurations. Flash in SAN is one example, where RAC provides the high availability architecture. However, in configurations where Flash is used to host part or all of the database files, a RAC configuration is not possible, as the Flash disks cannot be shared among different nodes. In this situation, the best solution is to use Oracle Data Guard in either maximum performance or maximum protection mode, as shown in Figure 11 below.

Figure 11 - Data Guard architecture

Blade 1 and Blade 2 can be in the same chassis, or in different chassis in different locations altogether, in which case this becomes a Disaster Recovery configuration.
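To make the two protection modes concrete, the hedged sketch below queries each database in such a configuration for its Data Guard role and protection mode; the service names and credentials are hypothetical, while database_role and protection_mode are standard columns of the v$database view.

```python
# Minimal sketch: report the Data Guard role and protection mode of the
# primary and standby databases. Service names and credentials are
# placeholders; adapt them to the actual environment.
import oracledb

for dsn in ("ucp-blade1/PROD", "ucp-blade2/PRODSTBY"):  # hypothetical services
    with oracledb.connect(user="sys", password="...", dsn=dsn,
                          mode=oracledb.AUTH_MODE_SYSDBA) as conn:
        with conn.cursor() as cur:
            cur.execute("SELECT database_role, protection_mode FROM v$database")
            role, protection = cur.fetchone()
            print(f"{dsn}: role={role}, protection mode={protection}")
```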
Both blades are used in production, as the standby databases consume relatively few resources to apply the replicated data. This configuration allows full utilisation of the hardware and licenses acquired. However, in a failover situation, both databases will be running on the same blade, which will be resource constrained. In this case, it is possible to use the N+M Cold Failover capability of the platform to relocate the failed blade to a new set of hardware and then fail back all databases to their original state.

Another advantage of this configuration is the reduced license cost compared to Oracle RAC. Use of Data Guard is included in the Oracle database license, whereas RAC is a separately licensed option. The cost savings here can be considerable, and this is one of the main commercial advantages of UCP as a database platform. However, unlike RAC, Data Guard failovers are (and should be) manual operations, which means potentially longer service downtime. The architecture to put in place will depend heavily on the customer's SLAs.

Opportunities for UCP

Transactional database (OLTP) consolidation

With LPAR technology it is possible to divide a server into multiple LPARs. Each LPAR can host a different OS and any version of the database, whereas some competing platforms, notably Oracle Exadata, require a minimum Oracle database version of 11gR2. Many customers are still using older versions of the database (9i, 10g) because of historical applications that are not certified (and may never be) against the latest version. HDS UCP is therefore a good fit for hosting and virtualising different versions of the database on a single platform. UCP, being one of the few virtualisation technologies certified for Oracle products, gives HDS a head start compared to other converged infrastructures. It is important to note that this virtualisation certification does not affect the licensing of Oracle database on the platform, as outlined earlier in this document.

Warehouse databases (DWH)

The high initial purchase price of some converged infrastructure solutions specifically built for data warehousing has put some customers off pursuing this option. HDS UCP can provide a more cost-effective option as a dedicated warehouse machine by using Oracle Data Guard, rather than Oracle RAC, to deliver high availability. This significantly reduces the cost of licensing on UCP compared to platforms that rely on RAC. In addition to cost, UCP's ability to scale vertically will be of great value for large data warehouses, as resources can be added with minimal effort at any time.

Oracle products

What has been said for databases also applies to other Oracle products: UCP can be a platform for a range of Oracle products, not just Oracle databases.
Summary

Main features

HDS is approaching the x86 market with technologies originally reserved for the UNIX server space and mainframes. In Computacenter's opinion, UCP's key selling points include:
- LPAR - hardware virtualisation
- SMP - the ability to scale vertically
- PCIe expansion - extension far beyond two PCIe slots per blade

It is also possible to place Flash directly in the SAN and make use of all the usual SAN features (snapshots, remote replication, etc).

Flexibility and ease of use

During our testing we found that most changes could be performed easily and without much effort. UCP is straightforward to understand and use, as demonstrated by the tests and changes performed during our benchmarking. With the available hardware resources (blades, SAN, PCIe expansion), a significant number of architectures can be implemented, enabling IT to respond to different customer scenarios and architect the right solution.

Conclusion

In our opinion, UCP can be a good fit for most Oracle estates, as it can be used for both consolidation and warehousing while offering strong performance and flexibility of use. It is definitely worth considering as an Oracle platform, and probably for other workloads (e.g. SQL Server), although these have not been covered in this document.

Further information

If you would like to discuss Computacenter's analysis of UCP in more detail, or to find out more about Computacenter's UCP for Oracle consultancy capabilities, please contact enquiries@computacenter.com

A detailed description of how HDS UCP performed during our stringent benchmarking exercise is available in a separate whitepaper entitled "HDS UCP for Oracle - how does it stack up?". The paper is available for download at www.computacenter.com

About Computacenter

Computacenter is Europe's leading independent provider of IT infrastructure services. We advise customers on their IT strategy, implement the most appropriate technology from a wide range of leading vendors and manage their technology infrastructures on their behalf. At every stage we make our customers' businesses sharper by removing cost, complexity and barriers to change across their IT infrastructures. Our corporate and government clients are served by offices across the UK, Germany, France, the Benelux countries, Spain and South Africa. We also serve our customers' global requirements through our extensive partner network.
Computacenter (UK) Ltd
Hatfield Avenue, Hatfield, Hertfordshire, AL10 9TW, United Kingdom
T: +44 (0)1707 631000  F: +44 (0)1707 639966
www.computacenter.com

Computacenter is a leading independent provider of IT infrastructure services and solutions. From desktop to datacenter, we help our customers minimise the cost and maximise the value of IT to their businesses. We can advise organisations on IT strategy, implement the most appropriate technology, optimise its performance, and manage elements of our customers' infrastructures on their behalf. Computacenter operates in the UK, Germany, France and the Benelux countries, as well as providing transnational services across the globe.

© Computacenter 2013