WHITE PAPER Service Gateway Intel XEON Optimized Performance Comparison Report Intel Expressway Service Gateway Web Service/API Security Performance Comparison to IBM* DataPower XI50 Appliance server hardware utilized Intel s new XEON processor E5-2600 for ground breaking security performance Take advantage of continuous chip optimization improvements such as XEON E5-2600 with the easily upgradeable software appliance form factor. Intel Expressway Service Gateway outpaces IBM DataPower by 6X to 10X in a direct apples to apples comparison.
Table of Contents Introduction...1 Executive Summary...2 1. Expressway Service Gateway Overview...3 2. Performance Architecture...3 2.1 Intel Multi-Core Optimizations...3 2.2 Efficient XML Processing...3 2.3 Efficient Service Mediation Engine...3 3. Next Generation Service Gateway Architecture...3 4. Performance Testing...4 4.1 Methodology...4 4.2 Test Environment...5 4.3 Test Execution and Setup...5 4.3.1 Service Mediation...6 4.3.2 Service Governance...6 4.3.3 Traffic Pass-Through...6 5. Test Results...7 5.1 Traffic Pass-Through...7 5.2 Service Mediation Test Results...8 5.3 Service Governance Test Results...9 6. Cost Comparison... 10 Conclusion...11 About Expressway Service Gateway...11 Brand Note: Intel Expressway Service Gateway is available direct from Intel or branded as McAfee Services Gateway when procured as part of the McAfee Cloud Security Platform suite. Hereafter, the paper will refer to the Intel short name Intel ESG. Executive Summary Hardware appliances such as the IBM* DataPower XI50 are typically deployed in datacenters to solve integration problems around SOA and REST security, governance and mediation. While these types of highly specialized hardware appliances may solve a few specific problems, they are slow to evolve, difficult to repurpose, and incur higher maintenance and operational costs as compared to general purpose server software solutions. Furthermore, hardware appliances are at odds with major cost-saving datacenter trends such as multi-core and virtualization. As financial constraints on IT become more severe, new products are needed to avoid the compromise inherent in a hardware appliance between performance, and the flexibility and extensibility otherwise available in software solutions. In this new environment, a multi-core optimized software appliance or service gateway is superior in offering high performing security, governance, integration, and mediation capabilities at a lower total cost. Intel s Application Security and Identity Products group has released Intel ESG, a highly tuned, multi-core optimized software service gateway that provides the full security (runtime governance), integration and mediation functionality with a modern graphic design interface It makes application deployment on multicore environments significantly more scalable and manageable. Designed to be easily managed, just like hardware appliances, Intel s Expressway Service Gateway is deployable in minutes rather than days and requires zero coding for common service intermediary usage patterns. Intel ESG offers mediation capabilities that can deal with heterogeneous service and legacy environments and different messaging patterns required in service interactions. Intel ESG also offers common shared security enforcement for XML threat protection (XML Firewall), authentication, authorization, access control and auditing. Intel ESG provides superior performance for demanding security (runtime governance), and mediation use cases outpacing IBM DataPower in a direct apples-toapples comparison by 6X to 10X at a fraction of the total cost. In this paper we provide: A description of the Intel ESG product and architectural elements and how they are taking advantage of continuously evolving Intel processors, platforms and technologies in delivering unparalleled performance. Results demonstrating the superior performance of Intel ESG as compared to IBM DataPower XI50 for service mediation and security (runtime governance) use cases. A cost comparison showing the advantage of deploying Intel ESG as compared to the IBM DataPower XI50. 2
1. Intel Expressway Service Gateway Overview Intel ESG is a software-appliance designed to simplify and secure application architecture on-premise or in the cloud. It expedites deployments by addressing common security and performance challenges it accelerates, secures, integrates and routes XML, web services and legacy data in a single, easy to manage software appliance form factor. It pro vides the following benefits: Secure the Edge: The static network perimeter is gone. Regain control with a centralized policy enforcement point to authenticate, authorize, and govern service interactions with customers, partners, employees, and cloud providers (IaaS/PaaS) as they consume or deploy applications. Build Dynamic Apps: Innovate faster, gain a competitive edge, and securely expose mass customizable applications on-premise or in the cloud, regardless of the abstraction pattern (SOA, WOA), delivery method (App Store, Cloud) or protocol (REST, SOAP). Get More Go Neutral: Break free of purpose-built inflexible appliances tied to expensive vendor suites. When it comes to lower cost, multi-core optimized performance, ready middleware interoperability and vendor viability, protect your investment with Intel. Flexibility without Compromise: Simplify infrastructure by deploying on standard Intel Multi-Core servers. Address common SOA/REST service and security bottlenecks with specialized soft-appliances without compromising extensibility or virtualization. Rapid Deployment: Decoupled designtime (Services Designer) increases security for enterprise roles and allows customers to have multiple copies of the design environment without extra gateway purchases. The designer supports rapid prototyping of application execution without requiring a gateway instance and utilizes standard Eclipse which runs on low-cost laptops and hardware. The technology behind Intel s ESG is based on 10-years of research and development both within Intel and through the acquisition of Sarvega, Inc and has been field tested in some of the largest security and performance-sensitive deployments of distributed SOA and REST design patterns. 2. Performance Architecture 2.1 Intel Multi-Core Optimizations Intel ESG provides a software engine that is highly optimized for Intel Multi-Core architectures, including Intel Xeon 5500, 5600 and the newest record-breaking Xeon E5 based servers that deliver leadership performance, best data center performance per watt, and break through I/O innovation. Intel ESG helps bridge the gap between multi-core optimizations and business software by providing a runtime layer that efficiently scales with the Intel processor roadmap, bringing Moore s law to business computing without requiring a heavy investment in custom programming. 2.2 Efficient XML Processing Efficient XML Processing Intel ESG provides support for XML acceleration by building a number of highly optimized XML processing engines using an efficient binary representation called EventStream (ES). ES scales effectively with Intel Multi-Core architectures and platform enhancements without any special hardware. With this set of XML processing engines, ESG executes XML operations such as XPath evaluation, XSLT, XML Schema validation, XML- Content attack filtering, XML contentbased routing, and XML/WS-Security (including canonicalization) at wire speed, removing traditional XML processing bottlenecks. Intel is continuously evolving its EventStream architecture to provide increased scalability with the latest multi-core platforms. Finally, because Expressway doesn t depend on special hardware, it can be virtualized to achieve the same performance as a dedicated hardware server. 2.3 Efficient Service Mediation Engine While XML optimizations form an important cornerstone of Intel ESG s performance characteristics, they are only a portion of the performance equation. Aside from point optimizations such as faster transforms (XSLT) or faster schema validation (XSD), the real problem is finding a good way to mediate between the services that form the underlying components of the larger service or distributed application. For instance, any gain in XML acceleration may be offset by inefficiency in the many layers involved in a large service application with respect to application data I/O, either between local or remote services. To solve this overhead problem, Intel ESG offers a light-weight service mediation engine that interprets message policies in byte-code form with a zero-copy data interface. This design allows for efficient scaling across multiple threads and allows Intel ESG to offer predictable, linear scalability per core for thousands of incoming connections. 3. Next Generation Service Gateway Architecture The performance numbers and traffic behavior are a result of a continuously optimized Expressway runtime architecture. The new architecture in concert with the increased performance of the Xeon E5 class architecture resulted in considerable performance gains, eclipsing the previous architecture by well over 200% in both security and mediation tests. With the new Xeon E5-2600 series featuring a significant performance improvement and an appreciably smaller level of power consumption, Intel continues to be on the forefront of addressing the unprecedented demand for efficient, secure, and high-performing datacenter infrastructure. Reduced Jitter under Heavy Client Load The newly optimized architecture boasts a number of performance improvements that apply to Expressway running on all Intel Xeon servers. Performance optimizations include enhancements for XML processing leveraging the Intel SSE4.2 instructions, parallel execution, memory management and workflow optimizations. 3
Optimized Parallel Execution to deliver linear scalability with the number of CPU cores: Thread Handling: Improved thread handling for client input that improves CPU utilization and thread-to-core matching, allowing the application to more fully utilize the available cores. Cache Line Eviction: Optimizations to avoid false sharing of the cache line resulting in the reduction of cache line eviction which leads to better processing efficiency in a multi-core environment. Exponential Back-off: Implementation of exponential-back-off in spin lock contention resolution which results in increases parallelism in multi-core environments. Intel Architecture (IA) Lock Synchronization Primitives: Refinement of locking strategy to take the advantage of IA synchronization primitives over standard operating system synchronization primitives thereby increasing parallelism in multi-core environment. Memory Management to address memory contention with larger core counts: Scalable Memory Allocation: Reduced memory contention and switches from user space to kernel space through scalable memory allocation. Local Memory Allocator: Implementation of per thread local memory allocator for small memory objects to reduce global allocator contention which results better parallelism in multi-core environments. Workflow ( Policy Execution ) Optimizations to reduce processing overhead: Object Local Cache: Implementation of heavyweight object local cache to eliminate creation and destruction overhead thereby reducing processing overhead. 4 Copy-on-Write (Copy Avoidance): Optimization in document storage lifecycle management to enable copyon-write semantics and aggressive garbage-collection thereby reducing processing overhead. Smart Processing Semantics: Implementation of smart processing semantics in the workflow engine to eliminate redundant data processing thereby reducing processing overhead. Advanced Intel Architecture (IA) Support: Automatic detection and dynamic selection of advanced IA instructions which results in reduced processing overhead and maximal use of CPU capabilities. Efficient Binary Representation using Event-Stream-III (ES III): Continuing evolution of the patented XML Event Stream technology pioneered by Sarvega, Event Stream III further advances XML performance through structural and algorithmic optimizations based on data obtained from profiling and analysis of real world applications. Additionally, it leverages Intel SSE4.2 instructions available in the Intel Architecture (IA) CPUs for hardware accelerated XML processing. As a result, it provides faster XML Event Stream building, faster element iteration, and faster in-place modification than its predecessor Event Stream II. Some of the improvements to ES III include: Accelerated iterator processing and event stream building: Support for fixed-length (16 bytes) record sizes and separation of text information and structural information provides better memory efficiency. Accelerated update operation: Reduces trunk management on the stream and merges the main and variable stream storage locations. Additional improvements include lazy namespace scope resolution for faster variable evaluation and an improved copytree() internal function for faster in-memory node copying. Code Refinements: The ES-III data structure and processing library includes further code refinements including the optimization of the remove operation by using ESII_LINK, removal of redundant checks and unnecessary decoding. These improvements result in higher performance and better memory usage on multi-core environments. The next generation architecture improves the underlying XML representation, raw TPS and latency and also improves the overall traffic pattern and behavior. For example, the graphs show in here display a noticeable reduction of jittering in both message rate and latency. The results obtained here display very smooth curves that represent highly predictable processing behavior for throughput (messages per second), latency (ms), and CPU utilization. The numbers and curves show the progression and evolution of the Expressway product. Notice how the curves smooth out and become more predictable as the software evolves. 4. Performance Testing 4.1 Methodology Our testing methodology involves use cases with real-world applicability rather than synthetic benchmark that only highlight a few peak numbers. As such, we have designed an environment to tests three aspects of each service intermediary: (a) Messages per Second, (b) Latency, and (c) CPU Utilization. The first metric measures how many messages/transactions the intermediary can process per second and is sometimes abbreviated MPS or TPS. The second metric is the time interval in which a client request is received and responded, and the third metric is an indicator of how much headroom is remaining on the system this is commonly presented as a percentage as either CPU idle or CPU used.
While a very high peak number may look impressive, what really matters is the interplay between these three metrics to achieve a balanced, predictable and stable behavior under heavy load. It is not enough to look at a single metric without considering the other metrics. For instance, the total throughput (TPS) must be qualified against CPU utilization if the CPU utilization is low, it may be possible to increase throughput further by pushing more clients threads with no degradation in perceived latency. Also, if CPU utilization is high or nearly maxed out we would expect to see predictable latency (the time taken to complete a request) to increase as the load increases. It is important for the Service Intermediary to be capable of graceful handling of heavy client loads while maximizing the CPU utilization with predicable behavior. In other words, we expect latency to increase but the system to remain stable. Similarly, CPU utilization can be compared against latency. If two products have similar latencies between requests and one has more CPU headroom, the product with more headroom may be able to handle increased throughput while maintaining client service-levels. Figure 1: Environment Topology Table 1 Intel esg version Client Machine Service Intermediary CPU configuration Memory manufacturer V2.0 Dual Socket Intel Xeon @ 2.8Ghz V2.0 Dual Socket Intel Xeon CPU E5450 @ 3.00GHz V2.7 Dual Socket Intel Xeon CPU E5 @ 2.30GHz 6x2GB 1333 DDR3 RDIMMs (12G total) 4x2GB FBD PC2-5300 (8G total) 4x8GB (32GB total) Supermicro HP ProLiant DL360 G5 Supermicro Backend Server Operating System Red Hat Linux AS4 Red Hat Linux AS4 Red Hat Linux AS5 To achieve this balance, the test environment was designed to minimize network overhead while at the same time ramping up the number of requesting threads in a gradual manner to determine the optimal balance for all three metrics. 4.2 Test Environment The test environment was constructed as indicated in Figure 1. The service intermediary represents the product under test, which was either the IBM DataPower XI50 or one of the versions of Intel ESG as described in Table 1. The tests were run on an isolated network using a 1Gbit network switch connecting the client machine, intermediary, and back-end server. The client is a custom- built generic HTTP client designed to scale to a large number of concurrent client connections and the back-end server is a custom-built generic HTTP web server also designed for low latency and high volumes. Both the client and server are designed to be efficient in memory and CPU usage in order to prevent bottlenecking. For example, the web server simulates a service response and returns a predefined canned response no actual work or service call is executing on the backend server. 4.3 Test Execution and Setup In order to gauge the behavior of each system under load, we used ramping style tests that measured the latency and throughput at various concurrent client connection levels spanning from low single digits to a maximum of 450 threads. The IBM DataPower appliance we configured a single XML Firewall2 application with the application probe disabled. For Intel ESG we configured a workflow application using the Intel Services Designer GUI. No special tuning was done on the Intel ESG and in both cases each product used one NIC interface for the client side and another for the backend server side. The next two sections describe the processing steps for the two main cases: service mediation and service governance. 5
4.3.1 Service Mediation The service mediation use case involves a SOAP v1.1 purchase order message sent over HTTP that undergoes XML schema validation, XSL transformation and endpoint routing. The transformation alters the value a single node (shipping address) in the purchase order to simulate service mediation from one format to another. The response from the backend server is then schema validated for grammar correctness and returned to the calling client. Figure 2 shows a graphical illustration of the service mediation pipeline. 4.3.2 Service Governance The service governance use case involves applying a security policy at runtime on the response leg. In this case, the same SOAP v1.1 purchase order was sent to the intermediary for schema validation. The response from the back-end server includes a schema validation step, a WS-Security signature over the SOAP Body and then a WS-Security encryption step over the SOAP body. The signature algorithm used was 1024-bit RSA w/ SHA-1 and the encryption algorithm was 3DES-CBC with a 1024 bit RSA key-wrap. Figure 3 shows a graphical illustration of the service governance pipeline. 4.3.3 Traffic Pass-Through In addition to the two aforementioned tests, a third pass-through or service virtualization test (see Figure 4) was also conducted to determine peak passthrough rates. In a pass-through test the job of the service intermediary is to act as a pure HTTP message proxy. That is, it receives the SOAP v1.1 request and passes it along to the back-end server with no additional processing. To configure a pass-through test the IBM DataPower was configured using a manual front-side HTTP handler (no WSDL) and the Intel ESG was configured with a receive, invoke and reply policy pipeline. A pass-through test is a good baseline to determine the upper bound on the HTTP and network processing before additional processing steps are applied. In addition to basic service firewall protection, the passthrough scenario is useful in determining the potential of the connection handling and processing inherent in each product. Figure 2: Service Mediation Workflow HTTP XSD Validation XSLT XSD Validation Routing HTTP Service Client XSD Validation Service Provider Figure 3: Service Governance Workflow HTTP XSD Validation HTTP Service Client XSD Validation WS-Security/Sign XSD Validation Service Provider Figure 4: Pass Through Workflow HTTP HTTP Service Client Service Provider 6
5. Test Results The next three sections describe the throughput, latency and CPU utilization results for the pass-through test. 5.1 Traffic Pass-Through Figure 5 demonstrate the baseline proxy performance for each product at a different number of client threads. In all cases, the Intel ESG was able to handle considerably more passthrough traffic. In the extreme case of the Intel Xeon E5 platform, Intel ESG was able to handle more than 10 times the throughput as compared to the IBM DataPower XI50. It is natural to ask why IBM DataPower was only tested with 10 client threads in this case. We observed that the CPU utilization (shown in Figure 6) was already at 90% and furthermore, errors were observed with thread counts higher than ten 3. Next we look at the latency between requests in the pass-through case. As shown in Figure 6. Figure 6 shows the latency comparisons for the baseline pass-through case. In these graphs the latency is the wait time as perceived by each worker thread. Here we can see that once again, the IBM DataPower XI50 is only able to scale to 10 threads while Intel ESG on the Intel Xeon X5560 and Xeon E5 are handling 6 times the traffic at a lower latency. Moreover, we can see the difference in latency with the new Expressway runtime architecture and Xeon E5 platform: at high thread levels the latency has been cut by more than half. Intel ESG also shows the lowest latency in the 10 client test as well, demonstrating over one-third the wait time as compared to the IBM DataPower XI50. The next data point is CPU utilization for each product, which is shown as follows in Figure 7. TPS Figure 5: Pass-Through TPS 38353.43 40000.00 35000.00 30000.00 25000.00 20000.00 15000.00 10000.00 5000.00 Latency (m) 0.00 16.00 14.00 12.00 10.00 8.00 6.00 4.00 2.00 0.00 CPU E5 @ 2.30GHZ 360-Client 30432.49 CPU E5 @ 2.30GHZ 60-Client 28359.44 360-Client 22594.33 60-Client 17075.49 240-Client Product Configuration 13333.85 40-Client Figure 6: Latency 0.77 0.93 1.69 CPU E5 @ 2.30GHZ 60-Client 2.48 60-Client 2.58 2.84 DP XI50 Product Configuration 40-Client Figure 7: CPU Utilization 11126.00 5.47 CPU E5 @ 2.30GHZ 360-Client 9418.00 12.15 360-Client 3618.18 DP XI50 13.93 240-Client Figure 7 shows how much CPU is utilized during the pass-through tests. Here we can see that the available headroom on the IBM DataPower XI50 product is nearly exhausted at 10 clients and 3600 TPS while Intel ESG still has nearly half its potential headroom available in the case of the Intel Xeon 5560 Platform at 60 clients. The performance is even more efficient with the Xeon E5 even at 360 threads the CPU is still idle about a quarter of the time. Overall, the DataPower product shows the lowest available headroom at the lowest number of threads, indicating it is nearly maxed out. In the next section we will look at how each product scales over time for both the service mediation and service governance use cases. CPU (%) 100.00 90.00 80.00 70.00 60.00 50.00 40.00 30.00 20.00 10.00 0.00 25.29 CPU E5 @ 2.30GHZ 360-Client 40.76 CPU E5 @ 2.30GHZ 60-Client 52.00 60-Client 66.27 77.37 40-Client 84.11 90.00 90.00 360-Client 240-Client DP XI50 Product Configuration 7
5.2 Service Mediation Test Results The following figures compare the performance of the IBM DataPower XI50 to Intel ESG running on both Intel platforms for service mediation. Each section shows a plot over time comparing the TPS, latency and idle CPU as the number of client threads increase. Figure 8 shows the throughput measured messages per second (TPS) as the number of client threads increases. Here we can see Intel ESG on the E5450 platform hitting a peak throughput of about 6200 TPS and about 9600 TPS on the X5560 platform before leveling off. If we compare the previous Expressway architecture to the next generation software on the Xeon E5 we can see the numbers peak at about 18000 TPS. The leveling off behavior indicates that throughput levels are being maintained at the expense of latency (see Figure 8). That is, despite the number of clients, Intel ESG is behaving predictably. Moreover, the smooth logarithmic behavior of the next-generation Expressway architecture contrasts some of the peaking seen in the previous generation software. This contrasts the IBM DataPower XI50, which hits its peak at about ~2600 TPS before degrading to just under 1000 TPS with heavy load. In this case, the IBM DataPower XI50 is unable to scale to the higher thread levels. Figure 9 shows the latency in milliseconds between client requests for each product. Here we can see that Intel ESG has linear behavior with respect to the time between requests, even under heavy loads. Intel ESG s slight initial advantage in latency quickly expands over IBM DataPower XI50 as the client threads increase. The IBM DataPower XI50 begins with steeper linear behavior, but this turns erratic at about 200 threads, indicating difficulty in scaling stably at higher thread levels. It is also important to note that on the faster Intel platform (X5560) the latency slope is more gradual, indicating that Intel ESG is scaling with high predictability and stability based on platform improvements alone. Comparing further, we can see that the predictability and linear behavior continues its flattening trend with the next generation architecture on the Xeon E5 hardware as the line approaches a progressively smaller slope. Figure 10 shows the percentage of available CPU as the number of client threads increase. Here we can see that Intel ESG (E5450) and IBM DataPower XI50 are about on-par for available CPU (around 10%) for the first 200 threads. As the threads increase, however, IBM DataPower begins to have scalability problems. Even though more of the CPU becomes available it is unable to process the increased number of client requests and maintain throughput levels (See Figures 8 and 9). For Intel ESG on the Intel X5560 platform we can see a large increase in available CPU once again due to Intel software and platform improvements. The next generation architecture and E5 platform make the CPU behavior even more dramatic. The smooth logarithmic curve indicates that the updated runtime architecture is able to utilize the available processing power much more efficiently. This can be seen around thread 200 where the previous architecture remains static but the updated architecture on the Xeon E5 makes increased usage of the available processing power. 8 Throughput (TPS) Latency (ms) Figure 8: Throughput 20000 18000 16000 14000 12000 10000 Idle CPU % 8000 6000 4000 2000 0 0 50 100 150 200 250 300 350 400 450 Thread Count 350 300 250 200 150 100 50 0 0 50 100 150 200 250 300 350 400 450 Thread Count 80 70 60 50 40 30 20 10 Figure 9: Latency Figure 10: Idle CPU 0 0 50 100 150 200 250 300 350 400 450 Thread Count Intel ESG (X5560) Intel ESG (E5450) DataPower (XI50) Intel ESG (Xeon E5)
5.3 Service Governance Test Results The following figures compare the performance of the IBM DataPower XI50 to Intel ESG running on both Intel platforms for service governance (security). Each section shows a plot over time comparing the TPS, latency and idle CPU as the number of client threads increase. It should be noted that the service governance scenario is considerably more taxing to each system due to the addition of security operations. While the absolute throughput numbers are expected to be lower for all the systems under test, the overall behavior under heavy client load should remain consistent with the service mediation case. Figure 11 shows the throughput measured in messages per second (TPS) as the number of client threads increase. Here we can see Intel ESG on the E5450 platform hitting a peak throughput of about 2030 TPS and about 3700 TPS on the X5560 platform before leveling off. The next generation Expressway architecture on the Xeon E5 platform hits a peak of 8000 TPS, which is more than double the previous generation architecture and hardware. This contrasts the IBM DataPower XI50 which achieves a peak throughput of 1280 TPS before leveling off and then eventually degrading under heavy load. It should be noted that we obtained the lower bound performance advantage of about 6X by considering the sustained throughput of Intel ESG on the Xeon E5 platform as compared to IBM DataPower XI50. Figure 12 shows the latency in milliseconds between client requests for each product. Here we can see that Intel ESG has linear behavior with respect to the time between requests, even at heavy loads. In the latency comparison between Intel ESG on the E5450 and X5560 it is also important to see that the difference in slope is larger for the service governance case as compared to the latency slope shown in Figure 8. This suggests that platform improvements in the X5560 are more applicable to the service governance case and that Intel ESG is once again successfully scaling on the latest Intel platforms. The updated runtime architecture on the Xeon E5 takes this even one step further. Figure 12 shows a more progressive flattening of the slope, indicating lower latency over time, a direct result of the advanced software architecture more fully utilizing Intel Multi-Core optimizations. Figure 13 shows the percentage of available CPU as the number of client threads increase. While the DataPower XI50 has more available CPU than Intel ESG on the E5450, it is unable to make use of the extra processing power to scale its throughput or reduce its latency (see Figures 11 and 12). Moreover, the spike in free CPU for DataPower is due to excessive errors which begin at thread 169. In this case, the DataPower XI50 is unable to handle the increased number of threads and begins to lose requests; in other words, the spike in idle CPU is a symptom of overload rather than an indicator of increased headroom. Also, in the case of Intel ESG on the X5560 we can see a dramatic increase in the amount of idle CPU, which is further improved by the next generation runtime architecture on the Xeon E5. Throughput (TPS) Latency (ms) Idle CPU % 9000 8000 7000 6000 5000 4000 3000 2000 1000 0 0 50 100 150 200 250 300 350 400 450 Thread Count 350 300 250 200 150 100 50 0 0 50 100 150 200 250 300 350 400 450 Thread Count 80 70 60 50 40 30 20 10 Figure 11: Throughput Figure 12: Latency Figure 13: Idle CPU 0 0 50 100 150 200 250 300 350 Thread Count Intel ESG (X5560) Intel ESG (E5450) DataPower (XI50) Intel ESG (Xeon E5) 9
6. Cost Comparison The following table shows an initial upfront cost comparison for Intel ESG versus IBM DataPower XI50 for a typical deployment consisting of 2 units deployed at 2 different physical locations, each with a single hot-standby unit (4 units total). While discounting may apply in both cases, Intel ESG s list price is only about 50% of the IBM DataPower XI50, despite its significant performance advantage. Moreover, since Intel ESG runs on standard Intel servers, it can be deployed on existing server hardware or virtual machines in the datacenter rather than requiring the acquisition of purpose-built appliances. Additionally, the Intel ESG soft appliance form factor allows datacenter to update server hardware independently riding on Intel s hardware performance evolution while maintaining system architecture continuity. Table 2: Initial Cost Comparison Product BM * Websphere DataPower XI50 7 Total Price Per-unit Price Per Unit Maintenance & Support Total Upfront cost (4 units) $100,000 $9,750 $439,000 Intel Expressway Service Gateway Production Units 8 $64,000 $12,600 Intel Expressway Service Gateway Hot Standby Units $34,000 Intel Expressway Service Gateway Total Price $5,800 $232,800 10
Conclusion This detailed apples-to-apples comparison demonstrates that the Intel ESG soft appliance running on Intel Multi-Core servers easily outperforms a custom built hardware appliance on real-world use cases. Not only can Intel ESG achieve higher peak throughput, it also demonstrates predictability, linear scalability, and the ability to take advantage of Intel microprocessor advancements and apply them directly to business processing without any specialized coding or custom configurations. We demonstrated a lower bound performance metric of 6X on the service governance use case and an upper bound throughput metric of 10X on the pass-through use case, with an average advantage of 6X improvement at different thread levels. This clear advantage coupled with the lower cost and increased flexibility of a software solution makes Intel ESG a compelling choice to increase performance and reduce costs and complexity for service integration in the datacenter. About Intel Expressway Service Gateway Intel ESG (also available as McAfee Services Gateway under the McAfee Cloud Security Platform suite) is a softwareappliance designed to simplify and secure application architecture on-premise or in the cloud. It expedites deployments by addressing common security and performance challenges it accelerates, secures, integrates and routes XML, web services and legacy data in a single, easy to manage software appliance form factor. About Intel Xeon Processor E5-2600 series The Intel Xeon processor E5-2600 is record breaking, delivering leadership performance, best data center performance per watt and breakthrough I/O innovation. It addresses the unprecedented demand for efficient, secure, and high-performing datacenter infrastructure. It features up to an 80% performance boost versus priorgeneration processors at consistentpower levels with an improved energy efficient performance of up to 50%. 11
For product comparison information: http://www.intel.com/go/identity Americas: 1-800-253-3696 All other Geographies: 1-978-948-2585 Contact us by email: intelsoainfo@intel.com Performance Test Details Intel SOA Expressway v2.0 on Intel Xeon Dual Processor Intel Xeon @ 2.80GHz Nehalem-EP Manufacturer: Supermicro Memory: 6x2GB 1333 DDR3 RDIMMs (12G total) Operating System: Red Hat Linux AS 4 Intel SOA Expressway v2.0 on HP ProLiant Platform: HP ProLiant DL360 G5 Dual Processor Manufacturer: HP Memory: 4x2GB FBD PC2-5300 (8G total) IBM * DataPower XI50 Firmware Rev: XI50.3.6.1.5 Build: 156262 Client: High performance generic HTTP Linux client Server: Standard low-latency HTTP web server 4X Quadcore: Genuine Intel CPU @ 2.40GHz Memory: 16G Red Hat Linux AS4 Default settings were used for both Intel SOA Expressway and the IBM * DataPower XI50 Intel SOA Expressway v2.7 on Intel Xeon E5 Dual Processor Intel Xeon CPU E5 @ 2.30GHz Manufacturer: Supermicro Memory: 4x8GB (12G total) Operating System: Red Hat Linux AS-5, 64-bit 1 IBM DataPower XI50 Firmware Rev: XI50.3.6.1.5, Build: 156262 2 For tests that required a private key, we ensured the key was loaded into the HSM (Hardware Security Module) to obtain the best performance rather than flash memory. 3 We found problems with higher numbers of threads with persistent connections enabled. Disabling persistent connections allowed the IBM DataPower XI50 product to process a larger number of worker threads without errors, but at a reduced throughput rate. 4 Processing errors were reported by the IBM DataPower XI50 for the service mediation case when thread levels reached 230. Intel SOA Expressway produced zero errors. 5 Processing errors were reported by the IBM DataPower XI50 for the service mediation case when thread levels reached 169. Intel SOA Expressway reported zero errors. 6 If we assume an average 8084 TPS for Expressway using the updated architecture on the Xeon E5 as compared to 1280 TPS for the IBM DataPower XI50 the ratio is approximately 6.31X 7 Source: IBM Websphere DataPower SOA Appliance Announcement, March 2006 (http://www.ibm.com/common/ssi/rep_ca/3/897/enus106-253/enus106253.pdf) 8 This price includes the price of a suitably configured server (Intel Xeon E5450 class) as well as a 3 year enterprise Linux license with support. INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. UNLESS OTHERWISE AGREED IN WRITING BY INTEL, THE INTEL PRODUCTS ARE NOT DESIGNED NOR INTENDED FOR ANY APPLICATION IN WHICH THE FAILURE OF THE INTEL PRODUCT COULD CREATE A SITUATION WHERE PERSONAL INJURY OR DEATH MAY OCCUR. Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked reserved or undefined. Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice.do not finalize a design with this information. The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request. Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order. Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725, or by visiting Intel s Web site at www.intel.com. Copyright 2012 Intel Corporation. All rights reserved. Intel, the Intel logo, and Xeon are trademarks of Intel Corporation in the U.S. and other countries. *Other names and brands may be claimed as the property of others. Printed in USA Please Recycle 324094-004US