CLOUD PERFORMANCE TESTING - KEY CONSIDERATIONS (COMPLETE ANALYSIS USING RETAIL APPLICATION TEST DATA)

Abhijeet Padwal
Performance Engineering Group
Persistent Systems, Pune
Email: abhijeet_padwal@persistent.co.in

Because of its lower cost and greater flexibility, the cloud has become the preferred deployment option for applications and products of any size. Through its Platform as a Service (PaaS) and Infrastructure as a Service (IaaS) offerings, the cloud has also benefited application testing, especially load and performance testing. Although the cloud provides superior flexibility and scalability at lower cost than traditional on-premises deployments, it has its own limitations and challenges. If those limitations are not evaluated carefully, they can severely impact projects and their budgets. It is therefore recommended to take a holistic view of the pros and cons of the cloud before deciding to use it for any purpose. This paper describes the cloud in brief and presents a detailed case study of load testing a retail application in the cloud: how the cloud's pros and cons worked for and against the team during load testing, and what actions were needed to overcome the problems.

The copyright of this paper is owned by the author(s). The author(s) hereby grants Computer Measurement Group Inc a royalty free right to publish this paper in CMG India Annual Conference Proceedings.
1. Introduction

In recent years, revolutionary technology innovations have changed the world we live in and the way we interact and do business. These innovations have driven a rapid technology transformation that serves businesses and end users better and faster. One of the most talked-about of these innovations, and one that has established a new kind of service delivery, is cloud computing. Cloud services help businesses reduce cost while offering highly available, faster, and more reliable services and products at higher margins, which is why businesses are aggressively adopting them. Increasingly, businesses are moving traditional on-premises deployments of their applications and products to scalable cloud environments, which offer low cost and high availability with little maintenance.

Along with production deployments, the cloud has also benefited application testing, especially load and performance testing, through its Platform as a Service (PaaS) and Infrastructure as a Service (IaaS) services. The cloud has proved useful for hosting load testing environments because high-end servers, applications, and large numbers of load injectors can be arranged with more flexibility and at lower cost.

However, like any other service, the cloud has its own limitations and challenges compared with conventional on-premises deployments. For example, the cloud does not provide access to low-level hardware configuration parameters, which matter during activities such as tuning; as a result, tuning and optimization cannot be performed effectively on the cloud. These limitations can be categorized according to the use case and the type of cloud service used.
Anyone who wants to get the best out of the cloud for load and performance testing must take a holistic view, weigh the pros and cons of the cloud environment, and define an effective strategy for using it.

2. Cloud Computing

Gartner defines cloud computing as "a style of computing in which scalable and elastic IT-enabled capabilities are delivered as a service using internet technologies." [Gartner 2014]

This definition describes cloud computing in simple terms. It is computing that is:

o Scalable and elastic: resources can be provisioned dynamically (on demand).
o Accessible over the internet: end users can reach it over the internet from a wide range of devices such as PCs, laptops, and mobile devices.
o Service-oriented: it is delivered as a value-adding service that is a black box to the end user.

2.1 Types of Cloud Services

Based on these characteristics, cloud services are classified into 3 main categories.

Infrastructure as a Service (IaaS)
This is the most basic cloud service model. The provider offers physical or virtual machines and other resources, and cloud users install operating system images and their application software on the cloud infrastructure.

Platform as a Service (PaaS)
A computing platform, typically including an operating system, a programming language execution environment, a database, and a web server. Application developers and testers can develop, run, and test their software solutions on a cloud platform without the cost and complexity of buying and managing the underlying hardware and software layers.

Software as a Service (SaaS)
In the SaaS model, cloud providers install and operate application software in the cloud, and cloud users access the software from cloud clients. Cloud users do not manage the cloud infrastructure and platform on which the application runs. This eliminates the need to install and run the application on the cloud user's own computers, which simplifies maintenance and support.

2.2 Cloud Service Providers

Amazon, Google, Microsoft Azure, OpenStack, and many other vendors provide different kinds of service offerings in the cloud arena.

2.3 Market Current Status and Outlook

Because of the inherent characteristics of the cloud that benefit business, and the attractive pricing models offered by the service providers, cloud-based services are in enormous demand. Recent surveys by well-known agencies show that demand for cloud-based services keeps getting stronger.

Gartner: Global spending on public cloud services is expected to grow 18.6% in 2012 to $110.3B, achieving a CAGR of 17.7% from 2011 through 2016. The total market is expected to grow from $76.9B in 2010 to $210B in 2016. The following is an analysis of the public cloud services market size and annual growth rates. [Cloud Market 2013]

Picture 1 Annual growth for cloud market

3. Case Study

3.1 About customer

The customer is a leading software company delivering retail solutions to market leaders across the globe. These solutions include POS, CRM, SCM, and ERP.

3.2 About Application

The application is an enterprise-class retail solution for managing the front-end and back-end operations within a retail store and for controlling the stores from the head office through a single application.
Figure1 Application architecture

App server (AS) is the core application located at the head office. It is responsible for managing all the stores and for real-time processing and analysis of the data generated by the stores. AS is also responsible for transferring software updates to the stores through its Update functionality.

Operations is the core application at every store. It is responsible for store management, for maintaining the store-level master and transactional data, and for exchanging that data between the billing counters and the AS server. Operations takes care of store operations such as maintaining stock inventory, pricing, promotions, and store-level reports; transferring data online to the AS server through the Replication client component; and receiving patches from the EAS server and transferring them to the counters.

Billing counter takes care of item information and billing. All the billing data generated by the counters is stored in the Store DB, which is finally replicated to the AS server by the Replication client component at Operations.

All the applications were developed in ASP.NET, and the database was SQL Server.

3.3 Performance testing requirement

This retail application has been deployed at various customers and works well. However, until recently the maximum number of stores at any one customer was 200. Recently the customer received a requirement to deploy this retail solution across 3000 stores. The customer had never done a deployment at such a scale and therefore did not know whether the application would sustain 3000 stores and, if not, what would need to be tuned and what kind of hardware would be required. As a first step, the customer decided to put the application under the load of 3000 stores for various business workflows and observe its behaviour. For the load testing activity, the customer came up with 5 real-life business scenarios that are used most frequently and generate a high volume of transactions.
The customer identified the following 5 scenarios across AS, Operations, and Billing counter:

Scenario 1 Replication
Replication of billing data from store to AS for 3000 stores.

Scenario 2 Billing counter
Multiple users (minimum 25 parallel counters) performing billing transactions, which include Bill, Sales Return, Bill Cancellation, and Lost Sales (in order of execution priority), with a minimum of 20 and a maximum of 200 line items and with Cash and Credit card as payment.

Scenario 3 AS
Access the reports to be checked while data from stores (minimum 20+ stores) is being updated to AS.

Scenario 4 Operations
Access stock management functions with 1000+ line items, with 5/10 users.

Scenario 5 Updates
Download of a patch by more than 100 stores simultaneously. Various patch sizes are to be tested, namely 50 MB, 80 MB, and 100 MB.

4. Approach

Scenario 1, Replication, had the highest priority because it is the most frequent operation between the stores and the central server and handles the huge amount of data generated by the stores. The rest of this white paper describes the approach taken for load testing this scenario.

4.1 Scenario

Replication of data from store to server for 3000 stores. Each store would have 100 billing counters, with each counter generating a bill with 200 line items.

4.2 Scenario Architecture

Figure2 Replication scenario architecture

The replication scenario has 3 sub-activities:

1. Collate the billing data from all the counters and generate the XML message files.
2. Transfer the XML message files from store to server (replication client -> replication server).
3. Extract the XML files and store the extracted billing data in the head office database.

It was decided to take a pragmatic approach to simulating the entire scenario: first simulate each of the above steps in isolation, and then move on to the end-to-end mixed execution. The first candidate was the transfer of the XML files from the replication clients located at the 3000 stores to the replication server at the head office. The rationale for prioritizing this step was that step 1 is a process within a store, which has at most 100 counters, so the load for that step at any given point would not exceed 100. Step 2 is where the actual load of 3000 stores comes into the picture, so it was decided to start with that step.

4.3 Test Harness setup

To simulate this scenario, a test harness was created with 5 parts:

1. XML message folders on the injector machine
2. VB-based replication client (.exe) on the injector machine
3. IIS and SQL Server based replication server
4. XML message folder on the head office server
5. Perfmon setup for monitoring resource consumption on the AS as well as on the load injectors

The folder structure on the store and at the head office was as below.

Picture2 Message folder structure on replication client and server

XML messages that have to be transferred are placed in the OutBox folder of the replication client on the store side, and messages that have been received are placed in the Inbox folder of the replication server at the head office. Each store has 100 XML messages of 2 MB each in the OutBox folder, each containing the billing data of 100 line items.

The replication client was a VB-based .exe file executed from the command line or a .bat file, with the server IP and the XML message folder name at the client (store) end passed as arguments.

Command:

    start prjreplicationupload20092013-1.exe C:\ \ReplicationUpload\ReplicationUpload:10.0.0.35:S000701:100:S000701:20130812-235959(1)

prjreplicationupload20092013-1.exe: application file name for the 1st store
10.0.0.35: server IP
S000701: store folder at server end
20130812-235959(1): XML message folder at client end

It was not feasible to set up and manage 3000 actual store machines to inject the load, so multiple stores had to be simulated from a single load injector box. This was achieved with the Windows batch utility. Multiple copies of the EXE file were created under different names, one for each store considered for data replication.
Picture 3 Multiple copies of replication utility

A batch file was created to execute all the EXE files one after another in sequence.
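The batch-file approach described above can be sketched in a few lines of Python that generate the batch content. This is only an illustrative sketch, not the team's actual script. The function names are hypothetical, the `-<n>.exe` naming pattern for the copied executables is inferred from the `-1.exe` name used for the first store, and the argument string is a placeholder for the colon-separated arguments shown in Section 4.3.

```python
from typing import Callable, List


def build_batch_lines(exe_prefix: str, store_count: int,
                      args_for_store: Callable[[int], str]) -> List[str]:
    """Build the lines of a Windows batch file that starts one client copy per store.

    Assumption: copy number n of the client is named '<exe_prefix>-<n>.exe'.
    """
    if store_count < 1:
        raise ValueError("store_count must be at least 1")
    lines = ["@echo off"]
    for store in range(1, store_count + 1):
        # 'start' launches the program and returns immediately, so the copies
        # run in parallel even though they are triggered one after another.
        lines.append(f"start {exe_prefix}-{store}.exe {args_for_store(store)}")
    return lines


def placeholder_args(store: int) -> str:
    """Hypothetical stand-in for the real per-store argument string."""
    return f"ARGS_FOR_STORE_{store}"


if __name__ == "__main__":
    batch = build_batch_lines("prjreplicationupload20092013", 100, placeholder_args)
    # Show the first few generated lines instead of writing a file.
    print("\n".join(batch[:3]))
```

Generating the file rather than writing it by hand keeps the 100 entries per injector consistent when the store folders change between test cycles.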
The next question was how to calculate the time taken for the entire message file upload operation when multiple copies of the replication client are launched and upload XML messages to the replication server simultaneously. The end-to-end data transfer time was measured from the moment the first replication EXE was triggered to the moment the last XML message file was uploaded to the replication server.

5. Test Setup

For the server configuration, it was decided to go ahead with the same configuration used for the existing customers and, based on the results of these tests, to perform server sizing and capacity planning.

AS Configuration
Operating System            Windows Server 2012 DataCenter
Web-Server                  IIS 8
Number of Cores             4
RAM                         28 GB
Network Card Bandwidth      10 Gbps
Table1 AS Server configuration

Database Server Configuration
Operating System            Windows Server 2012 DataCenter
Web-Server                  IIS 8
Number of Cores             4
RAM                         7 GB
Network Card Bandwidth      10 Gbps
Table2 DB Server configuration

This hardware configuration was not available in house and had to be either procured or rented for this activity. Considering the short span of the test execution phase, it was decided to rent the hardware from the local market.

5.1 Load Injectors

Finding the right size and number of load injectors was tricky. As mentioned above, it was not feasible to set up and manage 3000 actual store machines to inject the load, so the load of multiple stores had to be initiated from a single load injector box. With this approach it was essential to make sure that the load injector itself was not overloaded, and that the number of injectors was kept small enough for injector management to remain feasible. To arrive at the required number of injectors, sample tests were conducted by running multiple copies of the replication client from a single injector using the Windows batch file.
The number of replication clients was gradually ramped up until the injector CPU reached 70%. A single injector with an Intel P4 processor and 2 GB of RAM supported 100 instances of the replication client, which means that 30 load injectors are required to generate the load of 3000 stores. That many machines were not available for load testing in the local environment, so reducing the number of injectors by increasing the hardware capacity of each one was evaluated. However, arranging such high-end machines was not commercially or logistically viable. It was therefore decided to go ahead with the machine configuration used for the sample test: it was a common configuration, so it would be readily available and affordable.
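The sizing arithmetic above can be written down as a small Python sketch. The function names are hypothetical, and the ramp-up helper assumes that each sample run was recorded as a (client count, CPU %) pair, which the paper does not state explicitly.

```python
import math
from typing import Iterable, Tuple


def max_clients_within_cpu_limit(samples: Iterable[Tuple[int, float]],
                                 cpu_limit: float = 70.0) -> int:
    """Return the largest client count whose measured CPU % stays within cpu_limit.

    samples: (client_count, cpu_percent) pairs from the ramp-up runs.
    Returns 0 if no sample is within the limit.
    """
    within = [clients for clients, cpu in samples if cpu <= cpu_limit]
    return max(within, default=0)


def injectors_needed(total_stores: int, clients_per_injector: int) -> int:
    """Number of injectors needed, rounding up so that no store is left out."""
    if clients_per_injector < 1:
        raise ValueError("clients_per_injector must be at least 1")
    return math.ceil(total_stores / clients_per_injector)


if __name__ == "__main__":
    # Figures from the text: 3000 stores, 100 client instances per injector.
    print(injectors_needed(3000, 100))  # prints 30
```

Rounding up matters when the store count is not a multiple of the per-injector capacity; with the figures in this case study the division is exact.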
5.2 Rented vs Cloud-based load injectors

Two options were available: rent the load injectors and servers in the local market, or perform the test in a virtual cloud environment. Costs were obtained for the rented option from the local market, and for the cloud-based virtual environment multiple vendors were evaluated, such as Amazon cloud and Microsoft Azure.

A total effort of 15 days was originally planned for the execution of this scenario. For local renting, the minimum rental duration was 1 month, at the following cost:

Client - $50 per month per machine
App Server - $150 per month per server
Database - $50 per day per server

In the cloud, a flexible on-demand costing option was available. For the on-demand cost calculation, a detailed usage pattern was defined for the load injectors and servers over those 15 days.

Machine              Number of Instances   Number of days required   Usage            Activity
Setup machines       2                     15                        12 hrs per day   Environment setup and sample runs
Load Injectors       30                    5                         12 hrs per day   Execution of 3000 stores
Application Server   1                     15                        12 hrs per day   Sample and actual runs
Database Server      1                     15                        12 hrs per day   Sample and actual runs
Table3 Usage pattern for machines during design and execution of scenario1

Based on the above usage pattern, the costs of the Amazon and Microsoft Azure setups were calculated and compared with the local renting option, as below.

Virtual Machines   Instances   Microsoft Azure ($)   Amazon ($)   Local Renting ($)
Load injectors     30          648                   858          1500
Setup machines     2           86.4                  547          100
AS Server          1           183.6                 270          150
DB Server          1           442.8                 98           50
Total                          1360                  2055         1800*
Table4 Cost comparison between Azure, Amazon and local renting

*Cost includes only hardware. OS on client and servers and SQL Server licenses are charged separately.

Among the cloud options, Microsoft Azure was cheaper than Amazon and had the added benefit of 5 GB of free data upload and download, compared with just 1 GB for Amazon.
Microsoft Azure was also the winner in the cost comparison with the local renting option. Apart from the hardware cost, local renting carried the added cost of OS and SQL Server licenses.
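The on-demand calculation behind the usage pattern can be illustrated with a short Python sketch. It converts each row of Table 3 into instance-hours and divides the Azure figures of Table 4 by those hours. The resulting hourly rates are back-calculated for illustration only; they are an assumption about how the totals were derived, not vendor price quotes, and the names used here are hypothetical.

```python
# (instances, days, hours per day) taken from Table 3.
USAGE = {
    "Setup machines": (2, 15, 12),
    "Load injectors": (30, 5, 12),
    "Application server": (1, 15, 12),
    "Database server": (1, 15, 12),
}

# Microsoft Azure cost column from Table 4, in dollars.
AZURE_COST = {
    "Setup machines": 86.4,
    "Load injectors": 648.0,
    "Application server": 183.6,
    "Database server": 442.8,
}


def instance_hours(instances: int, days: int, hours_per_day: int) -> int:
    """Total billable hours across all instances of one machine type."""
    return instances * days * hours_per_day


def implied_hourly_rate(cost: float, hours: int) -> float:
    """Hourly rate implied by a total cost; illustrative, not a quoted price."""
    return cost / hours


if __name__ == "__main__":
    for machine, usage in USAGE.items():
        hours = instance_hours(*usage)
        rate = implied_hourly_rate(AZURE_COST[machine], hours)
        print(f"{machine}: {hours} instance-hours, about ${rate:.2f} per hour")
    print(f"Azure total: ${sum(AZURE_COST.values()):.1f}")
```

The comparison with local renting follows from the same table: the 30 injectors are billed for only 5 days of 12 hours in the cloud, whereas the rented machines are charged for a full month regardless of use.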
5.3 Microsoft Azure Load Test Environment

Figure3 Load test setup at Azure and local environment

An isolated environment was set up in the Azure cloud with the replication server on AS, the database server, 30 load injectors, and 2 setup machines. Considering the high volume of transaction traffic, a 10 Gbps LAN was set up for the load testing environment in Azure. This environment was accessed over RDP connections from controlling clients set up in the local environment. To control and manage the 30 load injectors in the Azure environment, 6 controlling clients had to be set up locally. Each controlling client was used to access 5 load injectors in order to set up and execute the tests and capture the result data.

6. Test Execution and Results Analysis

6.1 Initial Test Results

After the test environment was set up, test execution started with a small number of stores. Based on the results of each test run, the store load was gradually ramped up. The first test was conducted for 100 stores and was successful. The number of stores was then increased from test to test: 100, 200, 500, 700, and 800. Up to 700 stores, all XML files from the stores were transferred to the replication server; during the 800-store test, however, a number of stores started failing. A few more tests with 1000 and 1600 stores were conducted to analyse the failures. The table below summarizes the results.

Stores #          Successful   Failed     Start Time   End Time    Total Time   Status
                  Stores #     Stores #   (H:MM:SS)    (H:MM:SS)   (H:MM:SS)
100               100          0          6:41:00      6:43:00     0:02:00      Pass
200               200          0          11:27:00     11:30:00    0:03:00      Pass
500               500          0          13:28:00     13:35:00    0:07:00      Pass
700 : Round 1     700          0          8:37:00      8:46:00     0:09:00      Pass
700 : Round 2     700          0          14:03:00     14:15:00    0:12:00      Pass
800 : Round 1     700          100        6:38:00      6:48:00     0:10:00      Fail
800 : Round 2     702          98         9:46:28      10:02:00    0:15:32      Fail
1000 : Round 1    954          46         12:57:00     13:12:00    0:15:00      Fail
1000 : Round 2    906          94         12:28:00     12:45:00    0:17:00      Fail
1600              1300         300        7:49:00      8:05:00     0:16:00      Fail
Table5 Test results summary of scenario1 on Azure

It was observed that beyond 700 stores the behaviour of the replication scenario was inconsistent. To ascertain the reason for the failures, resource consumption data from the replication server was analysed further. For this detailed analysis, parameters were identified for each hardware resource: % CPU utilization, available memory, disk queue length, processor queue length, and network bandwidth.

Table 6 Resource Utilization Analysis

The analysis showed that when all the stores start replicating, the server disk becomes saturated and the disk and processor queue lengths build up beyond their threshold values, which results in inconsistent behaviour and failures. Based on this analysis, it was decided to upgrade both hardware resources (CPU and disk) if possible, or at least the disk speed, which was the main culprit for the failures. The current configuration of these 2 resources was 4 cores and a disk speed of 10k RPM. To scale these resources stepwise, it was decided to upgrade to 6 CPU cores and a disk speed of 15k RPM. Microsoft Azure was asked whether more cores and a higher-speed disk could be made available. It turned out that the number of cores could be upgraded to 8, but a higher-speed disk was not available: all the instances in the disk array had the same speed, and Microsoft could not arrange a higher-speed disk for this testing. This was a show stopper for further testing and an important revelation of the limitations of the cloud environment.
Performance tuning, which requires many configuration changes at the underlying hardware layer, cannot be performed efficiently in a cloud environment where the resources are shared and cannot be changed.

6.2 Moving to Physical Server in local environment

After this limitation of Microsoft Azure was revealed, it was decided to evaluate the other options, even though they were costlier than Azure. These were Amazon cloud and rented physical servers in the local environment. Amazon cloud offered higher-speed disks as well as more CPUs. However, based on the experience with Microsoft Azure, it was decided to rule out the cloud option: even if Amazon provided a higher-speed disk, there might be further limitations on other resources and their tuning. Considering the overall situation and the limitations of the cloud environment, it was decided to rent a server from the local market with the configuration below.

SERVER
Operating System         Windows Server 2012 DataCenter
Operating System Type    64-bit
Processor                Intel Xeon CPU E5-2630 0 @ 2.30GHz
Web-Server               IIS 8
Number of Cores          6
RAM                      28 GB
Table7 AS server configuration for local environment

Fortunately, this hardware configuration was available from the local vendor. The next challenge was to arrange 30 load injectors, which were not available in the load test environment and could not be supplied by the vendor at short notice. Given the limited time in hand, it was decided to use machines from other teams outside office hours to carry out the remaining tests.

After all these challenges were overcome, the tests were carried out in the local environment and, as anticipated, replication for 3000 stores worked without any hitches.

Stores #   Successful Stores #   Failed Stores #   Total Time (in minutes)   Status
100        100                   0                 1                         Pass
200        200                   0                 2                         Pass
500        500                   0                 3                         Pass
700        700                   0                 4                         Pass
1000       1000                  0                 8                         Pass
1500       1500                  0                 15                        Pass
2000       2000                  0                 16                        Pass
2500       2500                  0                 21                        Pass
3000       3000                  0                 29                        Pass
Table8 Test results summary for scenario1 on local environment

7. Challenges/ issues faced in Cloud during execution

Apart from the limitation on configuration changes in the cloud environment, a few other challenges were faced during the execution. Most of them arose because a large number of load injectors had to be managed and because the mode of access, RDP, was slow over the internet.
A few of them are described below.

7.1 Switching between injectors to initiate the test

Because of the nature and design of the replication client, no central utility or application was available to initiate the load from all 30 injectors automatically, the way most load testing tools do. Tests had to be initiated manually by logging in to the individual boxes. To simulate the real-time behaviour of the replication scenario accurately, high concurrency had to be maintained during execution, which meant that the load had to be initiated from all 30 injectors at the same time, or at least with a very short delay. To make this possible, switching between injectors from the controlling machine had to be very fast. Switching between load injectors over RDP was tedious. It would have been much easier with 30 controller machines, each managing a single injector, but that was not the case in this scenario.

7.2 Test data setup

Test setup included various tasks, and one of the most difficult was creating the folder setup for the 100 stores per injector. Each execution cycle required unique bill numbers in the billing data, so the message folders had to be updated with a unique bill number in each store folder before every test cycle. Setting up folder structures on 30 client machines to simulate 3000 stores was a tedious and time-consuming task, and performing it over RDP added further complexity and time.

7.3 Monitoring

It was also necessary to monitor the health of each load injector during execution to make sure that it was not overloaded. Keeping an eye on the resource consumption of 5 load injectors from a single controlling machine was challenging.

7.4 Data transfer

These tests generated a huge amount of result data, including resource utilization data. In the absence of applications such as Microsoft Excel on the cloud machines, and given the slow RDP connectivity, it was difficult to analyse this data on the cloud machines themselves, so the data had to be downloaded after every test run. Downloading a large amount of data over the internet was time consuming, and costs were incurred beyond the data transfer limit set by Microsoft.

8. Conclusion and outlook

For load and performance testing, the cloud provides an edge over conventional on-premises test setups. One can take advantage of the cloud to build test environments within a very short time and with cost and logistical flexibility. Alongside these advantages, however, the cloud environment has a few disadvantages and challenges that can severely hamper the purpose.
A limitation such as the lack of access to the underlying hardware configuration parameters makes the cloud unsuitable for activities such as bottleneck identification, tuning, and optimization. Managing the cloud setup remotely and transferring data over the internet add further complexity and delays to the overall schedule. The cloud can be considered for load and performance testing, but it is recommended to study all these pros and cons in detail in the context of the load testing requirement, determine what can and cannot be done on the cloud, and on that basis define the strategy for load testing on the cloud.

References

[Gartner 2014] http://www.gartner.com/it-glossary/cloud-computing/
[Cloud Market 2013] Gartner survey http://www.forbes.com/sites/louiscolumbus/2013/02/19/gartner-predicts-infrastructure-services-will-accelerate-cloud-computing-growth/