& Portal Performance Testing and Tuning
GCP - IT Performance Practice
By Zubair Syed (zubair.syed@tcs.com)
April 2014
Copyright 2012 Tata Consultancy Services Limited
Overview

A large insurance company
- Recorded high growth recently
- Has potential to capture a larger market

Business Challenges
- Changing business needs
- Dynamics in the market and offerings
- Limitations of existing systems due to legacy technology
- Time constraints on launching offers to the market

Solution
- Embarked on a TEBT (Technology Enabled Business Transformation) program
Challenges: Performance Engineering
- Capacity of the performance environment is < 25% of production
- Incomplete performance NFRs
- No performance testing tool identified, so one had to be purchased
- Initial go-live release with no history of application usage, hence requiring:
  - Rigorous performance testing
  - Identifying and fixing performance bottlenecks
- Time constraints for the test preparation and execution phases
Architecture

Lead Management System
- Creation of Leads
- Update Leads
- Search Lead
- Reports
- Bulk Upload

[Diagram: Call Center Reps and External Partners connect through a Physical Load Balancer to Oracle HTTP servers and IBM HTTP servers with JVMs, backed by SOA Services and an Oracle DB database; the two front ends are labeled CRM and Portal.]

CRM
- Used by the client's call center reps
- Client offices are located across the geography
- Located within the client's high-bandwidth network

Portal
- The channel for the client's partners, who also generate leads for the client
- Located outside the client's premises
- Connected to the client's network over an extended pipe (1 gig, NDSL etc.)
Test Approach

Performance Testing Tool
- IBM Rational Performance Tester (-web protocol for, HTTP for Portal)

Scenarios
- Peak Load Test: peak day of the month scenario
- Endurance Test: average load over a 6-hour business day
- Switch-over Test: active/passive node switch-over

Workload
- OLTP + Reports + Bulk uploads

Monitoring
- NMON, AWR, Application Logs

Profiling
- Splunk, PMAT
Results and SLAs

SLAs
- 4 seconds for search operations
- 4 seconds for create/insert
- Report generation in 30 seconds

Results
- High response time (Portal)
- Resource utilization healthy
- No deadlocks in DB

Cause Analysis
- EAI connection pooling
- Portal web pages
- Queries

[Chart: Transactions, Response Time (Sec) vs. SLA (Sec). Txn 1: 2.4, Txn 2: 2.1, Txn 3: 1.7, Txn 4: 1.2]
[Chart: Portal Transactions, Response Time (Sec) vs. SLA (Sec). Txn 4: 10.3, Txn 5: 8.9, Txn 6: 12.4, Txn 6: 11.8]
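The SLA comparison behind these results can be sketched in a few lines. The response times are the values read off the two charts; the function name `sla_breaches` is hypothetical, and applying the 4-second search/insert SLA to every charted transaction is an assumption, since the slide does not say which SLA each transaction falls under.

```python
def sla_breaches(measurements, sla_sec):
    """Return the (label, seconds) pairs whose response time exceeds the SLA."""
    return [(label, sec) for label, sec in measurements if sec > sla_sec]


# Values as they appear on the charts; labels are kept verbatim,
# including the repeated "Txn 6" on the Portal chart.
first_chart = [("Txn 1", 2.4), ("Txn 2", 2.1), ("Txn 3", 1.7), ("Txn 4", 1.2)]
portal_chart = [("Txn 4", 10.3), ("Txn 5", 8.9), ("Txn 6", 12.4), ("Txn 6", 11.8)]

SLA_SEC = 4.0  # assumption: the search/insert SLA applies to all of these

first_breaches = sla_breaches(first_chart, SLA_SEC)
portal_breaches = sla_breaches(portal_chart, SLA_SEC)

print("First chart breaches:", first_breaches)
print("Portal chart breaches:", portal_breaches)
```

Under that assumption none of the transactions on the first chart breach the SLA, while every Portal transaction does, which matches the "High response time (Portal)" result.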
Finding # 1

Where is the issue, and where does Portal differ from CRM?
- Both have the same end-user functionality
- Common load balancer
- Common app tier and DB for both UIs
- Testing done from within the LAN

[Diagram: components shown are the Physical Load Balancer, Oracle HTTP servers, AOM, EAI OM, IBM HTTP servers with JVMs, SOA Services and Oracle DB, with Call Center Reps on the CRM side and External Partners on the Portal side; callout: "Connection pool in EAI".]

Solution and Best Practice:
- Configure connection pools in the EAI object manager
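To illustrate why connection pooling matters here, the following is a minimal, generic pool sketch. It is not the EAI object manager configuration, which is product-specific and not shown on the slide; the class `ConnectionPool` and the `factory` callable are hypothetical names used only to show that a bounded pool creates connections once and reuses them across requests.

```python
import queue
from contextlib import contextmanager


class ConnectionPool:
    """Fixed-size pool that hands out pre-created connections."""

    def __init__(self, factory, size):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(factory())  # pay the connection cost up front

    @contextmanager
    def connection(self, timeout=None):
        # Blocks until a connection is free; raises queue.Empty on timeout.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)  # always return the connection to the pool


# Stand-in for an expensive connection setup (hypothetical).
created = []


def open_connection():
    conn = object()
    created.append(conn)
    return conn


pool = ConnectionPool(open_connection, size=2)
for _ in range(10):
    with pool.connection() as conn:
        pass  # a request would use the connection here

print("connections created for 10 requests:", len(created))
```

Without a pool, each of the ten requests would open its own connection; with it, only the two created at start-up are ever used.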
Finding # 2: Slow Rendering

HTTP Server Access Logs

Where is the issue?
1. Images stored on the app server rather than the web server
2. Large page size
3. Uncompressed data

[Diagram: Physical Load Balancer, Oracle HTTP servers, IBM HTTP servers with JVMs, Oracle DB database, Portal; callout: "Images from App to Web".]

Solution and Best Practices:
- Host static objects and images on the web server
- Compress data before sending it to the browser client
- Configure caching for static objects
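The effect of compressing responses can be estimated offline before changing any server settings. The sketch below uses Python's standard `gzip` module on a synthetic, highly repetitive HTML payload; the payload and the function name `compression_ratio` are hypothetical, and real pages will compress less well than this example. In practice compression would be enabled on the web server rather than in application code.

```python
import gzip


def compression_ratio(payload: bytes) -> float:
    """Compressed size divided by original size (lower is better)."""
    return len(gzip.compress(payload)) / len(payload)


# Synthetic page: a table row repeated many times (hypothetical content).
page = b"<tr><td class='lead'>Lead</td><td>Open</td></tr>\n" * 2000

ratio = compression_ratio(page)
print(f"original: {len(page)} bytes, compressed to {ratio:.1%} of original size")
```

Running the same function over saved copies of the slow Portal pages would give a rough idea of the bandwidth saved on the extended pipe used by external partners.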
Extrapolation

Disparate capacity between PERF and PROD
- No uniformity across the servers (approx. 25% of PROD)
- Limitation in terms of tool license (250 users)

Challenges
- Testing to be done on lower capacity
- Build confidence for the PROD roll-out
- PERF results should prove that PROD capacity is scalable

PERF Capacity (%) vs. PROD
Tier            CPU        RAM
IBM Web Tier    33.33%     50.00%
IBM App         66.67%     100.00%
CRM Web Tier    33.33%     16.67%
CRM App Tier    22.22%     22.22%
Gateway Tier    100.00%    66.67%
DB Tier         66.67%     16.67%

Extrapolation Techniques
- Analytical models work for a specific hardware configuration
- Linear extrapolation works until a performance bottleneck is reached
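One way to read the capacity table is to look for the tier with the smallest PERF-to-PROD ratio for each resource, since that tier is the most likely to bottleneck first on the PERF setup. The sketch below does this with the table's values; the function name `most_constrained` is hypothetical, and treating the lowest ratio as the likely first bottleneck is an assumption rather than a claim from the slides.

```python
# PERF capacity as a percentage of PROD, copied from the table.
perf_vs_prod = {
    "IBM Web Tier": {"cpu": 33.33, "ram": 50.00},
    "IBM App":      {"cpu": 66.67, "ram": 100.00},
    "CRM Web Tier": {"cpu": 33.33, "ram": 16.67},
    "CRM App Tier": {"cpu": 22.22, "ram": 22.22},
    "Gateway Tier": {"cpu": 100.00, "ram": 66.67},
    "DB Tier":      {"cpu": 66.67, "ram": 16.67},
}


def most_constrained(capacity, resource):
    """Return the (tier, percent) pairs with the lowest PERF/PROD ratio."""
    lowest = min(values[resource] for values in capacity.values())
    return [
        (tier, values[resource])
        for tier, values in capacity.items()
        if values[resource] == lowest
    ]


cpu_limit = most_constrained(perf_vs_prod, "cpu")
ram_limit = most_constrained(perf_vs_prod, "ram")
print("Most constrained on CPU:", cpu_limit)
print("Most constrained on RAM:", ram_limit)
```

The disparity is visible immediately: the tightest CPU ratio and the tightest RAM ratio sit on different tiers, which is why a single scaling factor for the whole environment is hard to justify.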
Continued

What model? One CPU unit:
1. Set up servers with one CPU unit on each tier
2. Execute tests on single CPU units and validate against SLAs
3. Find the breakpoint where TPS starts degrading
4. Repeat the same exercise on 2 CPU units and validate whether the principle holds

Note: One CPU unit does not necessarily mean 1 CPU on each server; if the database server needs at least 2 CPU cores to run, then that is 1 CPU unit for the DB.

[Chart: Transactions Per Minute (0 to 50) and Resource (0 to 100) vs. Concurrent Users (0 to 250); series: TPS, Resource]

Breakpoint criteria
Response Time        CPU Utilization
4 sec for search     <= 75  Warning
5 sec for insert     <= 90  Threshold
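Step 3 and the breakpoint criteria can be sketched as a scan over a ramp-up test, returning the first user level at which throughput drops or a criterion is violated. The sample data below is hypothetical and exists only to exercise the function; the name `find_breakpoint`, the field names, and the choice to use the 4-second search limit as the response-time check are all assumptions.

```python
def find_breakpoint(samples, rt_limit_sec=4.0, cpu_threshold_pct=90.0):
    """Return the first concurrent-user level where throughput degrades,
    response time exceeds the limit, or CPU exceeds the threshold.
    Returns None if no sample breaks the criteria."""
    previous_tpm = None
    for sample in samples:
        degraded = previous_tpm is not None and sample["tpm"] < previous_tpm
        too_slow = sample["rt_sec"] > rt_limit_sec
        cpu_bound = sample["cpu_pct"] > cpu_threshold_pct
        if degraded or too_slow or cpu_bound:
            return sample["users"]
        previous_tpm = sample["tpm"]
    return None


# Hypothetical ramp-up results, one sample per load step (not from the slides).
ramp = [
    {"users": 25,  "tpm": 6,  "rt_sec": 1.0, "cpu_pct": 15},
    {"users": 50,  "tpm": 12, "rt_sec": 1.1, "cpu_pct": 28},
    {"users": 75,  "tpm": 18, "rt_sec": 1.3, "cpu_pct": 40},
    {"users": 100, "tpm": 24, "rt_sec": 1.6, "cpu_pct": 52},
    {"users": 125, "tpm": 30, "rt_sec": 2.1, "cpu_pct": 64},
    {"users": 150, "tpm": 36, "rt_sec": 2.9, "cpu_pct": 76},
    {"users": 175, "tpm": 34, "rt_sec": 4.6, "cpu_pct": 88},
    {"users": 200, "tpm": 30, "rt_sec": 6.0, "cpu_pct": 93},
]

breakpoint_users = find_breakpoint(ramp)
print("breakpoint at", breakpoint_users, "concurrent users")
```

Running the same scan on the 1-unit and 2-unit results is what step 4 asks for: if the breakpoint roughly doubles, the linear assumption holds for that range.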
Continued

One CPU unit (1 path)

Transactions Per Minute
            Target    Actual    Breakpoint
1 Unit      17.50     17.50     48
2 Units     35        35        98
PROD        140       NA        NA

[Diagram: Call Center Reps and External Partners, Physical Load Balancer, Oracle HTTP servers, IBM HTTP servers with JVMs, SOA Services, Oracle DB database, CRM and Portal; one path marked "Down".]

- Targeted to prove hardware scalability with this model
- Assumed that the application will scale linearly (CRM being a proven architecture)
- Breakpoint on the PERF setup is close to 70% of PROD TPS, which minimized the linear scalability risk

Best Practices
- Optimize the application prior to the extrapolation exercise
- TCS research lab says a mixed (linear + statistical) model predicts real-time scalability (PerfExt is a TCS tool that works on this principle; time limitations did not permit exploring this option)
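The arithmetic behind the table can be checked directly. The sketch below uses only the table's numbers; the variable names are hypothetical, and both the "PROD equals 8 units" reading and the projected PROD breakpoint are linear extrapolations under the slide's own linear-scaling assumption, not measured results.

```python
# Transactions per minute, copied from the table.
target_tpm = {1: 17.5, 2: 35.0}
breakpoint_tpm = {1: 48.0, 2: 98.0}
prod_target_tpm = 140.0

# How many 1-unit targets fit into the PROD target (assumes linear scaling).
prod_units = prod_target_tpm / target_tpm[1]

# Measured breakpoint at 2 units relative to perfect doubling of the 1-unit breakpoint.
scaling_efficiency = breakpoint_tpm[2] / (2 * breakpoint_tpm[1])

# Share of the PROD target already reached at the 2-unit breakpoint on PERF.
perf_share_of_prod = breakpoint_tpm[2] / prod_target_tpm

# Linear projection of the PROD breakpoint (extrapolated, not measured).
projected_prod_breakpoint = prod_units * breakpoint_tpm[1]

print(f"PROD target in 1-unit equivalents: {prod_units:.0f}")
print(f"2-unit scaling efficiency: {scaling_efficiency:.3f}")
print(f"2-unit breakpoint as share of PROD target: {perf_share_of_prod:.0%}")
print(f"Projected PROD breakpoint (linear): {projected_prod_breakpoint:.0f} per minute")
```

The 2-unit breakpoint of 98 is slightly more than double the 1-unit breakpoint of 48, and it already covers 70% of the PROD target, which is the basis for the reduced scalability risk stated above.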
Open floor Q&A
Thank You