Benchmark and Performance Analysis of a Large Centralized Tax Processing Solution

Vijay Jain Venkata Sai Jayanti Murty

Background (Existing Architecture)
- Client-server architecture
- Total of 36 RCCs (RCC = Regional Computing Centre); the diagram shows RCC City A, RCC City B and RCC City C
- Each RCC works in isolation
- Stack at each RCC: Oracle Forms 4.5, Oracle Reports 2.5, Oracle 8i database
- Oracle Forms 4.5 becoming obsolete
- Customer needed consolidation as business strategy
- Replace with one single centralized system for the country

Background (Proposed Architecture)
- NCC = National Computing Centre
- Diagram labels: RCC City A, RCC City B, RCC City C, LBSs, Oracle HTTP server, Oracle 10g AS, Oracle 9i Forms & Report server
- Benchmark to determine if this is technically feasible

Profile
- 52 million returns to be processed in 4 months
- 10000 assessing officers across 750 locations
- Customer floated an RFP for System Integration
- Application benchmarking a pre-condition in the RFP
- Benchmark to be done on one business transaction: Processing of Income-Tax Returns
- TCSL-led consortium responded to the RFP
- Carried out benchmark in HP Labs, China

Objectives of the Benchmark
- To verify server-side performance
- To evaluate scalability targets
- Apply modeling techniques to measured parameters
- Recommend hardware configuration for the application

Rules of the Benchmark
- Server partition, disk and volume size frozen by customer
- Application, database and LoadRunner scripts frozen
- Think time fixed at 3 seconds
- Deterministic think time
- No reorganization of tablespace structure and database permitted
- No application optimization permitted
- Configuration parameter tuning of web, app and DB servers permitted
- Tests to be executed: 1000 and 2000 users for 1 million transactions, and 4000 users for 1, 2 and 4 million transactions

Logical Diagram
- Load Controller: Mercury LoadRunner 8.0
- Load Generators #1 to #10
- Load Balancer: Radware WSD 3.0
- Web Servers #1 to #6: DL480, Xeon 2 CPU/4GB, Win2K
- Application Servers #1 to #4: rp8420 8CPU/32GB partition, PA8900 1.1GHz, Oracle 10gAS 9..0
- Database Server #1 (RCC): Superdome 32CPU/128GB partition, PA8800 1GHz, Oracle 9.2
- Database Server #2 (NCC): Superdome 32CPU/128GB partition, PA8800 1GHz, Oracle 9.2
- Storage: XP12000, 32GB cache, 146 GB 10K rpm disks
- App & DB server partitioning as per RFP

Business Transaction Benchmarked
- Steps in the cycle: Login, Main Screen, Adjustment/Demand, Save Computed Records, Open Refund Details, Save Refund Details, Open Print Results, Close Print Results, Logout
- Diagram legend: T i = think time, R i = response time, I = inter-iteration time; one branch of the diagram is labelled 12.5%
- (T 0 +) T 1 + T 2 + T 3 + T 4 + T 5 + T 6 + T 7 + T 8 + T 9 + I = 3 sec
- R i < 1 sec for i = 0,..,9
- Observed cycle time of 120 sec in reality

Test Configuration: 1000 Users, 1 Million Returns
- RCC database server: 16 CPU, 32 GB
- NCC database server: 16 CPU, 32 GB
- Tiers in the diagram: Database Server, Application Server, Web Server

Initial Glitches
- With 3 sec think time, the response time target was surpassed at 500 users itself
- Heavy buffer busy and row lock waits
- Tuning and adding resources were of little help
- Raised the issue with the customer

Business Transaction Benchmarked (revised think time)
- Same cycle of steps: Login, Main Screen, Adjustment/Demand, Save Computed Records, Open Refund Details, Save Refund Details, Open Print Results, Close Print Results, Logout
- (T 0 +) T 1 + T 2 + T 3 + T 4 + T 5 + T 6 + T 7 + T 8 + T 9 + I = 39 sec (in place of 3 sec)
- R i < 1 sec for i = 0,..,9
- Observed cycle time of 120 sec in reality

Test Configuration: 2000 Users, 1 Million Returns
- RCC database server: 32 CPU, 128 GB
- NCC database server: 32 CPU, 128 GB
- Tiers in the diagram: Database Server, Application Server, Web Server

Test Results: 1000 & 2000 users

    Test type (users / returns)    1000 / 1 million    2000 / 1 million
    CPU used (RCC)                 16                  32
    Business TPS                   25                  48
    Completion time (Hrs)          11:09               05:43
    Returns processed              1 M                 1 M

- The slide also tabulates response times in seconds (average and 95th percentile) for User Login, Main Screen, Compute, Refund Details, Print Result and User Exit, and utilization as DB, Apps and Web CPU %. Only some cells survive, and their row assignment is not fully recoverable:
    1000 / 1 million: avg 0.1, 1.1; 95th pct 0.5, 0.1, 1.2; utilization ~45%, ~15%
    2000 / 1 million: 1.0, 0.5, 1.1; utilization ~50%, < 10%
- Throughput scales almost linearly; no apparent bottleneck
- Meets RFP criteria

4000 user test
- Results on the App/DB configuration used for 2000 users
- Results on a higher number of CPUs
- Scalability limiting analysis
- Further analysis of response time

Base Configuration
- RCC database server: 32 CPU, 128 GB
- NCC database server: 32 CPU, 128 GB
- Tiers in the diagram: Database Server, Application Server, Web Server

4000 user results: 32 CPU App, 32 CPU RCC

    Test type (users / returns)    4000 / 1 million    2000 / 1 million
    CPU used (RCC)                 32                  32
    Business TPS                   47                  48
    Completion time (Hrs)          06:03               05:43
    Returns processed              1 M                 1 M
    DB CPU %                       ~50%                ~50%
    Web CPU %                      ~5%                 < 10%

    Apps CPU %: ~45% (single value on the slide)

    Response time (sec), 4000 users    Avg     95th pct
    User Login                         0.6     1.6
    Main Screen                        0.7     2.1
    Compute                            13.7    17.1
    Refund Details                     2.1     4.5
    Print Result                       24.0    29.3
    User Exit                          1.1     1.2

4000 user test: Impact of adding DB CPUs

    No. of users               2000      4000      4000      4000
    No. of RCC CPUs            32        32        48        56
    No. of app servers         4         4         5         5
    Business throughput        48 tps    47 tps    47 tps    47 tps

    Response times (seconds), 4000 users on 32 RCC CPUs:
    User Login 2, Main Screen 2, Compute 17, Refund Details 5, Print Result 29, User Exit 1

- For 2000 users only User Login (0.5) and User Exit (1) are legible
- Response times with 48 and 56 RCC CPUs: same as for the 32-CPU run
- DB CPU %: ~50% (2000 users), ~50% (4000 users, 32 CPUs), ~45% (48 and 56 CPUs)
- Apps CPU %: ~45%
- Further database tuning also did not help

4000 user test: Network, Disk, Memory on DB server
- Network: < 192 Kbps on a 1 Gbps LAN
- Other chart annotations: "< 45%", "Constant at 45%", "Nothing to be alarmed about"
- No disk, memory or network bottlenecks
- Database does not scale with CPUs
- Are there any bad SQLs, contention problems, wait events?

SQL processing contribution
Extract of Oracle Statspack report for 4000 user test

    SQL ordered by Gets for DB: RCC  Instance: RCC

                                                     CPU      Elapsd
      Buffer Gets    Executions  Gets per Exec %Total Time (s) Time (s)  Hash Value
    --------------- ------------ -------------- ------ -------- --------- ----------
        454,871,006       87,688        5,187.4   46.8  2778.07  20102.85 3184176672
    Module: f90runm@rp84201 (TNS V1-V3)
    SELECT ROWID,SEQ_NO,IND_STAT,BNDL_AREA_CD,BNDL_AO_TYP,BNDL_RANGE
    _CD,BNDL_AO_NO,BNDL_FIN_YR,BNDL_CNTR_NO,BNDL_SEQ_NO,ACK_NO,AST_Y
    R,PAN,DT_FILED,NAME,RET_INC FROM SS_RETURN WHERE (SEQ_NO IN (SEL
    ECT a.seq_no FROM ss_return a WHERE A.RANGE_CD = :1 AND A.AO_NO
    = :2 AND A.AO_TYP = :3 AND A.area_cd = :4)) and (AST_YR=:5) and

    SQL ordered by Reads for DB: RCC  Instance: RCC

                                                     CPU      Elapsd
     Physical Reads  Executions  Reads per Exec %Total Time (s) Time (s)  Hash Value
    --------------- ------------ -------------- ------ -------- --------- ----------
         14,416,210       87,688          164.4   83.6  2778.07  20102.85 3184176672
    Module: f90runm@rp84201 (TNS V1-V3)
    (same SELECT statement as above)

- Avg. response time/execution = 20102.85/87688 = 3 sec
- High physical and logical reads
- Avg. response times (sec): Main Screen 0.7, Compute 13.7, Refund Details 2.1, Print Result 24.0
- Avg. response time for the business transaction = 40.5 sec, of which 3 sec is due to this SQL
- Is it worth tuning?

What is a latch in Oracle DB
- Simple, low-level serialization mechanism to protect shared data structures in the system global area (SGA)
- Provides only exclusive access to protected data structures that are needed briefly
- Parse flow in the diagram: Syntax Check, then Semantic Analysis, then "Was the statement already parsed by another session?" Yes leads to Soft Parse, No leads to Hard Parse
- Latches are applied in these stages
- Observed excessively high latch contention

Database Wait Event Analysis
Extract of Oracle Statspack report for 4000 user test: 30 min

    STATSPACK report for

                Snap Id     Snap Time      Sessions Curs/Sess Comment
                ------- ------------------ -------- --------- -------------------
    Begin Snap:       2 31-Oct-05 18:36:38    4,009     118.7
      End Snap:       3 31-Oct-05 19:06:39    4,006     120.7
       Elapsed:               30.02 (mins)

    Top 5 Timed Events
    ~~~~~~~~~~~~~~~~~~                                             % Total
    Event                                    Waits     Time (s)   Ela Time
    ------------------------------ --------------- ------------ ----------
    latch free                           1,539,663    3,587,743      98.96
    CPU time                                           28,487.79
    db file sequential read             17,221,454     7,500.21
    log file sync                          246,102       773.02
    enqueue                                  5,299       680.02

    Wait Events for DB: RCC  Instance: RCC
                                                                Avg
                                                  Total Wait   wait    Waits
    Event                         Waits  Timeouts   Time (s)   (ms)     /txn
    ------------------------- ---------- -------- ---------- ------ --------
    latch free                 1,539,663        0  3,587,743   2330      5.1
    db file sequential read   17,221,454        0      7,500      0     60.6
    log file sync                246,102        0        773      3      0.9
    enqueue                        5,299        8        680    128      0.0

- Excessively high latch contention (99% of total wait time)
- Average wait / DB txn = 2.3 * 5.1 = 11.73 sec
- During the 2000 user test, latch free wait was 8.31% and the avg. wait time was 2 ms
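The average-wait arithmetic on the wait event slide can be reproduced from the Statspack totals. The following is a minimal sketch, not part of the original deck; the variable names are illustrative, and it follows the slide in rounding the average wait to 2.3 s and using 5.1 waits per DB transaction.

```python
# Figures copied from the Statspack extract for the 4000-user test.
latch_free_waits = 1_539_663         # number of "latch free" waits
latch_free_wait_time_s = 3_587_743   # total "latch free" wait time in seconds
waits_per_db_txn = 5.1               # waits per DB transaction, as used on the slide

# Average duration of a single latch-free wait (Statspack reports 2330 ms).
avg_wait_s = latch_free_wait_time_s / latch_free_waits

# The slide rounds the average wait to 2.3 s before multiplying.
avg_wait_per_db_txn_s = round(avg_wait_s, 1) * waits_per_db_txn

print(f"average latch-free wait: {avg_wait_s * 1000:.0f} ms")
print(f"latch wait per DB transaction: {avg_wait_per_db_txn_s:.2f} s")
```

Deriving the per-wait average from the totals is a quick cross-check that the "Avg wait (ms)" column and the "Total Wait Time" column of the report agree.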

Latch Wait Contribution to Response Time

                Snap Id     Snap Time      Sessions Curs/Sess Comment
                ------- ------------------ -------- --------- --------------
    Begin Snap:       2 31-Oct-05 18:36:38    4,009     118.7
      End Snap:       3 31-Oct-05 19:06:39    4,006     120.7
       Elapsed:               30.02 (mins)

    Load Profile
    ~~~~~~~~~~~~                      Per Second   Per Transaction
                                 ---------------   ---------------
                      Redo size:    1,537,284.62          9,748.08
                  Logical reads:      539,210.76          3,419.19
                  Block changes:        8,615.05             54.63
                 Physical reads:        9,570.96             60.69
                Physical writes:          982.97              6.23
                     User calls:       22,725.02            144.10
                         Parses:       14,450.63             91.63
                    Hard parses:            0.00              0.00
                          Sorts:        3,699.62             23.46
                         Logons:            0.00              0.00
                       Executes:       38,242.79            242.50
                   Transactions:          157.70

- 157.70 DB txns/sec for 47 business txns/sec: 157.7/47 = 3.36 DB txns per business txn
- Contribution of latch wait time to response time = avg. wait time per DB txn * no. of DB txns per business txn = 11.73 * 3.36 = 39.41 seconds
- Average response time for 4000 users = 0.7 (Main Screen) + 13.7 (Compute) + 2.1 (Refund Details) + 24.0 (Print Result) = 40.5 seconds

Parsing Analysis
Extract of Oracle Statspack report for 4000 user test

    SQL ordered by Parse Calls for DB: RCC  Instance: RCC

                               % Total
     Parse Calls   Executions   Parses   Hash Value
    ------------ ------------ -------- ------------
      14,285,993   14,286,265    54.89   2588670467
    Module: f90runm@rp84201 (TNS V1-V3)
    declare p varchar2(32767); begin p := GF_GLOBAL_POLICY(:sn, :on);
    :v1 := substr(p,1,4000); :v2 := substr(p,4001,4000); :v3 :=
    substr(p,8001,4000); :v4 := substr(p,12001,4000); :v5 :=
    substr(p,16001,4000); :v6 := substr(p,20001,4000);

       2,939,514    2,939,618    11.29   2294365478
    Module: f90runm@rp84201 (TNS V1-V3)
    declare p varchar2(32767); begin p := GF_MASTER_POLICY(:sn, :on);
    :v1 := substr(p,1,4000); :v2 := substr(p,4001,4000); :v3 :=
    substr(p,8001,4000); :v4 := substr(p,12001,4000); :v5 :=
    substr(p,16001,4000); :v6 := substr(p,20001,4000);

         349,887    2,988,725     1.34   3106935379
    Module: f90runm@rp84201 (TNS V1-V3)
    SELECT RCC_NUM From GS_EMP_RCC WHERE ORAUSER = USER

- Nearly 66% of total parse calls are for two functions: GF_GLOBAL_POLICY & GF_MASTER_POLICY
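The two derived figures on the preceding slides, the latch-wait share of response time and the parse-call share of the two policy functions, can be recomputed directly from the reported numbers. This is a minimal sketch with illustrative variable names; it mirrors the slide's rounding of the DB-to-business transaction ratio to 3.36.

```python
# Inputs copied from the Statspack Load Profile and the test results.
db_txn_per_s = 157.70            # "Transactions" per second
business_txn_per_s = 47          # measured business throughput at 4000 users
latch_wait_per_db_txn_s = 11.73  # average latch wait per DB transaction

# DB transactions issued per business transaction (the slide rounds to 3.36).
db_txn_per_business_txn = round(db_txn_per_s / business_txn_per_s, 2)

# Latch wait accumulated over one business transaction.
latch_contribution_s = latch_wait_per_db_txn_s * db_txn_per_business_txn

# Measured average response time per step, in seconds.
avg_response_s = {
    "Main Screen": 0.7,
    "Compute": 13.7,
    "Refund Details": 2.1,
    "Print Result": 24.0,
}
total_response_s = sum(avg_response_s.values())

# Share of all parse calls made by GF_GLOBAL_POLICY and GF_MASTER_POLICY.
policy_parse_share_pct = 54.89 + 11.29

print(f"latch wait per business txn: {latch_contribution_s:.2f} s")
print(f"measured response time:      {total_response_s:.1f} s")
print(f"latch share of response:     {latch_contribution_s / total_response_s:.0%}")
print(f"policy functions' parses:    {policy_parse_share_pct:.2f} %")
```

Putting the estimate next to the measured total shows how little of the 40.5 seconds is left unexplained once latch waits are accounted for.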

Function Call Analysis
Extract of Oracle TKPROF report of test

    declare p varchar2(32767); begin p := GF_GLOBAL_POLICY(:sn, :on);
    :v1 := substr(p,1,4000); :v2 := substr(p,4001,4000); :v3 :=
    substr(p,8001,4000);

    call     count       cpu    elapsed       disk      query    current       rows
    ------- ------  -------- ---------- ---------- ---------- ---------- ----------
    Parse      207      0.01       0.02          0          0          0          0
    Execute    207      0.03       0.04          0        420          0        207
    Fetch        0      0.00       0.00          0          0          0          0
    ------- ------  -------- ---------- ---------- ---------- ---------- ----------
    total      414      0.04       0.07          0        420          0        207

    declare p varchar2(32767); begin p := GF_MASTER_POLICY(:sn, :on);
    :v1 := substr(p,1,4000); :v2 := substr(p,4001,4000); :v3 :=
    substr(p,8001,4000);

    call     count       cpu    elapsed       disk      query    current       rows
    ------- ------  -------- ---------- ---------- ---------- ---------- ----------
    Parse       40      0.01       0.00          0          0          0          0
    Execute     40      0.01       0.01          0         54          0         40
    Fetch        0      0.00       0.00          0          0          0          0
    ------- ------  -------- ---------- ---------- ---------- ---------- ----------
    total       80      0.02       0.02          0         54          0         40

    Trace file summary
       1  session in tracefile.
     435  user SQL statements in trace file.
      70  internal SQL statements in trace file.
     505  SQL statements in trace file.
     162  unique SQL statements in trace file.
     145  SQL statements EXPLAINed using schema:
          AST.prof$plan_table
            Default table was used.
            Table was created.
            Table was dropped.
    5295  lines in trace file.

- 247 function calls per business transaction = 247 * 47 = 11,609 per sec = 41.8 million per hour
- Database user calls per business transaction: 435 (435 * 47 = 20,445 per sec) = 73.6 million per hour
- For comparison, WinterCorp 2005 Report: peak OLTP worldwide on Unix = 8.6 million SQL calls per hour
- Number of calls cannot be reduced by parameter tuning or adding hardware
- Only way out is to change the application and hence reduce the number of parses and latches per second

2000 to 4000 user analysis: 39 sec Think Time

                                                    2000 user    4000 user
    Average rsp time of entire business txn         1.1 sec      40.5 sec
    Business throughput                             48/sec       47/sec
    Average latch wait time                         2 ms         2,330 ms

- If business throughput is not changing, why do response time and wait time increase so drastically when the number of users doubles?
- Queueing theory, closed system under saturation: R = N * D max - Z, where
    R = average response time of business txn
    N = number of users
    Z = think time
    D max = demand at bottleneck resource = visit count at bottleneck resource per business txn x average service time at bottleneck resource
- N = 2000, R = 1.1, Z = 39 gives D max = 20 ms
- N = 4000, Z = 39, D max = 20 ms gives R = 4000 * 20 ms - 39 = 41 sec
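The saturation formula above can be checked numerically. The sketch below follows the slide in deriving D max from the 2000-user measurement and then predicting the 4000-user response time. The function names are illustrative, and the throughput ceiling 1 / D max is the standard bottleneck bound from operational analysis rather than a figure stated on the slide.

```python
def bottleneck_demand(n_users: int, response_s: float, think_s: float) -> float:
    """Solve R = N * D_max - Z for D_max (seconds per business transaction)."""
    return (response_s + think_s) / n_users


def saturated_response(n_users: int, d_max_s: float, think_s: float) -> float:
    """Response time of a saturated closed system: R = N * D_max - Z."""
    return n_users * d_max_s - think_s


think_s = 39.0

# D_max from the 2000-user run (R = 1.1 s), rounded to 20 ms as on the slide.
d_max_s = round(bottleneck_demand(2000, 1.1, think_s), 3)

# Predicted response time once 4000 users share the same bottleneck.
r_4000_s = saturated_response(4000, d_max_s, think_s)

# Throughput cannot exceed 1 / D_max, however many users are added.
max_throughput_tps = 1 / d_max_s

print(f"D_max = {d_max_s * 1000:.0f} ms")
print(f"predicted R at 4000 users = {r_4000_s:.0f} s")
print(f"throughput ceiling = {max_throughput_tps:.0f} business txns/sec")
```

A ceiling of about 50 business transactions per second is consistent with the measured 48 and 47, which is why doubling the users raised response time rather than throughput.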

Determinants of Performance
- Application Tuning (time spent in components): no changes allowed during benchmark; main bottlenecks identified
- Infrastructure (resource consumption): config parameters well tuned; adding more in one box does not help
- Workload (think time in transactions): is 39 sec meaningful?

Think Time Analysis
- With 39 sec think time, 52 million returns can be processed in a mere 1.6 months, though the production target is 4 months
- So what is a more reasonable think time?
- Considering 5 working hours per day, the throughput required to process 52 million returns = 36.11 txns/sec
- Applying Little's Law, N = (R + Z) * X: 4000 = (4 + Z) * 36.11
- Avg. think time Z = (4000 / 36.11) - 4 = 106.77 sec
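The think-time derivation can be reproduced with Little's Law. This is a minimal sketch: the slide states 5 working hours per day and a 4-month window but not the number of working days, so the 80 working days used here (4 months of 20 days) is an assumption, chosen because it reproduces the slide's 36.11 txns/sec. The 4-second response time is the R value used in the slide's equation.

```python
total_returns = 52_000_000
hours_per_day = 5
working_days = 80   # ASSUMPTION: 4 months x 20 working days, not stated on the slide

# Throughput needed to finish all returns inside the processing window.
window_s = working_days * hours_per_day * 3600
required_tps = round(total_returns / window_s, 2)

# Little's Law for a closed system: N = (R + Z) * X, solved for think time Z.
n_users = 4000
response_s = 4.0    # R used in the slide's equation
think_s = n_users / required_tps - response_s

print(f"required throughput: {required_tps:.2f} txns/sec")
print(f"sustainable think time: {think_s:.2f} s")
```

A different working-day assumption changes the required throughput and therefore the think time, so the 106.77-second figure is only as good as the calendar behind it.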

Test Results: 95 sec Think Time

    Test type (users / returns)    1000 / 1 million    2000 / 1 million    4000 / 1 million    4000 / ½ million (95 sec)
    Business TPS                   25                  48                  47                  38
    Completion time (Hrs)          11:09               05:43               06:03               03:36
    Returns processed              1 M                 1 M                 1 M                 ½ M

- The slide also tabulates 95th percentile response times in seconds for User Login, Main Screen, Compute, Refund Details, Print Result and User Exit, and utilization as DB, Apps and Web CPU %. The cell values survive only as flat sequences whose row and column assignment is not fully recoverable:
    Response times (sec): 0.5, 0.5, 2, 0.5, 2, 0.1, 17, 5, 29, 1.2, 1.1, 1, 1.0
    Utilization: ~45%, ~50%, ~50%, ~45%, ~45%, ~45%, ~35%, ~15%, ~10%, ~5%, ~5%

Conclusions
- The benchmark has proven that the application scales linearly up to 2000 users with the specified think time
- Recommendations were given for further scalability
- Analysis of the results and workload has shown that the workload defined for the benchmark is unrealistic
- The system can meet the target of processing 52 million returns in 4 months with a reasonable think time and workload
- IMPACT: The RFP was scrapped and new requirements were laid down with a more realistic workload and more flexibility

Learnings
- Gauge the problem: is it feasible to achieve the target with resolution?
- Whenever doing a benchmark, work on a performance engineering model to derive whether it will work or not
- Evaluate the contribution of wait events to response time while doing database analysis
- Every application has certain scalability limits, no matter what hardware you deploy it on
- Adding hardware is not the solution for all problems
- Choosing a realistic workload is imperative to ensure the achievability and applicability of the results
- An unrealistic workload may cause cost and schedule to overrun exponentially

Q & A