Understanding Server Configuration Parameters and Their Effect on Server Statistics

Technical Note V2.0, 3 April 2012

Copyright 2012 Active Endpoints Inc. ActiveVOS is a trademark of Active Endpoints, Inc. All other company and product names are the property of their respective owners.
Contents

Introduction
Time to Save Process
Time to Obtain Process
The Process Count Server parameter
The Process Idle Timeout Server parameter
Additional Performance Statistics in ActiveVOS v9.0
Other important parameters influencing performance
What can be tweaked
Obtaining Troubleshooting Assistance
About Active Endpoints
Introduction

This tech note describes key monitoring properties shown in the ActiveVOS Admin console under Monitor > Server Monitoring > Server Statistics. These historical engine statistics are collected by ActiveVOS in memory and are aggregated in intervals. The default threshold period and the evaluation frequency for collecting and aggregating statistics are both five minutes. To change these defaults, use the ActiveVOS Admin console at Admin > Configure Server > Monitoring Thresholds.

Time to Save Process

The Time to Save Process is strictly the amount of time required to serialize and write all process state data (activity state, variables, etc.) to the database. If you're seeing outliers (save times that are orders of magnitude above the average), they can only be attributed to one of these two causes:

1. The time required to obtain a database connection.
   Suggested investigation: Are there enough connections in the pool?
   Hint: The "Time to acquire database connection" statistic would indicate this. If this value increases from time to time, try increasing the size of the database connection pool using your application server's settings. You should also check the application server's monitors to determine whether large numbers of requests are backing up. For Oracle WebLogic, refer to:
   - http://download.oracle.com/docs/cd/e12840_01/wls/docs103/consolehelp/taskhelp/jdbc/jdbc_datasources/monitordatasourcestatistics.html
   - http://download.oracle.com/docs/cd/e12840_01/wls/docs103/consolehelp/pagehelp/jdbcjdbcdatasourcesjdbcdatasourcemonitorstatisticstitle.html
2. The time required to write data to the database.
   a. Suggested investigation: Is there a particularly large process involved?
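As a rough illustration of spotting the outliers described above, the following minimal sketch flags save times that sit far above a baseline. The function name, the sample values, and the factor of 10 are assumptions made for illustration and are not part of ActiveVOS. The sketch uses the median rather than the average as the baseline, so that a single extreme value does not inflate the baseline it is compared against.

```python
from statistics import median


def find_save_time_outliers(samples_ms, factor=10.0):
    """Return the samples that exceed `factor` times the median sample.

    `samples_ms` is a hypothetical list of Time to Save Process readings
    (in milliseconds) copied by hand from the Server Statistics page.
    """
    if not samples_ms:
        return []
    baseline = median(samples_ms)
    # Anything more than `factor` times the typical value is treated as an outlier.
    return [value for value in samples_ms if value > factor * baseline]


if __name__ == "__main__":
    readings = [20, 25, 22, 30, 2400]
    print(find_save_time_outliers(readings))
```

Any reading flagged this way is a candidate for the two investigations listed above: connection pool size and process size.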
Time to Obtain Process

The Time to Obtain Process includes the time it takes to acquire a lock on a process and, if necessary, to restore its state from storage. It is useful for determining whether this operation trends significantly higher under load. The monitoring includes maximum and average values.

What's involved:

1. Server threads first attempt to acquire a lock on a process. If another thread is holding the lock, the requesting thread has to wait. A long wait here points to contention for the process lock. The following are possible reasons for contention:
   a. The process is loaded on another node (for example, the node that initially served the process is down, or the load balancer chose to load the process on a different node; in that case, verify the load balancer settings).
   b. Multiple requests are made to the same process.
   c. Someone is loading state in the process viewer (Active Process Detail) while requests for the same process are active. This should not have much impact, but it can add to the time to obtain the process depending on the size of the process, the variables used, etc.
2. If the process is already in memory on a given node, the thread simply acquires the lock and we're done (i.e. the engine has acquired the process).
3. If the process is not in memory when requested, the engine needs to read it (its definition, activity state, variables, etc.) from the database. In this case, look at the Process Count and Process Idle Timeout parameters, which are explained below.
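The three steps above can be sketched as a small model: wait for the per-process lock, then load the state only if it is not already in memory, and time the whole operation. The class and method names below are hypothetical and the "database" is a plain dictionary; this is an illustration of what the statistic measures, not the engine's actual implementation.

```python
import threading
import time


class ProcessStore:
    """Hypothetical stand-in for an engine's in-memory process table."""

    def __init__(self, database):
        self._database = database   # dict: process id -> saved state
        self._in_memory = {}        # dict: process id -> loaded state
        self._locks = {}            # dict: process id -> lock
        self._guard = threading.Lock()

    def obtain(self, process_id):
        """Lock the process and return (state, loaded_from_db, elapsed_ms)."""
        start = time.perf_counter()
        with self._guard:
            lock = self._locks.setdefault(process_id, threading.Lock())
        lock.acquire()  # step 1: waits here if another thread holds the process
        loaded_from_db = False
        if process_id not in self._in_memory:
            # step 3: not in memory, so restore the state from storage
            self._in_memory[process_id] = dict(self._database[process_id])
            loaded_from_db = True
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        return self._in_memory[process_id], loaded_from_db, elapsed_ms

    def release(self, process_id):
        """Release the lock taken by obtain()."""
        self._locks[process_id].release()


if __name__ == "__main__":
    store = ProcessStore({1: {"state": "running"}})
    _, loaded, elapsed = store.obtain(1)
    store.release(1)
    print(loaded, round(elapsed, 3))
```

In this model the elapsed time grows either because the lock is contended or because the state has to be loaded, which are the same two contributors the statistic combines.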
The Process Count Server parameter

The Process Count (a server configuration parameter, under Admin > Configure Server > Server Properties) defines a threshold for how many processes should be loaded in memory at once. The Process Count is a soft limit: the server waits 5 seconds before allowing a process over the limit to enter memory. Increasing the Process Count avoids the 5-second delay and has the effect of keeping the process instance and its state in memory, which in turn increases memory requirements. Throughput is higher and response times are shorter when the process instance is already in memory and does not have to be brought in from disk, so increasing this value is desirable. To keep memory demands in check, monitor the free heap space under a typical workload to see whether this value can be increased.

When there is room in the process manager, the process state is loaded from the database. The performance considerations are similar to those for Time to Save Process (number of connections available, size of process data, etc.).

The Process Idle Timeout Server parameter

The Process Idle Timeout server configuration parameter is found under Admin > Configure Server > Server Properties. This setting specifies the time, in seconds, that a process instance must be idle before it becomes eligible to be purged from memory (thus freeing one of the process instance slots, the number of which is set by Process Count) and subsequently persisted to the database. Too short a timeout can cause premature purging.

Hints: Generally, increasing the process idle timeout (i.e. lag time) helps ensure that processes remain in memory, which avoids reloading them from the database. The downside is that idle processes may take up room in memory for longer than they need to. Under high load, the idle timeout should ideally be the recommended default (10 seconds), so that an idle process can be swapped out for a process that is already waiting to acquire a memory slot.
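To make the trade-off concrete, here is a minimal sketch that counts how often a process would have to be loaded from the database for a given idle timeout. It is a simplified model under stated assumptions: a process is purged as soon as it becomes eligible, the Process Count limit is ignored, and the function name and request timings are invented for illustration.

```python
def count_database_loads(requests, idle_timeout_s):
    """Count loads from the database for a list of (timestamp_s, process_id).

    Simplified model: a process leaves memory as soon as it has been idle
    for longer than `idle_timeout_s`, so the next request must reload it.
    """
    last_seen = {}
    loads = 0
    for timestamp, process_id in sorted(requests):
        previous = last_seen.get(process_id)
        if previous is None or timestamp - previous > idle_timeout_s:
            loads += 1  # first request, or the process was already purged
        last_seen[process_id] = timestamp
    return loads


if __name__ == "__main__":
    # One process that receives a request every 15 seconds.
    requests = [(0, "A"), (15, "A"), (30, "A"), (45, "A")]
    print(count_database_loads(requests, idle_timeout_s=10))
    print(count_database_loads(requests, idle_timeout_s=20))
```

With requests arriving every 15 seconds, a 10-second timeout forces a reload on every request in this model, while a 20-second timeout loads the process only once.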
However, the ideal setting for the process idle timeout depends entirely on the typical interval between requests for your particular set of processes. You want it to be slightly longer than the expected interval, so that the process is likely to be in memory when a request arrives but doesn't hang out in memory needlessly for long periods.

Another aspect of the process idle timeout is that processes in memory have journal entries associated with them and are eligible for recovery. The more processes in memory, the longer your recovery time will be on server restart.

If you notice a steady increase in heap usage and the amount of free memory is dropping steeply, reduce the idle timeout so that, when the process idle timer goes off, the process state is saved in full and the process leaves memory.

Only in-flight processes count toward the Process Count and Process Idle Timeout. A process in a final state (i.e. completed or faulted) is not kept in memory and is not counted.

The only workload where a high setting (more than 50) of this value is appropriate is one where it is certain that *all* the process instances can be accommodated in memory simultaneously. An example would be an environment where the processes are very small and are bound to complete very quickly. In these cases, to achieve maximum responsiveness and performance, both the Process Count and the Process Idle Timeout can be increased to a very large value (e.g. the Process Count can start at 10000 and the Process Idle Timeout can be set to 60).

Additional Performance Statistics in ActiveVOS v9.0

In ActiveVOS version 9.0, we introduced many other performance and health monitoring capabilities. For instance, our 9.0 System Performance page includes database connection pool statistics for application servers that provide a good level of monitoring for data sources. Please refer to our documentation on this topic at:
http://infocenter.activevos.com/infocenter/activevos/v90/index.jsp?topic=/com.activee.rt.bpeladmin.enterprise.help/html/svrug3-3-4.html
Other important parameters influencing performance

There may be times when requests to services are not dispatched as quickly as they should be. This can lead to a backlog of requests, which in turn degrades the performance of the server. Below we describe things to look for that may help you identify the need to tune the server. If the following values trend upward, the most likely cause is a lack of work manager threads available to process requests.

i) Under Server Monitoring > Server Statistics, does the "Work Manager Start Delay" increase over a period of time? (The default interval is 5 minutes. To change the interval, update it on the Admin > Monitoring Thresholds page.)
ii) Are there many backed-up requests (i.e. Queued) displayed at Monitor > Dispatch Service? Note that the Dispatch Service, which helps throttle requests to the ActiveVOS engine, was introduced in v9.1.
iii) When there is a performance problem, use Monitor > System Performance and watch the metrics over a period of 5-10 minutes. The main areas to look at under Monitor > System Performance > Node Monitoring are Work Manager, In-Memory Processes, and Unmatched Receives.
   Work Manager:
   - Are there many idle threads, or is the count 0 (which indicates that no new threads are available at the moment)?
   - Is there a large Queued Request Count?
   Unmatched Receives:
   - Are there many Timed-Out Messages?
   - Does the Average Message Waiting Time (ms) increase over time?
iv) Are there timeouts reported in the server logs, similar to:
   "org.activebpel.rt.bpel.impl.aebpelexception: Timeout waiting for reply from process ID 0 (xxxx)."
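One way to judge whether a value such as the Work Manager Start Delay is trending upward, rather than just fluctuating, is to fit a straight line through the readings taken at each interval and look at its slope. The sketch below does this with an ordinary least-squares fit. The function names, the sample readings, and the slope threshold are assumptions for illustration; the readings would have to be noted by hand from the console at each interval.

```python
def slope_per_interval(samples):
    """Least-squares slope of the samples, in units per collection interval."""
    count = len(samples)
    if count < 2:
        return 0.0
    mean_x = (count - 1) / 2
    mean_y = sum(samples) / count
    numerator = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(samples))
    denominator = sum((i - mean_x) ** 2 for i in range(count))
    return numerator / denominator


def is_trending_up(samples, min_slope=1.0):
    """True when the fitted slope is at least `min_slope` (a hypothetical cutoff)."""
    return slope_per_interval(samples) >= min_slope


if __name__ == "__main__":
    start_delay_ms = [10, 20, 30, 40]  # one reading per 5-minute interval
    print(slope_per_interval(start_delay_ms), is_trending_up(start_delay_ms))
```

A fitted slope is less sensitive to a single noisy reading than comparing only the first and last values, which is why it is used here.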
What can be tweaked

1. Increase the Work Manager Thread Pool Max Count. You can try increasing the number of threads in the work manager: raise the max thread count from the default of 100 to a level high enough (say 300-400 to start with) to accommodate any foreseeable load, and then adjust the dispatch manager settings to throttle the number of executing threads (Max Concurrent) to a point well below the max thread count so you don't run right up against the limit.
2. Create additional dispatch service configurations for your services or for the system services, so that requests to the services are stored and processed at a rate that can actually be handled by the server. The dispatch service also provides a way to limit the number of process instances that call the back-end service simultaneously. Refer to http://infocenter.activevos.com/infocenter/activevos/v91/index.jsp?topic=/com.activee.rt.bpeladmin.enterprise.help/html/svrug3-1-18.html
3. Look for evidence that a process spawns many other instances. If so, it is recommended to throttle requests for that particular process/service via dispatch configuration to a level at which the server can handle the generated requests. Note that subprocess invokes bypass the dispatch manager entirely and are dispatched directly to the target process, so any throttling needs to be performed on the parent process.
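The relationship in item 1 between the max thread count and Max Concurrent can be illustrated with a generic thread pool and a semaphore. This is only an analogy written with the Python standard library: the function and parameter names are hypothetical, and unlike the dispatch service described above, which stores requests, waiting tasks here simply block at the semaphore.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_throttled(tasks, max_threads, max_concurrent):
    """Run callables on a pool while capping how many execute at once.

    Returns (results in task order, peak number of concurrently running tasks).
    """
    if max_concurrent >= max_threads:
        raise ValueError("max_concurrent should stay below max_threads")
    gate = threading.BoundedSemaphore(max_concurrent)
    state_lock = threading.Lock()
    active = 0
    peak = 0

    def guarded(task):
        nonlocal active, peak
        with gate:  # at most max_concurrent tasks get past this point
            with state_lock:
                active += 1
                peak = max(peak, active)
            try:
                return task()
            finally:
                with state_lock:
                    active -= 1

    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        results = list(pool.map(guarded, tasks))
    return results, peak


if __name__ == "__main__":
    jobs = [lambda i=i: i * 2 for i in range(20)]
    results, peak = run_throttled(jobs, max_threads=8, max_concurrent=3)
    print(results[:5], peak)
```

Keeping the concurrency cap below the pool size leaves spare threads for other work, which mirrors the advice not to run right up against the max thread count.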
Obtaining Troubleshooting Assistance

If you still need assistance, please submit a support request in the ActiveVOS support forums with the following information:

1. Server configuration. Type and version of:
   - ActiveVOS
   - Application server
   - Database
   - Operating system hosting the application server
   - JVM
   - Any clusters involved?
2. Is the performance problem consistently reproducible, and does it occur even after a restart?
3. What is the approximate total number of processes in the system when the problem occurs? If it can't be determined from the ActiveVOS console, you can query the AeProcess table to find the total count and the number of currently running processes (ProcessState=1).
4. Are there non-persistent processes involved in the application design?
5. Provide screen prints of the following from the ActiveVOS console:
   - Export of the configuration from ActiveVOS
   - Home > Server Status
   - Monitor > Dispatch Service (main screen)
   - Monitor > Dispatch Service > individual configurations, including the system default
   - Monitor > Server Statistics, 3 screen prints captured every 5 minutes
   - Monitor > System Performance, 3 screen prints captured every 5 minutes
   - Application server logs
   - 8-10 thread dumps captured in a span of 1 minute
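For item 3, the counts can be obtained with two simple queries against the AeProcess table. The sketch below runs them against an in-memory SQLite database so that it is self-contained. Only the table name AeProcess and the condition ProcessState=1 come from this note; the ProcessId column, the other state value, and the demo rows are assumptions, and a real installation would run the queries against its own ActiveVOS database with that database's client.

```python
import sqlite3


def count_processes(connection):
    """Return (total processes, currently running processes)."""
    total = connection.execute("SELECT COUNT(*) FROM AeProcess").fetchone()[0]
    running = connection.execute(
        "SELECT COUNT(*) FROM AeProcess WHERE ProcessState = 1"
    ).fetchone()[0]
    return total, running


def build_demo_database():
    """Create a throwaway table with a hypothetical, minimal schema."""
    connection = sqlite3.connect(":memory:")
    connection.execute(
        "CREATE TABLE AeProcess (ProcessId INTEGER PRIMARY KEY, ProcessState INTEGER)"
    )
    # 1 means running (per this note); 3 is just a placeholder for "not running".
    connection.executemany(
        "INSERT INTO AeProcess (ProcessState) VALUES (?)",
        [(1,), (1,), (3,), (1,), (3,)],
    )
    return connection


if __name__ == "__main__":
    print(count_processes(build_demo_database()))
```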
About Active Endpoints

Active Endpoints (www.activevos.com)

ActiveVOS is the leader in service-oriented BPM software for process automation. ActiveVOS empowers project teams to create business process management (BPM) applications using services, making their businesses more agile and effective. ActiveVOS promotes mass adoption of SOA-enabled BPM applications by focusing on accelerating project delivery time with a complete, affordable and easy-to-use system. Active Endpoints is headquartered in Waltham, MA, with development facilities in Shelton, CT. To find out how Active Endpoints can help your business, visit http://www.activevos.com, call +1 781 547 2900 and press 1 for Sales, or email us at info@activevos.com.