Slide 1: PART IV. Performance-Oriented Design, Performance Testing, Performance Tuning, and Performance Solutions

Slide 2: Outline
- Principles for performance-oriented design
- Performance testing
- Performance tuning
- General performance solutions
- Performance solutions for OO software: language-independent solutions, C++ solutions, Java solutions

Slide 3: Performance-Oriented Design
- Performance principles generalize and abstract the knowledge that performance specialists use
- The principles complement quantitative performance assessment; they do not replace it
- Use the performance principles to design and/or improve the system; then use quantitative techniques to assess the effect of design alternatives on performance
- The principles for performance-oriented design fall into three groups:
  - Performance control principles
  - Independent principles
  - Synergistic principles
Slide 4: Performance Control Principles
- Help control the performance of an evolving system
- Performance objective principle
  - Define specific, quantitative, measurable performance objectives for each performance scenario
  - Avoid vague or qualitative performance objectives (e.g., "the system shall be fast")
- Instrumenting principle
  - Instrument the system as you build it to enable measurement and analysis
  - If you can't measure it, you can't control it
  - Make data collection mechanisms part of the system's requirements and design

Slide 5: Independent Principles
- Improve performance by reducing the software's computer resource requirements
- Centering: focus on the parts of the software that have the greatest impact on performance; identify the dominant workload functions (e.g., use cases) and minimize their processing; use the Fast path pattern
- Fixing-point: for responsiveness, establish connections at the earliest feasible point in time, such that retaining the connection is cost-effective
- Locality: create actions, functions, and results that are close to the physical computer resources used to produce them
  - Spatial, temporal (i.e., time), effectual (i.e., purpose or intent), and degree (i.e., intensity or size) locality
- Processing versus frequency: consider the number of requests received and the amount of work done per request; minimize the product of processing time and frequency

Slide 6: Synergistic Principles
- Improve performance through cooperation among processes competing for resources
- Shared resources: share resources when possible; when exclusive access is required, minimize the sum of the holding time and the scheduling time
  - Lock the entire database for update (minimizes scheduling time, maximizes holding time), OR lock only the individual record (maximizes scheduling time, minimizes holding time), OR lock a group of records
- Spread the load: similar to Shared resources; minimize the number of processes that need the resource at a given time, or minimize the amount of the resource that they need (in some cases these two principles overlap)
- Parallel processing: execute processing in
parallel only when the processing speedup offsets the communication overhead and resource contention delays
  - Real concurrency: processes execute simultaneously on different processors
  - Apparent concurrency: processes are multiplexed on a single processor
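The parallel-processing trade-off can be sketched as follows. This is a hypothetical example, not from the slides: the work is split into chunks that run on a fixed pool of worker threads, and the approach pays off only when each chunk is large enough that the speedup offsets the thread-creation and coordination overhead; for tiny inputs the sequential loop wins.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Splits a summation across worker threads (illustrative sketch only). */
public class ParallelSum {
    public static long sum(long[] data, int workers) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            int chunk = (data.length + workers - 1) / workers;
            List<Future<Long>> parts = new ArrayList<>();
            for (int w = 0; w < workers; w++) {
                final int from = w * chunk;
                final int to = Math.min(data.length, from + chunk);
                parts.add(pool.submit(() -> {      // Callable<Long>: one chunk
                    long s = 0;
                    for (int i = from; i < to; i++) s += data[i];
                    return s;
                }));
            }
            long total = 0;
            for (Future<Long> f : parts) total += f.get();  // combine results
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        long[] data = new long[1_000_000];
        for (int i = 0; i < data.length; i++) data[i] = i % 10;
        System.out.println("sum=" + sum(data, 4));  // prints sum=4500000
    }
}
```

Whether this is real or apparent concurrency depends on the number of processors available to the JVM; on a single core the same code still runs, but the threads are multiplexed and the overhead buys nothing.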
Slide 7: Outline
- Principles for performance-oriented design
- Performance testing
- Performance tuning
- General performance solutions
- Performance solutions for OO software: language-independent solutions, C++ solutions, Java solutions

Slide 8: Stress Testing
- Stress testing is testing with a high workload, to the point where one or more, or all, resources are simultaneously saturated
- The intention of a stress test is to break the system, i.e., to force a crash

Slide 9: Stress Testing
Stress testing does the following:
- Distorts the normal order of processing, especially processing that occurs at different priority levels
- Forces the exercise of all system limits, thresholds, or other controls designed to deal with overload conditions
- Increases the number of simultaneous actions
- Forces race conditions
- Depletes resource pools in unusual and unanticipated sequences
Slide 10: Stress Testing
- Benefits
  - Faults caught by stress testing tend to be subtle
  - Faults caught by stress testing are often design flaws that may have implications in many areas
- When to stress test
  - Whenever possible, early and repeatedly
  - As a part of the system acceptance test

Slide 11: Performance Testing
Objectives:
- Show that the system meets specified performance objectives
- Determine the hardware or software factors that limit system performance
- Tune the system
- Project the system's future load-handling capacity

Slide 12: Performance Testing
- Performance testing presumes a robust, working, and stable system
- Faults that affect the system's function have been removed
  - Extreme example: if a fault crashes the system, no rational performance testing can be done
- Faults that affect performance can range from poor design to poor implementation
Slide 13: Performance Testing
Prerequisites:
- A clear statement of performance objectives
- A workload to drive the experiment
- A controlled experimental process or testbed
- Instrumentation to gather performance-related data
- Analytical tools to process and interpret the data

Slide 14: Performance Testing
Problems with performance objectives:
- There is no statement of performance objectives, or the statement is so vague that it cannot be reduced to a quantitative measure
- There is a clear quantitative statement of objectives, but it cannot be measured in practice
  - Excessive resources and effort
  - Excessive experiment duration
- There is a clear quantitative statement of objectives, but the objectives are unachievable at reasonable cost

Slide 15: Performance Testing
Performance objectives depend on the domain; acceptable response time could be:
- A few milliseconds for antiaircraft missile control
- A few tens of milliseconds for nuclear reactor control
- A few seconds of delay in getting a telephone dial tone
- Half a minute to answer a database query
Slide 16: Complications and Variations
- There is more than one type of workload
  - Probability distribution over the different workloads
  - A different objective for each type of workload
  - Example: the response time at 4 messages per second shall be less than 2 seconds, and the response time at 8 messages per second shall be less than 8 seconds
- Performance may be intertwined with a quantitative reliability/availability specification
  - Different workload/response-time relations are allowed under different hardware/software failure conditions

Slide 17: Complications and Variations
- Analysis and measurement under a time-varying workload
- Consider different situations: peak hour, average hour, peak day, etc.

Slide 18: Stress and Performance Testing, QA Tasks
- Include workload generation as a major budget item
- Select workload generation methods; start workload generation development at the same time as software development
- Plan software instrumentation in support of performance testing as a part of system design; develop, publish, and discuss embedded software instrumentation as early as possible
Slide 19: Stress and Performance Testing, QA Tasks
- Tie down workload statistics and parameters as early as possible, in written form
- Start stress testing as early as possible; subject the system to stress whenever possible
- Include a stress test as part of the formal system acceptance test
- Accept no performance criteria that cannot be measured and validated with the allocated personnel, schedule, and budget

Slide 20: Stress and Performance Testing, QA Tasks
- Plan performance testing as a specific, recognizable activity to be done on a testbed and, if necessary, in the field
- Be sure to allow enough time for the system to stabilize before attempting field performance testing
- Run performance tests intermixed with other system tests to detect faults that are performance-related

Slide 21: Outline
- Principles for performance-oriented design
- Performance testing
- Performance tuning
- General performance solutions
- Performance solutions for OO software: language-independent solutions, C++ solutions, Java solutions
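The workload-generation QA task above can be illustrated with a minimal closed-loop driver. The LoadGenerator class and its measurement loop are hypothetical, not part of any standard harness: each run issues a batch of requests against the operation under test and records per-request response times, so objectives such as "the response time at N messages per second shall be under T seconds" can be checked against measured data.

```java
import java.util.ArrayList;
import java.util.List;

/** Minimal closed-loop workload driver (illustrative sketch only). */
public class LoadGenerator {
    /** Runs `requests` calls of `service` and returns each response time in ns. */
    public static List<Long> run(Runnable service, int requests) {
        List<Long> responseTimesNs = new ArrayList<>();
        for (int i = 0; i < requests; i++) {
            long start = System.nanoTime();
            service.run();                              // operation under test
            responseTimesNs.add(System.nanoTime() - start);
        }
        return responseTimesNs;
    }

    public static void main(String[] args) {
        // Stand-in workload: any request handler could be plugged in here.
        List<Long> times = run(() -> Math.sqrt(12345.0), 1000);
        long worstNs = times.stream().max(Long::compare).orElse(0L);
        System.out.println("requests=" + times.size() + " worst(ns)=" + worstNs);
    }
}
```

A real generator would add think times, multiple concurrent clients, and a target arrival rate; the point here is only that workload generation is code that must be planned and built alongside the system itself.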
Slide 22: Performance Tuning
Why might the system need performance tuning?
- The application was developed without using SPE
- Choices made among implementation alternatives were not optimal for the application
- The language used has features that significantly affect performance
- Unanticipated performance requirements need to be met
- Scalability objectives need to be met
In general, a tuned system does not exhibit the level of performance that could have been achieved by considering performance issues from the beginning.

Slide 23: Performance Tuning
- Tuning is usually done late in the software life cycle (implementation or deployment)
- Identify and focus on the areas that have the highest potential payoff
- There is not enough time, or it is not cost-effective, to redesign the software

Slide 24: Performance Tuning
1. Prepare test plans: identify the objectives to be achieved and the measurements to be made
- Identify the performance problems to be studied (e.g., use cases and scenarios)
- Identify the important workloads for the scenarios (e.g., Web application requests, sensor sampling rates) and characterize their properties
- Define the data needed and the measurements and reports that will produce it
- Identify measurement tools and specific procedures for their use
Slide 25: Performance Tuning
2. Conduct measurement studies
- Document the workload characteristics
- Collect system-level data (e.g., CPU utilization, I/O rates, average I/O service time, communication line utilization, traffic rates, message sizes)
- Collect process-level data (e.g., number of active processes, process execution times, CPU usage by process, the amount of time processes are blocked and why, remote procedure calls, memory usage statistics)
3. Use the quantitative data obtained in steps 1 and 2 to identify the bottleneck(s)

Slide 26: Performance Tuning
4. Evaluate the relative payoff of tuning the overall system versus tuning the software
- Changes to the system (hardware or network configuration) are usually easier
5. Identify the processes that are the heaviest users ("heavy hitters") of the bottleneck device(s)
6. Profile the heavy-hitter processes to identify the hot spots within these processes
7. Identify performance solutions and quantify their risk (e.g., development effort, maintenance impact, cost, payoff)
8. Select appropriate solutions, implement them, and conduct performance tests

Slide 27: Outline
- Principles for performance-oriented design
- Performance testing
- Performance tuning
- General performance solutions
- Performance solutions for OO software: language-independent solutions, C++ solutions, Java solutions
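Steps 2 and 3 of the tuning process above depend on exactly this kind of data collection, and the Instrumenting principle says it should be designed in rather than bolted on. A minimal sketch of process-level instrumentation follows; the OpStats class and its method names are hypothetical, standing in for whatever measurement facility the project adopts.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Per-operation call counts and cumulative elapsed time (illustrative sketch). */
public class OpStats {
    private static final Map<String, LongAdder> calls = new ConcurrentHashMap<>();
    private static final Map<String, LongAdder> nanos = new ConcurrentHashMap<>();

    /** Times one execution of `op` and charges it to `name`. */
    public static void timed(String name, Runnable op) {
        long start = System.nanoTime();
        try {
            op.run();
        } finally {
            calls.computeIfAbsent(name, k -> new LongAdder()).increment();
            nanos.computeIfAbsent(name, k -> new LongAdder())
                 .add(System.nanoTime() - start);
        }
    }

    public static long callCount(String name) {
        LongAdder a = calls.get(name);
        return a == null ? 0 : a.sum();
    }

    public static long totalNanos(String name) {
        LongAdder a = nanos.get(name);
        return a == null ? 0 : a.sum();
    }

    public static void main(String[] args) {
        for (int i = 0; i < 100; i++) {
            timed("parse", () -> Integer.parseInt("12345"));
        }
        System.out.println("parse: calls=" + callCount("parse")
                + " totalNs=" + totalNanos("parse"));
    }
}
```

Counts and cumulative times per operation are enough to compute the "processing times frequency" product and to rank candidate hot spots before reaching for a full profiler.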
Slide 28: General Performance Solutions, Fast Path Speed-Up
- 20% or less of the code is executed as a result of the dominant workload functions
- Leave only the essential code
- Select optimal algorithms and data structures for the typical case
- Remove unnecessary branching or context switches to eliminate disruptions of the hardware instruction pipeline
- Change the order in which data is stored to minimize the number of memory cache misses
- Some profile-based compiler optimizations identify and improve fast paths

Slide 29: General Performance Solutions, Improve Scalability
- Scalability is the ability of a system to continue to meet its response time or throughput objectives as the demand for software functions increases
- The change from linear to exponential growth in response time is usually due to some resource in the system nearing 100% utilization
- You must know where the knee of the scalability curve falls for your system
- Add additional resources to remove the bottleneck
[Figure: response time versus number of requests per unit of time, showing the knee of the scalability curve]

Slide 30: General Performance Solutions, Algorithm and Data Structure Choices
Time versus space trade-offs. Trading space for time:
- Compute the results of expensive functions once, store the results, and satisfy subsequent requests with a table lookup (an example of Fixing-point)
- Caching the data that is accessed most often makes it cheapest to access (embodies the Centering, Fixing-point, and Processing versus frequency principles)
  - If data access patterns are such that the last item retrieved in a query has a high probability of being requested again, it should be cached
  - Move each item found in a list closer to the beginning
  - Cache the timestamp of an HTTP request
  - Use a background thread to generate and cache frequently used Web page bitmaps
  - Cache the results of remote communications or database queries
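The "compute once, then satisfy subsequent requests with a table lookup" solution can be sketched as a memoizing wrapper. The Memoizer class is a hypothetical example, and the expensive function is a stand-in for any deterministic, frequently requested computation.

```java
import java.util.HashMap;
import java.util.Map;

/** Memoization sketch: expensive results are computed once, then looked up. */
public class Memoizer {
    public static int computations = 0;  // how many real computations ran

    private static final Map<Long, Long> table = new HashMap<>();

    /** Expensive stand-in computation: naive sum of 1..n. */
    static long expensive(long n) {
        computations++;
        long sum = 0;
        for (long i = 1; i <= n; i++) sum += i;
        return sum;
    }

    /** Returns expensive(n), computing it at most once per distinct n. */
    public static long cached(long n) {
        return table.computeIfAbsent(n, Memoizer::expensive);
    }

    public static void main(String[] args) {
        cached(1_000_000);   // first request: computed and stored
        cached(1_000_000);   // second request: table lookup only
        System.out.println("computations=" + computations);  // prints computations=1
    }
}
```

The same shape covers the caching bullets above: keyed by query, URL, or timestamp, with an eviction policy added once the table can grow without bound.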
Slide 31: General Performance Solutions, Trading Space for Time (continued)
- Postpone processing until the item is needed (Processing versus frequency)
  - Defer the calculation of the line width in a word processor until it is required, rather than recalculating it after each character is entered
- Augment data structures with extra information, or change the structure so that it can be accessed more easily (Fixing-point and Locality principles)
  - Example: use of a doubly linked list
- Make fewer calls and process multiple requests with each call

Slide 32: General Performance Solutions, Trading Time for Space
- Use dense storage representations to decrease storage cost at the price of increased time to store and retrieve data
  - File compression techniques
- Represent common sequences of operations compactly and interpret them as required
  - Implementation of a state machine as a table
- Hardware/software platform dependencies
  - Customizing code to the hardware/software platform may be needed to achieve performance objectives
  - Limits portability

Slide 33: Outline
- Principles for performance-oriented design
- Performance testing
- Performance tuning
- General performance solutions
- Performance solutions for OO software: language-independent solutions, C++ solutions, Java solutions
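The "postpone processing until the item is needed" solution from slide 31 can be sketched as a lazily computed value. The Lazy class is a hypothetical example: the computation runs only on the first request, and never runs if the value is never requested.

```java
import java.util.function.Supplier;

/** Lazy-evaluation sketch: compute on first use, then reuse (illustrative). */
public class Lazy<T> {
    private Supplier<T> supplier;   // non-null until the first get()
    private T value;

    public Lazy(Supplier<T> supplier) { this.supplier = supplier; }

    public synchronized T get() {
        if (supplier != null) {     // first request: compute and cache
            value = supplier.get();
            supplier = null;        // drop the supplier; the value is fixed now
        }
        return value;
    }

    public static void main(String[] args) {
        Lazy<Integer> lineWidth = new Lazy<>(() -> {
            System.out.println("computing line width...");
            return 80;              // stand-in for an expensive layout pass
        });
        // Nothing has been computed yet; editing proceeds without the cost.
        System.out.println("width=" + lineWidth.get());  // triggers computation
        System.out.println("width=" + lineWidth.get());  // cached, no recompute
    }
}
```

In the word-processor example this means the line width is recalculated once per repaint instead of once per keystroke, which is exactly the Processing versus frequency trade.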
Slide 34: Performance Solutions for OO Software
- Solutions at the level of code optimization
- Limited potential for performance improvement
- Can lead to code that is harder to understand, more difficult to modify, and less reusable
- Use as a last resort, and only on the fast path

Slide 35: Language-Independent Solutions, Reducing Unnecessary Object Creation and Destruction
Why are object creation and destruction so expensive?
- When you create an object, any objects that it contains must also be created, any objects that those objects contain must also be created, and so on
- If the object is part of an inheritance hierarchy, its creation triggers the creation of all of its ancestors
- When all these objects are created, the memory to hold them must be allocated
- When the object is destroyed, all the additional objects that were created along with it must also be destroyed, and the memory that they used must be reclaimed

Slide 36: Language-Independent Solutions
The amount of work required to create and destroy an object depends on two factors:
- The complexity of the object
  - The number of objects that it contains
  - The number of ancestors that it has
- How the memory is allocated
  - Objects created with new use memory from the heap
  - Objects declared as local variables use memory from the stack
  - Heap memory is more expensive to allocate than stack memory, because it is allocated dynamically rather than statically
Slide 37: Language-Independent Solutions
Ways to reduce object creation and destruction overhead:
- Create simpler objects and allocate them from the stack, OR
- Apply the Processing versus frequency principle: minimize the product of the number of times an object is created and the amount of work performed to create it

Slide 38: Language-Independent Solutions, Reducing Method Invocation Overhead
What happens when a method is invoked?
- The arguments to be passed to the called method are pushed onto the stack, along with the address of the instruction to return to
- The location of the calling method's local variables on the stack is also saved, so that they can be restored when the called method returns
- When the called method returns, this process must be reversed to restore the processor state and to update any arguments that were changed as a result of the invocation

Slide 39: Language-Independent Solutions
- The exact amount of work required for a method invocation depends on the hardware/software platform; a method invocation consumes between 25 and 100 machine instructions
- One way to reduce method invocation overhead is to inline the called method (replace the method call by expanding the body of the called method within the caller)
  - Saves the overhead of saving and restoring processor state and loading the address of the target instruction
  - Produces larger blocks of code that may be optimized by the compiler
- Disadvantage: inlining large methods that are called from many places can significantly increase the footprint of the software
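A common language-independent way to apply the Processing versus frequency advice from slide 37 is an object pool: objects that are expensive to create are recycled rather than created and destroyed on every use. The BufferPool class below is a hypothetical sketch.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Object-pool sketch: buffers are recycled instead of reallocated (illustrative). */
public class BufferPool {
    private final Deque<byte[]> free = new ArrayDeque<>();
    private final int bufferSize;
    public int allocations = 0;   // how many buffers were actually created

    public BufferPool(int bufferSize) { this.bufferSize = bufferSize; }

    /** Hands out a recycled buffer if one is available, else allocates a new one. */
    public byte[] acquire() {
        byte[] b = free.pollFirst();
        if (b == null) {
            allocations++;
            b = new byte[bufferSize];
        }
        return b;
    }

    /** Returns a buffer to the pool so a later acquire() can reuse it. */
    public void release(byte[] b) {
        free.addFirst(b);
    }

    public static void main(String[] args) {
        BufferPool pool = new BufferPool(64 * 1024);
        for (int i = 0; i < 1000; i++) {
            byte[] buf = pool.acquire();   // reused after the first iteration
            // ... use buf ...
            pool.release(buf);
        }
        System.out.println("allocations=" + pool.allocations);  // prints allocations=1
    }
}
```

In a garbage-collected language the pool also reduces collection pressure; in C++ the same pattern amortizes the cost of new/delete pairs.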
Slide 40: C++ Solutions, Inlining to Reduce Method Invocation Overhead
- Use the keyword inline
  - inline is only a hint, which the compiler can either follow or ignore
- Since any change to an inlined method requires recompilation of every class that uses that method, inlining in C++ is usually deferred until late in the development process
- Profile-based inlining: profile your code to find the methods that are called frequently
  - The best candidates for inlining are small, frequently invoked methods

Slide 41: C++ Solutions
- Objects created with new must be destroyed explicitly using delete
  - Memory leak: if the object is not destroyed, the memory remains allocated and is unavailable for reuse by new objects; this can lead to performance degradation and to crash or hang failures
- More efficient multithreading
  - C++ (like C) does not provide language-level support for multithreading (this predates C++11, which added std::thread)
  - Use a thread library (POSIX pthreads implementations or the Portable Thread Library)

Slide 42: Java Solutions, Inlining to Reduce Method Invocation Overhead
- Java does not have an inline keyword
- Inlining decisions are made by the compiler when an optimize option is selected
Slide 43: Java Solutions, Reducing Garbage Collection Overhead
- Objects are not explicitly destroyed in Java; the Java runtime environment provides automatic garbage collection
- The garbage collector runs on a low-priority thread, which executes only when the system is idle
- The user has no control over when garbage collection is performed
- When the system runs out of memory, the garbage collector executes when heap memory is requested (e.g., by new), causing a noticeable pause in user applications
- Pauses due to garbage collection can produce erratic results in performance measurements on Java systems

Slide 44: Java Solutions, More Efficient Multithreading
- Java provides language-level support for multithreading
- Used properly, multithreading can be valuable for improving both responsiveness and scalability
- There is a trade-off due to the overhead required to create and coordinate the threads
- Frequent creation and destruction of threads adds overhead to an application, both for thread creation and for garbage collection
  - Solution: create a ThreadPool interface that holds a number of threads that are recycled as needed
- Using synchronized to prevent more than one thread at a time from executing a block of code may itself become a performance bottleneck
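The thread-pool solution described on slide 44 now ships with the platform in java.util.concurrent, so it rarely needs to be hand-built. A minimal sketch with a fixed pool of recycled worker threads follows; the PooledWork class itself is hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Fixed thread pool sketch: many short tasks reuse a few worker threads. */
public class PooledWork {
    public static int process(int tasks) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(4); // 4 recycled workers
        AtomicInteger done = new AtomicInteger();
        for (int i = 0; i < tasks; i++) {
            pool.submit(() -> { done.incrementAndGet(); });     // no thread per task
        }
        pool.shutdown();                         // finish queued work, then stop
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return done.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("completed=" + process(10_000));  // prints completed=10000
    }
}
```

The AtomicInteger here also sidesteps the synchronized bottleneck mentioned above: a lock-free counter lets all four workers update shared state without serializing on a monitor.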