Combining Static and Dynamic Impact Analysis for Large-scale Enterprise Systems
The 15th International Conference on Product-Focused Software Process Improvement, Helsinki, Finland
Wen Chen, Alan Wassyng, Tom Maibaum
McMaster Centre for Software Certification (McSCert), Department of Computing and Software, McMaster University, Hamilton, Ontario, Canada
December 2014
Outline
1 Large-scale Enterprise Systems
  Introduction
  Characteristics
  Changes Are Inevitable
2 Conventional Impact Analysis
  Introduction
  Static Analysis
  Dynamic Analysis
3 Combining Static and Dynamic Analysis
4 The Approach at a Glance
5 Empirical Study
6 References
Large-scale Enterprise Systems: Introduction
Enterprise systems (ES) are large-scale application software packages that support business processes, information flows, reporting, and data analytics in complex organizations.
Types of ES include, but are not limited to:
- Enterprise Resource Planning (ERP) systems
- Customer Relationship Management (CRM) systems
- Supply Chain Management (SCM) systems
Example: Oracle E-Business Suite, SAP ERP, Red Hat JBoss
Large-scale Enterprise Systems: Characteristics
Scalable. Complex. Critical. Costly.
Example:
- Total number of modules in SAP ERP: 241.
- Total number of classes in Oracle E-Biz: 230 thousand.
- Total number of methods in Oracle E-Biz: 4.6 million.
- Large companies can spend $50 million to $100 million on software upgrades.
Large-scale Enterprise Systems: Changes Are Inevitable
- System upgrade
- User requirement change
- Environment change
- Performance issue
- Other customized changes
The latest IT Key Metrics Data from Gartner (IT Key Metrics Data 2012, 2011) report that in 2011 some 16% of application support activity was devoted to technical upgrades, rising to 24% in the banking and financial services sector.
The current strategy of blindly retesting after every change is very expensive and time-consuming, even though the actual effect of a change may in fact be minimal.
Large-scale Enterprise Systems: Changes Are Inevitable
A well-defined change impact analysis is required to:
- reduce risks of unintended changes
- reduce costs
- minimize human effort
- focus testing
- help identify uncovered false negatives
Software Change Impact Analysis: Introduction
Software Change: operations {add, modify, delete, ...} on software entities {function, field, logic, module, database objects, ...}.
Change Impact Analysis: estimates WHAT will be affected in software and related documentation if a proposed software change is made (Bohner, 1996).
Software Change Impact Analysis: Static Analysis
Static analysis identifies a subset of the affected program elements by:
- analysing the code
- abstracting all possible software behaviours as graphs (call graph, dependency graph, ...) or other static representations
Static analysis is safe and complete, but it often produces impact sets that are too large because of over-conservative assumptions: the actual dependencies may turn out to be considerably fewer than the possible ones. It also usually requires a long execution time.
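To make the graph-based idea concrete, here is a minimal sketch of a static impact computation as a reverse search over a dependency graph. The function name `static_impact_set`, the edge representation, and the entity names are assumptions made for illustration; this is not the authors' implementation.

```python
from collections import defaultdict, deque


def static_impact_set(dependencies, changed):
    """Return every entity that can transitively reach a changed entity.

    dependencies: iterable of (source, target) pairs, read as
    "source depends on target" (hypothetical representation).
    """
    # Reverse the edges so we can walk from a change back to its dependents.
    reverse = defaultdict(set)
    for source, target in dependencies:
        reverse[target].add(source)

    impacted = set(changed)
    queue = deque(changed)
    while queue:
        entity = queue.popleft()
        for dependent in reverse[entity]:
            if dependent not in impacted:
                impacted.add(dependent)
                queue.append(dependent)
    return impacted


# Hypothetical entities: only the first chain leads to the changed entity.
edges = [
    ("ui.submit", "order.create"),
    ("order.create", "db.insert"),
    ("report.run", "db.query"),
]
print(sorted(static_impact_set(edges, {"db.insert"})))
# prints ['db.insert', 'order.create', 'ui.submit']
```

Every possible dependency edge is followed whether or not it is ever exercised at run time, which is why the result is safe but can be much larger than the set of truly affected entities.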
Software Change Impact Analysis: Dynamic Analysis
Dynamic analysis identifies a subset of the affected program elements by:
- analysing runtime information
- collecting dynamic information such as event traces, test coverage, and executions in the field
Dynamic analysis is precise and efficient, but its results are often incomplete because it under-estimates impacts: it only sees the behaviour that was actually executed.
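As an illustration of how an event trace yields a dynamic impact set, the sketch below marks every function that was on the call stack when a changed function was entered. The trace format (a list of `("enter" | "exit", name)` events) and the function name `dynamic_impact_set` are assumptions for this example, not the format used in the study.

```python
def dynamic_impact_set(trace, changed):
    """Collect functions that were active when a changed function ran.

    trace: well-formed list of ("enter", name) / ("exit", name) events
    (hypothetical format).
    """
    stack = []
    impacted = set()
    for event, name in trace:
        if event == "enter":
            stack.append(name)
            if name in changed:
                # Everything currently on the stack led to the change.
                impacted.update(stack)
        elif event == "exit":
            stack.pop()
    return impacted


# Hypothetical run: "b" never reaches "save" in this execution.
trace = [
    ("enter", "main"),
    ("enter", "a"),
    ("enter", "save"),
    ("exit", "save"),
    ("exit", "a"),
    ("enter", "b"),
    ("exit", "b"),
    ("exit", "main"),
]
print(sorted(dynamic_impact_set(trace, {"save"})))
# prints ['a', 'main', 'save']
```

If `b` can call `save` on a path that this run did not exercise, the result misses it, which is the under-estimation described above.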
Combining Static and Dynamic Analysis: Aspect-oriented programming (AOP)
"The hierarchical modularity mechanisms in object-oriented languages are extremely powerful, but they are inherently unable to modularize all concerns of interest in complex systems." (Kiczales et al., 2001)
"Aspect-oriented programming (AOP) does for concerns that are naturally crosscutting what OOP does for concerns that are naturally hierarchical, it provides language mechanisms that explicitly capture crosscutting structure." (Kiczales et al., 2001)
Combining Static and Dynamic Analysis: AspectJ
AspectJ adds to Java a new concept, the join point, and some constructs:
- Pointcuts pick out certain join points in the program flow.
- Advice implements crosscutting behaviour once pointcuts have picked out join points. It brings together a pointcut (to pick out join points) and a body of code (to run at each of those join points).
Combining Static and Dynamic Analysis: AspectJ
- Inter-type declarations in AspectJ are declarations that cut across classes and their hierarchies. They may declare members that cut across multiple classes, or change the inheritance relationship between classes.
- Aspects are defined much like classes. They wrap up pointcuts, advice, and inter-type declarations in a modular unit of crosscutting implementation.
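The pairing of a pointcut with advice can be illustrated with a rough Python analogue. This is not AspectJ and not the instrumentor used in this work: AspectJ weaves advice into bytecode without touching the source, whereas the sketch below uses an explicit decorator. The names `traced`, `TRACE`, `validate`, and `submit` are hypothetical.

```python
import functools

# Collected events; plays the role of the trace output of a tracing aspect.
TRACE = []


def traced(func):
    """Wrap func so that entry and exit events are appended to TRACE.

    Applying the decorator plays the role of a pointcut (which calls are
    picked out); the wrapper body plays the role of the advice.
    """
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        TRACE.append(("enter", func.__name__))
        try:
            return func(*args, **kwargs)
        finally:
            # Runs on normal return and on exceptions alike.
            TRACE.append(("exit", func.__name__))
    return wrapper


@traced
def validate(order):
    return bool(order)


@traced
def submit(order):
    return validate(order)


submit({"id": 1})
print(TRACE)
# prints [('enter', 'submit'), ('enter', 'validate'), ('exit', 'validate'), ('exit', 'submit')]
```

The wrapped functions return the same values as before, so the tracing concern stays separate from the application logic, which is the property the crosscutting mechanism is meant to provide.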
Combining Static and Dynamic Analysis: AspectJ Example Code
Figure: Aspect Trace
Combining Static and Dynamic Analysis: AspectJ Example Output Sample
Figure: Output Sample for MGPAPP.class
Combining Static and Dynamic Analysis: Benefits
The combined approach:
- integrates with our safe static analysis (Chen et al., 2013);
- provides a precise estimation of impacts;
- works at the bytecode level;
- does not alter system behaviour in any way;
- saves effort in learning the application logic;
- is efficient in both time and space (seconds/class and kilobytes/class).
The Approach at a Glance: Analysis Overview
Figure: Analysis Process Overview
[Flow diagram. Elements: Enterprise System, Atomic Changes (AC), Change Analysis, Changes (C), Static Analysis, Access Dependency Graph, Reverse Search, Static Impacts (S), Dynamic Analysis, Dynamic Impacts (D), Reachability Analysis, Alias Analysis, Potential False-Positives (PO), Impact Set (I). Labels: input, output, subtract, union.]
The Approach at a Glance: Analysis Overview
Steps in our approach include (Chen et al., 2013; Chen, Wassyng, & Maibaum, 2014):
(i) Static analysis to abstract a representation of the target program P. A full dependency graph G is built for the system at the function level.
(ii) Change analysis to identify direct and indirect changes. The identification of indirect changes may require string analysis.
The Approach at a Glance: Analysis Overview
(iii) A graph searching algorithm is employed to extract a static impact set S. The static impact set S is conservative but safe; false positives are later cut from within this set.
(iv) The program P is instrumented to collect a dynamic impact set D. The dynamic impact set D contains real execution information that should be kept in the static impact set S.
The Approach at a Glance: Analysis Overview
(v) Reachability analysis to filter out false positives among the paths that dynamic analysis did not identify. The paths considered in this step are those that were not executed during dynamic analysis but could potentially reach a direct or indirect change. Paths filtered out in this analysis are considered infeasible paths (mismatched calls and returns).
(vi) Pointer/aliasing analysis to further filter out unidentified paths. If no variable along a particular path is aliased to any variable within a changed method, the path can be regarded as a false positive. Unlike the infeasible paths identified in reachability analysis, paths filtered out in this analysis are feasible but will not be affected by the changes.
Note that steps (v) and (vi) are continuing research that appeared in (Chen et al., 2014).
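One way to read steps (iii) to (vi) as set operations is sketched below: the dynamic impacts D are kept, the rest of S is treated as potential false positives PO, and an alias check decides which of those survive. This is a simplified interpretation, not the authors' algorithm: it works on entities rather than paths, it omits the reachability filter of step (v), and the names `make_alias_filter`, `combine_impacts`, `points_to`, and `changed_locations` are hypothetical.

```python
def make_alias_filter(points_to, changed_locations):
    """Build a predicate that flags entities with no alias to a change.

    points_to: dict mapping an entity to the abstract memory locations its
    variables may refer to (hypothetical representation).
    changed_locations: locations referenced inside changed methods.
    """
    def is_false_positive(entity):
        return not (points_to.get(entity, set()) & changed_locations)
    return is_false_positive


def combine_impacts(static_impacts, dynamic_impacts, is_false_positive):
    """Return D united with the part of S - D that survives filtering."""
    potential_false_positives = static_impacts - dynamic_impacts
    retained = {
        entity
        for entity in potential_false_positives
        if not is_false_positive(entity)
    }
    return dynamic_impacts | retained


# Hypothetical data: "a" was executed; "b" aliases a changed location.
S = {"a", "b", "c", "d"}
D = {"a"}
alias_filter = make_alias_filter(
    points_to={"b": {"heap1"}, "c": {"heap2"}},
    changed_locations={"heap1"},
)
print(sorted(combine_impacts(S, D, alias_filter)))
# prints ['a', 'b']
```

Entities observed at run time are never filtered, so the filters can only remove members of S that no execution confirmed.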
Empirical Study
- Target system: Oracle E-Business Suite Version 11.5
- Source of changes: Oracle patches # 5565583, 10107418, 14321241
- Objective: identify the impact set of the patches
- Physical environment: quad-core 3.2GHz CPU, 32GB RAM, 64-bit Red Hat Enterprise Linux
Empirical Study (Cont'd)
Figure: Oracle E-Business Suite System Architecture
Modules: CRM, CSM, Financials, SCM, HRMS, ...
Empirical Study (Cont'd)
Oracle E-Business Suite V11.5:
- Number of classes: 195 999
- Number of entities (functions and fields): 3 157 947
The patches affect both the application tier and the database tier.

Patch               Size     Number of direct changes
Patch # 5565583     212MB    52 870
Patch # 10107418    10KB     0
Patch # 14321241    99MB     230 209
Empirical Study: Empirical Results

Oracle E-Biz                                                             Numbers
Classes                                                                  195 999
Entities                                                                 3 157 947
Static dependencies                                                      18 387 466
Dynamic dependencies                                                     8 200
Reduced dependencies after reachability analysis and aliasing analysis   11 521 769
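The size of the reduction implied by the table can be worked out directly. The short calculation below uses only the two dependency counts reported above; the variable names are chosen for illustration.

```python
# Figures taken from the results table above.
static_dependencies = 18_387_466
reduced_dependencies = 11_521_769

# Dependencies removed by reachability analysis and aliasing analysis.
removed = static_dependencies - reduced_dependencies
reduction_pct = 100 * removed / static_dependencies

print(removed, round(reduction_pct, 1))
# prints 6865697 37.3
```

In other words, roughly 37% of the statically computed dependencies are removed by the two filtering analyses.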
Empirical Study: Results
Figure: Oracle E-Business Suite 11.5, Patch 5565583 and its impacts.
Empirical Study: Empirical Results

Patches                     5565583    10107418    14321241
Size                        212MB      10KB        99MB
Number of direct changes    52 870     0           25 114
Affected functions          699 534    0           230 209
Affected functions %        22%        0%          7.3%
Affected top functions      160 800    0           69 971
Affected top functions %    5.1%       0%          2.2%

Figure: Analysis Time for Patch #5565583
[Timeline in hours, from 0 to 88.3, covering three phases: Static Analysis, Dynamic Analysis, and Reachability and Alias Analysis.]
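The percentage rows can be reproduced from the counts, assuming the denominator is the total number of entities (3 157 947) reported for Oracle E-Business Suite V11.5. That denominator is an inference from the numbers, not something the slides state, and the helper name `share_of_entities` is hypothetical.

```python
# Assumed denominator: total entities (functions and fields) in the system.
TOTAL_ENTITIES = 3_157_947


def share_of_entities(count):
    """Return count as a percentage of all entities, to one decimal place."""
    return round(100 * count / TOTAL_ENTITIES, 1)


# Affected functions and affected top functions for the two non-empty patches.
for label, count in [
    ("5565583 affected functions", 699_534),
    ("5565583 affected top functions", 160_800),
    ("14321241 affected functions", 230_209),
    ("14321241 affected top functions", 69_971),
]:
    print(label, share_of_entities(count))
```

Under this assumption the computed shares agree with the table once rounded to the precision shown there.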
Summary: Achievements
- We have developed a multi-tasking, aspect-oriented instrumentor to adequately instrument large-scale systems and collect traces at the bytecode level.
- We have successfully combined static analysis and dynamic analysis. Static analysis was used as the input to dynamic analysis, providing a safety guarantee whenever full potential impacts are needed.
- We have empirically demonstrated the practical applicability of the improved approach on a very large enterprise system involving hundreds of thousands of classes. Such systems are perhaps two orders of magnitude larger than the systems analyzed by other approaches.
Summary: Future Work
- Running time still needs to be improved.
- The impacts identified by dynamic analysis were only a small portion of the static impacts (0.015%), though they were executed hundreds of thousands of times.
- Customized code needs to be included.
Bibliography
Bohner, S. A. (1996). Software change impact analysis. In Proceedings of the 27th Annual NASA Goddard/IEEE Software Engineering Workshop (SEW-27'02).
Chen, W., Iqbal, A., Abdrakhmanov, A., Parlar, J., George, C., Lawford, M., ... Wassyng, A. (2013). Large-scale enterprise systems: Changes and impacts. In Enterprise Information Systems (Vol. 141, pp. 274-290). Springer Berlin Heidelberg.
Chen, W., Wassyng, A., & Maibaum, T. (2014). Impact analysis via reachability and alias analysis. In U. Frank, P. Loucopoulos, Ó. Pastor, & I. Petrounias (Eds.), The Practice of Enterprise Modeling (Vol. 197, pp. 261-270). Springer Berlin Heidelberg. doi: 10.1007/978-3-662-45501-2_19
IT Key Metrics Data 2012. (2011, December). Gartner, Inc.
Kiczales, G., Hilsdale, E., Hugunin, J., Kersten, M., Palm, J., & Griswold, W. G. (2001). An overview of AspectJ. In ECOOP 2001: Object-Oriented Programming (pp. 327-354). Springer.
Thank you!