Tool-Supported Application Performance Problem Detection and Diagnosis. André van Hoorn. http://www.iste.uni-stuttgart.de/rss/



Similar documents
Open-Source-Software als Katalysator im Technologietransfer am Beispiel des Monitoring-Frameworks

ExplorViz: Visual Runtime Behavior Analysis of Enterprise Application Landscapes

Performance Benchmarking of Application Monitoring Frameworks

Microservices for Scalability

Continuous Monitoring of Software Services: Design and Application of the Kieker Framework

Online Performance Anomaly Detection with

Capturing provenance information with a workflow monitoring extension for the Kieker framework

Self Adaptive Software System Monitoring for Performance Anomaly Localization

INSTITUT FÜR INFORMATIK

Utilizing PCM for Online Capacity Management of Component-Based Software Systems

Continuous Integration in Kieker

Application Performance Management (APM) Inspire Your Users With Every App Transaction. Anand Akela CA

HP Business Service Management (BSM) George Leschener BSM Solution Lead, MEMA

APM & DEVOPS CHALLENGES & ENABLERS. M. Hanin, Hannover,

Work Smarter, Not Harder: Leveraging IT Analytics to Simplify Operations and Improve the Customer Experience

A Ranger4 Guide to. Application Performance Management. Ranger

RIVERBED APPRESPONSE

Self-Adaptive Performance Monitoring for Component-Based Software Systems. Jens Ehlers

Monitoring and Log Management in Hybrid Cloud Environments

Using Dynatrace Monitoring Data for Generating Performance Models of Java EE Applications

SPEC Research Group. Sam Kounev. SPEC 2015 Annual Meeting. Austin, TX, February 5, 2015

I D C T E C H N O L O G Y S P O T L I G H T

A Case of Study on Hadoop Benchmark Behavior Modeling Using ALOJA-ML

Augmented Search for Web Applications. New frontier in big log data analysis and application intelligence

Optimizing your IT infrastructure IBM Corporation

Performance Monitoring of Database Operations

STEELCENTRAL APPRESPONSE

VMware Virtualization and Cloud Management Overview VMware Inc. All rights reserved

Towards a Performance Model Management Repository for Component-based Enterprise Applications

Intelligent Operations Management from Applications to Storage. VMware vrealize Operations

ESB Features Comparison

Architecture-Based Multivariate Anomaly Detection for Software Systems

IT Analytics and Big Data - Making Your Life Easier

Simplifying Lustre Operations with Chukwa and Hadoop

The Top 10 Reasons Why You Need Synthetic Monitoring

How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning

Data Science Transforming Security Operations

CA APM 9.5 Application Performance Management for Cloud Introduction

CA Opscenter: Delivering Exceptional Customer Experience in the Application Economy

Actionable insight for IT BIG Data - HP Operations Analytics August 22, 2013

PLUMgrid Toolbox: Tools to Install, Operate and Monitor Your Virtual Network Infrastructure

BB2798 How Playtech uses predictive analytics to prevent business outages

Automatic Extraction of Probabilistic Workload Specifications for Load Testing Session-Based Application Systems

Warranty Fraud Detection & Prevention

Modern IT Operations Management. Why a New Approach is Required, and How Boundary Delivers

Frequently Asked Questions Plus What s New for CA Application Performance Management 9.7

IDL. Get the answers you need from your data. IDL

Curriculum Vitae. Zhenchang Xing

SOA management challenges. After completing this topic, you should be able to: Explain the challenges of managing an SOA environment

SAP Solution Brief SAP HANA. Transform Your Future with Better Business Insight Using Predictive Analytics

HP Business Service Management 9.2 and

Big data platform for IoT Cloud Analytics. Chen Admati, Advanced Analytics, Intel

A Systemic Artificial Intelligence (AI) Approach to Difficult Text Analytics Tasks

ROI Business Use Case. Cross-Enterprise Application Performance Management. Helps Reduce Costs & MTTR, Simplify Management, Improve Service Quality

Continual Verification of Non-Functional Properties in Cloud-Based Systems

Move beyond monitoring to holistic management of application performance

HP APPLICATION PERFORMANCE MONITORING

Toby Johnston SAP BI Platform Support Tool 2.0 Deep-Dive Session # 2523

Digital Business Requires Application Performance Management

1.1 Difficulty in Fault Localization in Large-Scale Computing Systems

MRV EMPOWERS THE OPTICAL EDGE.

Implement a unified approach to service quality management.

Riverbed SteelCentral. Product Family Brochure

A Vision for Operational Analytics as the Enabler for Business Focused Hybrid Cloud Operations

Application Performance Monitoring of a scalable Java web-application in a cloud infrastructure

Business Intelligence

Promises and Pitfalls of Big-Data-Predictive Analytics: Best Practices and Trends

Solution Brief TrueSight App Visibility Manager

Component-based Robotics Middleware

Support the Era of the App with End-to-End Network and Application Performance Visibility

DATAMEER WHITE PAPER. Beyond BI. Big Data Analytic Use Cases

Business Intelligence. A Presentation of the Current Lead Solutions and a Comparative Analysis of the Main Providers

Statistics for BIG data

IT Infrastructure- Monitoring Tools

The Evolution of Load Testing. Why Gomez 360 o Web Load Testing Is a

Transcription:

Tool-Supported Application Performance Problem Detection and Diagnosis University of Stuttgart Institute of Software Technology, Reliable Software Systems Group http://www.iste.uni-stuttgart.de/rss/

Agenda 1 2 3 Introduction Performance Problems Kieker Open Source APM Framework Performance Problem Detection and Diagnosis with Kieker Temporarily not available 4 Conclusions and Future Work 2/28 2014/03/16

Performance (QoS) Problem Detection and Diagnosis An unexpected error occured Try again later Temporarily unavailable Temporarily not available Please visit us again later Service not available more capacity is on the way 3/28 2014/03/16 Service temporarily unavailable We are experiencing heavy demand

Performance (QoS) Problem Detection and Diagnosis An unexpected error occured How to (semi-)automate 1. the detection of performance problems? 2. pinpointing the root cause (i.e., diagnosis) Try again later 3. the proactive detection and diagnosis? Temporarily unavailable Temporarily not available Please visit us again later Service not available more capacity is on the way 4/28 2014/03/16 Service temporarily unavailable We are experiencing heavy demand

Application Performance Management APM dimensions according to Gartner (2014) 1. End-user experience monitoring 2. Application topology discovery and visualization 3. User-defined transaction profiling 4. Application component deep-dive 5. IT operations analytics based on, e.g., Complex operations event processing Statistical pattern discovery and recognition Unstructured text indexing, search and inference Multidimensional database search and analysis APM tools (selection) Commercial: Free/open-source: https://www.gartner.com/doc/288942 5/28 2014/03/16

Application Performance Management APM dimensions according to Gartner (2014) Example ITOA activity: 1. End-user experience monitoring https://www.gartner.com/doc/288942 Performance problem detection and diagnosis 2. Application topology Reactive discovery vs. proactive and visualization 3. User-defined transaction Manuel vs. profiling automatic (incl. recommendations) State-based vs. transaction-based 4. Application component deep-dive 5. IT operations analytics based on, e.g., Complex operations event processing Statistical pattern discovery and recognition Unstructured text indexing, search and inference Multidimensional database search and analysis APM tools (selection) Commercial: Free/open-source: 6/28 2014/03/16

Agenda 1 2 3 Introduction Performance Problems Kieker Open Source APM Framework Performance Problem Detection and Diagnosis with Kieker Temporarily not available 4 Conclusions and Future Work 7/28 2014/03/16

Open Source APM Framework Download : http://kieker-monitoring.net/ 8/28 2014/03/16

Model Extraction and Visualization (Selected Examples) Legacy System Analysis (Visual Basic 6) Distributed Monitoring (Java EE/SOAP) Legacy System Analysis (COBOL) 9/28 2014/03/16 3D Visualization of Concurrency

Visit http://kieker-monitoring.net/ Projects, publications, talks, tutorials Download User Guide Issue Tracking 10/28 2014/03/16

References: Internal/External Researchers, Industry 11/28 2014/03/16

Agenda 1 2 3 Introduction Performance Problems Kieker Open Source APM Framework Performance Problem Detection and Diagnosis (PPD&D) with Kieker Temporarily not available 4 Conclusions and Future Work 12/28 2014/03/16

PAD Time Series Analysis Introduction Mean Forecaster ARIMA Forecaster 13/28 Forecasts 2014/03/16 Tool-supported for Wikipedia application performance data problem detection (Kieker and diagnosis PAD example)

PAD: Online Performance Anomaly Detection (Bielefeld, 2012), (Frotscher, 2013) 14/28 2014/03/16

PAD (cont d) Time Series Extraction 15/28 2014/03/16

PAD (cont d) Time Series Forecasting 16/28 2014/03/16

PAD (cont d) Anomaly Score Calculation 17/28 2014/03/16

PAD (cont d) Anomaly Detection 18/28 2014/03/16

-based PPD&D Approaches (Bielefeld, 2012), (Frotscher, 2013) Based on time series analysis (various algorithms) PAD part of Kieker release Limited to problem detection No architecture consideration Case study at XING Incorporates architectural knowledge (e.g., deployment, calling dependencies) Focusing on offline analysis Cf. Rohr (2015) 19/28 2014/03/16 (Marwede et al., 2009)

(Ehlers et al., 2011, 2012) 20/28 2014/03/16 -based PPD&D Approaches Adaptive monitoring OCL-based decisions Cf. Okanovic et al. (2013) No dynamic (bytecode) instrumentation Proactive, hierarchical Inclusion of different statistical techniques (e.g., time series analysis, machine learning) Combination of multiple data sources (e.g., HDD SMART, log files) Tool-supported application performance problem detection and and diagnosis architectural knowledge (Pitakrat et al., 2013, 2014)

Agenda 1 2 3 Introduction Performance Problems Kieker Open Source APM Framework Performance Problem Detection and Diagnosis with Kieker Temporarily not available 4 Conclusions and Future Work 21/28 2014/03/16

Summary APM is an increasingly important topic PPD&D is one APM activity Mature APM tools exist Commercial tools Support monitoring for various platforms and technologies Only basic support for problem detection and diagnosis Kieker presented as an extensible open-source example Kieker-based approaches for problem detection and diagnosis 22/28 2014/03/16

Challenges for APM and PPD&D (Selection) 1. Laborious and Continuous APM Configuration APM configuration time consuming and error prone Problem intensified due to faster development cycles 2. Problem Detection and Diagnosis Alerting thresholds hard to determine and to maintain Manual diagnosis requires extensive expert knowledge APM experts are scarce goods 3. Detachedness of APM processes APM processes often system-specific Few possibilities to deposit/reuse expert knowledge 23/28 2014/03/16

Future Work: Expert-Guided Automatic Diagnosis of Performance Problems in Enterprise Applications 24/28 2014/03/16

Future Work: Expert-Guided Automatic Diagnosis of Performance Problems in Enterprise Applications 25/28 2014/03/16

Future Work: Expert-Guided Automatic Diagnosis of Performance Problems in Enterprise Applications 26/28 2014/03/16

Future Work: Expert-Guided Automatic Diagnosis of Performance Problems in Enterprise Applications 27/28 2014/03/16

The End My APM Wish List Transparency, Openness, Technology Transfer e.g., sharing of best practices (libraries, white papers) for frameworkspecific instrumentation, problem detection and diagnosis Interoperability e.g., common formats for instrumentation description, configuration, measurement data (traces) Reproducibility, Comparibility e.g., sample/benchmark applications and datasets, case studies SPEC RG DevOps Performance Working Group http://research.spec.org/devopswg/ http://kieker-monitoring.net 28/28 2014/03/16

References (Döhring, 2012) P. Döhring. Visualisierung von Synchronisationspunkten in Kombination mit der Statik und Dynamik eines Softwaresystems. Master s thesis, Kiel University, Oct. 2012. (Ehlers, 2012) J. Ehlers. Self-Adaptive Performance Monitoring for Component-Based Software Systems. PhD thesis, Department of Computer Science, Kiel University, Germany, 2012. (Ehlers et al., 2011) J. Ehlers, A. van Hoorn, J.Waller, and W. Hasselbring. Selfadaptive software system monitoring for performance anomaly localization. In Proceedings of the 8th ACM International Conference on Autonomic computing (ICAC 11). ACM, 2011. (Fittkau et al., 2014) F. Fittkau, A. van Hoorn, and W. Hasselbring. Towards a dependability control center for large software landscapes. In Proceedings of the 10th European Dependable Computing Conference (EDCC 14), IEEE, 2014. (Frotscher, 2013) T. Frotscher. Architecture-based multivariate anomaly detection for software systems, Master's Thesis, Kiel University, 2013. (Gartner, 2014) J. Kowall and W. Cappelli. Gartner's Magic Quadrant for Application Performance Monitoring 2014 (Marwede et al., 2009) N. S. Marwede, M. Rohr, A. van Hoorn, and W. Hasselbring. Automatic failure diagnosis support in distributed large-scale software systems based on timing behavior anomaly correlation. In Proc. CSMR 09. IEEE, 2009. 29/28 2014/03/16

References (cont d) (Okanovic et al., 2013) D. Okanovic, A. van Hoorn, Z. Konjovic, and M. Vidakovic. SLAdriven adaptive monitoring of distributed applications for performance problem localization. Computer Science and Information Systems (ComSIS), 10(10), 2013. (Pitakrat, 2013) T. Pitakrat. Hora: Online failure prediction framework for componentbased software systems based on kieker and palladio. In Proc. SOSP 2013. CEUR- WS.org, Nov. 2013. (Pitakrat et al., 2014) T. Pitakrat, A. van Hoorn, and L. Grunske. Increasing dependability of component-based software systems by online failure prediction. In Proc. EDCC 14. IEEE, 2014. (Richter, 2012) B. Richter. Dynamische Analyse von COBOL-Systemarchitekturen zum modellbasierten Testen ("Dynamic analysis of cobol system architectures for modelbased testing", in German). Diploma Thesis, Kiel University. 2012. (Rohr, 2015) M. Rohr. Workload-sensitive Timing Behavior Analysis for Fault Localization in Software Systems. PhD thesis, Department of Computer Science, Kiel University, Germany, 2015. (Rohr et al., 2010) M. Rohr, A. van Hoorn, W. Hasselbring, M. Lübcke, and S. Alekseev. Workload-intensity-sensitive timing behavior analysis for distributed multi-user software systems. In Proc. WOSP/SIPEW 10. ACM, 2010. 30/28 2014/03/16

References (cont d) (van Hoorn, 2014) A. van Hoorn. Model-Driven Online Capacity Management for Component-Based Software Systems. PhD thesis, Department of Computer Science, Kiel University, Germany, 2014 (van Hoorn et al., 2009) A. van Hoorn, M. Rohr, W. Hasselbring, J. Waller, J. Ehlers, S. Frey, and D. Kieselhorst. Continuous monitoring of software services: Design and application of the Kieker framework. TR-0921, Department of Computer Science, University of Kiel, Germany, 2009. (van Hoorn et al., 2012) A. van Hoorn, J.Waller, and W. Hasselbring. Kieker: A framework for application performance monitoring and dynamic software analysis. In Proc. ACM/SPEC ICPE 12. ACM, 2012. (Wulf, 2012) C. Wulf. Runtime visualization of static and dynamic architectural views of a software system to identify performance problems. B.Sc. Thesis, University of Kiel, Germany, 2010. 31/28 2014/03/16

BONUS/BACK-UP 32/28 2014/03/16

Source: http://techblog.netflix.com/2013/03/introducing-first-netflixoss-recipe-rss.html Netflix OSS Recipes Application (Motivating Example) Download: https://github.com/netflix/recipes-rss 33/28 2014/03/16

Source: https://github.com/netflix/recipes-rss/wiki/architecture Netflix OSS Recipes Application (cont d) 34/28 2014/03/16

Source: https://github.com/netflix/recipes-rss/wiki/architecture Problem Symptom 1: Increased Response Times 35/28 2014/03/16

Source: https://github.com/netflix/recipes-rss/wiki/architecture Problem Symptom 2: Service Unavailability X 36/28 2014/03/16

Problem Detection and Diagnosis Approaches Features Reactive vs. proactive Manual vs. automatic State-based vs. transaction-based etc. Statistical techniques Time series analysis Anomaly detection (incl. change detection) Machine learning etc. 37/28 2014/03/16

Example Kieker Plots for Netflix OSS Recipes Application Edge Middletier Backend call (Jersey) DelRSSCommand AddRSSCommand Outgoing/incoming REST calls (Jersey) 38/28 2014/03/16 GetRSSCommand