ΘPAD: Online Performance Anomaly Detection with Tillmann Bielefeld 1 1 empuxa GmbH, Kiel KoSSE-Symposium Application Performance Management (Kieker Days 2012) November 29, 2012 @ Wissenschaftszentrum Kiel Tillmann Bielefeld (empuxa GmbH) ΘPAD w/ Kieker Nov. 29, 2012 @ Kiel 1 / 27
Agenda 1 Monitoring at XING 2 OPAD s Architecture 3 Evaluation 4 Results 5 Conclusion Tillmann Bielefeld (empuxa GmbH) ΘPAD w/ Kieker Nov. 29, 2012 @ Kiel 2 / 27
Thesis Goals Author Tillmann Carlos Bielefeld Advisory Prof. Dr. Wilhelm Hasselbring Dipl.-Inform. André van Hoorn Dr. Stefan Kaes (XING AG) Case Study at XING AG 1 Design of online performance anomaly detection concept (ΘPAD) 2 ΘPAD implementation as plugin 3 ΘPAD integration with case study system 4 Evaluation @ Θnline Performance Anomaly Detection Diploma Thesis Tillmann Carlos Bielefeld Θnline Performance Anomaly Detection for Large-Scale Software Systems Diploma Thesis Christian-Albrechts-Universität zu Kiel PAD Tillmann C. Bielefeld: Online performance anomaly detection for large-scale software systems March 2012. Diploma Thesis, Kiel Univ. Tillmann Bielefeld (empuxa GmbH) ΘPAD w/ Kieker Nov. 29, 2012 @ Kiel 3 / 27
Existing Logjam-based Monitoring @ XING Monitoring at XING Logjam-based monitoring already in place @ Tillmann Bielefeld (empuxa GmbH) ΘPAD w/ Kieker Nov. 29, 2012 @ Kiel 4 / 27
Integration of ΘPAD in XING s Architecture Monitoring at XING Servers App Support Importer Log Database Logjam XWS (API) DB Background s logging/monitoring architecture Tillmann Bielefeld (empuxa GmbH) ΘPAD w/ Kieker Nov. 29, 2012 @ Kiel 5 / 27
Integration of ΘPAD in XING s Architecture Monitoring at XING Servers App Support Importer Log Database Logjam XWS (API) DB Background ΘPAD Tillmann Bielefeld (empuxa GmbH) ΘPAD w/ Kieker Nov. 29, 2012 @ Kiel 5 / 27
Example JSON Logging Message Monitoring at XING {... } "count": 5204.903527993169, "memcache_time": 6505.196318140181, "api_time": 2207.0271495891297, "db_time": 5004.8727338680155, "view_time": 3936.1623304929153, "total_time": 1586.8188192888886, "api_calls": 5546.250545491678 Input data received via AMQP and processed by ΘPAD Tillmann Bielefeld (empuxa GmbH) ΘPAD w/ Kieker Nov. 29, 2012 @ Kiel 6 / 27
ΘPAD gets instantiated at startup of the Kieker server. Both main architectural components High-Level of Kieker, ΘPAD Monitoring Architecture and Analysis are used to route the measurements to the plugin. The data flow from input to output is illustrated in Figure 4.4. OPAD s Architecture Measurement Queue C «Adapter» AMQPBridge «artifact» Monitoring «artifact» Analysis «component» ΘPAD Plugin P Java Pipe Time Series Storage Alerting Queue X Figure 4.4: The coarse-grained architecture follows the linear data 1 AMQP messages transformed into Kieker monitoring records flow of the approach (see Chapter 3). The AMQPBridge adapter 2 ΘPAD: translates pipes-and-filters the monitored processing system s measurements of records to Kieker records 3 ΘPAD and therefore results passed makesto ΘPAD alerting reusable queueinand other time-series environments storage (NFR4). This graphic uses the AMQP notation of Figure 2.19. Tillmann Bielefeld (empuxa GmbH) ΘPAD w/ Kieker Nov. 29, 2012 @ Kiel 7 / 27
ΘPAD Processing Steps OPAD s Architecture <<Reader>> Time Series Extraction Time Series Forecasting Anomaly Score Calculation Anomaly Detection Alerting (e.g., AMQP) Tillmann Bielefeld (empuxa GmbH) ΘPAD w/ Kieker Nov. 29, 2012 @ Kiel 8 / 27
Step 1: Time Series Extraction ΘPAD Processing Steps (cont d) OPAD s Architecture <<Reader>> Time Series Extraction Time Series Forecasting Anomaly Score Calculation Anomaly Detection Alerting (e.g., AMQP) Continuous Time Discrete Time Series 5 2 1 8 1 2 11 0 10 20 30 2 3 4 X 4 2 } } }} f f f 7 8 5 6 } f Event on ES Discretization Function Time Series X Current Time select sum(value) as aggregation from MeasureEvent.win:time_batch( 1000 msec ) Tillmann Bielefeld (empuxa GmbH) ΘPAD w/ Kieker Nov. 29, 2012 @ Kiel 9 / 27
Step 2: Time Series Forecasting ΘPAD Processing Steps (cont d) OPAD s Architecture <<Reader>> Time Series Extraction Time Series Forecasting Anomaly Score Calculation Anomaly Detection Alerting (e.g., AMQP) 4 3 3 4 6 5.79 1 2 3 4 5 6 W Tillmann Bielefeld (empuxa GmbH) ΘPAD w/ Kieker Nov. 29, 2012 @ Kiel 10 / 27
Step 3: Anomaly Score Calculation ΘPAD Processing Steps (cont d) OPAD s Architecture <<Reader>> Time Series Extraction Time Series Forecasting Anomaly Score Calculation Anomaly Detection Alerting (e.g., AMQP) 1 0.5 0 6 1 1 0 0 2 1 2 3 4 5 6 Tillmann Bielefeld (empuxa GmbH) ΘPAD w/ Kieker Nov. 29, 2012 @ Kiel 11 / 27
Step 4: Anomaly Detection ΘPAD Processing Steps (cont d) OPAD s Architecture <<Reader>> Time Series Extraction Time Series Forecasting Anomaly Score Calculation Anomaly Detection Alerting (e.g., AMQP) 1 0.5 0 1 2 3 4 5 6 Abnormal Score Normal Score Anomaly Threshold Anomaly Detected Tillmann Bielefeld (empuxa GmbH) ΘPAD w/ Kieker Nov. 29, 2012 @ Kiel 12 / 27
ΘPAD Web Interface OPAD s Architecture Tillmann Bielefeld (empuxa GmbH) ΘPAD w/ Kieker Nov. 29, 2012 @ Kiel 13 / 27
Evaluation Methodology: GQM Evaluation Goal Assess Practicality of Approach Questions How precise is the detection? How accurate is the detection? Metric Number of true positives Metric Number of false negatives Goal/Question/Metric (GQM) plan (excerpt) Tillmann Bielefeld (empuxa GmbH) ΘPAD w/ Kieker Nov. 29, 2012 @ Kiel 14 / 27
Manual Identification of Anomalies Evaluation Methodology (cont d) Evaluation 0:00 12:00 23:00 API time Memcache time Other time DB Time View time Manual detection using the visualization tool 8 anomalies were detected Tillmann Bielefeld (empuxa GmbH) ΘPAD w/ Kieker Nov. 29, 2012 @ Kiel 15 / 27
ΘPAD Results Evaluation Results aptt.fses.d1min.l15min aptt.fets.d1min.l1h aptt.fmean.d5min.l1h Tillmann Bielefeld (empuxa GmbH) ΘPAD w/ Kieker Nov. 29, 2012 @ Kiel 16 / 27
ROC Curves (Introduction) Evaluation (cont d) Results 1 Random Guess True Positive Rate (TPR) 0.5 Better Worse 3 2 Evaluation Run 0 0.5 1 False Positive Rate (FPR) TPR = TP TP + FN = TP F FPR = FP FP + TN = FP NF (1) Tillmann Bielefeld (empuxa GmbH) ΘPAD w/ Kieker Nov. 29, 2012 @ Kiel 17 / 27
ROC Curves (ΘPAD Results) Evaluation (cont d) Results 1 1 mean.d1min.l1h ses.d1min.l15min 0,8 0,8 0,6 0,6 2 0,4 1 0,4 0,2 0 0 0,5 1 1 mean.d20min.l2h 0,8 0,6 0,4 0,2 0 0 0,5 1 0,2 0 0 0,5 1 1 0,8 0,6 0,4 0,2 aptt.fws.d1h.l24 h 0 0 0,5 1 Tillmann Bielefeld (empuxa GmbH) ΘPAD w/ Kieker Nov. 29, 2012 @ Kiel 18 / 27
Accuracy and Precision Evaluation (cont d) Results ACC 1,00 PREC 0,80 0,60 0,40 0,20 0,00 0,00 Accuracy 0,98% 0,07 0,13 0,20 0,27 Detection Threshold ACC = PREC = 1 2 0,14% TP + TN N Precision 0,33 0,40 0,47 0,53 0,60 0,67 0,73 0,80 0,87 0,93 1,00 TP POS = TP TP + FP. (2) = TP + TN TP +FP +FN +TN. (3) Tillmann Bielefeld (empuxa GmbH) ΘPAD w/ Kieker Nov. 29, 2012 @ Kiel 19 / 27
Summary and Outlook for Large-Scale Software Systems Diploma Thesis Christian-Albrechts-Universität zu Kiel Author Tillmann Carlos Bielefeld Advisory Prof. Dr. Wilhelm Hasselbring Dipl.-Inform. André van Hoorn Dr. Stefan Kaes (XING AG) Case Study at XING AG Conclusion http://kieker-monitoring.net Θnline Performance Anomaly Detection Diploma Thesis Tillmann Carlos Bielefeld Θnline Performance Anomaly Detection PAD Tillmann C. Bielefeld: Online performance anomaly detection for large-scale software systems March 2012. Diploma Thesis, Kiel Univ. Outlook ΘPAD to be released as part of Kieker Follow-up theses on ΘPAD Contact Us till@empuxa.com Tillmann Bielefeld (empuxa GmbH) ΘPAD w/ Kieker Nov. 29, 2012 @ Kiel 20 / 27
Demo Conclusion Win a free empuxa Hoodie! http://freebies.tielefeld.com Tillmann Bielefeld (empuxa GmbH) ΘPAD w/ Kieker Nov. 29, 2012 @ Kiel 21 / 27