Lokalisieren von Defekten in Software mit Hilfe von Data Mining



Lokalisieren von Defekten in Software mit Hilfe von Data Mining (Locating Defects in Software with the Help of Data Mining)
Klemens Böhm
Institute for Programme Structures and Data Organisation (IPD)
KIT, University of the State of Baden-Wuerttemberg and National Research Center of the Helmholtz Association
www.kit.edu

A problem has been detected and Windows has been shut down to prevent damage to your computer.
DRIVER_IRQL_NOT_LESS_OR_EQUAL
If this is the first time you've seen this stop error screen, restart your computer. If this screen appears again, follow these steps:
Check to make sure any new hardware or software is properly installed. If this is a new installation, ask your hardware or software manufacturer for any Windows updates you might need.
If problems continue, disable or remove any newly installed hardware or software. Disable BIOS memory options such as caching or shadowing. If you need to use Safe Mode to remove or disable components, restart your computer, press F8 to select Advanced Startup Options, and then select Safe Mode.
Technical information:
*** STOP: 0x000000D1 (0x0000000C, 0x00000002, 0x00000000, 0xF86B5A89)
*** gv3.sys Address F86B5A89 base at F86B5000, DateStamp 3dd991eb
Beginning dump of physical memory
Physical memory dump complete.
Contact your system administrator or technical support group for further assistance.

Locating Bugs in Software
- Software is almost never shipped bug-free, even if tested extensively.
- Debugging is time-consuming and expensive.
- Automated localisation would be of great help.
- Idea: locate bugs with data mining techniques, using an approach based on weighted graph mining.
Motivation, 7.2.2011, Klemens Böhm, Lokalisieren von Defekten in Software mit Hilfe von Data Mining, 1/25

Outline
1. Motivation
2. Call Graphs
3. Defect Localization
4. Dataflow-Enabled Defect Localization
5. Evaluation
6. Conclusions

Call Graphs
Program executions as call graphs: methods are nodes, method calls are edges.
[Figure: example call graph over the methods a, b and c]
Bugs in the call graph:
- Structure-affecting (existing approaches), e.g., a bug in an if-condition in method a.
- Call-frequency-affecting (new in our contribution), e.g., a bug in a loop condition that changes how often a method is called.

Reduction of Call Graphs
[Figure: unreduced call graph with many repeated calls to methods such as showdelete, showsame, showinsert and showsymbol]
- Millions of method calls are very common.
- Reduction of call graphs is necessary.
[Figure: example of a reduced call graph over the methods a, b and c, with edge weights 1, 1 and 3]
- Weights have not been used in previous reductions.
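The slide does not spell out the reduction itself. As a minimal sketch, assuming the reduction keeps one node per method and one edge per caller/callee pair, with the edge weight counting the calls, it could look like this. The function name `reduce_call_graph` and the call trace are hypothetical and only chosen to reproduce the weights 1, 1 and 3 from the example.

```python
from collections import Counter

def reduce_call_graph(calls):
    """Collapse (caller, callee) call events into weighted edges.

    Assumption: the weight of an edge is the number of times the call occurred.
    """
    return Counter(calls)

# Hypothetical trace: a calls b and c once each, b calls c three times.
trace = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "c"), ("b", "c")]
reduced = reduce_call_graph(trace)

for (caller, callee), weight in sorted(reduced.items()):
    print(f"{caller} -> {callee}: weight {weight}")
```

The reduced graph has three edges regardless of how long the trace grows, which is the point of the reduction.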

How to Use Call Graphs?
Existing idea:
- Look at program executions (correct ones as well as failing ones) represented as call graphs and analyse the graph structure (i.e., graphs without weights).
- Identify patterns typical of failing executions, using frequent subgraph mining.
- In a nutshell, this finds structure-affecting bugs.
New aspect:
- Explicitly analyse call frequencies (edge weights) in addition to graph structures.

Weighted Graph Mining
No algorithm is available for weighted graph mining (nor a problem formulation).
How to make use of edge weights?
- Preprocessing: discretisation of weights. (We are currently investigating this.)
- Postprocessing (our approach, already studied and presented in the following): graph mining without weights, followed by a detailed analysis of the weights.

Finding Discriminating Edge Weights
(1) Apply frequent subgraph mining to reduced call graphs of correct and failing program executions (ignore the weights in this step).
(2) Consider only subgraphs occurring in both correct and failing executions.
(3) Analyse the edge weights.
Example graph found by frequent subgraph mining: [Figure: subgraph over the methods a, b and c]
Average edge weights:
Edge      Correct executions   Failing executions
a → c     1                    1
b → c     3                    30
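Step (3) can be illustrated with a small sketch that averages each edge weight separately for correct and failing executions. The execution data and the function name `average_weights` are hypothetical; the weights are chosen so that the averages match the slide (1 and 1 for a → c, 3 and 30 for b → c).

```python
from statistics import mean

def average_weights(executions):
    """Average the weight of every edge per class ('correct' or 'failing')."""
    collected = {}
    for label, edges in executions:
        for edge, weight in edges.items():
            collected.setdefault(edge, {}).setdefault(label, []).append(weight)
    return {
        edge: {label: mean(weights) for label, weights in by_class.items()}
        for edge, by_class in collected.items()
    }

# Hypothetical executions: (class, {edge: weight}).
executions = [
    ("correct", {("a", "c"): 1, ("b", "c"): 2}),
    ("correct", {("a", "c"): 1, ("b", "c"): 4}),
    ("failing", {("a", "c"): 1, ("b", "c"): 28}),
    ("failing", {("a", "c"): 1, ("b", "c"): 32}),
]
averages = average_weights(executions)

for edge, by_class in averages.items():
    print(edge, by_class)
```

An edge whose average differs strongly between the two classes, such as b → c here, is a candidate for a call-frequency-affecting bug.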

Entropy-Based Scoring
Assemble a table which contains every edge in every frequent subgraph (SG):
[Figure: two example subgraphs, SG1 over a, b and c, and SG2 over a and b]
              SG1: a → c   SG1: b → c   SG2: a → b   ...   Class
Execution 1   1            30           6            ...   failing
Execution 2   1            3            7            ...   correct
Application of the Information Gain feature-selection algorithm, which is based on entropy.
Result: a ranking of the columns (edges).
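As a rough sketch of the scoring step, information gain can be computed per column and the columns ranked by it. The slide does not say how numeric weights are handled; this sketch assumes the best single threshold split per column, which is a simplification rather than the authors' procedure. Executions 3 and 4 and all function names are hypothetical additions to the two rows shown above.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(values, labels):
    """Best information gain over all binary splits 'value <= threshold'."""
    base = entropy(labels)
    best = 0.0
    for threshold in sorted(set(values))[:-1]:
        left = [l for v, l in zip(values, labels) if v <= threshold]
        right = [l for v, l in zip(values, labels) if v > threshold]
        remainder = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
        best = max(best, base - remainder)
    return best

# Rows 1 and 2 follow the slide; rows 3 and 4 are hypothetical.
labels = ["failing", "correct", "failing", "correct"]
columns = {
    "SG1: a->c": [1, 1, 1, 1],
    "SG1: b->c": [30, 3, 28, 4],
    "SG2: a->b": [6, 7, 7, 6],
}
ranking = sorted(
    ((name, info_gain(values, labels)) for name, values in columns.items()),
    key=lambda item: -item[1],
)
for name, gain in ranking:
    print(f"{name}: {gain:.3f}")
```

The column for b → c separates the classes perfectly and is ranked first; the other two columns carry no information about the class.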

Integration of Structural Evidence
[Flowchart] Reduced call graphs → frequent subgraph mining (not considering weights), then two branches:
- Subgraphs occurring in failing and correct executions → entropy-based scoring (based on edge weights).
- Subgraphs occurring in failing executions only → structural scoring (based on support).
Both scores → combination → combined method ranking.
Usage of two kinds of evidence: frequency (left branch) and structure (right branch).
Both types of bugs can be located: call-frequency-affecting and structure-affecting ones.
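The slides do not state how the two scores are combined. Purely as an illustration, the sketch below assumes both scores are already normalised to [0, 1] per method and averages them, treating a missing score as 0. The averaging rule, the function name `combine` and the method names and scores are all assumptions, not the authors' formula or results.

```python
def combine(entropy_scores, structural_scores):
    """Average two per-method score dicts; a missing score counts as 0.

    Assumption: both inputs are normalised to the range [0, 1].
    Returns (method, score) pairs, highest score first.
    """
    methods = set(entropy_scores) | set(structural_scores)
    combined = {
        m: (entropy_scores.get(m, 0.0) + structural_scores.get(m, 0.0)) / 2
        for m in methods
    }
    return sorted(combined.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical scores for three hypothetical methods.
entropy_scores = {"parse": 0.9, "render": 0.2}
structural_scores = {"parse": 0.5, "load": 0.6}
combined_ranking = combine(entropy_scores, structural_scores)

for method, score in combined_ranking:
    print(f"{method}: {score:.2f}")
```

A method that is suspicious under both kinds of evidence ends up above one that is flagged by only one of them.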

Results
Example output:
Rank   Method         Score
1      inputscan()    0.9833
2      showinsert()   0.9204
3      showdelete()   0.4876
4      oldconsume()   0.4876
5      addsymbol()    0.2428
- The bug has been instrumented in showinsert().
- A software developer has to check only two methods.
- We do not try to explain or fix bugs, just to locate them!

Challenge: Dataflow-Affecting Bugs
Not all defects affect the call graph.
Example (failing execution):
[Figure: call graph with the methods void main(), void a(), void b() and int c(12, 33), and edge weights 4, 52, 2 and 1]
1. Method b calls c(12, 33).
2. Method c calculates a wrong return value, 8.
3. Method b prints the value 8.
Graph structure and weights remain unaffected, so call-graph mining cannot find the defect.
Approach: extend call graphs with dataflow information and analyse these extended graphs.

Dataflow Information Available
Example dataflows (parameter and return values) of int c(int p1, int p2), called from b. [Table of example values not preserved]
In real executions, one method can be called millions of times.
To deal with this data, we need to aggregate it.

Our Approach: Discretisation with CAIM
CAIM (class-attribute interdependence maximization) is:
- class-aware (correct, failing);
- parameter-free, i.e., it automatically determines a possibly small number of intervals that are homogeneous with respect to the class.

(a) Exemplar call data.
Exec.   p1   p2   r    class
1       2    43   12   correct
1       1    44   11   correct
1       3    4    9    correct
2       12   33   8    failing
3       23   27   6    failing
3       15   28   5    failing
3       16   23   7    failing
4       6    2    10   correct
4       11   47   13   correct

(b) Intervals generated.
Value   Intervals
p1      i1: [1; 11.5]   i2: (11.5; 23]
p2      i1: [2; 13.5]   i2: (13.5; 38]   i3: (38; 47]
r       i1: [5; 8.5]    i2: (8.5; 13]

(c) Discretised data.
Exec.   p1   p2   r
1       i1   i3   i2
1       i1   i3   i2
1       i1   i1   i2
2       i2   i2   i1
3       i2   i2   i1
3       i2   i2   i1
3       i2   i2   i1
4       i1   i1   i2
4       i1   i3   i2
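The following sketch does not implement CAIM itself. It only applies the intervals from table (b) to the call data from table (a), which reproduces table (c). The names `BOUNDS` and `discretise` are hypothetical, and values outside the listed ranges are not handled.

```python
from bisect import bisect_left

# Upper bounds of the intervals from table (b).
# Interval k covers (BOUNDS[k-2], BOUNDS[k-1]], so upper bounds are inclusive.
BOUNDS = {"p1": [11.5, 23], "p2": [13.5, 38, 47], "r": [8.5, 13]}

def discretise(attribute, value):
    """Map a value to its interval label, e.g. 'i2'."""
    return "i" + str(bisect_left(BOUNDS[attribute], value) + 1)

# (p1, p2, r) rows from table (a), in the same order.
rows = [
    (2, 43, 12), (1, 44, 11), (3, 4, 9),
    (12, 33, 8),
    (23, 27, 6), (15, 28, 5), (16, 23, 7),
    (6, 2, 10), (11, 47, 13),
]
discretised = [
    (discretise("p1", p1), discretise("p2", p2), discretise("r", r))
    for p1, p2, r in rows
]
for row in discretised:
    print(row)
```

All calls from failing executions land in (i2, i2, i1), while no call from a correct execution does, which is what makes the discretised values useful as class-aware features.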

Dataflow-Enabled Call Graphs (DEC Graphs)
How to represent discretised dataflow data in call graphs?
Attach tuples of weights (i.e., counters) to edges:
$(t;\ p_1^{i_1}, \dots, p_1^{i_{n_1}};\ \dots;\ p_m^{i_1}, \dots, p_m^{i_{n_m}};\ r^{i_1}, \dots, r^{i_{n_r}})$
- $t$: total number of calls (as before)
- $p_1, \dots, p_m, r$: parameter values and return value
- $i_1, \dots, i_{n_x}$: intervals of the parameter/return values
[Figure: DEC graph with the methods void main(), void a(), void b() and int c(int p1, int p2); edge weights 52, 4 and 2, and the tuple (1, 0, 1, 0, 1, 0, 1, 0) on the edge into c]

Discretised data:
Exec.   p1   p2   r
1       i1   i3   i2
1       i1   i3   i2
1       i1   i1   i2
2       i2   i2   i1
3       i2   i2   i1
3       i2   i2   i1
3       i2   i2   i1
4       i1   i1   i2
4       i1   i3   i2

Example tuple layout for calls of c: $(t;\ p_1^{i_1}, p_1^{i_2};\ p_2^{i_1}, p_2^{i_2}, p_2^{i_3};\ r^{i_1}, r^{i_2})$
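A small sketch of how such a tuple could be assembled from the discretised calls of one edge in one execution: the first element counts all calls, and each further element counts the calls whose parameter or return value fell into the corresponding interval. The names `INTERVALS` and `dec_tuple` are hypothetical; the interval labels follow the example above.

```python
# Interval labels per parameter/return value, in tuple order.
INTERVALS = {"p1": ["i1", "i2"], "p2": ["i1", "i2", "i3"], "r": ["i1", "i2"]}

def dec_tuple(calls):
    """Build (t; p1 counters; p2 counters; r counters) for one edge.

    calls: list of dicts mapping 'p1', 'p2', 'r' to an interval label.
    """
    counters = [len(calls)]  # t: total number of calls
    for attribute, labels in INTERVALS.items():
        for label in labels:
            counters.append(sum(1 for call in calls if call[attribute] == label))
    return tuple(counters)

# Execution 2: a single call with p1 in i2, p2 in i2, r in i1.
exec2 = [{"p1": "i2", "p2": "i2", "r": "i1"}]
# Execution 1: three calls, as in the discretised data.
exec1 = [
    {"p1": "i1", "p2": "i3", "r": "i2"},
    {"p1": "i1", "p2": "i3", "r": "i2"},
    {"p1": "i1", "p2": "i1", "r": "i2"},
]
print(dec_tuple(exec2))
print(dec_tuple(exec1))
```

For execution 2 this yields (1, 0, 1, 0, 1, 0, 1, 0), the tuple shown on the edge into c in the figure.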

Defect Localisation with DEC Graphs
1. Assemble classified DEC graphs for all executions.
2. Perform frequent subgraph mining (CloseGraph). We use subgraphs as contexts to detect defects that occur in certain situations only.
3. Assemble a feature table with all tuple elements from all edges in all frequent subgraphs, each as a separate column.
4. Derive a ranking of the columns (and thus the methods) with entropy-based feature selection (GainRatio).
Various optimisations.
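Step 4 can be sketched with a plain gain-ratio computation (information gain divided by the entropy of the column's own value distribution). This sketch treats every distinct counter value as its own category, which is a simplification and an assumption; a real feature-selection tool would typically discretise numeric columns first. The feature table is derived from the four example executions of the CAIM slide, and the column names are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(items):
    """Shannon entropy (in bits) of a list of hashable items."""
    n = len(items)
    return -sum((c / n) * log2(c / n) for c in Counter(items).values())

def gain_ratio(values, labels):
    """Information gain of a categorical column divided by its split information."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    split_info = entropy(values)
    if split_info == 0:  # column has a single value and cannot discriminate
        return 0.0
    return (entropy(labels) - remainder) / split_info

# One DEC tuple per execution for the edge into c (executions 1 to 4).
names = ["t", "p1_i1", "p1_i2", "p2_i1", "p2_i2", "p2_i3", "r_i1", "r_i2"]
tuples = [
    (3, 3, 0, 1, 0, 2, 0, 3),  # execution 1, correct
    (1, 0, 1, 0, 1, 0, 1, 0),  # execution 2, failing
    (3, 0, 3, 0, 3, 0, 3, 0),  # execution 3, failing
    (2, 2, 0, 1, 0, 1, 0, 2),  # execution 4, correct
]
labels = ["correct", "failing", "failing", "correct"]

scores = {
    name: gain_ratio([row[k] for row in tuples], labels)
    for k, name in enumerate(names)
}
for name, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{name}: {score:.3f}")
```

In this toy table the total call count t scores lowest, while every dataflow counter scores higher, in line with the observation that a dataflow-affecting bug need not change call frequencies.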

Evaluation of Defect Localisation with DEC Graphs
- A data-centric application: Weka (class DecisionStump).
- 16 instrumented defects, similar to related evaluations. Most dataflow-affecting bugs indirectly affect the call graph.
- 90 executions of every version with UCI data.
- The result table [not preserved] contains the ranking position, i.e., the number of methods one has to inspect to find a defect.
- With improvements, only 1.5 methods on average (out of 30 actually executed methods).

Conclusions
A domain-specific data-mining problem: software-defect localisation.
Contributions:
- A domain- and problem-specific data representation: DEC graphs (discretisation).
- An analysis process for DEC graphs (subgraph mining, feature selection).
- Localization of dataflow-affecting bugs in addition to other bug classes.
Future improvements:
- Dataflows through global variables.
- Scalable analysis of large software projects.
- Parallel programs.

Acknowledgements
Thanks to Frank Eichinger for letting me use his slides.