Data Mining for Manufacturing: Preventive Maintenance, Failure Prediction, Quality Control



Similar documents
Standardization of Components, Products and Processes with Data Mining

FREQUENT PATTERN MINING FOR EFFICIENT LIBRARY MANAGEMENT

Healthcare Measurement Analysis Using Data mining Techniques

The Scientific Data Mining Process

not possible or was possible at a high cost for collecting the data.

How To Use Data Mining For Knowledge Management In Technology Enhanced Learning

Data Mining Solutions for the Business Environment

Statistics 215b 11/20/03 D.R. Brillinger. A field in search of a definition a vague concept

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014

TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM

Machine Learning and Data Mining. Fundamentals, robotics, recognition

What is Customer Relationship Management? Customer Relationship Management Analytics. Customer Life Cycle. Objectives of CRM. Three Types of CRM

A STUDY ON DATA MINING INVESTIGATING ITS METHODS, APPROACHES AND APPLICATIONS

Data Mining On Diabetics

Statistics for BIG data

Data Mining and KDD: A Shifting Mosaic. Joseph M. Firestone, Ph.D. White Paper No. Two. March 12, 1997

Applied Mathematical Sciences, Vol. 7, 2013, no. 112, HIKARI Ltd,

Lluis Belanche + Alfredo Vellido. Intelligent Data Analysis and Data Mining

Cost Drivers of a Parametric Cost Estimation Model for Data Mining Projects (DMCOMO)

Cleaned Data. Recommendations

Ethical Issues in Data Mining

Data Mining Analytics for Business Intelligence and Decision Support

International Journal of Advanced Engineering Research and Applications (IJAERA) ISSN: Vol. 1, Issue 6, October Big Data and Hadoop

Data Mining System, Functionalities and Applications: A Radical Review

Three Perspectives of Data Mining

SPATIAL DATA CLASSIFICATION AND DATA MINING

Single Level Drill Down Interactive Visualization Technique for Descriptive Data Mining Results

Predicting the Risk of Heart Attacks using Neural Network and Decision Tree

Example application (1) Telecommunication. Lecture 1: Data Mining Overview and Process. Example application (2) Health

What is Data Mining? Data Mining (Knowledge discovery in database) Data mining: Basic steps. Mining tasks. Classification: YES, NO

Knowledge Discovery from patents using KMX Text Analytics

An Introduction to Data Mining

Mobile Phone APP Software Browsing Behavior using Clustering Analysis

Using Semantic Data Mining for Classification Improvement and Knowledge Extraction

EFFICIENT DATA PRE-PROCESSING FOR DATA MINING

Data Quality Mining: Employing Classifiers for Assuring consistent Datasets

Dynamic Data in terms of Data Mining Streams

Introduction. A. Bellaachia Page: 1

Traffic Prediction and Analysis using a Big Data and Visualisation Approach

Introduction to Data Mining and Machine Learning Techniques. Iza Moise, Evangelos Pournaras, Dirk Helbing

APPLICATION OF DATA MINING TECHNIQUES FOR BUILDING SIMULATION PERFORMANCE PREDICTION ANALYSIS.

An Overview of Knowledge Discovery Database and Data mining Techniques

Financial Trading System using Combination of Textual and Numerical Data

Explanation-Oriented Association Mining Using a Combination of Unsupervised and Supervised Learning Algorithms

ECLT 5810 E-Commerce Data Mining Techniques - Introduction. Prof. Wai Lam

From Raw Data to. Actionable Insights with. MATLAB Analytics. Learn more. Develop predictive models. 1Access and explore data

Towards applying Data Mining Techniques for Talent Mangement

Evaluating an Integrated Time-Series Data Mining Environment - A Case Study on a Chronic Hepatitis Data Mining -

Using big data in automotive engineering?

Data, Measurements, Features

Manufacturing Analytics: Uncovering Secrets on Your Factory Floor

Maschinelles Lernen mit MATLAB

Rule based Classification of BSE Stock Data with Data Mining

A Review of Data Mining Techniques

ASSOCIATION RULE MINING ON WEB LOGS FOR EXTRACTING INTERESTING PATTERNS THROUGH WEKA TOOL

Use of Data Mining in the field of Library and Information Science : An Overview

A NEW DECISION TREE METHOD FOR DATA MINING IN MEDICINE

Extend Table Lens for High-Dimensional Data Visualization and Classification Mining

GEO-VISUALIZATION SUPPORT FOR MULTIDIMENSIONAL CLUSTERING

Data Mining. Knowledge Discovery, Data Warehousing and Machine Learning Final remarks. Lecturer: JERZY STEFANOWSKI

A Spatial Decision Support System for Property Valuation

How To Use Data Mining For Loyalty Based Management

Introduction. Background

How can we discover stocks that will

Data Isn't Everything

How To Solve The Kd Cup 2010 Challenge

What is Data Mining, and How is it Useful for Power Plant Optimization? (and How is it Different from DOE, CFD, Statistical Modeling)

Random forest algorithm in big data environment

BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES

Certificate Program in Applied Big Data Analytics in Dubai. A Collaborative Program offered by INSOFE and Synergy-BI

Threat intelligence visibility the way forward. Mike Adler, Senior Product Manager Assure Threat Intelligence

Data Mining and Exploration. Data Mining and Exploration: Introduction. Relationships between courses. Overview. Course Introduction

Interactive Exploration of Decision Tree Results

Social Media Mining. Data Mining Essentials

Miracle Integrating Knowledge Management and Business Intelligence

Data Mining Framework for Direct Marketing: A Case Study of Bank Marketing

Database Marketing, Business Intelligence and Knowledge Discovery

BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL

Data Mining Applications in Fund Raising

IMPROVING DATA INTEGRATION FOR DATA WAREHOUSE: A DATA MINING APPROACH

MEDICAL DATA MINING. Timothy Hays, PhD. Health IT Strategy Executive Dynamics Research Corporation (DRC) December 13, 2012

CS Introduction to Data Mining Instructor: Abdullah Mueen

EMPIRICAL STUDY ON SELECTION OF TEAM MEMBERS FOR SOFTWARE PROJECTS DATA MINING APPROACH

Text Mining: The state of the art and the challenges

Gerard Mc Nulty Systems Optimisation Ltd BA.,B.A.I.,C.Eng.,F.I.E.I

INVESTIGATIONS INTO EFFECTIVENESS OF GAUSSIAN AND NEAREST MEAN CLASSIFIERS FOR SPAM DETECTION

Chapter ML:XI. XI. Cluster Analysis

Why do statisticians "hate" us?

Data Mining Applications in Higher Education

Performance Study on Data Discretization Techniques Using Nutrition Dataset

Transcription:

Data Mining for Manufacturing: Preventive Maintenance, Failure Prediction, Quality Control Andre BERGMANN Salzgitter Mannesmann Forschung GmbH; Duisburg, Germany Phone: +49 203 9993154, Fax: +49 203 9993234; e-mail: a.bergmann@du.szmf.de Abstract Manufacturing enterprises have been collecting and storing more and more current, detailed and accurate production relevant data. The data stores offer enormous potential as source of new knowledge, but the huge amount of data and its complexity far exceeds the ability to reduce and analyze data without the use of automated analysis techniques. This paper provides a brief introduction into knowledge discovery from databases and presents the methology for data mining in time series. The relevancy of data mining for manufacturing shall be depicted. Keywords: data mining, time series, manufacturing, knowledge discovery 1. Introduction Virtually all manufacturing enterprises use powerful data acquisition systems to collect, analyze and transfer data from nearly all the processes of the organisation. This data may be related to machines, products, processes, maintenance, quality control, failure detection, etc. and is typically stored in databases at various stages. The use of databases and statistical techniques are well established in engineering, because intelligently analyzed data is a valuable resource, since it gains new insights and can provide significant competitive advantage. Traditionally statistical techniques were used in order to find patterns in manufacturing data. But machines with growing complexity and sensor equipment as well as recent proceedings in information technology, data acquisition systems and storage technology lead to an overwhelming amount of data, which is permanently increasing and contains hundreds of attributes. In order to model the system s behaviour accurately, these attributes have to be considered simultaneously. This complexity calls for new techniques and tools for the automated extraction of useful knowledge from huge amounts of raw data. Data Mining is about solving these problems by applying mathematical models to automatically discover patterns in data already present in databases. Data Mining is thereby one step in Knowledge Discovery in Databases (KDD), which denotes the entire process of turning low-level data into useful knowledge [1,2]. This process often includes the following important stages: The first step involves the understanding of the application domain, which is of great importance especially when analyzing manufacturing data. For successful generation of new knowledge a close collaboration of domain experts, data experts and data mining experts is needed. The goals and tasks of the data mining process have to be determined and all the factors, which might affect the manufacturing process under study, should be revealed and understood. The second step includes selecting the target data set. Since data mining can only uncover patterns already present in the data, the target dataset must be large enough to contain these patterns while remaining concise enough to be mined in an acceptable timeframe. The data sets are often stored in various databases which requires additional integration. Next, the data sets have to be preprocessed, comprising inter alia transformation, handling missing data and removing noise. The fourth step finally is data mining for the extraction of patterns from the data. This involves the selection and application of appropriate

(mathematical) data mining algorithms as well as the development of a model, which describes the pattern. This step is often accompanied by traditional statistical analysis and data visualisation. In a last step the extracted patterns have to be interpreted and verified. Here the evaluation from the domain experts is essential to really transfer the patterns into new knowledge. Usually some of the KDD steps need to be iterated several times to finally reach this goal. Once useful patterns are found and described, they allow to make (nontrivial) predictions on new data. Depending on the data and the intended outcome of the overall data mining process two main goals can be distinguished: Regression is concerned of computing an expression that predicts a numeric quantity, in classification the outcome to be predicted is a discrete class. A wide range of data mining techniques is available to serve these goals, each with its own advantages and disadvantages. Some of the most active applications of data mining have been in the area of marketing and sales for example customer relationship management or market basket analysis. In recent years it is widely used in bioinformatics, genetics and medicine. The use of data mining techniques in manufacturing began in the 1990s [3-5] and is currently a field of growing interest. In the following some typical data mining applications for manufacturing shall be presented. We place special emphasis on Data Mining in time series, since this is the common raw data output of a lot of sensors used in production settings to monitor the process, e.g. audio signals taken from a machine or measurements like temperature, pressure or tension. 2. Predicting Future Values Figure 1: Regression of time series.

Forecasting is of key importance in process and manufacturing engineering, since the prediction of component malfunctions based on the data collected from sensors attached at machines can save money due to prevented consequential damages. For the prediction of values powerful data mining regression techniques are available. Figure 1 shows a time series measured by a sensor together with a prediction based on the so-called support vector machines method, a famous and mighty data mining technique. Although the prediction is well reproducing the middle of the data, there are deviations at the end of the time series. This leads to the conclusion that regression alone is not sufficient to look into the future, since the model will only work well on parts of the data it has already seen. The reason is a wrong scenario: In reality, i.e. the application in the plant, only the first part of the measurement would be seen and the future should be predicted! In order to develop a more realistic scenario, a technique called windowing can be used, whose basic idea is explained in Figure 2. Figure 2: Windowing. The time series is divided into parts (windows) and the values inside the window are used to predict a value outside the window after a given time horizon. This procedure creates the first example of a new data set. By moving the window stepwise and holding the horizon constant, the whole data set to predict the future is created. The new data set with the historical data taken from the windows and the future value (compared to the window) can now be used for regression. The prediction will be a prediction for the future after the horizon has passed. The prediction performance usually decreases with larger prediction horizon.

Figure 3: Forecasting of time series. Figure 3 shows an example for application and explains the advantages of forecasting related to condition monitoring: Sensors, measuring the force on a machine detect an increase. A condition monitoring system cannot predict the future making it impossible to judge, if the increase is going to be critical and when. This drawback can be overcome by creating a model for the forecasting of the time series using the data mining techniques explained above. It is clearly visible that the force will stabilize and nothing has to be done. Figure 4: Principle of Data Mining system in the plant. An implementation in the plant could be realized like sketched in Figure 4. Sensors attached to the machines monitor the process. These sensors may measure all kind of technical time series, e.g. vibration, temperature, force, tension etc. An interface controls the data acquisition and stores the time series in a data base (DB). This is the

input data source for the data mining software, which computes the predictions and visualizes them for the user. 3. Identifying Machine Failure Beside predicting a numeric value of a time series, there is often the need to classify complete time series in order to perform a forecasting, e.g. to predict a machine failure. Imagine a saw cutting pieces. During each sawing cycle the vibrations of the saw are measured by a sensor and depending on these vibrations the question is: Will the saw break? One could simply use the first half of the data (before the crash) and create a model on that. This will, however, not result in a good prediction performance, since the crucial aspect lies in the structure of the series, not in the concrete values. Another big problem when mining in time series is the high dimensionality. Thus feature extraction should be applied to compress the time series, keeping the important information while removing noise and correlations. Not only the algorithm will speed up, it is often the only way to get good prediction results on complex data streams. For feature extraction there is a large amount of methods known from time series handling, including the discrete Fourier Transform (DFT), the Discrete Wavelet Transform (DWT), functionals like average, minimum, maximum, windowing and their combinations. Figure 5: Crash prediction using feature extraction. In Figure 5 an example for the case of predicting the crash of a saw is depicted. The time series are measured by a vibrational sensor. The blue-colored one is measured during a normal process, the red one right before a crash. To reduce the dimensionality and to discard noise, some features were extracted using e.g. DFT. In the feature space (right hand side) the two cases crash (red)/no crash (blue) can be classified easily. 4. Summary & Outlook Knowledge discovery in Databases (KDD) has a great potential in manufacturing, since it turns the raw data into useful knowledge. It can give deeper insights into the processes, allows for the prediction of machine failure and for preventive maintenance. Thus KDD can support quality and reduce costs due to damages or loss of production. There is a growing interest and a recent trend to use data mining [6] for manufacturing. On the other hand side it should be pointed out that data mining in manufacturing is a demanding task because of a lot of factors like complexity of the processes, (missing)

quality of the data and the need to develop precise models with a very high prediction performance. References 1. U.M. Fayyad, G. Piatetsky-Shapiro, and P. Smyth, From Data Mining To Knowledge Discovery: An Overview. In: Advances In Knowledge Discovery And Data Mining, eds. U.M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, AAAI Press/The MIT Press, Menlo Park, CA., pp. 1-34, 1996. 2. W.J. Frawley, G. Piatetsky-Shapiro, and C. Matheus, Knowledge Discovery In Databases: An Overview. In: Knowledge Discovery In Databases, eds. G. Piatetsky-Shapiro, and W. J. Frawley, AAAI Press/MIT Press, Cambridge, MA., pp. 1-30, 1991. 3. M.H. Lee, Knowledge Based Factory, Artif. Intell. Eng., 8, pp109-125,1993. 4. K.B. Irani, J. Cheng, U.M. Fayyad, and Z. Quian, Applying Machine Learning to Semiconductor Manufacturing, IEEE Expert, 8(1), pp. 41-47, 1993. 5. G. Piatetsky-Shapiro, The Data Mining Industry Coming of Age, IEEE Intell. Syst., 14(6), pp. 32-34, 1999. 6. J.A. Harding, M. Shahbaz, Srnivas, and A. Kusiak, Data Mining in Manufacturing: A Review, J. Man. Sci. Eng., 128, pp. 969-976, 2006.