Integration of Process Simulation and Data Mining Techniques for the Analysis and Optimization of Process Systems. Balazs Balasko




Theses of the doctoral (PhD) dissertation

Integration of Process Simulation and Data Mining Techniques for the Analysis and Optimization of Process Systems

Balazs Balasko

University of Pannonia
PhD School of Chemical and Material Engineering Science

Supervisors: Janos Abonyi, PhD; Sandor Nemeth, PhD

Department of Process Engineering
Veszprem, 2009

1 Introduction and aim of the work

Customer satisfaction and the economic challenges of modern technologies call for continuous optimization in every field of life. In the chemical industry, products with tailored quality values have to be produced while specific costs are kept at a minimum. Owing to its high level of automation, the chemical industry can provide large amounts of data for these optimization purposes. Unfortunately, as mountains of data become available, it becomes even harder to find approaches to process and analyze such amounts of information, although revealing the underlying structure of the analyzed system clearly has potential for improvement. The problem arises from the fact that information sources are distributed across the company and there is no unifying framework into which all these sources could be integrated.

From another point of view, future chemical engineering tasks are characterized by the challenge of continuously improving design, modeling and control techniques, and thus improving the efficiency, effectiveness and reliability of all chemical engineering activities. None of these can be managed without the exhaustive application of process data, process models and (a priori and extracted) knowledge about the analyzed system. All these tools need to be applied in an integrated way, centered around an integrated information environment.

From the above it follows that there is always a need for systematic tools that help to integrate information sources and techniques in order to serve improvement and optimization purposes. Leading chemical companies like DuPont, Dow Chemical or Bayer Technology Services hold that the model integrates the organization, and as such their approach leads to the concept of life-cycle modeling. This concept deals with continuous, vertical and horizontal knowledge and information transfer across the whole company, centered around models of different scales; i.e., it is based on hierarchical, multi-scale models where, at each level, the appropriate model with the right information content is used.

The original aim of my thesis was to develop tools and algorithms for process data analysis of multi-product systems within a research project of Tisza Chemical Group and the Cooperative Research Center of Chemical Engineering Institute, but it was later expanded with a solution approach to the above problem: to establish an integrated information environment that collects and stores data in a consistent way from heterogeneous sources and to which the developed tools and algorithms can be attached. Regarding these purposes, the thesis provides contributions in the following scientific areas: semi-mechanistic modeling, dynamic simulation, semi-qualitative trend analysis and optimal experiment design. Most of the presented solutions lie at the intersection of engineering and informatics, while all of them have some bio-inspired elements (neural network model, sequence alignment, evolutionary strategy), suggesting that novel methodologies that exploit synergies between different fields of science in this way can be expected to arise in the future.

2 Experimental tools and methodologies

The presented thesis contributes to modeling, simulation, data analysis and experimentation; hence, during implementation, techniques of the process engineering and data mining communities were applied and improved. The central data warehouse was realized in a MySQL database; data transfer from the Process History Database module of the analyzed technology was managed by MS Excel. All models and algorithms were implemented in the MATLAB and Simulink software environment with an ODBC data warehouse connection. For particular solutions, the MATLAB extensions SOM Toolbox, Statistics Toolbox, Data analysis Toolbox and Bioinformatics Toolbox were applied.

3 Theses

1. A process simulator achieved by integrating a historical-data-based process data warehouse with models of the process and its control system effectively supports the analysis and improvement of operating technologies. (Related publications: 6, 8, 12, 17, 19, 21)

(a) It has been shown that, in the near future, process improvement and optimization should rely only on dedicated solutions in which information sources are consistent and accessible, and the data mining and simulation tools work in an integrated way in order to process effectively all the data, models and knowledge of the system. As the center of such an integrated framework, I have established a process data warehouse for a Spheripol-type technology and connected models of the technology, its control system model and data mining tools via a graphical user interface.

(b) To demonstrate the effectiveness of such integrated information systems, I have developed a prototype of a process simulator attached to the process data warehouse for a multi-product polymer producing plant. The simulator is built in a semi-mechanistic way based on multi-scale hierarchical models of the technology and its basic and advanced control (APC) system. As the operating APC was originally developed for steady-state operation and dynamic state transitions (product changes) are frequent, the simulator was structured so that it can simulate transition strategies and thus test and qualify them as well.

(c) I have successfully applied the above prototype system for the estimation of product quality by a new semi-mechanistic product model extension and for the extraction of the cost-energy relation based on boxplots and quantile-quantile plots.

2. Time series analysis based on symbolic segmentation is well suited to comparing process data trends. (Related publications: 6, 8, 12, 17, 19, 21)

(a) Industrial data acquisition systems collect and store large numbers of time series of process variables, and following and judging these time series is a complicated task even for specialists of the given technology: when comparing two trends, both subjective (experience) and objective (distance measure) elements are needed. I have proposed a semi-qualitative solution in which time series are segmented and transformed into symbolic sequences, and these are compared by global sequence alignment, a technique from bioinformatics. Time series are treated as amino acid sequences, and in this way this well-known and widely applied technique could be adopted in the field of process engineering.

(b) The developed tool has been extended with a filtering function in order to be able to process noisy raw data inputs and has been successfully applied to qualify and group product transitions of the polymer producing technology. A real advantage of my solution is that, unlike unsupervised methods (clustering and classification), it allows the experience-based knowledge of the operators to be explicitly incorporated into the segmentation process, and hence it supports the comparison of these trends.

3. Optimal experiment design supported by evolutionary strategy is an effective tool for iterative and interactive model development and parameter identification tasks. (Related publications: 1, 2, 10)

The central question of the sequential experiment design method is how to select the input profile or time series of a system during the iterative model development phase so that the system outputs are most informative regarding the model parameters. This problem can be solved by an iterative-sequential method called optimal experiment design (OED), in which the applied extremum-searching algorithm plays a key role. The original algorithm was further developed in two respects: (i) I have shown that applying an evolutionary strategy at these steps improves efficiency, while (ii) collecting previous results in a database (data warehouse) and using their outcome in the current experiment further improves the parameter identification process. In this way, model development and parameter identification can be managed with less effort and higher reliability.
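The trend comparison described in thesis 2 can be illustrated with a minimal Python sketch. It is not the implementation used in the thesis (which was written in MATLAB); the fixed-window segmentation, the three-symbol alphabet (`U`, `S`, `D`), the tolerance and the scoring values are all assumptions chosen for illustration. The alignment itself is the standard Needleman-Wunsch global alignment.

```python
from typing import List, Sequence


def to_symbols(series: Sequence[float], window: int = 3, tol: float = 0.1) -> str:
    """Map each fixed-length window to 'U' (up), 'D' (down) or 'S' (steady).

    The window length and tolerance are illustrative assumptions; in practice
    they are where operator knowledge could enter the segmentation.
    """
    symbols: List[str] = []
    for start in range(0, len(series) - window + 1, window):
        change = series[start + window - 1] - series[start]
        if change > tol:
            symbols.append("U")
        elif change < -tol:
            symbols.append("D")
        else:
            symbols.append("S")
    return "".join(symbols)


def global_alignment_score(a: str, b: str, match: int = 2,
                           mismatch: int = -1, gap: int = -2) -> int:
    """Needleman-Wunsch global alignment score of two symbolic sequences."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against the empty sequence costs one gap per symbol.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    return score[-1][-1]


# Two hypothetical transition trends: a ramp-hold-ramp and a shorter ramp-ramp.
trend_a = to_symbols([0, 1, 2, 2, 2, 2, 2, 1, 0])
trend_b = to_symbols([0, 1, 2, 2, 1, 0])
similarity = global_alignment_score(trend_a, trend_b)
```

A higher score means the two symbolic trends are more similar; pairwise scores of this kind could then feed a grouping of product transitions.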
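The role of the evolutionary strategy in thesis 3 can be sketched as follows. This is a hedged illustration, not the algorithm of the thesis: the first-order discrete-time model `y[k+1] = a*y[k] + b*u[k]`, its nominal parameter values, the D-optimality criterion (determinant of the Fisher information matrix with unit noise variance), the input bounds and the simple (1+λ) strategy with a success-based step-size rule are all assumptions. The data-warehouse reuse of earlier results, point (ii) of the thesis, is not covered.

```python
import random
from typing import List, Tuple


def d_criterion(u: List[float], a: float = 0.8, b: float = 0.5) -> float:
    """Determinant of the 2x2 Fisher information matrix for the assumed model
    y[k+1] = a*y[k] + b*u[k], starting from y[0] = 0 with unit noise variance."""
    y = s_a = s_b = 0.0          # output and its sensitivities w.r.t. a and b
    m11 = m12 = m22 = 0.0        # entries of the information matrix
    for u_k in u:
        # Sensitivity recursion; the right-hand side uses the previous y.
        s_a, s_b = y + a * s_a, a * s_b + u_k
        y = a * y + b * u_k
        m11 += s_a * s_a
        m12 += s_a * s_b
        m22 += s_b * s_b
    return m11 * m22 - m12 * m12


def evolve_input(n_steps: int = 20, offspring: int = 10, generations: int = 50,
                 sigma: float = 0.3, seed: int = 0) -> Tuple[List[float], float, float]:
    """(1+lambda) evolution strategy searching for an informative input profile.

    Returns the best profile, its criterion value and the criterion value of
    the random starting profile.
    """
    rng = random.Random(seed)
    parent = [rng.uniform(-1.0, 1.0) for _ in range(n_steps)]
    parent_fit = d_criterion(parent)
    initial_fit = parent_fit
    for _ in range(generations):
        best_child, best_fit = parent, parent_fit
        for _ in range(offspring):
            # Gaussian mutation, clipped to the assumed input bounds [-1, 1].
            child = [min(1.0, max(-1.0, x + rng.gauss(0.0, sigma))) for x in parent]
            fit = d_criterion(child)
            if fit > best_fit:
                best_child, best_fit = child, fit
        # Elitist selection with a simple success-based step-size adaptation.
        if best_fit > parent_fit:
            parent, parent_fit = best_child, best_fit
            sigma *= 1.2
        else:
            sigma *= 0.8
    return parent, parent_fit, initial_fit


best_u, best_fit, initial_fit = evolve_input()
```

Because selection is elitist, the returned profile is never less informative (in the D-optimal sense) than the random starting profile; a gradient-free search of this kind is attractive when the criterion has many local optima.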

4 Publications related to theses

Articles in international journals

1. B. Balasko, J. Madar, F. Szeifert, J. Abonyi, Evolutionary Strategy in Iterative Experiment Design, Hungarian Journal of Industrial Chemistry, Special issue on Recent Advantages on Process Engineering, Vol. 33, Nr. 1-2, 2005
2. B. Balasko, J. Madar, J. Abonyi, Additive Sequential Evolutionary Design of Experiments, Lecture Notes in Computer Science, Artificial Intelligence and Soft Computing ICAISC 2006
3. B. Balasko, S. Nemeth, A. Janecska, T. Nagy, G. Nagy, J. Abonyi, Process modeling and simulation for optimization of operating processes, Computer Aided Chemical Engineering, Vol. 24, pp. 895-900, 2007
4. B. Balasko, S. Nemeth, G. Nagy, J. Abonyi, Integrated Process and Control System Model for Product Quality Control - Application to a Polypropylene Plant, Chemical Product and Process Modeling, Vol. 3, Iss. 1, Article 50, 2008
5. B. Balasko, J. Abonyi, What happens to process data in chemical industry: From source to applications - An Overview, Hungarian Journal of Industrial Chemistry, Vol. 35, pp. 75-84, 2007
6. B. Balasko, J. Abonyi, Symbolic Representation based Qualitative Trend Analysis for Process Transition Qualification and Visualization, Engineering Applications of Artificial Intelligence, 2009, submitted

Articles in Hungarian journals

7. B. Balaskó, S. Németh, J. Abonyi, Működő technológia optimalizálása az irányító rendszer modelljének felhasználásával [Optimization of an operating technology using the model of the control system], Acta Agraria Kaposvariensis, Vol. 10, Nr. 3, pp. 201-209, 2006
8. B. Balaskó, S. Nemeth, J. Abonyi, Time-Series Similarity - Application to Qualitative Process Trend Analysis, Acta Agraria Kaposvariensis, 2007

Refereed presentations

9. F. P. Pach, B. Balasko, S. Nemeth, P. Arva, J. Abonyi, Black-Box and First-Principle Model Based Optimization of Operating Technologies, In proc. of 5th MATHMOD Conference, Vienna, 2006
10. B. Balasko, J. Madar, J. Abonyi, Additive Sequential Evolutionary Design of Experiments, 8th International Conference on Artificial Intelligence and Soft Computing, Zakopane, 2006
11. B. Balasko, S. Nemeth, J. Abonyi, Process Modeling and Simulation for Optimization of Operating Processes, 17th European Symposium on Computer Aided Process Engineering, Bucharest, 2007
12. B. Balasko, Z. Banko, J. Abonyi, Analyzing Trends by Symbolic Episode Representation and Sequence Alignment, In proc. of 15th Mediterranean Conference on Automation and Control, Athens, 2007
13. B. Balasko, S. Nemeth, J. Abonyi, Application of integrated process and control system model for simulation and improvement of an operating technology, In proc. of 6th European Congress of Chemical Engineers, Copenhagen, 2007
14. L. Dobos, B. Balasko, S. Nemeth, J. Abonyi, Energy and resource saving at operating plants based on the analysis of historical process data, In proc. of Early-Stage Energy Technologies for Sustainable Future: Assessment Development, Application - EMINENT 2, Veszprém, 2008
15. B. Balasko, S. Nemeth, J. Abonyi, Integrated Process and Control System Model for Product Quality Control - a Soft-sensor based Application, In proc. of European Control Conference, Budapest, 2009, accepted

Non-refereed presentations

16. J. Abonyi, B. Balaskó, F. P. Pach, B. Feil, S. Németh, P. Árva, Adatbányászat működő technológiák optimálásában [Data mining in the optimization of operating technologies], Adatbányászati alkalmazások perspektívái, Veszprém, 2005

17. B. Balaskó, S. Németh, J. Abonyi, Epizód alapú adatelemzési technika technológia-üzemeltetési stratégiák elemzésére [Episode-based data analysis technique for the analysis of technology operation strategies], 34. Műszaki Kémiai Napok, Veszprém, 2006
18. B. Balaskó, S. Németh, J. Abonyi, Működő technológia optimalizálása az irányító rendszer modelljének felhasználásával [Optimization of an operating technology using the model of the control system], V. Alkalmazott Informatika Konferencia, Kaposvár, 2006
19. B. Balasko, S. Nemeth, J. Abonyi, Qualitative Analysis of Segmented Time Series by Sequence Alignment, 7th International Conference of Hungarian Researchers on Computational Intelligence, Budapest, 2006
20. B. Balasko, S. Nemeth, J. Abonyi, Hierarchical clustering of product transition strategies based on symbolic trend representation in a multiproduct process, 35. Műszaki Kémiai Napok, Veszprém, 2007
21. B. Balasko, S. Nemeth, J. Abonyi, Time-Series Similarity - Application to Qualitative Process Trend Analysis, VI. Alkalmazott Informatika Konferencia, Kaposvár, 2007

Other

22. B. Feil, B. Balasko, J. Abonyi, Visualization of Fuzzy Clusters by Fuzzy Sammon Mapping Projection - Application to the Analysis of Phase Space Trajectories, Soft Computing - A Fusion of Foundations, Methodologies and Applications, Vol. 11, Nr. 5, pp. 479-489, 2007
23. T. Kenesei, B. Balasko, J. Abonyi, A MATLAB Toolbox and its Web-based Variant for Fuzzy Cluster Analysis, 7th International Conference of Hungarian Researchers on Computational Intelligence, Budapest, 2006