Theses of the doctoral (PhD) dissertation

Integration of Process Simulation and Data Mining Techniques for the Analysis and Optimization of Process Systems

Balazs Balasko

University of Pannonia
PhD School of Chemical and Material Engineering Science

Supervisors: Janos Abonyi, PhD and Sandor Nemeth, PhD

Department of Process Engineering
Veszprem, 2009
1 Introduction and aim of the work

Customer satisfaction and the economic challenges of modern technologies call for continuous optimization in every field of life. In the chemical industry, products with tailored quality have to be produced at minimal specific costs. Owing to its high level of automation, the chemical industry can provide large amounts of data for such optimization. Unfortunately, as mountains of data become available, it becomes ever harder to find approaches to process and analyze this information, even though becoming familiar with the underlying structure of the analyzed system through these data could clearly reveal potential for improvement. The problem arises from the fact that information sources are distributed across the company and there is no unifying framework into which all these sources could be integrated.

From another point of view, future chemical engineering tasks are characterized by the challenge of continuously improving design, modeling and control techniques, and thereby the efficiency, effectiveness and reliability of all chemical engineering activities. None of these can be managed without the exhaustive application of process data, process models and (a priori and extracted) knowledge about the analyzed system. All these tools need to be applied in an integrated way, centered around an integrated information environment. It follows from the above that there is a continuing need for systematic tools that integrate information sources and techniques for improvement and optimization.

Leading chemical companies such as DuPont, Dow Chemical and Bayer Technology Services hold that the model integrates the organization, and their approach leads to the concept of life-cycle modeling. This concept concerns the continuous vertical and horizontal transfer of knowledge and information across the whole company, centered around models of different scales; i.e. it is based on hierarchical, multi-scale models where, at each level, the model with the appropriate information content is used.

The original aim of my thesis was to develop tools and algorithms for the analysis of process data of multi-product systems within a research project of Tisza Chemical Group and the Cooperative Research Center of the Chemical Engineering Institute. Later, the work was expanded with an approach to the problem outlined above: establishing an integrated information environment that collects and stores data consistently from heterogeneous sources and to which the developed tools and algorithms can be attached.

To these ends, the thesis contributes to the following scientific areas: semi-mechanistic modeling, dynamic simulation, semi-qualitative trend analysis and optimal experiment design. Most of the presented solutions lie at the intersection of engineering and informatics, and all of them have bio-inspired elements (neural network model, sequence alignment, evolutionary strategy), which foreshadows that novel methodologies exploiting such synergies between different fields of science can be expected to arise in the future.

2 Experimental tools and methodologies

Since the thesis contributes to modeling, simulation, data analysis and experimentation, techniques from both the process engineering and the data mining communities were applied and improved during implementation. The central data warehouse was realized in a MySQL database; data transfer from the Process History Database module of the analyzed technology was handled by MS Excel. All models and algorithms were implemented in the MATLAB and Simulink environment, connected to the data warehouse via ODBC. For specific solutions, the SOM Toolbox, Statistics Toolbox, Data Analysis Toolbox and Bioinformatics Toolbox extensions of MATLAB were applied.
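A minimal sketch of the data warehouse idea described in Section 2: measurements from heterogeneous sources (for example, historian exports and laboratory results) are stored in one consistent, time-stamped table and retrieved by tag and time window, as a simulator or analysis tool would request them. It uses Python's built-in `sqlite3` module as a stand-in for the MySQL/ODBC setup; the table layout, the function names `load` and `window`, the tag names and the values are hypothetical and do not reproduce the schema of the thesis.

```python
import sqlite3

# Hypothetical long-format schema: one row per (tag, timestamp) measurement.
SCHEMA = """
CREATE TABLE IF NOT EXISTS measurement (
    tag    TEXT    NOT NULL,  -- process variable identifier
    ts     INTEGER NOT NULL,  -- timestamp (e.g. Unix seconds)
    value  REAL,              -- measured value in engineering units
    source TEXT    NOT NULL,  -- origin of the record (historian, laboratory, ...)
    PRIMARY KEY (tag, ts)
)
"""


def load(conn, rows, source):
    """Insert (tag, ts, value) rows from one source; a later load of the same
    (tag, ts) key replaces the earlier row (SQLite INSERT OR REPLACE)."""
    conn.executemany(
        "INSERT OR REPLACE INTO measurement (tag, ts, value, source) VALUES (?, ?, ?, ?)",
        [(tag, ts, value, source) for tag, ts, value in rows],
    )
    conn.commit()


def window(conn, tag, t_start, t_end):
    """Return [(ts, value), ...] for one tag within [t_start, t_end], ordered by time."""
    cur = conn.execute(
        "SELECT ts, value FROM measurement "
        "WHERE tag = ? AND ts BETWEEN ? AND ? ORDER BY ts",
        (tag, t_start, t_end),
    )
    return cur.fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    # Two heterogeneous sources feeding the same table (illustrative values only).
    load(conn, [("TI101", 0, 70.1), ("TI101", 60, 70.4), ("TI101", 120, 70.9)], "historian")
    load(conn, [("MFR_LAB", 60, 12.0)], "laboratory")
    print(window(conn, "TI101", 0, 60))  # [(0, 70.1), (60, 70.4)]
```

The long (tag, timestamp, value) layout is one common choice because new variables or sources can be added without schema changes; a production warehouse would likely add indexing, units and data-quality flags.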
3 Theses

1. A process simulator built by integrating a process data warehouse of historical data with models of the process and its control system effectively supports the analysis and improvement of operating technologies. (Related publications: 6, 8, 12, 17, 19, 21)

(a) It has been shown that effective process improvement and optimization in the near future require dedicated solutions in which information sources are consistent and accessible, and data mining and simulation tools work in an integrated way so that all data, models and knowledge of the system can be processed effectively. As the center of such an integrated framework, I have established a process data warehouse for a Spheripol-type technology and connected it, via a graphical user interface, to models of the technology and its control system and to data mining tools.

(b) To demonstrate the effectiveness of such integrated information systems, I have developed a prototype process simulator, attached to the process data warehouse, for a multi-product polymer plant. The simulator is semi-mechanistic and is based on multi-scale hierarchical models of the technology and of its basic and advanced process control (APC) system. Because the APC in operation was originally developed for steady-state operation, while the plant undergoes frequent dynamic transitions (product changes), the simulator was structured so that it can simulate transition strategies and thereby test and qualify them.

(c) I have successfully applied this prototype system to estimate product quality using a new semi-mechanistic extension of the product model, and to extract cost-energy relations using boxplots and quantile-quantile plots.

2. Time series analysis based on symbolic segmentation is well suited to comparing process data trends. (Related publications: 6, 8, 12, 17, 19, 21)

(a) Industrial data acquisition systems collect and store large amounts of process variable time series, and following and judging these time series is a complicated task even for specialists of the given technology: comparing two trends requires both subjective (experience) and objective (distance measure) elements. I have proposed a semi-qualitative solution in which time series are segmented and transformed into symbolic sequences, which are then compared by global sequence alignment, a technique from bioinformatics. By treating the symbolic sequences like amino acid sequences, this well-known and widely applied technique could be adopted in process engineering.

(b) The developed tool has been extended with a filtering function so that it can process noisy raw data, and it has been successfully applied to qualify and group the product transitions of the polymer production technology. A real advantage of my solution is that, unlike purely data-driven methods such as clustering and classification, the experience of the operators can be explicitly incorporated into the segmentation, which supports the comparison of trends.

3. Optimal experiment design supported by an evolutionary strategy is an effective tool for iterative and interactive model development and parameter identification. (Related publications: 1, 2, 10)

The central question of sequential experiment design is how to select the input profile (input time series) of a system during iterative model development so that the system outputs are as informative as possible about the model parameters. This problem can be solved by an iterative-sequential method called optimal experiment design (OED), in which the applied extremum-seeking (optimization) algorithm plays a key role. I further developed the original algorithm in two respects: (i) I have shown that applying an evolutionary strategy in these optimization steps improves efficiency, and (ii) collecting the results of previous experiments in a database (data warehouse) and using them in the current experiment further improves the parameter identification process. In this way, model development and parameter identification can be carried out with less effort and higher reliability.
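The symbolic trend comparison of Thesis 2 can be illustrated by a minimal sketch: each trend is cut into fixed-width windows, each window is labelled by the sign of its mean slope, and two symbol strings are compared by Needleman-Wunsch global alignment. The window width, the three-letter alphabet ('I' increasing, 'D' decreasing, 'S' steady), the slope threshold and the scoring values are hypothetical choices for illustration; the thesis may use different episode definitions, a filtering step and operator-defined segmentation.

```python
from typing import List, Sequence


def symbolize(x: Sequence[float], width: int = 3, tol: float = 0.05) -> str:
    """Split x into non-overlapping windows of `width` samples and label each window.

    Hypothetical alphabet: 'I' increasing, 'D' decreasing, 'S' steady (|slope| <= tol).
    Trailing samples that do not fill a whole window are ignored.
    """
    if width < 2:
        raise ValueError("width must be at least 2")
    symbols = []
    for start in range(0, len(x) - width + 1, width):
        seg = x[start:start + width]
        slope = (seg[-1] - seg[0]) / (width - 1)  # mean slope per sample
        if slope > tol:
            symbols.append("I")
        elif slope < -tol:
            symbols.append("D")
        else:
            symbols.append("S")
    return "".join(symbols)


def global_alignment_score(a: str, b: str, match: int = 2,
                           mismatch: int = -1, gap: int = -2) -> int:
    """Needleman-Wunsch global alignment score with a linear gap penalty."""
    n, m = len(a), len(b)
    # score[i][j]: best score for aligning a[:i] with b[:j]
    score: List[List[int]] = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned against gaps only
    for j in range(1, m + 1):
        score[0][j] = j * gap  # b[:j] aligned against gaps only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    return score[n][m]


if __name__ == "__main__":
    # Two illustrative transitions: rise-hold-fall, and a longer rise before the hold.
    a = symbolize([0, 1, 2, 2, 2, 2, 2, 1, 0])
    b = symbolize([0, 1, 2, 3, 4, 4, 4, 4, 4, 4, 3, 2])
    print(a, b, global_alignment_score(a, b))  # ISD IISD 4
```

Identical strings score highest (2 per symbol with these settings), while the extra rising segment of the second transition is absorbed by a single gap. This tolerance to episodes of different duration is what makes alignment attractive compared with a point-by-point distance.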
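To illustrate the extremum-seeking step of Thesis 3, the sketch below searches for D-optimal inputs (maximizing the determinant of the Fisher information matrix) with a simple (1+λ) evolution strategy. The model y = θ1·u/(θ2 + u), the parameter estimate, the input bound, the D-optimality criterion and all ES settings are hypothetical choices for illustration, not the formulation used in the thesis. Experiments already performed, as they would be retrieved from the database, enter through the `prior` argument and add their information to the Fisher matrix, so the search only proposes what earlier experiments do not already cover.

```python
import random
from typing import List, Sequence, Tuple

Theta = Tuple[float, float]


def sensitivity(u: float, theta: Theta) -> Tuple[float, float]:
    """dy/dtheta for the hypothetical model y = th1 * u / (th2 + u)."""
    th1, th2 = theta
    d = th2 + u
    return u / d, -th1 * u / (d * d)


def fisher_det(inputs: Sequence[float], theta: Theta,
               prior: Sequence[float] = ()) -> float:
    """det of the 2x2 Fisher information matrix (unit noise variance).

    Inputs of earlier experiments (`prior`) add their information to that of
    the newly proposed inputs.
    """
    a = b = c = 0.0  # F = [[a, b], [b, c]]
    for u in list(prior) + list(inputs):
        s1, s2 = sensitivity(u, theta)
        a += s1 * s1
        b += s1 * s2
        c += s2 * s2
    return a * c - b * b


def es_design(n: int, theta: Theta, prior: Sequence[float] = (),
              u_max: float = 10.0, generations: int = 300, offspring: int = 20,
              sigma: float = 1.0, seed: int = 0) -> List[float]:
    """(1+lambda) evolution strategy maximizing det(F) over n inputs in [0, u_max]."""
    rng = random.Random(seed)
    parent = [rng.uniform(0.0, u_max) for _ in range(n)]
    best = fisher_det(parent, theta, prior)
    for _ in range(generations):
        best_child, best_child_value = None, best
        for _ in range(offspring):
            # Gaussian mutation of every input, clipped to the admissible range
            child = [min(u_max, max(0.0, u + rng.gauss(0.0, sigma))) for u in parent]
            value = fisher_det(child, theta, prior)
            if value > best_child_value:
                best_child, best_child_value = child, value
        if best_child is not None:  # elitist: replace the parent only on improvement
            parent, best = best_child, best_child_value
        sigma *= 0.98  # geometric step-size decay
    return sorted(parent)


if __name__ == "__main__":
    theta_hat = (1.0, 2.0)  # current (hypothetical) parameter estimate
    print(es_design(2, theta_hat))                # analytic D-optimum: [1.43, 10.0]
    print(es_design(1, theta_hat, prior=[10.0]))  # analytic D-optimum: [1.43]
```

For this particular model the two-point D-optimal design on [0, u_max] is known analytically (u_max and θ2·u_max/(2θ2 + u_max), about 1.43 for the values above), which gives a convenient check of the search. In an iterative design loop, the proposed inputs would be applied, the parameters re-estimated, and the design step repeated with the updated estimate.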
4 Publications related to theses

Articles in international journals

1. B. Balasko, J. Madar, F. Szeifert, J. Abonyi, Evolutionary Strategy in Iterative Experiment Design, Hungarian Journal of Industrial Chemistry, Special issue on Recent Advantages on Process Engineering, Vol. 33, No. 1-2, 2005
2. B. Balasko, J. Madar, J. Abonyi, Additive Sequential Evolutionary Design of Experiments, Lecture Notes in Computer Science, Artificial Intelligence and Soft Computing - ICAISC 2006
3. B. Balasko, S. Nemeth, A. Janecska, T. Nagy, G. Nagy, J. Abonyi, Process modeling and simulation for optimization of operating processes, Computer Aided Chemical Engineering, Vol. 24, pp. 895-900, 2007
4. B. Balasko, S. Nemeth, G. Nagy, J. Abonyi, Integrated Process and Control System Model for Product Quality Control - Application to a Polypropylene Plant, Chemical Product and Process Modeling, Vol. 3, Iss. 1, Article 50, 2008
5. B. Balasko, J. Abonyi, What happens to process data in chemical industry: From source to applications - An Overview, Hungarian Journal of Industrial Chemistry, Vol. 35, pp. 75-84, 2007
6. B. Balasko, J. Abonyi, Symbolic Representation based Qualitative Trend Analysis for Process Transition Qualification and Visualization, Engineering Applications of Artificial Intelligence, 2009, submitted

Articles in Hungarian journals

7. B. Balaskó, S. Németh, J. Abonyi, Működő technológia optimalizálása az irányító rendszer modelljének felhasználásával [Optimization of an operating technology using the model of its control system], Acta Agraria Kaposvariensis, Vol. 10, No. 3, pp. 201-209, 2006
8. B. Balaskó, S. Nemeth, J. Abonyi, Time-Series Similarity - Application to Qualitative Process Trend Analysis, Acta Agraria Kaposvariensis, 2007
Refereed presentations

9. F. P. Pach, B. Balasko, S. Nemeth, P. Arva, J. Abonyi, Black-Box and First-Principle Model Based Optimization of Operating Technologies, In Proc. of the 5th MATHMOD Conference, Vienna, 2006
10. B. Balasko, J. Madar, J. Abonyi, Additive Sequential Evolutionary Design of Experiments, 8th International Conference on Artificial Intelligence and Soft Computing, Zakopane, 2006
11. B. Balasko, S. Nemeth, J. Abonyi, Process Modeling and Simulation for Optimization of Operating Processes, 17th European Symposium on Computer Aided Process Engineering, Bucharest, 2007
12. B. Balasko, Z. Banko, J. Abonyi, Analyzing Trends by Symbolic Episode Representation and Sequence Alignment, In Proc. of the 15th Mediterranean Conference on Automation and Control, Athens, 2007
13. B. Balasko, S. Nemeth, J. Abonyi, Application of integrated process and control system model for simulation and improvement of an operating technology, In Proc. of the 6th European Congress of Chemical Engineers, Copenhagen, 2007
14. L. Dobos, B. Balasko, S. Nemeth, J. Abonyi, Energy and resource saving at operating plants based on the analysis of historical process data, In Proc. of Early-Stage Energy Technologies for Sustainable Future: Assessment, Development, Application - EMINENT 2, Veszprém, 2008
15. B. Balasko, S. Nemeth, J. Abonyi, Integrated Process and Control System Model for Product Quality Control - a Soft-sensor based Application, In Proc. of the European Control Conference, Budapest, 2009, accepted

Non-refereed presentations

16. J. Abonyi, B. Balaskó, F. P. Pach, B. Feil, S. Németh, P. Árva, Adatbányászat működő technológiák optimálásában [Data mining in the optimization of operating technologies], Adatbányászati alkalmazások perspektívái [Perspectives of data mining applications], Veszprém, 2005
17. B. Balaskó, S. Németh, J. Abonyi, Epizód alapú adatelemzési technika technológia-üzemeltetési stratégiák elemzésére [An episode-based data analysis technique for analyzing technology operating strategies], 34. Műszaki Kémiai Napok [34th Technical Chemistry Days], Veszprém, 2006
18. B. Balaskó, S. Németh, J. Abonyi, Működő technológia optimalizálása az irányító rendszer modelljének felhasználásával [Optimization of an operating technology using the model of its control system], V. Alkalmazott Informatika Konferencia [5th Applied Informatics Conference], Kaposvár, 2006
19. B. Balasko, S. Nemeth, J. Abonyi, Qualitative Analysis of Segmented Time Series by Sequence Alignment, 7th International Conference of Hungarian Researchers on Computational Intelligence, Budapest, 2006
20. B. Balasko, S. Nemeth, J. Abonyi, Hierarchical clustering of product transition strategies based on symbolic trend representation in a multiproduct process, 35. Műszaki Kémiai Napok [35th Technical Chemistry Days], Veszprém, 2007
21. B. Balasko, S. Nemeth, J. Abonyi, Time-Series Similarity - Application to Qualitative Process Trend Analysis, VI. Alkalmazott Informatika Konferencia [6th Applied Informatics Conference], Kaposvár, 2007

Other

22. B. Feil, B. Balasko, J. Abonyi, Visualization of Fuzzy Clusters by Fuzzy Sammon Mapping Projection - Application to the Analysis of Phase Space Trajectories, Soft Computing - A Fusion of Foundations, Methodologies and Applications, Vol. 11, No. 5, pp. 479-489, 2007
23. T. Kenesei, B. Balasko, J. Abonyi, A MATLAB Toolbox and its Web-based Variant for Fuzzy Cluster Analysis, 7th International Conference of Hungarian Researchers on Computational Intelligence, Budapest, 2006