Practical Machine Learning and Data Analysis



Similar documents
An Overview of Knowledge Discovery Database and Data mining Techniques

Using reporting and data mining techniques to improve knowledge of subscribers; applications to customer profiling and fraud management

not possible or was possible at a high cost for collecting the data.

Oracle Real Time Decisions

Data Mining for Manufacturing: Preventive Maintenance, Failure Prediction, Quality Control

Social Media Mining. Data Mining Essentials

AdTheorent s. The Intelligent Solution for Real-time Predictive Technology in Mobile Advertising. The Intelligent Impression TM

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014

EFFICIENT DATA PRE-PROCESSING FOR DATA MINING

IMPROVING DATA INTEGRATION FOR DATA WAREHOUSE: A DATA MINING APPROACH

Introduction. A. Bellaachia Page: 1

Is a Data Scientist the New Quant? Stuart Kozola MathWorks

DATA MINING TECHNIQUES AND APPLICATIONS

Data Mining for Fun and Profit

Database Marketing, Business Intelligence and Knowledge Discovery

In this presentation, you will be introduced to data mining and the relationship with meaningful use.

INTRUSION PREVENTION AND EXPERT SYSTEMS

Information Visualization WS 2013/14 11 Visual Analytics

Gerard Mc Nulty Systems Optimisation Ltd BA.,B.A.I.,C.Eng.,F.I.E.I

Data Mining: Overview. What is Data Mining?

How Organisations Are Using Data Mining Techniques To Gain a Competitive Advantage John Spooner SAS UK

CoolaData Predictive Analytics

Data Mining mit der JMSL Numerical Library for Java Applications

Data mining as a tool of revealing the hidden connection of the plant

OUTLIER ANALYSIS. Data Mining 1

Insurance Analytics - analýza dat a prediktivní modelování v pojišťovnictví. Pavel Kříž. Seminář z aktuárských věd MFF 4.

Fourth generation techniques (4GT)

How To Use Neural Networks In Data Mining

Nine Common Types of Data Mining Techniques Used in Predictive Analytics

The Scientific Data Mining Process

Introduction. Chapter 1

WHITEPAPER. Creating and Deploying Predictive Strategies that Drive Customer Value in Marketing, Sales and Risk

DATA PREPARATION FOR DATA MINING

Text Analytics using High Performance SAS Text Miner

Sentiment analysis using emoticons

(Refer Slide Time: 01:52)

A STUDY OF DATA MINING ACTIVITIES FOR MARKET RESEARCH

Example application (1) Telecommunication. Lecture 1: Data Mining Overview and Process. Example application (2) Health

Customer Classification And Prediction Based On Data Mining Technique

L25: Ensemble learning

Introduction to Data Mining

Data Mining and Knowledge Discovery in Databases (KDD) State of the Art. Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland

Understanding Web personalization with Web Usage Mining and its Application: Recommender System

Pentaho Data Mining Last Modified on January 22, 2007

Data Mining System, Functionalities and Applications: A Radical Review

Data Mining. SPSS Clementine Clementine Overview. Spring 2010 Instructor: Dr. Masoud Yaghini. Clementine

ANALYTICS IN BIG DATA ERA

Model-based Testing: Next Generation Functional Software Testing

Data Mining Applications in Manufacturing

STATISTICA. Clustering Techniques. Case Study: Defining Clusters of Shopping Center Patrons. and

ECLT 5810 E-Commerce Data Mining Techniques - Introduction. Prof. Wai Lam

The Edge Editions of SAP InfiniteInsight Overview

Unsupervised Data Mining (Clustering)

TIBCO Spotfire Guided Analytics. Transferring Best Practice Analytics from Experts to Everyone

Artificial Intelligence in Retail Site Selection

Data Mining Application in Direct Marketing: Identifying Hot Prospects for Banking Product

Test Data Management Best Practice

Maschinelles Lernen mit MATLAB

An Implementation of Active Data Technology

ETPL Extract, Transform, Predict and Load

IFS-8000 V2.0 INFORMATION FUSION SYSTEM

INTRODUCTION TO DATA MINING SAS ENTERPRISE MINER

Microsoft Azure Machine learning Algorithms

Alarm Response Management

SPPA-D3000 Plant Monitor Technical Description

Data Mining Solutions for the Business Environment

Fairfield Public Schools

Data Mining and Visualization

Appendix A: Science Practices for AP Physics 1 and 2

Software Engineering Reference Framework

Enterprise Resource Planning Analysis of Business Intelligence & Emergence of Mining Objects

CAPTURING THE VALUE OF UNSTRUCTURED DATA: INTRODUCTION TO TEXT MINING

Best Practices for Controller Tuning

Data Mining Algorithms Part 1. Dejan Sarka

Mind Commerce. Commerce Publishing v3122/ Publisher Sample

Intrusion Detection via Machine Learning for SCADA System Protection

Industrial IT Process Data Management. Advanced IT Tools for Building Information Systems in the Process Industry

Big Data. Fast Forward. Putting data to productive use

EST.03. An Introduction to Parametric Estimating

DYNAMIC FUZZY PATTERN RECOGNITION WITH APPLICATIONS TO FINANCE AND ENGINEERING LARISA ANGSTENBERGER

Utilizing Domain-Specific Modelling for Software Testing

Solar, storage and mining: New opportunities for solar power development. By Thomas Hillig (THEnergy) and James Watson (SolarPower Europe)

Hexaware E-book on Predictive Analytics

Chapter 11. MRP and JIT

control manual Optimal maintenance decisions for asset managers by andrew k.s. jardine and neil montgomery 44 Industrial Engineer

Data Preprocessing. Week 2

Sanjeev Kumar. contribute

Calculation of Risk Factor Using the Excel spreadsheet Calculation of Risk Factor.xls

Extend Table Lens for High-Dimensional Data Visualization and Classification Mining

An Introduction to. Metrics. used during. Software Development

Exploratory Data Analysis for Ecological Modelling and Decision Support

SUCCESSFUL PREDICTION OF HORSE RACING RESULTS USING A NEURAL NETWORK

IAI : Expert Systems

Network Machine Learning Research Group. Intended status: Informational October 19, 2015 Expires: April 21, 2016

Perspectives on Data Mining

On the Deficiencies of Active Network Discovery Systems

Transcription:

Data Analysis of Industrial Data Practical Machine Learning and Data Analysis Swedish Institute of Computer Science

Outline Data Analysis of Industrial Data 1 Data Analysis of Industrial Data Analysing Industrial and Commercial Data The Data Analysis Process The Data Preparation and Merging Problem 2 Analysing Industrial Data Some Common Difficulties in Data Preparation Examples 3 A For Data Preparation An Overview of the Data Preparation Description of Data Preparation Operations 4 Modelling and Validation Deployment and General Recommendations

Data Analysis of Industrial Data Data Analysis and Data Mining Analysing Industrial and Commercial Data The Data Analysis Process Data Preparation and Merging The purpose of data analysis and data mining is to extract answers and useful patterns, such as regularities and rules, in data. These patterns can then be exploited in making predictions, diagnoses, classifications, anomaly detection, hypothesis testing, process structure detection etc. Several terms describe essentially the same process: Data analysis, data mining, knowledge discovery in databases (KDD) etc. To some degree, data mining and knowledge discovery can be described as practical application of Machine Learning methods on (often large) data sets.

Data Analysis of Industrial Data Analysing Industrial and Commercial Data The Data Analysis Process Data Preparation and Merging Working Industrial and Commercial Applications (1) Virtual sensors, i.e. an indirect measurement computed based on values that are easier to access. Predictive maintenance and weak point analysis through e.g. maintenance and warranty databases. Incremental step-wise diagnosis of equipment such as car engines or process plants. Intelligent alarm filtering and prioritization of information to operators of complex systems.

Data Analysis of Industrial Data Analysing Industrial and Commercial Data The Data Analysis Process Data Preparation and Merging Working Industrial and Commercial Applications (2) Fraud and fault detection in e.g. data communication systems and ebusiness. Sales and demand prediction, e.g. in power grids or retail. Speed-up trough model approximation in control systems. Clustering and classification of customers, e.g. for targeted pricing and advertising, and identification of churners, i.e. customers likely to change provider.

Data Analysis of Industrial Data The Data Analysis Process Analysing Industrial and Commercial Data The Data Analysis Process Data Preparation and Merging Several attempts have been made to describe the complete analysis process, e.g. process models such as CRISP-DM and SEMMA, as well as several textbooks. The complete process usually includes 1 Problem understanding. 2 Data preparation and understanding. 3 Modelling. 4 Evaluation. 5 Model deployment. We will first focus on the second step, data preparation and understanding.

Data Analysis of Industrial Data Data Transformation and Modeling Analysing Industrial and Commercial Data The Data Analysis Process Data Preparation and Merging Quite often, the exact choice of model / learning system is not critical. Once preprocessing has turned data into something reasonable, simple models may be sufficient. With limited amounts of independent data, the number of free parameters must be kept low.

Data Analysis of Industrial Data Preparation and Transformation of Data Analysing Industrial and Commercial Data The Data Analysis Process Data Preparation and Merging The process of collecting and preprocessing data as well as understanding the domain and the problem is a huge and the by far largest part of a data analysis project. Preparation of data for analysis is often very time consuming since Available data has been collected for various other purposes and is unlikely to fit the new needs perfectly. Data bases need to be merged, using different encodings and ways of identifying items. Efficient methodologies and tools are critical.

Data preparation Data Analysis of Industrial Data Analysing Industrial and Commercial Data The Data Analysis Process Data Preparation and Merging Data preparation is the process of constructing well structured data sets for modelling. The data preparation process includes Acquisition of relevant data. Cleaning data from noise, errors and outliers. Resampling, transposition and other transformations. Dimensionality and volume reduction. Merging several data sets into one.

Data Analysis of Industrial Data Descriptions of Data Preparation Analysing Industrial and Commercial Data The Data Analysis Process Data Preparation and Merging The guidelines given in current data mining process descriptions and textbooks are described on a too general level to be useful in practice. More detailed descriptions exist, but Do not provide a practical guide on what steps to perform and when. Do not provide more than hints on how to create an application that supports these steps. Focus on corporate and business related data rather than industrial data.

Data Analysis of Industrial Data Analysis of Industrial Data Analysing Industrial Data Some Common Difficulties in Data Preparation Examples Generally a mixture of logs from sensors, operations, events and transactions. Several completely different data sources, modes and encodings are not uncommon. Sensor data will reflect that sensors drift, deteriorate, and are replaced and serviced. Anomalies often depend on earlier parts of the process. Still, data analysis is critical when Trying to increase production process efficiency. Explaining phenomena that are not completely understood.

Interpretation Data Analysis of Industrial Data Analysing Industrial Data Some Common Difficulties in Data Preparation Examples Correct typing of parameters is essential for effective modelling. Plain-text encoding can cause severe interpretation problems. Often several encodings for one type of parameter. Encodings may be surprisingly complex and unusable for direct modelling. Many choices on representation can only be made with a good understanding of the domain or with the help of a domain expert.

Data Analysis of Industrial Data Noise, Errors and Anomalies Analysing Industrial Data Some Common Difficulties in Data Preparation Examples Industrial and commercial data usually contain a large amount of noise and errors. Some of these errors may be corrected using domain knowledge or other information sources. This can be very important in data bases with few effective samples. Remaining errors should usually be removed to ensure correct modelling. Noise can sometimes be reduced using redundancy in parameters. Noise can appear in the most surprising forms and for very unusual reasons.

Missing values Data Analysis of Industrial Data Analysing Industrial Data Some Common Difficulties in Data Preparation Examples Real databases more commonly than not contain missing or invalid values. These may be explicitly encoded, but often use a rather ad-hoc representation. Missing values are handled in the same way as invalid values: The record can be removed. The value can be derived from other data. The value can be set to a default value. Missing or partial data can sometimes be found via external sources.

Data Relevance Data Analysis of Industrial Data Analysing Industrial Data Some Common Difficulties in Data Preparation Examples Sensors drift and are replaced. Sales patterns show periodic behaviour. There may be sampling bias in the data set. A large amount of parameters is no guarantee for successful modelling. What we actually might need to measure may still not be included. Validation is often very complicated due to a large and complex state space. We may have very few effective samples.

Data Analysis of Industrial Data Examples of difficult encodings / errors Analysing Industrial Data Some Common Difficulties in Data Preparation Examples Attribute 1 Attribute 2 Attribute 3 Attribute 4 Attribute 5 2.28272 2002080612220500 10.47 Nej red 2.28269 2002080612220622 15.39 Valsbyte red 2.28253 2002080612220743 12.66 hasp temp 680 blue 2.28242 2002080612220886-120.0 Hasp-temp green 22.8232 2002080612221012-120.0 kyln blue 2.28197 2002080612221136-120.0 Kylning rde 2.28157 1858111700000000 13.49 karkylning på alltid Blue 2.28212 1858111700000000 25.85 Ja red 2.28233 2002080612221631 22.98 ej i dragläge red...............

Data Analysis of Industrial Data The Data Preparation Process An Overview of the Data Preparation Description of Data Preparation Operations Generally include a number of steps, some of which have to be iterated several times. Data preparation is often revisited quite far into the analysis and modelling process. To handle this, a methodology will include a number of Distinct operations. Explicit iterations. Intended to provide a structured context to support the data preparation process.

Data Analysis of Industrial Data An Overview of the Data Preparation Description of Data Preparation Operations Overview of the Data Preparation Figure: Overview

Preparatory Steps Data Analysis of Industrial Data An Overview of the Data Preparation Description of Data Preparation Operations Figure: Preparatory Steps

Data Analysis of Industrial Data Type Analysis and Data Loading An Overview of the Data Preparation Description of Data Preparation Operations Analysis of type, range, and format of parameters is often surprisingly difficult. The analysis may be complicated by e.g. Damaged or missing data. Ad-hoc encodings. Should serve as input to the repair and recoding step.

Data Analysis of Industrial Data Repair and Recoding An Overview of the Data Preparation Description of Data Preparation Operations Data generally contain deficiencies and anomalies that can be rectified. An entry for a particular parameter being out of range or missing may mean that a subset of all other parameters are useless. Automatic detection of possible errors and data cleaning are highly useful.

Selection Data Analysis of Industrial Data An Overview of the Data Preparation Description of Data Preparation Operations A refinement of choices made in the selection of data sources. After the type analysis and repair steps we are in a better position to asses quality and usefulness of data. Reducing the sizes of data sets might be considered.

Merging Data Analysis of Industrial Data An Overview of the Data Preparation Description of Data Preparation Operations Often the most complex step of data preparation. Common merge problems include Different sample rates. Transposition. Event / series representations. Differences and errors in textual representations can cause severe problems.

Final Steps Data Analysis of Industrial Data An Overview of the Data Preparation Description of Data Preparation Operations Often a good idea to revisit data selection at this point. Suitable point for data partitioning. Introduction of domain knowledge using calculated parameters.

Data Analysis of Industrial Data Generic Operations An Overview of the Data Preparation Description of Data Preparation Operations The methodology outlined earlier specifies what needs to be done to the data. What remains to describe is how to perform the necessary preparations. We can identify a small set of generic operations, with a focus on data transformation, of which all data preparation tasks are special cases. These operations can be composed into a reproducible sequence of transforms, making it possible to move between preparation steps in a controlled manner. The set of operations is for all practical purposes complete, and contains overlap in the functionality of the operations.

Data Analysis of Industrial Data An Overview of the Data Preparation Description of Data Preparation Operations Visualisation, Inspection and Other Supporting Operations Very important for data preparation and analysis, but outside the scope of this presentation. Not only for discovering structure and dependencies, but to get a more basic understanding of what the parameters represent and on what form. Examples of useful plots are scatter plots, parallel coordinates plots, histograms, and simple series plots. Anomaly detection and structure analysis are critical operations.

Data Analysis of Industrial Data The Modelling Phase Modelling and Validation Deployment and General Recommendations Highly dependant on the choice of model family / parameterisation. Practical models often Have a relatively low number of free parameters. Take considerable amounts of domain knowledge into account. Are highly specialised for the specific domain and application. Validation is critical.

Data Analysis of Industrial Data The Limits of Data-Driven Methods Modelling and Validation Deployment and General Recommendations Data-driven models by nature need access to relevant historical data. Relevant historical data is often not available in significant amounts due to Significant amounts of noise and errors in data. Specifics of the domain. The fact that the system is new and has not generated any data. Knowledge about the domain (e.g. physical structure) must be incorporated into the models.

Validation Data Analysis of Industrial Data Modelling and Validation Deployment and General Recommendations Critical to guarantee that performance will be satisfactory. Can be very difficult due to dependant data. Choice of validation mechanism is often application dependant, but in general leave-one-out cross validation is recommended if possible.

Deployment Data Analysis of Industrial Data Modelling and Validation Deployment and General Recommendations Not necessarily a step that has to be performed, as the goal of the data analysis process was e.g. to test a hypothesis or increase domain knowledge. Usually done into existing application infrastructure, and not quite as often into stand-alone applications. Requires a new level of detailed testing on real-world data.

Data Analysis of Industrial Data General Recommendations Modelling and Validation Deployment and General Recommendations Thoroughly understand the problem you are working on and try to understand the process that generated the data. Take extreme care with validation. Test the application on as much real-world data as possible.

Data Analysis of Industrial Data Data preparation and transformation is critical for the quality of the data analysis. Data preparation and understanding is time consuming and is usually the largest part of a data analysis project. Models are often relatively simple and specialised for the application.

Introduction Creating a demand prediction system A sales prediction module Sales prediction for supply chains Anders Holst Swedish Institute of Computer Science, Anders Holst Sales prediction for supply chains

Introduction Creating a demand prediction system A sales prediction module Supply chain management and sales prediction Demand prediction Sales prediction for Supply Chain Management Reliable estimates of demand is very important for effective supply chain management. We will discuss a decision support system for production, sales, and marketing managers. Increasing the usefulness of the prediction by providing rich information to the user, e.g. uncertainty measures and information on the basis of the prediction. Integration of related data sources such as tracking data., Anders Holst Sales prediction for supply chains

Predicting demand Introduction Creating a demand prediction system A sales prediction module Supply chain management and sales prediction Demand prediction Predict customer demand expressed as expected future sales. Enhance the flow of products in the supply chain Earlier order placements Less production for stock Better production scheduling Adapting marketing strategies Useful in several places within a company, but perhaps mainly for sales managers, sales agents, marketing managers and the persons responsible for production planning., Anders Holst Sales prediction for supply chains

Introduction Creating a demand prediction system A sales prediction module A decision support system Application and modeling considerations Deployment considerations Sales predictions are in most cases not suitable for automated decision making Predictions are uncertain, only a human can decide whether they are plausible or not. The predictions should be used for decision support. When creating a prediction system for decision support, we need to Increase trust in the prediction - not always perfect predictions but useful anyway. Provide tools for increased understanding., Anders Holst Sales prediction for supply chains

Introduction Creating a demand prediction system A sales prediction module Constructing a usable tool Application and modeling considerations Deployment considerations Provide graphs of prediction development. Provide uncertainty measure. Explanations of predictions. Explanations of the data the prediction is based on. No Black Box : The prediction model should be (to some degree) understandable for the user., Anders Holst Sales prediction for supply chains

Introduction Creating a demand prediction system A sales prediction module The sales prediction problem Application and modeling considerations Deployment considerations Although difficult, it is often possible to predict future sales. Faces some difficult problems: Very noisy data. Lack of relevant data. Non-representative historical data: Changes in product lines. Changes in market behaviour., Anders Holst Sales prediction for supply chains

Introduction Creating a demand prediction system A sales prediction module Modeling considerations Application and modeling considerations Deployment considerations Keep the model complexity low. In the model, regard recent sales (or other data) as more important than older sales. To make use of historical sales data, group the articles (e.g. categories and segments). Use all available informative data., Anders Holst Sales prediction for supply chains

Introduction Creating a demand prediction system A sales prediction module What data can we use? Application and modeling considerations Deployment considerations Historical sales data. Very relevant (We can t do much without it!) Might be difficult to find relevant historical sales data. Other information, e.g. sales activities, advertisements, campaigns, weather etc. Encoding problems. Availability of data. Marketing investments and brand tracking., Anders Holst Sales prediction for supply chains

Introduction Creating a demand prediction system A sales prediction module Placing the prediction engine Application and modeling considerations Deployment considerations Predictions should preferably Be available to many or most users. Use all data available, including the latest information. Be integrated in the users normal work tool. Indicates placement as a centrally located server. Always access to the latest data. No need for direct interaction with the prediction engine., Anders Holst Sales prediction for supply chains

Sales scenarios Introduction Creating a demand prediction system A sales prediction module Application and modeling considerations Deployment considerations The sales campaign / seasonal scenario Sales agents visit a predefined number of potential customers during a sales campaign. Defined start and end date. The continuous order scenario Orders arrive continuously at a varying rate., Anders Holst Sales prediction for supply chains

Introduction Creating a demand prediction system A sales prediction module Continuous order prediction Test results Module interface 25000 Prediction Actual Sales 20000 15000 10000 5000 0 415 420 425 430 435 440 445 450 455 460, Anders Holst Sales prediction for supply chains

Introduction Creating a demand prediction system A sales prediction module Season based sales prediction Test results Module interface 700000 600000 Prediction Lower Error Upper Error Total Sales 500000 400000 300000 200000 100000 0 2360 2380 2400 2420 2440 2460 2480 2500, Anders Holst Sales prediction for supply chains

Introduction Creating a demand prediction system A sales prediction module An example interface Test results Module interface, Anders Holst Sales prediction for supply chains

Introduction Creating a demand prediction system A sales prediction module Reliable, useful sales prediction is possible, but precision may vary. Demand prediction is a useful tool for SCM even with limited precision., Anders Holst Sales prediction for supply chains

Introduction The paper mill test case Discussion Emulating process simulators with learning systems Anders Holst Björn Levin Swedish Institute of Computer Science, Anders Holst, Björn Levin Emulating process simulators with learnings systems

Introduction The paper mill test case Discussion Finding optimal production parameters Optimal production parameters in process industries Models of process behaviour There is quite often a need to find optimal production parameters in process industries. Many of the settings are scheduled some time in advance, e.g. to produce different qualities. Near-optimal parameter schedules can possibly be found using an optimiser that iteratively tests different scenarios in a simulator, gradually converging to an optimal solution. Unfortunately, although the simulator in question is faster than real time, it might still not be fast enough., Anders Holst, Björn Levin Emulating process simulators with learnings systems

Introduction The paper mill test case Discussion Modeling process behaviour Optimal production parameters in process industries Models of process behaviour Two different ways to model process behaviour: First principles simulator of some sophistication. Approximate the input-output mapping in the process with a mathematical function without considering the physical path. The latter can be done with a learning system., Anders Holst, Björn Levin Emulating process simulators with learnings systems

Introduction The paper mill test case Discussion Learning systems and simulators Optimal production parameters in process industries Models of process behaviour There is a number of implementation choices for the learning system model: Model the actual outputs of the process or associate simulator states to objective function values directly. Use either real process data or generate data with a simulator for training., Anders Holst, Björn Levin Emulating process simulators with learnings systems

Introduction The paper mill test case Discussion Problem description Test results The Jämsänkoski paper mill, overview TMP 1 5 Refiner Line s @ 115 t/day ea ch 4 Reject Refine rs Bleaching Pulp Storage Mixing Tanks Wet Brok e PM4 121,000 t/yr 900 m/mi n 532 cm wide 57 80 g/m 2 Dry Brok e Mixing Tanks PM5 245,000 t/yr 1350 m/min 835 cm wide 49 56 g/m 2 Wet Brok e Dry Brok e TMP 2 5 Refiner Line s @ 190 t/day ea ch 3 Reject Refine rs Bleaching Pulp Storage Mixing Tanks PM6 325,000 t/yr 1600 m/min 930 cm wide 39 60 g/m Wet Brok e Dry Brok e, Anders Holst, Björn Levin Emulating process simulators with learnings systems

Introduction The paper mill test case Discussion The selected sub-system Problem description Test results TMP 1 5 Refiner Lines @ 115 t/day each 4 Reject Refiners Bleaching Pulp Storage Tank 5 TMP 2 5 Refiner Lines @ 190 t/day each 3 Reject Refiners Bleaching Pulp Storage Tank 2 Tank 3 Tank 4 PM6 Tank 8 Tank 7 Tank 6, Anders Holst, Björn Levin Emulating process simulators with learnings systems

Introduction The paper mill test case Discussion Problem description Test results The optimisation problem and data generation The cost function is constructed so that electricity costs for running the refiners are minimized while maintaining consistency of schedules and tank levels. Essentially, the problem can be reduced to predict a number of tank levels at the end of the optimisation horizon from initial tank levels and production schedules., Anders Holst, Björn Levin Emulating process simulators with learnings systems

Data generation Introduction The paper mill test case Discussion Problem description Test results Data was generated by manually modeling the joint distribution of the inputs and the initial state of the simulator, taking random samples from this distribution and generating data in the simulator based on these. Control sequences were modeled using Markov processes, simple distributions were generally used for other parameters. In total, around ten million samples were generated, representing about six years of operations., Anders Holst, Björn Levin Emulating process simulators with learnings systems

Introduction The paper mill test case Discussion Models and data representation Problem description Test results Three different kinds of models tested: k-nearest Neighbour Mixtures of Naive Bayes models Multi-Layer Perceptrons Initial tests with step-by-step recursive predictions were not promising. Data was re-coded as events., Anders Holst, Björn Levin Emulating process simulators with learnings systems

Test results Introduction The paper mill test case Discussion Problem description Test results The test results for the event representation were not as good as expected. So what goes wrong?, Anders Holst, Björn Levin Emulating process simulators with learnings systems

Introduction The paper mill test case Discussion Oscillation of tank levels Problem description Test results 73.47 89.38 37.19 49.96 45.22 21.18 77.66 18.4 85.69 19.57 43.63 19.71 20.6 55.85 15200 16499 Figure: An example of the characteristic oscillations of tank levels A modified event representation was used (see results)., Anders Holst, Björn Levin Emulating process simulators with learnings systems

Introduction The paper mill test case Discussion Substituting simulators using learning systems Test results show that this is difficult on real-world problems. Are learning systems a suitable solution? Most real processes are described by a system of non-linear differential equations. Such systems will display chaotic behaviour. The time horizon for accurate predictions is likely to be short. This may not be a problem, e.g. if mean values or aggregates can be predicted. Dividing the problem into smaller sub-problems, each solved with learning systems, may be effective, but usually require domain knowledge., Anders Holst, Björn Levin Emulating process simulators with learnings systems

Introduction The paper mill test case Discussion The idea of replacing a slow simulator with a faster learning system is certainly appealing, and can potentially result in much faster parameter optimisation systems. However, it is by no means an easy process: Generating suitable training and validation data can be laborious and difficult, which also applies to finding a suitable representation. A learning system might not be suitable for approximating a process simulator., Anders Holst, Björn Levin Emulating process simulators with learnings systems