Automation of Input Data Management


THESIS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

Automation of Input Data Management
Increasing Efficiency in Simulation of Production Flows

ANDERS SKOOGH

Department of Product and Production Development
CHALMERS UNIVERSITY OF TECHNOLOGY
Gothenburg, Sweden 2011

Automation of Input Data Management
Increasing Efficiency in Simulation of Production Flows
ANDERS SKOOGH

ISBN
ANDERS SKOOGH, Doktorsavhandlingar vid Chalmers tekniska högskola, Ny serie nr 3285
ISSN X

Department of Product and Production Development
Chalmers University of Technology
SE Gothenburg, Sweden
Telephone + 46 (0)

Cover: Figure illustrating the need for efficient supply of shop floor data to simulation models. The figure includes original pictures from Volvo Cars Newsroom and Johansson and Zachrisson (2006).

Chalmers Reproservice
Gothenburg, Sweden 2011

Automation of Input Data Management
Increasing Efficiency in Simulation of Production Flows
ANDERS SKOOGH
Department of Product and Production Development
Chalmers University of Technology

ABSTRACT

Production is of significant importance for the social welfare and economic growth of societies worldwide. In Europe, more than 30% of all job opportunities are related to the manufacturing industry. Improvements of material flows in production are of particular importance for reducing system losses and increasing the robustness of production systems. Unfortunately, the most powerful tools for analyzing dynamic aspects are associated with extensive data requirements and, thus, inefficient procedures for keeping models up-to-date. This thesis addresses the input data management procedure for one such tool, namely discrete event simulation (DES). The purpose of the thesis is to enable daily use of DES to support production engineers in their work with increasing the efficiency, sustainability and robustness of production systems. The aim is to reduce the time-consumption for input data management and thereby facilitate the supply of recent production data to DES models.

The thesis is divided into two parts, treated as interrelated studies, addressing one research question (RQ) each. Part One (RQ1), mapping the industrial state-of-the-art of input data management, is mainly based on qualitative methods including interviews and questionnaires with DES practitioners. The results show that collection of raw data, identification of available data sources, and data analysis and preparation are the three most time-consuming activities. There is still limited use of automatic support systems, and data are often manually collected, processed and supplied to models by means of spreadsheet interfaces. Findings in Part One also show that automated connections to external databases are important for future sustainability analyses using DES.

Part Two (RQ2), proposing and evaluating an approach for automated input data management, is mainly based on the analysis of existing industrial data sources (archive analysis). This review aims to identify the functionalities necessary to automatically transform production data (raw data) into information for a DES model. A demonstrator, called the GDM-Tool, is developed and tested in three independent case studies. The results show that the proposed automated approach reduces the time-consumption for input data management by approximately 75%.

There are still difficulties in input data management for DES, partly due to the limited access to detailed production data. Therefore, the author recommends that industrial and academic partners increase the efforts necessary to facilitate continuous raw data collection and, by extension, also automated data processing. In cases where enough data are available, the proposed solution (RQ2) enables more frequent updates of DES models and provides production engineers with a powerful tool for increasing the efficiency of production systems on a daily basis.

Keywords: Input data management, Discrete Event Simulation, Sustainable Production.


ACKNOWLEDGEMENTS

It is a common misapprehension that the work and efforts behind a PhD thesis are rather a one-man show. Sometimes, during times with lots of writing, I would definitely agree with that statement. The fact is, though, that even the smallest touch of distance is enough to recall that this work would be impossible without the support provided by helpful people and organizations with whom I have associated.

Firstly, I would like to thank my two supervisors, Professor Johan Stahre and Associate Professor Björn Johansson. I am very grateful for you offering me the possibility to specialize within the very interesting area of production. Today, production is more important and recognized than any of us believed just a few years ago. Of course, you are also a substantial part of the entire process behind this thesis!

I would also like to recognize the organizations supporting my research financially. VINNOVA (Swedish Governmental Agency for Innovation Systems) has funded two projects serving as a basis for this thesis: FACTS (Conceptual Factory Development) and DFBB (Digital Factory Building Blocks). Furthermore, these projects would not have been initiated without the support and participation from our industrial project partners: AB Volvo, Faurecia, Haldex, Scania, VBG Group, and Volvo Car Corporation. In addition, ProViking financed a scholarship making it possible to perform parts of my research at NIST (National Institute of Standards and Technology, USA), which resulted in one of the papers appended to this thesis. ProViking has also substantially contributed to my professional network of PhD students through their excellent research school.

I strongly appreciate the help and support provided by Edward Williams (University of Michigan and PMC, Dearborn, Michigan, USA). He is always interested in my research and consistently shares his excellent skills when reviewing and proofreading my publications. My only problem is to figure out how to return all favors.

Björn has already been mentioned but deserves some extra appreciation because of his enthusiasm and always positive attitude. We have spent lots of time together during several business trips. Despite an extensive number of work hours, I always get back home with many memorable experiences. Björn is, however, by no means the only colleague at the Department of Product and Production Development contributing to our excellent and inspiring work environment. I do not want to list all names, but I hope to have the privilege of working with you for many years to come. Many thanks also to the colleagues at KTH, Skövde University and Swerea IVF for great cooperation in the research projects listed above.

Last but not least, my family and friends deserve all my gratitude. You make me enjoy life and have fun, which is important in order to refuel with energy. Furthermore, my parents and parents-in-law have supported with baby-sitting et cetera, making it possible to combine the completion of this thesis with parental leave. Finally, to my beloved wife Sofia and my wonderful son Victor: Thanks for sharing your lives. I love you!

Anders Skoogh
Göteborg, October 2011


APPENDED PUBLICATIONS

Publication I
Skoogh, A., and B. Johansson. A Methodology for Input Data Management in Discrete Event Simulation Projects. In: Proceedings of the 2008 Winter Simulation Conference, eds. S.J. Mason, R. Hill, L. Moench, O. Rose, T. Jefferson, and J.W. Fowler.
A. Skoogh initiated the paper and performed the data collection and the data analysis. He wrote the paper with assistance of B. Johansson and presented the paper at the Winter Simulation Conference in 2008.

Publication II
Skoogh, A., and B. Johansson. Mapping of Time-Consumption During Input Data Management Activities. Simulation News Europe, 19(2).
A. Skoogh initiated the paper and performed the data collection and the data analysis. He wrote the paper with assistance of B. Johansson.

Publication III
Skoogh, A., T. Perera, and B. Johansson. Submitted. Input Data Management for Simulation - Industrial Practices and Future Trends. Simulation Modelling Practice and Theory. (Submitted for publication.)
A. Skoogh initiated the paper, designed the survey and participated in the data collection. He wrote the paper together with T. Perera and B. Johansson.

Publication IV
Skoogh, A., B. Johansson, and L. Hansson. Data Requirements and Representation for Simulation of Energy Consumption in Production Systems. In: Proceedings of CIRP Manufacturing Systems.
A. Skoogh initiated the paper, participated in the data collection and performed the data analysis. He wrote the paper with assistance of B. Johansson and L. Hanson. Further, he presented the paper at the CIRP Manufacturing Systems conference.

Publication V
Skoogh, A., B. Johansson, and J. Stahre. Submitted. Automated Input Data Management: Evaluation of a Concept for Reduced Time-Consumption in Discrete Event Simulation. Simulation: Transactions of the Society for Modeling and Simulation International. (Under 2nd review.)
A. Skoogh initiated the paper and designed the presented approach and the demonstrator together with M. Johansson*, J. Balderud** and A. Olofsson**. He wrote the paper together with B. Johansson and J. Stahre.
* guest researcher at NIST, USA
** two very competent software engineers

Publication VI
Skoogh, A., J. Michaloski, and N. Bengtsson. Towards Continuously Updated Simulation Models: Combining Automated Raw Data Collection and Automated Data Processing. In: Proceedings of the 2010 Winter Simulation Conference, eds. B. Johansson, S. Jain, J. Montoya-Torres, J. Hugan, and E. Yücesan.
A. Skoogh initiated the paper. He designed and evaluated the approach together with J. Michaloski and N. Bengtsson. All three participated in writing the paper and A. Skoogh presented it at the Winter Simulation Conference in 2010.


ADDITIONAL PUBLICATIONS BY ANDERS SKOOGH

Skoogh, A., and B. Johansson. Time-Consumption Analysis of Input Data Activities in Discrete Event Simulation Projects. In: Proceedings of the Swedish Production Symposium.

Ng, A., M. Urenda Moris, J. Svensson, A. Skoogh, and B. Johansson. FACTS Analyser: An innovative tool for factory conceptual design using simulation. In: Proceedings of the Swedish Production Symposium.

Skoogh, A., J.-P. André, C. Dudas, J. Svensson, M. Urenda Moris, and B. Johansson. An Approach to Input Data Management in Discrete Event Simulation Projects: A Proof of Concept Demonstrator. In: Proceedings of the 6th EUROSIM Congress on Modelling and Simulation, eds. B. Zupančič, R. Karba and S. Blažič.

Johansson, M., B. Johansson, A. Skoogh, S. Leong, F. Riddick, Y.T. Lee, G. Shao, and P. Klingstam. A Test Implementation of the Core Manufacturing Simulation Data Specification. In: Proceedings of the 2007 Winter Simulation Conference, eds. S.G. Henderson, B. Biller, M.-H. Hsieh, J. Shortle, J.D. Tew, and R.R. Barton.

Skoogh, A. Methods for Input Data Management - Reducing the Time-Consumption in Discrete Event Simulation. Thesis for the Degree of Licentiate of Engineering, Chalmers University of Technology.

Alin, D., J. Andersson, M. Andersson, A. Isaksson, A. Skoogh, and E. Helander. Examining the Relation Between EPEI-Time and Productivity Using Discrete Event Simulation. In: Proceedings of the 2009 Swedish Production Symposium, (2).

Bengtsson, N., G. Shao, B. Johansson, Y.T. Lee, S. Leong, A. Skoogh, and C. McLean. Input Data Management Methodology for Discrete Event Simulation. In: Proceedings of the 2009 Winter Simulation Conference, eds. M.D. Rossetti, R.R. Hill, B. Johansson, A. Dunkin and R.G. Ingalls.

Johansson, B., M. Mani, A. Skoogh, and S. Leong. Discrete Event Simulation to Generate Requirements Specification for Sustainable Manufacturing Systems Design. In: Proceedings of PerMIS'09.

Boulonne, A., B. Johansson, A. Skoogh, and M. Aufenanger. Simulation Data Architecture for Sustainable Development. In: Proceedings of the 2010 Winter Simulation Conference, eds. B. Johansson, S. Jain, J. Montoya-Torres, J. Hugan, and E. Yücesan.

Michaloski, J., B. Raverdy, B.E. Lee, F. Proctor, S. Venkatesh, N. Bengtsson, and A. Skoogh. Push Button Discrete Event Simulation For Analysis of Factory Floor Operations. In: Proceedings of ASME 2010 International Mechanical Engineering Congress & Exposition.

Gustafsson, B., and A. Skoogh. Design and Problem Oriented Education Based on the Application of Knowledge - developing Chalmers' Master's Programme in Production Engineering. In: Proceedings of the Swedish Production Symposium.

Andersson, J., A. Skoogh, B. Johansson, and S. Leong. Environmental Activity Based Cost Using Discrete Event Simulation. In: Proceedings of the 2011 Winter Simulation Conference, eds. S. Jain, R.R. Creasey, J. Himmelspach, K.P. White, and M. Fu. (Accepted for publication.)

Lindskog, E., L. Lundh, J. Berglund, Y.T. Lee, A. Skoogh, and B. Johansson. A Method for Determining the Environmental Footprint of Industrial Products Using Simulation. In: Proceedings of the 2011 Winter Simulation Conference, eds. S. Jain, R.R. Creasey, J. Himmelspach, K.P. White, and M. Fu. (Accepted for publication.)

TABLE OF CONTENTS

1 INTRODUCTION
  1.1 INPUT DATA MANAGEMENT IN SIMULATION
  1.2 PURPOSE
  1.3 AIM
  1.4 RESEARCH QUESTIONS
  1.5 FOCUS
  1.6 DELIMITATIONS
  1.7 THESIS STRUCTURE
2 METHOD
  2.1 WORK PROCEDURE
  2.2 RESEARCH DESIGN
    2.2.1 Surveys
    2.2.2 Multiple Case Studies
  2.3 DATA COLLECTION
    2.3.1 Interviews
    2.3.2 Questionnaires
    2.3.3 Archive Analysis
    2.3.4 Electrical Power Measurements
  2.4 DATA ANALYSIS
    2.4.1 Interview Analysis
    2.4.2 Statistical Analysis
  2.5 VALIDATION
3 FRAME OF REFERENCE
  3.1 THE DIFFERENCE BETWEEN DATA AND INFORMATION
    Data Requirements in DES
    Data Categories
  3.2 DATA COLLECTION
    Manual Gathering
    Continuous Data Collection
  3.3 DATA PROCESSING
    Data Representations
    Input Modeling
    Software Support in Data Processing
  3.4 DATA INTERFACES AND STANDARDS
    CMSD
    STEP AP
    AutomationML
  3.5 THE INPUT DATA MANAGEMENT PROCEDURE
    Methods for Rapid Input Data Management
    Crucial Activities with Regard to the Time-Consumption
  3.6 AUTOMATED INPUT DATA MANAGEMENT
    Methodologies A & B
    Methodology C
    Methodology D
  3.7 SOFTWARE ARCHITECTURE FOR GENERIC DATA MANAGEMENT

4 RESULTS
  4.1 INDUSTRIAL STATE-OF-THE-ART (RQ1)
    Publication I
    Publication II
    Publication III
    Publication IV
  4.2 INTERIM DISCUSSION (RQ1)
    Data Input Activities
    Level of Automation in Input Data Management
    Methodological Discussion
    Connection to RQ2
  4.3 AUTOMATED INPUT DATA MANAGEMENT (RQ2)
    Publication V
    Publication VI
    Additional Case Study in the Automotive Industry
  4.4 INTERIM DISCUSSION (RQ2)
    Required Functionality
    Evaluation of the Proposed Concept
    Methodological Discussion
5 DISCUSSION
  5.1 IS AUTOMATION ALWAYS FEASIBLE?
  5.2 RESEARCH CONTRIBUTION AND POSSIBLE INDUSTRIAL APPLICATIONS
  5.3 FUTURE RESEARCH
6 CONCLUSIONS
REFERENCES
APPENDED MATERIALS

LIST OF FIGURES

Figure 1: Production is important! The production area contributes major parts of the employment opportunities and economic turnover in the European Union (MANUFUTURE Eurostat 2006).
Figure 2: The different parts of the input data management process. Pictures from Volvo Cars Newsroom and Johansson and Zachrisson (2006).
Figure 3: Delimitations for research and time measurements in the input data management process.
Figure 4: Outline of the thesis, following the structure of interdependent studies (Wilkinson 1991).
Figure 5: Research procedure: publications mapped against theoretical domains and empirical case studies.
Figure 6: Data triangulation for validation of the GDM-Tool in a multiple case study design.
Figure 7: The scope of data, information and knowledge included in this thesis.
Figure 8: Example of a scatter-plot.
Figure 9: KS tests use the difference between the empirical and fitted distributions to evaluate their compliance.
Figure 10: Four approaches to input data management using various levels of automation (Robertson and Perera 2002).
Figure 11: Each plug-in performs a user-specified request. Plug-ins can be selected to suit the desired functionality at a specific company, and the structure also allows further development of needed functionality.
Figure 12: Activities in input data management, structured as a best-practice methodology.
Figure 13: The time-consumption of each individual data input activity.
Figure 14: The use of different data sources in manufacturing industry.
Figure 15: Current methodology and level of automation in input data management among manufacturing companies.
Figure 16: Illustration of machine state cycles and their corresponding variations in power levels.
Figure 17: Overview of the proposed concept for automated input data management.
Figure 18: Illustration of the difference between configuration and automation modes in the GDM-Tool.
Figure 19: User interface developed to demonstrate the functionalities required for automated input data management.
Figure 20: Dialog box for executing a data update using automation mode in the GDM-Tool.
Figure 21: Flow chart of the production line.
Figure 22: A simple user view of the DES model developed in the commercial simulation package ARENA.
Figure 23: Example of methodology C for automated input data management (Robertson and Perera 2002).
Figure 24: A P-P plot exemplifying the goodness-of-fit functionality provided by the statistics plug-in.
Figure 25: Example of common data operations applied in a typical configuration for obtaining the MTTR from an ACS.

LIST OF TABLES

Table 1: A mapping of the research methods used in the appended publications.
Table 2: Compilation of data definitions.
Table 3: Compilation of information definitions.
Table 4: Categorization of data based on availability and collectability (Robinson and Bhatia 1995).
Table 5: Activities required for transforming data to information (Davenport and Prusak 1998).
Table 6: Examples of traces and bootstraps as alternatives to statistical distributions.
Table 7: Table mapping the relations between functional modeling elements, required input data and model entities.
Table 8: Data table displaying the time-consumption during data input activities in all projects evaluated in Publication II.
Table 9: Average power utilization per machine, distributed between the different machine state cycles.
Table 10: Standard deviations for the average power utilization between individual machine state cycles.
Table 11: Comparison of the time-consumption between the traditional industrial approach and the GDM-Tool in Publication V.
Table 12: A comparison between simulation outputs from traditional and automated input data management.
Table 13: Comparison of the data quality between automated and manual input data management.
Table 14: Required functionalities in an efficient solution for automated input data management.
Table 15: Compilation of the results from the validation of the proposed concept of automated input data management performed in three test cases.

LIST OF ACRONYMS

ACS = Automated Collection System
CA = Controllability Analysis
CAD = Computer Aided Design
CAEX = Computer Aided Engineering Exchange
CBS = Corporate Business System
CMSD = Core Manufacturing Simulation Data
CNC = Computer Numerical Control
DES = Discrete Event Simulation
DFBB = Digital Factory Building Blocks
ED = Enterprise Dynamics
ELCD = European reference Life Cycle Database
ERP = Enterprise Resource Planning
GDM = Generic Data Management
IDEF = Integrated computer aided manufacturing DEFinition
ISO = International Organization for Standardization
IT = Information Technology
KS = Kolmogorov-Smirnov
LCA = Life Cycle Assessment
MDA = Manufacturing Data Acquisition
MES = Manufacturing Execution System
MLE = Maximum Likelihood Estimation
MRP = Material Requirement Planning
MTBF = Mean Time Between Failures
MTTR = Mean Time To Repair
NIST = National Institute of Standards and Technology
OEE = Overall Equipment Efficiency
OLE = Object Linking and Embedding
OPC = OLE for Process Control
PLC = Programmable Logic Controller
PLM = Product Lifecycle Management
RQ = Research Question
SCADA = Supervisory Control And Data Acquisition
SISO = Simulation Interoperability Standards Organization
SME = Small and Medium Enterprises
STEP = Standard for the Exchange of Product model data
SQL = Structured Query Language
TBF = Time Between Failures
TTR = Time To Repair
UML = Unified Modeling Language
UPLCI = Unit Process Life Cycle Inventories
WSC = Winter Simulation Conference
XML = Extensible Markup Language

CHAPTER 1

1 INTRODUCTION

Production is of major importance for the welfare and development of societies. Production offers more employment possibilities and contributes more to economic turnover than any other line of business. This is exemplified by the statistics in Figure 1, showing that 30% of the jobs and 42% of the economic value added to the European Union in 2006 stem from the production area.

Figure 1: Production is important! The production area contributes major parts of the employment opportunities and economic turnover in the European Union (MANUFUTURE Eurostat 2006).

A central factor for competitive companies contributing to a sustainable society is to rapidly adapt to changes in consumption patterns. When it comes to capacity issues, this means responding to increased demands with short ramp-up times. Unfortunately, this adaptation is often handled by significant investments instead of by increasing the utilization of existing equipment. Consequently, the Overall Equipment Efficiency (OEE), as an indicator of production efficiency, is only around 50% on average in Swedish industry (Ingemansson 2004). Naturally, to increase competitiveness, it is very important to focus on reducing the inherent losses by complementing investment projects with continuous improvement of production flows.

Discrete Event Simulation (DES) is a powerful tool for such improvements, especially when regarding dynamic aspects of production systems (Ingalls 2002, McLean and Leong 2001). This is necessary in order to effectively reduce balancing losses and system losses (Wild 1975), the latter caused by varying processing times and unplanned disturbances. However, despite its potential, DES is often set aside in favor of static production analysis tools or qualified guesses, due to the extensive time-consumption of dynamic simulation studies (McNally and Heavey 2004). This leads to less detailed analyses and, by extension, to production systems designed for ideal circumstances, disregarding variation and disturbances. The reason for the extensive time-consumption is arguably the input data management process, which takes time due to its importance for model quality and the specifically detailed data requirements of DES (Moon and Phatak 2005, Robertson and Perera 2002).
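As an aside on the OEE figure cited above, OEE is commonly computed as the product of three factors (the standard definition; the thesis itself does not spell it out):

$$\mathrm{OEE} = \mathrm{Availability} \times \mathrm{Performance} \times \mathrm{Quality}$$

So, for instance, 80% availability, 75% performance rate and 85% quality rate combine to an OEE of about 0.80 x 0.75 x 0.85 = 0.51, in line with the roughly 50% average cited above.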

Therefore, strategic work towards more efficient input data management is required, including structured guidelines for data collection and IT solutions to automate parts of the process. Automation would enable continuous updates of the data required in simulation models and facilitate the use of DES on a daily basis, as a desktop resource for production engineers. Such a desktop resource provides possibilities to reduce balancing and system losses through continuous and fact-based action, e.g. in production planning, management of system constraints, and rebalancing. In addition, recent research shows that an increased use of DES helps companies to reduce the environmental impact of production and, thus, improve their sustainability performance (Solding, Petku, and Mardan 2009; Heilala et al. 2008).

1.1 INPUT DATA MANAGEMENT IN SIMULATION

Here, input data management is defined as all activities required to obtain quality-assured, simulation-adapted representations of all relevant input parameters for a simulation model - in other words, the process of managing the input data required in simulation models. Input data management starts with the collection of raw data and ends with providing processed data as information to a simulation model, typically in a standardized file or a customized spreadsheet. Thus, the actual process of supplying the final information to the simulation model, often realized by an automated connection between the data interface and the simulation model, is excluded. The entire process is visualized in Figure 2. The three areas of input data management each consist of one or several activities, e.g. data correction as a part of data processing. The activities can in turn be broken down into tasks, for example to remove irrelevant data points. Tasks specifically performed in data processing use data operations, for instance filtering of data points based on date and/or time.

[Figure 2 shows the flow from the real world, via raw data and information, to the simulation model, covering three areas: Data Collection (automated collection, manual gathering, estimation), Data Processing (understand; correct and conform; condense), and Interfacing (manual input, customized interface, standardized interface).]

Figure 2: The different parts of the input data management process. Pictures from Volvo Cars Newsroom and Johansson and Zachrisson (2006).

As indicated in the background, input data management is a time-consuming step, contributing as much as 40% of the project time in simulation studies (Trybula 1994). The reasons are that many different aspects and parameters of production resources are included in detailed models, and that stochastic representation of simulation parameters requires many raw data samples. For example, it is desirable to collect more than 200 real-world measurements when representing machine breakdown patterns, such as Mean Time Between Failures (MTBF) and Mean Time To Repair (MTTR), in dynamic simulations (Perrica et al. 2008). In addition, substantial care is required due to the importance of data for model quality (McNally and Heavey 2004). Thus, the expression "Garbage in - Garbage out" (GIGO) is often used among simulation specialists (Robertson and Perera 2002). The positive aspect of this problem is that improvements in the input data management process hold great potential to reduce the time-consumption of entire simulation studies.
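To make the tasks and data operations described in this section concrete, the sketch below scripts a date/time filter, a correction step and a condensing step. The file name and column layout are invented for illustration; nothing here is taken from the thesis or any case company.

```python
import pandas as pd

# Assumed raw disturbance log: one row per machine stop with hypothetical
# columns "station", "start" and "end".
raw = pd.read_csv("stops.csv", parse_dates=["start", "end"])

# Data operation: filter data points based on date and/or time.
study = raw[(raw["start"] >= "2011-01-01") & (raw["start"] < "2011-04-01")].copy()

# Task: remove irrelevant data points (here, zero or negative durations).
study["ttr_min"] = (study["end"] - study["start"]).dt.total_seconds() / 60
study = study[study["ttr_min"] > 0]

# Condense: e.g. one mean repair time per station, ready for an interface.
print(study.groupby("station")["ttr_min"].mean())
```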

At present, the input data management process almost always involves some manual participation. This includes manual methods for data gathering as well as human involvement in data processing, e.g. data cleansing using formulas in MS Excel and data condensation supported by distribution-fitting software or similar functionality provided by simulation applications. Robertson and Perera (2002) identified only one case of a completely automated connection to computer applications within the Corporate Business System (CBS) for the bulk of data. The reason is arguably the lack of simulation data in these applications, especially data describing the dynamics of production systems (Moon and Phatak 2005). Moreover, finding all necessary input data in a condensed form, suitable for simulation, is very unlikely.

Current research and development on input data management focuses mainly on elevating the level of automation by integrating simulation models with the major data sources within the CBS (Robertson and Perera 2002, Randell and Bolmsjö 2001). These data sources are typically exemplified by Enterprise Resource Planning (ERP) systems and Product Lifecycle Management (PLM) systems. Such integration certainly holds potential but faces several challenges due to the diversity of simulation tools (Semini, Fauske, and Strandhagen 2006) and the lack of detailed data mentioned above.

1.2 PURPOSE

The purpose of this thesis is to enable continuous use of DES as a desktop resource for production engineers. By extension, this will lead to more efficient production flows, which are well balanced with regard to varying processing times and robustly designed to reduce the negative effects of production disturbances. Better utilization of existing resources will of course reduce the need for major investments in new capacity. The challenge is to efficiently manage the extensive demand for data and information for dynamic simulations.

1.3 AIM

The aim of this thesis is to enable reduction of the time-consumption for input data management in simulation of material flows in production, i.e. the time required for the process of managing the input data. This work mainly focuses on the processing of raw data to information, with the assumption that automation of data input activities is an appropriate alternative. Standards for raw data collection and data interfaces, enabling efficient supply of updated information to simulation models, are briefly evaluated. Throughout the input data management process, the quality of data must remain similar to the result of a common industrial approach.

One additional aim is to contribute to the development of industrial tools effecting automated input data management for DES. A demonstrator will serve as an aid to convey the thoughts behind the presented concepts and exemplify how the research can be realized in industry. However, the presented demonstrator, called the GDM-Tool (Generic Data Management), is not intended to be a commercial product itself; it should rather serve as inspiration for further development by major DES users or other software vendors in the area of production data management.
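The thesis does not include the GDM-Tool's source code. Purely as an illustration of the plug-in idea it builds on (Figure 11: each plug-in performs a user-specified request, and plug-ins are selected to suit a specific company), a data-processing pipeline could be sketched as follows; all names and the record layout are hypothetical.

```python
from typing import Callable

# A plug-in is any function taking a list of records and returning a new list.
Plugin = Callable[[list[dict]], list[dict]]

def date_filter(start: str, end: str) -> Plugin:
    """Plug-in factory: keep only records within the study period."""
    return lambda rows: [r for r in rows if start <= r["timestamp"] < end]

def drop_nonpositive(field: str) -> Plugin:
    """Plug-in factory: remove irrelevant data points."""
    return lambda rows: [r for r in rows if r[field] > 0]

def run_configuration(rows: list[dict], plugins: list[Plugin]) -> list[dict]:
    """Apply a user-selected chain of plug-ins (a 'configuration')."""
    for plugin in plugins:
        rows = plugin(rows)
    return rows

raw = [
    {"timestamp": "2011-02-01 10:00", "ttr_min": 45.0},
    {"timestamp": "2011-02-03 08:30", "ttr_min": -1.0},  # corrupt entry
    {"timestamp": "2011-05-10 12:00", "ttr_min": 30.0},  # outside period
]
clean = run_configuration(raw, [date_filter("2011-01-01", "2011-04-01"),
                                drop_nonpositive("ttr_min")])
print(clean)  # only the first record survives
```

The point of the structure is that new data operations can be added as further plug-ins without touching the pipeline itself, which matches the extensibility described for the demonstrator.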

1.4 RESEARCH QUESTIONS

Two research questions (RQ) have been identified as specifically important for satisfying the purpose and aim of this thesis. The answer to the first one describes industrial practice in the process of preparing input data for simulation models, and the subsequent question concerns how the efficiency of activities in this data input process can be increased.

RQ1. What is the current industrial state-of-the-art for the input data management process?

Describing the industrial state-of-the-art is an important reference and starting point when developing new methods and tools for the input data management process, as in RQ2. Important activities have to be identified, specifically those having the highest impact on the total time-consumption. Further, since an automated solution is presumed, the current level of automation for the data input activities has to be identified.

RQ2. How can efficient and automated input data management, for simulation of material flows in production, be realized?

In the design of a concept for increased efficiency in input data management, it is crucial to identify important functionalities for automating the data input activities. The proposed concept is demonstrated as a software solution and its general applicability is tested in three case studies. The thesis also evaluates whether, and by how much, the time-consumption can be reduced compared to a traditional industrial data input procedure (a reference procedure).

1.5 FOCUS

The results of this thesis are based on research within manufacturing companies, mainly in the automotive and aerospace industries. Production data typically come from production lines with automated and semi-automated work stations performing machining or assembly operations. This includes, for example, milling, drilling, turning and material handling operations as well as assembly tasks supported by hand-held tools. However, despite the focus on manufacturing companies and the listed operations, it is expected that the designed and evaluated methods can be applied in other types of industries as well.

Another aspect to mention is that the research is strongly related to DES for analysis of material flows in production. This is evident throughout the aim, research questions, and case descriptions and, consequently, a wider application area cannot be claimed or promised. However, it is likely that the outlined and evaluated solutions to data management also work for other tools with similar data requirements, such as line balancing, scheduling, and production monitoring applications.

1.6 DELIMITATIONS

A central word in this thesis is "data", and throughout the text it refers to quantitative data, e.g. processing times and repair times. Thus, logical relations, such as routing rules and priority decisions on the shop floor, are disregarded. That kind of information is assumed to be handled during model building. Consequently, in order to use DES as a desktop resource, the automated solution for input data management has to be complemented with routines for keeping the simulation model itself updated when changes in the production system occur.

Data quality is not specifically addressed in this research. The aim of all described methods and tools is to reduce the time-consumption during data input activities. Nevertheless, it is clearly

stated that the data quality must be similar to a traditional approach to input data management in industry.

During the design of automated input data management (RQ2), the focus has been on increasing the efficiency of the processing of raw data to information, i.e. data processing (section 1.1). The measured time-consumption (Δt) is therefore delimited to start from available sets of raw data and stop when the simulation information is exported to a simulation interface; see Figure 3. The actual collection of raw data (e.g. from sensors and databases) and the development of standards for data interfacing towards the simulation models are, thus, outside the scope. However, existing solutions have been included in the evaluation of the concept and tested together with the proposed demonstrator for automated input data management.

[Figure 3 shows the chain raw data → data input processing activities (Δt) → interface → simulation model.]

Figure 3: Delimitations for research and time measurements in the input data management process.

Because of the limited time, the evaluation of automated input data management (for RQ2) is restricted to three case studies from two types of industries, automotive and aerospace. Specifications of necessary data operations as well as the circumstances for validation are therefore dependent on conditions from these cases. Moreover, when comparing the time-consumption for automated input data management and the traditional industrial approach, the influence and possible variation due to different users are not considered. Each specific data input procedure is measured once, with a single user, because simulation engineers with enough experience in input data management are few in the case study companies.

1.7 THESIS STRUCTURE

This thesis follows the structure of interdependent studies, described by Wilkinson (1991). The first part includes the research performed to answer research question 1 (RQ1) and contains Publications I, II, III, and IV. After an interim discussion, the results lead to the initialization of a second study. Naturally, the second part contains the research related to research question 2 (RQ2), which includes Publications V, VI, and an additional, previously unpublished case study. The complete outline is visualized in Figure 4. Notice that the word "study" here is not the same as "case study" as used later in the thesis.

Figure 4: Outline of the thesis, following the structure of interdependent studies (Wilkinson 1991).

CHAPTER 2

2 METHOD

This thesis originates in systems theory, and the overall objective is to describe how system entities such as people, machines and IT solutions interact as one unit. This implies that the system as one unit might have different qualities and characteristics than the sum of the individual objects (Wallén 1996). For example, disturbances on individual machines can occur without affecting the performance of an entire production system, especially on machines which are not current bottlenecks. The research questions aim to map and improve the interaction and division of tasks between engineers and their support tools in order to increase efficiency in the analysis of production systems.

An empirical approach (Flynn et al. 1990) is applied to study the behavior of production systems, their support functions, and the effects of the methods and tools proposed in this thesis. Hence, information for the state-of-the-art description (RQ1) and design criteria for the demonstrator of automated input data management (RQ2) are collected from industrial simulation projects. Additionally, test implementations and evaluation of the demonstrator are performed in real-world industrial case studies. The selection of research methods is strongly inspired by Flynn et al. (1990) and their thorough description of empirical research in operations management.

A fundamental objective of this research is to ensure that the proposed methods and tools are applicable and value-adding for industry. It is considered very important to really understand the industrial work procedures regarding input data management and to closely study the effects of the suggested improvements. Therefore, the author has cooperated with industrial partners (the FACTS and DFBB projects (Chalmers PPU 2011)) and participated in the mapping of current practice as well as the development, implementation and validation of the automated support systems. Such an approach is called Action Research (Coughlan and Coughlan 2002).

The combination of empirical and action research is often criticized for being close to industrial development, which should be performed by the companies themselves. A key aspect is that academic research has to ensure that generated knowledge is generic, which requires the use of recognized academic methods, transparent descriptions enabling repeated tests, close and iterative connection to theory, and validation in multiple case studies (Flynn et al. 1990, Gummesson 2000).

Applied research based on empirical findings often incorporates methods originating from social science. Therefore, in contrast to classical research at a technical faculty (closer to positivism), qualitative elements are increasingly applied in operations research due to the importance of organizational aspects. This has of course had its impact on the methods presented later in this chapter. The qualitative methods are used to collect experiences and opinions from the human resources in the studied organizations in order to ensure a correct understanding of current work procedures and the desired efficiency of proposed improvements. The mix of traditional quantitative methods and the qualitative elements described above is encouraged by several authors stating that research methods should be selected on the basis of the research purpose and aim, instead of old research traditions (Bryman and Bell 2007, Danemark et al. 1997). Table 1 contains a compilation of the selected methods.

Table 1: A mapping of the research methods used in the appended publications (design; data collection; data analysis).

Part 1 (RQ1)
- Publication I: survey (design); interviews (data collection); interview coding and descriptive statistics (data analysis)
- Publication II: survey (design); interviews and questionnaires (data collection); interview coding and descriptive statistics (data analysis)
- Publication III: survey (design); questionnaires (data collection); descriptive statistics (data analysis)
- Publication IV: case study (design); physical measurements (data collection); descriptive statistics (data analysis)

Part 2 (RQ2)
- Publication V: multiple case studies (design); archive analysis and participant observations (data collection); descriptive statistics and DES validation approaches (data analysis)
- Publication VI: multiple case studies (design); archive analysis and participant observations (data collection); descriptive statistics and DES validation approaches (data analysis)
- Unpublished case: multiple case studies (design); archive analysis and participant observations (data collection); descriptive statistics and DES validation approaches (data analysis)

Part 1, corresponding to RQ1, has the aim of theory development, describing the industrial state-of-the-art for input data management. An inductive approach (Wallén 1996) using qualitative methods dominates the research design. Interviews are used to define the data input activities as well as the tools and techniques supporting the process. Questionnaires are applied to assess the time-consumption for each activity and to identify the levels of automation in industrial input data management. Data analyses are based on interview coding and descriptive statistics; see further information below.

Part 2, corresponding to RQ2, combines theory development and theory verification using a deductive approach (Starrin and Svensson 1994). A demonstrator of a concept for automated input data management is developed, based on previous theory. Historical archive analysis is used to specify the requirements on the software, based on the design of raw data sources in three industrial case studies. Participant observations are used to evaluate the performance of the demonstrator. As far as possible within the given timeframe, the results are validated using the multiple case study design and triangulation; see section 2.5.

2.1 WORK PROCEDURE

The following figure (Figure 5) illustrates the research procedure. Theoretical inputs to the research are displayed above the time-line and the empirical contributions, such as case studies in research projects, below. Note that the publications are numbered to support the line of argument in this thesis rather than in sequential order. Further, the publications' positions on the time-line illustrate when the original work was performed rather than their publication year. This means, for example, that journal articles are dated by when the study was completed and documented, thus not including the review process.

[Figure 5 maps Publications I-VI on a time-line against six theoretical domains (the data input procedure; statistics; input modeling; raw data collection methods; data standards; automated input data management) and the empirical findings: a survey with 15 projects, automotive case 1 (FACTS project), an aerospace case (scholarship at NIST), a survey at WSC, automotive case 2 (DFBB project), and the unpublished case leading up to the PhD thesis.]

Figure 5: Research procedure: publications mapped against theoretical domains and empirical case studies.

The theoretical parts are divided into six main domains where most of the literature review can be categorized. However, some additional theory is provided in Chapter 3. The empirical findings are obtained within two different projects: FACTS (Conceptual Factory Development) and DFBB (Digital Factory Building Blocks) (Chalmers PPU 2011). The FACTS project developed a simulation tool and an input data management application intended for analysis of production systems on a conceptual level in early project phases. The DFBB project currently realizes a database containing production equipment represented in neutral formats, enabling information sharing between various engineering tools and continuous updates of operational data. Additionally, one case study was performed in cooperation with NIST (National Institute of Standards and Technology, USA) during a scholarship sponsored by the ProViking research school (ProViking 2011).

2.2 RESEARCH DESIGN

The first step after establishing the research questions is to select a research design. Different designs are appropriate depending on whether the question concerns theory development or theory verification. Flynn et al. (1990) list several types of research designs including single or multiple case studies, field studies, panel studies, focus groups and surveys. Note that the research designs are not detailed methods themselves, but serve as umbrellas for various data collection and analysis methods, often combined with each other. In this thesis, surveys are selected for most of the theory development in RQ1. A single case study is also used as a supplement for collecting the necessary data in Publication IV. Further, a multiple case study design is applied for the development and validation of a concept for automated input data management when answering RQ2.

2.2.1 SURVEYS

Surveys, using qualitative data collection methods such as interviews and questionnaires, are suitable for studies defining the state-of-the-art within a specific group or context (Flynn et al. 1990). In RQ1, the aim is to describe the industrial state-of-the-art in input data management for DES in manufacturing industry. According to the same article, it is common to select a population which is homogeneous with regard to one important characteristic. As a result, the population for the interviews and questionnaires in this survey represents DES users from a set

of industrial projects performed at companies with various locations, sizes, lines of business, and previous experience of DES, in order to cover a broad spectrum of industrial practices.

A second survey is also performed within the scope of RQ1. This survey aims to map the level of automation in the industrial input data management process. To cover a wider range of companies and also to include possible differences around the world, the population was extended to more people but kept homogeneous, consisting of DES practitioners. The DES practitioners include industrial representatives, consultants and researchers at the major research conference in the area, the Winter Simulation Conference (WSC). See section 2.3.2 for further information about the data collection using questionnaires.

2.2.2 MULTIPLE CASE STUDIES

The case study design is in general well suited for collecting detailed data and information from real-world environments, e.g. using archive analysis and interviews. From the collected data and information, a case study is also a powerful approach for investigating interactions between a phenomenon and its real-world context (Dubois and Gadde 2002, Wallén 1996). This characteristic, in combination with its strength in dealing with a full variety of evidence, such as observations, interviews, documents and artifacts, contributed to the appropriateness of case studies in this thesis. Therefore, this research design is selected for collecting the data requirements on a demonstrator for automated input data management and also for investigating the impact of automation on the time-consumption (RQ2).

Many of the negative effects related to case studies stem from the involvement of the researcher, who is always present, at least in the role of an observer. This close involvement has meant that case studies are frequently challenged with regard to objectivity and credibility (Wallén 1996), due to unavoidable interpretations by the researcher. Therefore, it is important to declare and discuss the researcher's background and also to explain the real-world context from which the results are observed. Compare the discussion on empirical and applied research provided earlier in this chapter.

Based on Weick (1979), Dubois and Gadde (2002) describe another common criticism against case studies: that they sometimes tend to be too broad, trying to describe everything, which usually ends up describing nothing. To overcome this drawback, case studies must be preceded by the description of a solid theoretical framework (Yin 1994). Furthermore, understanding the background theory will facilitate the research design and, by extension, give good guidance on what relations to establish, what data to collect and so forth. Previous theory has served as a basis for designing the concept of automated input data management and for developing the demonstrator later used for validation and presentation purposes.

2.3 DATA COLLECTION

Four data collection methods have been applied in order to systematically document observations. Interviews and questionnaires are, partly in tandem, used to describe current industrial practice (RQ1). In addition, physical measurements of electrical power have been performed in order to investigate how environmental parameters should be represented in DES studies (also RQ1). Archive analysis is the major data collection method during the development and validation of the demonstrator for automated input data management (RQ2).

2.3.1 INTERVIEWS

In this thesis, interviews are applied to describe the industrial state-of-the-art of input data management (RQ1). More specifically, required activities and applied techniques are identified on the basis of empirical findings from previously performed DES projects. In general, interviews are appropriate for gaining insight into people's opinions and experiences (Denscombe 2007), which is valuable when identifying current industrial practice.

Interviews can be designed in different ways depending on the type of data to collect. Three common categories are structured, semi-structured and unstructured interviews (Denscombe 2007). A structured interview means that the interviewer keeps tight control over the topic, format and order of the questions. It is almost comparable to a face-to-face questionnaire and is usually applied to collect large amounts of data. In contrast, unstructured interviews aim to let the interviewees develop their thoughts more thoroughly about topics that interest them. The researcher merely plants a topic or an issue, which makes the outcome of the interview hard to foresee but hopefully gives a deep understanding of the information conveyed by the interviewee. In this research, the semi-structured approach has been applied, which combines the qualities of structured and unstructured interviews. It was chosen to keep control and focus of the process while still obtaining in-depth answers. In practice, the researcher has a clear list of issues to deal with, but is still flexible with specific topics and the order of questions, letting the interviewee control parts of the process (Denscombe 2007). Another reason is that the administration of semi-structured interviews is easier, thanks to increased predictability of resources and time-consumption compared to the unstructured approach.

2.3.2 QUESTIONNAIRES

Two different questionnaires are used in this thesis. One is face-to-face, clipboard style (Denscombe 2007), performed on the same population as the interviews described above. The aim of this questionnaire was to assess the time-consumption related to the different activities in the input data management process. The second one is a questionnaire investigating the different levels of automation in the process of data input preparation. This questionnaire was supplied to DES experts at the WSC 2010 and via a web-based format for reminders and additional answers.

A questionnaire is a good alternative to interviews when the questions are straightforward and when large amounts of data are to be collected (Denscombe 2007). Furthermore, the fact that all respondents answer the same questions provides standardized answers, which are often correct and easy to analyze. Questionnaires provide either factual information or opinions (Denscombe 2007). Here, mainly factual information about industrial practice is collected in both Publications II and III.

2.3.3 ARCHIVE ANALYSIS

Archive analysis is a data collection method closely tied to single or multiple case study designs (Flynn et al. 1990). Throughout all three case studies used for answering RQ2, production data have been collected from databases containing breakdown frequencies, repair times and processing times for workstations. These data are used to identify requirements on the demonstrator for automated input data management, such as data formats to support and necessary data operations (e.g. to calculate MTBF and MTTR).
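As an illustration of one such data operation, the sketch below derives MTBF and MTTR from an alternating up/failure event sequence. The log format and values are assumptions made for the example, not taken from the case companies.

```python
from datetime import datetime

# Hypothetical raw event log: (timestamp, event) pairs, as they might be
# exported from an automated collection system.
events = [
    ("2011-03-01 06:00", "up"),      # machine starts producing
    ("2011-03-01 09:30", "failure"), # breakdown begins
    ("2011-03-01 10:15", "up"),      # repair finished, production resumes
    ("2011-03-02 14:00", "failure"),
    ("2011-03-02 14:40", "up"),
]

def mtbf_mttr(events):
    """Compute MTBF and MTTR in hours from an up/failure event sequence."""
    parsed = [(datetime.strptime(t, "%Y-%m-%d %H:%M"), e) for t, e in events]
    tbf, ttr = [], []
    last_up = last_failure = None
    for time, event in parsed:
        if event == "failure":
            if last_up is not None:
                tbf.append((time - last_up).total_seconds() / 3600)
            last_failure = time
        elif event == "up":
            if last_failure is not None:
                ttr.append((time - last_failure).total_seconds() / 3600)
            last_up = time
    return sum(tbf) / len(tbf), sum(ttr) / len(ttr)

mtbf, mttr = mtbf_mttr(events)
print(f"MTBF = {mtbf:.2f} h, MTTR = {mttr:.2f} h")
```

Note that this simple version counts calendar time between failures; a production-time-based variant would additionally subtract non-operating periods.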

The most significant advantage of archive analysis is that the data are unbiased, because they are collected prior to the initiation of the research (Flynn et al. 1990). Consequently, the demonstrator developed here will be designed to work with real-world industrial raw data not influenced by the special needs of laboratory environments. A negative aspect is that it may be impossible to find all necessary data, since the researcher does not control the sources of data collection. An example is that the three companies included in this thesis did not have all parameters necessary for DES in their existing data sources. To compensate for this shortage, supplementary requirements on data availability and the data processing functionalities were identified during additional data gathering and informal interviews within the project groups.

2.3.4 ELECTRICAL POWER MEASUREMENTS

Publication IV maps the variability of electrical power used as an input parameter in DES models. This is an initial step in describing how new simulation input parameters for environmental analysis should be handled and represented. In a separate case study, the electrical power utilization of five multi-operational machines was sampled during production. The power monitoring equipment was connected to the incoming three-phase connection using Y-connections (Stevenson 1982). See Publication IV for more details about the machines and the equipment.

2.4 DATA ANALYSIS

Two major strategies have been applied for the analysis of gathered data. Interview analysis, including structured coding and theory development, has naturally been used for the interviews related to RQ1. In other parts, statistical analyses have been used for compiling questionnaire responses and electrical power measurements (also RQ1). The same method has also been applied for evaluating the effects of automated input data management with regard to time-consumption and data quality (RQ2).

2.4.1 INTERVIEW ANALYSIS

The analysis of interview results in Publications I and II was performed using an approach similar to Grounded Theory (Glaser and Strauss 1967). Materials based on interview notes were initially coded with as little reference to previous knowledge as possible. This step is similar to the term open coding in Grounded Theory. Later, the initial codes were compared to each other in order to find synergies and key codes, which can be compared to the step of selective coding in Grounded Theory. The last step in Grounded Theory, theoretical coding, results in the formulation of a theory based on the codes from the two previous steps. In this thesis, significant codes were mainly selected on the basis of frequency. In other words, the state-of-the-art description in Publication I is a compilation of the most common activities and techniques used by DES practitioners.

2.4.2 STATISTICAL ANALYSIS

Basic statistical calculations have been performed in order to describe industrial practice, mainly to analyze the results from the questionnaires in Publications II and III. In other words, such calculations are used to evaluate how frequently different levels of automation are applied and to assess the time-consumption in different data input activities. Descriptive statistics is a common and useful analysis method for describing current work procedures in empirical research (Flynn et al. 1990, Miles and Huberman 1994). Moreover, similar calculations are used to identify the most common data input activities and techniques from the interview materials in

Publication I, and for quantifying the variance of electrical power from the measurements in Publication IV.

As a clarification, more advanced statistical analyses, such as parameter estimation using maximum-likelihood estimation and distribution fitting with the Kolmogorov-Smirnov test, were applied as functionalities within the GDM-Tool (RQ2). Hence, these types of analyses are considered parts of the systems design for the GDM-Tool rather than research methodologies.

2.5 VALIDATION

The description of research strategies in the introduction of this chapter includes terms such as empirical research, applied research and action research. All of these terms are related to challenges regarding validity and generality (Flynn et al. 1990, Gummesson 2000). However, using several cases in a multiple case study design provides a basis for validation of the proposed concept. Flynn et al. (1990) state that multiple case studies are capable of theory validation by supporting or falsifying proposed concepts.

Three case studies are included in this thesis for validation of the proposed approach to automated input data management. The case studies are performed in three different companies, representing two lines of business, which is considered enough for initial validation. This line of argument is also supported by Denscombe (2007), in his description of triangulation. Triangulation often means collection of the same data using different collection methods. However, Denscombe extends the term (data triangulation) to include the collection of data from different data sources, e.g. with various cultural, social or geographical contexts. These contexts are here varied by the selection of different case study companies; see Figure 6.

[Figure 6 summarizes the three cases: Case 1, automotive industry, national medium-sized company (processing time, MTBF, MTTR); Case 2, aerospace industry, international large-sized company (MTBF, MTTR); Case 3, automotive industry, national medium-sized company (MTBF, MTTR).]

Figure 6: Data triangulation for validation of the GDM-Tool in a multiple case study design.

Moreover, to evaluate the data quality between the two approaches in Publication V, statistical hypothesis testing was applied. A test for difference in means (Montgomery and Runger 1999) was used to compare the output of a simulation model with data prepared, first using the

traditional industrial approach (reference procedure), and then using the GDM-Tool. A significance level of α = 0.05 was used. See Publication V for more information on the test procedure. In Publication VI and in the previously unpublished case, the same comparison is performed using a combination of descriptive statistics and face validation of the data and simulation models (Sargent 2005). In this way, process experts within the research projects determine whether data quality can be considered similar between the two approaches.
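As an illustration of such a test for difference in means, the sketch below applies a two-sample t-test, one common form of the test described by Montgomery and Runger (1999). The replication values are invented placeholders, not results from the case studies.

```python
# Hedged sketch: compare simulation output (e.g. throughput per replication)
# obtained with data from the reference procedure vs. the GDM-Tool.
from scipy import stats

throughput_reference = [412, 405, 418, 409, 411, 407, 415, 410]  # hypothetical
throughput_gdm_tool = [414, 406, 416, 412, 408, 411, 413, 409]   # hypothetical

t_stat, p_value = stats.ttest_ind(throughput_reference, throughput_gdm_tool)
ALPHA = 0.05
if p_value < ALPHA:
    print(f"p = {p_value:.3f}: reject H0, the approaches yield different output")
else:
    print(f"p = {p_value:.3f}: no significant difference in model output")
```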

3 FRAME OF REFERENCE

This frame of reference covers several areas necessary for mapping current practices and increasing efficiency in input data management. Firstly, the fundamental starting point is presented by defining the terms data and information and the value-adding activities required to obtain the transformation between the two, i.e. the chief objective of input data management. Thereafter, the different parts of input data management (collection, processing and interfacing) are described successively. There is an emphasis on data processing, which is the central part for the research questions. Existing solutions to increase efficiency and elevate the level of automation during the entire input data management process are finally reviewed.

3.1 THE DIFFERENCE BETWEEN DATA AND INFORMATION

An established distinction between data, information and, by extension, knowledge is crucial when describing and developing the input data management process. For example, since the aim of input data management is to transform data to information for simulation models, the activities involved depend heavily on the types of value that must be added in order to obtain information from data. Table 2 contains some of the common definitions of data in simulation-related literature:

Table 2: Compilation of data definitions.
"Data is a set of discrete, objective facts about events" - Davenport and Prusak (1998)
"Not yet interpreted symbols" - Van der Spek and Spijkervet (1997)
"Data consists of analog or digital signals or indications (syntax) and are used for the representation of information in the purpose of further processing" - Bernhard and Wenzel (2005), based on DIN (1995)

The corresponding definitions of information are provided in Table 3:

Table 3: Compilation of information definitions.
"A flow of meaningful messages" - Nonaka and Takeuchi (1995)
"Data with meaning" - Van der Spek and Spijkervet (1997)
"Data with relevance and purpose" - Davenport (1997)
"A message meant to change the receiver's perception" - Davenport and Prusak (1998)
"Data vested with meaning" - Choo, Detlor and Turnbull (2000)

For simulation purposes, the model is the receiver of information, which means that the viewpoint of receiver perception (Davenport and Prusak 1998) is less applicable here. Instead, it is more relevant to conclude that the mission of data input activities is to interpret and process data in order to add meaning, relevance and purpose. This statement conforms well to the definition of data by Davenport and Prusak (1998) and the definition of information by Davenport (1997). Knowledge, on the other hand, is more connected to the mental models, skills, proficiency, know-how, and experience of people. Such knowledge about production systems is usually added after simulation modeling and, thus, is not necessarily part of the input data management process (Figure 7). It is, however, important to mention that experienced

simulation and production engineers can deduce partial knowledge about the studied system just by looking at the information supplied to the simulation model. The creation of such knowledge, though, is not the main objective of this research.

Figure 7: The scope of data, information and knowledge included in this thesis.

3.1.1 DATA REQUIREMENTS IN DES

All simulation models differ from one another because of variations in production system configurations, as well as in the model-building procedure. Therefore, simulation is often said to be an art rather than a science. This fact makes it difficult to provide general guidelines about the data requirements in DES models. For example, models aimed at conceptual analysis in early stages of factory design require less detailed data than models built for optimizing production systems in full operation (Hatami 1990). Approximations are more common in early phases in order to test as many alternatives as possible, while accuracy is more valued for decision support in on-going production. In the latter case, sample sizes around 200 raw data points or more are required to correctly mimic the dynamics of the studied system (Perrica et al. 2008). Compared to other common analysis tools (e.g. static line-balancing and spreadsheet capacity calculations), the reader can easily understand the efforts required in input data management, but hopefully also the potential of using DES. Moreover, it is not just the number of samples for each model parameter that contributes to the significant work load. There are also many different parameters needed to correctly represent all events in an operating production system. Processing times (e.g. assembly time and machine time) and breakdown patterns (e.g. MTBF and MTTR) are often argued to be specifically important for mimicking the dynamics (Hatami 1990, Williams 1994). As stated above, it is irrelevant to give precise guidelines, but the following list contains some of the most common parameters in DES models (Hatami 1990):

- Processing times
- Set-up times
- Breakdown frequency (MTBF or Mean Jobs Between Failure)
- Repair times (MTTR or Mean Down Time)
- Product mix
- Work schedules
- Speeds of material handling equipment
- Quality-related parameters (measuring frequencies, scrap rates, etc.)
- Etc.

3.1.2 DATA CATEGORIES

Data can be divided into various categories depending on format, availability and intended area of use. Especially the first two categorizations are important for the design and selection of methods and tools during the input data management process. The most important aspect of

data formats is the definition of qualitative and quantitative data. The term quantitative data usually means numbers and is, in the context of DES, often exemplified by processing times, breakdown frequencies and production schedules (Robinson 2004). A general definition of quantitative data is (Bordens and Abbot 2005): Data collected that are represented by numbers that can be analyzed with widely available descriptive and inferential statistics. Qualitative data, on the other hand, are equivalent to non-numeric facts and beliefs about the system (Robinson 2004). In simulation studies, qualitative data are usually expressed in words or pictures. Examples of qualitative DES data are CAD drawings of a layout, and rules to control elements of the system (Robinson 2004). Such data are excluded from this thesis and expected to be handled by the simulation engineer during model development. It should also be mentioned that the management of qualitative data is more difficult to automate than that of quantitative data. A general definition of qualitative data is (Bordens and Abbot 2005): Data in which the values of a variable differ in kind (quality) rather than in amount.

The second categorization is based on the degree of availability and collectability (Table 4) (Robinson and Bhatia 1995). Firstly, category A data are already available, for instance in automated data collection systems, CBS, or as previously measured data stored in local sources. Of course, this type of data is very convenient, since further work is limited to data processing and validation. Secondly, category B data require additional effort because they need to be gathered during the simulation study. Finally, category C data are neither previously available nor collectable, often due to new processes or equipment in the investigated system. Estimation of category C data requires both a well-designed strategy and scrupulous care, in order to maintain model quality. A high portion of category A data is required to succeed with automated input data management (RQ2).

Table 4: Categorization of data based on availability and collectability (Robinson and Bhatia 1995).
Category A: Available
Category B: Not available but collectable
Category C: Not available and not collectable

Finally, the third categorization stems from Pidd (1996), who categorizes data with regard to their area of use. He states that data are intended either for preliminary investigation of a system (contextual data), for model realization, or for model validation. Naturally, input data management concerns the data necessary for model realization.

3.2 DATA COLLECTION

Data collection is a central part of input data management, even though the development of methods and tools for data collection is excluded from this research. The review below contains information about state-of-the-art procedures to serve as a starting-point for the more central step of data processing. In addition, the format and quality of the raw data generated during data collection are highly important when selecting methods to automate the data processing step.

3.2.1 MANUAL GATHERING

Many companies use DES for single projects instead of as an integrated part of the production engineering process (McNally and Heavey 2004, Hollocks 2001). Therefore, manual efforts are

still very common in data collection to avoid the significant investment costs of the more efficient continuous collection systems (see below). Several different methods are available when gathering data points manually. The most common are (Robertson and Perera 2002; Banks, Carson, and Nelson 1996; Pegden, Shannon, and Sadowski 1995; Williams 1994):

- Video recording
- Interviews
- Collecting data using a stopwatch
- Collection of individual domain knowledge from production engineers
- Plant specification and design documents
- Recording forms
- Etc.

The choice of method depends on circumstances in each specific production system. Such circumstances can for example be: how often the events of interest occur, operators' availability, and union agreements. However, video recording is often more reliable than real-time observations and is therefore preferred if possible (Banks, Carson, and Nelson 1996). Another general recommendation is to combine different sources to increase the data credibility (Pegden, Shannon, and Sadowski 1995).

3.2.2 CONTINUOUS DATA COLLECTION

There are mainly two types of systems for continuous collection of the raw data necessary for DES models: systems dependent on operator involvement, and completely automated solutions. In this case, continuous means that data are collected at any time and, thus, collection is not initiated merely to provide data for a specific simulation project. It should also be mentioned that continuous data collection systems are seldom implemented solely for simulation purposes; they rather originate in the needs of the maintenance organization. Systems dependent on manual involvement typically consist of computer terminals (Kleindienst and Juricic 2007) where operators record timestamps for breakdowns, set-ups and similar events. A recent trend within this area is data and information collection using Wiki-based software solutions (Dungan and Heavey 2010). Wikis are good for gathering knowledge from many individuals within an organization, but are better suited to collecting information than the type of detailed raw data necessary for DES modeling. The bottom line is that collection systems including human involvement generally have lower investment costs than completely automated solutions, but the quality of data can be limited if people forget to record events or incorrectly recall the timestamps before entering them at the terminal.

Automated collection systems (ACS), on the other hand, are better suited to collect large amounts of data with consistent quality (Ingemansson, Ylipää, and Bolmsjö 2005). Erroneously logged data points do exist due to communication problems, but they are possible to detect and correct using clever data processing functionality (RQ2) (Zaum, Olbrich, and Barke 2008). Common data collection systems in this category are based on the timestamps registered by PLCs (Programmable Logic Controllers) or machine clients, for example supervised by a SCADA (Supervisory Control And Data Acquisition) system (Kühn 2006). The most common standard for communication between such devices and systems is OPC (OLE for Process Control) (OPC Foundation 2010), with MTConnect (MTConnect Institute 2010) as an alternative.
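Regardless of the collection system, such logs typically reach the data processing step as timestamped event records. Below is a minimal sketch, assuming the log has been exported as a CSV file with columns named timestamp and event; both the column names and the timestamp format are assumptions for illustration, not tied to any specific OPC or SCADA product:

```python
import csv
from datetime import datetime

def read_event_log(path):
    """Return a list of (datetime, event_code) tuples from a CSV export."""
    samples = []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            ts = datetime.strptime(row["timestamp"], "%Y-%m-%d %H:%M:%S")
            samples.append((ts, row["event"]))
    return samples
```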

3.3 DATA PROCESSING

A crucial part of this thesis aims to describe and improve the process of transforming operational data from the shop floor (raw data) to information for simulation models. Davenport and Prusak (1998) argue that this process implies adding value to the data; see for example the definitions of data and information in section 3.1. The authors specifically mention five important activities (Table 5) necessary to accomplish this increase in value. They also mention the possibility to automate parts of the transformation process by means of computerized solutions (RQ2), but state that it is difficult to completely omit human involvement. Especially the steps of contextualization and categorization are complicated to automate.

Table 5: Activities required for transforming data to information (Davenport and Prusak 1998).
Contextualization: Knowledge about what purpose the data were collected for
Categorization: Knowledge about units of analysis or key components of the data
Correction: Removal of errors from the data
Calculation: Mathematical calculations or statistical analysis of the data
Condensation: Summarizing of the data in a more concise form

In input data management for DES, the context of raw data is ideally added when the collection system or gathering procedures are designed. In these cases, the purposes of the data and the collection process are well explained. However, in numerous real-world situations, context is not added until the raw data are reviewed in connection with the establishment of a conceptual model (Van der Zee and Van der Vorst 2007). Categorization is also, in ideal cases, added before or even during the actual collection of raw data. But the lack of well-structured databases containing DES data usually implies additional work of understanding and grouping raw data (Perera and Liyanage 2000). Quantitative DES data are often categorized with regard to specific production resources, e.g. processing times are distributed to the different machines included in the model. Moreover, correction of data implies, for example, the removal of erroneously logged data samples due to communication problems. For instance, it is common to disregard very short stops (e.g. below 30 seconds) in automatically collected breakdown logs, especially when modeling automated work stations (Alexandersson and Wirf 2001). Calculations are frequently needed in order to obtain simulation parameters such as the MTBF, which requires subtracting one stop timestamp from the next. Finally, condensation (or input modeling (Leemis 2004)) is normally done using statistical or empirical distributions for parameters including variability (Robinson 2004). This specifically complex and time-consuming task will be further explained in the sections below.
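The correction and calculation steps for breakdown data can be illustrated as follows. The sketch assumes the raw data have already been categorized into (start, end) pairs per machine; the 30-second threshold follows the rule of thumb cited above, and all names are illustrative rather than the GDM-Tool's actual functions:

```python
def prepare_breakdown_samples(stops, min_duration_s=30.0):
    """stops: list of (start, end) datetime pairs, sorted by start time."""
    # Correction: remove very short stops, likely logging/communication noise.
    valid = [(s, e) for s, e in stops if (e - s).total_seconds() >= min_duration_s]
    # Calculation: derive TTR and TBF samples (TBF here as uptime between stops).
    ttr = [(e - s).total_seconds() / 60 for s, e in valid]
    tbf = [(valid[i][0] - valid[i - 1][1]).total_seconds() / 60
           for i in range(1, len(valid))]
    mtbf = sum(tbf) / len(tbf) if tbf else None
    mttr = sum(ttr) / len(ttr) if ttr else None
    return tbf, ttr, mtbf, mttr
```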

3.3.1 DATA REPRESENTATIONS

The first decision to make when supplying an input data set to a simulation model is what kind of representation to use. Statistical or empirical distributions are most common because they condense the data set to a convenient size (Robinson 2004). Empirical distributions work basically like mathematical descriptions of histograms, categorizing the samples in different intervals. Consequently, this approach requires more space for conveying information than do statistical distributions. Statistical distributions condense all data to a distribution family name and a set of parameters, usually two when dealing with continuous and univariate distributions.

Another type of representation is to keep the original data points as traces or bootstraps (Robinson 2004) (see Table 6). A trace is data samples listed in the order of collection and, thus, the simulation model reads them successively from top down. A bootstrap takes a list of empirical samples and randomly reorganizes them before presentation to the simulation model. Traces, bootstraps and empirical distributions have the advantage of keeping the data close to their original form, which guarantees realistic model behavior. However, in this thesis statistical distributions are preferred because of their ability to extend the time span of data sets to include production behavior not specifically observed during data collection (Robinson 2004). In addition, the convenient format is advantageous to include in standardized as well as customized model interfaces.

Table 6: Examples of traces and bootstraps as alternatives to statistical distributions.
Breakdown time-stamp (hh:mm) | Time between failures, Trace (min) | Time between failures, Bootstrap 1 (min) | Time between failures, Bootstrap 2 (min)
08:41 | N/A | N/A | N/A

3.3.2 INPUT MODELING

Input modeling, i.e. the condensation of raw data to suitable representations, includes several steps and relatively complex statistical calculations. This, in combination with the manual steps in data correction and calculation, is a major reason for the extensive time-consumption in input data management. There are many research contributions presenting methods to support the input modeling process, but they focus on the underlying mathematical calculations and therefore mainly aim to increase the quality of data representations; see for example Leemis (2004). As indicated above, the input modeling procedure starts with data that are already corrected and calculated. Consequently, the data are available as a number of data points, for instance representing samples of a machine's processing time. Going from these data points to a statistical distribution is a process that includes a number of steps, depending on different authors' divisions of tasks. However, all authors share the same view of how the process should be performed, and the most common description includes four steps (Banks, Carson, and Nelson 1996; Pegden, Shannon, and Sadowski 1995; Leemis 2004):

1. Evaluate the basic characteristics of the empirical data set.
2. Select distribution families for evaluation.
3. Estimate the best-fitting parameter values for all chosen distribution families.
4. Determine the goodness-of-fit and select the best distribution.

Depending on whether the process is executed with or without computer calculation support, one or more distributions can be evaluated. Manual calculations are complex and, thus, only one distribution family is usually selected in step two. Using computer support, there is an opportunity to compare several distributions to each other and probably reach closer to the optimal selection.
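A compact sketch of how these four steps can be chained in an automated procedure is given below. The helper functions trend_test and best_fit are illustrative names, sketched in the following subsections; they are not the GDM-Tool's actual interface:

```python
def input_modeling(samples):
    """Condense corrected and calculated samples to a statistical distribution."""
    if not trend_test(samples):          # step 1: basic characteristics
        raise ValueError("samples show a time trend; model it separately")
    return best_fit(samples)             # steps 2-4: fit candidates, select by fit
```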

Evaluating the Basic Characteristics

Firstly, the data set needs to be evaluated with regard to sample independence, to make sure that there is no systematic change in the data during the collection period. For example, if a learning curve effect is present for the processing time at a manual assembly station, a strictly random representation is inappropriate. Instead, the learning effect has to be identified and separately modeled as a known variable. There are several methods to assess sample independence (Leemis 2004). Among the graphical solutions, the scatter-plot (Figure 8) is most widely used, which means that the data points are plotted in the order of collection. The data set can be considered independent if no tendency is identified as a function of time (cloud-shaped scatter-plot). For more detailed evaluations and increased possibilities for automation, it is also possible to test the data set mathematically, e.g. by linear regression (Leemis 2004).

Figure 8: Example of a scatter-plot.

Distribution Families

After checking the data for independence, a suitable distribution family should be selected for further evaluation, given that a statistical representation is desired. In a manual approach, the distribution family is chosen based on the nature of the sampled data. Using the calculation capacity of automated solutions, more families can be evaluated, which is advantageous since many families have similar properties. This thesis mostly uses the following five continuous and univariate distributions, which are considered sufficiently accurate for most industrial DES projects and are also represented in all commercial DES software packages (Law 2007):

- Exponential: Usually used for the time between customer arrivals in service systems or for time between failures (TBF) of production equipment. A good assumption for TBF if only the mean value is known.
- LogNormal: Time to complete a task, e.g. time to repair (TTR).
- Weibull: Time to complete a task, e.g. processing times at a manual work station.
- Gamma: Time to complete a task, similar to LogNormal and Weibull.
- Triangular: Requires min, mean and max or min, mode and max. A good approximation for category C data (Robinson and Bhatia 1995).
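The mathematical independence test mentioned above can be sketched as a regression of the sample values on their collection order; a flat, non-significant slope supports treating the samples as independent of time. The function name and significance level are illustrative assumptions:

```python
from scipy import stats

def trend_test(samples, alpha=0.05):
    """Return True if no significant time trend is detected in the samples."""
    order = range(len(samples))                # collection order as regressor
    result = stats.linregress(order, samples)  # slope, intercept, r, p, stderr
    return result.pvalue >= alpha
```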

Maximum Likelihood Estimation

The third step in the input modeling procedure, outlined in section 3.3.2, is to estimate the input parameters for the selected distribution families. In practice, this estimation is often done by simple calculations based on the sample mean and variance and the relationship between a distribution's mean and variance and its parameters. In a gamma distribution, for example:

µ = kθ
σ = √(kθ²)

where µ is the sample mean, σ is the sample standard deviation, k is the shape parameter and θ the scale parameter. However, solving this equation system to obtain the shape and scale values seldom identifies the best-fitting distribution, especially when the empirical data do not exactly fit a given distribution. Instead, due to desirable statistical properties, the method of maximum likelihood, or maximum likelihood estimation (MLE), is preferable. The definition of the likelihood function is (Montgomery and Runger 1999):

L(θ) = f(x₁; θ) · f(x₂; θ) · ... · f(xₙ; θ)

where:
θ is the unknown parameter (or vector of parameters)
f is the probability density function
x₁, x₂, ..., xₙ are the observed values

The maximum likelihood estimator of θ is the value of θ which maximizes the likelihood function L(θ).
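As a minimal sketch of the two estimation approaches for a gamma distribution, using scipy (the data vector is an invented placeholder):

```python
import numpy as np
from scipy import stats

data = np.array([5.1, 7.3, 4.8, 9.2, 6.5, 5.9, 8.1, 7.7, 6.2, 5.4])

# Moment-based estimates from mu = k*theta and sigma^2 = k*theta^2:
mu, var = data.mean(), data.var(ddof=1)
theta_mm = var / mu            # scale
k_mm = mu / theta_mm           # shape

# Maximum likelihood estimates (location fixed at 0 for a two-parameter gamma):
k_mle, _loc, theta_mle = stats.gamma.fit(data, floc=0)
print(k_mm, theta_mm, k_mle, theta_mle)
```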

Goodness-of-fit Tests

When an appropriate distribution family is selected and associated with a set of estimated parameters, the complete distribution must be evaluated by comparing its conformity with the empirical data. In a manual input modeling procedure, this is usually done by comparing a plot of the distribution to a histogram of the empirical data. However, such graphical comparison is problematic to quality-assure and also unsuitable for automation. Instead, there are several types of statistical goodness-of-fit tests available, one of which is the Kolmogorov-Smirnov (KS) test (Law 2007). The KS test calculates the maximum distance between the empirical and the fitted Cumulative Distribution Functions (CDF), which is applied for automatically selecting the best-fitting distribution in the demonstrator used for answering RQ2.

Figure 9: KS tests use the difference between the empirical and fitted distributions to evaluate their compliance.

The empirical cumulative distribution function is the likelihood of finding a sample smaller than or equal to a given value:

Fₙ(x) = P(X ≤ x) = (i − 1)/n just before x = Xᵢ, and i/n just after x = Xᵢ

where n is the number of samples and 1 ≤ i ≤ n. Moreover, the fitted cumulative distribution function is:

F(x) = P(X ≤ x) = ∫₀ˣ f(y) dy

where f(y) is the density function for the selected distribution. Further, the KS test statistic (D) is the maximum distance between the two functions, calculated at all points Xᵢ:

D = max over 1 ≤ i ≤ n of { F(Xᵢ) − (i − 1)/n, i/n − F(Xᵢ) }

D can later be used to directly compare selected distributions to each other, simply by stating that a smaller D indicates a better fit, which is implemented as default functionality within the demonstrator (the GDM-Tool) for RQ2. However, D can also be used to statistically evaluate the fit of a specific choice by calculating the p-value in a hypothesis test with a null hypothesis stating that the empirical samples come from the distribution under evaluation (Montgomery and Runger 1999, Law 2007).
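The automated comparison described above, i.e. fitting each candidate family by MLE and selecting the one with the smallest D, can be sketched as follows. The candidate set follows the families listed earlier (the triangular distribution is excluded since it is intended for category C data without samples); function and variable names are illustrative, not the GDM-Tool's actual API:

```python
from scipy import stats

CANDIDATES = ("expon", "lognorm", "weibull_min", "gamma")

def best_fit(data, candidates=CANDIDATES):
    """Return (name, (D, p, params)) for the candidate with the smallest D."""
    results = {}
    for name in candidates:
        dist = getattr(stats, name)
        params = dist.fit(data, floc=0)          # MLE, location fixed at 0
        d, p = stats.kstest(data, name, args=params)
        results[name] = (d, p, params)
    return min(results.items(), key=lambda kv: kv[1][0])
```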

It is important to mention that there are also several other goodness-of-fit tests available, for example the Anderson-Darling and Chi-Square tests (Law 2007). However, Chi-Square tests are troublesome when selecting the number and size of the histogram intervals for comparing the empirical data with a statistical distribution. This is specifically problematic for automated input data management. Anderson-Darling tests are better in that sense but incorporate more complex calculations than Kolmogorov-Smirnov tests. They also focus more on comparing the tails of distributions rather than the most frequent events of production resources. Therefore, Kolmogorov-Smirnov tests are considered most appropriate for automated comparisons in this thesis (RQ2).

3.3.3 SOFTWARE SUPPORT IN DATA PROCESSING

The reader can imagine that performing all the data processing steps described above is a complex task requiring competence, experience and time. This is a major reason for the fact that simulation practitioners often select a distribution family by guesswork and estimate the related parameters by experience; see the argumentation in the introduction to this thesis (Chapter 1). Fortunately, it is common that companies experienced in DES take advantage of available software solutions. Note, however, that manual involvement is still required to link the tools as long as the complete chain is not fully integrated, as proposed in this thesis. One common example is MS Excel, alone or in combination with Visual Basic macros, which is used for data categorization, correction and calculations (Kumar and Nottestad 2009). Special-purpose software solutions are frequently preferred for data condensation because of their convenient interface for simulation users and their ability to provide programming code ready to use in commercial DES packages (Kumar and Nottestad 2009). ExpertFit (Law and McComas 2003) and Stat::Fit (Geer Mountain Software Corporation 2011) are two examples of such programs. Data condensation can also be performed using mathematics and statistics software applications with a wider area of use, e.g. Matlab, Minitab and SPSS. Finally, some DES packages also provide partial support for data processing, mainly in the condensation step.

3.4 DATA INTERFACES AND STANDARDS

Information management problems, i.e. problems in the supply of processed data to simulation models, affect many aspects of manufacturing operations (Gallaher, O'Connor, and Phelps 2002). They are a particular hindrance to the creation and reuse of manufacturing simulations. For example, Robertson and Perera (2002) state that current industrial procedures often include customized spreadsheet interfaces for the supply of information to simulation models. In some companies the information is even supplied by manually typing it into the model code; see further explanations in section 3.6. Both solutions have proved to be time-consuming when updating simulation models with recent data and when setting up various kinds of experimental designs. For establishing procedures including automated input data management, a standardized format for supplying the data to simulation models is highly desirable.

3.4.1 CMSD

One such standardized format is titled CMSD (Core Manufacturing Simulation Data), which was developed by NIST in collaboration with Chalmers University of Technology, other universities, and industrial partners. The outcome resulted in a standard, launched in September 2010, which follows the guidelines, policies, and procedures of the Simulation Interoperability Standards Organization (SISO) (SISO 2010, 2011). The CMSD specification describes a CMSD information model using the Unified Modeling Language (UML) (UML Resource Page 2009).
The primary objective of this information model is to provide a data specification for efficient exchange of manufacturing life-cycle data in a

simulation environment. The objectives of CMSD are to: foster the development and use of simulations in manufacturing operations; facilitate data exchange between simulation and other manufacturing software applications; enable and facilitate better testing and evaluation of manufacturing software; and increase manufacturing application interoperability. It is important to state that CMSD covers the representation of input data to simulation models and other engineering applications with similar data needs. Thus, logical relations describing model behavior and representation of output data are not covered. Such data and information have to be handled separately by the model builder, possibly using other standards or information models such as SysML (Huang, Ramamurthy, and McGinnis 2007). CMSD covers the following data categories:

- Resource information describes the people and equipment that perform activities.
- Order information specifies an external request to the manufacturing enterprise.
- Calendar information specifies time periods when production is and is not ongoing.
- Skill definition information describes the skills and proficiency levels a resource has.
- Set-up definition describes time to configure a resource, and to change configuration.
- Part information specifies materials, subcomponents, and end product.
- Bill-of-materials information specifies the subcomponent parts and quantities.
- Process plan information specifies production activities needed to make products.
- Maintenance plan information specifies maintenance processes for a resource.
- Job information specifies an internal request for production activities to take place.
- Schedule information specifies a time-plan for production activities.
- Distribution information specifies statistical distributions.
- Layout information specifies spatial data and relationships between resources.

Several test implementations in industrial case studies have been performed to prove that CMSD is a feasible neutral format and, thus, compatible with numerous simulation software packages (Johansson et al. 2008, Johansson et al. 2009, Kibira and Leong 2010). These studies show that CMSD is a good alternative when handling solely DES data, for example as a link between data processing applications and simulation applications. However, the standard lacks evidence for supporting data exchange between a wider range of engineering tools, e.g. to enable data exchange between manufacturing process preparation tools and simulations of material flows in production.

3.4.2 STEP AP214

Another standard which the author has been in contact with, in the research project DFBB (see section 2.1), is STEP AP214 (ISO 10303-214). In contrast to CMSD, this standard does not originate from the DES application area; instead, the DFBB project aims to demonstrate and extend its capabilities for dynamic simulations. The standard mainly stems from the needs of mechanical products within the automotive industry and is frequently applied for the representation of information such as geometry and kinematics (Kjellberg et al. 2009). STEP AP214 has previously been evaluated with regard to use within production systems applications, and for the handling of process data such as production sequences and resource capabilities (Falkman et al. 2008). However, until recently, STEP AP214 had not been utilized for describing data on the detailed level required for DES, but the DFBB project has shown initial progress in stochastic representations of breakdown patterns and processing times for DES models.

3.4.3 AUTOMATIONML

AutomationML is also a standardization effort aiming to connect different engineering applications in the design and development of production systems. Using several existing standards, AutomationML comprises information about factory topology, geometry, kinematics, and logics (sequencing, behavior and control). The main format used in AutomationML is CAEX, but COLLADA is also frequently applied for geometry and kinematics data. In addition, any other data standard may be used if the recommended formats do not suffice. For example, CMSD was used to carry DES data in a brief demonstration within the DFBB project. In such cases, a pointer to a separate document is linked via the CAEX file (AutomationML consortium 2010). Thanks to the diversity in allowed data formats, AutomationML offers extensive flexibility, but its underlying data model is not as established as that of STEP AP214. Like STEP AP214, AutomationML does not originate from the area of production flow simulation. Instead, it has the capability of storing and carrying data for a wider range of applications. A pre-study within the DFBB project demonstrated the possibility to carry breakdown data and processing times to a DES model built in ARENA, but the possibility to represent the entire set of data needed in DES models has not been demonstrated. The bottom line is that both STEP AP214 and AutomationML hold the potential to serve as a link between DES and more engineering applications than, for example, CMSD. However, more research is necessary to realize this, which makes CMSD the more appropriate choice for this thesis for the time being.

3.5 THE INPUT DATA MANAGEMENT PROCEDURE

The following section considers all three parts of input data management: collection, processing and interfacing. Existing methods for increasing efficiency are reviewed and possible pitfalls are highlighted. This information is important in order to focus on the activities having the highest impact on time-consumption during input data management when designing new efficient solutions, as in RQ2.

3.5.1 METHODS FOR RAPID INPUT DATA MANAGEMENT

According to several previous publications (e.g. Perera and Liyanage 2000, Lehtonen and Seppälä 1997), there is an increasing need for systematic approaches and documented procedures for input data management due to the substantial number of non-experts using simulation. This is also a prerequisite for a wider dissemination of DES, especially among Small and Medium Enterprises (SME), which is one of the driving forces behind the state-of-the-art description in RQ1. One contribution, using a systematic approach, is a methodology based on the Integrated computer-aided manufacturing DEFinition (IDEF) (Perera and Liyanage 2000). The methodology focuses mainly on reducing the time required for identification of parameters to include in simulation models. After investigating the production system, a functional model is built using pre-developed IDEF constructs. The functional model can be compared to a conceptual model, which is a more common terminology; see Van der Zee and Van der Vorst (2007) for more information on conceptual modeling. Thereafter, a required entity model is generated, which can be translated into a relational database, providing the model builder with a structure to follow during data collection and for data storage. The generation of the entity model from the functional model is done using a mapping table like Table 7.

Table 7: Table mapping the relations between functional modeling elements, required input data and model entities. For each functional modeling element, the required data are listed with the corresponding RM entity in parentheses.

Part: Part ID (PART); Part description (PART); Batch size (MACHINE_OPERATION); Max batches (MACHINE_OPERATION); Inter-arrival time (MACHINE_OPERATION)
Machine: Machine ID (MACHINE); Machine description (MACHINE); MTBF (MACHINE_GROUP); MTTR (MACHINE_GROUP); Input buffer capacity (MACHINE_GROUP); Output buffer capacity (MACHINE_GROUP)
Operator: Operator ID (OPERATOR); Operator description (OPERATOR); Efficiency (OPERATOR); Skills (OPERATOR); Learning curve effect (OPERATOR)

Another approach, named controllability analysis (CA), has been used to increase efficiency in the problem definition and data management phases of simulation projects (Lehtonen and Seppälä 1997). CA is an iterative approach intended to focus only on relevant aspects of the problem to solve. At each aggregation level, the aspect of major relevance is focused upon and further analyzed in order to pinpoint the most important factors with regard to project objectives. This structured methodology identifies important parameters, and facilitates the data management process by minimizing collection of data irrelevant for solving the problem.

The advantage of these two methodologies is that they focus on specific and relevant problems of input data management. However, they merely address delimited steps of the data input procedure, which is insufficient when considering the entire chain of data input activities using a holistic approach. Instead, Bernhard and Wenzel (2005) propose a methodology covering more aspects of the input data management process. This publication proposes an eight-step methodology based on the knowledge and experiences of a cross-disciplinary team with backgrounds in data acquisition, statistics and visualization. The identified steps are: Goal setting, Information identification, Preparation of collection, Collection, Data recording, Data structuring, Data analysis, and Validation.

3.5.2 CRUCIAL ACTIVITIES WITH REGARD TO THE TIME-CONSUMPTION

All methods presented in section 3.5.1 aim to reduce the impact of known difficulties in the input data management process. The difficulties considered to have the highest impact on the total time-consumption are (Perera and Liyanage 2000):

1. Poor data availability
2. High-level model details
3. Difficulty in identifying available data sources
4. Complexity of the system under investigation
5. Lack of clear objective
6. Limited facilities in simulation software to organize and manipulate input data
7. Wrong problem definition

There are no other publications listing pitfalls in input data management as comprehensively as the list above, but other authors agree and point out individual items. For example, Moon and Phatak (2005) argue that the high level of model detail requires many data parameters and

samples. They also state that there is a lack of data sources containing data suitable for DES. Consequently, substantial efforts in additional data gathering are required.

3.6 AUTOMATED INPUT DATA MANAGEMENT

Different approaches to input data management use various levels of automation (Robertson and Perera 2002), which is very important for the design and verification of the proposed concept for automated input data management (RQ2). Figure 10 shows four alternatives, ranging from an entirely manual work procedure to a completely automated link between existing data sources and simulation models.

Figure 10: Four approaches to input data management using various levels of automation (Robertson and Perera 2002).

In summary, Robertson and Perera (2002) state that methodologies a and b are most prevalent in industry, but hypothesize that organizations will strive towards methodologies c and d. The main reasons are that this would increase data accuracy and reliability whilst also minimizing the efforts during the entire input data management chain.

3.6.1 METHODOLOGIES A & B

Firstly, the solution with the lowest level of automation (methodology a in Figure 10) implies that the model builder, or other members of the project team, manually collects the raw data needed from the appropriate data sources. This can include manual extraction of category A data as well as measurements and interviews to obtain category B and C data. After manual analysis and transformation to information, the results are manually typed into the model code, where they also finally reside. This approach is easy to follow and involves a continuous validation of input data. On the other hand, it takes an extensive amount of time just to go through the process once, and due to its inflexibility, the time-consumption grows even further when the system changes and the data need to be updated.

The second methodology (b) is equivalent to the first one for the collection of raw data and the transformation of data to information. However, there is a difference in how the results are supplied to the simulation model. In methodology b, the information is presented in an intermediary spreadsheet and automatically imported to the model. In this way the flexibility increases, as it becomes easier to update, change and experiment with the information than in methodology a. Still, the disadvantage is that the collection and transformation of data to information rely on manual efforts. However, the separation of model and information, which enables use of the model by people unfamiliar with model building, makes this solution the most popular in industry. Both methodologies a and b involve significant manual work since no software support is integrated with the data storage systems. Thus, data processing is often performed manually, as described in section 3.3, sometimes supported by separate data analysis software packages.

3.6.2 METHODOLOGY C

In the third alternative, methodology c, the simulation model utilizes an off-line intermediary simulation database that is connected to the CBS and automatically retrieves and stores recent data for the simulation model. Moreover, the intermediary simulation database is connected to the simulation model and, thus, the supply of information to the model is also automated. In this way, the time-consumption for collecting and transforming data can be dramatically reduced while the flexibility for changes and updates is still present. Despite all advantages, only one real-world case was known to Robertson and Perera (2002), but there are also some efforts on this methodology published as results from other research projects. One example is published by Randell and Bolmsjö (2001), where a simulation model was fed with information from an ERP system via an intermediary simulation SQL (Structured Query Language) database. Another concept exemplifying methodology c is named Manufacturing Data Acquisition (MDA), which incorporates both the collection and some initial processing of raw data from production resources (Aufenanger, Blecken, and Laroque 2010). These features enable combination of data samples and give good consistency in data formats.
However, the concept requires that all related technical solutions are implemented consistently, starting already at the collection of raw data.
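A minimal sketch of the intermediary-database idea in methodology c is given below, here using SQLite for illustration (Randell and Bolmsjö used an SQL database against an ERP system); the table and column names are assumptions:

```python
import sqlite3

def fetch_tbf_samples(db_path, machine_id):
    """Retrieve staged time-between-failure samples for one machine."""
    with sqlite3.connect(db_path) as con:
        rows = con.execute(
            "SELECT tbf_minutes FROM staged_tbf WHERE machine_id = ?",
            (machine_id,),
        ).fetchall()
    return [r[0] for r in rows]
```

Samples fetched this way can then be condensed with the input modeling functions sketched in section 3.3.2 before being supplied to the model.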

3.6.3 METHODOLOGY D

Finally, methodology d implies that the transfer of simulation information from the CBS to the simulation model is fully automated. This reduces the time-consumption dramatically, since human involvement is needed neither for data collection and transformation nor for importing information to the model. The major drawback is the lack of data already prepared for DES purposes in major CBS applications (Moon and Phatak 2005, Robertson and Perera 2002). One example of methodology d is often referred to as the digital factory. This approach aims to connect data from different tools used throughout the entire product and production engineering process. There are basically two possible solutions to enable this integration:

1. Use a commercial PLM package, such as Siemens Teamcenter (Siemens 2011) or Delmia (Dassault Systemes 2011).
2. Connect all the individual engineering tools, and other necessary sources within the CBS, to each other using neutral formats or customized scripts.

The first solution presupposes that all engineering tools are selected from the same software vendor, which has proved difficult due to the diversity of tools with strengths in various parts of the product and production realization process (Semini, Fauske and Strandhagen 2006). The second solution, using neutral formats, allows the use of a variety of engineering tools and is therefore more promising if the aim is to cover the entire process. However, substantial research and development is required to provide this link without relying on expensive customized scripts (Kühn 2006).

3.7 SOFTWARE ARCHITECTURE FOR GENERIC DATA MANAGEMENT

The previous sections in this frame of reference state that the input data management process includes several activities and requires multiple data operations, e.g. for data cleansing, calculations and statistical input modeling. Major reasons are the extensive data requirements of DES (section 3.1.1), the variety in level of automation, and the diversity and evolution of data sources (section 3.2). Consequently, computer applications aimed to support the complete chain of activities have to provide a variety of features (data operations) as well as the possibility of customization and rapid adaptation to changes in data sources.

A plug-in-based architecture (Balderud and Olofsson 2008) satisfies the demands above by adding well-delimited functionality in specific software components. The architecture enables developers to update, add or remove functionality continuously without affecting the main application or the data storage. Hence, development or configuration can be performed based on organizational context and passed on to third-party developers if desired. These qualities are highly valuable for the iterative application development in the multiple-case-study approach of this research. Plug-ins usually operate by using services provided by the main application and by communicating with a managing component that keeps track of registered plug-ins and displays them in the user interface (Balderud and Olofsson 2008); see Figure 11. The described architecture originates in the Command Pattern (Gamma et al. 1995), encapsulating requests (i.e. data operations) including user-defined inputs not known by the software. Such input can be the specification of a data column (input 1) which will be converted to another user-defined data format (input 2).
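A minimal sketch of such a plug-in structure, based on the Command Pattern, is shown below. All class and method names are illustrative; they are not the GDM-Tool's actual design:

```python
from abc import ABC, abstractmethod

class DataOperation(ABC):
    """A plug-in encapsulating one data operation (a command)."""
    name: str

    @abstractmethod
    def execute(self, table, **user_inputs):
        """Transform a data table using user-defined inputs."""

class ConvertColumn(DataOperation):
    name = "convert_column"

    def execute(self, table, column, converter):
        # column and converter correspond to inputs 1 and 2 in the text above
        return [{**row, column: converter(row[column])} for row in table]

class PluginManager:
    """Keeps track of registered plug-ins, as the managing component above."""
    def __init__(self):
        self._plugins = {}

    def register(self, operation):
        self._plugins[operation.name] = operation

    def run(self, name, table, **user_inputs):
        return self._plugins[name].execute(table, **user_inputs)
```

With this structure, custom operations can be developed and registered by third parties without changes to the main application.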
Another common application area for plug-ins is the add-ins used for customization of MS Office functionality. For example, plug-ins are used to create shortcuts to

other software packages or to extend the number of calculation operations (e.g. for data analysis in MS Excel).

Figure 11: Each plug-in performs a user-specified request. Plug-ins can be selected to suit the desired functionality at a specific company, and the structure also allows further development of needed functionality.


4 RESULTS

This chapter contains the research results of the two interrelated parts (named studies in Wilkinson 1991) presented in this thesis. The first part relates to RQ1 and the second to RQ2. After the results of each part, there is an interim discussion summarizing the findings with focus on one research question at a time. The interim discussions aim to connect the publication results to each other, to literature, and to the thesis purpose and aim. They also include discussions on the methods used. A general discussion, connecting the studies to each other, is provided in the next chapter. Below is the structure of this chapter on results:

4.1 Results related to RQ1: Publications I, II, III, and IV.
4.2 Interim discussion focusing on RQ1.
4.3 Results related to RQ2: Publications V, VI, and an additional unpublished case study.
4.4 Interim discussion focusing on RQ2.

4.1 INDUSTRIAL STATE-OF-THE-ART RQ1

This first part of the results chapter aims to answer RQ1: What is the industrial state-of-the-art in the input data management process? The purpose is to describe the necessary activities, how they are executed in industry, and the level of automation used to support the process. Publication I maps the necessary activities and describes best-practice work procedures for each activity. Publication II identifies the activities having the highest impact on the total time-consumption, which, therefore, should be considered extra interesting to automate. Publication III presents current industrial approaches to automated input data management and investigates their dissemination. The last publication in part 1 (Publication IV) contributes to the description of future requirements on input data management, derived from the need for new input parameters when DES analyses are extended to include sustainability aspects.

4.1.1 PUBLICATION I - A Methodology for Input Data Management in Discrete Event Simulation Projects

Knowing and mastering the important activities in input data management is crucial for enabling efficient simulation studies. This applies in procedures with significant manual involvement as well as in the requirement specification of automated support systems. Automation is of course a tempting and potent solution for reduced time-consumption, but not all organizations are ready to adopt such systems, e.g. due to inferior availability of category A data (see section 3.5.2). An additional aspect is that the number of non-specialists working with DES increases and, thus, easier and more comprehensive methodologies, such as practical guidelines, are highly desired (section 3.5.1).

Objective and Contribution to RQ1

The objective of this publication is to map current practice in input data management and thereby compile a systematic best-practice methodology. Such a methodology holds the potential to enable more efficient and accurate input data management for simulation projects at all levels of automation, except completely automated processes. The proposed methodology includes and describes all activities detected during an investigation of industrial DES projects, which is an important contribution to the answer of RQ1. In addition, the publication provides state-of-the-art guidelines for the tasks executed in each activity.

Figure 12: Activities in input data management, structured as a best-practice methodology. The flowchart covers the following activities and decision points: identify and define relevant parameters; specify accuracy requirements; identify available data; choose methods for gathering of not available data; decide whether all specified data will be found; create data sheet; compile available data; gather not available data; prepare statistical or empirical representation; check whether the representation is sufficient; validate data representations; check whether validated; finish final documentation.

Study Description

The mapping of important data input activities was performed by evaluating 15 completed industrial simulation projects. The evaluation embraced purely industrial projects as well as projects including parties from both industry and academia. All plants simulated in the projects were located in Scandinavia, mainly in Sweden. Moreover, to obtain as general results as possible, the involved companies were selected to represent a variety of contextual factors such as organizational size, line of business, and previous experience of DES. Semi-structured interviews (section 2.3.1) were used to specify common input data activities, to collect information about internal work procedures, and to identify the main problems resulting in extensive time-consumption (topics are provided in the Appendix).

Results and Conclusions

In current industrial input data management for DES, practitioners perform thirteen distinct activities; see Figure 12. Further descriptions and guidelines supporting the tasks in each activity are provided in the appended publication. Represented as such a best-practice methodology, the activities fit well into the frequently cited works of Banks, Carson, and Nelson (1996), Law (2007), and Pegden, Shannon, and Sadowski (1995), all providing comprehensive methodologies for DES projects. In these methodologies, the input data management part represents a smaller portion of an entire project. That smaller portion is more thoroughly described in this publication. The authors suppose that the benefit of using the methodology is most significant in organizations with limited experience of DES. The argument is that experienced organizations and simulation engineers continuously discover and document efficient working procedures in an iterative manner. However, there are no studies aimed at quantifying the methodology's impact, so the main contribution thus far is the state-of-the-art description provided as a part of RQ1.

4.1.2 PUBLICATION II - Mapping of Time-Consumption During Input Data Management Activities

Practitioners and researchers jointly argue that high-quality input data is crucial in simulation studies. This, in combination with too few and insufficient methods and tools, contributes to the fact that input data management is one of the most time-consuming parts of simulation projects. Previous measurements and estimations claim that input data management consumes 10-40% of the total time in DES projects (see section 1.1). However, few studies have closely investigated the different data input activities to find the primary causes and quantify their individual time-consumption.

Objective and Contribution to RQ1

This article presents an empirical mapping of current industrial work procedures for input data management. It continues the work of Publication I by assessing the time-consumption for each activity in the input data management process. The main objective is to identify the input data management activities having the highest impact on the total time-consumption, as a part of the answer to RQ1. By extension, this study may serve as a guideline for the design of IT support systems and other methodologies in future research on efficient data management. Additionally, the study summarizes the most common reasons for extensive time-consumption in input data management, as supplementary support in the development of such tools and methodologies.

Study Description

The empirical mapping was performed by evaluating 15 completed industrial simulation projects, the same as in Publication I. Semi-structured interviews (section 2.3.1) enabled specification of common input data activities and identification of common problems resulting in extensive time-consumption. In addition, face-to-face questionnaires (section 2.3.2) were used to assess the time-consumption for each activity and to quantify the amount of available data (questions are provided in the Appendix). In order to compare the activities to each other, the time-consumption for each activity was related to the time-consumption of the entire input data management process in the specific project. Using such a relative comparison compensates for the difference in scope between the projects.

Results and Conclusions

Figure 13 shows that the three activities having the highest impact on the total time-consumption are: collection of raw data, mapping of available data, and data analysis and preparation. From this result, it is argued that highly time-consuming activities should receive specific attention when increasing efficiency in the complete input data management process. A detailed summary of the time-consumption for each activity in all 15 projects is available in Table 8.

Data collection: 50%
Data analysis and preparation: 10%
Mapping of available data: 10%
Input data parameter identification: 8%
Data validation: 7%
Final documentation: 5%
Choice of gathering methods: 4%
Document creation: 4%
Accuracy requirement specification: 2%

Figure 13: The time-consumption of each individual data input activity.

Furthermore, the study identifies the two major reasons for problems occurring during input data management: a substantial need for manual raw data gathering due to inferior data availability, and complex designs of computerized data sources. The latter slows down the identification of available data and indicates that sources containing raw data are generally not designed for simulation purposes. By extension, this is also an explanation for the limited availability of raw data. In addition, the article summarizes the time-consumption of the input data management process in all 15 projects and concludes that it accounts for on average 31% of the total project time.
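The relative comparison described in the study description can be sketched as follows; the activity names are from Figure 13, while the hour figures are invented placeholders:

```python
# Express each activity's time as a share of the project's total input data
# management time, so that projects of different scope can be averaged.
projects = [
    {"data collection": 40.0, "mapping of available data": 9.0, "data analysis": 8.0},
    {"data collection": 12.0, "mapping of available data": 2.0, "data analysis": 3.0},
]

def relative_shares(hours_per_activity):
    total = sum(hours_per_activity.values())
    return {activity: hours / total for activity, hours in hours_per_activity.items()}

shares = [relative_shares(p) for p in projects]
average = {a: sum(s[a] for s in shares) / len(shares) for a in shares[0]}
```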

Table 8: Data table displaying the time-consumption during data input activities in all projects evaluated in Publication II.

4.1.3 PUBLICATION III - Input Data Management for Simulation - Industrial Practices and Future Trends
Automation of data input activities is naturally one of the solutions to the extensive time-consumption described in Publication II. Reduction of the human involvement can be achieved by addressing different parts of the input data management process, i.e. the raw data collection, the data processing, and the supply of information to the simulation model. In the most automated approach, the simulation model is totally integrated with all necessary data sources, either with major business systems (e.g. ERP) or with other tools used during the product and production engineering process (e.g. a PLM environment); see section 3.6. However, issues like interoperability problems and limited data availability make additional, less automated, solutions necessary. A compilation of different approaches is provided by Robertson and Perera (2002), which is used as a starting point for this publication.

Objective and Contribution to RQ1
The aim of this paper is to map the current industrial practice in input data management with regard to the level of automation. It is intended to be an update of a previous publication (Robertson and Perera 2002) and, therefore, uses the approaches defined therein. This article forms a part of the industrial state-of-the-art description (RQ1) by complementing the findings about data input activities from Publications I and II with information about applied support systems.

Study Description
This publication presents the results of a survey performed during the WSC, one of the world's major forums for DES specialists representing industry, academia and government. A questionnaire was distributed to all participants, and the industrial representatives were asked to answer 12 questions (see Appendix) about the simulation procedures at their specific companies, mainly focused on input data management. Researchers with a close connection to industry (a recent case study) were also asked to complete the form with information obtained at the case study company. Reminders containing a link to a web-questionnaire (an exact copy of the original form) were sent out by e-mail. Answers from 86 companies were collected, covering different business areas such as manufacturing (35 responses), logistics, health care and military applications. Data were analyzed using descriptive statistics to show how many companies used the different approaches to automated input data management (section 3.6).

Results and Conclusions
The questionnaire responses show that DES is used on a regular basis, with frequent reuse of models, by 65% of the participating companies. 8% have even integrated DES in their business process as a mandatory tool in major development projects. These are fairly decent figures compared to previous literature. However, the reader should keep in mind that few SMEs visit the WSC; such enterprises are therefore most likely under-represented due to study delimitations. Taking a closer look at the input data management procedure, there is an obvious lack of structured approaches as well as of continuous collection of raw data. Among the manufacturing companies, 63% do not even use the support of checklists, templates or documented guidelines to increase efficiency of the data input activities. Furthermore, many companies have computerized systems as their main source of data, but Figure 14 also shows that the diversity of sources needed to compile all necessary data is extensive.
All these factors indicate the need for structured approaches to input data management, and also imply that suggested solutions for automated input data management should support import of data from several sources.

Figure 14: The use of different data sources in manufacturing industry.

Regarding the use of automated solutions during the entire input data management procedure, approximately 20% of all companies use one of the completely automated approaches (17% methodology c and 3% methodology d in the manufacturing industry); see section 3.6. The most common approach is still methodology b, using a spreadsheet interface automatically connected to the DES model but relying on significant manual work during data collection and processing; see Figure 15. The same figure also shows that many companies desire and foresee an increased use of more automated solutions within ten years.

Figure 15: Current methodology and level of automation in input data management among manufacturing companies.

The results of Publication III clearly show progress in the use of automated solutions for input data management in DES during the last decade. In the original study (Robertson and Perera 2002), just a few companies reported implementations of the more automated methodologies c and d, including database connections. Today, around 20% of the companies have automated connections to their simulation data sources, most of them using an intermediary database. The rest of the companies rely mostly on manual input data management, and still strive to reach a higher level of automation. However, the lack of simulation data in their CBS and insufficient support for automated data processing are reported as two major hindrances. More research and development in this area would probably increase the use of simulation on a regular basis.

4.1.4 PUBLICATION IV - Data Requirements and Representation for Simulation of Energy Consumption in Production Systems
Sustainability thinking is nowadays a natural part of production systems development, and there are numerous research contributions addressing detailed technological applications as well as improvements on a system level. However, there is often higher potential in the latter, for example by eliminating non-value-added activities in order to reduce energy consumption (Cao, Chou, and Cheng 2009). In other words, waiting times should be carefully analyzed in order to minimize the effects of balancing and system losses. DES is a powerful tool for such analyses, and its application area has therefore recently been extended from a focus on economic aspects to include ecological sustainability (Chapter 1). This shift introduces requirements on new input parameters in simulation models and, thus, is likely to drive changes in the input data management process. Yet few previous research contributions (Solding, Petku, and Mardan 2009; Solding, Thollander, and Moore 2009) have closely investigated the new requirements when handling data parameters such as electrical power in DES models.

Objective and Contribution to RQ1
The aim of this publication is to specify how electrical power should be represented as an input parameter in DES models. As stated above, new input parameters will probably introduce additional activities in the input data management process or, at least, changes in the conventional work procedure. This publication is therefore interesting for the description of input data management activities in RQ1, especially when looking into the future of DES modeling. However, more research is required in this field, and the reason for including the publication in this thesis, at this early stage, is to stimulate further research and to prepare support tools and the input data management procedure for these novel requirements.

Study Description
This case study was performed at an automotive company and includes measurements of the electrical power utilization of five multi-operational tooling machines. All five machines perform milling operations in a production line for engine components. The power utilization was measured at a frequency of 1 Hz on the incoming three-phase connection. Thus, all functions of each machine were measured as one unit, including contributions from major machine systems (e.g. the machine spindle) as well as from peripheral functions such as lights, control system and pumps. In total, more than samples were collected. After collecting the data, all samples were assigned to one of the following four machine states: busy, idle, down, and stand-by. Descriptive statistics were calculated for all four machine states, both for the individual samples within a cycle and for the calculated average power utilization of cycles; see Figure 16. The descriptive statistics reported below are the average values and the standard deviations as a measure of variability. Variability is a key factor for the collection and representation of DES parameters, since highly variable parameters have to be stochastically represented using statistical distributions or similar approaches. Consequently, such parameters require more data samples than those represented only by mean values.
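As an illustration of the analysis described above, the following minimal Python sketch computes the average power per machine-state cycle, the spread between cycles of the same state, and the share of energy stemming from non-production time. The 1 Hz samples and the cycle labels are fabricated for illustration; they are not the measured data.

```python
import statistics

# Fabricated 1 Hz samples: (power_kW, state, cycle_id); cycle_id is a
# hypothetical label marking which machine-state cycle a sample belongs to.
samples = [
    (14.2, "busy", "b1"), (14.5, "busy", "b1"), (14.1, "busy", "b1"),
    (6.1, "idle", "i1"), (6.0, "idle", "i1"),
    (14.8, "busy", "b2"), (14.3, "busy", "b2"), (14.6, "busy", "b2"),
]

# Average power per cycle, then the standard deviation between cycles of a state
per_cycle = {}
for power, state, cycle in samples:
    per_cycle.setdefault((state, cycle), []).append(power)

per_state = {}
for (state, _), powers in per_cycle.items():
    per_state.setdefault(state, []).append(statistics.mean(powers))

for state, means in per_state.items():
    spread = statistics.stdev(means) if len(means) > 1 else 0.0
    print(f"{state}: {statistics.mean(means):.2f} kW, stdev between cycles {spread:.2f} kW")

# Share of total energy from non-production time (each sample represents 1 s)
total_kws = sum(p for p, _, _ in samples)
non_production = sum(p for p, state, _ in samples if state != "busy")
print(f"Non-production share of energy: {non_production / total_kws:.0%}")
```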

Figure 16: Illustration of machine state cycles and their corresponding variations in power levels.

Results and Conclusions
The results show that the standard deviation of the average power utilization in busy cycles ranges from 1 to 2% of the power levels for the five machines included in this study. For idle and down cycles, the corresponding values are 9% and 1% respectively (average standard deviations for all five machines); see Table 9 and Table 10. Looking further into the busy state, the variability between product cycles is even smaller when the product variant is also considered a known factor (1.5%). These numbers show that the variability between product cycles is limited and unnecessary to include in conventional DES modeling, even though DES models are dynamic. The reason is that the variability in processing time has considerably higher impact on the final energy consumption. This means that the collection of raw data describing power utilization can be limited to a few samples, just enough to calculate a credible mean value. However, if DES models are used or will be used to analyze the environmental footprint, using the individual product as the unit of analysis, electrical power probably needs to be considered a stochastic variable. This must be evaluated in future research.

Table 9: Average power utilization per machine, distributed between the different machine state cycles (machines OP 20, OP 30_1, OP 30_2, OP 40_1 and OP 40_2, plus the average over all five; some cells for OP 40_1 and OP 40_2 are n/a).

Table 10: Standard deviations for the average power utilization between individual machine state cycles (same machines; cells with too few cycles are n/a).

In Table 9, PiB = Power in Busy; PiB (V1 & V2) = Power in Busy given one of the two product variants; PiI = Power in Idle; PiD = Power in Down; PiS = Power in Stand-by. All numbers represent the average power utilization in kW. In Table 10, SbBc = standard deviation (StDev) between Busy cycles; SbBc (V1 & V2) = StDev between Busy cycles given one of the two product variants; SbIc = StDev between Idle cycles; SbDc = StDev between Down cycles; SbSc = StDev between Stand-by cycles.

An additional result obtained in this study is that 33% of the total energy consumption for the five machines stems from non-production time.
In other words, a substantial part of the energy cost, and of the related environmental impact, stems from non-value-added time explained by balancing and system losses. Note that this result is specific to the particular production system and the time of measurement for this study. However, it is a strong indication that improvement of production flows is a very important area.

4.2 INTERIM DISCUSSION RQ1
This part of the thesis investigates the industrial state-of-the-art in input data management and the current level of automation applied in this process. Publication I identifies thirteen activities performed by industrial simulation practitioners to transform raw data from the shop floor into information for simulation models. Best-practice descriptions are included in the same publication. Further, Publication II identifies the three most time-consuming data input activities, which are important to focus on in order to reduce the time-consumption during input data management. Publication III shows that the level of automation is still limited in companies worldwide and that the significant manual involvement results in extensive time-consumption for keeping models up-to-date. In fact, the input data management procedure constitutes as much as 31% of the total time in DES projects, which is still comparable to research results reported almost 20 years ago (Trybula 1994). Thus, improved support systems, both computerized and manual, are important in order to increase the applicability of DES. Additionally, Publication IV initiates a more thorough investigation of how to represent environmental parameters, such as electrical power, in DES models. This is a crucial research area for the extended capabilities of DES towards sustainability analyses. At present, it seems that electrical power does not need stochastic representation for common-purpose DES models, but further studies are required. Future studies should include other manufacturing processes, environmental parameters, and model purposes.

4.2.1 DATA INPUT ACTIVITIES
The 13-step state-of-the-art description provided in this thesis can be applied as a valuable methodology for increased rapidity and precision in input data management, especially for the increasing number of non-specialists working with DES (Perera and Liyanage 2000, Lehtonen and Seppälä 1997). At present, the major part of the literature on DES data covers separate elements of the input data management procedure. For example, publications aiming to improve the identification of relevant parameters (e.g. Perera and Liyanage (2000), Lehtonen and Seppälä (1997)) only address issues directly related to 8% of the time-consumption of the complete input data management process, according to the findings in Publication II. The description provided here is one of the few contributions (Bernhard and Wenzel (2005) is another) addressing the entire chain of data input activities. In addition to the described state-of-the-art procedure, some common shortages were identified during the interviews and in the questionnaire responses. For example, little time and effort are spent on defining the accuracy requirements for the included parameters. This might result in too few samples collected for important parameters (a quality issue) as well as too many samples for less crucial production resources (a time-consumption issue). It is also obvious that simulation engineers often skip a separate data validation, which is likely to lead to late additional iterations to secure an acceptable data quality.
A final shortage is the lack of sufficient tools for data processing and analysis (Perera and Liyanage 2000). There are special-purpose software solutions for statistical analysis of data (for example ExpertFit (Law and McComas 2003) or Stat::Fit (Geer Mountain Software Corporation 2011)), but they have limited capabilities to categorize, correct and calculate the data. Additionally, these statistics applications require some repetitive manual work in order to feed the application with raw data and to supply the results to the simulation software. This is done for every analysis, which of course adds to the time-consumption of input data management.

The time-consumption analysis provided in Publication II presents results in line with previous research regarding the problems related to input data management. Data collection, identification of available data, and data analysis and preparation are identified as the most problematic and time-consuming data input activities. They conform well to three of the major pitfalls presented in the literature (Perera and Liyanage 2000): poor data availability, difficulties in identifying available data sources, and limited facilities to organize and manipulate input data. However, this study adds new knowledge about the time-consumption related to the separate data input activities and facilitates quantification of the benefits expected from possible solutions. Such solutions should focus on support tools for data processing, and on systems enabling reduction of the manual work during the entire input data management chain. It is also important to develop the computerized sources to meet the extensive raw data requirements of detailed production analysis tools, e.g. DES. At present, such systems are mainly designed for the logistics, financing and maintenance organizations (Moon and Phatak 2005). Additionally, more established data models defining simulation parameters (e.g. what is a processing time, MTBF and MTTR?) would facilitate data identification, collection, and processing, as well as the interoperability between data systems and analysis tools.

4.2.2 LEVEL OF AUTOMATION IN INPUT DATA MANAGEMENT
This thesis shows (Publication III) that there has been progress in the use of automated solutions for input data management during the last decade. Going from single pilot implementations around the year 2000 (Robertson and Perera 2002), there is now one out of five companies using automated connections between computerized data sources and their simulation models. The most common solution among these companies includes an intermediary off-line database allowing the data manipulation required to create what-if scenarios. The solution is also convenient for security reasons compared to a direct link to the CBS. However, around 80% of the companies still rely on extensive manual work in data collection and processing, and the link between the processed simulation information and the model typically consists of an MS Excel spreadsheet. This finding indicates that simulation projects are still often performed on a consultancy basis, with limited use of DES as a desk-top resource for production engineers.

A very interesting additional finding from Publication III is the increasing need for data automatically extracted from external databases. In manufacturing applications, this is probably due to the fact that environmental analyses have been increasingly combined with DES. Such studies often include LCA data collected from external databases such as the European reference Life Cycle Database (ELCD) (Institute for Environment and Sustainability 2011) or EcoInvent (Swiss Centre for Life Cycle Inventories 2011). Publication IV is a first step towards a correct representation of environmental parameters in DES models.
The results, showing that deterministic representations of electrical power are sufficient, indicate that automated connections to databases containing such environmental data seem relevant and important. The author is also involved in a project named EcoProIT (EcoProIT research project 2011), working with automated connections to LCA databases.

4.2.3 METHODOLOGICAL DISCUSSION
The most obvious issue of a study including interviews and questionnaires is the sample sizes and the number of respondents. In the study including 15 DES projects (Publications I and II), the number of samples is considered sufficient, since the data collection was performed using face-to-face communication, enabling in-depth understanding and follow-up questions when necessary. In Publication III, the questionnaire was distributed to around 700 DES researchers and practitioners, and 86 responses were collected. The questions were aimed at collecting information about industrial business procedures, so it is likely that many of the people declining to submit an answer were researchers without close connections to industry.

As a comment on the reference to Grounded Theory (Glaser and Strauss 1967) for the interview analysis in Publications I and II, it should be clarified that the author has previous industrial experience in DES and input data management. This is important to declare when using an inductive approach, since the experience most probably affects the coding of empirical data. Thus, the work procedure corresponds more to the interpretation of Grounded Theory later published by Strauss, who advocates a more pragmatic use of previous theoretical and practical knowledge than does Glaser: data coding, data analysis and other knowledge should not be seen as distinct activities.

Further, regarding Publication II, it is important to state that the intention is to identify time-consuming data input activities and to compare activities to each other. It is, for instance, inappropriate to take the separate assessments and infer that exact time measurements were performed. One reason is that the times were not measured in real time, but based on the team members' perception and memory after the project was completed. Another reason why the assessments should be used for comparison rather than as absolute numbers is that the definitions of activities were initially somewhat vaguely described. However, the respondents did not find this problematic, and no questions about activity delimitations arose. The fact that some activities are renamed between Publications I and II is also because the methodology and activity definitions evolved during the research process. A specific example is the activity called "prepare statistical or empirical representation" in Publication I and "data analysis and preparation" in Publication II.

4.2.4 CONNECTION TO RQ2
Despite solid state-of-the-art descriptions and systematic guidelines, it is difficult to drastically reduce the time-consumption during input data management. This is mainly due to the manual involvement required to collect data and to carry data and information between the sources and the different processing applications (e.g. MS Excel and distribution-fitting software). Therefore, there is significant potential in automating the data input activities identified for answering RQ1, especially the most time-consuming activities. Empirical data related to RQ1 show that data collection is the most time-consuming activity. Therefore it might seem natural to proceed with finding technical solutions supporting the collection of raw data in production systems. However, the root cause is that companies have not adopted existing technology for ACS, rather than a complete lack of such equipment, including sensors and databases (Ingemansson, Ylipää, and Bolmsjö 2005).
This thesis will therefore, from now on, focus on the processing of available raw data into information, and on the supply of that information to simulation models using standardized interfaces.

This first part (RQ1) also shows that the data input activities consume 31% of the total time in an average DES project, which is a significant reason for the relatively low dissemination of DES in the manufacturing industry. Thus, user-friendly best-practice descriptions and extended automated solutions are necessary in order to reduce system and balancing losses and, by extension, to increase efficiency in production systems. Some companies with limited access to category A data will prefer the systematic approach, and other companies will implement automated solutions such as methodology c or d in Robertson and Perera (2002), hopefully influenced by the design specifications presented in the next part of this thesis (RQ2).

4.3 AUTOMATED INPUT DATA MANAGEMENT RQ2
From the discussion in section 4.2, it is obvious that manual involvement during data input activities results in extensive time-consumption. In turn, this fact hinders the dissemination of DES as a desk-top resource for production engineers trying to increase equipment efficiency in production systems. Part Two (RQ2) proposes, tests and validates an approach to input data management which increases the level of automation significantly in comparison to current industrial practice. Publication V outlines the necessary functionalities of such an approach, develops a demonstrator, and presents a first test case performed in the automotive industry. Publication VI presents a test case of the same approach and demonstrator performed in the aerospace industry and adds a few necessary functionalities. This chapter also includes an additional test case in the automotive industry, which is not yet published nor appended as a publication. The additional test case has the same purpose as Publication VI.

4.3.1 PUBLICATION V - Automated Input Data Management: Evaluation of a Concept for Reduced Time-Consumption in Discrete Event Simulation
To increase the level of automation in input data management, previous research contributions have primarily suggested automated connections between simulation models and data sources within the CBS, such as ERP and MRP (Material Requirements Planning) systems. The problem, though, is that these major systems often do not include all data required for DES (section 3.6.3). Hence, there is a need for a solution that is also able to extract raw data from other sources of category A data. Examples of such sources are major CBS applications, legacy systems, and person-based spreadsheet solutions.

Objective and Contribution to RQ2
The aim of this paper is to develop a concept, and an associated demonstrator, for automated input data management in simulation of material flows in production. An additional objective is to perform a first comparison of the time-consumption and data quality against a traditional industrial approach to input data management (a reference procedure). This paper addresses all aspects of RQ2 by identifying the necessary functionalities of the concept and providing an initial evaluation of the time-consumption in the automotive industry. Note that the demonstrator, called the GDM-Tool, is not intended to meet the requirements of a commercial software solution. It is developed as a demonstrator for the proposed concept with the purpose of facilitating validation and presentation.

Study Description
The research approach in this paper is to develop a software solution based on design criteria identified using an actor's approach during a case study at a Swedish automotive company. These design criteria concern, for example, the types of data structures to import, the data formats to support and conform, and the data operations necessary for converting the data to simulation information. All information was collected in project meetings, workshops and informal meetings during participation in the development of a simulation model at the company. The project team included process experts and simulation engineers from the company and from NIST in the USA, together with researchers from Swedish universities and institutes. After developing the demonstrator, the solution was tested as a first step towards validation. The test includes a comparison of the time-consumption and data quality against a traditional industrial approach to input data management. This reference procedure consists of: manual raw data extraction; categorization, correction, and calculations using MS Excel; and condensation using a commercial distribution-fitting software solution. The data quality was validated using hypothesis testing, at a 95% confidence level, on the output results of the simulation model (section 2.5). Both the design and the testing of the software solution are delimited to comprise the data sources and the production process in the actual case study. The production line modeled in this case study consists of semi-automated assembly stations, and the simulation parameters included are: processing times, MTBF and MTTR.

Results and Conclusions
Figure 17 illustrates the proposed concept of automated input data management, including three major functions: data extraction, data processing (conversion) and output preparation. A key feature is the ability to extract data from several sources with different internal structures. When all raw data are imported, a series of operations is required to convert the data from a crude form into relevant simulation input. Such operations typically provide functionalities for conformation of data types, data filtering, calculations and condensation. Finally, to enable efficient data sharing, the final information is presented in CMSD format. However, other output options are also supported, to avoid hindering the application of customized solutions.

Figure 17: Overview of the proposed concept for automated input data management.

The proposed method, and consequently also its demonstrator the GDM-Tool, is divided into two central user activities: configuration and automation (see Figure 18).
Configuration is required once, to specify the sequence of operations for importing and processing the data and for exporting the information to CMSD. Once this mapping is performed, data processing can be repeated in automation mode without further effort, as long as the modeled system remains unchanged.

Figure 18: Illustration of the difference between configuration and automation modes in the GDM-Tool.

Configuration is performed by applying a series of tools (area A in Figure 19), and the data can continuously be reviewed in the table view (area B in the same figure). The series of tools is stored as a configuration path (area C), which can automatically be repeated to obtain updated data sets in automation mode (Figure 20).

Figure 19: User interface developed to demonstrate the functionalities required for automated input data management.

Automation mode is intended to be applied more frequently than configuration mode in order to gain an advantage over the reference procedure. Every time the simulation engineer plans to run the model, he or she loads the previously specified configuration, specifies the location of the data sources (latest version) and selects a target for the output file. All steps along the configuration path are executed when the user clicks the Run button, and the CMSD file is updated with the most recent production data.
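The split between a one-time configuration and a repeatable automation run can be illustrated with a minimal Python sketch. The GDM-Tool itself is a C#.NET application, so everything below is an assumption made for illustration: the operation names, the CSV layout and the XML element names are invented and do not represent the real plug-ins or the actual CMSD schema.

```python
import csv
import io

# Stand-in for a continuously updated raw data source (invented layout)
RAW = "machine;start;stop\nOP20;08:00:15;08:04:40\nOP20;09:12:03;09:13:10\n"

def import_csv(_):                          # extraction step: read the raw source
    return list(csv.DictReader(io.StringIO(RAW), delimiter=";"))

def add_repair_time(rows):                  # conformation/calculation step
    def seconds(t):
        h, m, s = (int(x) for x in t.split(":"))
        return h * 3600 + m * 60 + s
    return [dict(r, ttr_s=seconds(r["stop"]) - seconds(r["start"])) for r in rows]

def export_xml(rows):                       # output step; structure only CMSD-like
    body = "".join(f'  <event machine="{r["machine"]}" ttr="{r["ttr_s"]}"/>\n'
                   for r in rows)
    return f"<document>\n{body}</document>"

# Configuration mode: performed once, the sequence of operations is stored
configuration_path = [import_csv, add_repair_time, export_xml]

# Automation mode: the stored path is re-run on the latest data ("Run" button)
result = None
for operation in configuration_path:
    result = operation(result)
print(result)
```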

Figure 20: Dialog box for executing a data update using automation mode in the GDM-Tool.

The GDM-Tool is a Windows-based desktop program written in C#.NET. Due to the lack of standardized data structures, it is unreasonable to strive towards a completely generic interface between data sources and simulation applications. Instead, the GDM-Tool uses a plug-in-based architecture (section 3.7) to facilitate the configuration process described in the previous sections. Thus, every data operation for import, processing and export corresponds to a plug-in. The plug-in structure also enables easy extension of the GDM-Tool, since plug-ins are separately developed and compiled, and the application automatically detects new plug-ins and allows users to apply them without modifying or re-compiling the main program.
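The automatic detection of new plug-ins can be sketched as follows. The real GDM-Tool loads separately compiled .NET assemblies; this illustrative Python version instead scans a hypothetical plugins directory and accepts any module exposing an apply function, which is an invented contract used here only to convey the pattern.

```python
import importlib.util
import pathlib

def load_plugins(directory="plugins"):
    """Discover data-operation plug-ins without modifying the main program."""
    plugins = {}
    for path in pathlib.Path(directory).glob("*.py"):
        spec = importlib.util.spec_from_file_location(path.stem, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        if hasattr(module, "apply"):        # contract: a plug-in exposes apply(rows)
            plugins[path.stem] = module.apply
    return plugins

# plugins = load_plugins()  # newly dropped-in files become available operations
```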
The test results, comparing the automated approach to the previously described reference procedure, are shown below. Both the manual process and the GDM-Tool are compared to throughput statistics from the real-world process. The output statistics were collected during the same weeks as the processing times were mapped. However, the breakdown times were collected during a longer period of time to obtain sufficient samples for rigorous statistical analysis.

Table 11: Comparison of the time-consumption between the traditional industrial approach and the GDM-Tool in Publication V.

Process/Activities                             Tools                       Time-consumption
Reference procedure
  Extraction, categorization, calculations,    MS Excel                    6 hours, 15 minutes
  cleansing
  Condensation, documentation                  Distribution-fitting tool   3 hours
  Total manual                                                             9 hours, 15 minutes
Automated (the GDM-Tool)
  Configuration                                Configuration mode          2 hours
  Automated run                                Automation mode             < 1 min
  Total automated                                                          2 hours
Difference                                                                 7 hours, 15 minutes

Table 11 shows the results with regard to time-consumption measured over the entire process, starting with the extraction of raw data and ending with simulation data residing in an interface ready to use in a simulation model. The time-consumption was reduced by 78%, including the configuration steps, given that all necessary plug-ins are available. There were slightly more than rows of raw data for breakdowns and around 7200 for processing times.

Table 12: A comparison between simulation outputs from traditional and automated input data management. (Columns: Manual, Real-world and GDM-Tool outputs, for the period with known processing times and for an extended simulation period; rows: mean and standard deviation.)

Table 12 shows the output data from the real-world process and from one and the same simulation model, with input data prepared both manually and by means of the GDM-Tool. All results are given in products per time unit, but the time unit is unpublished for secrecy reasons. To the left in Table 12, the simulation results are compared to real-world data from the same period of time during which the raw data for processing times were collected. To the right, the simulation period was extended to six months but still used the same data for processing times; hence, these data are expected to be generally applicable. The results show that the data prepared by the GDM-Tool underestimate the total output of products by 2% during the period with correct processing times. During the same period, the reference procedure overestimates the output by 2%. Using hypothesis testing, it is concluded that there is no statistical basis for inferring a difference between the two approaches to input data management. For the extended simulation period, the corresponding differences in comparison to real-world data are 4% for the GDM-Tool and 2% for the reference procedure.

4.3.2 PUBLICATION VI - Towards Continuously Updated Simulation Models: Combining Automated Raw Data Collection and Automated Data Processing
This publication originates from the same line of argument as Publication V. The proposed automated approach, linking data collection, processing and interfacing, is assumed to enable a reduction of the extensive time-consumption in input data management. Techniques for automated collection of raw data constitute a more significant part of this paper than of the previous one. Thus, the complete chain of input data management is addressed. Similar solutions, such as MDA (see section 3.6.2), often include highly customized components both for raw data collection and for data processing algorithms.

Objective and Contribution to RQ2
The aim of this publication is to evaluate the feasibility of combining automated raw data collection and automated data processing into a push-button solution for DES. The case study is important for validating the proposed concept of automated data management by presenting an additional test case performed in a different type of industry than in Publication V. Furthermore, it adds a measurement of the difference in time-consumption between traditional and automated input data management in order to quantify the possible benefits of the concept.
Study Description
This publication combines the capabilities of two existing technologies: MTConnect for automated raw data collection (see section 3.2.2) and the GDM-Tool for automated data processing (see section 4.3.1). The combined solution is designed and tested in a case study at a manufacturing company in the aerospace industry to ensure that it is applicable in a real-world context. All parts of the study were performed in close collaboration with NIST (USA), who built the simulation model requesting the information, contributed detailed knowledge about the production data, and were responsible for the contact with process experts at the company. The study includes the management of MTBF and MTTR for CNC machines, and the information was supplied to the simulation model using CMSD in order to demonstrate how the results can be presented in a neutral format. In addition to designing and demonstrating the solution, this study also measures the time-consumption for automatically completing the input data management process by combining MTConnect and the GDM-Tool, and compares it to the industrial reference procedure outlined in section 4.3.1. The reference procedure was performed by a simulation engineer at NIST, and the GDM-Tool was configured by the author. Note that the collection of raw data had previously been performed at the case study company using MTConnect and was thus excluded from the time measurements.

Results and Conclusions
This publication shows that the outlined approach to automated input data management, including MTConnect, the GDM-Tool and CMSD, works for the production data included in this case study. Compared to the reference procedure, the time-reduction is 75% (from 4 hours to 1 hour) for the processing of raw data alone. If the time for raw data collection is included, which normally takes several days or weeks, the reduction is of course even more significant. Another option evaluated by the authors is to use MS Excel macros instead of the GDM-Tool; this solution can be almost as efficient, given that the user has built the macros in advance or has a well-established library of suitable code sections. However, the advantage of the GDM-Tool is that it provides data processing operations applicable to any manufacturing company storing its raw data in some relational table format.

Both MTConnect and the GDM-Tool are quite new applications for input data management for DES, and it is therefore necessary to improve them through further industrial case studies. From this case study, some experiences should be highlighted:

- Data provided by MTConnect are polled in intervals specified by the user and are thus presented as a list of machine states. Similar systems in the manufacturing industry typically store raw data as events containing information about both start time and duration. In this case, the set-up included a pre-developed script transforming the states to events; a sketch of this transformation is given below.
- Further development of the GDM-Tool, according to the finding above, would streamline the data flow even more by reading the raw data directly from the MTConnect XML file, eliminating the need for MS Excel.
- There are still a few issues regarding how to interpret data points provided by MTConnect. One example is that some down-time samples seemed too short to be considered machine breakdowns in a DES model; rather, they appeared to be logged due to communication problems between the machine and the MTConnect agent. In any case, more studies are required before using MTConnect data in sharp industrial DES studies.
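A minimal sketch of the state-to-event transformation mentioned in the first point above. The timestamps, the poll interval and the state names are illustrative and are not the MTConnect schema or the case study's actual script.

```python
# Polled samples: (timestamp_s, state), one row per poll (invented values)
polled = [(0, "ACTIVE"), (10, "ACTIVE"), (20, "FAULT"),
          (30, "FAULT"), (40, "ACTIVE"), (50, "ACTIVE")]
POLL_INTERVAL = 10  # seconds between polls, chosen by the user

# Collapse runs of identical states into events with start time and duration
events = []
start, current = polled[0]
for t, state in polled[1:]:
    if state != current:
        events.append({"state": current, "start": start, "duration": t - start})
        start, current = t, state
events.append({"state": current, "start": start,
               "duration": polled[-1][0] + POLL_INTERVAL - start})

# Optional cleansing: drop down-time samples too short to be real breakdowns
breakdowns = [e for e in events if e["state"] == "FAULT" and e["duration"] >= 20]
print(events)
print(breakdowns)
```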

4.3.3 ADDITIONAL CASE STUDY IN THE AUTOMOTIVE INDUSTRY
Being part of an ongoing research project, DFBB (section 2.1), this case study is not yet published as a scientific paper. One aim of the DFBB project is to provide various engineering tools with data throughout the complete life-cycle of production systems, including, for example, conceptual design, implementation, ramp-up and steady-state production. The proposed concept for automated input data management, demonstrated by the GDM-Tool, is here used for processing operational data supplied to, and stored in, digital building blocks for use in DES.

Objective and Contribution to RQ2
This case study was performed at a Swedish automotive company (not the same as in Publication V or VI) and aims to test the concept of automated input data management proposed in this thesis. Further, the findings will serve as a basis for possible additions or changes to the functionalities provided by the demonstrator (the GDM-Tool). In addition, the difference in time-consumption between traditional (the reference approach outlined in section 4.3.1) and automated input data management is measured.

Study Description
In correspondence with the two other case studies used for answering RQ2, this work consists of applying the GDM-Tool to automate the data input activities for a DES model. Here, the simulation model represents parts of a production line for engine components. The machines perform milling operations and are arranged as a serial production line including parallel machines within operation steps (OP 20, followed by the parallel machines OP 30_1/OP 30_2 and OP 40_1/OP 40_2); see Figure 21. The input parameters automatically supplied to the DES model are MTBF and MTTR. The time-consumption for the manual and automated data input processes was measured in the same way as in Publications V and VI, using one sample for each process. The small number of samples is due to the limited number of personnel trained in the input data management process (see further discussion in section 4.4.3).

Figure 21: Flow chart of the production line.

Results and Conclusions
The test implementation showed that no further functionality was required compared to the data operations obtained in Publication V. Furthermore, the test implementation resulted in reduced time for data processing, from 2 hours to 30 minutes (75%). The main reason for the increased efficiency is the automatic link from the categorized, corrected and calculated data to the distribution-fitting function; a sketch of the calculation step feeding that link is given below.
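The following minimal sketch turns a stop log into times-to-repair and times-between-failures, the quantities handed to the distribution fitting. The log layout and numbers are invented for illustration.

```python
# Stop log: (breakdown_start_s, breakdown_end_s) for one machine (made-up data)
stops = [(1200, 1500), (4800, 4950), (9000, 9600)]

ttr = [end - start for start, end in stops]                        # repair times
tbf = [stops[i + 1][0] - stops[i][1] for i in range(len(stops) - 1)]

print(f"MTTR = {sum(ttr) / len(ttr):.0f} s")
print(f"MTBF = {sum(tbf) / len(tbf):.0f} s")
# In the real data flow, non-production time taken from the shift schedule
# would be subtracted from each between-failure interval before fitting.
```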

Table 13: Comparison of the data quality between automated and manual input data management.

Machine   Parameter   Automated processing    Traditional processing   Difference in mean
Op 20     MTTR        Weibull 244, 0.29       Weibull 410, 0.30        Too few samples
Op 20     MTBF        Gamma …, 0.20           Gamma …, 0.25            Too few samples
Op 30_1   MTTR        LogNormal 3.00, 1.87    LogNormal 2.76, …        …%
Op 30_1   MTBF        Weibull 2644, 0.28      Weibull 2643, …          …%
Op 30_2   MTTR        Weibull 36.48, 0.49     LogNormal 2.56, …        …%
Op 30_2   MTBF        Gamma …, 0.15           Gamma …, …               …%
Op 40_1   MTTR        Weibull …, 0.54         Weibull …, …             …%
Op 40_1   MTBF        Weibull 9652, 0.27      Weibull 9642, …          …%
Op 40_2   MTTR        Weibull 23.98, 0.50     Weibull 23.98, …         …%
Op 40_2   MTBF        Weibull 6887, 0.25      Weibull 6878, …          …%

In this case, the data sets included 5941 rows of data samples from slightly more than one month of production. The information processed by the GDM-Tool and supplied to the simulation model (see Figure 22) turned out to be similar between the automated approach and the reference procedure; see Table 13. The data set contained too few stops in Op 20 for statistical data processing, and this machine is therefore left out of the comparison. MTBF corresponds well between the two approaches, and the difference in mean for MTTR is also relatively low and well within an acceptable interval. For example, the error often differs more between two different distribution families representing the same data set, because of their individual abilities to mimic short or long breakdowns. This demonstrates that the data operations, such as the statistical distribution-fitting plug-in, work correspondingly to commercial stand-alone applications. Consequently, the data were not repeatedly validated using the simulation outputs.

Figure 22: A simple user view of the DES model developed in the commercial simulation package ARENA.

4.4 INTERIM DISCUSSION RQ2
This part includes the specification of the required functionality for a middleware solution capable of transforming production data (raw data) into information for DES models. Further, a demonstrator (the GDM-Tool) of the proposed concept is designed, developed and evaluated through three separate and independent case studies. It is extremely important to understand that the author aims to propose a concept for automated input data management, not a commercial software solution ready for the market (despite existing inquiries).
The software demonstrator should be considered a proposed framework, inspiring researchers and companies to develop stand-alone applications or integrated modules in more comprehensive IT systems for production purposes. The proposed concept is similar to methodology c (Figure 23) presented by Robertson and Perera (2002), which is based on an intermediary database connecting simulation models to CBS applications (e.g. ERP systems). Such solutions have also previously been evaluated in research case studies (Randell and Bolmsjö 2001), providing valuable information to this thesis for validation purposes. The difference is that the GDM-Tool can extract data from several sources, more than those traditionally included in the CBS, which is necessary according to the findings in Publication III (RQ1). Further, the tool contains specific data processing functionality to meet the extensive processing requirements connected with DES data (see section 3.1.1). Its architecture also allows the efficient extension and customization necessary in special-purpose models.

Figure 23: Example of methodology c for automated input data management (Robertson and Perera 2002).

4.4.1 REQUIRED FUNCTIONALITY
The required functionalities of an efficient solution for automated input data management are derived in this thesis by combining information from previous research with empirical findings from the case studies. Table 14 summarizes these functionalities and motivates why they are required. Note that most functionalities relate to the fundamental activities required to transform data into information (Davenport and Prusak 1998). For a more thorough list of all data operations (plug-ins) implemented in the demonstrator (the GDM-Tool), see Publication V. As a complement to the list in Publication V, Publication VI also identified the need to differentiate between states and events in a table of raw data (e.g. data collected using MTConnect). Such functionality can be implemented by allowing removal of rows based on the content of previous rows. This is not yet implemented in the demonstrator, due to the capabilities of an external script already present in the case study.

Table 14: Required functionalities in an efficient solution for automated input data management, with exemplifying reasons for implementation.

Data import from several sources. Mostly realized by text-file or spreadsheet interfaces to databases or local sources.
- Previous literature reports poor data availability and problems of identifying available data sources (Perera and Liyanage 2000). The three case studies required data from multiple data sources. Publication III reports requirements of several sources, usually connected with using text-file interfaces.

Data table manipulation. E.g. splitting data columns, merging tables, removing irrelevant rows or columns.
- Archive analysis from the case studies identified needs for several operations, e.g. to split the date and time from one original column. Commonly needed to remove erroneously logged data (Alexandersson and Wirf 2001).

Formatting of data samples. Changing data formats to achieve conformity between data sources and to enable necessary calculations.
- Empirical findings from the case studies show that data from different sources, and even from columns within the same table, are of various formats and unsuited for calculations.

Calculations. All types of calculations, e.g. for finding the TBF or for switching between time units.
- All parameters encountered during the case studies required calculations. Limited facilities to organize and manipulate input data in current simulation software (Perera and Liyanage 2000).

Data filtering. To exclude unwanted data points, e.g. because of the desired simulation period.
- Historical data need to be filtered in order to exclude samples from another system state, e.g. before a major improvement project.

Statistical analysis and condensation. Evaluation of sample independence and selection of the best-fitting distribution.
- One of the most time-consuming data input activities; see Publication II. Data need to be condensed to suit neutral formats (SISO 2011). Limited facilities to organize and manipulate input data in current simulation software (Perera and Liyanage 2000).

Data categorization. Assigning data points to the correct production equipment and tagging the samples to conform to a selected export format.
- Required to assign data samples to the correct system entities in DES models and data standards (SISO 2011).

Information export to several formats. Export of the results, either to neutral formats (currently CMSD) or to customized interfaces.
- Interoperability problems are costly for industry (Gallaher, O'Connor, and Phelps 2002). DES users still use customized formats to a significant extent; see Publication III.
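Three of the operations in Table 14 can be illustrated on one made-up raw row; the column names, decimal-comma format and stop codes below are invented for the sketch.

```python
rows = [
    {"logged": "2011-03-04 13:22:10", "downtime_min": "4,5", "code": "STOP"},
    {"logged": "2011-03-04 15:02:44", "downtime_min": "0,2", "code": "TEST"},
]

# Data table manipulation: split one original date/time column into two
for r in rows:
    r["date"], r["time"] = r.pop("logged").split(" ")

# Formatting of data samples: conform decimal commas to enable calculations
for r in rows:
    r["downtime_min"] = float(r["downtime_min"].replace(",", "."))

# Data filtering: remove erroneously logged rows
rows = [r for r in rows if r["code"] == "STOP"]
print(rows)
```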

The proposed concept addresses the three most time-consuming activities identified in Publication II (RQ1). Data collection is supported by allowing connections to several data sources. Of course, this presupposes that raw data are automatically, or at least previously, collected, and it does not fill the common need for manual gathering. Identification and understanding of data sources are needed only once, which saves time in the long run. However, the most significant contribution lies in the processing of raw data (addressing data analysis and preparation), including correction, calculation and condensation. This is a well-known problem (Perera and Liyanage 2000, Publication III), which is automated in the proposed solution.

The functionality for data processing is to a large extent handled by the statistics plug-in in the GDM-Tool. This plug-in imports data sets and starts by analyzing them with regard to sample independence using scatter plots. Furthermore, it automatically identifies distribution parameters using MLE and selects the best-fitting statistical distribution for data condensation by means of KS tests; a sketch of this fitting-and-selection logic is given below. Figure 24 visualizes an example of the goodness-of-fit evaluation in a P-P plot.

Figure 24: A P-P plot exemplifying the goodness-of-fit functionality provided by the statistics plug-in.

As mentioned in Publication V, the automated execution of data activities requires a first-time configuration. Once a configuration is set, the most recent data files can automatically be processed and the results supplied to the DES model for up-to-date decision support. An example of a configuration path using common data operation plug-ins is shown in Figure 25.

Figure 25: Example of common data operations applied in a typical configuration for obtaining the MTTR from an ACS (Excel import; Remove Column, repeated; Remove Row by Column Value; Split Column; Numeric Converter; Create Numeric Data Column; Excel import; PI Data Value Corrector; Statistics; CMSD export).

There is of course potential to improve the demonstrator even further in future research projects, especially regarding additional support in statistical input modeling. For example, the present condensation of raw data sets exclusively supports statistical distributions. This representation is selected because of its capability to extend the simulation period beyond the actual time for data collection (Robinson 2004).
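The fitting-and-selection logic of the statistics plug-in can be sketched with SciPy as below. The plug-in is the GDM-Tool's own implementation; SciPy merely illustrates the procedure of MLE fitting followed by selection on the smallest KS D-statistic, here on synthetic repair-time-like data.

```python
import scipy.stats as st

# Synthetic data standing in for a real raw data set of repair times
data = st.weibull_min.rvs(c=0.5, scale=30, size=500, random_state=1)

candidates = {"Weibull": st.weibull_min, "Gamma": st.gamma, "LogNormal": st.lognorm}
fits = {}
for name, dist in candidates.items():
    params = dist.fit(data, floc=0)           # MLE with the location fixed at zero
    d_stat, p_value = st.kstest(data, dist.cdf, args=params)
    fits[name] = (d_stat, p_value, params)

best = min(fits, key=lambda name: fits[name][0])  # smallest D-statistic wins
print(best, fits[best])
```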
However, some simulation practitioners prefer the precision offered by more empirical representations such as empirical distributions, traces and bootstraps. An extension to include these representations is possible, but traces and bootstraps are more difficult due to the lack of support in existing standards. Additional data representations would also allow engineers to reject statistical representations based on the p-value, which is not possible at present; the best-fitting statistical distribution is automatically selected based on the D-statistic. Another possible extension is to increase the level of automation in the detection of outliers, e.g. data samples erroneously collected due to communication problems between data collection and storage applications (Alexandersson and Wirf 2001). Algorithms for outlier detection are already available and, thus, implementation in the GDM-Tool is just a matter of time. Similar algorithms are available for statistically evaluating the independence of samples, which is now a manual monitoring option performed by studying a scatter plot. However, note that all possible extensions above are additions to a common industrial procedure (the reference procedure) and are not required to match the quality of conventional DES models.

4.4.2 EVALUATION OF THE PROPOSED CONCEPT
Table 15 contains the findings from the three case studies used to validate the proposed concept of automated input data management for DES. The time-reduction compared to the reference procedure (section 4.3.1) is calculated including the data input activities collection, processing, and documentation. Note that not all possible data input activities are included, just the parts necessary to update when using simulation models on a continuous basis. For example, manual gathering does not exist in any of the three case studies, since the presence of category A data is a prerequisite. Other activities only have to be performed on one initial occasion, e.g. the identification of relevant parameters and of the sources to collect raw data from. This is of course the same for both the automated and the conventional approach.

Table 15: Compilation of the results from the validation of the proposed concept of automated input data management, performed in three test cases.

Case study  Data sources                                  DES parameters     Time-reduction  Identified functionalities
#1          Processing times and shift times from         Processing times,  78%             All necessary functionality
            planning system, stop logs from local ACS.    MTBF, MTTR                         (first case study).
#2          MTConnect alarm time stamps.                  MTBF, MTTR         75%             Transformation between
                                                                                             states and events.
#3          Stop logs from local ACS, shift times         MTBF, MTTR         75%
            from local spreadsheet.
Average                                                                      76%

The evaluation shows that the proposed concept for automated input data management is capable of processing input data from several different sources more efficiently than traditional industrial approaches. The main reasons are that the GDM-Tool demonstrates more comprehensive functionality regarding the necessary data operations, and that all steps in the input data management process can be completed in one procedure without manual involvement. The traditional approach relies on manual handling and special-purpose tools for data categorization, correction, calculations and condensation.
Such tools have limited capabilities to categorize, correct and calculate the data (Perera and Liyanage 2000), and the use of special-purpose solutions also results in manual handling of data between applications, for example to supply data from one application to another.

Regarding the supply of data to DES models, the approach is validated using the neutral format CMSD (SISO 2011) in all three case studies. The demonstrator also supports export to customized interfaces such as MS Excel spreadsheets. Moreover, initial data mappings of AutomationML (AutomationML consortium 2010) and STEP AP214 (Kjellberg et al. 2009, Falkman et al. 2008), performed in the research project DFBB (Chalmers PPU 2011), also show that interoperability with the GDM-Tool should be possible to implement. This is, however, not yet tested in a real-world environment.

A detailed quantification of the data quality obtained in the three case studies is delimited from this thesis; see section 1.6. However, it is important to highlight that all three case studies partly assessed the data quality in order to ensure that it is maintained compared to the conventional industrial approach. One case compared the approaches using hypothesis testing (Montgomery and Runger 1999), and the other two by comparing the input parameters using descriptive statistics and face validation (Sargent 2005). In the additional case study, the difference in mean between the automated and manual approaches might seem extensive for some operations (Table 13). However, such differences are normal also between manually selected distributions and the original data samples, due to the way statistical distributions represent short and long breakdowns. When using univariate distributions, long breakdowns are often under-represented in the data supplied to the simulation, which is normally desired by the simulation engineer, who wants to model production systems under normal circumstances.

The validation is mainly performed using two DES parameters: MTBF and MTTR. However, the solution can easily be applied to other quantitative parameters, such as set-up times and processing times; the latter is demonstrated as part of the validation in Publication V. The selection of MTBF and MTTR is strategic, since the handling of these two parameters is very extensive and covers the needs of most other (quantitative) simulation parameters. The handling of MTBF and MTTR includes (Williams 1994):

- An extensive number of samples, due to the parameters' importance for model dynamics.
- Raw data formats that are often very crude and require correction and conformation.
- Calculations, e.g. subtracting the start time of a breakdown from the corresponding value for the previous stop (MTBF).
- Data from several sources. For example, a combination of stop data and work schedules is required to obtain MTBF and MTTR cleansed from non-production time.
- Condensation, which is preferred to facilitate the supply of all data points as information to the simulation model.
- Outliers, which are commonly encountered, e.g. due to communication problems between the logging equipment and the database.

4.4.3 METHODOLOGICAL DISCUSSION
Development and validation of the proposed approach to automated input data management, and of its demonstrator the GDM-Tool, are based on three case studies. This is generally considered satisfactory for initial validation (Flynn et al. 1990).

72 CHAPTER 4 RESULTS factors in a data triangulation approach (Denscombe 2007). However, the fewer cases the more influence of company-specific circumstances, so it is of course desirable to continue with further tests and evaluations in future research projects. The limitation to three samples also makes the statistical calculations somewhat vulnerable. 76% time-reduction should therefore be considered more as an indication than as an absolute value at this stage. The fact that the automated approach reduces the time-consumption (RQ2) is nonetheless considered to be proven. A second consideration in the validation cases is the possible variation between users of the demonstrator (the GDM-Tool). Thus, the user aspect is delimited here and all configurations and test implementations have been performed by people from the research team, mainly by the author himself. The chief reason is that there are not enough employees in the partner companies with the necessary education and experience in input data management to set up a complete test including user variations. The author has experience of several industrial DES projects including sole responsibility of all steps of input data management, so it would be interesting to make future tests including engineers with less experience in DES. A final comment is that no case study shows a situation where automated input data management is inappropriate. It would have been interesting to obtain empirical findings from such circumstances, but the appropriateness of automation is now discussed solely on a theoretical basis; see section 5.1. The selection of case studies is dependent on the companies participating in the three projects during the PhD studies (Figure 5). 58

5 DISCUSSION

This chapter aims to connect the two parts of this thesis (RQ1 and RQ2) and relate the results to the thesis purpose and aim. Additional discussions of the findings associated with each part can be found in the interim discussions in sections 4.2 and 4.4. Note that the methodological discussions are also provided in the interim discussions.

Input data management is still one of the most critical and time-consuming phases of a DES project. Ten years ago, Robertson and Perera (2001) stated that "it is strongly argued that data collection is the most crucial and time-consuming stage in the model building process", and there has been no significant improvement in efficiency since then. This thesis shows that input data management still consumes on average 31% of the total project time. Thus, further efforts in this area are needed to increase the relatively low dissemination of DES in industry (McNally and Heavey 2004). Results from Publication III (RQ1) show that many companies work hard to implement more efficient solutions (methodologies c and d in Robertson and Perera (2002)), but need further support from researchers, which is provided in this thesis. The desire and spirit to move forward mean that successful solutions to the difficulties in input data management will increase the use of DES on a daily basis. Models will be continuously updated without major efforts and, thus, production engineers will have access to an analysis tool capable of including the dynamic aspects of production systems. Balancing and system losses can be reduced more effectively than in the present situation. This will in turn lead to more robust and efficient production systems.

The main solution proposed in this thesis (RQ2) relies on automated input data management. In order to be as effective as possible, it is mainly designed to support engineers in the three most time-consuming activities identified in RQ1: data collection, identification of available data, and data analysis and preparation. A demonstrator has been developed, and three independent case studies have shown the concept to be feasible and the time-consumption to be reduced by 76% on average. The key features of the proposed solution are that the intermediary application allows import of data from several sources and provides the necessary functionality for automated transformation of data into information for DES models. The first feature facilitates data collection and reduces the need for repetitive identification of available data. The second feature naturally increases efficiency in data analysis and preparation.

This concept can be categorized as a methodology c solution (section 3.6), which might be perceived as somewhat defensive for front-end research. However, findings from Publication III indicate that industry prefers such an approach, and the publication also pinpoints several problems with the even more integrated methodology d:

- Data in sources within the CBS are not detailed enough for simulation (Moon and Phatak 2005).
- Simulation projects still rely on data from several sources; significant parts of the data reside in local systems (Publication III).
- There is a need for additional data processing and possibilities for creating what-if scenarios.
- There is a diversity of simulation tools, and they are selected on the basis of their different strengths as well as the employees' experience (Semini, Fauske, and Strandhagen 2006). This fact makes totally integrated PLM packages (an example of methodology d) challenging to implement throughout organizations.

74 CHAPTER 5 DISCUSSION Strandhagen 2006). This fact makes totally integrated PLM packages (example of methodology d) challenging to implement throughout organizations. As an alternative to exclusively relying on single PLM packages, the latter problem above could instead be addressed by increased interoperability using neutral data formats; see examples in section Neutral formats would also facilitate the distribution of data between other production engineering applications, e.g. ergonomic simulations, layout planning tools and line balancing software. 5.1 IS AUTOMATION ALWAYS FEASIBLE? Although automation of data input activities holds tremendous potential for reducing the timeconsumption in DES projects, it is possible to identify situations where other approaches are still more applicable. The most obvious is in SME not working with simulation on a regular basis. Such companies usually have no specialists dedicated to DES analyses, too much unavailable (category C) data, and limited possibilities for investing in the necessary equipment. Therefore, systematic approaches outlined as best-practice guidelines for manual input data management are also necessary. The state-of-the-art descriptions provided in Publication I can be used for this purpose, but there are also other supporting publications in the area (see for example Bernhard and Wenzel 2005, Perera and Liyanage 2000, Lehtonen and Seppälä 1997). Furthermore, in addition to the reliance on available data, the confidence in automated solutions is dependent on the competence and experience of the user. A user who is familiar with the modeled production system and common data input activities can follow the process and interpret the output of a software solution automating certain steps (e.g. the GDM-Tool). However, less experienced users might perceive the automated process as a black box and, thus, question the validity and credibility of the information submitted to the simulation models. This is a situation where the more systematic approaches also would be preferable. An additional prerequisite for successful automation is that the modeled system must stay unchanged with regard to production equipment, parts routing, etc. Such major changes require modifications in the model logic, which are not handled by an automated input data management application. Consequently, automated input data management is most beneficial when connected to models used on a frequent basis for continuous improvement of production performance. Frameworks for facilitating updated model logics belong to another research area, for example described in publications about SysML (Huang, Ramamurthy and McGinnis 2007). 5.2 RESEARCH CONTRIBUTION AND POSSIBLE INDUSTRIAL APPLICATIONS One contribution to the research community is the mapping of current industrial state-of-the-art in the input data management process. The identification of the most time-consuming activities is specifically important for prioritizing future efforts related to research on more efficient data handling. Focus on supporting the activities having highest impact on the total timeconsumption holds higher potential for significant increases in efficiency. Moreover, the outline of a solution for automated input data management adds value to previous contributions by providing suggestions on detailed functionality, i.e. necessary data operations. Previous contributions in the area are more targeted on higher-level system architecture. 
For industry, one contribution is the hands-on guideline on input data management, including best-practice descriptions of the different activities. Such systematic guidelines are useful for increasing the efficiency of input data management in companies with a limited amount of category A data, mainly SMEs.

The main contribution, however, is the description of functionalities and software architecture for automated input data management, presented using the demonstrator called the GDM-Tool. This demonstrator realizes the middleware solution requested and identified as a gap in section 1.1. The GDM-Tool is not yet robust enough to be launched as a commercial product, but it may well serve as a prototype for commercial software solutions realized by major DES users, consultancy firms and vendors of production data management systems.

5.3 FUTURE RESEARCH

Many interesting ideas for further research have been identified in the previous discussion sections (4.2 and 4.4). One that is highly prioritized by the author and the Virtual Production research group at Chalmers is to incorporate environmental analyses in DES studies. Significant progress has already been made, but the acquisition of accurate data is frequently mentioned as a major problem. Firstly, further studies are needed to evaluate whether the type of data representation described in Publication IV is valid for parameters other than electrical power and for different types of production equipment. Secondly, research is required on automated extraction and processing of data from external databases, e.g. EcoInvent, ELCD and UPLCI (Overcash, Twomey and Kalla 2009).

As discussed in section 4.4.3, three test cases are used for the development and validation of the proposed concept for automated input data management. This limited number of cases makes it difficult to present exact quantifications of the concept's potential. Consequently, more case studies are needed for better precision of the obtained reduction in time-consumption, and for increased confidence in the necessary data processing functionality. Additionally, the current work focuses on the technical aspects of automation, and future cases should therefore also include the human aspects of the engineers working with the application. What education and experience are required to reduce the time-consumption using the GDM-Tool, and how should the user interface be improved to support usage among non-experts?

An additional recommendation, for both industry and researchers, is to intensify the work on increasing the amount of category A data. At present, significant efforts are required for manual data collection. This contributes to the extensive time-consumption and hinders automated solutions. One solution is further implementation of continuous collection systems, preferably systems enabling data collection on a temporary basis. Such a solution would also increase the possibilities for efficient data collection in SMEs. Another alternative is to include the data requirements of DES when implementing major CBS applications, for example Manufacturing Execution Systems (MES) or ERP systems. A great option for a future research project is therefore to develop and test a flexible data collection system and automatically connect it to a production-ready implementation of the GDM-Tool for further data processing. A DES model used for continuous improvements should be the receiver of the final result. Full implementations running at three companies for six months each would be satisfactory for studying the precision in data collection, the reduction in time-consumption including the user influence, and the effects of using DES as a desktop resource.
The author is fully convinced that the participating companies would experience increased utilization of existing resources as well as a more robust and reliable production output. By extension, increased capacity, higher service levels, and possibilities to reduce stock levels throughout the value chain would be obtained.


6 CONCLUSIONS

The aim of this thesis is to reduce the time-consumption for input data management in simulation of production flows. Increased efficiency in this process will enable more frequent use of dynamic simulations and, thus, support production engineers in improving the performance and robustness of production systems. Two research questions are addressed:

RQ1: What is the current industrial state-of-the-art for the input data management process?

RQ2: How can efficient and automated input data management, for simulation of material flows in production, be realized?

Results in one of the appended publications indicate limited progress during the last decade and state that companies need further assistance in finding appropriate solutions for automating the data input activities. One such solution is presented and evaluated in this thesis.

Today, the industrial state-of-the-art for the input data management process (RQ1) includes a significant amount of manual work. This is a major reason why input data management consumes on average 31% of the entire duration of Discrete Event Simulation projects. The level of automation is generally low in the gathering of raw data as well as in data processing. Furthermore, the supply of information to simulation models is also heavily dependent on human involvement, either through customized spreadsheet interfaces or even by typing the values directly into the model code. Thirteen distinct data input activities are identified and described in this thesis. This description may well be used as a best-practice guideline for increasing efficiency and precision in input data management. An evaluation of 15 industrial DES projects also shows that the three most time-consuming activities are: data collection, identification of available data, and data analysis and preparation.

The proposed solution for automated input data management (RQ2) is presented using a software demonstrator, called the GDM-Tool. The most significant difference between this solution and the common industrial approach is that the GDM-Tool enables automation and complete integration of data collection, data processing, and supply of information to simulation models. The GDM-Tool extracts data from several sources, demonstrates the necessary operations for data processing, and provides an automated link to simulation models using neutral formats. The current set of data operations has shown sufficient capabilities in three independent case studies performed in the automotive and aerospace industries. However, the plug-in-based architecture of the GDM-Tool also provides the possibility to add functionality if future case studies or real-world implementations require more customized solutions.

The three case studies have shown that the time-consumption for input data management can be reduced by approximately 75% compared to the common industrial approach, given that the necessary raw data are available. This number includes a one-time configuration, so the potential is even higher when simulation models are used repeatedly. The main reason for the increased efficiency is the improvement of the data processing step. Several manual activities for data categorization, correction, calculations, and condensation are removed thanks to the unbroken chain of automated activities. The integration of data operations also eliminates the need for manual handling of data between different special-purpose applications.
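As an illustration of the plug-in idea mentioned above, a data-operation contract could be as small as the hypothetical sketch below. None of the names are taken from the GDM-Tool's actual code; the point is only that chained, uniform operations are what make the unbroken automated pipeline possible.

    from abc import ABC, abstractmethod

    class DataOperation(ABC):
        """Hypothetical plug-in contract: every operation consumes and returns a
        list of records, so operations can be chained without manual hand-overs."""
        @abstractmethod
        def transform(self, records: list) -> list: ...

    class RemoveNegativeDurations(DataOperation):
        """Example correction plug-in: drop samples with impossible durations."""
        def transform(self, records):
            return [r for r in records if r["duration_min"] > 0]

    def run_pipeline(records, operations):
        # New functionality is added by appending another DataOperation;
        # the pipeline itself never changes.
        for op in operations:
            records = op.transform(records)
        return records

    raw = [{"duration_min": 12.0}, {"duration_min": -3.0}]
    print(run_pipeline(raw, [RemoveNegativeDurations()]))   # [{'duration_min': 12.0}]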


7 REFERENCES

Alexandersson, T., and L. Wirf. 2001. Improving the Input Data Treatment for Increased Simulation Output Quality. M.Sc. thesis, Department of Production Engineering, Chalmers University of Technology, Gothenburg, Sweden.
Aufenanger, M., A. Blecken, and C. Laroque. 2010. Design and Implementation of an MDA Interface for Flexible Data Capturing. Journal of Simulation 4(4).
AutomationML consortium. 2010. Whitepaper AutomationML Part 1: AutomationML Architecture (state: May 2010). Available online; accessed in July.
Balderud, J., and A. Olofsson. A Plug-in Based Software Architecture for Generic Data Management. M.Sc. thesis, Department of Computer Science and Engineering, Chalmers University of Technology, Gothenburg, Sweden.
Banks, J., J.S. Carson, and B.L. Nelson. 1996. Discrete-Event System Simulation (2nd ed.). Prentice-Hall, Upper Saddle River, New Jersey.
Bernhard, J., and S. Wenzel. 2005. Information Acquisition for Model Based Analysis of Large Logistics Networks. In: Proceedings of the 19th European Conference on Modelling and Simulation, eds. Y. Merkuryev, R. Zobel and E. Kerckhoffs.
Bordens, K.S., and B.B. Abbot. Research Design and Methods: A Process Approach. McGraw-Hill, New York.
Bryman, A., and E. Bell. 2007. Business Research Methods (2nd ed.). Oxford University Press, New York.
Cao, H.-J., Y.-C. Chou, and H.H. Cheng. 2009. Mobile Agent Based Integration Framework for Flexible Dynamic Job Shop Scheduling. In: ASME Conference Proceedings IDETC/CIE2009.
Chalmers PPU, Department of Product and Production Development. 2011. Available online; accessed May 28, 2011.
Choo, C.W., B. Detlor, and D. Turnbull. 2000. Web Work: Information Seeking and Knowledge Work on the World Wide Web. Kluwer Academic, Dordrecht.
Coughlan, P., and D. Coghlan. 2002. Action Research for Operations Management. International Journal of Operations & Production Management 22(2).
Danemark, B., M. Ekström, L. Jakobsen, and J.Ch. Karlsson. Att förklara samhället. Studentlitteratur, Lund (in Swedish).
Dassault Systemes. Delmia Digital Manufacturing & Production. Available online; accessed August 3.
Davenport, T.H. 1997. Information Ecology. Oxford University Press, New York.
Davenport, T.H., and L. Prusak. 1998. Working Knowledge: How Organizations Manage What They Know. Harvard Business School Press, Boston, Massachusetts.

Denscombe, M. 2007. The Good Research Guide: For Small-Scale Social Research Projects (3rd ed.). Open University Press, Maidenhead.
DIN, Deutsches Institut für Normung e.V. Informationsverarbeitung, Part 1. Beuth, Berlin (in German).
Dubois, A., and L.-E. Gadde. 2002. Systematic Combining: An Abductive Approach to Case Research. Journal of Business Research 55.
Dungan, P., and C. Heavey. 2010. Proposed Visual Wiki System for Gathering Knowledge About Discrete Event Systems. In: Proceedings of the 2010 Winter Simulation Conference, eds. B. Johansson, S. Jain, J. Montoya-Torres, J. Hugan, and E. Yücesan.
EcoProIT research project, Chalmers University of Technology. Available online; accessed August 11.
Falkman, P., J. Nielsen, B. Lennartsson, and A. von Euler-Chelpin. 2008. Generation of STEP AP214 Models from Discrete Event Systems for Process Planning and Control. IEEE Transactions on Automation Science and Engineering 5(1).
Flynn, B.B., S. Sakakibara, R.G. Schroeder, K.A. Bates, and E.J. Flynn. 1990. Empirical Research Methods in Operations Management. Journal of Operations Management 9(2).
Gallaher, M., A. O'Connor, and T. Phelps. 2002. Economic Impact Assessment of the Standard for the Exchange of Product Model Data (STEP) in Transportation Equipment Industries. Planning report 02-5, prepared for the National Institute of Standards and Technology.
Gamma, E., R. Helm, R. Johnson, and J. Vlissides. 1995. Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley Longman Publishing Co. Inc., Boston, MA, USA.
Geer Mountain Software Corporation. Stat::Fit commercial webpage. Available online; accessed August 4.
Glaser, B.G., and A.L. Strauss. 1967. The Discovery of Grounded Theory: Strategies for Qualitative Research. Aldine de Gruyter, New York.
Gummesson, E. 2000. Qualitative Methods in Management Research (2nd ed.). SAGE, Thousand Oaks.
Hatami, S. 1990. Data Requirements for Analysis of Manufacturing Systems Using Computer Simulation. In: Proceedings of the 1990 Winter Simulation Conference, eds. O. Balci, R.P. Sadowski and R.E. Nance.
Heilala, J., S. Vatanen, J. Montonen, H. Tonteri, B. Johansson, J. Stahre, and S. Lind. 2008. Simulation-Based Sustainable Manufacturing System Design. In: Proceedings of the 2008 Winter Simulation Conference, eds. S.J. Mason, R.R. Hill, L. Mönch, O. Rose, T. Jefferson, and J.W. Fowler.
Hollocks, B.W. 2001. Discrete-Event Simulation: An Inquiry into User Practice. Simulation Practice and Theory 8.

Huang, E., R. Ramamurthy, and L.F. McGinnis. 2007. System and Simulation Modeling Using SysML. In: Proceedings of the 2007 Winter Simulation Conference, eds. S.G. Henderson, B. Biller, M.-H. Hsieh, J. Shortle, J.D. Tew and R.R. Barton.
Ingalls, R.G. 2002. Introduction to Simulation. In: Proceedings of the 2002 Winter Simulation Conference, eds. E. Yücesan, C.-H. Chen, J.L. Snowdon, and J.M. Charnes.
Ingemansson, A. On Reduction of Production Disturbances in Manufacturing Systems Based on Discrete-Event Simulation. Doctoral dissertation, Department of Mechanical Engineering, Lund University, Lund, Sweden.
Ingemansson, A., T. Ylipää, and G.S. Bolmsjö. 2005. Reducing Bottle-Necks in a Manufacturing System with Automatic Data Collection and Discrete Event Simulation. Journal of Manufacturing Technology Management 16.
Institute for Environment and Sustainability. European Reference Life Cycle Database. Available online; accessed August 11.
Johansson, B., Å. Fasth, J. Stahre, J. Heilala, S.K. Leong, Y.T. Lee, and F.H. Riddick. 2009. Enabling Flexible Manufacturing Systems by Using Level of Automation as Design Parameter. In: Proceedings of the 2009 Winter Simulation Conference, eds. M.D. Rossetti, R.R. Hill, B. Johansson, A. Dunkin and R.G. Ingalls.
Johansson, M., B. Johansson, S.K. Leong, F.H. Riddick, and Y.T. Lee. A Real World Pilot Implementation of the Core Manufacturing Simulation Data Model. In: Proceedings of the Summer Computer Simulation Conference, Edinburgh, Scotland.
Johansson, M., and R. Zachrisson. 2006. Modeling Automotive Manufacturing Process. M.Sc. thesis, Department of Product and Production Development, Chalmers University of Technology, Gothenburg, Sweden.
Kibira, D., and S.K. Leong. Test of Core Manufacturing Simulation Data Specification in Automotive Assembly. In: Proceedings of the Simulation Interoperability Standards Organization (SISO) and Society for Modeling and Simulation (SCS) International European Multi Conference, Orlando, Florida, USA.
Kjellberg, T., A. von Euler-Chelpin, M. Hedlind, M. Lundgren, G. Sivard, and D. Chen. 2009. The Machine Tool Model: A Core Part of the Digital Factory. CIRP Annals - Manufacturing Technology 58(1).
Kleindienst, J., and D. Juricic. Optimal Selection of Information Terminals for Data Acquisition in Manufacturing Processes. In: Proceedings of the 6th EUROSIM Congress on Modelling and Simulation, eds. B. Zupančič, R. Karba and S. Blažič.
Kumar, S., and D.A. Nottestad. 2009. Flexible Capacity Design for the Focus Factory: A Case Study. International Journal of Production Research 47(5).
Kühn, W. 2006. Digital Factory: Simulation Enhancing the Product and Production Engineering Process. In: Proceedings of the 2006 Winter Simulation Conference, eds. L.F. Perrone, F.P. Wieland, J. Liu, B.G. Lawson, D.M. Nicol, and R.M. Fujimoto.

Law, A.M., and M.G. McComas. 2003. How the ExpertFit Distribution-Fitting Software Can Make Your Simulation Models More Valid. In: Proceedings of the 2003 Winter Simulation Conference, eds. S. Chick, P.J. Sanchez, D. Ferrin and D.J. Morris.
Law, A.M. 2007. Simulation Modeling & Analysis (4th ed.). McGraw-Hill, New York.
Leemis, L. 2004. Building Credible Input Models. In: Proceedings of the 2004 Winter Simulation Conference, eds. R.G. Ingalls, M.D. Rossetti, J.S. Smith and B.A. Peters.
Lehtonen, J.-M., and U. Seppälä. 1997. A Methodology for Data Gathering and Analysis in a Logistics Simulation Project. Integrated Manufacturing Systems 8.
MANUFUTURE, EuroStat 2006.
McLean, C., and S. Leong. 2001. The Expanding Role of Simulation in Future Manufacturing. In: Proceedings of the 2001 Winter Simulation Conference, eds. B.A. Peters, J.S. Smith, D.J. Medeiros, and M.W. Rohrer.
McNally, P., and C. Heavey. 2004. Developing Simulation as a Desktop Resource. International Journal of Computer Integrated Manufacturing 17.
Miles, M.B., and A.M. Huberman. 1994. Qualitative Data Analysis: An Expanded Sourcebook (2nd ed.). SAGE Publications, Thousand Oaks.
Montgomery, D.C., and G.C. Runger. 1999. Applied Statistics and Probability for Engineers (2nd ed.). Wiley & Sons, New York.
Moon, Y.B., and D. Phatak. 2005. Enhancing ERP System's Functionality with Discrete Event Simulation. Industrial Management & Data Systems 105.
MTConnect Institute. Available online; accessed August 7.
Nonaka, I., and H. Takeuchi. 1995. The Knowledge-Creating Company: How Japanese Companies Create the Dynamics of Innovation. Oxford University Press, New York.
OPC Foundation. Available online; accessed August 7.
Overcash, M., J. Twomey, and D. Kalla. 2009. Unit Process Life Cycle Inventory for Product Manufacturing Operations. In: ASME Conference Proceedings MSEC2009, West Lafayette, IN, USA.
Pegden, C.D., R.E. Shannon, and R.P. Sadowski. 1995. Introduction to Simulation Using SIMAN (2nd ed.). McGraw-Hill, New York.
Perera, T., and K. Liyanage. 2000. Methodology for Rapid Identification of Input Data in the Simulation of Manufacturing Systems. Simulation Practice and Theory 7.
Perrica, G., C. Fantuzzi, A. Grassi, G. Goldoni, and F. Raimondi. 2008. Time to Failure and Time to Repair Profiles Identification. In: Proceedings of the 5th FOODSIM Conference, Dublin, Ireland.
Pidd, M. Tools for Thinking: Modelling in Management Science. John Wiley & Sons, Chichester.

ProViking. Available online; accessed May 29.
Randell, L.G., and G.S. Bolmsjö. 2001. Database Driven Factory Simulation: A Proof-of-Concept Demonstrator. In: Proceedings of the 2001 Winter Simulation Conference, eds. B.A. Peters, J.S. Smith, D.J. Medeiros, and M.W. Rohrer.
Robertson, N., and T. Perera. 2001. Feasibility for Automatic Data Collection. In: Proceedings of the 2001 Winter Simulation Conference, eds. B.A. Peters, J.S. Smith, D.J. Medeiros, and M.W. Rohrer.
Robertson, N., and T. Perera. 2002. Automated Data Collection for Simulation? Simulation Practice and Theory 9.
Robinson, S., and V. Bhatia. 1995. Secrets of Successful Simulation Projects. In: Proceedings of the 1995 Winter Simulation Conference, eds. C. Alexopoulos, K. Kang, W.R. Lilegdon and D. Goldsman.
Robinson, S. 2004. Simulation: The Practice of Model Development and Use. John Wiley & Sons Ltd, Chichester.
Sargent, R.G. 2005. Verification and Validation of Simulation Models. In: Proceedings of the 2005 Winter Simulation Conference, eds. M.E. Kuhl, N.M. Steiger, F.B. Armstrong, and J.A. Joines.
Semini, M., H. Fauske, and J.O. Strandhagen. 2006. Applications of Discrete-Event Simulation to Support Manufacturing Logistics Decision-Making: A Survey. In: Proceedings of the 2006 Winter Simulation Conference, eds. L.F. Perrone, F.P. Wieland, J. Liu, B.G. Lawson, D.M. Nicol, and R.M. Fujimoto.
Siemens. Siemens PLM Software, Teamcenter. Available online; accessed August 3.
SISO, Simulation Interoperability Standards Organization. SISO Policies and Procedures. Available online; accessed November 16.
SISO, Simulation Interoperability Standards Organization. 2011. CMSD Product Development Group. SISO-STD, April 2011.
Solding, P., D. Petku, and N. Mardan. 2009. Using Simulation for More Sustainable Production Systems: Methodologies and Case Studies. International Journal of Sustainable Engineering 2.
Solding, P., P. Thollander, and P.R. Moore. 2009. Improved Energy-Efficient Production Using Discrete Event Simulation. Journal of Simulation 3.
Starrin, B., and P.G. Svensson. Kvalitativ metod och vetenskapsteori. Studentlitteratur, Lund (in Swedish).
Stevenson, W.D. 1982. Elements of Power System Analysis (4th ed.). McGraw-Hill, New York.

Swiss Centre for Life Cycle Inventories. The EcoInvent Database. Available online; accessed August 11.
Trybula, W. 1994. Building Simulation Models Without Data. In: IEEE International Conference on Systems, Man, and Cybernetics: Humans, Information and Technology, vol. 1.
UML Resource Page. Unified Modeling Language. Available online; accessed January 3.
Van der Spek, R., and A. Spijkervet. Knowledge Management: Dealing Intelligently with Knowledge. In: Knowledge Management and Its Integrative Elements, eds. J. Liebowitz and L. Wilcox, 31-58. CRC Press.
Van der Zee, D.-J., and J.G.A.J. van der Vorst. 2007. Guiding Principles for Conceptual Model Creation in Manufacturing Simulation. In: Proceedings of the 2007 Winter Simulation Conference, eds. S.G. Henderson, B. Biller, M.-H. Hsieh, J. Shortle, J.D. Tew and R.R. Barton.
Wallén, G. Vetenskapsteori och forskningsmetodik (2nd ed.). Studentlitteratur, Lund (in Swedish).
Weick, K.E. 1979. The Social Psychology of Organizing (2nd ed.). Random House, New York.
Wild, R. 1975. On the Selection of Mass Production Systems. International Journal of Production Research 13.
Wilkinson, A.M. The Scientist's Handbook of Writing Papers and Dissertations. Prentice-Hall, New Jersey.
Williams, E.J. 1994. Downtime Data: Its Collection, Analysis, and Importance. In: Proceedings of the 1994 Winter Simulation Conference, eds. J.D. Tew, M.S. Manivannan, D.A. Sadowski, and A.F. Seila.
Williams, E.J. 1996. Making Simulation a Corporate Norm. In: Proceedings of the 1996 Summer Computer Simulation Conference, eds. V.W. Ingalls, J. Cynamon and A.V. Saylor.
Yin, R.K. 1994. Case Study Research: Design and Methods (2nd ed.). SAGE Publications, Thousand Oaks.
Zaum, D., M. Olbrich, and E. Barke. Automatic Data Extraction: A Prerequisite for Productivity Measurement. In: Proceedings of the IEEE International Engineering Management Conference, Europe: Managing Engineering, Technology and Innovation for Growth.

APPENDED MATERIALS

This part of the thesis includes the questions and topics used in interviews and questionnaires for data collection in Publications I, II and III. In some cases, the materials are translated from the original format (in Swedish) to English.

Topics used for the semi-structured interviews in Publications I and II

1. Please draw an outline of your applied work procedure during input data management in your DES project (using a white-board or pen and paper depending on the meeting location).
   a. Please describe the data input activities performed in your DES project.
   b. Try to use as general terminology as possible.
   c. In what order did you perform the activities?
   d. If possible, please draw the relations between activities to form a flow diagram.
   e. Was there a need for iterations of activities during your work procedure?
2. Can you identify possible improvements of your applied work procedure?
3. What would you change in the work procedure if it were possible to do the project over again?
4. Based on the experience from your project, what is most important in order to increase efficiency in input data management?
5. Do you think that the input data management phase in your project would have been more rapid if a structured methodology had been applied? Please motivate.

Questions included in the face-to-face questionnaire for Publication II

1. How many people are employed at the company where the DES project was completed?
2. How frequently does the company use DES for improvements of production flows?
   a. Never before; this was the first project.
   b. Sporadically, in a few improvement projects.
   c. In major change projects.
   d. In almost all improvement projects.
   e. Sporadically, in order to identify improvement possibilities.
   f. Continuously, in order to identify improvement possibilities.
   g. On a daily basis, for planning and control purposes.
3. How much time did the entire project consume?
4. How much time was planned for input data management according to the project plan?
5. How much time did each data input activity consume (respond in man-hours)?
6. Which input parameters were included in your model, from what source did you collect them, and how much time did you spend on collecting each specific parameter?
7. If any, please list the software solutions or other tools used for data processing.
8. Were you able to separately validate the collected data?
9. Were you able to validate the simulation model against the real-world system?
10. Did the input data management step follow the time-consumption estimated in the project plan?
11. Did the complete project finish on time according to the project plan?
12. Was the project considered successful based on the objectives in the project plan?

Questions included in the questionnaire for Publication III

1. Please specify your major areas of application for DES, e.g. manufacturing, logistics, health care, and military.
2. What makes you use simulation in your business? (please select the most appropriate answer)
   a. Simulation is used to address a specific business need, such as design of a new factory. The model is not re-used once the project is completed.
   b. Simulation is regularly used to improve business operations. Models are often re-used.
   c. Use of simulation is mandatory within the business in every improvement project.
3. Do you apply a structured approach to input data management (including raw data collection and data processing), such as data collection templates, guidelines and/or checklists?
   a. Yes, please specify.
   b. No.
4. Which is the main source of input data to DES models? (please select one answer)
   a. Manual gathering (e.g. stop watch, movie recording)
   b. People-based systems (e.g. interviews, expert knowledge)
   c. Paper-based systems (brochures etc.)
   d. Local computer-based systems (e.g. spreadsheets)
   e. Computer-based corporate business systems (e.g. ERP, MES, PLM)
   f. Other, please specify.
5. Which sources of input data are commonly used? (several alternatives are allowed)
   a. Manual gathering (e.g. stop watch, movie recording)
   b. People-based systems (e.g. interviews, expert knowledge)
   c. Paper-based systems (brochures etc.)
   d. Local computer-based systems (e.g. spreadsheets)
   e. Computer-based corporate business systems (e.g. ERP, MES, PLM)
   f. Other, please specify.
6. What is your major approach for selection between duplicate data sources (if you have multiple sources for the same data item)? (please select the most appropriate answer)
   a. Data duplication is never encountered
   b. Select the most recent data
   c. Base the selection on personal experience
   d. Combination of data sources
   e. Base the selection on team knowledge
   f. Select data most local to the source/origin
   g. Other, please specify.
7. How are data accuracy, reliability and validity mainly assured? (please select the most appropriate answer)
   a. Interviewing area experts
   b. Basic sanity checks
   c. Personal experience
   d. The internal or external customer's responsibility
   e. Model validation runs

   f. Other, please specify.
8. Models develop and evolve; how is data validity maintained? (please select the most appropriate answer)
   a. Continuous manual efforts for data collection
   b. Manual efforts for data collection, only initiated when the model will be used
   c. Automated collection for parts of the data
   d. Continuous automated collection of all necessary data
   e. Models are not maintained and reused
   f. Other, please specify.
9. How are data (information) supplied to the simulation model? (please select the most appropriate answer)
   a. Manually written in the model code
   b. Via an external spreadsheet (automatically connected to the model) or similar
   c. An off-line database automatically connected to the model
   d. Direct link between corporate business systems and simulation model
   e. Other, please specify.
10. Where is the majority of data (information) held, i.e. where does the processed data reside? (please select the most appropriate answer)
   a. In the simulation model
   b. In a paper-based system
   c. In a local computer-based system (e.g. a spreadsheet)
   d. In a computer-based corporate business system (e.g. ERP, MES)
   e. Other, please specify.
11. Considering the entire input data management process, which is the most common methodology (Figure 10 was appended)? (please select the most appropriate answer)
   a. Methodology A
   b. Methodology B
   c. Methodology C
   d. Methodology D
   e. Why? Please describe benefits and problems.
12. Which methodology do you think will be used in ten years? (please select the most appropriate answer)
   a. Methodology A
   b. Methodology B
   c. Methodology C
   d. Methodology D
   e. Please explain your choice.


Publication I

Skoogh, A., and B. Johansson. 2008. A Methodology for Input Data Management in Discrete Event Simulation Projects. In: Proceedings of the 2008 Winter Simulation Conference, eds. S.J. Mason, R. Hill, L. Moench, O. Rose, T. Jefferson, and J.W. Fowler.


Proceedings of the 2008 Winter Simulation Conference, S.J. Mason, R. Hill, L. Moench, and O. Rose, eds.

A METHODOLOGY FOR INPUT DATA MANAGEMENT IN DISCRETE EVENT SIMULATION PROJECTS

Anders Skoogh and Björn Johansson
Department of Product and Production Development
Chalmers University of Technology
Hörsalsvägen 7A, Gothenburg, Sweden

ABSTRACT

Discrete event simulation (DES) projects rely heavily on high input data quality. Therefore, the input data management process is very important and, thus, consumes an extensive amount of time. To secure quality and increase rapidity in DES projects, there are well-structured methodologies to follow, but a detailed guideline for how to perform the crucial process of handling input data is missing. This paper presents such a structured methodology, including descriptions of 13 activities and their internal connections. With this kind of methodology available, our hypothesis is that the structured way of working increases rapidity for input data management and, consequently, also for entire DES projects. The improvement is expected to be larger in companies with low or medium experience of DES.

1 INTRODUCTION

Discrete event simulation (DES) has proved itself to be a very powerful tool for decision support in production development (Williams 1996). The tool provides possibilities to conduct precise dynamic analyses in order to improve running production or to secure smooth implementations of new products or production equipment. However, despite the promising potential, industry has not fully adopted the tool (Ericsson 2005). One of the most conspicuous disadvantages of DES is arguably the extensive amount of time needed to perform a simulation study (Johansson, Johnsson, and Kinnander 2003). The substantial time-consumption has been especially conspicuous when applying DES in early conceptual phases of major change projects, e.g. new product introductions or implementation of new production equipment. In these kinds of projects, quick responses to analyses are usually essential in order to reduce project lead-time. Moreover, there is a broad consensus on the fact that input data management is one of the crucial parts of a simulation project with regard to time-consumption. Previous studies have shown that the input data phase constitutes on average 31% of the time in entire projects (Skoogh and Johansson 2007), and Trybula (1994) reported similar results, stating that the input data phase consumes 10-40%. Fortunately, a considerable amount of research work has been performed to reduce the time-consumption for input data management. A lot of work has focused on automating the process of input data collection. For instance, Randell and Bolmsjö (2001) demonstrated a method to reduce project lead-time using database-driven factory simulation. Robertson and Perera (2002) described how the Corporate Business System can be used as the simulation data source and, thus, be advantageous in order to increase the speed of input data collection. On the other hand, complete automation of the entire input data management process requires well-developed original data sources. However, in many cases these sources omit some data necessary for simulation, especially data needed to mimic the dynamics of the investigated system (Robertson and Perera 2002; Ho, Wu, and Tai 2004). Furthermore, small and medium-sized companies do not always have continuous collection of production data.
Hence, there is also a need for structured methodologies to support more traditional working procedures throughout the input data phase. However, there are few practical guides or previous contributions using a more systematic approach (Perera and Liyanage 2000, Lehtonen and Seppälä 1997), which is unfortunate since the number of non-specialists working with DES tools is increasing (Hollocks 2001). Moreover, successful examples of such methodologies are found by simply studying the numerous guidelines for structuring entire simulation projects (Banks et al. 2004; Law 2007; Pidd 1995). These have proved to be of great support for practitioners, not least for users with low or medium experience of DES. The aim of this paper is to contribute to the work towards more time-efficient and accurate input data management for simulation projects by proposing a structured methodology for the activities in the input data phase, e.g. identification, collection, analysis, and storage.

A survey was performed among 15 previously completed simulation projects in order to identify the design requirements for the methodology. Results from the survey also indicate a significant potential of utilizing an easy guide that outlines the most important steps in the input data phase of DES projects.

2 INPUT DATA MANAGEMENT

In this paper, input data management is defined as the entire process of preparing quality-assured, simulation-adapted representations of all relevant input data parameters for simulation models. This includes identifying relevant input parameters, collecting all information required to represent the parameters as appropriate simulation input, converting raw data to a quality-assured representation, and documenting data for future reference and re-use. Our focus is on the management of data required for model realization. However, many of the activities and descriptions are also relevant for contextual data and data needed for model validation (Pidd 2003). Moreover, the approach adopted in this paper is primarily intended for quantitative data; logical relations between model entities are presupposed to be handled in the conceptual model.

In addition to the categorization of data as contextual, required for model realization, and needed for model validation (Pidd 2003), Robinson and Bhatia (1995) divide data into three other categories based on availability and collectability (Table 1). This classification is very useful to refer to when considering input data methodologies, since the three categories require significantly different approaches during collection. Firstly, category A data is already available, for instance in automated logging systems, Corporate Business Systems, or simply as previously measured data intended for another study. Of course, this type of data is very convenient, since further work is limited to data analysis and validation. Secondly, category B data requires additional effort because it needs to be gathered during the simulation study. Finally, category C data is neither previously available nor collectable, often due to new processes or equipment in the investigated system. Estimation of category C data requires both a well-designed strategy and scrupulous care in order to maintain model quality.

Table 1: Classification of data (Robinson and Bhatia 1995)
- Category A: Available
- Category B: Not available but collectable
- Category C: Not available and not collectable

Existing literature on input data management focuses mainly on how to represent extensive sets of raw data in simulation models (Robinson 2004, Perera and Liyanage 2000). Hence, there is a lot of information available on how to select a proper statistical or empirical distribution. Consequently, guidelines and information about various distribution families, Maximum Likelihood Estimation (MLE) and goodness-of-fit tests are well described, for instance in Leemis (2004) and Law (2007). However, efforts to cover a wider range of issues in the input data management process using a systematic approach appear less frequently in a literature review (Perera and Liyanage 2000, Lehtonen and Seppälä 1997, Hatami 1990). One of the contributions using a systematic approach is a methodology based on the Integrated computer aided manufacturing DEFinition (IDEF) (Perera and Liyanage 2000).
The methodology focuses mainly on reducing the time required for identification of parameters to include in the simulation model, which is arguably one of the most time-consuming activities in input data management. After investigating the system of interest for simulation, a functional model is built using pre-developed IDEF constructs. Thereafter, a required entity model is generated, which can be translated into a relational database, providing the model-builder with a structure to follow during data collection and for data storage.

Furthermore, controllability analysis (CA) has been used to increase efficiency in the problem definition and data management phases of simulation projects (Lehtonen and Seppälä 1997). CA is an iterative approach intended to focus only on relevant aspects of the problem to solve. At each aggregation level, the aspect of major relevance is focused upon and further analyzed in order to pinpoint the most important factors with regard to project objectives. This structured methodology is very sound for identifying important parameters, and it facilitates the data management process by minimizing collection of data that is actually irrelevant for solving the problem. However, the methodology does not describe more detailed input data management activities such as collection, preparation of raw data, or data validation.

3 PROJECT INTERVIEWS

The suggested methodology is based on 15 semi-structured interviews (Denscombe 1998) in which simulation practitioners contributed their experiences from DES projects performed between 1999 and the time of the study. The selected projects represent a wide range of companies with regard to size, line of business, and previous DES experience. During the interviews, the working procedures applied in each of the 15 projects were closely examined. Additionally, several issues from the projects' input data processes were addressed. The respondents shared their reflections on problems they had faced related to input data management. The interviews also covered the respondents' own suggestions on important steps to make input data management more efficient in future DES projects. Specific results from the interviews are presented in Table 2 and Table 3.

The 15 project members were also requested to estimate the value of a predefined methodology for input data management. On the specific question "Do you think that the input data management phase in your project would have been more rapid if a structured methodology was applied?", the average response was 5.73 on a scale from 1 to 7, where 1 means that the respondent totally disagrees and 7 means that he or she totally agrees. The specified reasons, indicating that there is promising potential in a more structured way of working with input data, are presented in Table 2. The reasons are arranged in decreasing order, starting with the most frequent. No explanations for disagreements were given (only one answer was below 4).

Table 2: The respondents' major explanations of why a structured methodology is assumed to increase rapidity and quality of input data management.
- Increased awareness of, and focus on, identifying the correct parameters before starting the data gathering
- A generally increased definition of work structure
- Deciding the number of samples before starting the data gathering
- Inconspicuous but important activities, such as a separate validation of input data, are highlighted
- Increased focus on identifying the correct data sources and making sure that all data will be found

Additionally, the interviews brought up several interesting points about input-data-related problems, which likely could have been avoided using a structured methodology. These problems are summarized in Table 3.

Table 3: Problems experienced due to lack of a structured methodology.
- Problem: Made too many measurements with regard to model detail. Root cause: Measured everything from the beginning, without specifying the required accuracy.
- Problem: Late additional rounds of data gathering. Root cause: No rigid analysis verifying that all data would be found.
- Problem: Several attempts of raw data gathering failed. Root cause: The gathering methods were not properly chosen and clearly defined in advance.
- Problem: Many iterations in data collection. Root cause: Inefficient validation process; no separate data validation.

Accordingly, the proposed methodology was developed as a combination of the 15 closely examined working procedures, the respondents' further suggestions, and the authors' experiences from more than 50 DES projects worldwide.

4 PROPOSED METHODOLOGY

The proposed methodology follows distinct activities, which are shown in Figure 1. It fits well into the frequently cited works of Banks et al. (2004), Law (2007), and Pegden, Shannon and Sadowski (1995), which all present methodologies for how to perform a DES project. In these methodologies, the input data management part represents a smaller portion of a full project. That smaller portion is described in more detail below. Figure 1 shows a scheme of the proposed input data management methodology.

4.1 Identify and Define Relevant Parameters

The first step when preparing input data for simulation models is to identify which parameters are necessary to include in the model. This might appear a simple task, but due to problems like high system complexity and selecting an appropriate level of detail according to the problem definition and objectives, one should not underestimate the required effort (Perera and Liyanage 2000). It is of great importance to closely investigate the system, for example through practice or pre-observation sessions and detailed interviews with process experts.
Preferably, the identification of data is performed in close connection to the development of a conceptual model (Robinson and Bhatia 1995). Moreover, to support the identification process there are models and methodologies that help to select an appropriate level of detail (Lehtonen and Seppälä 1997) and to decide which parameters are usually needed to model specific entities or processes. Core Manufacturing Simulation Data (CMSD) is one such effort, driven by the Simulation Interoperability Standards Organization (SISO) (Lee et al. 2007). The previously introduced IDEF-based methodology, developed by Perera and Liyanage (2000), also includes functionality to connect specific parameters to entities in the conceptual model.

Finally, the activity does not just include identification of relevant parameters. All parameters also need to be defined with regard to how they shall be measured and represented in the model. For instance, in many cases it is not obvious how to define a machine's cycle time. Does it start when the product is taken from the material handling device into the production cell, or should the measurement start when the machine actually starts processing the part? According to our interviews, lack of parameter definition caused confusion during input data management in several of the 15 studied projects. To avoid the problem, system experts should be involved to explain how the company usually defines and measures different parameters.

Figure 1: Proposed methodology for increased precision and rapidity in input data management.

4.2 Specify Accuracy Requirements

For quality reasons it is usually good to collect as much raw data as possible in order to generate good representations for simulation parameters. To be efficient, however, it is worthwhile to differentiate between the accuracy demands for each parameter. System entities which do not significantly affect model performance can be paid less attention than critical ones. Thus, possible bottlenecks and narrow sectors in the system require high-accuracy data with high validity. During input data management, before the model is developed, system knowledge is, again, the only possible source to determine how important certain data is for model performance. For example, if a sequenced production chain has one resource with a significantly longer processing time than the resources upstream and downstream, it is more important to have accurate data on this resource, since it will likely control the output frequency. However, later during the project it is also possible to create an experiment to analyze input data sensitivity, which will verify (or disprove) that the collected data is sufficient for model validity. This is a very powerful complement to early decisions on accuracy requirements. It is recommended to do sensitivity analyses for all borderline cases.

Another factor that affects the effort and number of samples required for a specific parameter is process variability, which is also possible to predict using process knowledge. Process attributes expected to show a constant behavior, e.g. conveyor speeds and cycle times for automated stations, need only enough samples to ensure that no unexpected variability is present. On the other hand, factors with high variability require more samples for a good representation. For instance, one of the most common input data types is breakdown data. Data describing Time To Repair (TTR) and Time To Failure (TTF) is often highly variable and hard to find on short notice if no historical data has been collected over a long time period. Perrica et al. (2008) present one example of accuracy and estimation for both TTF and TTR. They also recommend that at least 230 samples should be collected to estimate the probability functions of TTR and TTF. This number of samples is a good rule of thumb for the collection of all types of variable parameters, if possible. A sketch of how an accuracy requirement can be translated into a number of samples is given below.
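One common way to make this translation, shown here as a standard statistical rule rather than a step prescribed by the paper, is the normal-approximation confidence-interval formula n = (z * s / e)^2, where s is an estimated standard deviation and e the acceptable half-width of the interval:

    import math

    def required_samples(stdev, half_width, z=1.96):
        """Samples needed for a 95% confidence interval of the mean
        with the requested half-width (normal approximation)."""
        return math.ceil((z * stdev / half_width) ** 2)

    # Example: cycle times with an estimated standard deviation of 6.0 s,
    # where the mean should be estimated to within +/- 0.5 s.
    print(required_samples(stdev=6.0, half_width=0.5))   # -> 554

With high process variability (a large s) or a tight half-width, the required number of samples quickly exceeds the 230-sample rule of thumb cited above.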

4.3 Identify Available Data

In order to save time during input data management, it is important to take advantage of all previously collected data, the available category A data. Nowadays, continuous data collection is increasing significantly in industry, which is good for simulation projects. Unfortunately, simulation aspects have usually been ignored during the specification of the collection systems and databases. Hence, companies incorrectly believe and claim that they have all data necessary for simulation, but on closer investigation they do not. Therefore, it is of crucial importance to go through all sources of available data to make sure that the required data is possible to extract and that it is measured in a suitable way. This problem was experienced in several of the projects included in this study. To learn and understand the structure of all sources is many times a very time-consuming task (Skoogh and Johansson 2007).

Available data can be found in many different places and in various formats. A significant share of available data stems from collection systems, automated or manual, which are used to follow up certain aspects of the system, for example order handling systems, maintenance systems, staffing systems and other databases. Other common sources of data are systems used by other functions in the company, such as Enterprise Resource Planning (ERP) systems, Material Planning Systems (MPS) and Manufacturing Execution Systems (MES). Data can also be found in materials from previous analysis efforts, for instance frequency studies, Lean efforts, quality-related projects or other system design processes. The result of this activity is a list of sources for parameters classified as category A data. Moreover, if some sources are computerized, instructions on how to extract each parameter shall be included.

4.4 Choose Methods for Gathering of Not Available Data

If, as in most simulation studies, some or all data is not available beforehand (categories B and C), it needs to be either gathered or estimated. In this activity, gathering and estimation methods for data in these categories are defined and chosen. The choices will be the basis for the evaluation and decision in activity 4.5 (that all data will be found and that data collection can start). Note that no data collection starts in this activity; however, the choices will define how the actual gathering will be performed later, in activity 4.8.

Gathering data for a DES model can be done in many ways. The most common, and probably easiest, way is to use a stopwatch and start walking along the product flow, measuring parameters for each and every step of the process. At each process step, and for each different product, measurements are made for all parameters identified in activity 4.1. This method is rather chaotic but swift to conduct. Care needs to be taken when considering where and when a process ends and another starts. Hence, it is important to adhere to the parameter definitions, also established in activity 4.1. Moreover, buffer capacities and conveyor speeds can also be collected this way. If more than one person will collect the data, make sure that exactly the same way of measuring is used. Other examples of manual gathering methods are frequency studies and video analyses.

Time studies on a more detailed level are preferable if the project is to deliver more accurate results, or if all entities in the model do not yet exist in reality. They will, however, require more time during the input data phase of the project. MTM (Methods-Time Measurement 1973), SAM (Sequence-based Activity and Method analysis) (Johansson and Kinnander 2004) and DFA (Design For Assembly) (Boothroyd and Dewhurst 1989) studies can be used, for manual and automatic operations, during modification of existing or design of new assembly systems. For other new systems, it is recommended to use more process-oriented simulation or emulation tools in order to create good-quality input data for DES models. Cycle times can, for instance, be extracted from tools for offline programming of robots, PLC emulation, or code generation for NC machines. However, many times when the system does not yet exist (category C data), no information at all is available and parameter values have to rely on estimates.
Robinson (2004) gives three options to support this kind of guesswork: discussions with subject matter experts, such as machine vendors or in-house production engineers; review of historical data from similar systems in the same, or another, organization; and, finally, standardized data for some processes that has previously been measured and stored in process libraries. Another difficult situation for data gathering is when humans are involved. Humans do not act logically in all cases and are much more unpredictable than other parts of a system. Even though breakdowns of machines and other resources are unpredictable, they still tend to follow a distribution, which can be modelled using random numbers to generate failures. Gathering of data at manual stations also needs to be carefully planned in order to avoid the Hawthorne effect (Landsberger 1958) and to avoid annoying operators, which can jeopardize further cooperation.

4.5 Will All Specified Data Be Found?

It is necessary to check that all parameters will be possible to find with regard to the outcome of the previous activities, e.g. available data, possible gathering methods and required number of samples. Hence, this is not a straightforward yes or no decision; aspects such as enough data points, data accuracy and data quality have to be considered. If mistakes are made in this step, there is a risk of suffering from them later, for example in activity 4.10, since too few data points will give a bad estimate of the probability function, or in activity 4.12, since low quality data could be invalid. If all data will be possible to find, one can proceed with a limited risk of future unnecessary iterations due to problems in the data collection process. On the other hand, if some parameters turn out to be impossible to collect, the accuracy requirements or the relevance of the parameter must be reevaluated.

4.6 Create Data Sheet

A data sheet needs to be established in order to maintain coherence in the data collection process. All raw data, as well as all analyzed data, should reside at the same place,

usually a spreadsheet or, in large projects, a database. Unfortunately, many project teams try to save time by storing raw data in temporary spreadsheets and analyzed data directly in a simulation spreadsheet interface. Usually this approach has the opposite effect due to lack of structure and loss of data, since information stored in the interface runs the risk of being overwritten. Using pre-defined data structures such as CMSD (Lee et al. 2007) is an efficient way to design appropriate data sheets. The CMSD data structure is based on a Unified Modeling Language (UML) scheme, from which an eXtensible Markup Language (XML) instance document can be generated in order to store specific data for a model or a system. Many models can keep their data in the same XML document, if desired.

4.7 Compile Available Data

In this activity, all data in category A is collected or extracted from the sources of available data identified in step 4.3. Category A data can be found as raw data, for instance automatically measured cycle times or time-stamps stating start and stop times of breakdowns. However, category A data can also be previously analyzed and ready to use in simulation models, either as a result of previous projects or because the same data is used by other functions in the organization. Previously analyzed data is ready to await data validation in step 4.11, but raw data requires much more effort. Based on the number of samples for all parameters, decided as a result of the accuracy requirements in step 4.2, a sufficient amount of data points is extracted from the sources, usually databases. Thereafter, additional calculations are often needed to convert the samples into a suitable form. For instance, to obtain TTR information from the breakdown time-stamps exemplified in the paragraph above, the start times need to be subtracted from the stop times. Moreover, a majority of cases require some kind of filtering process, for example to exclude incorrectly measured samples or data points from shifts that do not represent a normal system state. The final result of this activity is sets of raw data points (for instance 230 individual cycle times from an assembly station) ready to analyze in order to prepare a statistical or empirical distribution in step 4.9. Pre-analyzed data usually requires no further preparation and can, thus, be finally reported in the data sheet.

4.8 Gather Not Available Data

This activity includes measurements of previously unavailable production data, but also estimation of performance for future equipment. Hence, it will change data from being category B or C (Robinson and Bhatia 1995) to become category A data. The input to this activity is which parameters to measure (from activity 4.1), how many samples to gather for category B parameters (from activity 4.2) and which gathering methods to use (from activity 4.4). For category B data, the activity might consume quite some time, since data gathering often equals manual work. If the system to be modeled has a high product frequency, gathering might be quicker. However, if cycle times are long, data gathering is surely a time-consuming process, due to the fact that more than 200 samples are often preferable (Perrica et al. 2008). Gathering category C data, on the other hand, is usually less time-consuming if the assumptions are based on information from process experts.
However, if the assumptions are based on historical data from similar processes, gathering of category C data can also be rather time-consuming. The result of this activity is, in conformity with activity 4.7, sets of raw data ready to analyze and prepare for simulation in step 4.9. For category C data, the results are often given in a form that is already suitable for simulation and can be finally reported in the data sheet.

4.9 Prepare Statistical or Empirical Representation

The actual data collection in activities 4.7 and 4.8 results, as stated, either in already pre-analyzed data and/or in sets of raw data, which need a way to be represented in simulation models. For constant data, the analysis part is usually not very arduous, but data describing variability requires more effort. The variability needs to be represented, generally using one of the following four options (Robinson 2004): traces, empirical distributions, bootstrapping, or statistical distributions. Of course, all four options have pros and cons, which should be evaluated before a choice is made. Three of the four options are quite straightforward using basic mathematics, but the fourth alternative, input modeling with statistical distributions, requires more attention. Moreover, the statistical representation is a very popular way to describe variability when possible and, hence, much research adopting this approach is available. Fortunately, there are numerous tools supporting the process of input modeling. ExpertFit and Stat::Fit are two examples, and many of the commercial simulation software packages also include input modeling functionality. For those who have no access to one of these tools, it is necessary to do it the hard way. Leemis (2004) gives a good description of manual input modeling, including the following steps:

- Assess sample independence
- Choose one or more distribution families to evaluate
- Estimate parameters, for example using maximum likelihood estimation (MLE)
- Assess model adequacy using a goodness-of-fit test
- Visualize the model adequacy using P-P or Q-Q plots

The result of this activity is that the data sheet is completed with data representations that are ready to use in the simulation model.
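To make the chain from activity 4.7 through 4.10 concrete, the following is a minimal sketch in Python, assuming a hypothetical breakdown log with one start and one stop time-stamp per failure; the file name, column names, and the choice of a lognormal candidate distribution are illustrative assumptions, not part of the methodology itself.

    import pandas as pd
    from scipy import stats

    # Hypothetical breakdown log (activity 4.7): one row per stop, with start
    # and stop time-stamps. File and column names are assumptions only.
    log = pd.read_csv("breakdown_log.csv", parse_dates=["stop_start", "stop_end"])

    # Transform raw time-stamps into TTR samples (hours) by subtracting the
    # start times from the stop times, then filter out mis-logged samples.
    ttr = (log["stop_end"] - log["stop_start"]).dt.total_seconds() / 3600.0
    ttr = ttr[ttr > 0]

    # Input modeling (activity 4.9): fit a candidate distribution by MLE and
    # assess adequacy with a goodness-of-fit test (cf. activity 4.10).
    shape, loc, scale = stats.lognorm.fit(ttr, floc=0)
    ks_stat, p_value = stats.kstest(ttr, "lognorm", args=(shape, loc, scale))
    print(f"n = {len(ttr)}, KS statistic = {ks_stat:.3f}, p-value = {p_value:.3f}")

A Q-Q plot of the fitted distribution against the samples, in line with the Leemis (2004) steps above, would complement the test in borderline cases.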

4.10 Sufficient Representation?

The decision whether the representations delivered by activity 4.9 are sufficiently adequate is not always easy to make. At best, a chosen statistical distribution can be mathematically justified by passing a goodness-of-fit test, usually at level α = 0.05 (Perrica et al. 2008). However, especially for large numbers of samples, goodness-of-fit tests are very conservative and it is almost impossible for any representation to pass. Hence, it is very important for the simulation engineer to decide the required level of significance according to the accuracy requirements specified for each parameter. For this reason, graphical comparison of the representation and the original data might be preferable in some situations. Later during the simulation project, a sensitivity analysis can be made on representations with weaker correspondence to the real-world data. In this way, critical parameters are identified and additional investigations of data accuracy can be required for these factors. If representations are insufficient according to the accuracy requirements, additional data collection and analysis are needed. Other solutions are to change the representation of variability (see activity 4.9) or, in the worst case, to reconsider the accuracy requirement for a specific parameter and consequently also for the entire simulation model.

4.11 Validate Data Representations

Data validation is an important activity to make sure that all raw data is correctly measured and filtered, and that calculations and analyses during the preparation process are properly performed. The activity is very difficult according to Sargent (2005), who states that there is not much that can be done to ensure that the data is correct. One reason is that data in itself rather often is a part of the validation procedure. Nevertheless, ensuring face validity and sticking to good procedures along the entire data collection process is a good start. Face validity can be achieved by cooperating with process experts during the entire input data management phase and also setting up a final check towards the end, for instance using more structured interviews. Moreover, in addition to face validity there are other methods to validate the data before using it in the simulation model. One example is to evaluate data with regard to production follow-ups; e.g., breakdown data can be compared to previously performed measurements of equipment availability. Sargent (2005) also mentions comparison to other models as a technique to validate entire simulation models. The technique can also be applied in data validation by comparisons to known results or to data in other, previously validated models including similar equipment. Since the model will always be a simplified representation of the real system, it is of great importance to understand what data is crucial for model performance. As in activity 4.2, it is of course more important to make thorough validations of crucial parameters than of parameters of lesser importance. To finally make sure that no mistakes are made in the process of distinguishing between central and non-central parameters, a sensitivity analysis can be performed once the model is built. Finally, it is important to note that the data will be validated once more during model building, since the data will be a part of the model validation later, given that a project methodology such as those described in Law (2007) and Banks et al. (2004) is followed.
Still, a good data validation is a very efficient way to reduce the need for late additional iterations of data collection, since possible mistakes are detected as early as possible. It is also easier to pinpoint the root cause of a failed separate data validation than of a failed complete model validation.

4.12 Validated?

If the data validation succeeds for all parameters during activity 4.11, the representations are ready to use in the simulation model. Due to the previously mentioned difficulties with data validation, the project team should remember that data can still be the problem causing a failed model validation later during the project. However, data validation is a good start that prevents many unnecessary future iterations of data collection. On the other hand, if the data validation fails for one or more of the parameters, there will be a need to step back and identify the cause of the problem. Many times the problems stem from miscalculations in the analysis and preparation activity (4.9), but sometimes further gathering or extraction of raw data cannot be avoided. On rare occasions one might need to go all the way back and reevaluate the chosen gathering methods.

4.13 Finish Final Documentation

Documentation is a continuous process throughout the entire input data management phase, starting already in the first activity where parameters are identified and defined. Much of the information to document should already be available in the data sheet, including selected parameters, raw data, and the finally chosen simulation representations. However, there are often things of importance for future referencing and reuse which are not in the data sheet. For example, the sources of data, the gathering methods, the validation results, and all assumptions made during the input data process are all of great importance for maintaining future data validity. The final result of this activity is a data report and the completed data sheet. Both of them go into the final documentation of the entire simulation project.
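One of the validation ideas in activity 4.11, comparing breakdown data to production follow-ups, can be sketched as below. The mean TTR and TTF values, the follow-up availability figure, and the 5-percentage-point tolerance are all invented for illustration; they are not recommendations from the methodology.

    # Cross-check fitted breakdown representations (activity 4.11) against a
    # production follow-up figure. All numbers are illustrative assumptions.
    mean_ttr = 3.5   # assumed mean Time To Repair in hours (e.g. a sample mean)
    mean_ttf = 42.0  # assumed mean Time To Failure in hours

    implied_availability = mean_ttf / (mean_ttf + mean_ttr)
    follow_up_availability = 0.92  # from maintenance records (assumed)

    if abs(implied_availability - follow_up_availability) > 0.05:
        print("Large deviation - re-check raw data, filtering and calculations")

A large deviation does not prove the data wrong, but it pinpoints where the root cause analysis described above should start.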

5 CONCLUSION

The purpose of this study is to present a structured methodology for the input data management process in DES projects. The intention is to cover all aspects of the process, including identification, collection, and preparation of input data for simulation models. As a result, this paper proposes a structured methodology including 13 activities and their internal connections. During a review of 15 previously performed simulation projects within industry, a lack of a clear mode of operation for handling input data was identified. Moreover, the results show that a more structured way of working holds significant potential to increase both rapidity and quality in the input data phase of DES projects.

Similar methodologies to the one presented in this study already exist for other parts of DES projects. For instance, Sargent (2005) outlines a set of activities for verification and validation of simulation models. Furthermore, even more well-known methodologies are available on a macro level, describing efficient ways to perform entire simulation projects; see for example Banks et al. (2004) and Law (2007). Simulation practitioners seem to find these kinds of methodologies very helpful in their daily work, especially those who have not previously been involved in an extensive number of DES projects. However, it is important to highlight that there are previous contributions explaining detailed methods for collection and analysis of simulation data. For instance, Leemis (2004) and Law (2007) describe the process of input modeling, mainly from a statistical perspective. Moreover, Perera and Liyanage (2000) present a methodology for rapid identification of input parameters. Hence, our work does not intend to make any contributions on this more detailed level. Instead, we focus on linking all activities within input data management in an efficient way.

Some of the projects evaluated in this study were performed in companies with limited experience of DES. Additionally, many of the project members do not work with simulation on a daily basis as their only work assignment. The authors believe that the benefit of using the methodology is largest in such circumstances, since more experienced organizations and simulation engineers continuously discover and document efficient working procedures in an iterative manner. Still, there is always a risk of following an old route and, thus, the proposed methodology can be of value for these organizations as well.

6 FUTURE RESEARCH

For future work, we will validate the proposed methodology and evaluate its impact on data quality and rapidity in input data management. Skoogh and Johansson (2007) have measured the total time-consumption in the input data phase of DES projects that did not follow any structured way of working during their data collection and preparation. We will introduce the proposed methodology in several simulation projects starting in upcoming years and measure the time-consumption. Consequently, the impact of the new methodology will be quantified. In parallel to the evaluation of the proposed methodology, our research group is also working on the development of a generic data management tool (the GDM-Tool). This work focuses on improving efficiency in companies that have advanced far into the implementation of well designed computer applications for logging and storage of production data.
The tool is configurable to both standardized and custom-made data sources and automates many of the time-consuming activities discussed in this paper, for instance data extraction and statistical analysis.

ACKNOWLEDGMENTS

The funding for this research is granted by VINNOVA (Swedish Governmental Agency for Innovation Systems, which integrates research and development in technology, transport and working life) and SSF (Swedish Foundation for Strategic Research). The authors would also like to thank all members of the DES projects included in the study. They shared a significant amount of time by taking part in the interviews. Edward Williams (University of Michigan and PMC, Dearborn, Michigan, U.S.A.) has kindly provided valuable suggestions for enhancing the presentation of this paper.

REFERENCES

Banks, J., J. S. Carson, B. L. Nelson, and D. M. Nicol. 2004. Discrete-Event System Simulation. 4th ed. Upper Saddle River, New Jersey: Prentice-Hall.
Boothroyd, G., and P. Dewhurst. 1989. Product Design for Assembly. New York: McGraw-Hill.
Denscombe, M. The Good Research Guide: For Small-Scale Social Research Projects. Buckingham: Open University Press.
Ericsson, U. Diffusion of Discrete Event Simulation in Swedish Industry. Doctoral dissertation, Department of Materials and Manufacturing Technology, Chalmers University of Technology, Gothenburg, Sweden.
Hatami, S. 1990. Data requirements for analysis of manufacturing systems using computer simulation. In Proceedings of the 1990 Winter Simulation Conference, ed. O. Balci, R. P. Sadowski, and R. E. Nance. New Orleans, Louisiana.
Ho, C-F., W-H. Wu, and Y-M. Tai. 2004. Strategies for the adaptation of ERP systems. Industrial Management & Data Systems 104.

Hollocks, B. W. 2001. Discrete-event simulation: an inquiry into user practice. Simulation Practice and Theory 8.
Johansson, B., J. Johnsson, and A. Kinnander. 2003. Information structure to support discrete event simulation in manufacturing systems. In Proceedings of the 2003 Winter Simulation Conference, ed. S. Chick, P. J. Sánchez, D. Ferrin, and D. J. Morrice. New Orleans, Louisiana.
Johansson, B., and A. Kinnander. 2004. Produktivitetsförbättring av manuella monteringsoperationer (Productivity improvement of manual assembly operations; in Swedish). Chalmers University of Technology, internal report 004:25.
Landsberger, H. A. 1958. Hawthorne Revisited. Ithaca: Cornell University Press.
Law, A. M. 2007. Simulation Modeling & Analysis. 4th ed. New York: McGraw-Hill.
Lee, Y. T., S. Leong, F. Riddick, M. Johansson, and B. Johansson. 2007. A Pilot Implementation of the Core Manufacturing Simulation Data Information Model. In Proceedings of the Simulation Interoperability Standards Organization 2007 Fall Simulation Interoperability Workshop. Orlando, Florida: Simulation Interoperability Standards Organization, Inc.
Leemis, L. 2004. Building credible input models. In Proceedings of the 2004 Winter Simulation Conference, ed. R. G. Ingalls, M. D. Rossetti, J. S. Smith, and B. A. Peters. Washington, D.C.
Lehtonen, J-M., and U. Seppälä. 1997. A methodology for data gathering and analysis in a logistics simulation project. Integrated Manufacturing Systems 8.
MTM Association. 1973. Methods-Time Measurement. MTM Association for Standards and Research, Fairlawn, New Jersey.
Pegden, C. D., R. E. Shannon, and R. P. Sadowski. 1995. Introduction to Simulation Using SIMAN. 2nd ed. New York: McGraw-Hill.
Perera, T., and K. Liyanage. 2000. Methodology for rapid identification of input data in the simulation of manufacturing systems. Simulation Practice and Theory 7.
Perrica, G., C. Fantuzzi, A. Grassi, G. Goldoni, and F. Raimondi. 2008. Time to Failure and Time to Repair Profiles Identification. In Proceedings of the 5th FOODSIM Conference. Dublin, Ireland.
Pidd, M. 1988. Computer Simulation in Management Science. 2nd ed. Chichester: John Wiley & Sons.
Pidd, M. 2003. Tools for Thinking: Modelling in Management Science. 2nd ed. Chichester: John Wiley & Sons.
Randell, L. G., and G. S. Bolmsjö. 2001. Database driven factory simulation: a proof-of-concept demonstrator. In Proceedings of the 2001 Winter Simulation Conference, ed. B. A. Peters, J. S. Smith, D. J. Medeiros, and M. W. Rohrer. Arlington, Virginia.
Robertson, N., and T. Perera. 2002. Automated data collection for simulation? Simulation Practice and Theory 9.
Robinson, S. 2004. Simulation: The Practice of Model Development and Use. Chichester: John Wiley & Sons.
Robinson, S., and V. Bhatia. 1995. Secrets of successful simulation projects. In Proceedings of the 1995 Winter Simulation Conference, ed. C. Alexopoulos, K. Kang, W. R. Lilegdon, and D. Goldsman. Arlington, Virginia.
Sargent, R. G. 2005. Verification and validation of simulation models. In Proceedings of the 2005 Winter Simulation Conference, ed. M. E. Kuhl, N. M. Steiger, F. B. Armstrong, and J. A. Joines. Orlando, Florida.
Skoogh, A., and B. Johansson. 2007. Time-consumption analysis of input data activities in discrete event simulation projects. In Proceedings of the 2007 Swedish Production Symposium. Gothenburg, Sweden.
Trybula, W. 1994. Building simulation models without data. In 1994 IEEE International Conference on Systems, Man, and Cybernetics: Humans, Information and Technology, vol. 1. IEEE.
Williams, E. J. 1996. Making Simulation a Corporate Norm. In Proceedings of the 1996 Summer Computer Simulation Conference, ed. V. W. Ingalls, J. Cynamon, and A. V.
Saylor. New Orleans, Louisiana.

AUTHOR BIOGRAPHIES

ANDERS SKOOGH is a PhD student in the field of Discrete Event Simulation at the Department of Product and Production Development, Chalmers University of Technology, Sweden. In 2005 he obtained his M.Sc. degree in Automation and Mechatronics from the same university. Anders has industrial experience of Discrete Event Simulation from his former employment as logistics developer at Volvo Car Corporation. His e-mail address is Anders.Skoogh@chalmers.se.

BJÖRN JOHANSSON is an assistant professor at the Department of Product and Production Development, Chalmers University of Technology, currently also a guest researcher at the National Institute of Standards and Technology in Gaithersburg, Maryland, USA. His research interest is in the area of discrete event simulation for manufacturing industries. Modular modeling methodologies, environmental effects modeling, software development, user interfaces, and input data architectures are examples of interests. His e-mail address is Bjorn.Johansson@chalmers.se.


Publication II

Skoogh, A., and B. Johansson. 2009. Mapping of Time-Consumption During Input Data Management Activities. Simulation News Europe 19(2):39-46.


Mapping of Time-Consumption During Input Data Management Activities

Anders Skoogh, Björn Johansson, Chalmers University of Technology, Sweden
{Anders.Skoogh, Bjorn.Johansson}@chalmers.se

The success of a discrete event simulation project relies heavily on input data quality. In order to achieve high quality data, a significant amount of time needs to be spent, either due to absence of data or problems with defining and extracting existing data from databases. This paper presents a distribution of the time-consumption for the activities in the input data phase during discrete event simulation projects. The results show where efforts need to be focused to reduce time-consumption and improve quality of input data management.

Introduction

The competition between companies in all markets has increased considerably during the recent decades, and it is getting more and more important to optimise the efficiency in production [1]. To improve productivity, some organisations use analysis tools like Discrete Event Simulation (DES) in major change projects as well as for continuous improvements. However, the input data needed to analyse the production is often not available, or at least, it takes plenty of time to collect and prepare the data for further analysis.

DES is a powerful tool for productivity analysis, and it is argued that input data management is the most crucial and time-consuming step in DES projects [2][3]. The time spent on input data management is typically as much as 10-40% of the total time of a DES project [4]. This set-back sometimes tempts organisations to choose less complex analyses with lighter requirements on input data quality. As a result, these analyses yield results of poor, or at least inferior, quality. Few previous studies have closely mapped the input data phase in order to find the reasons for the heavy time-consumption [5]. Even fewer studies focus on identifying the input data activities which are most favourable to improve.

The aim of this work is to identify the most time-consuming activities in the input data phase of DES projects. The results will show where to put important efforts in future research in order to reduce time-consumption and increase quality of input data management, not only in simulation projects but also in projects using other production analysis methods.

1 Input Data Management in Discrete Event Simulation

One always-present step in DES projects is the input data phase, usually called "Data Collection"; see for example the widely applied methodologies described in Banks et al. [3], Law and Kelton [6], and Rabe et al. [7] (Figure 1). These methodologies merely show the input data management step as a black box. However, in practice input data management includes several activities, such as collection of raw data from various sources, transformation of data to information, and documentation. Here, data is referred to as a set of discrete, objective facts about events [8] (e.g. repair times for a machine). Information, on the other hand, is slightly simplified defined as data with meaning [9]. In this case, information can be exemplified by a statistical representation of Mean Time To Repair (MTTR), which contains both relevance and purpose for the receiver (the simulation model). In this paper, the input data phase is described in more detail than on the black box level.
We have divided the internal time-consumption within the input data phase into separate activities and measured the time-consumption for each activity.

The focus on input data is surprisingly low in previous scientific contributions within the field of DES. Perera and Liyanage [5] is one of few contributions that really address the difficulties related to input data management in DES projects. They rank the major pitfalls in input data collection as follows:

1. Poor data availability
2. High level of model details

3. Difficulty in identifying available data sources
4. Complexity of the system under investigation
5. Lack of clear objectives
6. Limited facilities in simulation software to organise and manipulate input data
7. Wrong problem definitions

There is also a lack of publications on systematic guidelines to overcome these issues and to reduce time-consumption in input data management (one is Bernhard and Wenzel [10]). Instead, earlier research performed on reduction of time-consumption in input data management has primarily focused on the level of human involvement in the process. A study made by Robertson and Perera [2] describes four alternative solutions for managing data for DES models (solution 3 is sketched in code after this list):

1. Tailor-made solution
   - Data primarily derived from the project team
   - Data manually supplied to the model by the model builder
   - Data resides in the simulation tool
2. Spreadsheet solution
   - Data primarily derived from the project team
   - Data manually supplied to the computer application (e.g. MS Excel spreadsheet)
   - Data automatically read by the model via a computer application
   - Data resides in the computer application
3. Off-line database solution
   - Data primarily derived from a Corporate Business System (CBS)
   - Data automatically supplied to an off-line database from the CBS
   - Data automatically read by the model
   - Data resides in an intermediary simulation database
4. On-line database solution
   - Data primarily derived from the CBS
   - Data automatically supplied to the model from the CBS
   - Data resides in the CBS

Figure 1. Steps in a simulation study [7].

The same publication states that solutions 1 and 2 were most frequently used in industry, which is most likely still a valid statement. However, some research work and industrial applications have strived towards less human involvement in the input data management process. For example, some years ago the tendency shifted towards integration of systems, in which DES is one component that shares data and information with many other applications within the same package. DELMIA from Dassault Systèmes [11] and SIEMENS Teamcenter [12] are two examples of such Product Lifecycle Management (PLM) software packages. Moreover, simulations driven by an off-line simulation database using input data from Enterprise Resource Planning (ERP) systems have also been performed [13]. This is one example of the contributions towards solutions 3 and 4, described above. However, the situation remains; Robertson and Perera [2] state that "it is strongly argued that data collection is the most crucial and time consuming stage in the model building process". Therefore, this paper evaluates if this statement is still valid and shows where future efforts should be concentrated. This is done by summarising the time-consumption within DES projects in general, in the input data phase in specific, and, even more important, in the activities of the input data phase.
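As a rough illustration of solution 3 above, the Python snippet below copies cycle times from a mirror of the CBS into an intermediary simulation database that a model could then read automatically. SQLite and all file, table, and column names are assumptions chosen to keep the example self-contained; a real installation would use the company's actual database interfaces.

    import sqlite3

    # Illustrative off-line database solution: extract from a CBS mirror and
    # load into an intermediary simulation database. All names are assumed.
    cbs = sqlite3.connect("cbs_mirror.db")
    simdb = sqlite3.connect("simulation_input.db")
    simdb.execute(
        "CREATE TABLE IF NOT EXISTS cycle_times "
        "(machine_id TEXT PRIMARY KEY, cycle_time_s REAL)"
    )
    rows = cbs.execute("SELECT machine_id, cycle_time_s FROM operations").fetchall()
    simdb.executemany(
        "INSERT OR REPLACE INTO cycle_times (machine_id, cycle_time_s) VALUES (?, ?)",
        rows,
    )
    simdb.commit()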

2 Material and Methods

The study embraces the analysis of 15 DES projects performed from 2000 onwards. The projects have been performed in a wide range of companies with regard to line of business, size of organisation, and previous experience in DES. The plants in which the projects were performed are all located in the Nordic countries, mainly in Sweden. Both pure industrial cases and simulation projects performed in cooperation between industry and academia are included among the 15 projects.

Semi-structured interviews [14] were conducted with members from each project in order to define the work procedure and activities in the input data phase of the projects. The agenda of the interviews was focused on the kinds of problems, related to input data, which arose during the project. Furthermore, an additional aim of the interviews was to identify key factors for rapid and precise input data management from a practitioner's viewpoint. The respondents were also asked to fill in a questionnaire where the time-consumption for the whole project, as well as for each specific activity in the input data phase, was specified. Moreover, information about availability and sources of input data in the projects was gathered in order to detect reasons for extensive time-consumption as well as factors for successful input data management.

All times reported in the study are given in the unit man-days. One man-day equals one eight-hour working day for one person. For example, if two persons have spent two days carrying out a task together, the amount of time reported to this study is four man-days. The respondents were asked to state the time with a resolution of at least one man-day, but if they were able to recall in greater detail they were allowed to answer in fractions of man-days.

The authors compiled all collected information in a data sheet and analysed it in order to map the time-consumption for all activities and to find patterns in prerequisites and work procedures which can reduce time-consumption in data management. The findings from the questionnaires were then combined with the information from the interviews. The results are presented in Section 4.

3 Input Data Management Activities

In the presented analysis of time-consumption, the input data process in DES projects is divided into nine separate activities. Each activity consists of several tasks. The number of tasks and the way to execute each task can differ slightly between simulation projects because of differences in prerequisites and objectives. However, the work procedures are structurally very similar among simulation input data phases, and the activities defined below cover the process of all studied projects. Below, each input data management activity is briefly described to enable measurements of the time-consumption. A more thorough description, including supportive guidelines, is provided in Skoogh and Johansson [15].

3.1 Identification of Input Data Parameters

The identification of required input data parameters has earlier been addressed as one of the key activities for successful input data management. The process is often performed in cooperation with people having expert knowledge of the modelled manufacturing process. The parameters to include are often dependent on project objectives, on model complexity, and on the level of model detail.
Therefore, there is an ongoing interaction between construction of the conceptual model and identification of input data parameters [3].

3.2 Accuracy Requirement Specification

It is of great advantage if the project team can forecast each input parameter's impact on model behaviour. If accuracy requirements can be specified for each factor, the effort spent on information collection can be optimised. Accordingly, more resources and time can be assigned to important parameters instead of less central ones. As a result of this activity, the required number of unique data points for each parameter is decided.

3.3 Mapping of Available Data

Once the relevant parameters are selected, the project team needs to search for and map the input data already available, without need for manual gathering in the real-world production system. Such available data can generally be found in simple manual systems (e.g. spreadsheets with previously performed time studies) or in more complex computer-based systems such as ERP systems, Manufacturing Execution Systems (MES), or other databases holding process information (e.g. time-stamps logged by Programmable Logic Controllers (PLCs)). However, it is hazardous to instantly rely on the applicability of information from these kinds of systems,

without further investigations. Despite database specifications and people with extensive practical experience saying that data is available, simulation engineers frequently find the data in a crude form or measured in a manner that makes it useless for simulation. Consequently, the activity of mapping available data includes identifying sources, understanding the sources, and making sure that it will be possible to extract the required data from the systems.

3.4 Choice of Gathering Methods

When the available data has been mapped, a gap between required data and available data will be detected in most simulation projects. Hence, some additions will be necessary. In this activity the project team decides which methods to use in order to gather missing data from the modelled system. The choice will mainly depend on the possibilities to measure and on the expected accuracy of each parameter according to earlier specifications (Section 3.2). Examples of gathering methods are time studies, frequency studies and interviews.

3.5 Document Creation

In order to store all data that will be collected from available sources or from real-world measurements, a document needs to be created. A well-designed document helps to structure the data collection procedure. It also gives greater possibilities to reuse data in future studies and to make small adjustments if errors occur, or if the modelled system changes during the project time.

3.6 Data Collection

The data collection activity can be divided into two parts. One is the extraction and compilation of available data from the identified sources. The other is to gather the missing data according to the previously specified methods (Section 3.4).

Extract and Compile Available Data. Despite the availability of data, some efforts are almost always needed to extract relevant information from the data sources. As mentioned before, more complex databases often contain data in forms that require some transformation before it can be used for further analysis in a simulation project. One example is breakdown data that is often logged in a crude form where start and stop times of all stops are stored. In this case, efforts are needed both to sort out the stops of interest for the analysis and to calculate the absolute length of breakdowns.

Gather Missing Data. Many times this activity is fairly straightforward, since the procedure is well outlined in the previously presented activity (Section 3.4). However, depending on the chosen methods, the type of modelled process, and the requirements on accuracy, it can be a time-consuming activity.

3.7 Data Analysis and Preparation

The outcome of the data collection activity is often a large set of data points, e.g. 100 measured cycle times or 2000 repair times extracted from a maintenance database. In the data analysis and preparation activity, the way to represent the data in the simulation model is selected. Regardless of whether an empirical or statistical representation is chosen, some preparations are performed in this activity. For example, the statistical representation requires fitting the data set to a statistical distribution.

3.8 Data Validation

Before the data is used in the simulation model, a separate data validation activity helps to ensure accuracy in further analysis. An early control of the correctness of the data representations usually saves iterations in later model validation, where more sources of error are involved.
The data representation can be validated using production follow-ups or expert knowledge, e.g. Turing tests [16].

3.9 Final Documentation

It is important to document the results of the input data phase, since they are of vital importance for the model outputs and, furthermore, for the decisions taken with reference to the analysis. The final documentation is also necessary in order to make future simulation projects less time-consuming by enabling reuse of input data.

4 Survey and Interview Results

The results and analysis section is divided into two parts. The first part presents the analysis of time-consumption for input data activities and the second part shows the data availability in the studied DES projects.

4.1 Analysis of Time-Consumption

The respondents were asked to assess the time spent in each of the activities during the input data phase of the 15 DES projects included in the study. The percentages of time in each activity with regard to the duration of the entire input data phases are presented

in Table 1. Around half of the input data management time is used for actual data collection, both from available sources and from manual gathering. Mapping of available data, together with data analysis and preparation, are the other two activities in the top-three ranking of time-consuming activities.

Table 1. Time-consumption for each input data activity with regard to the entire input data phase.

Project   Ident.  Accur.  Mapping  Methods  Doc.  Collect  Analysis  Valid.  Final
# 1        12%     2%      2%       2%      0%    60%      12%        4%      5%
# 2         3%     0%      7%       7%      1%    51%       7%       22%      0%
# 3         5%     2%     12%       1%      2%    63%       1%        6%      6%
# 4         5%     2%      4%       5%      5%    61%       7%        5%      5%
# 5         3%     3%     12%       1%      6%    57%      12%        0%      6%
# 6         3%     0%     15%       3%      5%    58%       8%        5%      5%
# 7         1%     4%      2%       2%      1%    40%      25%       12%     12%
# 8         9%     0%      9%       4%      9%    52%       9%        4%      4%
# 9         5%     0%      9%       5%      5%    45%      23%        5%      5%
# 10        4%     4%      9%       4%      7%    50%       7%        9%      7%
# 11       33%    11%     11%       7%      0%    24%       2%       11%      2%
# 12       14%     7%     14%      11%      7%    21%      14%        4%      7%
# 13        5%     0%     10%      10%      5%    50%      10%        5%      5%
# 14        5%     3%     13%       5%      8%    56%       3%        5%      3%
# 15       10%     0%     21%       0%      0%    62%       8%        0%      0%
Average     8%     2%     10%       4%      4%    50%      10%        7%      5%

(Column abbreviations: Ident. = input data parameter identification; Accur. = accuracy requirement specification; Mapping = mapping of available data; Methods = choice of gathering methods; Doc. = document creation; Collect = data collection; Analysis = data analysis and preparation; Valid. = data validation; Final = final documentation.)

It is not surprising that the collection activity claims a significant amount of time. Some more detailed findings about the most time-consuming parameters, and how data availability influences time-consumption, will be further examined later in this section. However, the fact that mapping of available data is a top-three ranked activity is more conspicuous. Information from many of the respondents is very similar and claims that the major reasons are the complexity of the data sources and that the available data is not collected and stored in a way that is ready for use in simulation models. Hence, a lot of time is needed to understand the data sources and to ensure that the data is relevant in the specific case. Ensuring that it will be possible to extract and transform required data into a suitable representation for the simulation model also adds to the extensive time-consumption.

Table 2 shows the ranking among input data parameter classes with regard to required collection time. Process times, breakdown data, set-up times, tool changes and material handling data are all straightforward parameter classes, but production planning and organisational data contain some sub-types. Information needed for production planning incorporates data such as production schedules, arrival patterns of incoming parts, and sales data. Organisational information contains data about staffing plans, shift schedules and breaks. Note that the sum of the time-consumptions for all parameter classes is not equal to 100%, since all classes are not applicable in every studied project.

Table 2. Required time efforts for collection of input parameter classes.

Parameter class             Time-consumption (percentage of the entire input data phase)
Process times               42%
Breakdown data              32%
Production planning data    19%
Material handling data      14%
Set-up times                12%
Tool-change times            8%
Organizational data          7%

Interview responses indicate that the heavy time-consumption for process data depends on problems with defining the process delimitations, e.g. when a cycle starts and stops. For breakdown data, the corresponding problem is to sort out the stops of interest for the simulation study among all other kinds of logged process disturbances in the IT systems. Both process data and breakdown data often include large amounts of data, since they are considered to be particularly important for model performance and dynamics.

4.2 Data Sources and Availability of Information in DES Projects

The availability of data necessary for production analysis is not satisfying in most of the studied DES projects. Only one of the 15 cases had all data available when the project started, and combined with a study performed by Johansson et al. [17], it is obvious that insufficient work has been performed in order to support analyses with proper input data.

Two projects out of the 15 had no data at all to start with, and had to gather all data manually. Table 3 shows the data availability for each input parameter class, presented as the percentage of projects having all, none or parts of the required data available.

Table 3. Percentage of studied projects having all, none or parts of the needed input data available.

Parameter class             All data available   No data available   Combination of available and manually gathered data
Process times               33%                  27%                 40%
Breakdown data              64%                   9%                 27%
Production planning data    18%                  55%                 27%
Material handling data       0%                  62.5%               37.5%
Set-up times                22%                  44%                 34%
Tool-change times           20%                  80%                  0%
Organizational data         40%                  40%                 20%

As seen in Table 3, breakdown data is the category that is most frequently collected and stored, followed by organisational data and process times. Contrarily, material handling data was not fully available in any of the projects. It is important to note that it is not relevant to directly compare time-consumption for different parameter classes, since the amount of raw data and the importance for model performance vary significantly among the classes. Therefore, one should not draw the conclusion that data availability is insignificant for the time-consumption because breakdown data collection takes more time than gathering data for material handling equipment (Table 2), despite the fact that breakdown data has higher availability. The study results rather show that a large share of available data has a positive correlation with rapidity of input data collection. One single example is that the only project having all needed data available is also the project with the lowest percentage (12%) of time spent in the input data phase. Investigating the actual time for collection of input data in projects with full data availability, compared to projects that fully or partly include manual gathering, supports the same conclusion. To illustrate, the mean time required for collection of process times was less than one week when data was fully available and slightly more than three weeks when manual gathering was needed.

5 Discussion

The survey and the interview results clearly show the difficulty for companies to effectively manage their data for use in production analysis tools like DES. It is obvious that no evident progress has been made to reduce the time-consumption for input data management in recent years. For instance, this study shows that the time-consumption for input data management in DES projects is still 31% on average, which is a high percentage compared to older studies. This opinion is also supported by the fact that only 7% of the studied projects had all required input data available when the project started. This is almost the same availability ratio as Johansson et al. [17] found six years ago (6%).

Two of the top-three time-consuming input data activities both shed light on the same difficulty in input data management at present. Both the problems with actual data collection (50%) and mapping of available data (10%) indicate a potential for reduced time-consumption by implementation of intelligently designed computer-based data sources. According to the findings presented in the results and analysis section, companies can gain a lot of time in production systems analysis by keeping track of data describing their processes. This in turn enables DES to be used more frequently; hence, increased performance in production is achieved. There are several ways of continuously having up-to-date information available; some examples are automated PLC logging or previously performed time studies stored in databases.

However, it is very important to note that the design of the majority of existing databases has not been developed with the needs of analysis tools like DES in mind. No less than 10 out of the 13 projects in the study having some available data at hand reported problems with extracting relevant information from the databases, due to problems with understanding the data structures, mapping relevant data for their specific application, and sorting out the information needed among an often huge set of data. These findings are also supported by earlier research performed by Perera and Liyanage [5]. Moreover, companies often overestimate their ability to provide data for analysis tools like DES, which might be a result of the extensive information flow in present production systems. However, when the projects start, they frequently lack important data or find that data is measured and stored in a way that is unsuitable for simulation models. Consequently, a lot of time needs to be spent on identification of relevant information and on recalculations or complementary measurements. This common statement of respondents

has resulted in problems with keeping the time plans for input data management in the studied projects. Only 20% of the projects reported that their input data phases were completed on time.

The requirements stated above are not just based on a DES perspective but also on the viewpoint of other production analysis methods. Companies could gain much productivity by keeping track of their production data more carefully. One way is to design future data systems with the viewpoints of production analysts in mind. However, the systems' present purposes, e.g. maintenance and process control, are also important to support.

There are some factors in the study that might affect the precision of each individual case study result. Since the exact number of hours was not documented in all cases, the reported time-consumption is dependent on each respondent's perception and memory. However, the possible impact of this factor is reduced by the choice of recently performed projects; for example, 13 of the 15 projects were performed within two years from when the questionnaires were completed. Moreover, it is important to remember that the purpose of the study is to identify time-consuming activities and serve as a guideline for future research, rather than to present the exact number of hours needed to carry out the activities. To increase the precision of the study, some more samples would have been favourable to add. Another factor that has been hard to determine in every specific project is the input data precision and quality. Consequently, it is hazardous to exclude the quality dimension's influence on time-consumption from the survey results. However, all projects managed to validate their models against the real-world system, which indicates that the data quality was satisfying in all cases. Many of the projects (73%) also validated the input data separately against production follow-ups or process expert knowledge.

6 Conclusions

To summarise the findings from this study, some results deserve to be highlighted:

- The work to increase the support of input data to production analysis has not yet resulted in successful implementations in industry. The time needed for input data management in DES projects is still around 31% of the total project duration. Moreover, the percentage of companies having all data available for DES projects is as low as 7%.
- The three most time-consuming input data activities are data collection, mapping of available data, and data analysis and preparation, respectively.
- One major reason for the heavy time-consumption is the need for manual gathering due to insufficient data availability. Another reason is the complex design of many computer-based data systems, which slows down the identification of available data as well as the extraction of information from the systems.

There is also a newly published paper related to this contribution [15], which proposes a methodology for increased efficiency in input data management. It aims to improve the present working procedures (mapped above) by describing good-practice guidelines for each activity.

Acknowledgements

The funding for this research is granted by VINNOVA (Swedish Governmental Agency for Innovation Systems) and SSF (Swedish Foundation for Strategic Research). The authors would also like to thank all members of the DES projects included in the study.
They shared a significant amount of time by taking part in the interviews. Additionally, Cecilia Berlin (PPU, Chalmers) has provided valuable suggestions for enhancing the presentation of this paper.

References

[1] H. Driva, K.S. Pawar, U. Menon: Measuring product development performance in manufacturing organisations. International Journal of Production Economics, vol. 63, 2000.
[2] N.H. Robertson, T. Perera: Feasibility for automatic data collection. In: B.A. Peters, J.S. Smith, D.J. Medeiros, M.W. Rohrer (eds.): Proceedings of the 2001 Winter Simulation Conference (Washington, DC), IEEE, 2001.
[3] J. Banks, J.S. Carson, B.L. Nelson: Discrete-Event System Simulation (2nd ed.), Prentice-Hall.
[4] W. Trybula: Building simulation models without data. IEEE International Conference on Systems, Man, and Cybernetics. Humans, Information and Technology, vol. 1, 1994.
[5] T. Perera, K. Liyanage: Methodology for rapid identification and collection of input data in the simulation of manufacturing systems. Simulation

Practice and Theory, vol. 7, 2000.
[6] A.M. Law, W.D. Kelton: Simulation Modelling and Analysis (3rd ed.), McGraw-Hill.
[7] M. Rabe, S. Spieckermann, S. Wenzel: A New Procedure Model for Verification and Validation in Production and Logistics Simulation. In: S.J. Mason, R.R. Hill, L. Mönch, O. Rose, T. Jefferson, J.W. Fowler (eds.): Proceedings of the 2008 Winter Simulation Conference (Miami, Florida), IEEE, 2008.
[8] T.H. Davenport, L. Prusak: Working Knowledge: How Organisations Manage What They Know, Harvard Business School Press.
[9] R. Van der Spek, A. Spijkervet: Knowledge Management: Dealing Intelligently with Knowledge. In: J. Liebowitz, L. Wilcox (eds.): Knowledge Management and its Integrative Elements, CRC Press.
[10] J. Bernhard, S. Wenzel: Information Acquisition for Model Based Analysis of Large Logistics Networks. In: Y. Merkuryev, R. Zobel, E. Kerckhoffs (eds.): Proceedings of the 19th European Conference on Modelling and Simulation (Riga, Latvia), ECMS, 2005.
[11] Dassault Systèmes: DELMIA. /products/delmia/
[12] SIEMENS: Teamcenter.
[13] L.G. Randell, G.S. Bolmsjö: Database Driven Factory Simulation: a Proof-of-concept Demonstrator. In: B.A. Peters, J.S. Smith, D.J. Medeiros, M.W. Rohrer (eds.): Proceedings of the 2001 Winter Simulation Conference (Washington, DC), IEEE, 2001.
[14] M. Denscombe: The Good Research Guide for Small-Scale Social Research Projects, Open University Press.
[15] A. Skoogh, B. Johansson: A Methodology for Input Data Management in Discrete Event Simulation Projects. In: S.J. Mason, R. Hill, L. Moench, O. Rose (eds.): Proceedings of the 2008 Winter Simulation Conference (Miami, Florida), IEEE, 2008.
[16] L.W. Schruben: Establishing the Credibility of Simulation. Simulation, vol. 34, 1980.
[17] B. Johansson, J. Johnsson, A. Kinnander: Information structure to support Discrete Event Simulation projects. In: S. Chick, P.J. Sánchez, D. Ferrin, D.J. Morrice (eds.): Proceedings of the 2003 Winter Simulation Conference (New Orleans, Louisiana), IEEE, 2003.

Corresponding author: Anders Skoogh
Chalmers University of Technology, Dept. of Product and Production Development
Hörsalsvägen 7A, Gothenburg, Sweden
anders.skoogh@chalmers.se

Publication III

Skoogh, A., T. Perera, and B. Johansson. Input Data Management for Simulation - Industrial Practices and Future Trends. Simulation Modelling Practice and Theory. (Submitted for publication.)


INPUT DATA MANAGEMENT IN SIMULATION - INDUSTRIAL PRACTICES AND FUTURE TRENDS

A. Skoogh a, T. Perera b, and B. Johansson a
a Chalmers University of Technology, Gothenburg, Sweden
b Sheffield Hallam University, Sheffield, UK

Corresponding author: Anders Skoogh, Department of Product and Production Development, Chalmers University of Technology, Gothenburg, Sweden. Tel.: +46 (0) , anders.skoogh@chalmers.se.

ABSTRACT

Discrete Event Simulation has been acknowledged as a strategically important tool in the development and optimization of production systems. However, it appears that companies are failing to reap the full benefits of this powerful technology, as the maintenance of simulation models has become very time-consuming, particularly due to the vast amounts of data to be handled. Hence, an increased level of automation of input data handling is highly desirable. This paper presents the current practices relating to input data management and identifies the further research and development required to achieve high levels of automation. A survey of simulation users shows that there has been progress in the use of automated solutions compared to a similar study presented by Robertson and Perera in 2002. The results, however, reveal that around 80% of the users still rely on highly manual work procedures during the input data management stage.

Keywords: Simulation Data, Input Data Management, Data Collection, Integration, Interface, Enterprise Resource Planning (ERP).

1. INTRODUCTION

Discrete Event Simulation, referred to as simulation hereafter, has proven to be the best modelling tool available for analyzing and improving the performance of manufacturing systems. Over 60 years of simulation presence in manufacturing has led to a wide spectrum of successful applications in different areas such as design, planning and control, strategy making, resource allocation, and training [5]. The proliferation of affordable and user-friendly simulation systems has immensely contributed to this rapid growth of applications. Ever-increasing competitiveness and the need to reduce costs and lead times are continuing to drive the wider use of simulation.

Producing credible simulation outputs within acceptable timescales is a key challenge. In order to ensure that all key steps are followed, various simulation project management frameworks have been produced [2][8][12]. Although there are some slight variations, all simulation project management frameworks embody key steps such as input data collection/analysis, model building, validation, verification etc. Most of these steps interact with either input or output data. Consequently, management of data within simulation projects often becomes a major challenge. Multiple scenario analysis, a typical use of simulation models, further escalates this problem as further data sets are added.

Within the context of this data management problem, collating, analysing and systematically recording simulation input data are vitally important. As the driver of simulation models, input data sets must be complete and accurate. If simulation models are to be re-used, it is also necessary to keep the data sets up-to-date. This is a time-consuming process and, consequently, the re-use of simulation models is often abandoned [15]. Robertson and Perera [14] extensively discuss the issues involved in handling input data and explore the options available to gather and record input data. Their study also included a survey of data management practices. Since then there have been major shifts in simulation-related software in terms of managing simulation data. Therefore, it is timely to review whether those shifts have made an impact on practices. This paper aims to identify and discuss the changes in practices resulting from advances in the input data management process itself and in associated support systems such as manufacturing databases and simulation software.

2. INPUT DATA MANAGEMENT
As shown in Figure 1, input data may come from a variety of sources. Corporate Business Systems (CBS) such as Enterprise Resource Planning (ERP) typically host most operational data. As an example, ERP systems deployed in manufacturing environments can provide key operational data such as machining times, set-up times, and bills-of-materials. Simulation models may also use project-specific data, which typically come from the simulation project team. These can include data items such as sales forecasts and future manpower levels. There are also instances where data need to be gathered from external reference systems. Data relating to new machinery, for example, may have to be obtained from machine tool manufacturers. The three sources discussed so far may not provide all necessary input data. In such situations, model builders need to observe or monitor processes and gather the relevant data themselves. For example, it may be necessary to collect a large number of machining time samples to generate appropriate distributions.

Figure 1: Input Data Sources.

2.1 DATA ISSUES
The existence of multiple data sources, in combination with several inherent difficulties in the data collection process, presents a number of challenges [14]:

- Data accuracy: as data come from a variety of internal and external data sources, the accuracy of the data may be questionable. This means extra effort needs to be made to ensure data accuracy.
- Data duplication: the same data items may come from different sources. For example, the machining time for a specific job may come from ERP, but it is quite possible that machine operators have their own records of machining times, possibly more accurate than the ERP system. When data duplication occurs, the model builder needs to make a judgement on the most reliable source of data.
- Data timeliness: different data from various sources are required depending on the model purpose. A simulation model, continuously used for production systems development, can live through several iterations in both systems design and systems management. The model builder is therefore forced to collect data on multiple occasions from different systems. For example, a machining time may be estimated by the machine vendor and available in the ERP system during systems design, or gathered directly from the shop floor for systems management purposes.

2.2 DATA INPUT METHODOLOGIES
Once appropriate data sources and data items are identified, the next challenge is to find the best way to manipulate and store data. As shown in Figure 2, four possible methodologies can be used by simulation teams according to Robertson and Perera [14].

Methodology A has for many years been the most popular approach [14]. The project team, and especially the model builder, manually compile and process data from several sources. This work typically includes gathering raw data from the shop floor, collecting data from computer-based systems, and interviewing individual domain experts. After further processing, the data are recorded directly within the simulation model. Although this is a simple approach, it presents a range of problems. Data are typically scattered within the simulation model, so locating data items can be very time-consuming. It is also not easy to spot mistyped data values, and the flexibility for updating data between model iterations is limited. In addition to its simplicity, the major benefit of Methodology A is that the data are verified by the model builder when entered into the model code.

Methodology B overcomes some of the limitations of Methodology A. Instead of direct entry, data are stored in an external source, normally a spreadsheet. This makes data manipulation and validation much easier, which increases flexibility and facilitates model updates. Data are transferred to the simulation model typically via VBA (Visual Basic for Applications) routines. However, the model builder and/or the project team still perform the collection and processing of data manually. Methodology B is currently a very popular approach as it enables simulation model users to manipulate input data through a spreadsheet-based interface. Moreover, models can be run without specific knowledge and experience in model building.
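As an illustration of the spreadsheet interface that characterizes Methodology B, the following minimal sketch reads externally stored parameters and hands them to model code. It is written in Python rather than VBA for brevity, and the file name and column layout (station, parameter, value, unit) are invented for the example; a real project would follow the simulation vendor's import format.

import csv

def load_parameters(path):
    # Returns {station: {parameter: value}} from a CSV export of the
    # project team's spreadsheet (hypothetical layout:
    # station,parameter,value,unit).
    params = {}
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            params.setdefault(row["station"], {})[row["parameter"]] = float(row["value"])
    return params

# e.g. with a row such as: OP20,cycle_time,41.5,s
# cycle_time = load_parameters("input_data.csv")["OP20"]["cycle_time"]

Because the parameters live outside the model, they can be reviewed and updated between model iterations without touching the model code, which is precisely the flexibility argument made for Methodology B above.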

Figure 2: Alternative methodologies for data manipulation and storage.

Methodology C is an extension of Methodology B where the external data sources are linked to the model data storage in order to enable automated updates. The external data sources are typically databases within the CBS, and the model data are usually stored in an intermediary simulation database. Since the data are stored externally to the model, the same flexibility as for Methodology B applies. Thus, the intermediary step enables data processing and provides the possibility to set up what-if scenarios despite the close integration with external sources. The increased level of automation (compared to Methodologies A and B) holds substantial potential to reduce the time taken to manage the input data. The major difficulty with increased integration and automation is to ensure data validity and to convince the stakeholders that the results are credible. Despite the advantages outlined above, only one industrial test implementation was identified by Robertson and Perera [14]. One example among Methodology C implementations is the concept of Manufacturing Data Acquisition (MDA). MDA solutions usually include the necessary equipment for raw data collection and use the intermediary database for data storage and basic data processing functionality; see for example [1].
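To make the intermediary-database idea concrete, the sketch below (using Python's built-in sqlite3 module, with invented table and column names) copies raw stop events from a source system into an intermediary database, condenses them there, and leaves only the processed model_inputs table for the simulation to read. It illustrates the Methodology C pattern under these assumptions and is not a description of any specific MDA product.

import sqlite3

def refresh_intermediary(events, db_path="intermediary.db"):
    # 'events' are (machine, stop_start, stop_end) tuples pulled from a
    # source system; the table and column names are invented for this sketch.
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS raw_stops (machine TEXT, start REAL, stop REAL)")
    con.execute("CREATE TABLE IF NOT EXISTS model_inputs (machine TEXT PRIMARY KEY, mttr_s REAL)")
    con.execute("DELETE FROM raw_stops")
    con.executemany("INSERT INTO raw_stops VALUES (?, ?, ?)", events)
    # Processing step: condense the raw stop records to one repair-time
    # figure per machine; what-if scenarios could be prepared here too.
    con.execute(
        "INSERT OR REPLACE INTO model_inputs "
        "SELECT machine, AVG(stop - start) FROM raw_stops GROUP BY machine"
    )
    con.commit()
    con.close()

# The simulation model (or its loader) queries only model_inputs and is
# thereby insulated from the source systems:
refresh_intermediary([("OP20", 100.0, 340.0), ("OP20", 900.0, 1020.0)])

The intermediary step is what allows data to be cleaned, combined and manipulated before it reaches the model, which is the main advantage respondents later attribute to Methodology C.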

Methodology D eliminates the need for an intermediate data source because the simulation model is directly linked to the relevant data sources. During model development, system entities are referred to sources within the CBS, which dramatically reduces time, effort and errors. The primary drawback is the limited availability of detailed simulation data in major databases (e.g. ERP) [10]. Consequently, Methodology D implementations tend to be extensive and complex. Additionally, there is a risk of data duplication due to the substantial number of connections to various data sources. Robertson and Perera [14] identified one real-world case of this methodology. The implementation was intended for systems management purposes. An example of the close integration characteristic of Methodology D is the Product Lifecycle Management (PLM) packages containing simulation software [6].

2.3 SUPPORT FOR INPUT DATA MANAGEMENT IN SIMULATION SOFTWARE PACKAGES
The support for input data management provided by commercial simulation software packages has for years been described as insufficient. Perera and Liyanage [12] reported that simulation practitioners consider the limited facilities in simulation software to organize and manipulate data as one of the major pitfalls. Such missing facilities, required in data processing, include: extraction and categorization of data points, identification and removal of erroneous samples, correction of data formats or individual values, calculations (e.g. the time between failures), condensation to a suitable representation, etc. In addition to these common data processing operations, Robertson and Perera [14] argue for better integration between simulation software and ERP systems to facilitate complete automation of data collection (Methodology D).

Fortunately, there have been advances during the latest decade. Several simulation software packages now provide solutions to facilitate the bi-directional transfer of data with external sources. The embedding of VBA is one example, and some vendors also provide direct links to spreadsheets and databases without the need for VBA. As a result, the number of case studies describing simulation models fed with deterministic data from ERP systems and other external sources has increased. It is, however, more difficult to find similar implementations covering the more extensive handling of data for stochastic parameters. Mertins, Rabe and Gocev [9] note that it is often considered more appropriate to use an intermediary application (e.g. a database or spreadsheet) for compiling data from different sources and for performing the required data processing operations. It should be clarified here that ERP systems seldom contain all the data necessary for simulation [10].

The other significant improvement is that most packages now provide support for distribution fitting [17], either by means of self-developed analysis functionality or in cooperation with special-purpose software, e.g. Stat::Fit [3] and ExpertFit [7]. As indicated in the previous paragraph, it is still a complex task to automate the complete chain of extraction, correction, calculation and condensation within the simulation software, especially for stochastic representation of varying processing times or breakdown patterns when the raw data contain an extensive number of samples. Therefore, the distribution fitting functionality is frequently applied as a separate step.
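The processing chain outlined above - extraction, calculation (e.g. the time between failures) and condensation to a statistical distribution - can be illustrated with a small, self-contained sketch. Here scipy stands in for dedicated fitting tools such as Stat::Fit or ExpertFit, and the failure log is randomly generated for the example.

import numpy as np
from scipy import stats

# Invented raw data: logged start times of machine stops, in seconds.
stop_starts = np.cumsum(np.random.default_rng(7).gamma(2.0, 1800.0, size=300))

# Calculation step: raw time stamps -> time-between-failure samples.
tbf = np.diff(stop_starts)
tbf = tbf[tbf > 0]  # review step: drop erroneous (non-positive) samples

# Condensation step: fit candidate distributions and rank them by the
# Kolmogorov-Smirnov statistic, much as distribution-fitting tools do.
for name in ("gamma", "lognorm", "expon"):
    params = getattr(stats, name).fit(tbf)
    ks = stats.kstest(tbf, name, args=params).statistic
    print(f"{name:8s} KS = {ks:.3f}")

Running such a chain as a separate, scripted step mirrors the practice described above, where distribution fitting is applied outside the simulation software itself.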
3. SURVEY DESIGN
Given the developments in simulation software and other support systems for input data management, the authors wanted to investigate what impact they have made on simulation projects. It was therefore decided to repeat a survey published in 2002 [14]. The new survey was initiated during the Winter Simulation Conference (WSC) in Baltimore, Maryland, USA,

December 2010. WSC is one of the world's major forums for simulation specialists representing industry, academia and government. A questionnaire was distributed to conference attendees, and the industrial representatives were asked to answer 12 questions (see Table 1) about the simulation procedures at their specific companies, mainly focused on input data management. Researchers with a close connection to industry (i.e. through a recent case study) were also asked to complete the form with information obtained at the case study company. Reminders containing a link to a web questionnaire (an exact copy of the original form) were sent out by e-mail. Responses from 86 companies were collected, covering business areas such as manufacturing, logistics, health care and military applications. Data were analyzed using descriptive statistics to show how many companies use the different approaches to automated input data management, etc. Note that there are significant similarities between the questions designed by Robertson and Perera [14] and the questionnaire used in this study. This overlap enables comparison between the studies in order to map the progress of input data management during the last decade.

Table 1: Questions included in the survey.
1. Please specify your major areas of application for DES.
2. What makes you use simulation in your business?
3. Do you apply a structured approach to input data management?
4. Which is the main source of input data to DES models?
5. Which sources of input data are commonly used?
6. What is your major approach for selection between duplicate data sources (if you have multiple sources for the same data item)?
7. How is data accuracy, reliability and validity mainly assured?
8. Models develop and evolve; how is data validity maintained?
9. How is data (information) supplied to the simulation model?
10. Where is the majority of data (information) held, i.e. where does the processed data reside?
11. Consider the entire input data management process; which is the most common methodology? Please explain your answer!
12. Which methodology do you think will be used in ten years? Please explain your answer!

4. SURVEY RESULTS
The main data collected from the survey are summarized in Table 2. Reflections on particularly interesting findings are presented in more detail and are elaborated on in the discussion section. Some of the questions in the survey are stated in the same or a very similar way as by Robertson and Perera [14] almost ten years ago; the correlation with, and development since, that study is then presented. Additionally, development trends and future outlooks for the use of the different input data management methodologies in industry are presented based on the concluding two questions in the questionnaire.

Table 2: Complete questionnaire results (All = all respondents; Mfg. = manufacturing subset).

1. Application area (number of companies): Manufacturing 35, Health care 14, Logistics 11, Military applications 9, Finance and business 4, Academia 4, Human resources 3, Energy 2, Other 4.

2. Why use simulation? (All / Mfg.)
- Simulation is used to address a specific business need such as the design of a new factory. The model is not re-used once the project is completed: 35% / 40%
- Simulation is regularly used to improve business operations. Models are often re-used: 57% / 54%
- Use of simulation is mandatory within the business in every improvement project: 8% / 6%

3. Structured approach to input data management? (All / Mfg.)
- Yes (if yes, please specify what is used): 44% / 37%
- No: 56% / 63%

4. Main source of input data. (All / Mfg.)
- Manual gathering (e.g. stop watch, movie recording): 18% / 14%
- People-based systems (e.g. interviews, expert knowledge): 15% / 11%
- Paper-based systems (brochures etc.): 2% / 0%
- Local computer-based systems (e.g. spreadsheets): 32% / 34%
- Computer-based corporate business systems (e.g. ERP, MES, PLM): 33% / 40%

5. Common sources of input data (several answers possible). (All / Mfg.)
- Manual gathering (e.g. stop watch, movie recording): 53% / 57%
- People-based systems (e.g. interviews, expert knowledge): 66% / 74%
- Paper-based systems (brochures etc.): 16% / 20%
- Local computer-based systems (e.g. spreadsheets): 67% / 80%
- Computer-based corporate business systems (e.g. ERP, MES, PLM): 58% / 77%

6. Approach for selection between duplicate data sources. (All / Mfg.)
- Data duplication is never encountered: 8% / 3%
- Select the most recent data: 19% / 29%
- Base the selection on personal experience: 9% / 9%
- Combination of data sources: 18% / 20%
- Base the selection on team knowledge: 33% / 37%
- Select data most local to the source/origin: 11% / 3%
- Other: 2% / 0%

7. Methods for assuring data accuracy, reliability and validity. (All / Mfg.)
- Interviewing area experts: 19% / 29%
- Basic sanity checks: 13% / 9%
- Personal experience: 7% / 3%
- The internal or external customer's responsibility: 5% / 3%
- Model validation runs: 52% / 54%
- Other: 4% / 3%

8. How is data validity maintained? (All / Mfg.)
- Continuous manual efforts for data collection: 21% / 9%
- Manual efforts for data collection, only initiated when the model will be used: 27% / 31%
- Automated collection for parts of the data: 17% / 9%
- Continuous automated collection of all necessary data: 19% / 14%
- Models are not maintained and reused: 15% / 20%

9. Supply of data to the simulation model. (All / Mfg.)
- Manually written in the model code: 23% / 14%
- Via an external spreadsheet (automatically connected to the model) or similar: 48% / 57%
- An off-line database automatically connected to the model: 23% / 23%
- Direct link between corporate business systems and simulation model: 5% / 3%
- Other: 1% / 3%

10. Where is the majority of data stored? (All / Mfg.)
- In the simulation model: 24% / 20%
- In a paper-based system: 2% / 0%
- In a local computer-based system (e.g. a spreadsheet): 63% / 74%
- In a computer-based corporate business system (e.g. ERP, MES): 10% / 6%
- Other: 1% / 0%

11. Current input data management methodology. (All / Mfg.)
- Methodology A: 17% / 17%
- Methodology B: 61% / 63%
- Methodology C: 21% / 17%
- Methodology D: 1% / 3%

12. Input data management methodology expected to be used in 10 years. (All / Mfg.)
- Methodology A: 4% / 3%
- Methodology B: 20% / 18%
- Methodology C: 40% / 41%
- Methodology D: 37% / 38%

In this survey, as in Robertson and Perera [14], most respondents represent companies from the manufacturing area. Manufacturing applications are therefore presented separately in Table 2. However, there are also many answers from areas such as health care, logistics and military. The reader can easily see that the difference in methodologies and work procedures between application areas is limited.

Some findings from Table 2 need to be highlighted and further described. The results of question 3 show that almost two out of three companies lack structured procedures for input data management and, thus, adapt their work procedure to specific model characteristics. Among companies using structured approaches, the most common are data templates and checklists. Another interesting finding is that the main source of data is reported to be a computer-based system in 65% of the companies (74% for manufacturing), which is promising for automated input data management. However, it is also reported that several sources are required to find all necessary data for simulation. A majority of companies are dependent on manual gathering and people-based systems (e.g. interviewing domain experts). A detailed analysis shows that 80% of the companies reported that more than one type of data source is needed to satisfy the extensive data requirements of dynamic simulations.

Table 3: Comments on the present methodology, collected from question 11 (reasons given for using Methodologies A-D):
- Common business practice
- Reduce risk of tampering with data
- The connectivity is not easy in other methodologies
- Small-scale models
- Few data items needed
- Elementary
- More accurate
- Simpler projects
- Analyze data very clearly
- Expert providing interface for others
- Lack of experience in automated tools
- Effortless in terms of computer knowledge
- Supported by simulation software
- Unavailable or inconsistent data in ERP systems
- Security reasons
- Easy to implement and update data
- Presently the best technology available
- Most comprehensive solution
- Most automatic that is feasible
- Possible to set up what-if scenarios
- Extensive amounts of data
- All the software modules within a company can communicate with each other; this also includes DES software

Table 3 shows the most common motivations for using Methodologies A to D stemming from the survey results. Comments on the future use of these methodologies, showing both setbacks and positive aspects, are presented in Table 4. Both Table 3 and Table 4 contain verbatim comments from the respondents, which means that statements can contradict each other. The variations typically depend on how the respondent utilizes simulation and on the software choices.

Table 4: Comments on the use of future methodologies, collected from question 12 (expected pros and cons of Methodologies A-D):

Pros:
- Elementary
- Easy input/output
- Simulation as black box
- Intermediate database necessary
- Less time consuming
- Easy to use for the masses
- Convenience
- Workability
- Faster and more transparent
- Data not tampered
- ERP, MES and digital factory systems will merge into Enterprise Lifecycle Systems

Cons:
- Manual methods will still be used
- Too much work
- Cumbersome
- Danger of tampering with data
- No support for data processing
- No real live update of most recent data
- Ad hoc
- Unstructured
- Data processing required
- Too difficult
- No need to automate all steps
- Manual interaction necessary
- Data availability

Figure 3 shows the trend in the input data methodologies used in 2000 and 2010, together with a prediction for 2020. The data from 2000 are described in Robertson and Perera [14]; the other two data sets are collected from the questionnaire results, namely questions 11 and 12 (see the questions in Table 1 and the results in Table 2). In 2000, Methodology A (manual input directly into the model) was the most used, at about 60%. Now, ten years later, the most frequently used approach is Methodology B (spreadsheet connected to the model), which is in use by just above 60% of the practitioners. The prediction for the future shows that increasingly automated data treatment is to be expected: Methodology C (intermediary database connected to the model) is predicted to be the most popular, with just above 40% of the practitioners. The increasingly automated data management, however, has some disadvantages expected by some practitioners; see Table 4. These problems create a more doubtful future scenario, which is addressed in the discussion section of this paper.

Figure 3: Trend in data input methodologies used in 2000 and 2010, and prediction for 2020 (stacked shares of Methodologies A-D).

Another interesting finding from the comments in the survey results is the need for automated connections to public databases and external systems for input data. No question on this issue was included in the questionnaire; however, the need for simulation data in external databases was raised by a few respondents, who mentioned nationwide health care databases and open web data as examples. Other examples of such publicly available databases are ELCD [4] and UPLCI [11].

5. DISCUSSION
The aim of this paper is to map the current practices in input data management for production simulation. Experiences from 86 companies worldwide were collected using a questionnaire initiated at the Winter Simulation Conference 2010 (WSC 10). The results indicate some advances but also highlight several problems with the integration of major business systems with simulation models.

Initially, the authors' main focus was to provide an update of solutions for automated input data management in manufacturing simulations. However, the results of the survey consistently show that there is no substantial difference between the situation in the manufacturing industry and the other application areas represented at WSC 10. This means that the results may well be of interest for increasing efficiency in, for example, health care, logistics and military simulations.

The main findings show that manual involvement is still significant in input data management. 80% of the companies rely on Methodologies A and B, including manual data collection and processing. 20% have implemented automated connections to the required data sources, mostly via an intermediary database (Methodology C). Of course, the categorization of input data management methodologies provided by Robertson and Perera [14], containing four different approaches, can be divided into sub-categories. For instance, the increasing number of MDA implementations [1] during the last decade are examples of Methodology C. Furthermore, the well-researched area of data integration using PLM packages [6] exemplifies an alternative for realizing Methodology D. In other words, companies have several alternatives when selecting a suitable solution for increasing efficiency in simulation studies.

The survey, however, shows that Methodology C implementations hold the highest potential to succeed under present circumstances. Dynamic simulation models require very detailed data, seldom found in major business systems, according to both this survey and previous research [10][14]. Consequently, there is a need to combine sources within the CBS with local systems providing detailed processing times, stop times, etc.; see for example the diversity of sources reported in Table 2. Moreover, the respondents of this survey also highlight the possibility to process and manipulate the data before supplying them to the simulation model as a major advantage of Methodology C. This is often required to extract the correct information from raw data, but also in order to set up what-if scenarios for simulation analysis. Additionally, the intermediary step can be utilized for security reasons, ensuring that interoperability problems affect neither data essential for other engineering applications nor the flow of information on the shop floor.

Figure 4: Example of Methodology C for automated input data management.

Looking further into Methodology C (Figure 4), there are several alternative solutions presented in recent literature. The previously mentioned MDA is a popular solution, often including the technical equipment for the actual raw data collection. Another example, focusing on the extraction and processing of already available raw data, is the GDM-Tool presented by Skoogh, Michaloski and Bengtsson [16]. It should also be mentioned that there have been advances in the data management support provided by commercial simulation software packages, mainly in the interfaces to common database formats as well as in data processing and analysis. These features can be utilized in the set-up of both Methodology C and D applications. However, close connections to major ERP vendors have not been established, which is also highlighted in the results of this survey; see the paragraph above. Therefore, these features are, according to the authors' experience, more often used for stand-alone purposes in work procedures categorized as Methodology A or B.

As a comment on the survey design, the questionnaire was distributed to around 700 DES practitioners and researchers at WSC 10, and responses from 86 companies were collected in total. This response rate might appear limited, but the reader should keep in mind that the survey was mainly aimed at representatives having performed a recent industrial simulation study. It is therefore likely that many of those declining to submit an answer were researchers without recent case studies in industry. Further analysis of the participating companies at WSC 10 shows that many respondents represent large organizations. Thus, the possible underrepresentation of Small and Medium-sized Enterprises (SMEs) implies that the actual use of automated solutions in input data management might be slightly lower in general.

Despite the current majority of work procedures categorized as Methodology B, companies are very interested in and motivated to increase the level of automation. Results show that 77% expect to implement solutions with automated connections to the required data sources within a ten-year period. There is naturally a significant potential to reduce the time-consumption in input data management, and consequently also in entire simulation studies, thanks to the reduction of

manual involvement. To maintain model credibility, there are of course important issues associated with automated handling of large data sets, which sometimes include a significant share of erroneous samples. However, by utilizing clever algorithms for data cleansing and analysis, there is significant potential to reach a higher and more consistent data quality compared to manual input data management.

6. SUGGESTIONS FOR FURTHER DEVELOPMENT
The discussion provided above invites further development in two possible directions. The first option is to develop Methodology C solutions, because of their strength in dealing with problems such as the combination of data from different sources and the extensive need for data processing and manipulation. The second option is to influence the development of major business systems (ERP, MES, PLM, etc.) to include the detailed raw data necessary for dynamic simulations. This alternative, leading to complete integration of data sources with simulation models, holds strong potential because of the completely eliminated need for human assistance. However, it will face more problems and require extensive research and development along the way compared to the first alternative.

Methodology C was the main suggestion already a decade ago [14] and, due to the limited advances in support systems for input data management, it is still envisaged as the main alternative. The intermediary database provides the possibility to merge data from major business systems, local data sources, people-based systems and external reference systems, which is a prerequisite given the lack of comprehensive and detailed data in single sources. An additional argument is the increasing dependency on external reference systems identified in this survey. Such sources are nowadays important, partly as a result of the sustainability analyses integrated in traditional manufacturing simulations. Data necessary for such purposes are often extracted from public databases, e.g. EcoInvent [18], ELCD [4] and UPLCI [11].

The development of solutions for direct integration of major business sources and manufacturing simulation applications (Methodology D) has not been sufficient during the last decade. This statement is supported by the finding in the presented survey that only 3% of the companies have implemented such a solution. However, although the authors argue for Methodology C solutions, they encourage further research and development facilitating such implementations. The key factor is to collect, structure, and store detailed data in major business systems, which most likely includes standards and interoperability research to maintain the accuracy and comprehensibility of the data.

The bottom line is that an increased level of automation is important regardless of whether Methodology C or D is most appropriate for a specific company. High efficiency in input data management has a significant impact on the usability and profitability of manufacturing simulations. Succeeding with one of these solutions is therefore a prerequisite for increasing the use of simulation on a regular basis, from the 65% reported in the results of this survey; see question 2 in Table 2.

7. CONCLUSION
Robertson and Perera [14] identified four methodologies of input data management, briefly summarized below:

a) Manual data processing and supply of information to the simulation model.
b) Manual data processing and a spreadsheet interface to the simulation model.
c) Automated connection between data sources and simulation model using an intermediary database.
d) Direct link between the CBS and the simulation model.

Since their study, published in 2002, there have been advances in the input data management process itself as well as in support systems such as data collection systems, databases, and simulation software. Therefore, this paper presents an update of the industrial practice in input data management in order to identify and describe possible progress. The categorization above has served as a foundation, and the results show:

- The most common input data management procedure still includes significant manual involvement in data processing and utilizes a separate spreadsheet interface to the simulation model. 61% of all companies use such an approach (Methodology B).
- During the last decade there have been advances in automating the input data management procedure. Going from very few industrial examples ten years ago [14], 22% of the companies have now implemented the more automated Methodologies C and D.
- The vast majority of this subgroup (C and D) prefers an intermediary database between the data sources and the simulation model, to handle their dependency on multiple data sources and the extensive need for data processing and manipulation. Another argument for using the intermediary database is the lack of sufficient data processing features in commercial simulation software packages, despite some advances such as the integration of distribution-fitting functionalities.
- There is an increasing need for collection and processing of data from external reference systems such as public LCA databases. In manufacturing, this increasing need is most likely due to the integration of sustainability analyses in manufacturing simulation studies.

Despite the progress identified in this paper, many companies ask for further support in elevating the level of automation in input data management. Almost 80% of all participating companies expect to implement Methodologies C or D within ten years. Researchers and industrial developers should focus on increasing the availability of detailed raw data in major business systems and on providing efficient solutions for data processing, e.g. in intermediary databases.

8. REFERENCES
[1] M. Aufenanger, A. Blecken, C. Laroque, Design and Implementation of an MDA Interface for Flexible Data Capturing, Journal of Simulation, 4 (2010).
[2] J. Banks, J.S. Carson, B.L. Nelson, Discrete-Event System Simulation, second ed., Prentice-Hall, Upper Saddle River.
[3] Geer Mountain Software Corporation, Stat::Fit Commercial Webpage, available from [accessed on October 22, 2011].
[4] Institute for Environment and Sustainability, European Reference Life Cycle Database, available from [accessed on August 11, 2011].
[5] M. Jahangirian, T. Eldabi, A. Naseer, L.K. Stergioulas, T. Younga, Simulation in manufacturing and business: A review, European Journal of Operational Research, 203 (2010).
[6] W. Kühn, Digital Factory Simulation Enhancing the Product and Production Engineering Process, in: Proceedings of the 2006 Winter Simulation Conference (2006).
[7] A.M. Law, M.G. McComas, How the ExpertFit Distribution-Fitting Software can make your Simulation Models more Valid, in: Proceedings of the 2003 Winter Simulation Conference (2003).
[8] A.M. Law, Simulation Modeling & Analysis, fourth ed., McGraw-Hill, New York.
[9] K. Mertins, M. Rabe, P. Gocev, Integration of Factory Planning and ERP/MES Systems: Adaptive Simulation Models, in: T. Koch (ed.), IFIP International Federation for Information Processing - Lean Business Systems and Beyond, Springer, Boston, 2008.
[10] Y.B. Moon, D. Phatak, Enhancing ERP system's functionality with discrete event simulation, Industrial Management & Data Systems, 105 (2005).
[11] M. Overcash, J. Twomey, D. Kalla, Unit Process Life Cycle Inventory for Product Manufacturing Operations, in: ASME Conference Proceedings MSEC2009 (2009).
[12] T. Perera, K. Liyanage, Methodology for rapid identification of input data in the simulation of manufacturing systems, Simulation Practice and Theory, 7 (2000).
[13] M. Rabe, S. Spieckermann, S. Wenzel, A New Procedure Model for Verification and Validation in Production and Logistics Simulation, in: Proceedings of the 2008 Winter Simulation Conference (2008).
[14] N. Robertson, T. Perera, Automated data collection for simulation?, Simulation Practice and Theory, 9 (2002).
[15] A. Skoogh, B. Johansson, Mapping of Time-Consumption During Input Data Management Activities, Simulation News Europe, 19 (2009).
[16] A. Skoogh, J. Michaloski, N. Bengtsson, Towards Continuously Updated Simulation Models: Combining Automated Raw Data Collection and Automated Data Processing, in: Proceedings of the 2010 Winter Simulation Conference (2010).
[17] J.J. Swain, Software Survey: Simulation Back to the future, ORMS Today, 38 (2011).
[18] Swiss Centre for Life Cycle Inventories, The EcoInvent Database, available from [accessed on August 11, 2011].

Publication IV
Skoogh, A., B. Johansson, and L. Hanson. 2011. Data Requirements and Representation for Simulation of Energy Consumption in Production Systems. In: Proceedings of CIRP Manufacturing Systems 2011.


Data Requirements and Representation for Simulation of Energy Consumption in Production Systems

A. Skoogh 1, B. Johansson 1, L. Hanson 1,2
1 Chalmers University of Technology, Dept. of Product and Production Development, Gothenburg, Sweden
2 Scania, Industrial Development, Södertälje, Sweden

Abstract
Recently, the application area for Discrete Event Simulation has been extended from a focus on economic aspects to include ecologic sustainability. Therefore, this paper aims to specify how new input parameters, such as electrical power, should be represented in simulation models. The study includes more than power measurements from a production line with five multi-operational machines. The variability between production cycles is low, which indicates that statistical representations are unnecessary. Furthermore, results show that 33% of the total energy consumption stems from non-value-added activities, which can be reduced by optimizing the production flow using dynamic simulations.

Keywords: Discrete Event Simulation, Sustainable Production, Input Data Management

1 INTRODUCTION
Today, there is a shift from a pure economic focus of production towards including sustainability aspects. One sign of this major trend is extensive research programs, such as Factories of the Future with top priority on the topic Sustainable Manufacturing, within the European Union 7th framework program [1]. However, efforts to increase sustainability thinking are also initiated on a smaller scale at company level. The potential is significant regarding direct benefits such as lower energy costs and reduced pollution, but there are also companies reporting indirect profits from goodwill effects, resulting in an increased value of their trademark. For example, a Scandinavian company in the food industry reported a reduction of CO2 emissions by 44% and an increase in trademark value by 13 times compared to its main competitor after launching a strategic company-wide project towards more sustainable operations [2].

In production systems, this work towards sustainability is currently performed in several different areas. One is of course to develop new production processes, or increase the effectiveness of existing ones, e.g. reducing energy consumption by introducing hi-tech materials in cutting tools. However, research shows that there is more potential in reducing the energy consumed during non-value-added activities, such as waiting times caused by balancing and system losses in production flows [3].

A common tool for the improvement of production flows, and thus for reducing the losses related to line balancing and machine interactions (system losses), is discrete event simulation (DES). One major reason is that DES provides the possibility to include dynamic aspects of production, which is necessary to analyze the interaction of system entities in detail [4]. Naturally, the trend towards sustainable production is also visible in the development of DES software packages as well as in DES users' application procedures. Some examples are: the Green Theme of the latest Witness release [5]; research performed on direct links between simulation models and Key Performance Indicators (KPIs) for ecologic and social sustainability [6]; and DES models aiming to reduce the energy consumption in production [7]. The current gap is that no specification exists regarding the handling of the new input parameters required for these novel simulation purposes.
Since input data management is crucial both with regard to the time-consumption of simulation projects and to model reliability, it is particularly important to apply an efficient and quality-assured procedure for all included input parameters. No previous studies have mapped the variability in input data describing sustainability aspects in DES models. Consequently, it is currently difficult to know how many samples are required during data collection and whether these parameters should be represented as stochastic or deterministic.

The aim of this paper is to specify how electrical power should be represented in DES models. This is performed by measuring the variability in electrical power utilization within and between machine state cycles, i.e. busy, idle, down and stand-by. The variability is a key factor for the collection and representation of DES parameters, since significantly variable parameters have to be stochastically represented using statistical distributions or similar approaches. Furthermore, such parameters also require more data samples measured in production, compared to those represented with mean values only.

2 INPUT DATA MANAGEMENT IN DES
The input data management phase is arguably the most crucial and time-consuming in a DES project. More than 30% of the total time in simulation projects is spent on collecting raw data, transforming the data to information, and supplying the inputs to the simulation model [8]. The major reason is that an extensive number of raw data samples is required to credibly represent stochastic parameters in dynamic simulations. For example, Perrica et al. [9] suggest using more than 200 samples when selecting a statistical distribution and calculating the mean and variance for Mean Time To Failure (MTTF) and Mean Time To Repair (MTTR). On the other hand, a deterministic parameter, remaining unchanged over a significant time-span, does not require continuous data collection, and fewer samples are enough to calculate a credible mean value. In practice, numerous data points are collected for stochastic parameters using automated collection systems (e.g. Manufacturing Data Acquisition (MDA) [10]) or manual gathering, often simply using a stop-watch

[11]. The data points are then reviewed for possible errors before they are used in further calculations. One example of such a calculation is that the start time of the previous stop has to be subtracted from the start time of a machine breakdown in order to obtain a time-to-failure sample for the MTTF. Thereafter, the data samples are condensed to statistical distributions, if not represented as traces or bootstraps [12][13]. The bottom line is that much time and effort can be saved if parameters regarded as deterministic are allotted less attention than the stochastic parameters that are significantly important for model behavior. It is therefore important to examine whether electric power utilization should be represented as a stochastic or deterministic parameter.

3 RELATED WORK
At present, much research is performed on applying various production development tools in sustainability analyses. Environmental Value Stream Mapping (EVSM) [14] is one example among the static tools. Among the dynamic tools, there are various publications regarding scheduling algorithms maximizing energy utilization [3]. In DES, which also includes dynamic aspects, Solding et al. [7] have presented case studies in energy-intensive foundries. They identified a significant potential to reduce energy consumption by improving the production planning. Furthermore, their study detected a general lack of the detailed production data necessary for describing environmental aspects of production, which supports the need for research on the collection and representation of such parameters. A common denominator for all these contributions is that they focus on reducing the environmental effects of non-value-added time in production, instead of moderating the impact of specific production processing techniques.

A possible source of inputs for the above-mentioned production development tools is the Unit Process Life Cycle Inventory (UPLCI). In this work, a new approach to the manufacturing unit process is used as the basis for life cycle inventory (LCI) analyses of production systems. In this way, energy and mass profiles for different production operations can be extracted from a portal [15]. In addition to the use in LCI analyses, this type of data can, for example, be used as inputs to the individual entities of a DES model, given that the parameters can be regarded as deterministic.

Regarding the storage and representation of simulation data, a new standard has recently been accepted within the Simulation Interoperability Standards Organization (SISO) [16]. The standard is titled Core Manufacturing Simulation Data (CMSD) and includes the traditional input parameters required for DES modeling. However, the format for representation of environmental parameters is not yet decided, but the development group plans an extension of the standard for this purpose.

Note that many papers, including those regarding dynamic simulation and scheduling, presume that electric power utilization can be represented as a deterministic input. However, there is one contribution categorizing manufacturing processes in the foundry industry with regard to how power data should be represented [17]:

1. A stochastically represented load when working, while idling and while off.
2. One stochastic representation during simulation.
3. A parameter that varies over time and/or with the situation.
4. A special logic, due to special or complex use of resources, which does not fit into the first three categories.
The same paper states that category 1 is the most common, and it suggests using machine states (working, idling and off) to incorporate the stochastic behavior in the simulation model. Further, it is mentioned that stochastic distributions can be applied to represent the power levels for each machine state, given that enough data are present. Consequently, it is still unexplored whether stochastic distributions are required when representing the power levels for the different machine states. That is exactly the aim of this paper.

4 METHOD
This study was performed at a company in the automotive industry and includes measurements of the electrical power utilization for five multi-operational tooling machines. All five machines perform milling operations in a production line producing engine components. The first machine (OP 20) has a lower cycle time than the other machines and therefore performs its operation alone, while the other four machines are grouped two by two in the operations called OP 30 and OP 40; see Figure 1. All five machines are completely automated during normal production. Operators are required only for repairs when minor break-downs occur. A separate maintenance organization is responsible for repairing major break-downs, but no such major disturbances occurred during the time of measurements in this study. Products are moved between the machines by portal robots, also referred to as material handling (MH) equipment in this text.

Figure 1: Flowchart of the production line (OP 20, OP 30_1, OP 30_2, OP 40_1, OP 40_2).

Three measurement devices were used during this study, two Dranetz BMI PowerGuide 4400 and one Dranetz BMI PowerVisa, all with a sampling frequency of 1 Hz (one sample per second). The measuring devices were plugged in via Y connections to the incoming electrical three-phase connection on each machine. Incoming connection means that all functions of the machine were measured as one unit, including contributions from major machine systems (e.g. the machine spindle) as well as from peripheral functions such as lights, control system and pumps. The measurements were performed during two work days and the night in between. Consequently, since three measuring devices were available, three machines have samples from both day and night and two only from daytime. In total, more than samples were collected, including 108 busy, 116 idle, 11 down and 3 stand-by cycles; see details in Table 1.

After collecting the data, all samples from the measurement equipment were assigned to one of the following four machine states:

- Busy state is defined as the time between the MH equipment reporting that a product is loaded into the machine and the same MH equipment giving the signal that the product is collected after the finished production cycle. Possible time for machine breakdowns is subtracted (see the description of the down state).

- Idle state is defined as the time between the MH equipment reporting that a product is collected and a new product being loaded into the machine. Thus, it includes time when the machine is starved or blocked by other resources.
- Down state is defined as the time between the machine control system reporting that a failure occurred (by lighting a red lamp on the roof of the machine) and the same red light being switched off.
- Stand-by state is defined as the time between the machine entering low energy consumption mode (during non-production night time) and normal production mode being resumed (at the beginning of the morning shift).

This work was performed by synchronizing the clocks of the three measuring devices with the control systems of each of the five machines (with a resolution of 1 second, which is the same resolution as the measurement frequency). Thereafter, control logs from each of the five machines were extracted, covering all events during the two days of measurements. They included all time stamps necessary to delimit each machine state according to the definitions given above. The time stamps for switches between events were manually merged into the file including the power measurements in order to tag each individual sample. Descriptive statistics were calculated for all four machine states, both for the individual samples within a cycle and for the calculated average power utilization of cycles; see Figure 2, which also illustrates an example of machine state cycles for the busy and idle states. The descriptive statistics, reported later in the results section, are the average values and the standard deviations as a measure of variability. Additionally, the authors calculated and compiled information about the total energy cost associated with each machine state during the time period of the measurements. This was possible because the real power utilization and cycle times were available from the measuring devices and machine control systems.

Table 1: Number of samples and cycles included in the study.

5 RESULTS
The first result compiled from the data analysis shows that, on average, 67% of all energy consumption during the time of measurements can be attributed to value-added processing time in the machines, i.e. the busy state. Processing time is here considered as the time between a product being loaded into the machine and being unloaded. Consequently, actual cutting time is grouped together with the internal moving time for machining tools. Table 2 shows how the remaining 33%, consumed in non-value-added activities (idle, down and stand-by), is distributed. It is very important to understand that these numbers must be regarded as indicative. They are specific to the production circumstances during the time of the study, and there are also different sample sizes for the various machines.

Table 2: Total energy consumption per machine, distributed between the four different machine state cycles. In Table 2: EiB = Energy in Busy; EiI = Energy in Idle; EiD = Energy in Down; EiS = Energy in Stand-by.

The next table (Table 3) shows the average power utilization for the different machines, machine states and product variants (below called V1 and V2). Note that machines 30_1 and 30_2 perform exactly the same operation, and the same applies for 40_1 and 40_2.
For example, the difference between 40_1 and 40_2 is just 0.4 kW in average value, while the difference between 30_1 and 40_1, performing different milling operations, is 6.5 kW in the busy state.

Figure 2: Illustration of machine state cycles and the two types of variation evaluated in this study.
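The tagging and descriptive-statistics steps described above can be illustrated with the following sketch. All numbers in it are invented: 1 Hz power samples are labelled with the machine state active at each second, using state-switch time stamps of the kind found in the control logs, after which the mean power per cycle, the between-cycle standard deviation, and each state's share of the total energy are computed.

import statistics
from bisect import bisect_right

switch_times = [0, 60, 75, 135, 150]          # control-log time stamps (s)
switch_states = ["busy", "idle", "busy", "idle"]

def state_at(t):
    # Return the machine state active at second t.
    return switch_states[bisect_right(switch_times, t) - 1]

# Invented 1 Hz power trace (kW): roughly 28 kW busy, 7 kW idle.
power = [28.0 + 0.2 * (t % 3) if state_at(t) == "busy" else 7.0
         for t in range(switch_times[-1])]

cycle_means, energy = {}, {}
for i, state in enumerate(switch_states):
    t0, t1 = switch_times[i], switch_times[i + 1]
    samples = power[t0:t1]
    cycle_means.setdefault(state, []).append(statistics.mean(samples))
    # At 1 Hz, each kW sample corresponds to 1 kWs of energy.
    energy[state] = energy.get(state, 0.0) + sum(samples)

total = sum(energy.values())
for state, means in cycle_means.items():
    between = statistics.stdev(means) if len(means) > 1 else 0.0
    print(f"{state}: mean {statistics.mean(means):.1f} kW, "
          f"between-cycle StDev {between:.2f} kW, "
          f"energy share {100 * energy[state] / total:.0f}%")

The sketch mirrors the two types of variation evaluated in this study: the between-cycle spread is computed from the per-cycle means, while the within-cycle spread could be obtained by applying the standard deviation to the raw samples of each cycle instead.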

Table 3: Average power utilization per machine, distributed between the four different machine state cycles. In Table 3: PiB = Power in Busy; PiB (V1 & V2) = Power in Busy given one of the two product variants; PiI = Power in Idle; PiD = Power in Down; PiS = Power in Stand-by. All numbers represent the average power utilization.

Table 4 contains the results of the variability analysis. They show that the standard deviation is normally between 1-2% of the average power utilization in busy cycles of a specific machine. It is also indicated that the variability is even smaller when both the resource and the product variant are known factors, although some measurements supporting this statement are missing. Between idle cycles, the variability is slightly higher, while the power utilization in down states is more stable. The standard deviations between stand-by cycles are calculated from very few samples; OP 30_2 and OP 40_2 (marked in italics in the table) contain only two samples each.

Table 4: Standard deviations for the average power utilization between each individual machine state cycle. In Table 4: SbBc = Standard Deviation (StDev) between Busy cycles; SbBc (V1 & V2) = StDev between Busy cycles given one of the two product variants; SbIc = StDev between Idle cycles; SbDc = StDev between Down cycles; SbSc = StDev between Stand-by cycles.

Table 5 shows the variability of samples within a cycle. Note that the sample frequency was 1 Hz, which introduces some uncertainty in these numbers. Still, it is possible to see that the variability within cycles is quite extensive, especially in the busy state where the machine continuously switches between acceleration and deceleration of the spindle.

Table 5: Average standard deviations for the individual samples within machine state cycles. In Table 5: SwBc = StDev within Busy cycles; SwBc (V1 & V2) = StDev within Busy cycles given one of the two product variants; SwIc = StDev within Idle cycles; SwDc = StDev within Down cycles; SwSc = StDev within Stand-by cycles.

6 DISCUSSION
The results of this study, in combination with previous research [3], confirm that a major part of the energy consumption in production flows stems from non-value-added activities, i.e. idle time, down time and stand-by. In this case, 33% of the energy consumption, and consequently also of the energy costs, is related to such activities. Depending on where in the world a production site is located, the direct energy cost might not yet be significant, but energy prices are globally following an increasing trend. In addition, early adopters of energy-reducing methods and tools have experienced other competitive advantages such as goodwill effects from customers [2].

In traditional discrete event modeling, the unit of analysis is usually set at the production line level or similar. This means that output statistics such as the number of finished products per hour or the energy consumption of the entire line are central.
For that kind of purpose, this study shows that there is not enough variation in power utilization between the different machining cycles to spend time and effort on representing this factor as a stochastic input. The power utilization for each machine state can therefore be collected using a few measurements in production, or extracted from process libraries such as the UPLCI [15]. Thus, it is enough to model the variation in energy consumption using stochastic machine state times, as proposed in [17].

However, there is a trend towards an increased focus on the product as the unit of analysis. This means that DES models are used to calculate the individual lead time or production cost for each product. There are even efforts examining whether it is reasonable to calculate the environmental footprint of products using DES [18]. For such detailed purposes, the variation between busy cycles (StDev around 2%) would be relevant to include using a stochastic representation. When electric power is regarded as a stochastic input parameter, several cycles have to be measured using the same technique as in this paper. Comparing with the collection of downtime data [9][11], where more than 200 cycles are suggested, the reader easily understands that this is a time-consuming effort.

Distinguishing between samples within cycles is interesting only when regarding the impact of the internal operations in a machine, e.g. in process preparation when considering reallocation of cutting operations between machines. In that case, the samples have to be regrouped into subgroups representing the different cutting operations. However, in conventional DES modeling, when representing the power utilization for an entire cycle, it is inappropriate to prepare a distribution based on the individual samples. Selecting a value from such a distribution would introduce variability significantly higher than in reality.

In the processing of the power measurements, the authors assumed (based on discussions with the electrical engineer performing the measurements) that all five machines can return electricity to the distribution grid. Consequently, negative values measured during spindle deceleration are kept in their original form and, thus, neither deleted nor adjusted to zero. Furthermore, for the down and stand-by states, the measurements included few complete cycles, which resulted in difficulties in calculating the variability for these states. However, since the variability of samples within these states is moderate,

it is reasonable to assume that the variability between cycles is also low.

To summarize, the collection, processing and storage of environmental data are important activities for obtaining an efficient input data management procedure in future DES projects. They are also crucial for the quality of DES results used to increase efficiency and sustainability in production flows.

7 CONCLUSIONS

The aim of this paper is to specify how electrical power should be represented in DES models. This is done by evaluating whether the power utilization varies or remains consistent between and within machine state cycles. For example, high variation between different busy cycles would indicate that a stochastic representation should be applied when simulating the energy consumption in machining operations. Power measurements from five automated tooling machines were collected in order to analyze the variability in power utilization between and within the machine cycles: busy, idle, down and stand-by. The driving force behind the study is that simulation of material flows in production is an increasingly popular method for analyzing the environmental impact of production.

The results show that the standard deviation for the average power utilization in busy cycles ranges from 1 to 2% for the five machines included in this study. For idle and down cycles, the same values are 9% and 1% respectively (average standard deviations for all five machines). Looking further into the busy state, the variability between product cycles is even smaller when also considering the product variant as a known factor; the standard deviation in this case is 1.5%. These numbers show that the variability between product cycles is limited and unnecessary to include in conventional DES modeling, despite the dynamic nature of DES models. The reason is that the variability in processing time has a considerably higher impact on the final result. This means that the collection of raw data describing power utilization can be limited to a few samples, just enough to calculate a credible mean value. However, if DES models are or will be used to analyze the environmental footprint, using the individual product as the unit of analysis, electric power likely needs to be considered a stochastic variable.

An additional result obtained in this study is that 33% of the total energy consumption for the five machines stems from non-production time. In other words, a significant part of the energy cost and related environmental impact is related to non-value-added time explained by balancing and system losses. Note that this result is specific to the particular production system and the time of measurements in this study. However, it is a strong indication that improvement of production flows is a very important area among efforts towards more sustainable production systems. Dynamic simulation is a potent tool in such improvement work.

8 FUTURE RESEARCH

Of course, there are other sustainability issues important to address, but this study is delimited to the electric power utilization of fully automatic tooling machines. The authors plan to perform further studies on how to address other sustainability parameters, such as global warming, acidification, and toxicity, in DES models. Moreover, the same study will be repeated in several other companies. It has to be complemented with measurements on other machining processes, in other production equipment, and in production systems with different levels of automation.
Furthermore, the authors would like to contribute to the extension of CMSD [16] with environmental data representation. The CMSD development group needs information about the variability of environmental parameters in order to select an appropriate format for data storage.

9 ACKNOWLEDGMENTS

The funding for this research is granted by VINNOVA (Swedish Governmental Agency for Innovation Systems). This work has been performed within the Sustainable Production Initiative and the Production Area of Advance at Chalmers. The support is gratefully acknowledged. The authors would also like to thank Vesa Rosvall, the electrical engineer who performed the power measurements. Finally, Edward Williams (University of Michigan and PMC, Dearborn, Michigan, U.S.A.) has kindly provided valuable suggestions for enhancing the presentation of this paper.

10 REFERENCES

[1] EC 7th Framework Programme. [accessed 29 Nov. 2010].
[2] Max annual climate report. Available at [accessed 27 Nov. 2010] (in Swedish).
[3] Cao, H.-J., Chou, Y.-C., Cheng, H. H., 2009, Mobile Agent Based Integration Framework for Flexible Dynamic Job Shop Scheduling, In: ASME Conference Proceedings IDETC/CIE2009, San Diego, CA, USA.
[4] Standridge, C. R., Marvel, J. H., 2006, Why Lean Needs Simulation, In: Proceedings of the 2006 Winter Simulation Conference, Monterey, CA, USA.
[5] Lanner Group Limited, Witness Green Release. [accessed 22 Nov. 2010].
[6] Heilala, J., Vatanen, S., Montonen, J., Tonteri, H., Johansson, B., Stahre, J., Lind, S., 2008, Simulation-Based Sustainable Manufacturing Systems Design, In: Proceedings of the 2008 Winter Simulation Conference, Miami, FL, USA.
[7] Solding, P., Thollander, P., 2006, Increased Energy Efficiency in a Swedish Iron Foundry through Use of Discrete Event Simulation, In: Proceedings of the 2006 Winter Simulation Conference, Monterey, CA, USA.
[8] Skoogh, A., Johansson, B., 2009, Mapping of the Time-Consumption During Input Data Management Activities, Simulation News Europe 19/2.
[9] Perrica, G., Fantuzzi, C., Grassi, A., Goldoni, G., Raimondi, R., 2008, Time to Failure and Time to Repair Profiles Identification, In: Proceedings of the 5th FOODSIM Conference, Dublin, Ireland.
[10] Aufenanger, M., Blecken, A., Laroque, C., 2010, Design and Implementation of an MDA Interface for Flexible Data Capturing, Journal of Simulation 4/4.
[11] Williams, E. J., 1994, Downtime Data - Its Collection, Analysis, and Importance, In: Proceedings of the 1994 Winter Simulation Conference, Orlando, FL, USA.
[12] Robinson, S., 2004, Simulation: The Practice of Model Development and Use, John Wiley & Sons Ltd, Chichester.
[13] Skoogh, A., Johansson, B., 2008, A Methodology for Input Data Management in Discrete Event Simulation Projects, In: Proceedings of the 2008 Winter Simulation Conference, Miami, FL, USA.

[14] Torres Jr., A. S., Gati, A. M., 2009, Environmental Value Stream Mapping (EVSM) as Sustainability Management Tool, In: PICMET 2009 Proceedings, Portland, OR, USA.
[15] Overcash, M., Twomey, J., Kalla, D., 2009, Unit Process Life Cycle Inventory for Product Manufacturing Operations, In: ASME Conference Proceedings MSEC2009, West Lafayette, IN, USA.
[16] SISO Simulation Interoperability Standards Organization, CMSD Product Development Group, 2009, Standard for: Core Manufacturing Simulation Data - UML Model, Feb. 13.
[17] Solding, P., Thollander, P., Moore, P. R., 2009, Improved Energy-Efficient Production Using Discrete Event Simulation, Journal of Simulation 3.
[18] EcoProIT, Chalmers University of Technology, Dept. of Product and Production Development. [accessed 27 Nov. 2010].

Publication V

Skoogh, A., B. Johansson, and J. Stahre. Submitted. Automated Input Data Management: Evaluation of a Concept for Reduced Time-Consumption in Discrete Event Simulation. Simulation: Transactions of the Society for Modeling and Simulation International. (Under 2nd review.)


Automated Input Data Management: Evaluation of a Concept for Reduced Time-Consumption in Discrete Event Simulation

Anders Skoogh, Björn Johansson, Johan Stahre
Department of Product and Production Development
Chalmers University of Technology
Hörsalsvägen 7A, Gothenburg, Sweden
anders.skoogh@chalmers.se

Abstract

Input data management is a crucial and time-consuming part of a simulation project. Consequently, improvement of this process has substantial potential to increase the rapidity of simulation projects, thus enabling more detailed analyses in the design and development of production systems. This paper presents the development of a software solution called the Generic Data Management Tool (GDM-Tool), which automates several critical and time-consuming input data activities. More specifically, raw data is extracted from multiple sources, with different data structures, and transformed to simulation input through data cleaning, calculations and distribution fitting, all done in one automated process. Finally, the simulation input is presented in the Core Manufacturing Simulation Data (CMSD) format, for further use in simulation applications. As a first step towards validation, the GDM-Tool was evaluated during a case study in the automotive industry. Results show that the time needed for input data management was reduced by 78% relative to a traditional manual approach.

Key Words: Input data management, data collection, CMSD, discrete event simulation, DES

1. Introduction

Discrete Event Simulation (DES) is one of the most powerful tools for planning, designing and improving material flows in production. The tool mimics dynamic aspects of manufacturing systems, which is a great advantage compared to other production analysis methods such as queuing theory or static capacity calculations. This ability contributes to the fact that simulation is ranked among the top three tools for decision support, with regard to pay-off rate in operations development projects [1]. However, despite its potential, industries worldwide have not yet completely adopted DES in their production development processes [2][3]. One reason is arguably that production simulation projects tend to be slow in providing clients with model results [4][5]. This is a significant disadvantage, since manufacturing development and design projects usually rely on rapid responses from analyses [6]. Hence, renouncing precision in favor of quick response, organizations are tempted to choose less complex tools.

A significant part of the time consumed in production simulation projects can be derived from the management of input data. There are very high quality requirements on input data for simulation projects, in order to deliver results of substantial value for decision-making. This, in combination with difficulties in finding, collecting, and analyzing raw data, contributes to the fact that

activities in the input data management process constitute around one-third of the total time-consumption in DES projects [7][8]. Major issues, which significantly increase the time-consumption, include insufficient raw data availability; problems with extracting raw data from original sources; and extensive manual workload to transform raw data into simulation input [7][9]. Focusing on solving these major issues for input data management is an efficient way of significantly contributing to the reduction of time-consumption for entire DES projects. By extension, this would enable more detailed analyses and, thus, result in improved design of future production systems.

Currently, suggested solutions primarily strive towards establishing automated connections between simulation models and Corporate Business Systems (CBS) as the main source of simulation data [10][11]. Today, the CBS is represented by an Enterprise Resource Planning (ERP) system in many companies. The automated connection enables efficient supply of simulation data, since manual work is reduced and extracted data is up-to-date. However, many companies do not have all required data in ERP or other major business systems, especially not data describing the dynamics of production systems [10][12][13]. Therefore, simulation practitioners still have to complement information from ERP systems with data from other sources. The problem of adding information implies additional handling steps and, consequently, the input data process remains dependent on manual involvement.

This paper presents one approach to reduce the time-consumption by automating a complete chain of activities in the input data management process, without being dependent solely on large business systems such as ERP. The concept is demonstrated as a software solution, called the GDM-Tool (Generic Data Management), which automatically extracts raw data from several sources, transforms data to information and presents it in an accessible format for a range of DES software. In this case, the output format is exemplified by the Simulation Interoperability Standards Organization (SISO) standard Core Manufacturing Simulation Data (CMSD) developed at the US National Institute of Standards and Technology (NIST) [14][15]. Moreover, the software is easily configurable and can therefore handle raw data from different CBS sources, including fully customized systems. The aim of this paper is to evaluate the time-consumption using this automated approach instead of manually performing the input data management process and, also, to compare the data quality between the two approaches. Note that the GDM-Tool is not intended to meet the functionality requirements of a commercial software solution. It is developed as a demonstrator for the automated input data management concept, comparable to the intermediary simulation database solution (methodology 3) presented by Robertson and Perera [10].

2. Input data management

In this paper, input data management is defined as a concept that embraces all activities needed to transform raw data into information that is ready to use in DES models. To be ready to use in a simulation model, some values need to be added to the raw data describing different events in the modeled system. Hence, input data to simulation models may be considered to be information for the simulation model, despite the fact that input data is the most recognized term.
Davenport and Prusak highlight five important methods for transforming data into information [16], and all of them are also relevant in the context of transforming raw data into information for simulation models; see Table 1.

Table 1: Important methods for adding value to raw data in order to obtain information for simulation models [16].

Method             Purpose
Contextualization  Knowledge about what purpose the data was gathered for
Categorization     Knowledge about units of analysis or key components of the data
Correction         Errors are removed from the data
Calculation        Mathematical calculations or statistical analysis of the data
Condensation       The data have been summarized in a more concise form

In input data management for DES, the context of raw data is ideally added when the collection system or gathering procedures are designed. In these cases, the data itself, and the collection process, is well explained. However, in numerous real-world situations, context is not added until the raw data is reviewed in connection with the establishment of a conceptual model. Categorization is also, in ideal cases, added before or even during the actual collection of raw data. But the lack of well-structured databases containing DES data usually implies additional work of understanding and grouping raw data [7][9]. Moreover, correction and filtering of data and calculations, such as independence and correlation analyses, are common activities for practitioners during input data preparation. Finally, condensation is normally done using statistical or empirical distributions for parameters including variability [17].

Depending on circumstances such as a company's previous experience in DES, or its advancement in implementation of computerized data systems, input data management starts with different prerequisites. In some cases no data is available at all, and hence, the work starts with manual measurements of shop-floor behavior. However, in other projects, an extensive part of the required data is previously measured and maybe even analyzed and prepared for DES. Robinson and Bhatia classify data in three categories (Table 2) with regard to availability and collectability [18]. Note that the classification does not take into consideration the amount of analysis work performed previously on the data.

Table 2: Classification of data [18].

Category    Data availability
Category A  Available
Category B  Not available but collectable
Category C  Not available and not collectable

To summarize, the input data management procedure starts with data in a crude form, or in good circumstances with previously analyzed data, and ends with a data-sheet including all necessary input parameters for a specific DES model, in a format that is ready to use. Moreover, to delimit the research presented in this article, the input data management process is mainly intended to convert quantitative data, for example cycle times and downtimes, into simulation input. Consequently, the principles discussed here are not primarily designed to suit qualitative data and information about logical relationships between model entities, for instance part routing and facility layout.

2.1 Present Industrial Practice

At present, the input data management process almost exclusively implies some manual involvement. Robertson and Perera identified only one case of a completely automated connection to computer applications within the CBS for the bulk of data [10]. The reason is arguably

the lack of simulation data in these applications, especially data describing the dynamics of production systems. Moreover, finding all necessary input data in a condensed form suitable for simulation is very unlikely [7]. Thus, the same study also states that most practitioners need to collect data from several sources. The majority responded that simulation data usually reside in local systems such as spreadsheets. Hence, simulation practitioners have to rely on manual procedures to collect and extract raw data and also to supply the final information to DES models (Table 3). However, the transformation process itself is often supported by different tools, such as MS Excel for categorization, calculation and cleaning of raw data, and distribution fitting tools for condensation [19]. Many DES software packages also provide the possibility to extract data from databases, perform basic transformation operations, and condense data sets to statistical distributions; see for example [20]. However, due to erroneous data points, strange data formats, and complex data categories, additional data operations are necessary in order to automate the entire chain. The bottom line is that there is a gap between the transformation requirements for present data sources and the extraction and transformation support provided by common commercial tools, especially if the aim is to automate the data management of all parameters required in DES models. Note that regardless of whether the process is completely or partly manual, the human involvement is a major reason for the extensive time-consumption related to input data management [7].

2.2 Possible Solutions

The authors have observed some initiatives in industry striving towards more automated data handling, but the solutions are exclusively designed as completely customized applications. Hence, the extraction of data from various sources is only developed to suit certain local systems and, thus, does not support a generic strategy. Moreover, the automated connection to the CBS, described as methodology 4 in [10], is a potent solution for companies that have advanced significantly in their design, implementation and use of applications within the CBS and at the same time have a high proportion of category A data. Despite the very small number of successful companies in this sense, the prerequisites for this connection will be improved in the future.

Table 3: Proposed solution compared to present and future use of input data management in industry.

Extraction
  Present situation: Manual; customized.
  Future state: No extraction needed if data resides in the CBS.
  Proposed solution: Flexible; configurable to different data sources.
Transformation
  Present situation: Manual; customized, e.g. by means of MS Excel macros; partially automated, e.g. in distribution fitting tools or additional functionality in DES software packages.
  Future state: Automated within the CBS.
  Proposed solution: Automated using a middleware solution (the middleware links the entire chain, which is completely automated).
Supply to DES model
  Present situation: Manual; automated using customized spreadsheets.
  Future state: Direct connection to the CBS.
  Proposed solution: Automated using neutral formats like CMSD.
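The proposed-solution column of Table 3 amounts to a chain of three functions: extraction, transformation and supply. A minimal sketch of such a middleware chain, with hypothetical function names and toy data rather than any actual GDM-Tool code:

```python
from typing import Callable, Dict, List

# A data source is modeled here as a list of row dictionaries ("a table").
Table = List[Dict[str, object]]
Step = Callable[[Table], Table]

def run_pipeline(read: Callable[[], Table],
                 steps: List[Step],
                 write: Callable[[Table], None]) -> None:
    """Extract raw data, apply configured transformation steps, supply output."""
    table = read()          # extraction from one source
    for step in steps:      # transformation: cleaning, calculations, ...
        table = step(table)
    write(table)            # supply, e.g. to a neutral CMSD-style document

# Tiny demonstration with in-memory data.
raw = [{"machine": "M1", "stop_min": "12"},
       {"machine": "M1", "stop_min": "bad"}]

drop_invalid: Step = lambda t: [r for r in t if str(r["stop_min"]).isdigit()]
to_minutes: Step = lambda t: [{**r, "stop_min": int(r["stop_min"])} for r in t]

run_pipeline(lambda: raw, [drop_invalid, to_minutes], print)
# -> [{'machine': 'M1', 'stop_min': 12}]
```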

This paper is delimited from research on the actual collection of raw data, as well as on the structure and content of different CBS applications. Instead, the proposed solution starts from present industrial circumstances and aims to automatically bridge existing data sources, despite the problems specified above, with simulation software packages. The solution is based on a middleware that automates the three functions of data extraction, transformation of data into information, and presentation of input data to the simulation model [21]. A similar architecture, including an intermediary off-line database, is described as methodology 3 by Robertson and Perera [10]. The same concept has also previously been exemplified, with the limitation of one original data source, with positive effects on time-consumption [11].

To summarize Table 3, both the future state and the proposed solution are dependent on sources of available data (category A). However, due to the lack of detailed simulation information in single CBS applications, the middleware solution is required to combine all necessary raw data from several sources and to prepare it for use in simulation models. This does not mean that the authors reject the future state, but they argue for an alternative solution until raw data collection and storage systems are homogenized, which is outside the scope of this research.

There is also work worth mentioning in closely related research fields. For example, Manufacturing Data Acquisition (MDA) is close to the solution proposed in this paper. MDA incorporates both the collection and some initial processing of raw data from production resources [22]. This gives good consistency in data formats, which is very convenient but requires that the concept, and its related technical solutions, is applied already in the collection of raw data. Another interesting approach is the ontology-driven framework for data transformation, so far mainly applied in ecological and biodiversity research [23]. Their framework for mapping and transforming data between different sources, using so-called services, is comparable with the operations required to combine and correct data in various CBS applications in the manufacturing industry.

2.3 Data Format Standardization - CMSD

Information management problems affect many aspects of manufacturing operations. They are a particular hindrance to the creation and reuse of manufacturing simulations. For the sake of automated input data management, a standardized format for presenting the data for the simulation models is highly desirable. Otherwise, the link between the simulation model and the data supply system has to be customized on every occasion. Researchers from NIST and Chalmers University of Technology, in collaboration with industrial partners, have worked on a standard development effort titled Core Manufacturing Simulation Data. The effort resulted in a standard, launched in September 2010, which follows the guidelines, policies, and procedures of the SISO [14][24]. The CMSD specification describes a CMSD information model using the Unified Modeling Language (UML) [25]. The primary objective of this information model is to provide a data specification for efficient exchange of manufacturing life-cycle data in a simulation environment. Since a standardized format for presenting data is a vital enabler for automated data handling, the CMSD data format standardization effort is well aligned with the efforts presented in this paper.
The CMSD objective aims to:
- Foster the development and use of simulations in manufacturing operations
- Facilitate data exchange between simulation and other manufacturing software applications
- Enable and facilitate better testing and evaluation of manufacturing software

- Increase manufacturing application interoperability

The CMSD information model addresses issues related to information management and manufacturing simulation development and provides means to define information about many kinds of manufacturing objects. It facilitates the exchange of information between manufacturing-oriented simulations and other applications in manufacturing domains, such as process planning, scheduling, inventory management, production management, and plant layout. The information model is not intended to be an all-inclusive definition of either the entire manufacturing domain or the simulation domain. The key features of CMSD are: (1) it specifically facilitates the integration of simulation applications by providing a means to define aspects of manufacturing entities that are governed by stochastic processes, in such a way that the information can be exchanged and shared; and (2) CMSD information elements may be extended with additional properties and other information.

This subsection provides a description of some major means to specify manufacturing entities included in the CMSD:
- Resource information describes the people and equipment that perform activities.
- Order information specifies an external request to the manufacturing enterprise.
- Calendar information specifies time periods when production is and is not ongoing.
- Skill definition information describes the skills and proficiency levels a resource has.
- Setup definition information describes the time to configure a resource and to change configurations.
- Part information specifies materials, subcomponents, and end products.
- Bill-of-materials information specifies the subcomponent parts and quantities.
- Process plan information specifies production activities needed to make products.
- Maintenance plan information specifies maintenance processes for a resource.
- Job information specifies an internal request for production activities to take place.
- Schedule information specifies a time-plan for production activities.
- Distribution information specifies statistical distributions.
- Layout information specifies spatial data and relationships between resources.

For the CMSD information model to work properly when taking an applied industrial starting point, some data treatment will be needed for each of the IT systems involved, in order to obtain the data in an interoperable format such as CMSD. The authors foresee that the GDM-Tool presented in this paper is one way to solve that transformation from any format to, e.g., CMSD. For the sake of input data management in discrete event simulation projects, the CMSD information model can serve as neutrally formatted data storage. Any system is allowed to read and write to the CMSD-instantiated documents. Tests and actions to prove that CMSD is a feasible neutral format and, thus, compatible with numerous simulation software packages have been performed in, e.g., [26][27][28][29][30][31][32]. Note that CMSD specifies the data formats in the interface between IT sources, but does not include descriptions of how to act on the data to transform it from data to information for a receiving source. Additionally, CMSD is focused on the exchange of data and does not include descriptions of any logical relations describing model behavior or execution. Such information is usually entered directly in the simulation software or by means of standards for this purpose, e.g. as proposed in [33].
Therefore, data described in CMSD format is not fixed to any specific or formalized type of simulation model [34].
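As a rough illustration of CMSD serving as neutrally formatted data storage, the sketch below writes a fitted downtime distribution for one resource to an XML document. The element names and the resource identifier are simplified stand-ins and are not claimed to match the exact tags of the SISO CMSD schema:

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified element names; the real CMSD UML/XML schema
# defines its own tags and structure, which are not reproduced here.
doc = ET.Element("CMSDDocument")
resource = ET.SubElement(doc, "Resource")
ET.SubElement(resource, "Identifier").text = "OP-30"  # illustrative ID

dist = ET.SubElement(resource, "TimeToRepairDistribution")
ET.SubElement(dist, "Name").text = "lognormal"
ET.SubElement(dist, "Parameter", name="scale").text = "2.1"
ET.SubElement(dist, "Parameter", name="shape").text = "0.6"

print(ET.tostring(doc, encoding="unicode"))
```

The point of such a neutral document is that any downstream tool, simulation software or otherwise, can parse it without knowledge of the original data sources.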

3. Method

A deductive analysis strategy is adopted in this paper, which means going from theory to empirical findings and back to the theoretical domain again. Thus, the assumption that a software solution for automated input data management reduces the time-consumption in DES projects is stated on the basis of the literature review presented in chapter 2. The design of the software solution, however, stems from participation in a case study [35] in the automotive industry, which is further explained in section 3.2. Hence, the collection of design requirements, such as functionality and human-software interaction, along with gathering of information about the available production data sources, was performed with the authors participating in the project team. Finally, the evaluation of time-consumption and data quality was performed using quantitative measurements within the case study.

The case study involved researchers together with simulation, production and IT engineers from the automotive company. This case study approach enabled collection of important research materials and documentation, which probably would have been hard to acquire using other research strategies. As a result, the design and validation of the GDM-Tool are based on experiences, extracts and descriptions from production data sources, meeting minutes, and quantitative measurements from the validation runs. However, the possibility to generalize the results is a commonly mentioned disadvantage of case studies [35]. Therefore, several case studies in different companies will be performed in the future, in order to make sure that the design and validation results are applicable in other organizations as well.

3.1 Validation Methodology

The purpose of this study is to reduce time-consumption during input data management, compared to a traditional manual approach, without negative effects on data quality. Hence, the performance of the GDM-Tool needs to be validated by comparing the required time-consumption and the data quality to a conventional procedure. In this case, the conventional procedure is represented as shown in Figure 1. Raw data, describing all necessary behavior on the shop floor, was manually extracted from off-line copies of two production databases, mainly using copy and paste functions. Moreover, filtering of irrelevant or incorrectly measured data was manually performed in MS Excel by simply identifying and deleting data points using process knowledge. In order to convert raw data, e.g. PLC time-stamps, into simulation parameters, some additional calculations were required. For example, the simulation parameter Time To Failure (TTF) requires an extensive number of calculations subtracting the stop time of one failure from the start time of the subsequent failure. All these kinds of calculations were executed using multiple copies of simple spread-sheet formulas. Finally, when all data was extracted and converted, ExpertFit was used for distribution fitting.

Figure 1: Example of a conventional approach to input data management, used as a reference to the GDM-Tool throughout the validation process. (The figure shows three steps: Data Extraction, reading and filtering of raw data, a manual procedure using MS Excel; Calculation, calculating simulation parameters from raw data, semi-automated using formulas in MS Excel; and Distribution Fitting, statistical representations of simulation data, semi-automated using ExpertFit.)
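The TTF calculation mentioned above lends itself to a compact illustration. A minimal sketch, assuming the failure log has already been extracted as chronologically sorted (start, stop) timestamp pairs; the data is invented for the example:

```python
from datetime import datetime, timedelta

# Illustrative failure log: (failure start, failure stop), chronologically sorted.
failures = [
    (datetime(2010, 11, 1, 8, 12), datetime(2010, 11, 1, 8, 40)),
    (datetime(2010, 11, 1, 11, 5), datetime(2010, 11, 1, 11, 20)),
    (datetime(2010, 11, 2, 7, 50), datetime(2010, 11, 2, 9, 2)),
]

# Time To Repair: stop minus start of the same failure.
ttr = [stop - start for start, stop in failures]

# Time To Failure: start of a failure minus stop of the previous one.
ttf = [failures[i + 1][0] - failures[i][1] for i in range(len(failures) - 1)]

print("TTR samples:", [t / timedelta(minutes=1) for t in ttr])  # minutes
print("TTF samples:", [t / timedelta(hours=1) for t in ttf])    # hours
```

In the manual approach, the equivalent subtractions were replicated as spread-sheet formulas for every row, which is exactly the kind of repetitive work an automated tool removes.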
The actual comparison between the automated and the manual approaches to input data management is performed using a simulation model of an assembly department in the automotive

industry, described in section 3.2. Time-consumption is measured during preparation of input data for the model using both approaches, and the correspondence in data quality is tested by comparing the output from the simulation model using manually and automatically prepared input data. Moreover, output data was also available from the real-world production system for the same time period during which the input data was collected. Therefore, the automated approach to input data management can also be compared to the real-world production system. The output parameter used for these tests is the total output of products per time unit. The time unit is unimportant for validation purposes and is therefore unpublished due to confidentiality restrictions.

A hypothesis test for a difference in means [36] is applied to evaluate if there is any difference in model output between the automated approach (the GDM-Tool) and the manual approach. In this test, the null hypothesis is

$$H_0: \mu_1 = \mu_2$$

The test statistic $T_0$ has a $t$ distribution with $n_1 + n_2 - 2$ degrees of freedom:

$$T_0 = \frac{\bar{X}_1 - \bar{X}_2}{S_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$

The pooled estimator of $\sigma^2$ is called $S_p^2$ and is defined by

$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$$

To make sure that the test above is valid, the two model variances have to be compared for at least approximate equality. This is done using a hypothesis test on the ratio of two variances [36], where the null hypothesis is

$$H_0: \sigma_1^2 = \sigma_2^2$$

The test statistic $F_0$ has an $F$ distribution with $(n_1 - 1, n_2 - 1)$ degrees of freedom:

$$F_0 = \frac{S_1^2}{S_2^2}$$

Both hypothesis tests are applied using the significance level α = 0.05.

3.2 Description of the Production System

The development and validation of the software solution presented in this paper are performed at a sub-assembly department in an automotive company. The department is responsible for final assembly of engines, just before they are assembled into vehicles at the main line. Work is performed at two identical parallel lines with nine stations on each line; see Figure 2. The pre-assembled engine enters the department on a roller conveyor and is placed on an AGV (Automated Guided Vehicle) at station 1. After that, the AGV transports the engine to the following eight stations where assembly work is performed. The detailed task allocation cannot be reported for confidentiality reasons, but the work includes assembly of all external components such as gear-box and hose packages. All tasks include manual assembly work, mainly supported by basic hand-held tools and various lifting devices for ergonomic purposes.

The AGVs move from station to station at the same time in a one-piece flow after a predetermined work duration, called pace or takt [37]. Hence, if one workstation is delayed, all other AGVs will wait. There is one worker assigned per station, except for one additional helper per line and a team of five workers assigned to stations 7 and 8 for both lines. Finally, downstream from station 9 there are three buffer places for each line, where pre-assemblies wait to be transported to the main line by forklifts.

Figure 2: Schematic figure of the engine assembly lines.

For planning and improvement purposes, a lot of data describing the production is collected and stored in databases, either by means of automated logging systems or manual storage of production design information. Two of these databases contain raw data for the necessary input data parameters in this particular case study: cycle times, MTBF (Mean Time Between Failures) and MTTR (Mean Time To Repair). Examples of data from these two databases are start and stop times of breakdowns, shift times for the workers, and cycle times for each individual engine. Having downtime data included in the validation case is important due to its significant impact on model behavior [38]. Furthermore, note that both databases used in the validation case are structured as relational databases. Information about each event is logged and stored in an entire row, initially chronologically sorted. The columns contain different information about the events, for example start time, stop time and event description; see Figure 3. To avoid interference with the information flow used for present production, the company prefers engineering tools to interact with identical off-line copies via MS Excel and text files. Therefore, the GDM-Tool uses such files, provided by the company, to extract the necessary raw data.

Figure 3: Example of database structure in the validation case.

4. Results

The concept for automated input data management presented in this paper (demonstrated by the GDM-Tool) can be classified as a solution for, and example of, the category using an intermediary simulation database for data processing [10]. A significant difference, however, is that this tool is not integrated into the database. No data resides in the application, since it processes data stored in original sources and exports the results directly to a separate CMSD file, serving as an interface to the simulation software and the final user.

4.1 Concept Overview

Figure 4 illustrates the proposed concept of automated input data management, including three major functions: data extraction, data transformation and output preparation. Firstly, a key feature is the ability to extract data from several sources with different internal structures. Secondly, when all raw data is imported, a series of activities is required to transform data from a crude form into relevant simulation input.

As previously stated, almost all companies rely on a number of data sources to complete a simulation project. The sources are usually not structured according to standards, and many of them are even developed as customized applications within companies. Hence, the variety of data structures makes automated extraction of raw data a complex issue. The GDM-Tool provides the opportunity to connect an unlimited number of differently designed data sources and gather all required raw data within the tool. This case study uses different real-world sources to demonstrate this opportunity. They are provided as spreadsheets and text files and originate in data systems within the CBS (see section 3.2). However, the GDM-Tool itself is not limited to these specific types of data source; in other production environments the tool can be connected to other databases or file formats.
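To illustrate extraction from such row-oriented event logs (compare Figure 3), the sketch below reads a delimited text-file export where each row holds one event. The file name and column headers are hypothetical:

```python
import csv

def read_event_log(path: str) -> list[dict]:
    """Read a relational event log (one event per row) from a delimited text file."""
    with open(path, newline="", encoding="utf-8") as f:
        # Let the csv module detect whether the export uses ';' or ','.
        dialect = csv.Sniffer().sniff(f.read(2048))
        f.seek(0)
        return list(csv.DictReader(f, dialect=dialect))

# Hypothetical off-line copy of the breakdown database.
events = read_event_log("breakdowns.csv")
for e in events[:3]:
    print(e["start_time"], e["stop_time"], e["description"])
```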

Figure 4: Schematic figure showing the data treatment process using the GDM-Tool. In this case study, Enterprise Dynamics (ED) was used to demonstrate the final destination of the data.

Traditionally, the series of activities includes many of the most time-consuming steps in input data handling, as identified by [7]. However, the GDM-Tool automates activities like filtering measurements, creating relations and performing mathematical operations on data from the different origins. Furthermore, statistical representations are calculated for all parameters relevant to the simulation model. Finally, the results are prepared and exported to a file, where simulation models can automatically access data during simulation runs. To enable efficient sharing of data among different simulation tools and other production analysis applications, CMSD is suggested as the primary output format. However, other output options are also supported, to avoid hindering the use of customized solutions.

4.2 Configuration and Automation

The proposed method, and consequently also its demonstrator the GDM-Tool, is divided into two very central user activities: configuration and automation (see Figure 5). Configuration is required one time, to map the content of the different data sources to each other and to specify all transformation operations required to obtain the simulation information. Once this mapping is performed, data processing can be repeated in automation mode without further effort, as long as the modeled system remains unchanged. This reduces the time-consumption for input data management significantly and enables industrial engineers to use updated simulation models as everyday tools.

Figure 5: The GDM-Tool has two different modes: configuration and automation. (The figure shows production data sources A, B and C feeding the GDM-Tool; the configuration mode, required one time, produces a specification, while the automation mode uses it to supply continuously updated data to the simulation model, which can request new data.)

Configuration includes connecting the GDM-Tool to all necessary sources, setting up a sequence of activities to transform the raw data to simulation information, and exporting the results to a CMSD document. All this is done by applying a series of tools (area B in Figure 6) for input, transformation and output (further exemplified in section 4.3). The series of tools

is continuously stored in a configuration path (area C), which can automatically be repeated for updated data sets in automation mode.

Figure 6: Overview of the user interface in configuration mode.

Automation mode is intended to be more frequently applied than configuration mode, in order to gain an advantage compared to the traditional manual approach. Every time the simulation engineer plans to run the model, he or she loads the previously specified configuration, specifies the location of the data sources and selects a target for the output file. All steps along the configuration path will be executed when the user clicks the Run button, and the CMSD file will be updated with the most recent production data; see Figure 7.

Figure 7: User interface in automation mode.

The automation mode is maneuvered separately from the configuration mode, which makes this process significantly quicker than the configuration and requires less detailed knowledge about the data sources and the production systems. Consequently, the reduction of time-consumption is more significant the more frequently a simulation model is used. Therefore, it is also important to distinguish between two different users of the GDM-Tool. The user who sets up the configuration (user 1) must be trained in using the different functions within the GDM-Tool and also needs extensive knowledge about the internal data sources, the

requirements of the simulation model, the production system and, finally, the background statistics used in the process of transforming data into simulation information. The second user (user 2), who is probably an industrial engineer working with production improvements, just needs to know the location of the raw data and how to use the final results during the simulation analyses.

4.3 Architecture for Flexibility and Future Development

The GDM-Tool is a Windows-based desktop program written in C#.NET, and the thoughts behind the software solution are based on two important principles:

1. The software solution shall be generic in the sense that it must be possible to integrate with raw data sources of varying structure and format, regardless of organization or company.
2. The software solution shall be extensible without the need for a re-compile of the entire software. Furthermore, no programming skills shall be required, neither for usage in automation mode nor for configuration.

Due to the lack of standardized data structures, or use of existing standards, among raw data sources, it is unreasonable to strive towards a completely generic interface between the data sources and the simulation applications. Instead, the GDM-Tool uses a plug-in based architecture to facilitate the configuration process described in the previous section. All functions needed to connect to each specific data source and to transform the data into information correspond to a plug-in. In this case study, user 1 can apply the plug-ins in Table 4, accessible from area A in Figure 6, to set up a configuration. Note that each plug-in corresponds to one transformation method (Table 1), except for the plug-ins handling the extraction of data and export of information.

Table 4: Data operations, extraction and export options implemented as plug-ins in the GDM-Tool.

CMSD: Exports information prepared by a configuration in the GDM-Tool to an XML file following the CMSD standard. Uses either a predefined CMSD XML schema or user-defined tags.
Create Numeric Data Column: Performs basic user-defined calculations on the raw data columns. For example used to change time units of data points.
Create Relation: Relates data from different sources to each other. The user specifies a parent and a child table, and one key column in each table.
Create String Data Column: Used to label data points, usually with the purpose of mapping data points between different sources.
Data Converter: Specific plug-in to create homogeneity between time-stamp units in various data sources.
Date Interval: Used to filter data points based on their time of collection.
Excel: Imports data sources provided as MS Excel spreadsheets.
Extract Distinct Values: Extracts unique value combinations from one table and stores them in a new table.
Merge Tables: Merges two data sources (tables) into one and prevents data duplication.
Numeric Converter: Converts a data column into a requested data type.

Numeric Interval: Filters data points based on user-defined upper and lower bounds.
PI Data Value Corrector: Recalculates a duration based on a work schema provided in a second data source (table).
Remove Column: Cleans data files from irrelevant information or data points.
Remove/Replace Sequence: Used to remove or replace entire or parts of data labels, usually to create homogeneous descriptions of data points between different sources.
Remove Row: Used in data cleaning to remove samples based on a data value or a related description on the same row in the table.
Rename Table: Used for labeling or description of a table.
Rows to Columns: Translates a row of data points into a column.
Split Column: Splits a column of data or descriptions into several columns, based on a user-defined separating character.
Statistics: Condenses columns of data points into statistical distributions.
Table to Text File: Provides the possibility to export information in customized formats (other than CMSD).
Text File: Imports data sources provided as text files.
Time Difference: Calculates the time difference between time-stamps.

Thus, by combining plug-ins, which all perform operations on the data tables in a sequence, the user can build complex systems for structuring data; see Figure 8. Note the correspondence to the ports and services described in ontology-driven data transformation [23]. Figure 8 shows the configuration path for obtaining Time To Repair (TTR) information from the data sources in the case study. Remove Column and Remove Row by Column Value are used to clean the data from irrelevant samples. Split Column, Numeric Converter and Create Numeric Data Column are all applied to obtain homogeneity between the data types and units in the two different data sources. PI Data Value Corrector and Statistics are plug-ins responsible for the calculations required to create simulation information. The CMSD plug-in exports the information to a neutral format possible to use in various simulation software packages.

Figure 8: Example of the plug-in based software architecture used in the GDM-Tool. (The figure shows two Excel sources feeding a chain of plug-ins: Excel -> Remove Column (repeated) -> Remove Row by Column Value -> Split Column -> Numeric Converter -> Create Numeric Data Column, joined by Excel -> PI Data Value Corrector -> Statistics -> CMSD.)

The plug-in structure is also a means of fulfilling the second principle, to enable easy extension of the GDM-Tool. At least during the development and validation process, there will be a need for extensions and modifications in order to make the solution more generic. This is possible since plug-ins are separately developed and compiled, and the application will automatically detect new plug-ins and allow users to apply them without modifying or re-compiling the main program.
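The plug-in mechanism can be sketched in a few lines. The GDM-Tool itself is written in C#.NET; the illustration below expresses the same idea in Python with hypothetical names: every plug-in implements a common interface, a stored configuration path is an ordered list of plug-in names and settings, and new plug-ins are discovered at run time without recompiling the host application.

```python
import importlib
import pkgutil

class Plugin:
    """Common interface: every plug-in transforms a table and returns a table."""
    name = "base"
    def apply(self, table, **settings):
        raise NotImplementedError

def discover(package):
    """Load every module in a plug-in package and index Plugin subclasses by name."""
    registry = {}
    for info in pkgutil.iter_modules(package.__path__):
        module = importlib.import_module(f"{package.__name__}.{info.name}")
        for obj in vars(module).values():
            if isinstance(obj, type) and issubclass(obj, Plugin) and obj is not Plugin:
                registry[obj.name] = obj()
    return registry

def run_configuration(table, path, registry):
    """Execute a stored configuration path: a list of (plug-in name, settings)."""
    for name, settings in path:
        table = registry[name].apply(table, **settings)
    return table

# A stored configuration path loosely resembling Figure 8 (names are illustrative):
path = [
    ("remove_column", {"column": "comment"}),
    ("numeric_interval", {"column": "stop_min", "low": 0, "high": 480}),
    ("statistics", {"column": "stop_min", "fit": "lognormal"}),
]
```

Because discovery happens at import time, dropping a new module into the plug-in package is enough to make its operation available in configuration mode, which mirrors the extensibility principle stated above.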

4.4 Test Results

The test results, comparing a complete use of the GDM-Tool with the previously described manual procedure (Figure 1), are shown below. Both the manual process and the GDM-Tool are compared to throughput statistics from the real-world process. The output statistics are collected during the same weeks as the cycle times were mapped. However, the stop times are collected during a longer period of time to obtain sufficient samples for rigorous statistical analysis.

Table 5: Test results describing the difference in time-consumption between a manual and an automated approach to input data management.

Traditional manual process:
  Extraction, categorization, calculations, cleaning. User: simulation engineer (similar to user 2). Tool: MS Excel. Time-consumption: 6 hours, 15 minutes.
  Condensation, documentation. User: simulation engineer (similar to user 2). Tool: distribution fitting tool. Time-consumption: 3 hours.
  Total manual: 9 hours, 15 minutes.
Automated process (the GDM-Tool):
  Configuration. User: user 1. Mode: configuration mode. Time-consumption: 2 hours.
  Automated run. User: user 2. Mode: automation mode. Time-consumption: less than 1 minute.
  Total automated: 2 hours.
Difference: 7 hours, 15 minutes.

Table 5 shows the results with regard to time-consumption measured over the entire process, starting with extraction of raw data and ending with simulation data residing in an interface ready to use in a simulation model. The time-consumption was reduced by 78%, including the configuration steps, provided that all necessary plug-ins are available. In other words, no further development of the software is necessary. The raw data comprised a large number of rows for breakdowns and around 7200 rows for cycle times. The workload for the traditional simulation engineer (user 2 in the automated process), who is usually responsible for all data handling activities in the manual approach, will be dramatically reduced. Some of this work is instead replaced by the configuration required by user 1 of the GDM-Tool. Thus, the total time reduction of 78% includes both users.

Table 6: Test results describing the difference in model output between the GDM-Tool, manual input data management and the real-world production system. (Mean and standard deviation of the output are reported for the manual approach, the real-world system and the GDM-Tool, both for the period with known cycle times and for an extended simulation period.)

Table 6 shows the output data from the real-world process and from one and the same simulation model with input data prepared both manually and by means of the GDM-Tool. All results are given in products per time unit, but the time unit is unpublished due to confidentiality. To the left in Table 6, the simulation results are compared to real-world data from the same period of time during which the raw data for cycle times was collected. To the right, the

simulation period was extended to six months but still used the same data for cycle times. Hence, in this case the cycle time data is expected to be generally applicable.

The results show that the data prepared by the GDM-Tool underestimates the total output of products by 2% during the period with correct cycle times. During the same time, the manually prepared data overestimates the output by 2%. Using the hypothesis test described in section 3.1, it is stated that there is no statistical basis for inferring a difference between the two approaches to input data management. For the extended simulation period, the same differences are 4% and 2% in comparison to real-world data for the GDM-Tool and the manual process respectively.

Table 6 also shows that the simulation model does not perfectly mimic the variability in the production process, either with manually or with automatically prepared data. The standard deviation in the model is 44% lower than in the real-world system using the GDM-Tool. This deviation stems from deliberate decisions made by the company regarding the disturbances included in the simulation model. This is further discussed in section 5.

5. Discussion

The GDM-Tool is a demonstrator for an intermediary computer application transforming available production data to relevant information for simulation models. The software solution automates two of the three most time-consuming input data management activities [7]. These two activities are: extraction of raw data from original sources; and transformation of raw data into information by means of calculation and cleaning operations along with distribution fitting. Additionally, the presentation of input information to the simulation model is automated by the GDM-Tool, and this information can be exported to an interface selected by the user. The default interface in this study is CMSD [14], which enables automated import of processed input data to different simulation applications, such as Enterprise Dynamics [26][27][28], Plant Simulation [28], Visual Components [29][30], ARENA [31] and QUEST [32]. Thus, thanks to the use of CMSD, the proposed solution is not dependent on specific simulation software packages, but in this particular case study Enterprise Dynamics was used as an example.

Furthermore, the output data from the GDM-Tool is not necessarily customized for specific DES purposes in terms of formalism, since the CMSD standard is independent of model logics. Thus, data formats are specified via CMSD, and this data can be used for several different modeling purposes or even for other production analyses with similar data requirements, such as line balancing, layout planning, value stream mapping, etc. Instead, logical relations specifying the model formalism, e.g. Discrete Event System Specification (DEVS) [34], are specified in separate standards. One example of such a standard is provided by Taylor [33], who elaborates on the logics needed to run DES models distributed over multiple processors. This terminology was, together with CMSD, recently standardized under SISO [24]. There is no obvious contradiction in, but rather benefits from, utilizing an even more standardized modeling methodology by combining logics and data in a structured effort.

The result from the case study of the concept, reporting a time reduction of 78% compared to a manual approach, indicates that automation is a potent way of reducing time-consumption in input data management.
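For reference, the two test statistics from section 3.1 can be computed with a few lines of standard-library code. The replication outputs below are invented for illustration; the study's actual numbers are confidential:

```python
import math
from statistics import mean, variance

def pooled_t(x1, x2):
    """Two-sample t statistic with pooled variance, plus the F statistic
    for the ratio-of-variances check (both defined in section 3.1)."""
    n1, n2 = len(x1), len(x2)
    sp2 = ((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)) / (n1 + n2 - 2)
    t0 = (mean(x1) - mean(x2)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    f0 = variance(x1) / variance(x2)
    return t0, f0

# Illustrative replication outputs (products per time unit) from the two models.
manual = [101.2, 99.8, 100.5, 100.9, 99.5]
gdm = [99.9, 100.1, 99.4, 100.6, 99.8]
t0, f0 = pooled_t(manual, gdm)
print(f"T0 = {t0:.3f}, F0 = {f0:.3f}")  # compare with t and F critical values
```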
This article includes one case study, which alone is insufficient to propose the GDM-Tool as a generic framework for its purpose. However, this case description should be regarded as one of several tests for the concept of automated input data management, using an intermediary data processing application, originally proposed as methodology 3 in [10]. The same strategy has also been previously tested with positive results in other

case studies [11]. One of the differences between these two studies [10][11] and the demonstrator presented in this paper is that the GDM-Tool does not rely entirely on simulation data from major business systems. Moreover, the GDM-Tool also enables transformation of raw data into statistically represented simulation input. These features are advantageous since ERP systems, and other major databases within the CBS, in many cases lack required information, especially data describing the dynamics of the modeled system [9][10][12].

Several real-world data sources and data operations have been evaluated in this study. However, there are numerous data sources in global industries which are neither structured exactly as in this study nor provided in the same format. Our solution relies on the assumption that data sources are, or can be, represented as tables that are related to each other. Based on the sources in this case study and on the authors' previous experience, an overwhelming majority of production databases are of relational type. Still, further investigation of compatibility with other industrial sources or with fictitious demo sources is needed. This is, however, delimited from the case study and addressed as future research. One interesting format for future studies is XML, based on hierarchical representation, which is also possible to express as relational structures.

Despite the results of this and previous studies, automated solutions are arguably not the optimal choice in all situations. For instance, if a simulation model is developed to analyze one specific problem, and is not intended for continuous use, the configuration step might take as long as manually managing all data. But we have shown that it takes only around two hours to set up a configuration of similar proportions as in our test case, using the GDM-Tool. In many situations this is less time than needed to perform the same process in a traditional way, for instance by using macros. And most importantly, the configuration is done only once. For further use, the process will be automatically performed in a minute with updated data from the same sources. Moreover, the use of CMSD as an output format also enables rapid supply of input data to various simulation applications, which also leads to more rapid DES projects.

The data quality using the automated approach is proven to be as credible as that of the manual approach. A 2% difference in mean output compared to the real-world system is satisfactory for most kinds of analyses using DES. On the other hand, the variability in the production process does not correspond exactly between the model and the real-world system. After further investigation, the reason was found in deliberate decisions made by the company regarding the disturbance categories included in the simulation model. Meeting time, overtime work and exceptionally long breakdowns were not included, because the model is intended for improvements under normal production conditions. These exceptions are, however, still included in the real-world data used for validation, resulting in higher variation than in the model. Because such disturbances were excluded even in the raw data, the same phenomenon was discovered during the manual data preparation process used as a reference.

In future research, three research activities are prioritized in the development of our proposed framework for automated input data management.
In future research, three activities are prioritized in the development of our proposed framework for automated input data management. Firstly, the rationale for a middleware solution is the extensive diversity of production data sources; research on including MDA tools [22] in data collection, or on other ways to achieve homogeneity in data structure, is important for reaching the future state outlined in section 2.2. Secondly, additional case studies are needed to further validate the feasibility of automated data management in general and the proposed solution in particular. Finally, efforts towards a more formal terminology for the data structures consumed or produced by the different operations (plug-ins) would facilitate communication between the two specified users of the GDM-Tool. Good inspiration for this is available in the ontology-driven framework for data transformation [23].
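As a hint of what such a formal terminology could look like, the sketch below gives a hypothetical plug-in interface where each data operation declares the table columns it consumes and produces. This is an illustrative assumption, not the GDM-Tool's actual architecture, which is implemented in C#.

```python
# Hypothetical sketch of a plug-in interface where each operation declares
# the data structure it consumes and produces. Illustrative only; the
# GDM-Tool's actual plug-in architecture (written in C#) may differ.
from abc import ABC, abstractmethod

class DataOperation(ABC):
    """One step in an input data management configuration."""

    # Column names of the tables this operation consumes and produces.
    consumes: list[str]
    produces: list[str]

    @abstractmethod
    def run(self, rows: list[dict]) -> list[dict]:
        ...

class TimeBetweenFailures(DataOperation):
    """Calculates TBF from a table of failure start times."""
    consumes = ["failure_start"]
    produces = ["tbf"]

    def run(self, rows):
        starts = sorted(r["failure_start"] for r in rows)
        return [{"tbf": b - a} for a, b in zip(starts, starts[1:])]

# Usage: chain configured operations over a table of raw data.
rows = [{"failure_start": 120.0}, {"failure_start": 480.2}, {"failure_start": 795.4}]
for op in [TimeBetweenFailures()]:
    rows = op.run(rows)
print(rows)  # [{'tbf': 360.2}, {'tbf': 315.2}]
```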

6. Conclusion

This paper describes and exemplifies the possibilities to automate the process of input data handling in DES projects where production data is available in a crude form in several differently structured data sources. The authors developed a software solution, the GDM-Tool, whose purpose is to demonstrate the functionality necessary for automated transformation of production data into information for simulation models. Thus, the software solution is not aimed to be a commercial product but to act as a framework for the design of intermediary computer applications between existing raw data sources and simulation models.

In a case study performed at an automotive company, the use of the GDM-Tool reduced the time required for preparing input data from nine hours and fifteen minutes to two hours (78%) compared to a traditional manual approach. This time includes configuration of the GDM-Tool for the required data sources, which is necessary only the first time of use. Hence, the time-reduction is even more significant for repeated use of the model. Moreover, the data quality was compared using statistical hypothesis testing, and the tests gave no evidence of a difference in data quality between the two approaches. Since input data management is considered to be one of the most time-consuming steps in simulation studies, the reported results are expected to enable shorter lead-times in simulation projects.

Acknowledgement

The funding for this research is granted by VINNOVA (Swedish Governmental Agency for Innovation Systems). The authors would also like to thank Marcus Johansson (former guest researcher at NIST), Jimmy Balderud and Anders Olofsson for their extensive and valuable contributions to the design and development of the GDM-Tool [39]. Finally, Edward Williams (University of Michigan and PMC, Dearborn, Michigan, U.S.A.) and Cecilia Berlin (Chalmers University of Technology) have kindly provided valuable suggestions for enhancing the presentation of this paper.

References

[1] Abdel-Malek, L., C. Wolf, F. Johnson, and T. Spencer III. OR Practice: Survey results and reflections of practising INFORMS members. Journal of the Operational Research Society 50(10).

[2] Eriksson, U. Diffusion of Discrete Event Simulation in Swedish Industry: On the way to an increased understanding. Doctoral dissertation, Department of Materials and Manufacturing Technology, Chalmers University of Technology, Gothenburg, Sweden.

[3] Akkermans, H., and W. Bertrand. On the usability of quantitative modelling in operations strategy decision making. International Journal of Operations & Production Management 17(1).

[4] Murphy, S.P., and T. Perera. Successes and failures in UK/US development of simulation. Simulation Practice and Theory 9(6-8).

[5] McLean, C., and S.K. Leong. The expanding role of simulation in future manufacturing. In: Proceedings of the 2001 Winter Simulation Conference. B.A. Peters et al. (eds), IEEE, Piscataway, New Jersey.

[6] Johansson, B., J. Johnsson, and A. Kinnander. Information structure to support discrete event simulation in manufacturing systems. In: Proceedings of the 2003 Winter Simulation Conference. S. Chick et al. (eds), IEEE, Piscataway, New Jersey.

[7] Skoogh, A., and B. Johansson. Mapping of the Time-Consumption During Input Data Management Activities. Simulation News Europe 19(2).

[8] Trybula, W. Building simulation models without data. In: 1994 IEEE International Conference on Systems, Man, and Cybernetics. Humans, Information and Technology 1.

[9] Perera, T., and K. Liyanage. Methodology for rapid identification of input data in the simulation of manufacturing systems. Simulation Practice and Theory 7(7).

[10] Robertson, N., and T. Perera. Automated data collection for simulation? Simulation Practice and Theory 9(6-8).

[11] Randell, L.G., and G.S. Bolmsjö. Database driven factory simulation: a proof-of-concept demonstrator. In: Proceedings of the 2001 Winter Simulation Conference. B.A. Peters et al. (eds), IEEE, Piscataway, New Jersey.

[12] Moon, Y.B., and D. Phatak. Enhancing ERP system's functionality with discrete event simulation. Industrial Management & Data Systems 105(9).

[13] Fowler, J.W., and O. Rose. Grand Challenges in Modeling and Simulation of Complex Manufacturing Systems. SIMULATION 80(9).

[14] SISO, Simulation Interoperability Standards Organization, CMSD Product Development Group. Standard for: Core Manufacturing Simulation Data - UML Model. Feb. 13, 2009.

[15] Riddick, F.H., and Y.T. Lee. Representing Layout Information in the CMSD Specification. In: Proceedings of the 2008 Winter Simulation Conference. S.J. Mason et al. (eds), IEEE, Piscataway, New Jersey.

[16] Davenport, T.H., and L. Prusak. Working knowledge: How organizations manage what they know. Harvard Business School Press: Boston, Massachusetts.

[17] Robinson, S. Simulation: The Practice of Model Development and Use. John Wiley & Sons Ltd: Chichester.

[18] Robinson, S., and V. Bhatia. Secrets of successful simulation projects. In: Proceedings of the 1995 Winter Simulation Conference. C. Alexopoulos et al. (eds), IEEE, Piscataway, New Jersey.

[19] Law, A.M., and M.G. McComas. How the ExpertFit Distribution-Fitting Software can make your Simulation Models more Valid. In: Proceedings of the 2003 Winter Simulation Conference. S. Chick et al. (eds), New Orleans, Louisiana, USA.

[20] Altiok, T., and B. Melamed. Simulation Modeling and Analysis with ARENA. Academic Press: Electronic Resource.

[21] Skoogh, A., J.-P. André, C. Dudas, J. Svensson, M. Urenda Moris, and B. Johansson. An Approach to Input Data Management in Discrete Event Simulation Projects: A Proof of Concept Demonstrator. In: Proceedings of the 6th EUROSIM Congress on Modelling and Simulation. B. Zupančič et al. (eds). ARGESIM: Vienna.

[22] Aufenanger, M., A. Blecken, and C. Laroque. Design and Implementation of an MDA Interface for Flexible Data Capturing. Journal of Simulation 4(4).

[23] Bowers, S., and B. Ludäscher. An Ontology Driven Framework for Data Transformation in Scientific Workflows. In: International Workshop on Data Integration in the Life Sciences (DILS), Leipzig, Germany.

[24] Simulation Interoperability Standards Organization (SISO): SISO Policies and Procedures.

[25] UML Resource Page: Unified Modeling Language. Accessed January 3, 2009.

[26] Johansson, M., and R. Zachrisson. Modeling automotive manufacturing process. M.Sc. Thesis, Department of Product and Production Development, Chalmers University of Technology, Gothenburg, Sweden.

[27] Johansson, M., S.K. Leong, Y.T. Lee, F.H. Riddick, G. Shao, B. Johansson, A. Skoogh, and P. Klingstam. A Test Implementation of the Core Manufacturing Simulation Data Specification. In: Proceedings of the 2007 Winter Simulation Conference. S.G. Henderson et al. (eds), IEEE, Piscataway, New Jersey.

[28] Johansson, M., B. Johansson, S.K. Leong, F.H. Riddick, and Y.T. Lee. A Real World Pilot Implementation of the Core Manufacturing Simulation Data Model. In: Proceedings of the Summer Computer Simulation Conference (SCSC08), Edinburgh, Scotland.

[29] Johansson, B., Å. Fasth, J. Stahre, J. Heilala, S.K. Leong, Y.T. Lee, and F.H. Riddick. Enabling Flexible Manufacturing Systems by Using Level of Automation as Design Parameter. In: Proceedings of the 2009 Winter Simulation Conference. M.D. Rossetti et al. (eds), Austin, Texas, USA.

[30] Johansson, B., A. Skoogh, M. Mani, and S.K. Leong. Discrete Event Simulation to generate Requirements Specification for Sustainable Manufacturing Systems Design. In: Proceedings of the 2009 Performance Metrics for Intelligent Systems Workshop (PerMIS'09), Gaithersburg, Maryland, USA.

[31] Bengtsson, N.E., G. Shao, Y.T. Lee, S.K. Leong, C.R. McLean, B. Johansson, and A. Skoogh. Input Data Management Methodology for Discrete Event Simulation. In: Proceedings of the 2009 Winter Simulation Conference. M.D. Rossetti et al. (eds), Austin, Texas, USA.

[32] Kibira, D., and S.K. Leong. Test of Core Manufacturing Simulation Data Specification in Automotive Assembly. In: Proceedings of the Simulation Interoperability Standards Organization (SISO) and Society for Modeling and Simulation (SCS) International European Multi Conference, Orlando, Florida, USA.

[33] Taylor, S.J.E. Interoperating COTS Simulation Modelling Packages: A Call for the Standardisation of Entity Representation in the High Level Architecture Object Model Template. In: Proceedings of the 2002 European Simulation Symposium, Dresden, Germany. Society for Computer Simulation, San Diego, CA, USA.

[34] Zeigler, B.P., H. Praehofer, and T.G. Kim. Theory of modeling and simulation: Integrating discrete event and continuous complex dynamic systems. 2nd ed. San Diego, USA: Academic Press.

[35] Yin, R.K. Case study research: design and methods (2nd ed.). SAGE Publications: Thousand Oaks.

[36] Montgomery, D.C., and G.C. Runger. Applied statistics and probability for engineers (2nd ed.). John Wiley & Sons, New York.

[37] Liker, J.K. The Toyota Way: 14 Management Principles from the World's Greatest Manufacturer. McGraw-Hill, New York.

[38] Williams, E.J. 1994. Downtime Data - its Collection, Analysis, and Importance. In: Proceedings of the 1994 Winter Simulation Conference. J.D. Tew et al. (eds), Orlando, Florida, USA.

[39] Balderud, J., and A. Olofsson. A Plug-in Based Software Architecture for Generic Data Management. M.Sc. Thesis, Department of Computer Science and Engineering, Chalmers University of Technology, Gothenburg, Sweden.


Publication VI

Skoogh, A., J. Michaloski, and N. Bengtsson. Towards Continuously Updated Simulation Models: Combining Automated Raw Data Collection and Automated Data Processing. In: Proceedings of the 2010 Winter Simulation Conference, eds. B. Johansson, S. Jain, J. Montoya-Torres, J. Hugan, and E. Yücesan.


Proceedings of the 2010 Winter Simulation Conference
B. Johansson, S. Jain, J. Montoya-Torres, J. Hugan, and E. Yücesan, eds.

TOWARDS CONTINUOUSLY UPDATED SIMULATION MODELS: COMBINING AUTOMATED RAW DATA COLLECTION AND AUTOMATED DATA PROCESSING

Anders Skoogh
Product and Production Development
Chalmers University of Technology
Gothenburg, SWEDEN

John Michaloski
National Institute of Standards and Technology
100 Bureau Drive
Gaithersburg, MD, USA

Nils Bengtsson
Production Modeling Corporation
Gullbergs Strandgata 36D
Gothenburg, SWEDEN

ABSTRACT

Discrete Event Simulation (DES) is a powerful tool for efficiency improvements in production. However, instead of integrating the tool in the daily work of production engineers, companies apply it mostly in single-purpose studies such as major investment projects. One significant reason is the extensive time-consumption for input data management, which has to be performed for every simulation analysis to avoid making decisions based upon obsolete facts. This paper presents an approach that combines automated raw data collection with automated processing of raw data to simulation information. MTConnect is used for collection of raw data and the GDM-Tool is applied for data processing. The purpose is to enable efficient reuse of DES models by reducing the time-consumption for input data management. Furthermore, the approach is evaluated using production data from the aerospace industry.

1 INTRODUCTION

Compared to desirable values presented by production researchers, the utilization of production resources is generally low in industry. For example, the Overall Equipment Efficiency (OEE) is often found to be 50-60% in companies across various lines of business (Ingemansson 2004). Nakajima (1988) states that an OEE figure of 85% should be possible to achieve and that 90% is the ideal target. Companies are well aware of the potential of reducing this gap and thus continuously work with system improvements in a systematic manner. However, despite the presence of powerful computer support, Information Technology (IT) tools are underutilized in support of continuous improvement processes. One example is Discrete Event Simulation (DES), which is applied in fewer projects than its capabilities warrant (McLean and Leong 2001). When correctly applied, DES works well in combination with continuous improvement philosophies like the Theory Of Constraints (TOC) (Goldratt 1990). During the philosophy's five iterative steps (Rahman 1998), simulation can be profitably applied in the following manner:

1. Identify the system's constraint(s): DES results can be used to identify bottlenecks, for example by studying buffer sizes, resource utilization and machine states.
2. Decide how to exploit the system's constraint(s): DES can be used to optimize buffer sizes around the bottleneck.
3. Subordinate other resources to the constraint(s): DES can be used to evaluate what-if scenarios on how workers should prioritize the different production resources.
4. Elevate the system's constraint(s): DES is a powerful tool for investment analysis.
5. If in any of the previous steps a constraint is broken, go back to step 1.

Based on the arguments above, and on its capability of analyzing dynamic aspects of production flows, DES is an excellent tool for supporting continuous improvement efforts. However, for various reasons, DES is more often applied in well-delimited projects than on a daily basis (Williams 1996). One reason is the extensive time-consumption for raw data collection and for the processing of raw data into input information for simulation models. These steps, i.e. the input data management process, consume on average 31% of the time for an entire simulation study (Skoogh and Johansson 2010). This problem becomes even more evident when DES is used on a daily basis, since the shop floor status changes continuously and therefore requires repetitive efforts for input data management.

Previous studies show that the main reason for the extensive time-consumption is that most companies collect raw data manually and convert it to information each time a simulation study is launched. One solution is to automate as many activities as possible and thereby enable automatic updates. The purpose of this paper is to enable reuse of DES models by reducing the time-consumption for input data management. The aim is to evaluate the feasibility of combining automated raw data collection and automated data processing into a push-button solution for DES. In this study, MTConnect (MTConnect Institute 2010) is used to log time-stamps containing machine status directly from the machines. These raw data are then submitted to a tool for automated data processing, called the Generic Data Management Tool (GDM-Tool) (Skoogh 2009); see Figure 1. The GDM-Tool is a middleware solution that automates all data processing steps, including categorization, cleaning, calculation and condensation to statistical distributions. Using this solution, companies can apply DES in their daily work with continuous improvements, without spending time and effort on the repetitive work of input data management.

Figure 1: Schematic data flow from machine to simulation model.

2 REVIEW OF INPUT DATA MANAGEMENT TOOLS

Previous research has examined the possible levels of automation in input data management for simulation. A good summary of four different approaches is provided by Robertson and Perera (2002):

1. Tailor-made solution: Data is primarily collected and processed by the project team, manually supplied to the model and resides in the simulation tool.

2. Spreadsheet solution: Data is primarily collected and processed by the project team, manually supplied to an intermediate spreadsheet interface and then automatically read by the simulation model. Data resides in the spreadsheet.
3. Off-line database solution: Data is primarily derived from the Corporate Business System (CBS), automatically supplied to an intermediate off-line database and then automatically read by the simulation model. Data resides in the database.
4. On-line database solution: Data is primarily derived from the CBS and automatically supplied to the simulation model without intermediate steps. Data resides in the CBS.

Robertson and Perera (2002) found that the manual solutions (1 and 2) were still the most applied in industry, but there have been advances towards solutions three and four since the publication of their article. However, integration of high-level corporate databases such as Enterprise Resource Planning (ERP) systems has proved to be difficult because they contain data too imprecise for dynamic simulations (Moon and Phatak 2005).

As displayed in Figure 1, a data management system contains two main components: the collection of raw data and the processing of raw data into appropriate inputs for a simulation model. For collection, a few alternatives are presented in previous publications. Ingemansson et al. (2005) present a case study in an automotive company where an automated data collection system was used to log breakdown time-stamps from NC (Numerically Controlled) machines. The communication technology used in their case study and the format of the raw data are, however, unspecified. In addition to reduced time-consumption, their case study shows that automated collection of raw data also increases the data quality. See below for a brief review of other solutions for production data logging.

2.1 OPC for Raw Data Collection

One specific communication technology applicable for raw data collection is OPC (OLE (Object Linking and Embedding) for Process Control), a leading worldwide specification enabling connectivity and interoperability of factory floor equipment. OPC is an integration technology developed by the OPC Foundation that defines a standard interface to control devices (OPC Foundation 2010). OPC promotes interoperability both horizontally and vertically in the enterprise, so that it can cut integration costs, speed deployment and promote increased operating efficiency. The OPC software specification describes a client/server object model to allow communication between client applications (OPC Clients) and control device servers (OPC Servers). OPC handles integration by creating a software bus, so that applications need only know the data required from OPC data sources, not how to get it.

The OPC Foundation has defined several OPC interface specifications, including Data Access (DA), Event and Alarm Management, and Historical Data Access. The Data Access specification provides a standard mechanism for communicating with data sources on a factory floor. The Event and Alarm Management specification defines a means for transmitting alarm and event information between OPC servers and clients. The Historical Data Access specification allows OPC clients to access historical archives to retrieve and store data in a uniform manner. The major weakness of OPC is that it is primarily a standard I/O communication mechanism and does not offer any standard device information models.
For example, OPC Clients are required to make assumptions that certain items will be available from the CNC (Computer Numerically Controlled) OPC Server, but there is no guarantee of the availability, name, or type of a tag.

2.2 DPWS Combined with Automated Data Processing

A proposal similar to the one presented in this paper is provided by Aufenanger et al. (to be published). They used the Devices Profile for Web Services (DPWS) for collection of raw data from manufacturing resources such as conveyor belts and machines. DPWS enables collection of raw data, which is then stored in a database of an in-house developed application called Machine Data Acquisition (MDA).

Furthermore, the MDA application also includes algorithms for processing the raw data to serve as input to a simulation model. Raw data collection using DPWS is generic and automatically detects production resources connected to the local network, but the algorithms applied for data processing are developed specifically for each implementation. The paper states that their concept shows promising results in a test evaluation in a laboratory environment.

3 METHOD

This paper combines the capabilities of two existing technologies, MTConnect and the GDM-Tool, in order to achieve a push-button solution for input data management in DES. The solution is designed and tested in a case study at a manufacturing company in order to ensure that it is applicable in a real-world context. In addition to designing and demonstrating the solution, this study also compares it to alternative procedures of input data management. The comparison is performed by measuring the time-consumption for completing the input data management process using a completely manual approach and using the proposed approach combining the capabilities of MTConnect and the GDM-Tool.

The manual approach is, according to the authors' experience as well as Robertson and Perera (2002) (solutions 1 and 2), the most commonly applied in industry. It includes raw data collection from available data sources when present, but also complementary manual gathering, e.g. using a stop-watch. Necessary data cleaning, such as elimination of outliers, is performed using basic formulas in MS Excel or similar applications. This is also the case for calculations, e.g. finding the time between failures as the difference in start times of two consecutive failures. The data is finally condensed (usually to statistical distributions) using distribution-fitting software or, in the worst case, by assuming appropriate distribution families.

This case study is also a part of the validation of the GDM-Tool, which was originally designed and developed in a case study at a company in the automotive industry. The strategy is to test the tool in various case studies in different lines of business and thereby evaluate whether its approach to automated input data management is feasible. Moreover, the different case studies will introduce further requirements on the tool and result in implementation of additional functionality. As a result, the tool will become increasingly robust, and the vision is that it will finally be applicable to any DES study.

4 OUTLINE AND EVALUATION OF THE CONCEPT

A test implementation has been performed at a large aerospace manufacturing company, in collaboration with NIST, to explore what new opportunities data from MTConnect offers for DES. A work cell in a job shop environment with four high-speed, five-axis CNC machines together with one pallet shuttle system was simulated. In the work cell, a large variety of aluminum parts with individual cycle times are manufactured with high-speed machining. The scope of the test implementation has been to show how MTConnect data can be used together with automated data processing and DES to improve production. Using data processing, the Cycle Time, Mean Time Between Failures (MTBF), Mean Time To Repair (MTTR) and energy usage could be calculated directly from the MTConnect data. Note, however, that the automated data processing evaluated in this paper is delimited to MTBF and MTTR. An overview of what has been done can be seen in Figure 2.
The goal of the entire research project was to study how energy usage data from MTConnect can be used for sustainability analysis in DES. It turned out that the data itself was useful for modeling, but the potential savings in energy usage were unsatisfying. The data showed that the cost of running one of the machines, in terms of electricity, was around 50 cents. However, despite the low savings in this particular case, the authors foresee an interesting future for combined DES and sustainability analysis.

Figure 2: Overview of the test implementation.

4.1 MTConnect for Automated Raw Data Collection

MTConnect is a new standard for data exchange on the shop floor. MTConnect is a specification based upon prevalent Web technologies, including Extensible Markup Language (XML) (The World Wide Web Consortium 2010) and Hypertext Transfer Protocol (HTTP) (The Internet Society 1999). Using prevailing technology and providing free software development kits minimizes the technical and economic barriers to MTConnect adoption.

Figure 3: MTConnect architecture.

Figure 3 shows the MTConnect architecture, in which an MTConnect Device is a piece of factory equipment organized as a set of components that provide data. An MTConnect Agent is a process that acts as a bridge between a device and a client application. The MTConnect Agent receives and stores single data samples or events, or time series of them, which are made available to the client application. An MTConnect Client is typically a factory application, for example shop floor dashboard visualization, Overall Equipment Effectiveness (OEE) monitoring, or data mining of asset and process knowledge. The basic MTConnect specification provides a CNC machine tool information model, including definitions for position, feeds, speeds, program, control logic, and some tooling.
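As a minimal illustration of the Agent/Client pattern in Figure 3, the Python sketch below polls an MTConnect Agent over HTTP and prints the returned XML. The host and port are assumptions made for the example; real deployments expose their own agent addresses, while the endpoint names (probe, current, sample) are part of the MTConnect specification.

```python
# Minimal sketch of an MTConnect client polling an agent over HTTP.
# The host and port are hypothetical; substitute the address of a real agent.
import urllib.request

AGENT_URL = "http://localhost:5000"  # assumed agent address

def poll(endpoint: str) -> str:
    """Fetch one of the standard MTConnect endpoints (probe/current/sample)."""
    with urllib.request.urlopen(f"{AGENT_URL}/{endpoint}") as response:
        return response.read().decode("utf-8")

if __name__ == "__main__":
    # 'current' returns the most recent value of every data item as XML.
    xml_snapshot = poll("current")
    print(xml_snapshot[:500])  # show the beginning of the XML document
```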

The MTConnect Information Model is not hardwired; rather, users assemble an XML information model to match their devices. MTConnect further provides XML attributes to help refine the device information models. Such XML attributes include Category, Name, Type, Subtype and Units. New additions to the Information Model are backwards compatible, in that MTConnect Clients will not break when new information is supplied to them, but will only be unaware of its existence.

In previous work performed by the National Institute of Standards and Technology (NIST), data provided by MTConnect has been mapped to the data requirements of DES. Table 1 shows how the raw data given by MTConnect can be converted to DES parameters. The table also includes sustainability metrics, a result of NIST's promising research on combining DES and sustainability analysis of production systems. Note, however, that this test implementation regards only the down time and the time between down times, which are the most important parameters for mimicking the dynamics of production systems (Williams 1994).

Table 1: Mapping MTConnect data into possible DES parameters including sustainability metrics.

MTConnect Data: Timestamp (ts), Machine, Power, Mode, Execution, Program, Line, Sload, Xload, Yload, Zload, Aload, Bload, Cload, Toolnum, RPM, Alarm, AlarmState, AlarmSeverity, PartCount, Feedrate

Cycle Time: MTC mode = Auto & MTC rpm > 0 & MTC feed > 0
Setup Time: MTC program(t) != MTC program(t-1) => T(MTC mode = Manual), excluding pallet shuttle program
Machining Time: Cycle Time
Off Time: MTC power = Off
Down Time: MTC alarm = true
Idle Time: T(MTC execution = Paused | MTC mode = Manual)
Mean Time Between Failure: E(x), where x = T(MTC alarm != active, MTC alarm = active)
Mean Time To Repair: E(x), where x = T(MTC alarm = active, MTC alarm != active)
Coolant Energy: MTC mode = Auto & MTC rpm > 0 => coolant max rated kW load, else 0.0
Power (kWh): (MTC spindle + MTC Xload + MTC Yload + MTC Zload 3.5 + MTC Aload 1 + MTC Cload 1)/Baseline
CO2 Emissions: lbs CO2 per kWh
NOx Emissions: lbs NOx per kWh

4.2 The GDM-Tool for Automated Data Processing

The GDM-Tool is a computer application that demonstrates the concept of automated input data management. All steps in the input data management process are automated except for the collection of raw data; see Figure 4. This exception is the reason for the potential of integrating the tool with MTConnect. The GDM-Tool is an MS Windows desktop application developed in C# (Visual Studio) and was originally developed for a case study in the automotive industry as a part of a Swedish research project called FACTS (Skoogh 2009). Its main data processing features include:

Data cleaning: Removal of irrelevant data points. One common example is in the collection of machine stop times, where several stop categories usually have to be removed to suit the needs of dynamic simulations.

Data calculations: One example is the calculation of times between failures (TBF) as the subtraction of two different failure start-times.

Data condensation: Distribution fitting. Files of DES raw data often contain several thousand data points, which is generally considered too much to supply to the simulation model. Hence, practitioners often prefer to use statistical distributions. A sketch of the calculation and condensation steps follows this list.
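To make the calculation and condensation steps concrete, the sketch below derives TBF and TTR samples from a list of failure intervals and fits an exponential distribution to the TBF values. It is a minimal illustration using assumed data, not the GDM-Tool's actual C# implementation, which may differ in detail.

```python
# Minimal sketch of the calculation and condensation steps, assuming
# failure records as (start, end) times in minutes. Not the actual
# GDM-Tool implementation, which is written in C#.
from scipy import stats

# Hypothetical failure log: (failure start, failure end) in minutes.
failures = [(120.0, 135.5), (480.2, 490.0), (795.4, 820.1), (1310.0, 1322.7)]

# Calculation: TBF as the difference between consecutive failure starts,
# TTR as the duration of each repair.
tbf = [b[0] - a[0] for a, b in zip(failures, failures[1:])]
ttr = [end - start for start, end in failures]

mtbf = sum(tbf) / len(tbf)
mttr = sum(ttr) / len(ttr)
print(f"MTBF = {mtbf:.1f} min, MTTR = {mttr:.1f} min")

# Condensation: fit an exponential distribution to the TBF samples so the
# simulation model can draw from a compact statistical representation.
loc, scale = stats.expon.fit(tbf, floc=0.0)
print(f"Fitted exponential TBF distribution with mean {scale:.1f} min")
```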

Figure 4: Outline of the GDM-Tool functionality.

Notice that the GDM-Tool is a middleware solution that works similarly to solution 3 presented in chapter 2 (Robertson and Perera 2002), with the difference that the GDM-Tool does not store data itself. Instead, raw data is still stored in the original data sources, e.g. MTConnect XML files, the CBS or regular spreadsheets. The simulation information, which is the output from the GDM-Tool, can reside in any application selected by the user. A standardized format such as a CMSD (Core Manufacturing Simulation Data) (SISO 2009) file is preferred for increased interoperability between applications.

An initial goal when designing the GDM-Tool was to make it completely generic and able to connect automatically to any source of raw data. However, due to the diversity of data sources, with a huge number of in-house legacy systems in industry, such a solution was not selected. Instead, a user with good knowledge about the production process, the data sources and the simulation model has to set up a sequence of actions that the GDM-Tool will perform for a given simulation model. In other words, a configuration is performed using tools for data extraction, data conversion and output preparation. These tools are implemented using a plug-in based architecture, enabling convenient development and extension of the GDM-Tool functionality if required. Later, a configuration can be executed in automation mode and fed with updated data sources. The automation mode is intended to be the most frequently used of the two user environments. Thus, the more times a configuration can be used without changes, the more beneficial is the approach of automated input data management. Finally, an old configuration can also be opened in the configuration mode for further development by selecting edit mode in the dialogue box.

4.3 Execution of Data Conversion Using the GDM-Tool

As previously stated, the production data used to evaluate the concept is taken from a real production system in the aerospace industry. The authors developed a script to poll raw data from the machines via MTConnect. Furthermore, a Visual Basic (VB) script was developed to extract the relevant data (see Table 1) from the MTConnect XML file and present it as state changes over a sufficient period of time; see Figure 5.

Figure 5: Example of a list of state changes.
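To illustrate the kind of extraction the VB script performs, the sketch below parses a simplified MTConnect-style XML snippet into a chronological list of state changes. The element names and values are reduced assumptions made for the example; real MTConnect streams are considerably richer, and the actual extraction in the study was implemented in VB.

```python
# Minimal sketch of extracting state changes from a simplified,
# MTConnect-like XML stream. Element names are reduced assumptions;
# a real MTConnect document is considerably richer.
import xml.etree.ElementTree as ET

SAMPLE_XML = """
<MTConnectStreams>
  <Events>
    <Execution timestamp="2010-03-01T08:00:12Z">ACTIVE</Execution>
    <Alarm timestamp="2010-03-01T09:14:03Z">ACTIVE</Alarm>
    <Alarm timestamp="2010-03-01T09:26:40Z">CLEARED</Alarm>
  </Events>
</MTConnectStreams>
"""

def extract_state_changes(xml_text: str):
    """Return (timestamp, item, value) tuples in chronological order."""
    root = ET.fromstring(xml_text)
    changes = [(e.get("timestamp"), e.tag, e.text.strip())
               for e in root.iter() if e.get("timestamp")]
    return sorted(changes)

for ts, item, value in extract_state_changes(SAMPLE_XML):
    print(ts, item, value)
```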


More information

International Journal of Science and Research (IJSR) ISSN (Online): 2319-7064 Index Copernicus Value (2013): 6.14 Impact Factor (2015): 6.

International Journal of Science and Research (IJSR) ISSN (Online): 2319-7064 Index Copernicus Value (2013): 6.14 Impact Factor (2015): 6. International Journal of Science and Research (IJSR) ISSN (Online): 2319-7064 Index Copernicus Value (2013): 6.14 Impact Factor (2015): 6.391 Analyze Road Traffic Condition by Visualization of Data from

More information

IFS-8000 V2.0 INFORMATION FUSION SYSTEM

IFS-8000 V2.0 INFORMATION FUSION SYSTEM IFS-8000 V2.0 INFORMATION FUSION SYSTEM IFS-8000 V2.0 Overview IFS-8000 v2.0 is a flexible, scalable and modular IT system to support the processes of aggregation of information from intercepts to intelligence

More information

Making HES more Lean

Making HES more Lean Lund University autumn 2010, Lund Lund Institute of Technology Department of Industrial Management & Logistics Making HES more Lean - Identification and reduction of waste Authors: Supervisors: Johan Bergskans

More information

Document Management Systems in Small and Medium Size Construction Companies in Jordan

Document Management Systems in Small and Medium Size Construction Companies in Jordan Document Management Systems in Small and Medium Size Construction Companies in Jordan Abstract Hesham Ahmad 1, Maha Ayoush 2 and Subhi Bazlamit 3 Document management systems (DMS) are now becoming more

More information

General Syllabus for Doctoral Studies in Sociological Demography, 240 Higher Education Credits

General Syllabus for Doctoral Studies in Sociological Demography, 240 Higher Education Credits 1 (9) General Syllabus for Doctoral Studies in Sociological Demography, 240 Higher Education Credits This syllabus for doctoral studies/third cycle studies in sociological demography was approved by the

More information

Software Engineering Compiled By: Roshani Ghimire Page 1

Software Engineering Compiled By: Roshani Ghimire Page 1 Unit 7: Metric for Process and Product 7.1 Software Measurement Measurement is the process by which numbers or symbols are assigned to the attributes of entities in the real world in such a way as to define

More information

Vendor briefing Business Intelligence and Analytics Platforms Gartner 15 capabilities

Vendor briefing Business Intelligence and Analytics Platforms Gartner 15 capabilities Vendor briefing Business Intelligence and Analytics Platforms Gartner 15 capabilities April, 2013 gaddsoftware.com Table of content 1. Introduction... 3 2. Vendor briefings questions and answers... 3 2.1.

More information

The Benefits of PLM-based CAPA Software

The Benefits of PLM-based CAPA Software For manufacturers in industries that produce some of the world s most complex products, effective quality management continues to be a competitive advantage. Whether in automotive, aerospace and defense,

More information

Reflections on Probability vs Nonprobability Sampling

Reflections on Probability vs Nonprobability Sampling Official Statistics in Honour of Daniel Thorburn, pp. 29 35 Reflections on Probability vs Nonprobability Sampling Jan Wretman 1 A few fundamental things are briefly discussed. First: What is called probability

More information

Integration of Time Management in the Digital Factory

Integration of Time Management in the Digital Factory Integration of Time Management in the Digital Factory Ulf Eberhardt a,, Stefan Rulhoff b,1 and Dr. Josip Stjepandic c a Project Engineer, Daimler Trucks, Mannheim, Germany b Consultant, PROSTEP AG, Darmstadt

More information

A Methodology for Variability Reduction in Manufacturing Cost Estimating in the Automotive Industry based on Design Features

A Methodology for Variability Reduction in Manufacturing Cost Estimating in the Automotive Industry based on Design Features A Methodology for Variability Reduction in Manufacturing Cost Estimating in the Automotive Industry based on Design Features F. J. Romero Rojo, R. Roy, E. Shehab Decision Engineering Centre, Manufacturing

More information

PRODUCT DATA MANAGEMENT SYSTEMS FOR THE BULK MATERIAL HANDLING CONSTRUCTION INDUSTRY

PRODUCT DATA MANAGEMENT SYSTEMS FOR THE BULK MATERIAL HANDLING CONSTRUCTION INDUSTRY PRODUCT DATA MANAGEMENT SYSTEMS FOR THE BULK MATERIAL HANDLING CONSTRUCTION INDUSTRY Renjith Suresh Padma 1, Anil Sawhney 1, André Borrmann 2 1 Department of Civil Engineering, Indian Institute of Technology

More information

School of Advanced Studies Doctor Of Management In Organizational Leadership/information Systems And Technology. DM/IST 004 Requirements

School of Advanced Studies Doctor Of Management In Organizational Leadership/information Systems And Technology. DM/IST 004 Requirements School of Advanced Studies Doctor Of Management In Organizational Leadership/information Systems And Technology The mission of the Information Systems and Technology specialization of the Doctor of Management

More information

Brown Hills College of Engineering & Technology Machine Design - 1. UNIT 1 D e s i g n P h i l o s o p h y

Brown Hills College of Engineering & Technology Machine Design - 1. UNIT 1 D e s i g n P h i l o s o p h y UNIT 1 D e s i g n P h i l o s o p h y Problem Identification- Problem Statement, Specifications, Constraints, Feasibility Study-Technical Feasibility, Economic & Financial Feasibility, Social & Environmental

More information

Maximize Production Efficiency through Downtime and Production Reporting Solution

Maximize Production Efficiency through Downtime and Production Reporting Solution Maximize Production Efficiency through Downtime and Production Reporting Solution In today s competitive market, every mineral processing facility is striving to operate their plant assets at a maximum

More information

Optimal Maintenance Resources Allocation Using Automated Simulation-based Optimisation and Data Management

Optimal Maintenance Resources Allocation Using Automated Simulation-based Optimisation and Data Management Simulation in Production and Logistics 2015 Markus Rabe & Uwe Clausen (eds.) Fraunhofer IRB Verlag, Stuttgart 2015 Optimal Maintenance Resources Allocation Using Automated Simulation-based Optimisation

More information

How To Learn From The Most Successful Manufacturers

How To Learn From The Most Successful Manufacturers Tech-Clarity Perspective: Best Practices for Developing Industrial Equipment Top Performers Drive Growth and Profitability with Advanced Design Practices and Enabling Technology Tech-Clarity, Inc. 2012

More information

AN ALTERNATIVE MODEL OF ERP MAINTENANCE STRATEGY

AN ALTERNATIVE MODEL OF ERP MAINTENANCE STRATEGY AN ALTERNATIVE MODEL OF ERP MAINTENANCE STRATEGY Muhammad Rofi IMTIHAN 1, Mohd. Salihin NGADIMAN, Habibollah HARON Department of Modelling and Industrial Computing Faculty of Computer Science and Information

More information

Managing detailed development data in a PLM framework

Managing detailed development data in a PLM framework Managing detailed development data in a PLM framework Jan Söderberg, Systemite AB (presenter) Peter Thorngren, Volvo GTT 2 nd Interoperability for Embedded Systems Development Environments Systemite AB

More information

Organisation Profiling and the Adoption of ICT: e-commerce in the UK Construction Industry

Organisation Profiling and the Adoption of ICT: e-commerce in the UK Construction Industry Organisation Profiling and the Adoption of ICT: e-commerce in the UK Construction Industry Martin Jackson and Andy Sloane University of Wolverhampton, UK A.Sloane@wlv.ac.uk M.Jackson3@wlv.ac.uk Abstract:

More information

Managing the Product Configuration throughout the Lifecycle

Managing the Product Configuration throughout the Lifecycle PLM11-8th International Conference on Product Lifecycle Management 396 Managing the Product Configuration throughout the Lifecycle Martin Eigner, Aline Fehrenz University of Kaiserslauten Gottlieb-Daimler-Str.

More information

Gender Sensitive Data Gathering Methods

Gender Sensitive Data Gathering Methods Gender Sensitive Data Gathering Methods SABINA ANOKYE MENSAH GENDER AND DEVELOPMENT COORDINATOR GRATIS FOUNDATION, TEMA, GHANA sabinamensah@hotmail.com Learning objectives By the end of this lecture, participants:

More information

Universiteit Leiden. ICT in Business. Leiden Institute of Advanced Computer Science (LIACS) Capability Maturity Model for Software Usage

Universiteit Leiden. ICT in Business. Leiden Institute of Advanced Computer Science (LIACS) Capability Maturity Model for Software Usage Universiteit Leiden ICT in Business Capability Maturity Model for Software Usage Name: Yunwei Huang Student-no: s1101005 Date: 16/06/2014 1st supervisor: Dr. Luuk Groenewegen 2nd supervisor: Dr. Nelleke

More information

PERCEPTION OF BASIS OF SHE AND SHE RISK MANAGEMENT

PERCEPTION OF BASIS OF SHE AND SHE RISK MANAGEMENT PERCEPTION OF BASIS OF SHE AND SHE RISK MANAGEMENT Per Berg and Roger Preston Safety Section, Global SHE, AstraZeneca INTRODUCTION After the merger between the two pharmaceutical companies Astra and Zeneca

More information