Getting insights about life cycle cost drivers: an approach based on big data inspired statistical modelling



Introduction | A Big Data applied to LCC | Conclusion
Getting insights about life cycle cost drivers: an approach based on big data inspired statistical modelling
Instituto Superior Técnico, Universidade de Lisboa, Portugal
jean-loup.loyer@tecnico.ulisboa.pt
EDAM Engineering Design Roundtables, February 24, 2015

Table of Contents
1 Introduction
    Context
    Challenges in LCC
2 A Big Data applied to LCC
    LCC framework
    Manufacturing cost
    Maintenance cost
3 Conclusion
    Summary
    Discussion
    Extension

Context and motivation
Industrial context of jet engine manufacturing
- Competitive pressure, challenging customer requirements, demanding airworthiness regulations
- Development of quality and profitable products
- Trading life cycle cost (LCC) against performance, weight, quality, etc.
Importance of LCC modeling in early design of jet engines [Haskins, 2010]
- Major part of the cost typically committed during this early phase [Barton, 2001]
- Multi-attribute decision problem requiring a Systems Engineering (SE) approach
- Ability to leverage large volumes of historical data to predict future LCC

Jet engine 101

Definition of LCC
NASA's definition of LCC
- «(LCC is) the total of the direct, indirect, recurring, nonrecurring, and other related expenses incurred, or estimated to be incurred, in the design, development, verification, production, operation, maintenance, support, and disposal of a project.» [NASA, 2010]
Categories of LCC modeling approaches [Asiedu, 1998]
- Parametric (regression on a few variables)
- Analogous (extrapolation from similar items)
- Analytical (detailed, based on activity or physics)
Challenges in LCC estimation in early design [Meisl, 1988]
- Insufficient details in requirements or contextual parameters
- Lack of quality data

Expression of LCC
Two-term (simplified) equation of LCC
For a given mechanical component, at time $t_0$:
\[ LCC_{t_0} = MC_{t_0} + \sum_{i^{\text{th}}\ \text{shop visit}} R_{t_i} \times MC_{t_i} \]
where:
- $MC_t$: Manufacturing cost at a given time $t$
- $R_{t_i}$: Replacement rate at the $i$-th maintenance visit
Main features of the LCC model
- LCC as a function of time:
    - Manufacturing: materials rates, quantity, productivity...
    - Maintenance: engine deterioration, fleet structure...
- Simplified equation of LCC:
    - No R&D (fuzzy) or disposal costs (negligible)
    - Manufacturing cost: no labor or consumables considered
    - Maintenance cost: repair not considered
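The two-term LCC equation can be evaluated directly once the manufacturing cost and the replacement rate at each shop visit are known. The following is a minimal sketch, not the author's implementation: the function name `life_cycle_cost` and all numeric values are hypothetical and chosen only to illustrate the sum.

```python
from typing import Sequence


def life_cycle_cost(mc_t0: float,
                    replacement_rates: Sequence[float],
                    mc_at_visits: Sequence[float]) -> float:
    """Evaluate LCC_{t0} = MC_{t0} + sum_i R_{t_i} * MC_{t_i}.

    mc_t0             -- manufacturing cost at time t0
    replacement_rates -- R_{t_i}, one value per shop visit
    mc_at_visits      -- MC_{t_i}, manufacturing cost at each shop visit
    """
    if len(replacement_rates) != len(mc_at_visits):
        raise ValueError("need one replacement rate and one cost per shop visit")
    return mc_t0 + sum(r * mc for r, mc in zip(replacement_rates, mc_at_visits))


# Hypothetical figures for one component over three shop visits
# (illustration only, not taken from the case study).
lcc = life_cycle_cost(100.0, [0.1, 0.3, 0.5], [105.0, 110.0, 120.0])
print(lcc)
```

Keeping the replacement rates and the per-visit costs as separate inputs mirrors the structure of the talk, where each term is estimated by its own statistical model.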

Description of the case study
Compressor blades
- Dozens of parts in each of the 14 stages
- Fleet of 2000+ engines currently in service
- Parts manufactured according to forging processes
Deterioration drivers
- Temperature near combustion chamber
- Vibration, impacts...

Data availability
Large maintenance dataset
- More than 2 million parts considered
- 100 input variables
- Irregular "dirty" data
Small manufacturing dataset
- Manufacturing cost for the year 2012 as the output variable
- 520 parts manufactured
- 6 covariates

Estimation of manufacturing cost
Probabilistic equation of manufacturing cost
For a given mechanical component, at time $t$:
\[ MC_t = f(L, W, P, A, M, Q_t, CR_t) \]
where:
- $MC_t$: Manufacturing cost (continuous output variable)
- $f(\cdot)$: Complex nonlinear function to estimate (not time dependent)
Important preliminary remarks
- Model at the component level
- Time dimension due to economic-related variables
- Probabilistic aspect due to variability in the input variables

Characteristics of the manufacturing cost model
Seven input variables depending on each mechanical component
Technical input variables
- Component length $L$ and width $W$
- Manufacturing process $P$
- Alloy type $A$ and machinability $M$
Economics-related input variables
- Accumulated production volume $Q_t$
- Materials cost rate $CR_t$
Procedure and criteria for model assessment
- Train the model on 80% of the dataset and evaluate its generalization on the remaining 20%
- Prediction accuracy measured by the percentage of good predictions on the held-out data
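The 80/20 assessment procedure can be sketched as a simple shuffled hold-out split. This is an illustration under assumptions: the helper name `train_test_split`, the fixed seed, and the use of integer placeholders for the 520 manufactured parts are hypothetical, and the talk does not say how the split was actually drawn.

```python
import random
from typing import List, Sequence, Tuple


def train_test_split(rows: Sequence, test_fraction: float = 0.2,
                     seed: int = 0) -> Tuple[List, List]:
    """Shuffle the rows and hold out `test_fraction` of them for evaluation."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Placeholder identifiers standing in for the 520 manufactured parts.
parts = list(range(520))
train, test = train_test_split(parts)
print(len(train), len(test))
```

The model would be fitted on `train` only, so that the score on `test` reflects generalization to parts it has never seen.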

Results
Overall predictive accuracy
    Goodness-of-fit ($R^2$)    Accuracy (MAPE)
    > 0.80                     > 90%
[Figure: Relative influence of each covariate on manufacturing cost]
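The two regression criteria can be computed as follows. This is a minimal sketch with hypothetical cost values; it uses the textbook definitions of $R^2$ and of the mean absolute percentage error (MAPE). Reading the reported "Accuracy (MAPE) > 90%" as 1 - MAPE is an assumption made here, since the slides do not define it.

```python
from typing import Sequence


def r_squared(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, as a fraction (0.05 means 5%)."""
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical held-out manufacturing costs and model predictions.
actual_cost = [100.0, 200.0, 300.0, 400.0]
predicted_cost = [110.0, 190.0, 310.0, 390.0]

r2 = r_squared(actual_cost, predicted_cost)
error = mape(actual_cost, predicted_cost)
accuracy = 1.0 - error  # assumed reading of "Accuracy (MAPE)"
print(r2, error, accuracy)
```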

Equation of maintenance cost
Probabilistic equation of maintenance cost
For a given mechanical component, at time $t_i$:
\[ R_{t_i} = g(t^j_T, P^j_T, S^j_T, V^j_T, \dots, AT_T, AP_T, H_T, C_T, \dots) \]
where:
- $R_{t_i}$: Average replacement rate of the part (binary output)
- $g(\cdot)$: Complex nonlinear function to estimate
- $T = [t_{i-1}, t_i]$: Time between two maintenance visits
Important remarks
- Hundreds of input variables extracted from complex time series (air temperature $AT_T$ or pressure $AP_T$...)
- Six binary classifiers used as statistical models: logistic regression, support vector machines, decision trees, random forests, boosted trees, neural networks
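Of the six classifiers listed, logistic regression is the simplest to write out. The sketch below fits it by plain batch gradient descent on a toy dataset; it is not the author's implementation, and the single feature (imagined as a scaled temperature summary), the labels, the learning rate and the epoch count are all hypothetical.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w: Sequence[float], x: Sequence[float]) -> float:
    """Probability that the part is replaced; w[0] is the intercept."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def fit_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 0.1, epochs: int = 2000) -> List[float]:
    """Fit weights by batch gradient descent on the mean log-loss."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


# Hypothetical toy data: one scaled feature per part, label 1 = part replaced.
X = [[0.1], [0.2], [0.3], [0.4], [0.6], [0.7], [0.8], [0.9]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
w = fit_logistic(X, y)
print(predict_proba(w, [0.9]), predict_proba(w, [0.1]))
```

With hundreds of features and millions of parts, a tested library implementation would be the practical choice; the loop above only shows what the model computes.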

Characteristics of the maintenance cost model
Pre-processing to extract features from raw historical sensor data
- Min/max, mean, variance
- Curve trend
- Discontinuities
- Class based on patterns
- Signature analysis (wavelets, splines)
Criteria assessing model performance
- % of correctly predicted outcomes
- Area Under the ROC curve (AUC), more robust
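The simpler pre-processing features can be sketched for a single sensor series as below. This is an illustration under assumptions: the function name `extract_features`, the least-squares slope as a stand-in for "curve trend", the jump threshold used to count discontinuities, and the temperature readings are all hypothetical. Pattern classes and wavelet or spline signatures are not covered.

```python
import statistics
from typing import Dict, Sequence


def extract_features(series: Sequence[float],
                     jump_threshold: float) -> Dict[str, float]:
    """Summarize one sensor time series (at least two samples) into scalar features."""
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = statistics.fmean(series)
    # Curve trend: least-squares slope of the values against the sample index.
    slope = (sum((t - t_mean) * (v - y_mean) for t, v in enumerate(series))
             / sum((t - t_mean) ** 2 for t in range(n)))
    # Discontinuities: consecutive samples that differ by more than the threshold.
    jumps = sum(1 for a, b in zip(series, series[1:])
                if abs(b - a) > jump_threshold)
    return {
        "min": min(series),
        "max": max(series),
        "mean": y_mean,
        "variance": statistics.pvariance(series),
        "trend_slope": slope,
        "n_discontinuities": jumps,
    }


# Hypothetical air temperature readings between two maintenance visits.
air_temperature = [500.0, 502.0, 504.0, 530.0, 532.0, 534.0]
features = extract_features(air_temperature, jump_threshold=10.0)
print(features)
```

Each series is reduced to a fixed-length vector in this way, which is what lets the classifiers take irregular time series as input.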

Results
Procedure and criteria for model assessment similar to the ones used for manufacturing cost
- Training with 80% of the data and prediction accuracy on remaining 20%
- Binary classifiers in lieu of regression models in manufacturing cost
- Goodness-of-fit measured by TPR
- Prediction accuracy measured by AUC
    Goodness-of-fit (TPR)    Accuracy (AUC)
    > 85%                    > 85%
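The two classification criteria can be computed from held-out labels and classifier scores as sketched below. The labels, scores and the 0.5 decision threshold are hypothetical. AUC is computed here through its pairwise interpretation (the probability that a randomly chosen replaced part scores higher than a randomly chosen non-replaced one), which is quadratic in the number of parts and therefore only suitable for small examples.

```python
from typing import Sequence


def true_positive_rate(labels: Sequence[int], predictions: Sequence[int]) -> float:
    """Share of actual positives (label 1) that were predicted as positive."""
    true_pos = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    positives = sum(1 for y in labels if y == 1)
    return true_pos / positives


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison; ties count one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical held-out parts: 1 = replaced at the shop visit, 0 = kept.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
predictions = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

tpr = true_positive_rate(labels, predictions)
area = auc(labels, scores)
print(tpr, area)
```

Unlike TPR, AUC does not depend on the chosen threshold, which is one reason it is often preferred when the two classes are imbalanced.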

Take-home messages
A valid innovative framework
- Validity of the data-driven, statistics-based approach
- Scalable, reusable, maintainable model of LCC
- Contributes to structuring the data within the IT system
Satisfactory results
- Complementary insights from ensemble models
- Good fit ($R^2$ > 0.8) and prediction accuracy (MAPE > 0.8 & AUC > 85%) on new data, for manufacturing and maintenance costs
- Ranking of the most relevant LCC drivers, challenging the traditional engineering approach

Benefits and limits of the statistical approach
Benefits
- Generic framework
    - Applicable at system, subsystem and component levels
    - Applicable to many (mechanical) products and sectors
- Probabilistic, data-driven approach
    - Risk in decision-making and financial evaluation
    - Estimation based on actual data (less dubious assumptions needed)
Limits
- Data-driven (!)
    - Requires sufficient amount of actual good quality data
    - Pre-existing databases and data-rich IT infrastructure
- Requires statistical training amongst engineers
- Less applicable to families of products with low commonality

Potential next steps
Refine the models
- Add more input variables, better extracted from time series (maintenance cost)
- Use more sophisticated models (Dynamic Bayesian networks, Functional Data Analysis, Markov-like models...)
- Interactions between drivers of manufacturing and maintenance costs
Benchmark and extend the results
- Compare with accuracy and variable importance obtained from current «manual» estimates of LCC by Rolls-Royce cost engineers
- Verify validity on different components or sectors

References
Asiedu Y. and P. Gu. 1998. Product life cycle cost analysis: state of the art review. International Journal of Production Research 36 (4): 883-908.
Barton J. A., D. M. Love and G. D. Taylor. 2001. Design determines 70% of cost? A review of implications for design evaluation. Journal of Engineering Design 12 (1): 47-58.
Haskins C. 2010. Systems Engineering Handbook: A Guide for System Life Cycle Processes and Activities. Version 3.2. Revised by M. Krueger, D. Walden, and R. D. Hamelin. San Diego, CA (US): INCOSE.
Liu H., V. Gopalkrishnan, W.K. Ng, B. Song and X. Li. 2008. An intelligent system for estimating full product Life Cycle Cost at the early design stage. International Journal of Product Lifecycle Management 2 (2-3): 96-113.
Meisl C.J. 1988. Techniques for cost estimating in early program phases. Engineering Costs and Production Economics 14 (2): 95-106.
NASA. 2008. NASA Cost Estimating Handbook. Washington, DC (US): National Aeronautics and Space Administration, NASA Headquarters, Cost Analysis Division.
Seo K.-K., J.-H. Park, D.-S. Jang and D. Wallace. 2002. Approximate Estimation of the Product Life Cycle Cost Using Artificial Neural Networks in Conceptual Design. The International Journal of Advanced Manufacturing Technology 19 (6): 461-471.