CPAC Proposal Diagnosis-Oriented Monitoring of Interdependent KPIs




Principal Investigator
Shuai Huang, Assistant Professor, Industrial and Systems Engineering, UW, 206-685-2953, shuaih@uw.edu

Research Personnel
Yan Jin, Research Assistant, Industrial and Systems Engineering, UW, yanjin@uw.edu

Executive Summary
Existing process monitoring tools, such as the control charts of Statistical Process Control (SPC), have difficulty with root-cause diagnosis, particularly for complex manufacturing processes that involve a range of interdependent key performance indicators (KPIs; for example, key process or product features). In the proposed research, we will investigate the use of a powerful artificial intelligence model, the Bayesian network (BN), to monitor complex manufacturing processes and to conduct root-cause diagnosis whenever an abnormal process pattern is detected. The BN provides a structured representation of the relationships between the process variables and quality outcomes, in the form of the cascade diagram shown in Figure 1. The statistical dependencies among the variables can be parameterized, and the parameters can be estimated and calibrated from process measurements acquired by sensors.

Recent rapid advances in sensor and information technologies have provided unprecedented opportunities for monitoring manufacturing systems that have many important process variables. Statistical monitoring of these multivariate processes has proven challenging, both because of the complexity of the systems and because of the high expectations placed on decision support: process monitoring methods are expected not only to detect faults in a timely manner but also to identify the root-cause variables responsible for the fault signal.
Goals/Objectives
Our primary goal is to develop computational algorithms and software that convert the rich sensor data readily available in many manufacturing applications into system models (such as the BN depicted in Figure 1), and to develop process monitoring and diagnosis methods that exploit the relationships between the variables. For example, understanding these relationships can help us determine how variations propagate from upstream variables to downstream variables, and how to use this cascade information to identify the root-cause variables that may be responsible for an identified process abnormality. Specifically, we will develop algorithms for 1) learning the BN structure and estimating its parameters for different manufacturing processes; 2) monitoring the manufacturing processes and detecting process abnormalities; and 3) root-cause diagnosis that exploits the relationships between the process variables.

Budget for 2015-2016
$15,000. Funds would provide partial support (25%) to a research assistant, support the analytical work, and purchase supplies needed for this research.

Background
In a manufacturing system, the information flow is determined by the nature of each physical action and the topology of the physical system (Fig. 1). The information related to key process/product features (referred to as Key Performance Indicators, or KPIs, hereafter) evolves in the system following engineering principles. Product and process designs supply some engineering knowledge that helps identify the key variables and potential causal relationships (with varying confidence, or sometimes only qualitatively). Meanwhile, the data captured by in-process sensors record the process changes and the interrelationships among the variables during operation. By integrating these two sources of information (information flow and data), a causal model (e.g., a BN model) can be discovered from observational data fused with engineering knowledge. Engineering knowledge therefore plays an important role in causal discovery: it can effectively constrain the model search, reduce computational complexity, increase model accuracy, and help validate and interpret the results.

Figure 1: An example of multi-stage manufacturing processes

As an example, the temporal order of variables in a multistage manufacturing system (Fig. 1) can significantly improve the efficiency of causal modeling. The relative position of sensors in the production flow provides information on the temporal order of variables, that is, a complete or partial order of variables according to the time or sequence in which they are measured. Based on this information, any pair of variables is either temporally distinguishable or indistinguishable.

Knowledge of this cascade information holds critical value for process monitoring, and particularly for root-cause diagnosis. Figure 2 illustrates this fault propagation: although only one variable is the root cause, the variables downstream of it will also show out-of-control signals. Without knowledge of the cascade between these variables, it is difficult to identify the true root-cause variable. On the other hand, the predictive relationships between the variables (such as the relationships between the mean levels of the variables shown in Fig. 2) could be very valuable information for enhancing root-cause diagnosis.
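The fault-propagation argument above can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the LASSO-BN method: it assumes a hypothetical three-variable linear Gaussian cascade X1 -> X2 -> X3 with made-up coefficients, fits one regression per node on in-control data, and then compares a marginal mean check with a parent-adjusted (residual) check on faulty data.

```python
import random
import statistics

# Hypothetical cascade X1 -> X2 -> X3 (linear Gaussian). The structure,
# slopes, and noise level are illustrative assumptions, not proposal values.
PARENT = {"X1": None, "X2": "X1", "X3": "X2"}
SLOPE = {"X2": 0.8, "X3": 1.2}
NOISE_SD = 1.0
LIMIT = 3.0  # conventional 3-sigma control limit


def simulate(n, shift_x1=0.0, seed=0):
    """Draw n samples from the cascade; shift_x1 is a mean shift at X1."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x1 = shift_x1 + rng.gauss(0.0, NOISE_SD)
        x2 = SLOPE["X2"] * x1 + rng.gauss(0.0, NOISE_SD)
        x3 = SLOPE["X3"] * x2 + rng.gauss(0.0, NOISE_SD)
        rows.append({"X1": x1, "X2": x2, "X3": x3})
    return rows


def fit(rows):
    """Estimate each node's regression on its parent from in-control data."""
    model = {}
    for node, parent in PARENT.items():
        y = [r[node] for r in rows]
        my = statistics.fmean(y)
        if parent is None:
            slope, intercept = 0.0, my
            resid = [b - my for b in y]
        else:
            x = [r[parent] for r in rows]
            mx = statistics.fmean(x)
            sxx = sum((a - mx) ** 2 for a in x)
            sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
            slope = sxy / sxx
            intercept = my - slope * mx
            resid = [b - (intercept + slope * a) for a, b in zip(x, y)]
        model[node] = {
            "slope": slope,
            "intercept": intercept,
            "mean": my,
            "sd": statistics.stdev(y),
            "resid_sd": statistics.stdev(resid),
        }
    return model


def z_scores(model, rows):
    """Return (marginal z, parent-adjusted z) of the sample mean per node."""
    root_n = len(rows) ** 0.5
    out = {}
    for node, parent in PARENT.items():
        m = model[node]
        y_bar = statistics.fmean(r[node] for r in rows)
        x_bar = statistics.fmean(r[parent] for r in rows) if parent else 0.0
        predicted = m["intercept"] + m["slope"] * x_bar
        marginal = (y_bar - m["mean"]) / (m["sd"] / root_n)
        adjusted = (y_bar - predicted) / (m["resid_sd"] / root_n)
        out[node] = (marginal, adjusted)
    return out


model = fit(simulate(2000, seed=1))                          # in-control data
scores = z_scores(model, simulate(50, shift_x1=2.0, seed=2))  # fault at X1
marginal_alarms = [v for v, (z, _) in scores.items() if abs(z) > LIMIT]
root_causes = [v for v, (_, z) in scores.items() if abs(z) > LIMIT]
print("marginal alarms:       ", marginal_alarms)
print("parent-adjusted alarms:", root_causes)
```

With a shift injected only at X1, the marginal check would be expected to signal on all three variables, because the shift propagates downstream, while the parent-adjusted check should single out X1: the downstream variables still follow their in-control relationship to their parents.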

Figure 2: Illustration of process fault propagation. The fault in the root-cause variable propagates to its descendant variables; with knowledge of the BN that characterizes the multivariate process, the relationships between the mean levels of the variables can be derived, which could be very valuable information for enhancing root-cause diagnosis.

Significant Progress to Date
The PI and his students have been investigating the use of BNs for modeling different real-world complex systems, including applications in manufacturing and health care. Recently, we developed a high-dimensional diagnostic monitoring method called LASSO-BN, which scales, in both computational efficiency and statistical accuracy, to high-dimensional manufacturing processes with tens to about 100 process variables. The algorithm has been applied to the challenging Tennessee Eastman Process (TEP), shown in Figure 3. The TEP has been a benchmark process for evaluating process monitoring and fault diagnosis methods ever since the Eastman Chemical Company created this process simulator. The TEP is a chemical process with 12 input variables (manipulated variables) and 41 output variables (measurement variables).

A BN of the TEP has been built that focuses on 22 variables selected from the 41 measurement variables. The BN structure was identified from prior process knowledge and the process flow sheet. Because the cascade of the 22 variables was known, the variables were sorted in process flow order from upstream to downstream units and placed into the network hierarchy as nodes without any arcs. The interactions among the variables were then analyzed based on the prior process knowledge and used to determine where to place the arcs, leading to the completed BN structure shown in Fig. 4.
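The construction procedure just described (order the nodes by process flow, then place arcs from prior knowledge) can be sketched as follows. This is an illustrative sketch only: the variable names and arcs are hypothetical and are not the TEP variables, and the helper functions `build_network` and `count_candidate_arcs` are names introduced here for illustration.

```python
# Hypothetical variables, listed from upstream to downstream units.
FLOW_ORDER = ["feed_rate", "reactor_temp", "reactor_pressure", "product_purity"]

# Hypothetical arcs (cause, effect) suggested by prior process knowledge.
PRIOR_ARCS = [
    ("feed_rate", "reactor_temp"),
    ("feed_rate", "reactor_pressure"),
    ("reactor_temp", "reactor_pressure"),
    ("reactor_temp", "product_purity"),
    ("reactor_pressure", "product_purity"),
]


def build_network(flow_order, arcs):
    """Place nodes in flow order without arcs, then add knowledge-based arcs.

    Every arc must point downstream. Because all arcs follow one total
    order, the resulting graph cannot contain a directed cycle.
    """
    position = {name: i for i, name in enumerate(flow_order)}
    parents = {name: [] for name in flow_order}  # nodes first, no arcs yet
    for cause, effect in arcs:
        if position[cause] >= position[effect]:
            raise ValueError(f"arc {cause} -> {effect} runs against the flow")
        parents[effect].append(cause)
    return parents


def count_candidate_arcs(n_vars, ordered):
    """Number of directed arcs a structure search would have to consider."""
    pairs = n_vars * (n_vars - 1) // 2
    return pairs if ordered else 2 * pairs


network = build_network(FLOW_ORDER, PRIOR_ARCS)
for node, node_parents in network.items():
    print(f"{node}: parents = {node_parents}")
```

The ordering also shrinks the search space when arcs must be learned from data rather than placed by hand. For 22 fully ordered variables, `count_candidate_arcs(22, ordered=True)` gives 231 candidate arcs, compared with 462 when neither direction can be ruled out.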
With knowledge of the BN structure, in-control process data from the TEP data archive at the University of Washington (http://depts.washington.edu/control/larry/te/download.html) can be used to estimate the parameters of the BN.

Other related research in process analytics and machine learning: The PI's research focuses on high-dimensional machine learning and decision-making challenges with applications in manufacturing and healthcare. The PI's major accomplishments include: (1) high-dimensional graphical models, including both undirected network models such as GGM [] and directed network models such as BN []; (2) high-dimensional quality control and diagnosis with applications for surface monitoring of microelectromechanical systems []; (3) data fusion and sensor studies for monitoring, diagnosis, and prognosis of complex manufacturing systems []; (4) data-driven healthcare process modeling with applications in nursing care coordination quality monitoring []. The PI has also established close collaborations with health professionals from multiple areas such as Alzheimer's disease, Type 1 diabetes, depression, and surgical quality research, focusing on how to convert massive biomedical datasets into knowledge that supports evidence-based decision-making such as disease monitoring, diagnosis, and prognostics. Although the healthcare and manufacturing applications appear very different, the analytic challenges underlying them bear remarkable similarities, and methods can be translated between the application domains.

Figure 3: Illustration of the Tennessee Eastman Process

Figure 4: The BN of the TEP constructed from engineering knowledge of the process; the name of each node represents a specific process variable defined in the original TEP problem and can be found in Jin and Huang (2015)

Proposed Research
The objective of this research project is to develop BN-based process monitoring and diagnosis methodologies and associated computational tools, with applications in real-world manufacturing areas, by working with CPAC members. Specifically, we will pursue the following developments to maximize the value of representing the manufacturing process as a BN:

On-line quality prediction and inference: Current manufacturing systems are designed off-line, through simulation and optimization under ideal process settings. During production, the product quality is not known until the product has been made. With today's sensing capabilities, process data can be made available in real time while a product is being produced. It is therefore desirable to conduct on-line product quality prediction and inference, so that proactive actions can be taken for defect prevention.

On-line root cause diagnosis: Currently, the sensors on a single station detect localized physical phenomena, and those at the end of the production line (end-of-line sensing) are often used as go/no-go gauges for product inspection. System-wide distributed sensors can diagnose and trace the root causes of process faults more quickly and accurately; they can also predict faulty process conditions at downstream stations or at a future time, with high detection sensitivity to both local (e.g., within-station ambient conditions) and global (e.g., across-station fault propagation) process changes.

On-line active control and intervention: Most of the current control capabilities in a manufacturing system are designed for machine control purposes, with little emphasis on quality improvement. With a distributed sensor network (DSN), various process data are readily available, providing essential information for active control aimed at quality improvement. Methodologies are therefore needed that provide on-line intervention capabilities. In addition, cautious control strategies should be adopted.
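To make the idea of on-line quality prediction concrete, the following is a minimal sketch of predicting a downstream quality variable from the upstream measurements available so far. It assumes a linear Gaussian BN whose structure and coefficients are already known; the network, node names, and numbers are hypothetical, and `predict_means` is a helper introduced here for illustration. Forward propagation of means is exact only when every measured node has all of its parents measured, which the sketch checks.

```python
# Hypothetical linear Gaussian network, listed in process-flow order:
# node -> (intercept, {parent: coefficient}). All values are illustrative.
NETWORK = {
    "station1_temp": (200.0, {}),
    "station2_thickness": (1.0, {"station1_temp": 0.01}),
    "final_quality": (5.0, {"station1_temp": -0.02, "station2_thickness": 3.0}),
}


def predict_means(network, measured):
    """Predict the mean of every unmeasured node given upstream measurements.

    Nodes are visited in flow order (dicts keep insertion order). A measured
    node keeps its measured value; an unmeasured node takes the value implied
    by its parents. Measurements must be upstream-closed: a measured node
    with an unmeasured parent would require backward inference, which this
    sketch does not implement.
    """
    values = {}
    for node, (intercept, coefs) in network.items():
        if node in measured:
            if any(parent not in measured for parent in coefs):
                raise ValueError(f"{node} is measured but a parent is not")
            values[node] = measured[node]
        else:
            values[node] = intercept + sum(
                coef * values[parent] for parent, coef in coefs.items()
            )
    return values


# Before production starts, after station 1 reports, and after station 2.
nominal = predict_means(NETWORK, {})
early = predict_means(NETWORK, {"station1_temp": 210.0})
later = predict_means(
    NETWORK, {"station1_temp": 210.0, "station2_thickness": 3.3}
)
for label, result in [("nominal", nominal), ("early", early), ("later", later)]:
    print(f"{label}: predicted final_quality = {result['final_quality']:.2f}")
```

As each station reports, the prediction for the final quality variable is refined, which is what would allow a proactive action before the product is finished. A fuller treatment would also propagate prediction variances so that the prediction can be compared against specification limits.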

Shuai Huang, Ph.D.

EDUCATION
University of Science and Technology of China, Hefei, China, Statistics, B.S., 1977
Arizona State University, Tempe, AZ, Industrial Engineering, Ph.D., 2012

PROFESSIONAL EXPERIENCE
08/2014-present: Assistant Professor, Industrial and Systems Engineering, University of Washington
08/2012-07/2014: Assistant Professor, Industrial and Management Systems Engineering, University of South Florida
08/2007-07/2012: Research Associate, Industrial Engineering Program, Arizona State University

SELECTED PUBLICATIONS
1. Liu, K. and Huang, S., 2015, Integration of Data Fusion Methodology with Degradation Modeling Process to Improve Prognostics, IEEE Transactions on Automation Science and Engineering, in press.
2. Huang, S., Kong, Z., Huang, W., 2014, High-dimensional Process Monitoring and Change Point Detection Through Embedding Distributions in Reproducing Kernel Hilbert Space, IIE Transactions, Vol. 46 (10), 999-1016 (IIE Magazine Feature Article).
3. Yampikulsakul, N., Byon, E., Huang, S. and Sheng, S.W., 2013, Condition Monitoring of Wind Power System with Non-Parametric Regression Analysis, IEEE Transactions on Energy Conversion, Vol. 29 (2), 288-299.
4. Huang, S., Li, J., Chen, K., Wu, T., Ye, J., Wu, X., and Li, Y., 2013, A Transfer Learning Approach for Network Modeling, IIE Transactions, 44, 915-931 (IIE Transactions Best Paper Award).
5. Huang, S., Li, J., Lamb, G., Schmitt, M., and Fowler, J., 2012, Multi-data Fusion for Enterprise Quality Improvement by a Multilevel Latent Response Model, IIE Transactions, Vol. 46 (5), 512-525.
6. Lin, Y., Liu, K., Byon, E., Qian, X., Huang, S., 2015, Domain-Knowledge Driven Cognitive Degradation Modeling for Alzheimer's Disease, The SIAM International Conference on Data Mining (SDM 2015), Apr. 30 - May 2, 2015, Vancouver, CA (historical paper acceptance rate < 25%).
7. Ren, S., Huang, S., Papademetris, X., Onofrey, J. and Qian, X., 2015, A Scalable Algorithm for Structured Kernel Feature Selection, The 18th International Conference on Artificial Intelligence and Statistics (AISTATS 2015), May 9-12, 2015, San Diego, USA (paper acceptance rate for oral presentation 6.8%).
8. Huang, S., Ye, J., Fleisher, A., Chen, K., Reiman, E., Wu, T., and Li, J., 2013, A Sparse Structure Learning Algorithm for Bayesian Network Identification from High-dimensional Data, IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(6), 1328-1342.
9. Huang, S., Li, J., Ye, J., Fleisher, A., Chen, K. and Wu, T., 2011, Brain Effective Connectivity Modeling for Alzheimer's Disease by Sparse Bayesian Network, The 17th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2011) (paper acceptance rate 17.5%), Aug. 21-24, 2011, San Diego, USA.
10. Huang, S., Li, J., Ye, J., Chen, L., Wu, T., Fleisher, A. and Reiman, E., 2011, Identifying Alzheimer's Disease-Related Brain Regions from Multi-Modality Neuroimaging Data using Sparse Composite Linear Discrimination Analysis, Proceedings of Neural Information Processing Systems Conference (NIPS) (paper acceptance rate for oral presentation 4.8%), Dec. 2011, Granada, Spain.

SELECTED HONORS, AWARDS & PROFESSIONAL ACTIVITIES
Feature Article in IIE Magazine, for the paper "High-dimensional Process Monitoring and Change Point Detection using Embedding Distributions in Reproducing Kernel Hilbert Space (RKHS)", Oct. 2014
Best Paper Award, IIE Transactions Best Paper Quality & Reliability Engineering, for "A Transfer Learning Approach for Network Modeling", 2014
Outstanding Graduate Award, School of Computing, Informatics, and Decision Systems Engineering, Arizona State University, 2012
University Graduate Fellowship Award, School of Computing, Informatics, and Decision Systems Engineering, Arizona State University, 2012
Finalist of Best Student Paper Competition in Quality, Statistics and Reliability Subdivision of INFORMS Annual Conference, Charlotte, NC, Nov. 2011, paper title: "Hypergraph-based Gaussian Process Models with Qualitative and Quantitative Input Variables"
Finalist of Best Student Paper Competition in Data Mining Subdivision of INFORMS Annual Conference, Charlotte, NC, Nov. 2011, paper title: "Brain Effective Connectivity Modeling for Alzheimer's Disease by Sparse Gaussian Bayesian Network"
Best Poster Award (2nd Place), in Quality, Statistics and Reliability Subdivision of INFORMS Annual Conference, Charlotte, NC, Nov. 2011
Dissertation Poster Award (2nd Place), Doctoral Colloquium Dissertation Poster Competition of Industrial Engineering Research Conference (IERC), Reno, NV, May 2011
Feature Article in IIE Magazine, for the paper "Regression-based Process Monitoring with Consideration of Measurement Errors", Jan. 2010

SELECTED FUNDED GRANTS
PI: National Science Foundation, Collaborative: Smart Monitoring for Alzheimer's Disease via Data Fusion, Personalized Prognostics and Selective Sensing, Total award: $215,000, Sep 2014 - Sep 2017.
PI: Juvenile Diabetes Research Foundation, A Rule-Based Prognostic Model for Type I Diabetes by Characterizing and Synthesizing Rules from Longitudinal Data, Total award: $110,000, June 2014 - May 2015.
Co-PI (in collaboration with Arizona State University): U.S. Army Electronic Proving Ground, Big Data in Large Communication Networks Mining and Visualization, Aug 2012 - Oct 2013, Total award: $282,467 (15% budget share).

SUPERVISED DOCTORAL and MASTER'S STUDENTS
Mona Haghighi (3rd year), Yan Jin (2nd year), Ying Lin (3rd year), Yazhuo Liu (3rd year)