Principal Investigator
Shuai Huang, Assistant Professor, Industrial and Systems Engineering, UW, 206-685-2953, shuaih@uw.edu

Research Personnel
Yan Jin, Research Assistant, Industrial and Systems Engineering, UW, yanjin@uw.edu

Executive Summary
Existing process monitoring tools, such as control charts in Statistical Process Control (SPC) theory, have difficulty with root-cause diagnosis, particularly for complex manufacturing processes that involve a range of interdependent key performance indicators (KPIs), such as key process or product features. In the proposed research, we will investigate the use of a powerful artificial intelligence model, the Bayesian network (BN), to monitor complex manufacturing processes and to conduct root-cause diagnosis whenever an abnormal process pattern is detected. The BN provides a structured representation of the relationships between the process variables and quality outcomes, in the form of the cascade diagram shown in Figure 1. The statistical dependencies among the variables can be parameterized, and the parameters can be estimated and calibrated from process measurements acquired by sensors.

Recent rapid advances in sensor and information technologies have provided unprecedented opportunities for monitoring manufacturing systems that have many important process variables. Statistical monitoring of these multivariate processes has proven challenging, both because of the complexity of the systems and because of the high expectations placed on decision making: a process monitoring method must not only provide timely fault detection but also identify the root-cause variables responsible for the fault signal.
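In standard notation, the BN representation described in this summary writes the joint distribution of the process variables as a product of local conditional distributions, one per variable given its parents in the graph. The linear Gaussian parameterization shown second is an assumption made here for illustration only; the proposal does not commit to a specific parametric form, and the symbols are introduced for this sketch.

```latex
p(x_1,\dots,x_p) = \prod_{i=1}^{p} p\bigl(x_i \mid x_{\mathrm{pa}(i)}\bigr),
\qquad
x_i = \beta_{i0} + \sum_{j \in \mathrm{pa}(i)} \beta_{ij}\, x_j + \varepsilon_i,
\quad \varepsilon_i \sim \mathcal{N}(0, \sigma_i^2)
```

Here $\mathrm{pa}(i)$ denotes the set of parents of variable $i$; under the linear Gaussian assumption, the coefficients $\beta_{ij}$ and variances $\sigma_i^2$ are the parameters that would be estimated from sensor measurements.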
Goals/Objectives
Our primary goal is to develop computational algorithms and software that convert the rich sensor data readily available in many manufacturing applications into system models (such as the BN depicted in Figure 1), and to develop process monitoring and diagnosis methods that exploit the relationships between the variables. For example, understanding these relationships can help us determine how variations propagate from upstream variables to downstream variables, and how to use this cascade information to identify the root-cause variables that may be responsible for an identified process abnormality. Specifically, we will:
1) develop algorithms for learning the BN structure and estimating its parameters for different manufacturing processes;
2) develop algorithms for monitoring the manufacturing processes and detecting process abnormalities;
3) develop algorithms for root-cause diagnosis that exploit the relationships between the process variables.

Budget for 2015-2016
$15,000
Funds would provide partial support (25%) to a research assistant, support the analytical work, and purchase supplies needed for this research.
Background
In a manufacturing system, the information flow is determined by the nature of each physical action and the topology of the physical system (Fig. 1). The information related to key process/product features (referred to as key performance indicators, KPIs, hereafter) evolves in the system following engineering principles. Product and process designs supply some engineering knowledge that helps identify the key variables and potential causal relationships (with varying confidence, or even only qualitatively). Meanwhile, the data captured by in-process sensors record the process changes and the interrelationships among the variables during operations. By integrating these two sources of information (information flow and data), a causal model (e.g., a BN model) can be discovered from observational data fused with engineering knowledge. Engineering knowledge therefore plays an important role in causal discovery: it can effectively constrain the model search, reduce computational complexity, increase model accuracy, and help validate and interpret the results.

Figure 1: An example of multi-stage manufacturing processes

As an example, the temporal order of variables in a multistage manufacturing system (Fig. 1) can significantly improve the efficiency of causal modeling. The relative position of sensors in the production flow provides information on the temporal order of variables, that is, a complete or partial order of variables according to the time or sequence in which they are measured. Based on this information, any pair of variables is either temporally distinguishable or indistinguishable. Knowledge of this cascade holds critical value for process monitoring and particularly for root-cause diagnosis. Figure 2 illustrates this fault propagation: when one variable is the root cause, its descendant variables will also show out-of-control signals.
Without knowledge of the cascade between these variables, it is difficult to identify the true root-cause variable. Conversely, the predictive relationships between the variables (such as the relationships between their mean levels shown in Fig. 2) can provide very valuable information for enhancing root-cause diagnosis.
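The fault propagation argument above can be illustrated with a small simulation. The sketch below is not the proposal's method; it assumes a hypothetical three-variable cascade X1 -> X2 -> X3 with linear Gaussian relationships, known coefficients, and an illustrative alarm limit of 4 standard errors. It compares monitoring each variable's raw mean with monitoring the residual that remains after subtracting the contribution of its parents.

```python
import math
import random
import statistics

# Hypothetical three-stage cascade X1 -> X2 -> X3; the structure, coefficients,
# and noise level are illustrative assumptions, not values from a real process.
ORDER = ["X1", "X2", "X3"]
PARENTS = {"X1": [], "X2": ["X1"], "X3": ["X2"]}
COEF = {("X1", "X2"): 0.9, ("X2", "X3"): 0.8}
NOISE_SD = 0.5


def simulate(n, shift=None, seed=0):
    """Draw n samples from the cascade, optionally adding a mean shift to some nodes."""
    rng = random.Random(seed)
    shift = shift or {}
    rows = []
    for _ in range(n):
        row = {}
        for node in ORDER:  # parents are always generated before their children
            mean = sum(COEF[(p, node)] * row[p] for p in PARENTS[node])
            row[node] = mean + shift.get(node, 0.0) + rng.gauss(0.0, NOISE_SD)
        rows.append(row)
    return rows


def raw_value(row, node):
    """The measurement itself, as a conventional per-variable chart would use."""
    return row[node]


def residual(row, node):
    """Part of a node's value that is not explained by its parents in the BN."""
    return row[node] - sum(COEF[(p, node)] * row[p] for p in PARENTS[node])


def flagged(baseline, batch, statistic, limit=4.0):
    """Nodes whose batch mean of `statistic` deviates from the in-control baseline."""
    alarms = []
    for node in ORDER:
        base = [statistic(r, node) for r in baseline]
        new = [statistic(r, node) for r in batch]
        std_err = statistics.stdev(base) / math.sqrt(len(new))
        z = (statistics.fmean(new) - statistics.fmean(base)) / std_err
        if abs(z) > limit:
            alarms.append(node)
    return alarms


in_control = simulate(5000, seed=1)
faulty = simulate(50, shift={"X1": 2.0}, seed=2)  # fault injected at X1 only

raw_alarms = flagged(in_control, faulty, raw_value)
residual_alarms = flagged(in_control, faulty, residual)
print("raw means signal on:", raw_alarms)
print("parent-adjusted residuals signal on:", residual_alarms)
```

Under these assumptions the mean shift injected at X1 carries into the raw means of X2 and X3, whereas the parent-adjusted residuals of X2 and X3 keep their in-control mean, which is why the cascade knowledge helps isolate the root cause.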
Figure 2: Illustration of process fault propagation: a fault in the root-cause variable propagates to its descendant variables; with knowledge of the BN that characterizes the multivariate process, the relationships between the mean levels of the variables can be derived, which could be very valuable information for enhancing root-cause diagnosis

Significant Progress to Date
The PI and his students have been investigating the use of BNs for modeling different real-world complex systems, including applications in manufacturing and health care. Recently, we developed a high-dimensional diagnostic monitoring method, called LASSO-BN, which scales, in both computational efficiency and statistical accuracy, to high-dimensional manufacturing processes with tens to roughly 100 process variables. The algorithm has been applied to the challenging Tennessee Eastman Process (TEP), shown in Figure 3. The TEP has been a benchmark process for evaluating process monitoring and fault diagnosis methods ever since the Eastman Chemical Company created the process simulator. It is a chemical process with 12 input variables (manipulated variables) and 41 output variables (measurement variables). A BN of the TEP has been built that focuses on 22 variables selected from the 41 measurement variables. The BN structure was identified from prior process knowledge and the process flow sheet. Because the cascade of the 22 variables was known, the variables were first sorted in process flow order, from upstream to downstream units, and placed into the network hierarchy as nodes without any arcs. The interactions among the variables were then analyzed based on prior process knowledge and used to determine where to place the arcs, completing the BN structure shown in Fig. 4.
With the BN structure known, in-control process data from the TEP data archive at the University of Washington (http://depts.washington.edu/control/larry/te/download.html) can be used to estimate the parameters of the BN.

Other related research in process analytics and machine learning: The PI's research has focused on high-dimensional machine learning and decision-making challenges with applications in manufacturing and healthcare. The PI's major accomplishments include: (1) high-dimensional graphical models, including both undirected network models such as GGM [] and directed network models such as BN []; (2) high-dimensional quality control and diagnosis with applications for surface monitoring of
microelectromechanical systems []; (3) data fusion and sensor studies for monitoring, diagnosis, and prognosis of complex manufacturing systems []; (4) data-driven healthcare process modeling with applications in nursing care coordination quality monitoring []. The PI has also established close collaborations with health professionals in multiple areas, such as Alzheimer's disease, Type 1 diabetes, depression, and surgical quality research, focusing on how to convert massive biomedical datasets into knowledge that supports evidence-based decision making, such as disease monitoring, diagnosis, and prognostics. Although the healthcare and manufacturing applications appear very different, the analytic challenges underlying them bear remarkable similarities, and solutions can be translated between application domains.

Figure 3: Illustration of the Tennessee Eastman Process

Figure 4: The BN of the TEP constructed from engineering knowledge of the process; the name of each node represents a specific process variable defined in the original TEP problem and can be found in Jin and Huang (2015)
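When the upstream-to-downstream order of the variables is known, as for the TEP variables in this section, candidate arcs can also be screened from data. The sketch below is a generic ordering-constrained, L1-penalized regression and is not the LASSO-BN algorithm itself: each variable is regressed on the variables upstream of it, and nonzero weights are kept as arcs. The variable names, coefficients, penalty value, and the coordinate-descent solver are assumptions chosen for illustration, and the penalty is not scale-invariant.

```python
import random


def center(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def soft_threshold(value, penalty):
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso(columns, target, penalty, sweeps=50):
    """Cyclic coordinate descent for (1/2n)*||y - Xb||^2 + penalty*||b||_1.

    `columns` holds centered predictor columns and `target` the centered response.
    """
    n = len(target)
    beta = [0.0] * len(columns)
    resid = list(target)  # y - X @ beta, kept up to date after every update
    for _ in range(sweeps):
        for j, col in enumerate(columns):
            scale = sum(c * c for c in col) / n
            if scale == 0.0:
                continue
            # Covariance of column j with the residual that excludes its own term.
            rho = sum(c * r for c, r in zip(col, resid)) / n + scale * beta[j]
            updated = soft_threshold(rho, penalty) / scale
            step = updated - beta[j]
            if step != 0.0:
                resid = [r - step * c for r, c in zip(resid, col)]
                beta[j] = updated
    return beta


def learn_arcs(data, order, penalty):
    """Regress each variable on its upstream variables; nonzero weights become arcs."""
    centered = {name: center(data[name]) for name in order}
    arcs = []
    for i, child in enumerate(order):
        upstream = order[:i]
        if not upstream:
            continue
        weights = lasso([centered[u] for u in upstream], centered[child], penalty)
        arcs.extend((u, child) for u, w in zip(upstream, weights) if w != 0.0)
    return arcs


# Synthetic chain A -> B -> C -> D (hypothetical variables and coefficients).
rng = random.Random(0)
n = 2000
data = {"A": [rng.gauss(0, 1) for _ in range(n)]}
data["B"] = [0.6 * a + rng.gauss(0, 1) for a in data["A"]]
data["C"] = [0.8 * b + rng.gauss(0, 1) for b in data["B"]]
data["D"] = [0.7 * c + rng.gauss(0, 1) for c in data["C"]]

arcs = learn_arcs(data, order=["A", "B", "C", "D"], penalty=0.3)
print(arcs)
```

Restricting each regression to upstream variables guarantees that the recovered graph is acyclic, which is one way the cascade knowledge constrains the model search.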
Proposed Research
The objective of this research project is to develop BN-based process monitoring and diagnosis methodologies and associated computational tools, with applications in real-world manufacturing areas, by working with CPCA members. Specifically, we will pursue the following developments to maximize the value of representing the manufacturing process as a BN:

On-line quality prediction and inference: Current manufacturing systems are designed off-line, through simulation and optimization under ideal process settings. During production, the product quality is not known until the product has been made. With today's sensing capabilities, process data are readily available in real time while a product is being produced. It is therefore desirable to conduct on-line product quality prediction and inference, so that proactive actions can be taken for defect prevention.

On-line root cause diagnosis: Currently, the sensors on a single station detect localized physical phenomena, and those at the end of the production line (end-of-line sensing) are often used as go/no-go gauges for product inspection. System-wide distributed sensors can diagnose and trace root causes of process faults more quickly and accurately; they can also predict faulty process conditions at downstream stations or at a future time, with high detection sensitivity to both local (e.g., within-station ambient conditions) and global (e.g., across-station fault propagation) process changes.

On-line active control and intervention: Most current control capabilities in a manufacturing system are designed for machine control, with little emphasis on quality improvement. With a DSN, various process data are readily available, providing essential information for active control for quality improvement. Methodologies are therefore needed that provide on-line intervention capabilities. In addition, cautious control strategies should be adopted.
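The on-line quality prediction item above can be sketched for the simplest case. The example assumes a hypothetical serial line in which each stage's KPI depends linearly on the previous stage's KPI with Gaussian noise; the stage names and parameter values are invented for illustration. As measurements arrive stage by stage, the predicted mean and variance of the final quality are updated by forward propagation, which is exact only for such a chain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    intercept: float
    slope: float      # effect of the previous stage's KPI (unused for the first stage)
    noise_var: float


def predict_final(stages, observed):
    """Mean and variance of the last stage's KPI given the KPIs observed so far."""
    mean, var = 0.0, 0.0
    for index, stage in enumerate(stages):
        if stage.name in observed:
            # A measured KPI carries no remaining uncertainty in this sketch.
            mean, var = observed[stage.name], 0.0
        elif index == 0:
            mean, var = stage.intercept, stage.noise_var
        else:
            mean = stage.intercept + stage.slope * mean
            var = stage.slope ** 2 * var + stage.noise_var
    return mean, var


# Hypothetical three-stage line; all numbers are illustrative assumptions.
LINE = [
    Stage("oven_temp", intercept=10.0, slope=0.0, noise_var=4.0),
    Stage("coat_thickness", intercept=1.0, slope=0.5, noise_var=1.0),
    Stage("final_quality", intercept=0.0, slope=2.0, noise_var=0.25),
]

before_start = predict_final(LINE, {})
after_stage_1 = predict_final(LINE, {"oven_temp": 12.0})
after_stage_2 = predict_final(LINE, {"oven_temp": 12.0, "coat_thickness": 6.5})

for label, (mean, var) in [
    ("before production", before_start),
    ("after stage 1", after_stage_1),
    ("after stage 2", after_stage_2),
]:
    print(f"{label}: predicted quality mean={mean:.2f}, variance={var:.2f}")
```

The predictive variance shrinks as upstream measurements become available, which is what would allow a proactive decision before the product reaches end-of-line inspection. A general BN with multiple parents per node would need full Gaussian conditioning rather than this chain recursion.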
EDUCATION
Shuai Huang, Ph.D.
University of Science and Technology of China, Hefei, China, Statistics, B.S., 1977
Arizona State University, Tempe, AZ, Industrial Engineering, Ph.D., 2012

PROFESSIONAL EXPERIENCE
08/2014-present Assistant Professor, Industrial and Systems Engineering, University of Washington
08/2012-07/2014 Assistant Professor, Industrial and Management Systems Engineering, University of South Florida
08/2007-07/2012 Research Associate, Industrial Engineering Program, Arizona State University

SELECTED PUBLICATIONS
1. Liu, K. and Huang, S., 2015, Integration of Data Fusion Methodology with Degradation Modeling Process to Improve Prognostics, IEEE Transactions on Automation Science and Engineering, in press.
2. Huang, S., Kong, Z., Huang, W., 2014, High-dimensional Process Monitoring and Change Point Detection Through Embedding Distributions in Reproducing Kernel Hilbert Space, IIE Transactions, Vol. 46 (10), 999-1016 (IIE Magazine Feature Article).
3. Yampikulsakul, N., Byon, E., Huang, S. and Sheng, S.W., 2013, Condition Monitoring of Wind Power System with Non-Parametric Regression Analysis, IEEE Transactions on Energy Conversion, Vol. 29 (2), 288-299.
4. Huang, S., Li, J., Chen, K., Wu, T., Ye, J., Wu, X., and Li, Y., 2013, A Transfer Learning Approach for Network Modeling, IIE Transactions, 44, 915-931 (IIE Transactions Best Paper Award).
5. Huang, S., Li, J., Lamb, G., Schmitt, M., and Fowler, J., 2012, Multi-data Fusion for Enterprise Quality Improvement by a Multilevel Latent Response Model, IIE Transactions, Vol. 46 (5), 512-525.
6. Lin, Y., Liu, K., Byon, E., Qian, X., Huang, S., 2015, Domain-Knowledge Driven Cognitive Degradation Modeling for Alzheimer's Disease, The SIAM International Conference on Data Mining (SDM 2015), Apr. 30-May 2, 2015, Vancouver, CA. (historical paper acceptance rate < 25%)
7. Ren, S., Huang, S., Papademetris, X., Onofrey, J.
and Qian, X., 2015, A Scalable Algorithm for Structured Kernel Feature Selection, The 18th International Conference on Artificial Intelligence and Statistics (AISTATS 2015), May 9-12, 2015, San Diego, USA (paper acceptance rate for oral presentation 6.8%)
8. Huang, S., Ye, J., Fleisher, A., Chen, K., Reiman, E., Wu, T., and Li, J., 2013, A Sparse Structure Learning Algorithm for Bayesian Network Identification from High-dimensional Data, IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(6), 1328-1342.
9. Huang, S., Li, J., Ye, J., Fleisher, A., Chen, K. and Wu, T., 2011, Brain Effective Connectivity Modeling for Alzheimer's Disease by Sparse Bayesian Network, The 17th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2011) (paper acceptance rate 17.5%), Aug. 21-24, 2011, San Diego, USA.
10. Huang, S., Li J., Ye, J., Chen, L., Wu, T., Fleisher, A. and Reiman, E., 2011, Identifying Alzheimer's Disease-Related Brain Regions from Multi-Modality Neuroimaging Data using Sparse Composite Linear Discrimination Analysis, Proceedings of Neural Information Processing Systems Conference (NIPS) (paper acceptance rate for oral presentation 4.8%), Dec. 2011, Granada, Spain.

SELECTED HONORS, AWARDS & PROFESSIONAL ACTIVITIES
Feature Article in IIE Magazine, for paper "High-dimensional Process Monitoring and Change Point Detection using Embedding Distributions in Reproducing Kernel Hilbert Space (RKHS)", Oct. 2014
Best Paper Award, IIE Transactions Best Paper Quality & Reliability Engineering, for "A Transfer Learning Approach for Network Modeling", 2014
Outstanding Graduate Award, School of Computing, Informatics, and Decision Systems Engineering, Arizona State University, 2012
University Graduate Fellowship Award, School of Computing, Informatics, and Decision Systems Engineering, Arizona State University, 2012
Finalist of Best Student Paper Competition in Quality, Statistics and Reliability Subdivision of INFORMS Annual Conference, Charlotte, NC, Nov. 2011, paper title: "Hypergraph-based Gaussian Process Models with Qualitative and Quantitative Input Variables"
Finalist of Best Student Paper Competition in Data Mining Subdivision of INFORMS Annual Conference, Charlotte, NC, Nov. 2011, paper title: "Brain Effective Connectivity Modeling for Alzheimer's Disease by Sparse Gaussian Bayesian Network"
Best Poster Award (2nd Place), in Quality, Statistics and Reliability Subdivision of INFORMS Annual Conference, Charlotte, NC, Nov.
2011
Dissertation Poster Award (2nd Place), Doctoral Colloquium Dissertation Poster Competition of Industrial Engineering Research Conference (IERC), Reno, NV, May 2011
Feature Article in IIE Magazine, for paper "Regression-based Process Monitoring with Consideration of Measurement Errors", Jan. 2010

SELECTED FUNDED GRANTS
PI: National Science Foundation, Collaborative: Smart Monitoring for Alzheimer's Disease via Data Fusion, Personalized Prognostics and Selective Sensing, Total award: $215,000, Sep 2014-Sep 2017.
PI: Juvenile Diabetes Research Foundation, A Rule-Based Prognostic Model for Type I Diabetes by Characterizing and Synthesizing Rules from Longitudinal Data, Total award: $110,000, June 2014-May 2015.
Co-PI (in collaboration with Arizona State University): U.S. Army Electronic Proving Ground, Big Data in Large Communication Networks Mining and Visualization, Aug 2012-Oct 2013, Total award: $282,467 (15% budget share).

SUPERVISED DOCTORAL and MASTER'S STUDENTS
Mona Haghighi (3rd year), Yan Jin (2nd year), Ying Lin (3rd year), Yazhuo Liu (3rd year)