A Hybrid Decision Tree Approach for Semiconductor Manufacturing Data Mining and An Empirical Study


C. -F. Chien, J. -C. Cheng, and Y. -S. Lin
Department of Industrial Engineering, National Tsing Hua University
cfchien@mx.nthu.edu.tw

Abstract
During the semiconductor fabrication process, huge amounts of process data are automatically or semi-automatically recorded and accumulated in databases for monitoring the process, diagnosing faults, and managing manufacturing. However, the manufacturing factors that affect wafer yield are frequently interrelated. Domain engineers cannot easily find the possible root causes of low yield rapidly and efficiently using only their own domain knowledge or rules of thumb. This study aims to construct a data mining framework for analyzing semiconductor manufacturing data. It proposes a hybrid decision tree approach that combines the Kruskal-Wallis test, chi-square interaction detection, and the variance reduction splitting criterion to analyze huge multi-dimensional data and infer possible causes of faults for troubleshooting. The proposed hybrid decision tree approach can also eliminate variable selection bias during decision tree construction. We conducted an empirical study in a semiconductor company for validation. The results demonstrate the practical viability of the proposed method in helping engineers diagnose faults and improve yield efficiently and effectively.

Keywords: Data mining, Decision tree, Yield enhancement, Semiconductor manufacturing data

1. INTRODUCTION
Semiconductor manufacturing is perhaps the most complex of modern manufacturing processes, and several frequently interrelated manufacturing factors affect the yield of silicon wafers. The fabrication processes are complex, and large amounts of raw data from various sources accumulate daily.
Data mining is the process of discovering interesting knowledge from large amounts of data stored in databases, data warehouses, or other information repositories [1]. This study constructs a hybrid decision tree approach for mining multi-dimensional semiconductor manufacturing data, including SORT data and WIP data, to troubleshoot and diagnose processes. The approach combines a nonparametric Kruskal-Wallis test, chi-square interaction detection, and the variance reduction splitting criterion.

The study addresses a real semiconductor problem in a fab, in which causal relationships exist between the machines used in a specific process and the yield. First, the relevant data are selected and appropriately preprocessed. The proposed hybrid decision tree approach is then used to analyze the preprocessed data and derive a diagnostic model. From this model, possible causes of manufacturing process abnormality are inferred, and the results can be discussed with domain engineers for interpretation and implementation. The approach can also eliminate the variable selection bias that arises when interrelated attributes exist during decision tree construction. For validation, an empirical study uses real data to compare the performance of the hybrid decision tree approach with that of other decision tree algorithms.

2. A HYBRID DECISION TREE APPROACH TO DIAGNOSING PROCESSES
This study proposes a framework and a hybrid decision tree approach for exploring huge sets of engineering data. As illustrated in Figure 1, the framework includes five phases:
1. Defining the problem.
2. Selecting data.
3. Preprocessing data.
4. Constructing the decision tree.
5. Evaluating and interpreting results.

2.1 Defining the problem
Fabricating an integrated circuit in semiconductor manufacturing is a very complex process involving hundreds of operations at different stations. Typical semiconductor factories may include one to ten major fabrication process flows and produce 2,000 or more wafers every month. An average-sized factory contains about 400 pieces of equipment. Because wafer fabrication is so complex, engineers cannot easily locate the possible causes of faults rapidly and efficiently from their own domain knowledge and experience. A number of studies have employed data mining methods to improve yield or diagnose processes.
Braha and Shmilovici [8] proposed a data mining method for improving a cleaning process used in the semiconductor industry, involving a decision tree, a neural network, and a composite classification method. Zhou et al. [9] proposed a rule-induction data mining system for the drop test analysis of electronic products. Chien et al. [10] proposed a data mining approach that integrates the Kruskal-Wallis test and a decision tree for diagnosing defects. Furthermore, Chien et al. [11] developed a data mining framework for investigating huge amounts of semiconductor manufacturing data. That framework combines the Kruskal-Wallis test, K-means clustering, and the variance reduction splitting criterion to infer possible causes of faults and manufacturing process variations.

2.2 Selecting Data
This study selects SORT data and WIP (wafer in process) data. The CP (Circuit Probe) test is an electrical test that includes various functional tests of all the dies on each wafer. The CP yield rate from the SORT data is defined as the target variable. SORT data consist of the CP test results, including lot ID, product name, and the location of the wafer. WIP data are used as input variables. They consist of basic information on each fabricated wafer, including lot ID, product name, station, the machine used at the process station, and the time and date.

[Figure 1 Conceptual framework for diagnosing processes: 1. Defining problem (semiconductor problem); 2. Selecting data (select manufacturing data from the engineering database); 3. Preprocessing data (data integration, data cleaning, data transformation); 4. Constructing decision tree (indicate target variable and attributes, Kruskal-Wallis test, interaction detection, variance reduction, iterative growing, final decision tree, extract the decision tree rules); 5. Evaluating and interpreting results (if a pattern exists, proceed to revision and advanced analysis; otherwise, add more data)]

2.3 Preprocessing data
This study integrates different data sources from the engineering database. The collected data are often noisy, incomplete, and inconsistent. Data preparation is therefore required to improve data quality and, in turn, the effectiveness of the data mining process. Preprocessing techniques include integrating, cleaning, and transforming data:
- Data integration merges data from multiple sources into a coherent data store.
- Data cleaning removes noisy data, identifies outliers, fills in missing data, and corrects inconsistencies.
- Data transformation transforms or consolidates data into forms appropriate for mining.

In particular, SORT data and WIP data are integrated to determine whether causal relationships exist between the machines used to perform a specific process and the CP yield. However, not all wafers pass through all processes. Data cleaning methods are therefore used to handle missing values and to remove rows and columns that are not pertinent to the problem. If too few wafer lots pass through a process station, that station can be ignored. Finally, the cleaned data are transformed into a format suitable for constructing a decision tree.

2.4 Constructing decision tree
The fabrication of a semiconductor circuit is a multi-step process of up to hundreds of steps. A low yield problem may be caused by a single process station or machine in a particular period, and local interactions may occur among various stations. Considering all possible interactions among N stations may take a very long computation time, because the number of station subsets grows exponentially with N. Detecting only the N(N-1)/2 pairwise interactions is more feasible. Semiconductor manufacturing data are often not normally distributed. The nonparametric Kruskal-Wallis test is therefore used to determine whether the yield differs significantly among the machines in the same process station.
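As an illustration of the Kruskal-Wallis test for one station, the sketch below groups lot yields by the machine used and computes the tie-corrected H statistic. It converts H to a p-value with the chi-square distribution on k - 1 degrees of freedom. This is a minimal standard-library sketch, not the authors' implementation. The function names are hypothetical, and the sketch assumes the large-sample chi-square approximation is adequate; very small groups would call for exact tables.

```python
import math
from collections import Counter
from itertools import chain


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def chi2_sf(x, df):
    """Upper-tail chi-square probability 1 - P(df/2, x/2), using the series for
    the regularized lower incomplete gamma function (intended for moderate x)."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 1
    while term > 1e-15 * total:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def kruskal_wallis(groups):
    """Tie-corrected Kruskal-Wallis H and p-value; one list of yields per machine."""
    values = list(chain.from_iterable(groups))
    n = len(values)
    ranks = average_ranks(values)
    weighted, start = 0.0, 0
    for g in groups:
        r = sum(ranks[start:start + len(g)])  # rank sum of this machine's lots
        weighted += r * r / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * weighted - 3 * (n + 1)
    # Tie correction: divide H by 1 - sum(t^3 - t) / (n^3 - n).
    ties = sum(t ** 3 - t for t in Counter(values).values())
    correction = 1 - ties / (n ** 3 - n)
    if correction == 0:  # every yield identical: no evidence of any difference
        return 0.0, 1.0
    h /= correction
    return h, chi2_sf(h, len(groups) - 1)
```

For a given station, `groups` would hold one list of CP yields per machine. The station stays a candidate attribute when the returned p-value is below the chosen significance level. In practice, `scipy.stats.kruskal` computes the same tie-corrected statistic.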
Chi-square analysis of residuals is performed to detect local two-station interactions and to eliminate variable selection bias during construction of the decision tree. The variance reduction splitting criterion is then applied to grow the branches so as to minimize the total variance. The approach is designed for a continuous target variable and categorical input attributes. The proposed hybrid decision tree approach consists of four major steps:
Step 1: Perform the Kruskal-Wallis test to identify which attributes show a significant difference in the continuous target across their levels.
Step 2: Detect two-variable interactions between pairs of the candidate attributes from Step 1.
Step 3: Apply the variance reduction criterion to split on the candidate attributes and grow the branches.
Step 4: Grow the tree iteratively by repeating Steps 1 to 3 until no attribute in the model differs significantly or one of the following stopping rules is met:
1. The node includes fewer than five instances.
2. All instances in the node have the same value for every attribute.
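The text does not specify whether Step 3 uses binary or multiway splits. The sketch below assumes a CART-style binary split of one categorical attribute, scored by the reduction in within-node sum of squared errors. That quantity is proportional to the variance reduction. For a squared-error criterion, sorting the levels by their mean target and trying each cut point in that order is known to find the best two-way partition of the levels. The function names are hypothetical.

```python
from collections import defaultdict


def sse(ys):
    """Sum of squared deviations from the mean (node size times its variance)."""
    if not ys:
        return 0.0
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)


def best_binary_split(levels, ys):
    """Return (reduction, left_levels) for the best two-way split of a categorical
    attribute; left_levels is None when no split reduces the error."""
    by_level = defaultdict(list)
    for lv, y in zip(levels, ys):
        by_level[lv].append(y)
    # Order levels by mean target so only len(levels) - 1 cuts need checking.
    ordered = sorted(by_level, key=lambda lv: sum(by_level[lv]) / len(by_level[lv]))
    parent = sse(ys)
    best = (0.0, None)
    for cut in range(1, len(ordered)):
        left = [y for lv in ordered[:cut] for y in by_level[lv]]
        right = [y for lv in ordered[cut:] for y in by_level[lv]]
        reduction = parent - sse(left) - sse(right)
        if reduction > best[0]:
            best = (reduction, set(ordered[:cut]))
    return best
```

Running this function over every candidate attribute and keeping the one with the largest reduction gives the split chosen in Step 3. Splits of the form "one combined level versus all others," such as those reported in the empirical study, fall within this binary search. Multiway splits are not covered by the sketch.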

2.5 Evaluating and interpreting results
Causal relationships can be extracted easily from the resulting decision tree model. Leaf nodes are extracted according to the target distribution, and an If-Then rule can be created for each path from the root node to a leaf node. Domain engineers and data experts can then discuss the model to interpret it and analyze the results. After that, revision or advanced analysis can be conducted to find better ways to solve the problem.

3. PERFORMANCE COMPARISON
Before the empirical study, we first evaluate the validity of the approach by comparing the hybrid decision tree approach with existing decision tree algorithms, CART and CHAID. The comparison uses a real dataset from a machine-learning database website.

3.1 Defining the problem
Servo is a dataset collected from a simulation of a servo system, which involves a servo amplifier, a motor, a lead screw, and a sliding carriage of some sort. The output value is the time the system requires to respond to a step change in a position set point. Servo covers an extremely nonlinear phenomenon: predicting the rise time of a servomechanism in terms of two continuous gain settings and two discrete choices of mechanical linkages. The Servo dataset includes 167 instances with four attributes and one target. Table I shows the variable information of Servo.

3.2 Selecting and preprocessing data
The Servo dataset is a single dataset without missing values, so data selection and preprocessing are unnecessary.

Table I The variable information of Servo
Variable   Levels          Data type
motor      A, B, C, D, E   Categorical attribute
screw      A, B, C, D, E   Categorical attribute
pgain      3, 4, 5, 6      Categorical attribute
vgain      1, 2, 3, 4, 5   Categorical attribute
class      0.13 to 7.10    Continuous target

3.3 Constructing the decision tree
The hybrid decision tree approach is then used to construct the decision tree model. Each step is described as follows.

[Figure 2 The final hybrid decision tree of Servo (root node: all data, size 167; first split on the combined attribute pgain-vgain)]

Firstly, the Kruskal-Wallis test is applied to the continuous target, class, to determine which attributes show significant differences among their levels. The significance level of the Kruskal-Wallis test is 0.05. If the p-value of an attribute is below 0.05, the levels of that attribute are said to differ significantly. According to the test results, the attributes pgain and vgain show significant differences among their levels.

Secondly, a chi-square test detects two-variable interactions between pairs of attributes found significant in Step 1. Since pgain and vgain were significant in Step 1, Step 2 checks whether a significant interaction exists between them. The detection result shows that the interaction between pgain and vgain is significant, so a new combined attribute, pgain-vgain, is generated. The attributes pgain-vgain, pgain, and vgain are the candidate attributes.

Thirdly, the split with the best variance reduction among the candidate attributes is selected.
Finally, Steps 1 to 3 are repeated to grow the decision tree iteratively. Growth stops when no attribute in the node differs significantly or the node contains fewer than five instances. Figure 2 shows the final decision tree.
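The text does not give the exact statistic behind the chi-square interaction detection in Step 2. The sketch below assumes one plausible reading. It applies a Pearson chi-square test of independence to the contingency table of two candidate attributes, and the cell-level Pearson residuals (O - E)/sqrt(E) indicate which level pairs deviate most. If the test is significant, the pair is merged into a combined attribute such as pgain-vgain. The function names are hypothetical, and the 0.05 threshold follows the empirical study.

```python
import math
from collections import Counter


def chi2_sf(x, df):
    """Upper-tail chi-square probability via the incomplete gamma series."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 1
    while term > 1e-15 * total:
        term *= z / (a + n)
        total += term
        n += 1
    return max(0.0, 1.0 - total * math.exp(-z + a * math.log(z) - math.lgamma(a)))


def interaction_test(attr_a, attr_b):
    """Pearson chi-square test on the contingency table of two categorical
    attributes; returns (statistic, p_value, Pearson residual per level pair)."""
    n = len(attr_a)
    row, col = Counter(attr_a), Counter(attr_b)
    observed = Counter(zip(attr_a, attr_b))  # missing pairs count as 0
    stat, residuals = 0.0, {}
    for a in row:
        for b in col:
            expected = row[a] * col[b] / n
            r = (observed[(a, b)] - expected) / math.sqrt(expected)
            residuals[(a, b)] = r
            stat += r * r
    df = (len(row) - 1) * (len(col) - 1)
    p = chi2_sf(stat, df) if df > 0 else 1.0
    return stat, p, residuals


def combine(attr_a, attr_b):
    """Merge two attributes into one whose levels join both, e.g. 'AT02-BT25'."""
    return [f"{a}-{b}" for a, b in zip(attr_a, attr_b)]
```

Under this reading, a p-value below 0.05 would lead to `combine(pgain, vgain)` being added to the candidate attributes. The residuals show which level pairs drive the result. The same naming pattern produces combined levels such as AT02-BT25 in the empirical study.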

3.4 Evaluating and interpreting the results
We compared the performance of three decision tree algorithms: CART, CHAID, and the proposed hybrid decision tree approach. The comparison covers prediction accuracy, the number of leaves, and the tree depth. Prediction accuracy is measured by the average squared error:

$$\text{Average squared error} = \frac{1}{N}\sum_{n=1}^{N}\bigl(y_n - d(\mathbf{x}_n)\bigr)^2$$

where $d(\mathbf{x}_n)$ is the predictor of $y_n$. The number of leaves and the tree depth measure the size of the decision tree. Table II summarizes the test results of these algorithms in terms of average squared error, number of leaves, tree depth, and the number of rules extracted from each decision tree.

Table II Comparison of decision tree algorithms on the test data
Metric               CART    CHAID    Our hybrid approach
Ave. squared error
Number of leaves
Tree depth
Number of rules

4. EMPIRICAL STUDY
This research conducted an empirical study using real semiconductor manufacturing data from a fab in Taiwan. The study validates the proposed hybrid decision tree approach for solving a low yield problem.

4.1 Defining the problem
Data from the fab show that the CP yields of some lots are abnormally low, which reduces the productivity of the fab. The relevant fabrication data with large variations are examined to determine the root causes. The goal is to solve the problem as quickly as possible, improving product yield and reducing cost. The possible root causes may involve certain machines at process stations, or particular periods, during the semiconductor manufacturing process.

4.2 Selecting data
The company had built an engineering database system for data access and management. In this case, CP yields outside the control limit were found among 156 lots, as plotted in Figure 3. Hence, the manufacturing data, including CP test data and other relevant data, were extracted from the engineering database for the period from 7/1 to 7/19 of one year.

[Figure 3 Joint plot of CP yield of all wafer lots (axes: CP yield vs. lot)]
[Figure 4 Joint plot of CP yield after data are preprocessed (axes: CP yield vs. lot)]

4.3 Preprocessing data
The SORT data and WIP data are merged into a single dataset to construct the decision tree. The target is the CP yield rate, and the attributes carry information on the process stations, such as machines and operating times. The selected data include some missing values, and not all wafer lots passed through all process stations. Some wafer lots and some process stations were therefore removed to improve data quality and the efficiency of decision tree construction. The possible root causes may involve machines at a process station or particular periods. Each process station is therefore represented by two attributes: the machine used at the station, and the machine combined with its date of operation.

4.4 Constructing the decision tree
After preprocessing, the training data include 150 wafer lots (instances), one continuous target (CP yield), and 928 categorical attributes. The attributes comprise 464 process stations plus the same 464 stations combined with their dates of operation. The corresponding levels are the names of the machines at each station and the machine names combined with their dates of operation. The hybrid decision tree approach is then used to construct the decision tree model, as described below.

First, the Kruskal-Wallis test is applied to the continuous target, CP yield, to determine which attributes (stations) show significant differences among their levels (machines or operating times). The significance level of the Kruskal-Wallis test is 0.05. If the p-value of an attribute is below 0.05, its levels are said to differ significantly. Thirty attributes are significant at the 0.05 level. Table III shows the Kruskal-Wallis test results.
Second, two-variable interactions are detected between pairs of attributes found significant in Step 1. If an interaction between a pair of attributes exists, their levels are combined, and the pair is treated as a new single attribute. The significance level of the chi-square test is 0.05. If the p-value of a pair of attributes in the chi-square test is below 0.05, the pair is said to interact significantly. Table IV lists the detection results. The significant attributes from Step 1 and the combined attributes from Step 2 form the candidate attributes.

Table III Kruskal-Wallis test results (columns: station, p-value)

Third, the split with the best variance reduction among the candidate attributes is selected. Figure 4 shows the results of the first split. Lots whose combined attribute Ta60-Ta47 takes the combined level AT02-BT25 go to the left child, which contains 19 instances. The other instances, whose Ta60-Ta47 level is not AT02-BT25, go to the right child, which contains 131 instances. Steps 1 to 3 are repeated to grow the decision tree iteratively. Growth stops when no attribute in the node differs significantly or the node contains fewer than five instances. Figure 5 presents the final decision tree. After the initial split, the right child continues to grow. Its left child contains the 35 instances whose Ta17 level is T702 or T707, and the remaining instances go to its right child. The decision tree result can then be provided to domain engineers for troubleshooting and diagnosing processes.
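The If-Then rules described in Section 2.5 can be read directly off the two splits just described. The sketch below encodes those splits as a small binary tree and emits one rule per root-to-leaf path. The lot counts come from the text: 19 and 131 after the first split, and 35 in the left child of the second split, which leaves 131 - 35 = 96 in the remaining leaf. The target means are omitted because they are not recoverable here. The `Node` class is a hypothetical structure for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Node:
    size: int                                  # number of lots in the node
    attribute: Optional[str] = None            # split attribute; None for a leaf
    left_levels: Set[str] = field(default_factory=set)  # levels sent left
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def extract_rules(node, conditions=()):
    """Yield one If-Then rule per root-to-leaf path."""
    if node.attribute is None:
        cond = " AND ".join(conditions) or "TRUE"
        yield f"IF {cond} THEN leaf with {node.size} lots"
        return
    levels = ", ".join(sorted(node.left_levels))
    yield from extract_rules(node.left, conditions + (f"{node.attribute} IN {{{levels}}}",))
    yield from extract_rules(node.right, conditions + (f"{node.attribute} NOT IN {{{levels}}}",))


tree = Node(150, "Ta60-Ta47", {"AT02-BT25"},
            left=Node(19),
            right=Node(131, "Ta17", {"T702", "T707"}, left=Node(35), right=Node(96)))
rules = list(extract_rules(tree))
```

For example, `rules[0]` is `"IF Ta60-Ta47 IN {AT02-BT25} THEN leaf with 19 lots"`. Rules in this form can go straight to domain engineers alongside each leaf's yield statistics.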

Table IV Detection of interaction results (columns: combined attributes, p-value)

4.5 Evaluating and interpreting results
In the decision tree result, the left leaf of the first split and the left leaf of the second split can be identified as low-yield groups. The right leaf of the second split can be regarded as a normal-yield group. In the first split, the combined attribute Ta60-Ta47 with combined level AT02-BT25 means that the 19 wafer lots in the left leaf node passed through machine AT02 in station Ta60 and machine BT25 in station Ta47. These machines may be causing the low yield. In the left leaf of the second split, machines T702 and T707 in station Ta17 may also be responsible for low yield. The decision tree model thus suggests that these two production paths lead to low yield. Because the Kruskal-Wallis test yields no significant results and the instances in the leaf node are fewer than five, the decision tree stops growing.

Revision and advanced analysis are then conducted. The yields of the two low-yield groups are viewed in order of operation time to determine whether causal relationships exist between operation time and yield. In the left leaf node of the first split, the yield of machine BT25 in station Ta47 drops suddenly between 7/16 and 7/17. Seven wafer lots were produced during this period, and four of them have a yield under 40%. Figure 7 shows the results. In the left leaf node of the second split, the yield at station Ta17 drops suddenly from 7/10 to 7/15. Thirteen wafer lots were produced during this period, and four of them have a yield under 60%. Figure 8 presents the results. A discussion with domain experts revealed that the root cause is machine BT25 in station Ta47 from 7/15 to 7/17.
Machine T707 in station Ta17, used from 7/9 to 7/14, is not the root cause but should still be monitored.
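The revision step views one machine's lots in order of operation time and counts low-yield lots inside a date window. It can be sketched as follows. The record layout is an assumption rather than the fab's schema: a hypothetical `LotRecord` holding lot ID, machine, a (month, day) date, and CP yield as a fraction. For the BT25 case above, the window would be 7/16 to 7/17 with a 40% threshold.

```python
from typing import NamedTuple, Tuple


class LotRecord(NamedTuple):
    lot_id: str
    machine: str
    date: Tuple[int, int]  # (month, day); the year is not needed within one window
    cp_yield: float        # CP yield as a fraction, e.g. 0.35 for 35%


def yield_timeline(records, machine):
    """Lots processed on one machine, ordered by operation date."""
    return sorted((r for r in records if r.machine == machine), key=lambda r: r.date)


def low_yield_in_window(records, machine, start, end, threshold):
    """Return (lots in [start, end], lots below threshold) for one machine."""
    in_window = [r for r in yield_timeline(records, machine) if start <= r.date <= end]
    low = [r for r in in_window if r.cp_yield < threshold]
    return len(in_window), len(low)
```

Called as `low_yield_in_window(records, "BT25", (7, 16), (7, 17), 0.40)`, the function returns the number of BT25 lots in the window and how many of them fell below 40%. Plotting `yield_timeline` output gives the kind of view shown in Figures 7 and 8.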

[Figure 7 Yield performance of leaf node in first split (axes: CP yield vs. operation time of Ta47)]
[Figure 8 Yield performance of leaf node in second split (axes: CP yield vs. operation time of Ta17)]

5. CONCLUDING REMARKS
The hybrid decision tree approach is designed to identify the machines or stations that cause low yield in a complex semiconductor manufacturing process. Fabrication processes include hundreds of operations at various stations, and local interactions among stations may affect the accuracy of decision tree results. The empirical results validate the proposed approach as practically viable. They also demonstrate that the approach effectively assists engineers in troubleshooting and diagnosing processes. The hybrid decision tree can eliminate variable selection bias during tree growth when the training data are interrelated. The approach can also produce powerful splits during tree growth and thereby construct shorter, more easily interpretable trees.

The target in the real case is the CP yield, which is sometimes inappropriate for diagnosing processes because the causes of a fault may be obscure. The yield is a synthetic index of performance determined over hundreds of processes. WAT (wafer acceptance test) data are a good alternative target because each WAT parameter reflects specific operations. In addition, advanced analyses can be conducted to determine whether the normal-yield group in the case study exhibits any pattern. Further studies can identify other causes of faults that could lead to yield rates between 70% and 80%.

ACKNOWLEDGEMENT
This research is partially supported by the National Science Council, Taiwan (NSC E ; NSC E ; NSC E ).

REFERENCES
1. Han, J. and Kamber, M., Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers (2001).

2. Fu, Y., Data mining, IEEE Potentials, 164, (1997).
3. Fayyad, U., Piatetsky-Shapiro, G., and Smyth, P., The KDD Process for Extracting Useful Knowledge from Volumes of Data, Communications of the ACM, 39, 11, (1996).
4. Feelders, A., Daniels, H., and Holsheimer, M., Methodological and practical aspects of data mining, Information & Management, 37, (2000).
5. Kass, G. V., An exploratory technique for investigating large quantities of categorical data, Applied Statistics, 29, 2, (1980).
6. Breiman, L., Friedman, J. H., Olshen, R. J., and Stone, C. J., Classification and Regression Trees, Wadsworth, Belmont, CA (1984).
7. Quinlan, J. R., C4.5: Programs for Machine Learning, Morgan Kaufmann, San Francisco, California (1993).
8. Braha, D. and Shmilovici, A., Data mining for improving a cleaning process in the semiconductor industry, IEEE Transactions on Semiconductor Manufacturing, 15, 1, (2002).
9. Zhou, C., Nelson, P. C., Xiao, W., Tripak, T. M., and Lane, S. A., An intelligent data mining system for drop test analysis of electronic products, IEEE Transactions on Electronics Packaging Manufacturing, 24, 3, (2001).
10. Chien, C., Lin, T., Peng, C., and Hsu, S., Developing data mining framework and methods for diagnosing semiconductor manufacturing defects and an empirical study of wafer acceptance test data in a wafer fab, Journal of the Chinese Institute of Industrial Engineers, 18, 4, (2001).
11. Chien, C., Wang, W., and Cheng, C., Data mining for yield enhancement in semiconductor manufacturing and an empirical study, Expert Systems with Applications, 33, 1, 1-7, (2007).

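Constructing a Hybrid Decision Tree for Analyzing Semiconductor Process Data and an Empirical Study
Chen-Fu Chien (簡禎富)*, Jen-Chieh Cheng (鄭仁傑), Yun-Syuan Lin (林昀萱)
Department of Industrial Engineering and Engineering Management, National Tsing Hua University

Abstract
In the era of enterprise e-business and digital decision-making, large amounts of data are stored in data warehouses or databases. Rich information can be extracted from these data for knowledge discovery and decision analysis. During semiconductor manufacturing, large amounts of process data are collected into engineering databases for process monitoring, fault analysis, and manufacturing management. However, semiconductor processes are complex, and the influencing factors are numerous and often interrelated. Engineers who rely on their own domain knowledge or rules of thumb therefore find it difficult to discover, quickly and efficiently, the causes of process abnormalities and the information hidden in the data, and so to resolve yield excursions in time. This study constructs a semiconductor data mining framework and develops a hybrid decision tree approach to find possible causes of variation, as a reference for engineers and domain experts in solving problems. The approach comprises the Kruskal-Wallis test, chi-square interaction detection, and the variance reduction splitting criterion. It can help engineers shorten fault diagnosis time and thereby improve the yield of semiconductor processes. An empirical study at a semiconductor fab uses real data to compare the hybrid decision tree approach with existing decision tree algorithms and so validate the study.

Keywords: data mining, decision tree, yield enhancement, semiconductor manufacturing data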

Chen-Fu Chien, Ph.D., is now the Deputy Director of the Industrial Engineering Division at tsmc and a Professor on leave from the Department of Industrial Engineering and Engineering Management, National Tsing Hua University. He was a Fulbright Scholar at UC Berkeley. Dr. Chien has received many awards, including:
- Tier 1 P.I. of NSC.
- The Distinguished Industrial Collaboration Award from the Ministry of Education.
- The Best Research Award from NSC.
- The Best Paper Award from CIIE.
- The Best Engineering Paper Award from the Chinese Institute of Engineers.
- Distinguished Young IE Engineers.
- Distinguished Young Faculty of NTHU, Taiwan.

Before joining tsmc, he served as a Senior Consultant at the Manufacturing Technology Center, tsmc. His research areas include modeling and analysis for semiconductor manufacturing, manufacturing strategy, decision analysis, and data mining.

Jen-Chieh Cheng received an M.S. degree from the Department of Industrial Engineering and Engineering Management at National Tsing Hua University. His research includes data mining, semiconductor manufacturing management, and decision analysis. Yun-Syuan Lin is a Ph.D. candidate in Industrial Engineering and Engineering Management at National Tsing Hua University. Her research focuses on soft-computing applications in production systems, data mining, and decision analysis.