A Hybrid Decision Tree Approach for Semiconductor Manufacturing Data Mining and an Empirical Study


C.-F. Chien, J.-C. Cheng, Y.-S. Lin
Department of Industrial Engineering, National Tsing Hua University
cfchien@mx.nthu.edu.tw

Abstract

During semiconductor fabrication, large volumes of process data are recorded automatically or semi-automatically and accumulated in databases for process monitoring, fault diagnosis, and manufacturing management. However, the manufacturing factors that affect wafer yield are frequently interrelated, so domain engineers cannot find possible root causes of low yield rapidly and efficiently using only their own domain knowledge or rules of thumb. This study constructs a data mining framework for analyzing semiconductor manufacturing data and proposes a hybrid decision tree approach that combines the Kruskal-Wallis test, chi-square interaction detection, and the variance reduction splitting criterion to analyze large multi-dimensional data sets and infer possible causes of faults for troubleshooting. The proposed approach can also eliminate variable selection bias during decision tree construction. An empirical study in a semiconductor company was conducted for validation. The results demonstrate the practical viability of the proposed method in helping engineers diagnose faults and improve yield efficiently and effectively.

Keywords: Data mining, Decision tree, Yield enhancement, Semiconductor manufacturing data

1. INTRODUCTION

Semiconductor manufacturing is perhaps the most complex of modern manufacturing processes. Several manufacturing factors, which are frequently interrelated, affect the yield of silicon wafers. The fabrication processes are complex, and a large amount of raw data accumulates daily from various sources.
Data mining is the process of discovering interesting knowledge from large amounts of data stored in databases, data warehouses, or other information repositories [1]. This study constructs a hybrid decision tree approach that combines the nonparametric Kruskal-Wallis test, chi-square interaction detection, and the variance reduction splitting criterion to mine multi-dimensional semiconductor manufacturing data (including SORT data and WIP data) for troubleshooting and process diagnosis. It addresses a real semiconductor problem in the fab, in which causal relationships exist between the machines used in a specific process and the yield. After the relevant data are selected and appropriately preprocessed, the proposed hybrid decision tree approach is applied to the preprocessed data to derive a diagnostic model. Possible causes of manufacturing process abnormality are then inferred, and the results can be discussed with domain engineers for interpretation and implementation. The approach can also eliminate variable selection bias when interrelated attributes exist during decision tree construction. For validation, real data are used in an empirical study to compare the performance of the hybrid decision tree approach with that of other decision tree algorithms.

2. A HYBRID DECISION TREE APPROACH TO DIAGNOSING PROCESSES

This study proposes a framework and a hybrid decision tree approach for exploring huge sets of engineering data. The framework has five phases: defining the problem, selecting data, preprocessing data, constructing the decision tree, and evaluating and interpreting results, as illustrated in Figure 1.

2.1 Defining the problem

Fabricating an integrated circuit is very complex and involves hundreds of operations at different stations. A typical semiconductor factory may run one to ten major fabrication process flows and produce 2,000 or more wafers every month. An average-sized factory contains about 400 pieces of equipment. Wafer fabrication is so complex that engineers cannot locate possible causes of faults rapidly and efficiently from their own domain knowledge and experience alone. A number of studies have employed data mining methods to improve yield or diagnose processes.
Braha and Shmilovici [8] suggested a data mining method for improving a cleaning process used in the semiconductor industry, involving a decision tree, a neural network, and a composite classifier. Zhou et al. [9] proposed a rule-induction data mining system as part of a drop test analysis of electronic products. Chien et al. [10] proposed a data mining approach that integrated the Kruskal-Wallis test and a decision tree for diagnosing defects. Furthermore, Chien et al. [11] developed a data mining framework consisting of the Kruskal-Wallis test, K-means clustering, and the variance reduction splitting criterion to investigate large amounts of semiconductor manufacturing data and infer possible causes of faults and manufacturing process variations.

2.2 Selecting data

This study selects the SORT data and the WIP (wafer in process) data. The CP (Circuit Probe) test is an electrical test that includes various functional tests of all the dies on each wafer. The CP yield rate from the SORT data is defined as the target variable. The SORT data consist of the results of the CP test, including lot id, product name, and the location of the wafer. The WIP data, which consist of basic information on each fabricated wafer, including lot id, product name, station, machine used at the process station, and time and date, are used as input variables.

[Figure 1: flowchart of the five phases. 1. Defining Problem: semiconductor problem. 2. Selecting Data: select manufacturing data from the engineering database. 3. Preprocessing Data: data integration, data cleaning, data transformation. 4. Constructing Decision Tree: indicate target variable and attributes; Kruskal-Wallis test; interaction detection; variance reduction; iterative growing; final decision tree; extract the decision tree rules. 5. Evaluating and Interpreting Results: if a pattern exists, proceed to revision and advanced analysis; if not, add more data.]

Figure 1 Conceptual framework for diagnosing processes

2.3 Preprocessing data

This study integrates different data sources from the engineering database. The collected data are often noisy, incomplete, and inconsistent, so data preparation is required to improve data quality and thus the effectiveness of the data mining process. Preprocessing techniques include integrating, cleaning, and transforming data. Data integration merges data from multiple sources into a coherent data store. Data cleaning removes noise, identifies outliers, fills in missing values, and corrects inconsistencies. Data transformation transforms or consolidates data into forms appropriate for mining. In particular, the SORT data and WIP data are integrated so that processes can be diagnosed when causal relationships exist between the machines used to perform a specific process and the CP yield. However, not all wafers pass through all processes. Data cleaning methods are therefore used to handle missing values and to remove rows and columns that are not pertinent to the problem. If too few wafer lots pass through a process station, that process station can be ignored. Finally, the cleaned data are transformed into a format suitable for constructing a decision tree.

2.4 Constructing decision tree

The fabrication of a semiconductor circuit is a multi-step process that can involve hundreds of steps. A low yield problem may be caused by a single process station or machine in a particular period, and local interactions may occur among various stations. However, considering all possible interactions among N stations may require a very long computation time, so detecting pair-wise interactions is more feasible. Because semiconductor manufacturing data are not all normally distributed, the nonparametric Kruskal-Wallis test is used to determine whether significant differences exist among the yields of different machines in the same process station.
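As a rough illustration of the Kruskal-Wallis step, the sketch below computes the tie-corrected H statistic for the yields of several machines at one station, using only the Python standard library. The function name `kruskal_wallis_h`, the machine names, and the yield values are all hypothetical and are not taken from the paper. Converting H into a p-value requires the chi-square distribution (for example via `scipy.stats.kruskal`), so the usage example only compares H with the tabulated 5% critical value for two degrees of freedom.

```python
from collections import defaultdict
from itertools import groupby


def kruskal_wallis_h(groups):
    """Return the tie-corrected Kruskal-Wallis H statistic and its degrees of freedom.

    `groups` maps a level name (e.g. a machine id) to a non-empty list of
    target values (e.g. CP yields). At least two distinct values are assumed.
    """
    # Pool all observations, remembering which group each one came from.
    pooled = sorted((value, name) for name, values in groups.items() for value in values)
    n = len(pooled)

    # Give tied values their average rank and accumulate the tie correction.
    rank_sums = defaultdict(float)
    tie_term = 0.0
    position = 0  # number of observations ranked so far
    for _, run in groupby(pooled, key=lambda pair: pair[0]):
        run = list(run)
        t = len(run)
        average_rank = position + (t + 1) / 2.0  # ranks are 1-based
        for _, name in run:
            rank_sums[name] += average_rank
        tie_term += t**3 - t
        position += t

    h = 12.0 / (n * (n + 1)) * sum(
        rank_sums[name] ** 2 / len(values) for name, values in groups.items()
    ) - 3.0 * (n + 1)
    correction = 1.0 - tie_term / (n**3 - n)
    return h / correction, len(groups) - 1


# Hypothetical CP yields (%) for three machines at one station.
yields_by_machine = {
    "M1": [81.0, 82.0, 83.0],
    "M2": [84.0, 85.0, 86.0],
    "M3": [87.0, 88.0, 89.0],
}
h_stat, dof = kruskal_wallis_h(yields_by_machine)
CHI2_CRITICAL_DF2_005 = 5.991  # tabulated chi-square critical value, df = 2, alpha = 0.05
print(h_stat, dof, h_stat > CHI2_CRITICAL_DF2_005)  # H is about 7.2 here
```

The chi-square comparison is itself an approximation for small samples, so a production analysis would more likely rely on a statistics library than on this hand-rolled version.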
Chi-square analysis of residuals is performed to detect local two-station interactions and eliminate variable selection bias from the construction of the decision tree. Then, the variance reduction splitting criterion is applied to grow the branches so as to minimize the total variance. This approach is designed to deal with a continuous target variable and categorical input attributes. The proposed hybrid decision tree approach consists of the following four major steps:

Step 1: Perform the Kruskal-Wallis test to identify the attributes whose levels differ significantly in the continuous target.
Step 2: Detect two-variable interactions between pairs of the candidate attributes identified in Step 1.
Step 3: Apply the variance reduction criterion to split on the candidate attributes and grow the branches.
Step 4: Grow the tree iteratively by repeating Steps 1 to 3 until no attribute in the decision tree model differs significantly or one of the following stopping rules is met:
1. The node includes fewer than five instances.
2. The values of all attributes are the same.
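The variance reduction criterion in Step 3 can be sketched as follows for one categorical attribute and a continuous target. The paper does not state how the levels are partitioned, so this sketch assumes a binary split and uses the standard regression-tree shortcut of ordering the levels by their mean target and checking only cuts along that order. The names `sum_squared_dev` and `best_variance_reduction_split` and the sample data are hypothetical.

```python
from collections import defaultdict


def sum_squared_dev(values):
    """Sum of squared deviations from the mean (N times the variance)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_variance_reduction_split(levels, targets):
    """Find the binary partition of categorical levels that most reduces variance.

    Returns (reduction, left_levels), or (0.0, set()) if no split helps.
    Assumption: a binary split, searched over levels ordered by mean target.
    """
    by_level = defaultdict(list)
    for level, y in zip(levels, targets):
        by_level[level].append(y)

    # Order levels by mean target; only cuts along this order are checked.
    ordered = sorted(by_level, key=lambda lv: sum(by_level[lv]) / len(by_level[lv]))
    total = sum_squared_dev(targets)

    best = (0.0, set())
    for cut in range(1, len(ordered)):
        left = [y for lv in ordered[:cut] for y in by_level[lv]]
        right = [y for lv in ordered[cut:] for y in by_level[lv]]
        reduction = total - sum_squared_dev(left) - sum_squared_dev(right)
        if reduction > best[0]:
            best = (reduction, set(ordered[:cut]))
    return best


# Hypothetical machines at one station and the CP yield (%) of each lot.
machines = ["M1", "M1", "M2", "M2", "M3", "M3"]
cp_yield = [90.0, 92.0, 60.0, 62.0, 91.0, 89.0]
reduction, low_side = best_variance_reduction_split(machines, cp_yield)
print(reduction, low_side)  # the low-yield machine M2 is isolated on one side
```

In a full tree builder, this search would run over every candidate attribute at a node (including combined attributes from Step 2), and the attribute with the largest reduction would define the split.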

2.5 Evaluating and interpreting results

Causal relationships can be extracted easily from the decision tree model. The leaf nodes are extracted according to the target distribution, and an If-Then rule can be created for each path from the root node to a leaf node. The model can be interpreted and the results analyzed in discussion among domain engineers and data experts. Revision or advanced analysis can then be conducted to find better ways to solve the problem.

3. PERFORMANCE COMPARISON

Before the empirical study, we first assess the validity of the approach by comparing the performance of the hybrid decision tree with existing decision tree algorithms, CART and CHAID, on a real dataset from the machine-learning-database website.

3.1 Defining the problem

Servo is a dataset collected from a simulation of a servo system involving a servo amplifier, a motor, a lead screw, and a sliding carriage of some sort. The output value is the time the system requires to respond to a step change in a position set point. Servo covers an extremely non-linear phenomenon: predicting the rise time of a servomechanism in terms of two continuous gain settings and two discrete choices of mechanical linkages. The Servo dataset includes 167 instances with four attributes and one target. Table I shows the variable information of Servo.

3.2 Selecting and preprocessing data

The Servo dataset is a single dataset without missing values, so selecting and preprocessing data are unnecessary.

Table I The variable information of Servo

Variable   Levels           Data type
motor      A, B, C, D, E    Categorical attribute
screw      A, B, C, D, E    Categorical attribute
pgain      3, 4, 5, 6       Categorical attribute
vgain      1, 2, 3, 4, 5    Categorical attribute
class      0.13 to 7.10     Continuous target

3.3 Constructing the decision tree

The hybrid decision tree approach is then used to construct the decision tree model. Each step is described as follows.

[Figure 2: tree diagram. The root node contains all 167 instances and is split on the combined attribute pgain-vgain; lower nodes are split on motor, screw, vgain, pgain, and the combined attribute screw-vgain. Each leaf reports the number of instances (N), the average (AVE), and the deviation (DEV) of the target.]

Figure 2 The final hybrid decision tree of Servo

Firstly, the Kruskal-Wallis test is applied to the continuous target, class, to determine which attributes differ significantly among their levels. The significance level of the Kruskal-Wallis test is 0.05: if the p-value of an attribute is below 0.05, the levels of that attribute are said to differ significantly. According to the Kruskal-Wallis test results, the attributes pgain and vgain differ significantly among their levels.

Secondly, two-variable interactions are detected between pairs of attributes that are significant in step 1. The significance level of the chi-square test is [value missing]. Because pgain and vgain were found to differ significantly among their levels in step 1, step 2 tests whether a significant interaction exists between pgain and vgain. The detection result shows that the interaction between pgain and vgain is significant, so a new combined attribute, pgain-vgain, is generated. The attributes pgain-vgain, pgain, and vgain are the candidate attributes.

Thirdly, the candidate attribute with the best variance reduction is selected for splitting.
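The combined attribute generated in the second step can be sketched as a simple concatenation of the two attributes' levels for each instance. This is a minimal illustration under assumptions: the helper name `combine_attributes`, the row layout, and the sample values are hypothetical, and the labels produced here ("3-1") are plainer than the prefixed labels that appear in Figure 2.

```python
def combine_attributes(rows, first, second, sep="-"):
    """Return copies of `rows` with an added combined attribute.

    Each row is a dict of attribute name to level. The new attribute is named
    like 'pgain-vgain' and its level joins the two original levels.
    """
    name = f"{first}{sep}{second}"
    return [{**row, name: f"{row[first]}{sep}{row[second]}"} for row in rows]


# Hypothetical Servo-like instances (values are illustrative only).
servo_rows = [
    {"motor": "A", "screw": "B", "pgain": 3, "vgain": 1, "class": 5.1},
    {"motor": "C", "screw": "A", "pgain": 4, "vgain": 2, "class": 1.2},
]
combined_rows = combine_attributes(servo_rows, "pgain", "vgain")
candidates = ["pgain-vgain", "pgain", "vgain"]
print([[row[c] for c in candidates] for row in combined_rows])
```

Keeping the original attributes alongside the combined one matches the text above, where pgain-vgain, pgain, and vgain all remain candidates for the split.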
Finally, Steps 1 to 3 are repeated to grow the decision tree iteratively until no attribute in the decision tree differs significantly or the number of instances in a node is less than five. Figure 2 shows the final decision tree.

3.4 Evaluating and interpreting the results

We compared the performance of three decision tree algorithms: CART, CHAID, and the proposed hybrid decision tree approach. The prediction accuracy, the number of leaves, and the tree depth are compared. The prediction accuracy is measured by the average squared error:

\[ \text{Average squared error} = \frac{1}{N} \sum_{n=1}^{N} \bigl( y_n - d(x_n) \bigr)^2 \]

where d(x_n) is a predictor of y_n. The number of leaves and the tree depth measure the size of the decision tree. Table II summarizes the test results of these decision tree algorithms in terms of average squared error, number of leaves, tree depth, and number of rules extracted from the decision tree.

Table II Comparison of decision tree algorithms
Columns: CART, CHAID, Our hybrid approach
Rows: Ave. squared error, Number of leaves, Tree depth, Number of rules
[numeric entries missing]

4. EMPIRICAL STUDY

This research conducted an empirical study using real semiconductor manufacturing data from a fab in Taiwan to validate the proposed hybrid decision tree approach for solving the low yield problem.

4.1 Defining the problem

The data from the fab show that the CP yields of some lots are abnormally low, implying reduced productivity of the fab. The relevant fabrication data with large variations are examined to determine the root causes so that the problem can be solved as quickly as possible, improving the product yield and reducing cost. The possible root causes may involve particular machines at process stations or particular periods during the semiconductor manufacturing process.

4.2 Selecting data

The company of interest had built an engineering database system for data access and management. In this case, the CP yield across 156 lots was found to fall outside the control limits, as plotted in Figure 3. Hence, the manufacturing data, including CP test data and other relevant data, were extracted from the engineering database for the period 7/1 to 7/19 of one year.

[Figures 3 and 4: plots of CP yield (vertical axis) against lot (horizontal axis).]

Figure 3 Joint plot of CP yield of all wafer lots
Figure 4 Joint plot of CP yield after data are preprocessed

4.3 Preprocessing data

The SORT data and WIP data are merged into a single dataset for constructing the decision tree. The target is the CP yield rate, and the relevant information on process stations includes attributes such as machines and operating time. The selected data include some missing values, and not all wafer lots passed through all process stations, so some wafer lots and some process stations were removed to improve the quality of the data and the efficiency of the decision tree construction steps. Because the possible root causes may involve machines at a process station or particular periods, each process station is represented by two attributes: the machine used at the station, and the machine together with its date of operation.

4.4 Constructing the decision tree

After preprocessing, the training data include 150 wafer lots (instances), with one continuous target (CP yield) and 928 categorical attributes (464 process stations and 464 process stations with their dates of operation). The corresponding levels are the names of the machines in the station and the names of the machines combined with their dates of operation. The hybrid decision tree approach is then used to construct the decision tree model. Each step is described as follows.

First, the Kruskal-Wallis test is applied to the continuous target, CP yield, to determine which attributes (stations) differ significantly among their levels (machines or operating time). The significance level of the Kruskal-Wallis test is 0.05: if the p-value of an attribute is below 0.05, the levels of that attribute are said to differ significantly. Thirty attributes are significant at the 0.05 significance level. Table III shows the Kruskal-Wallis test results.
Second, two-variable interactions are detected between pairs of attributes that are significant in step 1. If an interaction exists between a pair of attributes, their levels are combined and the pair is treated as a new single attribute. The significance level of the chi-square test is 0.05: if the p-value of a pair of attributes in the chi-square test is below 0.05, the pair is said to interact significantly. Table IV lists the detection results. The significant attributes from Step 1 and the combined attributes from Step 2 are the candidate attributes.

Table III Kruskal-Wallis test results
Columns: Station, P-Value (repeated in three column groups for the thirty significant stations)
[station names and p-values are truncated in this transcription]

Third, the candidate attribute with the best variance reduction is selected for splitting. Figure 4 shows the results at the first split. In the root node, the mean of all training data is [value missing]. After the first split, instances whose combined attribute Ta60-Ta47 takes the combined level AT02-BT25 go to the left child, which has 19 instances and a target mean of [value missing]. The other instances, whose combined attribute Ta60-Ta47 is not AT02-BT25, go to the right child, which has 131 instances and a target mean of [value missing].

Steps 1 to 3 are repeated to grow the decision tree iteratively until no attribute in the decision tree differs significantly or the number of instances in a node is less than five. Figure 5 presents the final decision tree. After the initial split, the right child continues to grow. Its left child contains 35 instances with the values T702 and T707 in the attribute Ta17, and a target mean of [value missing]. The remaining instances go to its right child, whose target mean is [value missing]. The decision tree result can then be provided to domain engineers for troubleshooting and diagnosing processes.

Table IV Detection of interaction results
Columns: Combined attributes, p-value
[eight station pairs are listed; the second station names and the p-values are truncated in this transcription: Tc04-Ta, Ta60-Ta, Tb366-Ta, AH4-Ta, Tc05-Tb, Ta17-Tb, Tb393-Tc, Tc04-Tb]

4.5 Evaluating and interpreting results

From the decision tree result, the left leaf of the first split and the left leaf of the second split can be identified as low yield groups. The right leaf of the second split can be regarded as a normal yield group. In the first split, the combined attribute Ta60-Ta47 and the combined level AT02-BT25 imply that the 19 wafer lots in the left leaf node passed through machine AT02 in station Ta60 and machine BT25 in station Ta47. Machine AT02 in station Ta60 and machine BT25 in station Ta47 may therefore be causing the low yield. In the left leaf of the second split, machines T702 and T707 in station Ta17 also appear responsible for further low yield. The decision tree model thus shows that these two production paths cause low yield. The tree stops growing when the Kruskal-Wallis test yields insignificant results or a leaf node contains fewer than five instances.

Some revision and advanced analysis are then conducted. The yield in the two low-yield groups is viewed in order of time of operation to determine whether causal relationships exist between the operating time and the yield. In the left leaf node of the first split, the yield of machine BT25 in station Ta47 drops suddenly between 7/16 and 7/17. Seven wafer lots were produced during this period, of which four have a poor yield rate under 40%. Figure 7 shows the results. In the left leaf node of the second split, the yield rate drops suddenly from 7/10 to 7/15 at station Ta17. Thirteen wafer lots were produced during this period, of which four have a poor yield rate under 60%. Figure 8 presents the results. A discussion with domain experts revealed that the root cause machine is BT25 in station Ta47 from 7/15 to 7/17.
Another machine, T707 in station Ta17, during 7/9 to 7/14 is not the root cause but should still be monitored.
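The time-ordered review of a low-yield leaf described above can be sketched as a per-date count of lots and of lots below a yield threshold. The function name `low_yield_by_date`, the date strings, and the yield values below are hypothetical and are not the fab's data; only the 40% threshold is taken from the text.

```python
def low_yield_by_date(lots, threshold):
    """Count lots and low-yield lots per operation date, in date order.

    `lots` is a list of (date, cp_yield) pairs; dates are zero-padded
    'MM-DD' strings so that plain sorting gives chronological order.
    Returns a dict mapping date to (number of lots, number below threshold).
    """
    summary = {}
    for date, cp_yield in sorted(lots):
        total, low = summary.get(date, (0, 0))
        summary[date] = (total + 1, low + int(cp_yield < threshold))
    return summary


# Hypothetical lots processed by one machine in a low-yield leaf node.
leaf_lots = [
    ("07-15", 78.0),
    ("07-16", 35.5),
    ("07-16", 81.0),
    ("07-17", 38.0),
    ("07-17", 29.0),
]
for date, (total, low) in low_yield_by_date(leaf_lots, threshold=40.0).items():
    print(date, total, low)
```

A run of dates with a high share of low-yield lots is the kind of pattern that would then be raised with domain engineers, as in the discussion of BT25 above.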

[Figures 7 and 8: plots of CP yield (vertical axis) against operation time of Ta47 and of Ta17 (horizontal axis), respectively.]

Figure 7 Yield performance of leaf node in first split
Figure 8 Yield performance of leaf node in second split

5. CONCLUDING REMARKS

The hybrid decision tree approach is designed for diagnosing the machines or stations that cause low yield in a complex semiconductor manufacturing process. Fabrication processes include hundreds of operations in various stations, and local interactions among stations may affect the accuracy of decision tree results. The empirical results validate the proposed approach as practically viable and demonstrate that it effectively assists engineers in troubleshooting and diagnosing processes. The hybrid decision tree can eliminate variable selection bias during the growth of the tree when the training data are interrelated. It can also produce powerful splits during the growth of the tree and thereby construct shorter, easily interpretable trees.

The target defined in the real case is the CP yield, which is sometimes inappropriate for diagnosing processes because the causes of a fault may be obscured: the yield is a synthetic index of performance determined over hundreds of processes. WAT data are a good substitute for the yield as the target because each WAT parameter reflects specific operations. In addition, several advanced analyses can be conducted to determine whether the normal yield group in the case study exhibits any pattern. Further studies can be done to identify other causes of faults that could lead to a yield rate between 70% and 80%.

ACKNOWLEDGEMENT

This research is partially supported by National Science Council, Taiwan (NSC E ; NSC E ; NSC E ).

REFERENCES

1. Han, J. and Kamber, M., Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers (2001).

2. Fu, Y., Data mining, IEEE Potentials, 16, 4, (1997).
3. Fayyad, U., Piatetsky-Shapiro, G., and Smyth, P., The KDD Process for Extracting Useful Knowledge from Volumes of Data, Communications of the ACM, 39, 11, (1996).
4. Feelders, A., Daniels, H., and Holsheimer, M., Methodological and practical aspects of data mining, Information & Management, 37, (2000).
5. Kass, G. V., An exploratory technique for investigating large quantities of categorical data, Applied Statistics, 29, 2, (1980).
6. Breiman, L., Friedman, J. H., Olshen, R. A., and Stone, C. J., Classification and Regression Trees, Belmont, CA: Wadsworth (1984).
7. Quinlan, J. R., C4.5: Programs for Machine Learning, Morgan Kaufmann, San Francisco, California (1993).
8. Braha, D. and Shmilovici, A., Data mining for improving a cleaning process in the semiconductor industry, IEEE Transactions on Semiconductor Manufacturing, 15, 1, (2002).
9. Zhou, C., Nelson, P. C., Xiao, W., Tirpak, T. M., and Lane, S. A., An intelligent data mining system for drop test analysis of electronic products, IEEE Transactions on Electronics Packaging Manufacturing, 24, 3, July, (2001).
10. Chien, C., Lin, T., Peng, C., and Hsu, S., Developing data mining framework and methods for diagnosing semiconductor manufacturing defects and an empirical study of wafer acceptance test data in a wafer fab, Journal of the Chinese Institute of Industrial Engineers, 18, 4, (2001).
11. Chien, C., Wang, W., and Cheng, C., Data mining for yield enhancement in semiconductor manufacturing and an empirical study, Expert Systems with Applications, 33(1), 1-7 (2007).

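Constructing a Hybrid Decision Tree to Analyze Semiconductor Process Data and an Empirical Study

Chen-Fu Chien*, Jen-Chieh Cheng, Yun-Syuan Lin
Department of Industrial Engineering and Engineering Management, National Tsing Hua University

Abstract

In the era of e-business and digital decision making, large volumes of data are stored in data warehouses or databases, and rich information can be extracted from these data to support knowledge discovery and decision analysis. During semiconductor manufacturing, large amounts of process data are collected in engineering databases for process monitoring, fault analysis, and manufacturing management. However, because semiconductor processes are complex and the many influencing factors are usually interrelated, engineers who rely on their own expertise or rules of thumb find it difficult to discover, quickly and efficiently, the causes of process abnormalities and the hidden information needed to handle abnormal yield in time. This study constructs a semiconductor data mining framework and develops a hybrid decision tree method to find possible causes of variation, as a reference for engineers and domain experts in solving problems. The method comprises the Kruskal-Wallis test, chi-square interaction detection, and the variance reduction splitting criterion, and can help engineers shorten fault diagnosis time and thereby improve semiconductor process yield. A case from a semiconductor fab serves as the empirical study, in which real data are used to compare the performance of the hybrid decision tree method with that of existing decision tree algorithms to examine the validity of this study.

Keywords: data mining, decision tree, yield enhancement, semiconductor manufacturing data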

Chen-Fu Chien, Ph.D., is now the Deputy Director of the Industrial Engineering Division at tsmc and a Professor on leave from the Department of Industrial Engineering and Engineering Management, National Tsing Hua University. He was a Fulbright Scholar at UC Berkeley. Dr. Chien has received many awards, including Tier 1 P.I. of NSC, the Distinguished Industrial Collaboration Award from the Ministry of Education, the Best Research Award from NSC, the Best Paper Award from CIIE, the Best Engineering Paper Award from the Chinese Institute of Engineers, Distinguished Young IE Engineers, and Distinguished Young Faculty of NTHU, Taiwan. Before joining tsmc, he served as a Senior Consultant in the Manufacturing Technology Center, tsmc. His research areas include modeling and analysis for semiconductor manufacturing, manufacturing strategy, decision analysis, and data mining.

Jen-Chieh Cheng received an M.S. degree in Industrial Engineering and Engineering Management from National Tsing Hua University. His research includes data mining, semiconductor manufacturing management, and decision analysis.

Yun-Syuan Lin is a Ph.D. candidate in Industrial Engineering and Engineering Management at National Tsing Hua University. Her research focuses on soft-computing applications in production systems, data mining, and decision analysis.


More information

A New Approach for Evaluation of Data Mining Techniques

A New Approach for Evaluation of Data Mining Techniques 181 A New Approach for Evaluation of Data Mining s Moawia Elfaki Yahia 1, Murtada El-mukashfi El-taher 2 1 College of Computer Science and IT King Faisal University Saudi Arabia, Alhasa 31982 2 Faculty

More information

Data Mining Applications in Fund Raising

Data Mining Applications in Fund Raising Data Mining Applications in Fund Raising Nafisseh Heiat Data mining tools make it possible to apply mathematical models to the historical data to manipulate and discover new information. In this study,

More information

Data Mining Part 5. Prediction

Data Mining Part 5. Prediction Data Mining Part 5. Prediction 5.7 Spring 2010 Instructor: Dr. Masoud Yaghini Outline Introduction Linear Regression Other Regression Models References Introduction Introduction Numerical prediction is

More information

Data Mining Framework for Direct Marketing: A Case Study of Bank Marketing

Data Mining Framework for Direct Marketing: A Case Study of Bank Marketing www.ijcsi.org 198 Data Mining Framework for Direct Marketing: A Case Study of Bank Marketing Lilian Sing oei 1 and Jiayang Wang 2 1 School of Information Science and Engineering, Central South University

More information
