Pattern Recognition Using Feature Based Die-Map Clustering in the Semiconductor Manufacturing Process
Seung Hwan Park, Cheong-Sool Park, Jun Seok Kim, Youngji Yoo, Daewoong An, Jun-Geol Baek

Seung Hwan Park, Jun Seok Kim, Cheong-Sool Park, and Youngji Yoo are with the School of Industrial Management Engineering, Korea University, Anam-dong, Seongbuk-gu, Seoul, Republic of Korea ({udongpang, bliths, dumm97, kakiro}@korea.ac.kr). Daewoong An is with the DRAM Development Division, SK Hynix Semiconductor, Gyeongchung-daero 2091, Bubal-eup, Icheon, Gyeonggi-do, Republic of Korea. Jun-Geol Baek is a professor in the School of Industrial Management Engineering, Korea University, Anam-dong, Seongbuk-gu, Seoul, Republic of Korea.

Abstract: As big data analysis becomes more important, prediction using data from the semiconductor process is essential. In general, prediction is closely related to analysis of the causes of failure. The purpose of this study is to analyze the patterns that affect the final test results using die-map-based clustering. Many studies have used die data from the semiconductor test process. However, such analysis is limited because the test data are less directly related to the final test results. Therefore, this study proposes a framework for analysis through clustering that uses data more detailed than the existing die data. The study consists of three phases. In the first phase, a die-map is created from the fail bit data in each sub-area of the die. In the second phase, clustering is performed on the map data. In the third phase, patterns that affect the final test result are identified. Finally, the three phases are applied to actual industrial data, and the experimental results show the potential for field application.

Keywords: Die-Map Clustering, Feature Extraction, Pattern Recognition, Semiconductor Manufacturing Process.

I. INTRODUCTION

The importance of big data analysis in the semiconductor industry is growing as the semiconductor manufacturing process produces increasingly integrated core products. Also, because of the high demand for the newest electronic devices, such as smart phones and tablets, it is important to improve the product through state-of-the-art equipment or process management techniques. Semiconductor manufacturing is complicated because it involves hundreds of manufacturing processes that take several months to complete. Therefore, to manage the process effectively, semiconductor manufacturing companies require a super-clean environment, regular equipment maintenance, and comprehensive worker training [1]. In particular, big data analysis is essential for process management. This study focuses on analysis of the data generated during the test process, which is called the post-fabrication process. In general, test data analysis is performed using test data at the die level, and it is important to find the relationship between the test data and the final test results. Fig. 1 represents the overall flow chart of the semiconductor manufacturing process.

Fig. 1 Flow chart of the semiconductor manufacturing process (Fab, Wafer Test, Assembly, Final Test, Packaging; the wafer test data are used for clustering, prediction, and classification against the final test results)

The semiconductor manufacturing process consists of the fabrication and post-fabrication processes. In Fig. 1, Fab denotes the stage that fabricates wafers, and the subsequent process consists of a wafer test, assembly, and a final test [2]. In conjunction with each of these steps, the fabrication, wafer test, assembly, and packaging yields are all calculated. The fabrication yield is the ratio of wafers out to wafers in. The wafer test yield is the ratio of the number of non-defective chips, as decided by the wafer test, to the total number of chips on the wafers out.
The assembly and packaging yields are the corresponding ratios for the chips after assembly and after the final test, respectively. This study focuses primarily on analysis of the final test results using feature-based clustering of test data from the wafer test.

Much research has long been devoted to improving yield in the semiconductor manufacturing process. However, most of it has focused on test data at the die level or wafer level. Yield prediction using a Markov chain calculates the probability that a chip is acceptable given n defects [3]. However, this prediction is limited by the univariate analysis on which it is based. Multiple discriminant analysis has also been applied to multivariate data. However, this method is limited by the assumption that the variables are independent and identically, normally distributed. For further multivariate analysis, hybrid machine learning methods using a decision tree, an artificial neural network, and a self-organizing map (SOM) have been applied in the semiconductor manufacturing domain [4]. Yield prediction has also been conducted using a fuzzy-based neural network on multivariate parametric test data [5]. However, this prior research focused on yield in the wafer test process. To predict yield in the final test, a stepwise support vector machine (S-SVM) algorithm was applied to the wafer test data [1]. S-SVM classifies low and high yield in the final test by screening for potential defects in the final test process. In pattern recognition research, wafer clustering and Bayesian inference have been used [7], and self-localization through pattern recognition of wafers using geometric hashing has been performed [6]. However, these two studies focus only on wafer images. Therefore, this study proposes a procedure for selecting features from die-level data in order to analyze more accurately the causes of failure in the final test. As shown in Fig. 2, this study consists of three steps.

Fig. 2 The proposed three steps in this study (Step 1: Die Map Generation from the test data; Step 2: Clustering and Pattern Recognition; Step 3: Validation)

In step 1, die-maps are generated from the test data. The die-maps serve as the input variables for clustering. In step 2, the feature that reflects the state of the dies is selected through clustering. In step 3, the superiority of the feature selected from the die-maps is validated.

The rest of this paper is organized as follows. Sections II and III introduce the die-map generation process and the clustering analysis for pattern recognition, respectively. Section IV describes the validation concept for the proposed algorithm based on Sections II and III. Section V explains the experimental results obtained by applying the proposed algorithm to actual industrial data. Finally, the conclusion and further studies are described in Section VI.

II. DIE-MAP GENERATION

This section describes the process of generating die-maps from the test data. The test data consist of the FBC (Fail Bit Count) for each repair area in a die. The FBC is the number of failed cells that require repair.
Also, a repair area is the minimum unit in which failed cells can be replaced with normal cells at once. Repair is conducted using spare column or row cells. Using the test data, die-maps were generated in the following three types.

Type A (Column FBC): Column FBC represents the number of spare column cells substituted for repair. As there are more than several hundred column repair areas, the repair areas were summarized in a 4 × 4 grid. In other words, die data containing hundreds of variables are converted to a die-map.

Type B (Column and row FBC): Type B considers concurrently the number of spare column and row cells substituted for repair. As in Type A, the column and row repair areas were summarized, here in an 8 × 1 matrix. The column size of the matrix is 1 because the sizes of the column and row areas differ from each other.

Type C (Detailed column and row FBC): Type C considers in more detail the number of spare column and row cells substituted for repair. In this type, the row area is sliced to match the column area. The size of this die-map is 8 × 8, and the map is expected to include more detailed information.

Fig. 3 represents the die-map for Type A. The legend on the right side shows the minimum and maximum values of FBC on the wafer and the density between the two values. Sixteen features are generated, and each feature is the contrast at one position on the die-map.

Fig. 3 A 4 by 4 die-map through Type A

Fig. 4 describes the die-map for Type B. As the sum of column FBC in the repair area with row was used, the contrast within each row is uniform. Therefore, for Type B, 8 features are generated.

Fig. 4 An 8 by 1 die-map through Type B

Fig. 5 describes the die-map for Type C. For Type C, as the sum by section (section=8) of column FBC in the repair area with row was used, 64 features are generated.

Fig. 5 An 8 by 8 die-map through Type C

III. PATTERN RECOGNITION USING CLUSTERING

This section describes pattern recognition through die-map clustering. In this study, K-means clustering is used because it is widely known as one of the simplest unsupervised learning algorithms [6].

A. K-Means Clustering

The primary concept of K-means clustering is to group a given data set into a certain number of clusters (K clusters). The calculation procedure of the algorithm is as follows, where $d(\cdot,\cdot)$ denotes the distance between a data vector and a centroid.

Step 1. Form the initial centroid set $\{y_1, \ldots, y_K\}$ by randomly selecting K vectors from the data set $\{x_1, \ldots, x_N\}$.

Step 2. If data $x_n$ is closest to $y_i$, $x_n$ is labeled as belonging to $X_i$. In this way all the data are divided into K clusters $X_1, \ldots, X_K$:

$$X_i = \{\, x_n \mid d(x_n, y_i) \le d(x_n, y_j),\ j = 1, \ldots, K \,\}$$

Step 3. The centroids of the new clusters obtained in Step 2 are updated.

Step 4. The sum of distances between the data and the nearest cluster centroid is calculated as the total distortion:

$$D = \sum_{n=1}^{N} \min_{i = 1, \ldots, K} d(x_n, y_i)$$

Step 5. Steps 2 to 4 are repeated until a fixed number of iterations is reached.

B. Pattern Recognition through Die-Map Clustering

Die-map clustering is performed on features extracted from the die-map. The features are defined by the contrast at each location on the map. The basic concept of die-map clustering is to group dies with similar patterns. The features are extracted as follows: the FBC is converted to a contrast level on the die, and the contrast level is the feature that reflects the characteristics of the die-map. As shown in Fig. 6, die-maps are clustered according to similar patterns. Fig. 6 represents the clustering result for the Type C die-maps. Die-maps that belong to cluster 1 have the lower probability of failure, and die-maps that belong to cluster 2 have the higher probability of failure. Therefore, this study selects the best feature among Types A, B, and C, and identifies defective patterns, through die-map clustering. First, we identify the feature that affects failure using Cluster Specificity (CS).
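The five K-means steps of Section III-A can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation: the function name `kmeans`, the parameters `n_iter` and `seed`, the use of Euclidean distance for $d$, and the toy data are all assumptions made for the example.

```python
import numpy as np

def kmeans(x, k, n_iter=20, seed=0):
    """Minimal K-means following Steps 1-5 (Euclidean distance is assumed)."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    # Step 1: pick K data vectors at random as the initial centroids.
    centroids = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(n_iter):  # Step 5: repeat for a fixed number of iterations.
        # Step 2: label each vector with the index of its closest centroid.
        dist = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Step 3: move each centroid to the mean of its cluster (skip empty clusters).
        for i in range(k):
            members = x[labels == i]
            if len(members):
                centroids[i] = members.mean(axis=0)
        # Step 4: total distortion = sum of distances to the nearest centroid.
        dist = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)
        distortion = dist.min(axis=1).sum()
    # Final labels, consistent with the final centroids.
    labels = dist.argmin(axis=1)
    return labels, centroids, distortion

# Toy data (hypothetical): two well-separated groups of 2-D feature vectors.
points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels, centroids, distortion = kmeans(points, k=2)
print(labels, distortion)
```

In the setting of this paper, each row of `x` would be the feature vector of one die-map (16, 8, or 64 contrast values depending on the type).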
CS is an index that represents the proportions of normal and abnormal dies in each cluster. For example, if the ratio of normal to abnormal dies is 1:1, CS is equal to 0. The closer CS is to 1, the more strongly the cluster is dominated by either normal or abnormal dies. In other words, CS reflects the performance of clustering and is used to select the best of the given features. Subsequently, we propose the pattern that most affects failure using the selected feature.

Fig. 6 Clustering result for die-maps from Type C (Cluster 1, Cluster 2)

IV. PROPOSED ALGORITHM

This section describes the procedure of the proposed algorithm, which consists of the four parts shown in Fig. 7. First, die-map generation is performed to convert the large volume of FBC data. Second, clustering and pattern recognition are conducted using the features extracted from the die-maps. Finally, we validate the superiority of the feature and pattern.

Fig. 7 Overall framework of the proposed algorithm (Step 1: Die Map Generation; Step 2-1: Clustering; Step 2-2: Pattern Recognition; Step 3: Validation)
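Step 1 of the framework, die-map generation as described in Section II, might look like the following sketch. Several things here are assumptions rather than details from the paper: the 16 × 16 layout of repair areas, the random FBC values, the helper names `make_die_map` and `to_features`, and the min-max scaling used for the contrast level (suggested only by the wafer-wide minimum and maximum in the figure legends). The same input is reused for all three grid shapes purely to show the feature counts; the real Type B and Type C maps also involve row FBC.

```python
import numpy as np

def make_die_map(fbc, grid_shape):
    """Sum per-repair-area FBC values into a coarse grid (one die-map)."""
    fbc = np.asarray(fbc, dtype=float)
    rows, cols = grid_shape
    if fbc.shape[0] % rows or fbc.shape[1] % cols:
        raise ValueError("repair-area layout must divide evenly into the grid")
    block_r, block_c = fbc.shape[0] // rows, fbc.shape[1] // cols
    # View the array as (rows, block_r, cols, block_c) and sum inside each block.
    return fbc.reshape(rows, block_r, cols, block_c).sum(axis=(1, 3))

def to_features(die_maps):
    """Scale the die-maps of one wafer to contrast levels in [0, 1] and flatten them."""
    die_maps = np.asarray(die_maps, dtype=float)
    lo, hi = die_maps.min(), die_maps.max()
    span = hi - lo if hi > lo else 1.0  # avoid division by zero on flat data
    contrast = (die_maps - lo) / span
    return contrast.reshape(len(die_maps), -1)

# Hypothetical wafer: 5 dies, each with a 16 x 16 layout of repair areas.
rng = np.random.default_rng(0)
wafer_fbc = rng.integers(0, 50, size=(5, 16, 16))

type_a = to_features([make_die_map(d, (4, 4)) for d in wafer_fbc])  # 16 features per die
type_b = to_features([make_die_map(d, (8, 1)) for d in wafer_fbc])  # 8 features per die
type_c = to_features([make_die_map(d, (8, 8)) for d in wafer_fbc])  # 64 features per die
print(type_a.shape, type_b.shape, type_c.shape)  # (5, 16) (5, 8) (5, 64)
```

Each row of the resulting arrays is one die's feature vector and could be passed directly to a clustering routine.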
A more detailed description of the proposed algorithm follows.

A. Die Map Generation

As mentioned in Section II, three types of die-maps are generated from the FBC data set. The types differ in the precision of their density information. Type 3 is expected to be better than Type. Also, Type is expected to be better feature than Type.

B. Clustering and Pattern Recognition

This section describes the comparison of the three features from the three kinds of die-map and the identification of the pattern that affects failure. First, clustering is performed using the three features from the three die-map data sets. Subsequently, the best feature is selected by comparing the CS values defined in Section III-B. Second, using the selected feature, we find the pattern that affects failure in the final test. A heuristic algorithm was used to find the pattern.

C. Validation

To evaluate the applicability of the selected feature, we compared the performance of three data mining models using the three types of feature. For the experiment, Support Vector Machine, Artificial Neural Network, and Decision Tree are used.

V. EXPERIMENTAL RESULTS

For the experiment, an actual industrial data set is used. The original data set includes 13 wafers, consisting of approximately 100 observations. The experiment consists of two parts. The first selects the more significant feature through die-map clustering. The second evaluates the performance of three data mining models using the three features proposed in this study. The data mining models are Support Vector Machine, Neural Network, and Decision Tree, which are widely used for classification or regression.

To compare the clustering performance of each type, the CS value defined in Section III-B is used. The higher the CS, the better the clustering performance. Fig. 8 shows the CS values for the features of the three types.

Fig. 8 Comparison of CS values for the features of the three types (Cluster Specificity for Type A, Type B, Type C)

The 64 features from Type C are more significant than those of the other types in die-map clustering.
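The CS comparison can be illustrated with a small sketch. The paper gives no formula for CS, only two properties: CS is 0 when normal and abnormal dies are split 1:1, and CS approaches 1 as one class dominates. The formula below, the absolute difference between the normal and abnormal shares of a cluster, is one assumed definition that satisfies both properties; the size-weighted average in `mean_cs` is likewise an assumption for reducing the per-cluster values to a single score per feature type.

```python
import numpy as np

def cluster_specificity(labels, is_fail):
    """Return {cluster: CS}, with CS = |share of normal dies - share of abnormal dies|."""
    labels = np.asarray(labels)
    is_fail = np.asarray(is_fail, dtype=bool)
    cs = {}
    for c in np.unique(labels):
        fail_share = is_fail[labels == c].mean()
        # normal share = 1 - fail_share, so the difference is 1 - 2 * fail_share.
        cs[int(c)] = float(abs(1.0 - 2.0 * fail_share))
    return cs

def mean_cs(labels, is_fail):
    """Cluster-size-weighted average CS (an assumed way to score one feature type)."""
    labels = np.asarray(labels)
    cs = cluster_specificity(labels, is_fail)
    total = sum(cs[int(c)] * np.sum(labels == c) for c in np.unique(labels))
    return float(total / len(labels))

# Toy example: cluster 0 is split 1:1, cluster 1 contains only failed dies.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
is_fail = [0, 1, 0, 1, 1, 1, 1, 1]
print(cluster_specificity(labels, is_fail))  # {0: 0.0, 1: 1.0}
print(mean_cs(labels, is_fail))  # 0.5
```

Under this assumed definition, the feature type whose clustering gives the highest score would be the one selected.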
Therefore, it is desirable to convert the original FBC data, which consist of hundreds of variables, into 64 features. To validate the Type C features selected by CS, we compared the performance of classification algorithms. In general, a data set should be separated into training and test sets to evaluate the performance of a learning algorithm. However, as this study focuses on selecting the feature that affects failure in the final test of the semiconductor manufacturing process, the algorithm performance on the training set is compared. Table I represents the classification accuracies of the three classification algorithms.

TABLE I PERFORMANCE OF 3 CLASSIFICATION MODELS (rows: Decision Tree, Support Vector Machine, Artificial Neural Network; columns: FCA and PCA for each of Type A, Type B, and Type C)

To evaluate the performance of the algorithms, we compared the FCA (Fail Classification Accuracy) and PCA (Pass Classification Accuracy). FCA is the percentage of observations classified as Fail that actually belong to the Fail class. PCA is the percentage of observations classified as Pass that actually belong to the Pass class. Except for the PCA of the Neural Network, the 64 features from Type C are significant in all cases. Because the Artificial Neural Network algorithm is sensitive to the choice of model parameters, its learning performance is not uniform.

VI. CONCLUSION

As the semiconductor test process consists of many test stages and each stage generates a large amount of test data, big data analysis is essential. Therefore, this study proposes a procedure for extracting features from FBC data consisting of hundreds of variables. The proposed procedure is applied to real industrial data and its applicability is verified. In further study, the performance of the classification algorithms should be evaluated on both training and test sets. Also, many more data sets need to be applied to the proposed algorithm to test the robustness of these results.
ACKNOWLEDGMENT

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (013R1A1A010019). This research was also supported by the MSIP (Ministry of Science, ICT, and Future Planning), Korea, under the ICT R&D Infrastructure Support Program (NIPA-013-(I ) supervised by the NIPA (National IT Industry Promotion Agency).

REFERENCES

[1] An, D., Ko, H.H., Gulambar, T., Kim, J., Baek, J.G., and Kim, S., "A Semiconductor Yields Prediction Using Stepwise Support Vector Machine," IEEE International Symposium on Assembly and Manufacturing, December
[2] Uzsoy, R., Lee, C., and Martin-Vega, L.A., "A Review of Production Planning and Scheduling Models in the Semiconductor Industry Part I: System Characteristics, Performance Evaluation and Production Planning," IIE Transactions, Vol., No., pp. 7-60, 199.
[3] Ciciani, B., and Jazeolla, G., "A Markov Chain-Based Yield Formula for VLSI Fault-Tolerant Chips," IEEE Transactions on Computer-Aided Design, Vol. 10, No., pp. 5-59,
[4] Kang, B.S., Lee, L.H., Shin, C.K., Yu, S.L., and Park, S.C., "Hybrid machine learning system for integrated management in semiconductor manufacturing," Expert Systems with Applications, Vol. 15, No., pp , August
[5] Wu, L., and Zhang, J., "Fuzzy neural network based prediction model for semiconductor manufacturing system," International Journal of Production Research, Vol. 8, No. 11, pp , June 2010.
[6] Jyoti, K., and Singh, S., "Data Clustering Approach to Industrial Process Monitoring, Fault Detection and Isolation," International Journal of Computer Applications, Vol. 17, No., Article 8, March 2011.
[7] Tao, Y., and Way, K., "Spatial defect pattern recognition on semiconductor wafers using model-based clustering and Bayesian inference," European Journal of Operational Research, Vol. 190, No. 1, pp. 8-0, October 2008.

Seung Hwan Park received his B.S. degree in Computer Science and M.S. degree in Industrial Management Engineering from Korea University, Korea. He was a programming researcher at Tmax Soft from 2008 to 2009. He is currently a Ph.D. candidate in Industrial Management Engineering. His research interests are in statistical analysis and data-mining algorithms and applications.

Cheong-Sool Park is a Ph.D. candidate in Industrial Engineering at Korea University. He holds B.S. and M.S. degrees in Industrial Engineering from Korea University. From 2005 to 2006, he worked as a research analyst at Samsung Economic Research Institute. In 2007, he worked as a researcher at the Institute for Advanced Engineering.
His research is primarily in the field of data mining for manufacturing systems. He has a particular interest in modeling and analyzing time effects.

Jun Seok Kim is a postdoctoral research fellow at the School of Industrial Management Engineering at Korea University. He holds B.S. and M.S. degrees in Industrial Systems and Information Engineering and a Ph.D. in Information Management Engineering from Korea University. His research interests include statistical process control and data-mining algorithms.

Youngji Yoo received her B.S. degree in Information Statistics and Industrial Management Engineering from Korea University, Korea. She is currently in an integrated master's and doctoral degree course. Her research interests are in statistical analysis and the application of data mining in production systems.

Daewoong An received his B.S. degree in Electronic Engineering and M.S. degree in Industrial Management Engineering from Korea University, Korea. He has been working as a product analysis engineer, especially for DRAM products, at SK Hynix since 2000. His research interests are in statistical analysis, data mining, and big data analysis.

Jun-Geol Baek received B.S., M.S., and Ph.D. degrees in Industrial Engineering from Korea University in 1993, 1995, and 2001, respectively. Until 2007, he was an assistant professor in the Department of Industrial Systems Engineering at Induk Institute of Technology, Seoul, Korea. He was also an assistant professor in the Department of Business Administration at Kwangwoon University, Seoul, Korea, from 2007 to 2008. In 2008, he joined the School of Industrial Management Engineering at Korea University, where he is now a professor. His research interests include statistical process control, fault detection and classification, advanced process control, and applications of data mining in production systems.
Machine Learning with MATLAB David Willingham Application Engineer
Machine Learning with MATLAB David Willingham Application Engineer 2014 The MathWorks, Inc. 1 Goals Overview of machine learning Machine learning models & techniques available in MATLAB Streamlining the
DATA MINING USING INTEGRATION OF CLUSTERING AND DECISION TREE
DATA MINING USING INTEGRATION OF CLUSTERING AND DECISION TREE 1 K.Murugan, 2 P.Varalakshmi, 3 R.Nandha Kumar, 4 S.Boobalan 1 Teaching Fellow, Department of Computer Technology, Anna University 2 Assistant
Developing a Video-based Smart Mastery Learning through Adaptive Evaluation
, pp. 101-114 http://dx.doi.org/10.14257/ijseia.2014.8.11.09 Developing a Video-based Smart Mastery Learning through Adaptive Evaluation Jeongim Kang 1, Moonhee Kim 1 and Seong Baeg Kim 1,1 1 Department
PERFORMANCE ANALYSIS OF CLUSTERING ALGORITHMS IN DATA MINING IN WEKA
PERFORMANCE ANALYSIS OF CLUSTERING ALGORITHMS IN DATA MINING IN WEKA Prakash Singh 1, Aarohi Surya 2 1 Department of Finance, IIM Lucknow, Lucknow, India 2 Department of Computer Science, LNMIIT, Jaipur,
Keywords data mining, prediction techniques, decision making.
Volume 5, Issue 4, April 2015 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Analysis of Datamining
How To Identify A Churner
2012 45th Hawaii International Conference on System Sciences A New Ensemble Model for Efficient Churn Prediction in Mobile Telecommunication Namhyoung Kim, Jaewook Lee Department of Industrial and Management
Measurement of Multi-Port S-Parameters using Four-Port Network Analyzer
JOURNAL OF SEMICONDUCTOR TECHNOLOGY AND SCIENCE, VOL.13, NO.6, DECEMBER, 2013 http://dx.doi.org/10.5573/jsts.2013.13.6.589 Measurement of Multi-Port S-Parameters using Four-Port Network Analyzer Jongmin
LVQ Plug-In Algorithm for SQL Server
LVQ Plug-In Algorithm for SQL Server Licínia Pedro Monteiro Instituto Superior Técnico [email protected] I. Executive Summary In this Resume we describe a new functionality implemented
Machine Learning using MapReduce
Machine Learning using MapReduce What is Machine Learning Machine learning is a subfield of artificial intelligence concerned with techniques that allow computers to improve their outputs based on previous
A Partially Supervised Metric Multidimensional Scaling Algorithm for Textual Data Visualization
A Partially Supervised Metric Multidimensional Scaling Algorithm for Textual Data Visualization Ángela Blanco Universidad Pontificia de Salamanca [email protected] Spain Manuel Martín-Merino Universidad
Combining SOM and GA-CBR for Flow Time Prediction in Semiconductor Manufacturing Factory
Combining SOM and GA-CBR for Flow Time Prediction in Semiconductor Manufacturing Factory Pei-Chann Chang 12, Yen-Wen Wang 3, Chen-Hao Liu 2 1 Department of Information Management, Yuan-Ze University, 2
International Journal of Computer Science Trends and Technology (IJCST) Volume 3 Issue 3, May-June 2015
RESEARCH ARTICLE OPEN ACCESS Data Mining Technology for Efficient Network Security Management Ankit Naik [1], S.W. Ahmad [2] Student [1], Assistant Professor [2] Department of Computer Science and Engineering
Adaptive Demand-Forecasting Approach based on Principal Components Time-series an application of data-mining technique to detection of market movement
Adaptive Demand-Forecasting Approach based on Principal Components Time-series an application of data-mining technique to detection of market movement Toshio Sugihara Abstract In this study, an adaptive
A New Programmable RF System for System-on-Chip Applications
Vol. 6, o., April, 011 A ew Programmable RF System for System-on-Chip Applications Jee-Youl Ryu 1, Sung-Woo Kim 1, Jung-Hun Lee 1, Seung-Hun Park 1, and Deock-Ho Ha 1 1 Dept. of Information and Communications
Methodology for Emulating Self Organizing Maps for Visualization of Large Datasets
Methodology for Emulating Self Organizing Maps for Visualization of Large Datasets Macario O. Cordel II and Arnulfo P. Azcarraga College of Computer Studies *Corresponding Author: [email protected]
Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data
CMPE 59H Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data Term Project Report Fatma Güney, Kübra Kalkan 1/15/2013 Keywords: Non-linear
A Study of Web Log Analysis Using Clustering Techniques
A Study of Web Log Analysis Using Clustering Techniques Hemanshu Rana 1, Mayank Patel 2 Assistant Professor, Dept of CSE, M.G Institute of Technical Education, Gujarat India 1 Assistant Professor, Dept
Social Media Mining. Data Mining Essentials
Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers
A Comparison Between Data Mining Prediction Algorithms for Fault Detection (Case study: Ahanpishegan co.)
www.ijcsi.org 425 A Comparison Between Data Mining Prediction Algorithms for Fault Detection (Case study: Ahanpishegan co.) Golriz Amooee 1*, Behrouz Minaei-Bidgoli 2, Malihe Bagheri-Dehnavi 3 1 Department
Using Artificial Intelligence to Manage Big Data for Litigation
FEBRUARY 3 5, 2015 / THE HILTON NEW YORK Using Artificial Intelligence to Manage Big Data for Litigation Understanding Artificial Intelligence to Make better decisions Improve the process Allay the fear
Comparision of k-means and k-medoids Clustering Algorithms for Big Data Using MapReduce Techniques
Comparision of k-means and k-medoids Clustering Algorithms for Big Data Using MapReduce Techniques Subhashree K 1, Prakash P S 2 1 Student, Kongu Engineering College, Perundurai, Erode 2 Assistant Professor,
THREE DIMENSIONAL REPRESENTATION OF AMINO ACID CHARAC- TERISTICS
THREE DIMENSIONAL REPRESENTATION OF AMINO ACID CHARAC- TERISTICS O.U. Sezerman 1, R. Islamaj 2, E. Alpaydin 2 1 Laborotory of Computational Biology, Sabancı University, Istanbul, Turkey. 2 Computer Engineering
An Introduction to Data Mining
An Introduction to Intel Beijing [email protected] January 17, 2014 Outline 1 DW Overview What is Notable Application of Conference, Software and Applications Major Process in 2 Major Tasks in Detail
A Study Of Bagging And Boosting Approaches To Develop Meta-Classifier
A Study Of Bagging And Boosting Approaches To Develop Meta-Classifier G.T. Prasanna Kumari Associate Professor, Dept of Computer Science and Engineering, Gokula Krishna College of Engg, Sullurpet-524121,
Data Mining and Neural Networks in Stata
Data Mining and Neural Networks in Stata 2 nd Italian Stata Users Group Meeting Milano, 10 October 2005 Mario Lucchini e Maurizo Pisati Università di Milano-Bicocca [email protected] [email protected]
A Novel Approach for Network Traffic Summarization
A Novel Approach for Network Traffic Summarization Mohiuddin Ahmed, Abdun Naser Mahmood, Michael J. Maher School of Engineering and Information Technology, UNSW Canberra, ACT 2600, Australia, [email protected],[email protected],M.Maher@unsw.
Data Mining. Concepts, Models, Methods, and Algorithms. 2nd Edition
Brochure More information from http://www.researchandmarkets.com/reports/2171322/ Data Mining. Concepts, Models, Methods, and Algorithms. 2nd Edition Description: This book reviews state-of-the-art methodologies
Classification of Engineering Consultancy Firms Using Self-Organizing Maps: A Scientific Approach
International Journal of Civil & Environmental Engineering IJCEE-IJENS Vol:13 No:03 46 Classification of Engineering Consultancy Firms Using Self-Organizing Maps: A Scientific Approach Mansour N. Jadid
Mobile Phone APP Software Browsing Behavior using Clustering Analysis
Proceedings of the 2014 International Conference on Industrial Engineering and Operations Management Bali, Indonesia, January 7 9, 2014 Mobile Phone APP Software Browsing Behavior using Clustering Analysis
An analysis of suitable parameters for efficiently applying K-means clustering to large TCPdump data set using Hadoop framework
An analysis of suitable parameters for efficiently applying K-means clustering to large TCPdump data set using Hadoop framework Jakrarin Therdphapiyanak Dept. of Computer Engineering Chulalongkorn University
How To Filter Spam Image From A Picture By Color Or Color
Image Content-Based Email Spam Image Filtering Jianyi Wang and Kazuki Katagishi Abstract With the population of Internet around the world, email has become one of the main methods of communication among
CONTENTS PREFACE 1 INTRODUCTION 1 2 DATA VISUALIZATION 19
PREFACE xi 1 INTRODUCTION 1 1.1 Overview 1 1.2 Definition 1 1.3 Preparation 2 1.3.1 Overview 2 1.3.2 Accessing Tabular Data 3 1.3.3 Accessing Unstructured Data 3 1.3.4 Understanding the Variables and Observations
International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014
RESEARCH ARTICLE OPEN ACCESS A Survey of Data Mining: Concepts with Applications and its Future Scope Dr. Zubair Khan 1, Ashish Kumar 2, Sunny Kumar 3 M.Tech Research Scholar 2. Department of Computer
Visualization of large data sets using MDS combined with LVQ.
Visualization of large data sets using MDS combined with LVQ. Antoine Naud and Włodzisław Duch Department of Informatics, Nicholas Copernicus University, Grudziądzka 5, 87-100 Toruń, Poland. www.phys.uni.torun.pl/kmk
A Fast Fraud Detection Approach using Clustering Based Method
pp. 33-37 Krishi Sanskriti Publications http://www.krishisanskriti.org/jbaer.html A Fast Detection Approach using Clustering Based Method Surbhi Agarwal 1, Santosh Upadhyay 2 1 M.tech Student, Mewar University,
Integrated Data Mining and Knowledge Discovery Techniques in ERP
Integrated Data Mining and Knowledge Discovery Techniques in ERP I Gandhimathi Amirthalingam, II Rabia Shaheen, III Mohammad Kousar, IV Syeda Meraj Bilfaqih I,III,IV Dept. of Computer Science, King Khalid
. Learn the number of classes and the structure of each class using similarity between unlabeled training patterns
Outline Part 1: of data clustering Non-Supervised Learning and Clustering : Problem formulation cluster analysis : Taxonomies of Clustering Techniques : Data types and Proximity Measures : Difficulties
Hadoop Operations Management for Big Data Clusters in Telecommunication Industry
Hadoop Operations Management for Big Data Clusters in Telecommunication Industry N. Kamalraj Asst. Prof., Department of Computer Technology Dr. SNS Rajalakshmi College of Arts and Science Coimbatore-49
Grid Density Clustering Algorithm
Grid Density Clustering Algorithm Amandeep Kaur Mann 1, Navneet Kaur 2, Scholar, M.Tech (CSE), RIMT, Mandi Gobindgarh, Punjab, India 1 Assistant Professor (CSE), RIMT, Mandi Gobindgarh, Punjab, India 2
Machine Learning and Data Mining. Fundamentals, robotics, recognition
Machine Learning and Data Mining Fundamentals, robotics, recognition Machine Learning, Data Mining, Knowledge Discovery in Data Bases Their mutual relations Data Mining, Knowledge Discovery in Databases,
Data Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining
Data Mining Cluster Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 8 Introduction to Data Mining by Tan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data Mining 4/8/2004 Hierarchical
Customer Classification And Prediction Based On Data Mining Technique
Customer Classification And Prediction Based On Data Mining Technique Ms. Neethu Baby 1, Mrs. Priyanka L.T 2 1 M.E CSE, Sri Shakthi Institute of Engineering and Technology, Coimbatore 2 Assistant Professor
A Novel Method to Improve Resolution of Satellite Images Using DWT and Interpolation
A Novel Method to Improve Resolution of Satellite Images Using DWT and Interpolation S.VENKATA RAMANA ¹, S. NARAYANA REDDY ² M.Tech student, Department of ECE, SVU college of Engineering, Tirupati, 517502,
A Framework for Data Warehouse Using Data Mining and Knowledge Discovery for a Network of Hospitals in Pakistan
, pp.217-222 http://dx.doi.org/10.14257/ijbsbt.2015.7.3.23 A Framework for Data Warehouse Using Data Mining and Knowledge Discovery for a Network of Hospitals in Pakistan Muhammad Arif 1,2, Asad Khatak
Fast Device Discovery for Remote Device Management in Lighting Control Networks
J Inf Process Syst, Vol.11, No.1, pp.125~133, March 2015 http://dx.doi.org/10.3745/jips.03.0011 ISSN 1976-913X (Print) ISSN 2092-805X (Electronic) Fast Device Discovery for Remote Device Management in
The Scientific Data Mining Process
Chapter 4 The Scientific Data Mining Process When I use a word, Humpty Dumpty said, in rather a scornful tone, it means just what I choose it to mean neither more nor less. Lewis Carroll [87, p. 214] In
A Dynamic Flooding Attack Detection System Based on Different Classification Techniques and Using SNMP MIB Data
International Journal of Computer Networks and Communications Security VOL. 2, NO. 9, SEPTEMBER 2014, 279 284 Available online at: www.ijcncs.org ISSN 2308-9830 C N C S A Dynamic Flooding Attack Detection
