Defect Analytics in a High-End Server Manufacturing Environment
Proceedings of the 2015 Industrial and Systems Engineering Research Conference
S. Cetinkaya and J. Ryan, eds.

Faisal Aqlan
Industrial Engineering Department, The Pennsylvania State University, The Behrend College, Erie, PA

Chanchal Saha
Department of Systems Science and Industrial Engineering, State University of New York at Binghamton, Binghamton, NY

Sreekanth Ramakrishnan
IBM Corporation, 1 Rogers St, Cambridge, MA

Abstract

Server manufacturing is characterized by extensive test processes to ensure high quality and reliability of the servers. Server components are obtained from different suppliers who may have different specifications. Although outsourcing of components provides many potential benefits to the company, it can also cause quality issues. If quality issues are not addressed effectively at the initial stages, defects can propagate through the supply chain. Thus, quality control is one of the major challenges for the high-end server manufacturing industry. Defective parts are disposed of, repaired, or returned to the supplier depending on the type of defect. Product quality is ensured through multiple test processes at the manufacturing and design stages, which are substantially expensive. The defect-related quality test results are stored in different databases in both structured and unstructured data formats. In this study, defect analytics models are used to assess more than 5,000 defect instances collected from different database sources of a high-end server manufacturing environment. Analytics models including cluster analysis, neural networks, and text mining are used to characterize and predict the defect root causes and solutions. The proposed defect analytics framework replaced the current manual defect analysis method, which is based on trial and error.
Keywords
Defect analytics, defect characterization, cluster analysis, artificial neural network, text mining, server manufacturing

1. Introduction

Data analytics has emerged as one of the main research areas in the last few years. Companies also see data analytics as a major opportunity to improve their performance. In integrated manufacturing environments such as high-end manufacturing, parts and components are supplied by different suppliers who may have different specifications. Extensive test processes are required to ensure high quality and to treat any defect at the earliest stages. Manual and automated systems have been developed to detect and resolve defects in such environments. However, even with automated defect detection systems, defects can still arise whose root causes and solutions are not known. In many manufacturing environments, defect resolution tends to be based on trial and error. This process consumes time and effort to troubleshoot the defect root causes and to identify proper solutions.

The typical characteristics of server manufacturing include aggressive new product introduction cycles, continuous quality improvements, extremely skewed demand patterns, high penalty costs from end-product order fulfillment, lower forecasting accuracies due to the nature of the production process and long lead times, thin profit margins, and a continuously increasing number of parts and features [1-3]. As a result of these characteristics, extensive test processes that ensure high quality and reliability of the servers are critical in this environment. Quality issues are major disruptions of operations in high-end server manufacturing. Defective parts may be disposed of, repaired, or
returned to the supplier depending on the issue. Removal of defective parts is necessary to protect the company's product, image and reputation, and customer satisfaction. Product quality is ensured through test processes, manufacturing, and design. Since server components are very expensive and must be of high quality, they are tested multiple times at both suppliers' sites and manufacturers' sites. Figure 1 shows the material flow and test processes for the server manufacturing environment. Major quality risk events can disrupt the smooth flow of products and operations in the supply chain. Quality management is responsible for stopping the flow of defective materials to the customers.

Figure 1: Material flow and test processes for server manufacturing environment

Defect management in manufacturing environments requires effective identification of the defects, finding the proper solutions for these defects, and providing the required resources and tools to repair the defects. Predicting and preventing defects or quality issues before they occur is the focus of quality risk management. Several tools are used for analyzing defects, such as Risk Ranking and Filtering (RRF), Failure Mode and Effect Analysis (FMEA), Hazard and Operability Analysis (HAZOP), and Fault Tree Analysis (FTA). Furthermore, automated systems have been proposed to identify defects and retrieve related solutions from a database. However, these systems do not consider the skills and resources required to solve the problem.

The remainder of this paper is organized as follows: Section 2 discusses the literature related to data analytics methods for defect management. Section 3 presents the proposed framework for defect analytics. Section 4 discusses the case study for defect analytics in a high-end server manufacturing environment.
Finally, conclusions and recommendations are discussed in Section 5.

2. Literature Review

The challenges that companies face in quality risk management mainly arise from the lack of early defect detection mechanisms. The problem is exacerbated by the time-consuming yet error-prone solution retrieval mechanisms that are currently available. Thus, accurate defect prediction and prompt retrieval of defect resolutions are important for the quality risk management system of a company. Many predictive models for defect prediction can be found in the literature, including discriminant analysis, statistical methods, logistic regression, factor analysis, fuzzy classification, classification trees, Bayesian networks, Artificial Neural Networks (ANN), and support vector machines. It was claimed that ANN has been proven to be more effective in prediction than statistical tools and expert systems [4]. However, a predictive model should be chosen based on the complexity of the problem, i.e., types of inputs and outputs (data structure), their relationships, data availability, nature of the problem, and expected outcomes. This section reviews the literature on defect prediction models that use structured data, their application areas (in particular ANN-based models), and the scope for defect prediction and resolution through text mining of unstructured data.

An intelligent defect analysis framework was proposed that automatically gathers manufacturing process data from all the related databases to determine the root cause of a process excursion. The proposed model combined both spatial and temporal data and analyzed them using artificial intelligence methods. The real-time output was presented through a multi-dimensional cubic structure. Although the framework outlined an intelligent defect analysis method, the author did not measure its effectiveness by implementing the model in a real environment [5].
Thus, this study can be extended with performance measurements of the proposed framework, such as user surveys, accuracy and reliability analysis, and time-saving experiments.

An ANN-based classification method was proposed to classify software into defect-prone and non-defect-prone classes. This early defect detection approach compared three algorithms to capture the misclassification of non-defect-prone software, considering time and cost metrics. The
threshold-moving algorithm was claimed to be the most suitable cost-sensitive approach for software defect prediction [6]. A data mining approach was proposed to identify the attributes responsible for defective software modules. The extracted knowledge was applied to defect prediction using a model that is a weighted voting rule of four data mining algorithms, namely Naïve Bayes, ANN, Association Rules, and Decision Tree [7]. Generalizing the proposed model to detect defects in manufacturing processes is a potential future direction.

A case-based reasoning system was proposed to predict defects in Printed Circuit Board (PCB) design. In the case-based method, a case database stores all past defect cases along with their design specifications, defect items, and corresponding costs. The past cases were clustered and ranked using a vantage-based case indexing mechanism to speed up the retrieval of past cases similar to a new case. Finally, a reasoning algorithm proposed the defect costs for the defective items [8]. In future work, a factorial analysis of the design parameters could be conducted to determine the threshold parameter values of the reasoning algorithm. Another study proposed a statistical method for defect prediction based on a Naïve Bayes classifier. The authors recommended paying more attention to calibrating the defect prediction model for the particular problem rather than searching for complex algorithms [9].

ANN is an effective tool for prediction because it can analyze the behavior of a system, given a certain amount of training data, and correlate it with other system parameters. Accurate predictions are important for a Supply Chain Network (SCN), as an incorrect prediction affects not only a single stage of a company's Supply Chain (SC) but also the entire SC of that company as well as other stakeholder companies.
As stated earlier, ANN has been claimed to be more effective in prediction than statistical tools and expert systems [4]. From the users' perspective, prediction is probably the most discussed application in the ANN domain. ANNs are increasingly used for short- and long-term demand forecasting and automatic defect prediction for electric loads, energy consumption, pattern recognition, and stock markets [4, 10]. Reducing total cost in the SC has become a crucial issue. ANNs can help reduce or eliminate defects that affect the production or supply network of an SC by enabling better forecasting models. ANNs are used in SC for optimization (logistics management, resource allocation, and scheduling), modeling and simulation (discrete event simulation, dynamic systems theory), defect prediction, globalization (interactions among different activities at different locations), decision support (data query, analysis, and management), and forecasting (any state from one echelon propagates to others in an SC) [10, 11]. However, ANN-based Artificial Intelligence (AI) models are effective mainly for analyzing structured data. For analyzing unstructured data, attention can be extended to Natural Language Processing (NLP).

Many sources, including social media, mobile transactions, business networks, and scientific experiments, as well as operational domains such as healthcare, bioinformatics, finance, and manufacturing, now generate a remarkable amount of data, and the amount is increasing rapidly. In response, studies on collecting, storing, cleaning, analyzing, and presenting new, meaningful, real-time insights from these data have grown tremendously. The analytics associated with big data not only complements traditional statistics, surveys, archival data sources, and hypothesis testing but also aims to explore novel patterns or predict future trends from the data [12, 13].
In the research paradigm of big data analytics, one application area of growing interest is text analytics, which can be used for opinion mining and sentiment analysis [14]. In general, sentiment analysis and opinion mining refer to the same techniques, which are derived from and based upon NLP, Information Retrieval (IR), Information Extraction (IE), and AI. Typical tasks of sentiment analysis include: (1) finding data relevant to a specific topic or purpose; (2) pre-processing the collected data, e.g., summarizing data into single words and extracting relevant information from them; and (3) identifying the sentiment surrounding a product or service [15]. Sentiment analysis technologies, a special type of text mining, can be applied to extract opinions and sentiments from unstructured human-authored documents [16]. Thus, NLP can be a useful tool for many business intelligence tasks, including reputation management, public relations, defect prediction and resolution, tracking public viewpoints, and market trend prediction.

In NLP, sentiment analysis takes on the challenge of classifying the orientation of texts as positive or negative to help machines understand texts as humans do. Texts are analyzed at different levels: word or phrase, sentence, document, or user. Word-level sentiment analysis explores the orientation of the words or phrases in the text as well as their effect on the overall sentiment, while sentence-level analysis treats each sentence as expressing a single opinion and tries to determine its orientation. Document-level opinion mining looks at the overall sentiment of the whole document, and user-level sentiment analysis examines the possibility that connected users on a social network hold the same opinion [17]. Three different approaches are applied in sentiment analysis to classify texts: machine learning, lexicon-based, and linguistic analysis.
Machine learning methods train an algorithm, usually a classifier, on a set of selected features for a specific task and then test it on another set to check whether it detects the right features and gives the right classification. Naïve Bayes, maximum entropy and SVM are used as sentiment
classifiers in this method. A lexicon-based method depends on a predefined list or corpus of words with a certain polarity. An algorithm then searches for those words, counts them or estimates their weight, and measures the overall polarity of the text. Lastly, the linguistic approach uses the syntactic characteristics of the words or phrases, the negation, and the structure of the text to determine the text orientation. This approach is usually combined with a lexicon-based method [17, 18].

A study was conducted to find the relationship between public sentiment and stock market prices using Twitter streams. The authors proposed an active learning approach that uses a Support Vector Machine (SVM) classifier to query the Twitter stream for sentiment analysis [19]. Their proposed model was able to predict stock market price movements a few days in advance. Another study also applied SVM to classify topics for sentiment analysis [18]. The authors claimed that pre-processing the texts can improve the accuracy of the SVM results.

Many studies on defect prediction and resolution have applied structured data in risk management. However, few studies consider both structured and unstructured data formats for model development. To the best of the authors' knowledge, none of the previous studies applied both data formats for defect prediction and resolution. Therefore, this study proposes a defect analytics framework for defect prediction and resolution that considers both structured and unstructured data.

3. Proposed Framework for Defect Analytics

The proposed framework for defect analytics utilizes both structured and unstructured data for defect characterization and assessment. Figure 2 shows the proposed framework, in which analytics models are used to predict and resolve the defects.
For the unstructured data, the individual defect files are kept together to form the corpus, which is a collection of documents. Text analytics models are then used to characterize the defects. Predictive analytics models are also used to characterize the defects based on the structured data. The outputs of both the text analytics and predictive analytics models are then used to predict the defect root cause and potential solutions.

Figure 2: Proposed defect analytics framework

3.1 Unstructured Data Analytics

Unstructured data analytics is used to characterize and classify the defects. The proposed framework for unstructured defect data analytics is shown in Figure 3. The unstructured data framework consists of the following steps:
1. The document collection step collects documents that include the unstructured data on defects.
2. The text analysis and concept extraction step analyzes the text using NLP.
3. The text link analysis step identifies relationships between the concepts using pattern matching.
4. The defect category building step relies on the concepts extracted by the text link analysis. In this step, a clustering method is used to group the defects into categories based on the similarities in the extracted concepts.
5. The defect characterization step characterizes the defects based on the concepts in each category.

Figure 3: Unstructured data analytics for defect assessment

3.2 Structured Data Analytics

Structured data are used to predict defect root causes and potential solutions using the ANN. Structured data analytics consists of two main steps: 1) predicting the root cause of the defect and 2) predicting the potential solutions of the defect. For predicting the root causes, the main attributes used as inputs include: defect type, product characteristics, production environment variables, and the defect categories obtained by the text analytics model. For predicting the potential solutions, the attributes considered as inputs are the resource attributes and the predicted root causes. The proposed ANN structure for the defect root cause and solution prediction is shown in Figure 4. Examples of defect attributes that are used as inputs for root cause prediction include part type, part size or capacity, and part supplier. Examples of production environment variables include production stage and time-to-failure. Examples of resource availability attributes that are used as inputs for solution prediction include available spare parts for repair and cost of disposal.

Figure 4: ANN based approach for predicting defect root cause and solution

4. Case Study: Defect Analytics in High-End Server Manufacturing Environment

The server manufacturing environment is relatively complex and is prone to many quality problems that can be caused by external suppliers and internal processes. Because the server manufacturing environment requires extremely high reliability and quality assurance, its test processes are expected to be highly accurate and free of downtime. Figure 5
shows a high-level overview of the main stages of the high-end server production process that is considered in this study. In this production process, there are three test stages: panel test, assembly or fabrication test, and fulfillment test. Structured and unstructured data for 5,000 defect data points from the three test stages were collected from different databases.

Figure 5: Process flow of high-end server manufacturing

The part considered in this study is the memory card, which is also known as a Dual In-line Memory Module (DIMM). The process flow of the DIMM inspection, test, and assembly is shown in Figure 6. Non-value-added processes are highlighted with red frames, while value-added processes are highlighted with green frames. The figure shows the assembly and test processes that are performed on the DIMMs and the different movements of the DIMMs between the inventory and production area locations.

The defect analytics framework is implemented using IBM SPSS Modeler software. The analytics models for defect root cause and solution prediction are shown in Figure 7. The unstructured data were characterized using text mining analytics. The text mining model uses linguistic and frequency techniques from NLP to extract the key concepts from the unstructured data and to categorize the data according to its concepts and patterns. The text mining model extracted 479 concepts from the unstructured data. Based on careful observation of the usage percentage and technical importance of the extracted concepts, 179 key concepts were selected for cluster analysis. The concept-wise categorized unstructured data were clustered using the two-step clustering method. The two-step clustering algorithm was used because it handles mixed data types and larger data sets efficiently. In addition, the two-step clustering algorithm has the advantage of automatically deciding the optimal number of clusters.
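The idea of letting the data choose the number of clusters can be illustrated with a small, self-contained sketch. It does not reproduce the two-step algorithm of IBM SPSS Modeler; as a plainly different stand-in, it runs a basic k-means on toy two-dimensional points for several candidate cluster counts and keeps the count with the highest mean silhouette value. The point coordinates, the function names, and the candidate range are all assumptions made for illustration.

```python
import math


def dist(p, q):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def farthest_first_centers(points, k):
    """Deterministic seeding: start at the first point, then repeatedly
    add the point that is farthest from all centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    return centers


def kmeans(points, k, max_iter=100):
    """Plain Lloyd iterations (a stand-in, not SPSS TwoStep).
    Returns one cluster label per point."""
    centers = farthest_first_centers(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels


def mean_silhouette(points, labels):
    """Average silhouette value; a point alone in its cluster scores 0."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # singleton contributes 0
        # a: mean distance to the other members of the same cluster
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        total += (b - a) / max(a, b)
    return total / len(points)


def best_cluster_count(points, candidates):
    """Score every candidate k and return the best k with all scores."""
    scores = {k: mean_silhouette(points, kmeans(points, k)) for k in candidates}
    return max(scores, key=scores.get), scores


# Hypothetical toy data: three well-separated groups of four points each.
toy_points = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 0), (10, 1), (11, 0), (11, 1),
    (0, 10), (0, 11), (1, 10), (1, 11),
]
best_k, scores = best_cluster_count(toy_points, range(2, 5))
print(best_k, {k: round(s, 2) for k, s in scores.items()})
```

On this toy data the three-cluster partition receives the highest mean silhouette, so it is the one selected. A production tool may use a different selection criterion; the sketch only shows the general pattern of scoring candidate partitions and keeping the best one.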
The clustering algorithm grouped the 179 key concepts into 15 clusters. The selection of 15 clusters gives the best cluster quality, which is measured by the Silhouette index. The obtained Silhouette index value was 0.7, which indicates good clustering quality.

Root cause and solution prediction models are developed using ANN models. The output of the two-step clustering algorithm (15 clusters, obtained from the concept extraction of unstructured data using text mining) was combined with the structured data using Defect IDs. The combined data were used as inputs to train and test the ANN models for root cause and solution prediction. Figure 8 shows that the accuracy rates of the ANN models for root cause and solution prediction are 86% and 74.4%, respectively. In the absence of unstructured data, the accuracy rates of the ANN models for root cause and solution prediction are 75.7% and 50%, respectively. Therefore, including unstructured data in the defect assessment raised the root cause and solution prediction accuracies by about 14% and 49% in relative terms, respectively.
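The data combination and the accuracy comparison described above can be illustrated with a short sketch. The defect IDs, field names, and cluster labels below are invented for illustration and are not taken from the study's data; only the four accuracy figures come from the text. The relative gain is computed here as (accuracy with text data minus accuracy without it) divided by the accuracy without it, which is an assumption about how the reported percentages were derived.

```python
def join_on_defect_id(structured_rows, cluster_by_id):
    """Attach the text-derived cluster label to each structured record.
    Records whose defect ID has no cluster label are skipped."""
    combined = []
    for row in structured_rows:
        defect_id = row["defect_id"]
        if defect_id in cluster_by_id:
            combined.append({**row, "text_cluster": cluster_by_id[defect_id]})
    return combined


def relative_gain(without_text, with_text):
    """Relative improvement in percent: (with - without) / without * 100."""
    return (with_text - without_text) / without_text * 100.0


# Hypothetical structured records and cluster labels (illustration only).
structured = [
    {"defect_id": "D001", "part_supplier": "S1", "production_stage": "panel test"},
    {"defect_id": "D002", "part_supplier": "S2", "production_stage": "fulfillment test"},
    {"defect_id": "D003", "part_supplier": "S1", "production_stage": "assembly test"},
]
clusters = {"D001": 4, "D002": 11}

ann_inputs = join_on_defect_id(structured, clusters)
print(len(ann_inputs), ann_inputs[0]["text_cluster"])

# Accuracy figures reported in the text (percent).
root_cause_gain = relative_gain(75.7, 86.0)
solution_gain = relative_gain(50.0, 74.4)
print(f"{root_cause_gain:.1f}% {solution_gain:.1f}%")
```

Rounded to whole percentages, the two gains are 14% and 49%, which matches the figures quoted in the text. In absolute terms the same comparison corresponds to 10.3 and 24.4 percentage points.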
Figure 6: Process flow for DIMMs

Figure 7: Analytics model for defect assessment
Figure 8: Accuracy of the ANN models for the root cause and solution predictions

5. Conclusions and Future Work

In this study, an analytics-based framework is proposed for defect assessment in a high-end server manufacturing environment. Both structured and unstructured data were utilized to build prediction and assessment models for the defects. Identifying causes of defects and proposing solutions with the proposed framework was found to be effective for early detection of production-related faults in the server manufacturing environment. The accuracy levels of the analytics used in this framework are 86% and 74.4% for root cause prediction and solution prediction, respectively. The proposed defect analytics framework replaces the current manual defect analysis method, which is based on trial and error. It predicts the defect characteristics and root causes from historical data and could be incorporated into the decision support system of the server manufacturing environment.

There are several avenues that future research could follow to overcome the limitations of the proposed models. Model accuracy could be improved by tuning the model parameters through design of experiments. Furthermore, a larger data set, as well as data from other defect-prone sectors, can be analyzed by adjusting the parameters of the proposed framework.

References

1. Ramakrishnan, S., Tsai, P.-F., Srihari, K., and Foltz, C., 2008, Using Design of Experiments and Simulation Modeling to Study the Facility Layout for a Server Assembly Process, Proc. of the 2008 Industrial Engineering Research Conference, May 17-21, Vancouver, BC,
2. Cao, H., Xi, H., and Smith, S.F., 2003, A Reinforcement Learning Approach to Production Planning in the Fabrication/Fulfillment Manufacturing Process, Proc.
of the 35th Winter Simulation Conference, December 7-10, New Orleans, LA,
3. Lendermann, P., 2006, About the Need for Distributed Simulation Technology for the Resolution of Real-World Manufacturing and Logistics Problems, Proc. of the 2006 Winter Simulation Conference, December 3-6, Monterey, CA,
4. Efendigil, T., Önüt, S., and Kahraman, C., 2009, A Decision Support System for Demand Forecasting with Artificial Neural Networks and Neuro-fuzzy Models: A Comparative Analysis, Expert Systems with Applications, 36(1),
5. Siglaz, 2011, Intelligent Defect Analysis, Framework for Integrated Data Management, Available at Accessed December 26,
6. Zheng, J., 2010, Cost-sensitive Boosting Neural Networks for Software Defect Prediction, Expert Systems with Applications, 37(6),
7. Yousef, A.H., 2014, Extracting Software Static Defect Models Using Data Mining, Ain Shams Engineering Journal, 6(1),
8. Tsai, C.-Y., Chiu, C.-C., and Chen, J.-S., 2005, A Case-based Reasoning System for PCB Defect Prediction, Expert Systems with Applications, 28(4),
9. Tosun, A., Bener, A., Turhan, B., and Menzies, T., 2010, Practical Considerations in Deploying Statistical Methods for Defect Prediction: A Case Study within the Turkish Telecommunications Industry, Information and Software Technology, 52(11),
10. Mirapeix, J., García-Allende, P.B., Cobo, A., Conde, O.M., López-Higuera, J.M., 2007, Real-Time Arc-Welding Defect Detection and Classification with Principal Component Analysis and Artificial Neural Networks, NDT & E International, 40(4),
11. Leung, H. C., 1995, Neural Networks in Supply Chain Management, Proc. of IEEE Annual International Engineering Management Conference, June 28-30,
12. George, G., Haas, M.R., and Pentland, A., 2014, Big Data and Management, Academy of Management Journal, 57(2),
13. Aiden, E., and Michel, J.-B., Dec , The Predictive Power of Big Data, Newsweek, Available at Accessed November 15,
14. Pang, B. and Lee, L., 2008, Opinion Mining and Sentiment Analysis, Foundations and Trends in Information Retrieval, 2(1-2),
15. Schmunk, S., Höpken, W., Fuchs, M., and Lexhagen, M., 2013, Sentiment Analysis: Extracting Decision-relevant Knowledge from UGC. In: Xiang, Z., Tussyadiah, I. (Eds.), Information and Communication Technologies in Tourism, Springer International Publishing, New York, NY,
16. Choudhary, A.K., Oluikpe, P.I., Harding, J.A., and Carrillo, P.M., 2009, The Needs and Benefits of Text Mining Applications on Post-Project Reviews, Computers in Industry, 60(9),
17. Tan, C., Lee, L., Tang, J., Jiang, L., Zhou, M., and Li, P., 2011, User-level Sentiment Analysis Incorporating Social Networks, Proc.
of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY,
18. Haddi, E., Liu, X., and Shi, Y., 2013, The Role of Text Pre-processing in Sentiment Analysis, Information and Quantitative Management, 17(1),
19. Smailović, J., Grčar, M., Lavrač, N., and Žnidaršič, M., 2014, Stream-based Active Learning for Sentiment Analysis in the Financial Domain, Information Sciences, 285(1),
Cleaned Data. Recommendations
Call Center Data Analysis Megaputer Case Study in Text Mining Merete Hvalshagen www.megaputer.com Megaputer Intelligence, Inc. 120 West Seventh Street, Suite 10 Bloomington, IN 47404, USA +1 812-0-0110
An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015
An Introduction to Data Mining for Wind Power Management Spring 2015 Big Data World Every minute: Google receives over 4 million search queries Facebook users share almost 2.5 million pieces of content
Promises and Pitfalls of Big-Data-Predictive Analytics: Best Practices and Trends
Promises and Pitfalls of Big-Data-Predictive Analytics: Best Practices and Trends Spring 2015 Thomas Hill, Ph.D. VP Analytic Solutions Dell Statistica Overview and Agenda Dell Software overview Dell in
Random forest algorithm in big data environment
Random forest algorithm in big data environment Yingchun Liu * School of Economics and Management, Beihang University, Beijing 100191, China Received 1 September 2014, www.cmnt.lv Abstract Random forest
CHURN PREDICTION IN MOBILE TELECOM SYSTEM USING DATA MINING TECHNIQUES
International Journal of Scientific and Research Publications, Volume 4, Issue 4, April 2014 1 CHURN PREDICTION IN MOBILE TELECOM SYSTEM USING DATA MINING TECHNIQUES DR. M.BALASUBRAMANIAN *, M.SELVARANI
Sentiment Analysis on Big Data
SPAN White Paper!? Sentiment Analysis on Big Data Machine Learning Approach Several sources on the web provide deep insight about people s opinions on the products and services of various companies. Social
Using reporting and data mining techniques to improve knowledge of subscribers; applications to customer profiling and fraud management
Using reporting and data mining techniques to improve knowledge of subscribers; applications to customer profiling and fraud management Paper Jean-Louis Amat Abstract One of the main issues of operators
DATA MINING TECHNOLOGY. Keywords: data mining, data warehouse, knowledge discovery, OLAP, OLAM.
DATA MINING TECHNOLOGY Georgiana Marin 1 Abstract In terms of data processing, classical statistical models are restrictive; it requires hypotheses, the knowledge and experience of specialists, equations,
Application of Business Intelligence in Transportation for a Transportation Service Provider
Application of Business Intelligence in Transportation for a Transportation Service Provider Mohamed Sheriff Business Analyst Satyam Computer Services Ltd Email: [email protected], [email protected]
Towards applying Data Mining Techniques for Talent Mangement
2009 International Conference on Computer Engineering and Applications IPCSIT vol.2 (2011) (2011) IACSIT Press, Singapore Towards applying Data Mining Techniques for Talent Mangement Hamidah Jantan 1,
Name: Srinivasan Govindaraj Title: Big Data Predictive Analytics
Name: Srinivasan Govindaraj Title: Big Data Predictive Analytics Please note the following IBM s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice
Maximizing Return and Minimizing Cost with the Decision Management Systems
KDD 2012: Beijing 18 th ACM SIGKDD Conference on Knowledge Discovery and Data Mining Rich Holada, Vice President, IBM SPSS Predictive Analytics Maximizing Return and Minimizing Cost with the Decision Management
A STUDY OF DATA MINING ACTIVITIES FOR MARKET RESEARCH
205 A STUDY OF DATA MINING ACTIVITIES FOR MARKET RESEARCH ABSTRACT MR. HEMANT KUMAR*; DR. SARMISTHA SARMA** *Assistant Professor, Department of Information Technology (IT), Institute of Innovation in Technology
How To Use Neural Networks In Data Mining
International Journal of Electronics and Computer Science Engineering 1449 Available Online at www.ijecse.org ISSN- 2277-1956 Neural Networks in Data Mining Priyanka Gaur Department of Information and
Natural Language to Relational Query by Using Parsing Compiler
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 4, Issue. 3, March 2015,
Sentiment analysis on news articles using Natural Language Processing and Machine Learning Approach.
Sentiment analysis on news articles using Natural Language Processing and Machine Learning Approach. Pranali Chilekar 1, Swati Ubale 2, Pragati Sonkambale 3, Reema Panarkar 4, Gopal Upadhye 5 1 2 3 4 5
A Systemic Artificial Intelligence (AI) Approach to Difficult Text Analytics Tasks
A Systemic Artificial Intelligence (AI) Approach to Difficult Text Analytics Tasks Text Analytics World, Boston, 2013 Lars Hard, CTO Agenda Difficult text analytics tasks Feature extraction Bio-inspired
Analyzing Customer Churn in the Software as a Service (SaaS) Industry
Analyzing Customer Churn in the Software as a Service (SaaS) Industry Ben Frank, Radford University Jeff Pittges, Radford University Abstract Predicting customer churn is a classic data mining problem.
Forecasting stock markets with Twitter
Forecasting stock markets with Twitter Argimiro Arratia [email protected] Joint work with Marta Arias and Ramón Xuriguera To appear in: ACM Transactions on Intelligent Systems and Technology, 2013,
Data Mining System, Functionalities and Applications: A Radical Review
Data Mining System, Functionalities and Applications: A Radical Review Dr. Poonam Chaudhary System Programmer, Kurukshetra University, Kurukshetra Abstract: Data Mining is the process of locating potentially
AUTO CLAIM FRAUD DETECTION USING MULTI CLASSIFIER SYSTEM
AUTO CLAIM FRAUD DETECTION USING MULTI CLASSIFIER SYSTEM ABSTRACT Luis Alexandre Rodrigues and Nizam Omar Department of Electrical Engineering, Mackenzie Presbiterian University, Brazil, São Paulo [email protected],[email protected]
COPYRIGHTED MATERIAL. Contents. List of Figures. Acknowledgments
Contents List of Figures Foreword Preface xxv xxiii xv Acknowledgments xxix Chapter 1 Fraud: Detection, Prevention, and Analytics! 1 Introduction 2 Fraud! 2 Fraud Detection and Prevention 10 Big Data for
Equity forecast: Predicting long term stock price movement using machine learning
Equity forecast: Predicting long term stock price movement using machine learning Nikola Milosevic School of Computer Science, University of Manchester, UK [email protected] Abstract Long
IT services for analyses of various data samples
IT services for analyses of various data samples Ján Paralič, František Babič, Martin Sarnovský, Peter Butka, Cecília Havrilová, Miroslava Muchová, Michal Puheim, Martin Mikula, Gabriel Tutoky Technical
Course Syllabus For Operations Management. Management Information Systems
For Operations Management and Management Information Systems Department School Year First Year First Year First Year Second year Second year Second year Third year Third year Third year Third year Third
The multilayer sentiment analysis model based on Random forest Wei Liu1, Jie Zhang2
2nd International Conference on Advances in Mechanical Engineering and Industrial Informatics (AMEII 2016) The multilayer sentiment analysis model based on Random forest Wei Liu1, Jie Zhang2 1 School of
Online Content Optimization Using Hadoop. Jyoti Ahuja Dec 20 2011
Online Content Optimization Using Hadoop Jyoti Ahuja Dec 20 2011 What do we do? Deliver right CONTENT to the right USER at the right TIME o Effectively and pro-actively learn from user interactions with
Statistics for BIG data
Statistics for BIG data Statistics for Big Data: Are Statisticians Ready? Dennis Lin Department of Statistics The Pennsylvania State University John Jordan and Dennis K.J. Lin (ICSA-Bulletine 2014) Before
Neural Networks for Sentiment Detection in Financial Text
Neural Networks for Sentiment Detection in Financial Text Caslav Bozic* and Detlef Seese* With a rise of algorithmic trading volume in recent years, the need for automatic analysis of financial news emerged.
Social Media Implementations
SEM Experience Analytics Social Media Implementations SEM Experience Analytics delivers real sentiment, meaning and trends within social media for many of the world s leading consumer brand companies.
Big Data. Fast Forward. Putting data to productive use
Big Data Putting data to productive use Fast Forward What is big data, and why should you care? Get familiar with big data terminology, technologies, and techniques. Getting started with big data to realize
Role of Social Networking in Marketing using Data Mining
Role of Social Networking in Marketing using Data Mining Mrs. Saroj Junghare Astt. Professor, Department of Computer Science and Application St. Aloysius College, Jabalpur, Madhya Pradesh, India Abstract:
Prerequisites. Course Outline
MS-55040: Data Mining, Predictive Analytics with Microsoft Analysis Services and Excel PowerPivot Description This three-day instructor-led course will introduce the students to the concepts of data mining,
ANALYTICS CENTER LEARNING PROGRAM
Overview of Curriculum ANALYTICS CENTER LEARNING PROGRAM The following courses are offered by Analytics Center as part of its learning program: Course Duration Prerequisites 1- Math and Theory 101 - Fundamentals
Data Warehousing and Data Mining in Business Applications
133 Data Warehousing and Data Mining in Business Applications Eesha Goel CSE Deptt. GZS-PTU Campus, Bathinda. Abstract Information technology is now required in all aspect of our lives that helps in business
Using News Articles to Predict Stock Price Movements
Using News Articles to Predict Stock Price Movements Győző Gidófalvi Department of Computer Science and Engineering University of California, San Diego La Jolla, CA 9237 [email protected] 21, June 15,
MS1b Statistical Data Mining
MS1b Statistical Data Mining Yee Whye Teh Department of Statistics Oxford http://www.stats.ox.ac.uk/~teh/datamining.html Outline Administrivia and Introduction Course Structure Syllabus Introduction to
Document Image Retrieval using Signatures as Queries
Document Image Retrieval using Signatures as Queries Sargur N. Srihari, Shravya Shetty, Siyuan Chen, Harish Srinivasan, Chen Huang CEDAR, University at Buffalo(SUNY) Amherst, New York 14228 Gady Agam and
The Big Data Paradigm Shift. Insight Through Automation
The Big Data Paradigm Shift Insight Through Automation Agenda The Problem Emcien s Solution: Algorithms solve data related business problems How Does the Technology Work? Case Studies 2013 Emcien, Inc.
A Big Data Analytical Framework For Portfolio Optimization Abstract. Keywords. 1. Introduction
A Big Data Analytical Framework For Portfolio Optimization Dhanya Jothimani, Ravi Shankar and Surendra S. Yadav Department of Management Studies, Indian Institute of Technology Delhi {dhanya.jothimani,
A Proposed Prediction Model for Forecasting the Financial Market Value According to Diversity in Factor
A Proposed Prediction Model for Forecasting the Financial Market Value According to Diversity in Factor Ms. Hiral R. Patel, Mr. Amit B. Suthar, Dr. Satyen M. Parikh Assistant Professor, DCS, Ganpat University,
SURVEY REPORT DATA SCIENCE SOCIETY 2014
SURVEY REPORT DATA SCIENCE SOCIETY 2014 TABLE OF CONTENTS Contents About the Initiative 1 Report Summary 2 Participants Info 3 Participants Expertise 6 Suggested Discussion Topics 7 Selected Responses
An Overview of Knowledge Discovery Database and Data mining Techniques
An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,
Master s Program in Information Systems
The University of Jordan King Abdullah II School for Information Technology Department of Information Systems Master s Program in Information Systems 2006/2007 Study Plan Master Degree in Information Systems
Software Defect Prediction for Quality Improvement Using Hybrid Approach
Software Defect Prediction for Quality Improvement Using Hybrid Approach 1 Pooja Paramshetti, 2 D. A. Phalke D.Y. Patil College of Engineering, Akurdi, Pune. Savitribai Phule Pune University ABSTRACT In
Intrusion Detection via Machine Learning for SCADA System Protection
Intrusion Detection via Machine Learning for SCADA System Protection S.L.P. Yasakethu Department of Computing, University of Surrey, Guildford, GU2 7XH, UK. [email protected] J. Jiang Department
Data Mining Part 5. Prediction
Data Mining Part 5. Prediction 5.1 Spring 2010 Instructor: Dr. Masoud Yaghini Outline Classification vs. Numeric Prediction Prediction Process Data Preparation Comparing Prediction Methods References Classification
Social Media Mining. Data Mining Essentials
Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers
Data Isn't Everything
June 17, 2015 Innovate Forward Data Isn't Everything The Challenges of Big Data, Advanced Analytics, and Advance Computation Devices for Transportation Agencies. Using Data to Support Mission, Administration,
Customer Classification And Prediction Based On Data Mining Technique
Customer Classification And Prediction Based On Data Mining Technique Ms. Neethu Baby 1, Mrs. Priyanka L.T 2 1 M.E CSE, Sri Shakthi Institute of Engineering and Technology, Coimbatore 2 Assistant Professor
Industrial Roadmap for Connected Machines. Sal Spada Research Director ARC Advisory Group [email protected]
Industrial Roadmap for Connected Machines Sal Spada Research Director ARC Advisory Group [email protected] Industrial Internet of Things (IoT) Based upon enhanced connectivity of this stuff Connecting
Digging for Gold: Business Usage for Data Mining Kim Foster, CoreTech Consulting Group, Inc., King of Prussia, PA
Digging for Gold: Business Usage for Data Mining Kim Foster, CoreTech Consulting Group, Inc., King of Prussia, PA ABSTRACT Current trends in data mining allow the business community to take advantage of
Index Contents Page No. Introduction . Data Mining & Knowledge Discovery
Index Contents Page No. 1. Introduction 1 1.1 Related Research 2 1.2 Objective of Research Work 3 1.3 Why Data Mining is Important 3 1.4 Research Methodology 4 1.5 Research Hypothesis 4 1.6 Scope 5 2.
APPLICATION OF DATA MINING TECHNIQUES FOR THE DEVELOPMENT OF NEW ROCK MECHANICS CONSTITUTIVE MODELS
APPLICATION OF DATA MINING TECHNIQUES FOR THE DEVELOPMENT OF NEW ROCK MECHANICS CONSTITUTIVE MODELS T. Miranda 1, L.R. Sousa 2 *, W. Roggenthen 3, and R.L. Sousa 4 1 University of Minho, Guimarães, Portugal
CONTENTS PREFACE 1 INTRODUCTION 1 2 DATA VISUALIZATION 19
PREFACE xi 1 INTRODUCTION 1 1.1 Overview 1 1.2 Definition 1 1.3 Preparation 2 1.3.1 Overview 2 1.3.2 Accessing Tabular Data 3 1.3.3 Accessing Unstructured Data 3 1.3.4 Understanding the Variables and Observations
A Novel Feature Selection Method Based on an Integrated Data Envelopment Analysis and Entropy Mode
A Novel Feature Selection Method Based on an Integrated Data Envelopment Analysis and Entropy Mode Seyed Mojtaba Hosseini Bamakan, Peyman Gholami RESEARCH CENTRE OF FICTITIOUS ECONOMY & DATA SCIENCE UNIVERSITY
Business Intelligence and Decision Support Systems
Chapter 12 Business Intelligence and Decision Support Systems Information Technology For Management 7 th Edition Turban & Volonino Based on lecture slides by L. Beaubien, Providence College John Wiley
Azure Machine Learning, SQL Data Mining and R
Azure Machine Learning, SQL Data Mining and R Day-by-day Agenda Prerequisites No formal prerequisites. Basic knowledge of SQL Server Data Tools, Excel and any analytical experience helps. Best of all:
Predictive Modeling for Collections of Accounts Receivable Sai Zeng IBM T.J. Watson Research Center Hawthorne, NY, 10523. Abstract
Paper Submission for ACM SIGKDD Workshop on Domain Driven Data Mining (DDDM2007) Predictive Modeling for Collections of Accounts Receivable Sai Zeng [email protected] Prem Melville Yorktown Heights, NY,
E-commerce Transaction Anomaly Classification
E-commerce Transaction Anomaly Classification Minyong Lee [email protected] Seunghee Ham [email protected] Qiyi Jiang [email protected] I. INTRODUCTION Due to the increasing popularity of e-commerce
DMDSS: Data Mining Based Decision Support System to Integrate Data Mining and Decision Support
DMDSS: Data Mining Based Decision Support System to Integrate Data Mining and Decision Support Rok Rupnik, Matjaž Kukar, Marko Bajec, Marjan Krisper University of Ljubljana, Faculty of Computer and Information
A HYBRID RULE BASED FUZZY-NEURAL EXPERT SYSTEM FOR PASSIVE NETWORK MONITORING
A HYBRID RULE BASED FUZZY-NEURAL EXPERT SYSTEM FOR PASSIVE NETWORK MONITORING AZRUDDIN AHMAD, GOBITHASAN RUDRUSAMY, RAHMAT BUDIARTO, AZMAN SAMSUDIN, SURESRAWAN RAMADASS. Network Research Group School of
INTELLIGENT DEFECT ANALYSIS, FRAMEWORK FOR INTEGRATED DATA MANAGEMENT
INTELLIGENT DEFECT ANALYSIS, FRAMEWORK FOR INTEGRATED DATA MANAGEMENT Website: http://www.siglaz.com Abstract Spatial signature analysis (SSA) is one of the key technologies that semiconductor manufacturers
Pentaho Data Mining Last Modified on January 22, 2007
Pentaho Data Mining Copyright 2007 Pentaho Corporation. Redistribution permitted. All trademarks are the property of their respective owners. For the latest information, please visit our web site at www.pentaho.org
Predicting the Risk of Heart Attacks using Neural Network and Decision Tree
Predicting the Risk of Heart Attacks using Neural Network and Decision Tree S.Florence 1, N.G.Bhuvaneswari Amma 2, G.Annapoorani 3, K.Malathi 4 PG Scholar, Indian Institute of Information Technology, Srirangam,
A.I. in health informatics lecture 1 introduction & stuff kevin small & byron wallace
A.I. in health informatics lecture 1 introduction & stuff kevin small & byron wallace what is this class about? health informatics managing and making sense of biomedical information but mostly from an
Data Mining Yelp Data - Predicting rating stars from review text
Data Mining Yelp Data - Predicting rating stars from review text Rakesh Chada Stony Brook University [email protected] Chetan Naik Stony Brook University [email protected] ABSTRACT The majority
Network Machine Learning Research Group. Intended status: Informational October 19, 2015 Expires: April 21, 2016
Network Machine Learning Research Group S. Jiang Internet-Draft Huawei Technologies Co., Ltd Intended status: Informational October 19, 2015 Expires: April 21, 2016 Abstract Network Machine Learning draft-jiang-nmlrg-network-machine-learning-00
IBM SPSS Modeler Premium
IBM SPSS Modeler Premium Improve model accuracy with structured and unstructured data, entity analytics and social network analysis Highlights Solve business problems faster with analytical techniques
