Applying Classifier Algorithms to Organizational Memory to Build an Attrition Predictor Model
K. M. SUCEENDRAN 1, R. SARAVANAN 2, DIVYA ANANTHRAM 3, Dr. S. POONKUZHALI 4, R. KISHORE KUMAR 5, Dr. K. SARUKESI 6
1 Research Scholar, Hindustan University, & Senior Consultant, Tata Consultancy Services, Chennai.
2 Consultant, 3 Business Analyst, TCS, Chennai.
4 Professor, 5 Assistant Professor, Dept. of IT, Rajalakshmi Engineering College, Chennai.
6 Dean, Kamaraj College of Engineering & Technology, Virudhunagar, INDIA.

Abstract: Employee engagement is a strong factor in driving the growth of a business or an organization. Employees who are committed to their organizations and are constantly motivated to work contribute effectively to the growth of the organization's business. Disengaged employees, on the other hand, can digress from organizational goals and influence others around them, impacting efficiency in all quarters. Apart from monetary benefits, there are psychological and situational factors that affect an employee's level of engagement. This paper discusses how exit interview details of employees can be combined with organizational memory to classify the causative factors triggering attrition and to construct digital personas of attrition-likely employees. Eight classifier algorithms are used to create and classify specific attrition clusters. These algorithms are evaluated on the basis of precision, recall, F-measure and ROC measure. The digital personas formed from the clusters can be applied to the entire workforce to identify attrition-likely associates. Additionally, an appropriate engagement strategy can be formulated to suit the possible reasons identified by the classifier algorithm.

Key Words: Attrition, Classification, Digital Profile, Organizational Memory, Tacit Knowledge.
1 Introduction
Employee connect is a critical attribute of the relationship between an employee and the enterprise. Research on the employer-employee relationship is not new, and the term employee engagement was defined in the early 1990s. William Kahn provided the first formal definition of employee engagement as "the harnessing of organization members' selves to their work roles; in engagement, people employ and express themselves physically, cognitively, and emotionally during role performances" [6]. Over the years, the emphasis and perspective on this relationship have undergone many changes: in the 1970s it started with satisfaction, which shifted to commitment in the 1980s. In the 1990s, the process of engaging employees became the focus of HR practitioners, particularly in knowledge enterprises [15]. The imperative of demonstrating the value of HR gave impetus to many studies correlating employee engagement with tangible business outcomes. Many organizations with a large workforce of knowledge workers are confronted with the challenge of effectively leveraging employee engagement. Apart from monetary benefits and incentives, there are other psychological and situational factors that affect an employee's level of engagement [12]. A recent Gallup study covering 142 countries indicates that only 13 percent of employees across the globe are engaged at work; that is, only about one in eight employees is actually committed to the job [5]. The study reveals that most employees, 63 percent, are not engaged and 24 percent are actively disengaged. It also reports engagement levels for India: only 9% of employees are engaged, while 31% are actively disengaged. However, there is large variation in engagement levels when comparing Indian employees of different job types and educational backgrounds.
Approximately 17-18% of people holding supervisory, managerial, professional or sales-related roles are engaged in their work [5]. Effective employee engagement depends on a successful connect between employees and organizational representatives, including supervisors, senior leaders and HR personnel [7]. Consistent and effective communication with employees is imperative for employee engagement and retention [14]. Challenges arise from the nature and scale of the workforce, the type of connects, the degree of subjectivity inherent in the process and, more importantly, the failure to leverage the organizational memory about the employee involved in the connect.

Employee Connects refers to the various situations in which an employee has an opportunity to connect and discuss with HR personnel or a leader/supervisor. In this paper, exit interview data are considered as the connects. Exit interviews and discussions with HR during the separation process are conducted to understand the associate's reasons for leaving the company. Because these connects are the last in an associate's tenure with the company, they bring out the associate's opinions and feedback, and the stated reasons help analysts study attrition trends and arrest them effectively [11].

In this paper, exit interview details from a leading Information Technology organization in India are taken as input data. The data are used to analyze various classification techniques using the WEKA data mining tool [16]. In this evaluation, different features are considered for choosing the algorithm that best classifies the employees into profile categories according to the reason stated for leaving the organization. Finally, the performance of the classification algorithms is evaluated to select the best classifier for employee exit details.

Outline of the paper: Section 2 presents related work on employee attrition analysis using data mining algorithms. Section 3 presents the need for embedding K-Elements in HR processes. Section 4 presents the framework of the proposed system, Section 5 gives results and discussion and Section 6 is the conclusion.

2 Related Works
Alao D. & Adeyemo A. B. analyzed the role of decision tree algorithms in their research on employee attrition.
Employees' demographic data and work-related data were used to classify employee details into attrition classes [1]. Waikato Environment for Knowledge Analysis (WEKA) and C4.5 for Windows were used to generate decision tree models and rule sets. Decision trees used in data mining are of two main types: classification tree analysis, where the predicted outcome is the class to which the data belongs, and regression tree analysis, where the predicted outcome can be considered a real number. The term Classification and Regression Tree (CART) analysis covers both procedures. Alao et al. used the C4.5 (J48), REPTree and CART (SimpleCart) decision tree algorithms. Attribute importance analysis was used to determine the significance of attributes. F-measure and the AUC probability tree learning measures were used as evaluation metrics for the classifier models generated [1]. Decision trees are more flexible data modeling techniques than neural networks. Decision tree algorithms can handle missing data values and build models in their presence, whereas in neural network and regression models missing values need to be imputed.

Jayanthi et al. (2008) presented the role of data mining in Human Resource Management Systems (HRMS). The paper explains data mining techniques and the ability of these algorithms to improve decision-making processes in HRMS, so as to sustain the organization's competitive advantage [4]. Hamidah et al. suggested that, in future work, attribute reduction should be conducted to identify the relevant attributes for each factor [2]. In the study "Employee Turnover Analysis with Application of Data Mining Methods", the analysis was carried out using decision trees, logistic regression and neural networks. The data models were built using the SEMMA methodology.
SEMMA (Sample, Explore, Modify, Model, and Assess) was developed by the SAS Institute [13]. Hsin-Yun Chang, in the study on employee turnover, explains feature selection, also known as feature subset selection, which is a common data preprocessing method. Chang uses a mixed feature subset selection method that combines the Taguchi method and Nearest Neighbour Classification rules to select and analyze factors and find the best predictor of employee attrition [3]. S. Poonkuzhali et al. analyzed various classification algorithms to identify an efficient classifier for TP53 mutants [9]. R. Kishore Kumar et al. performed a comparative analysis to identify the best classifiers for spam [8].

3 Need for Embedding K-Elements and Creating Digital Persona
With many of the non-core HR functions getting progressively digitized or outsourced, the number of HR personnel actively involved in employee engagement is small in comparison to the size of the workforce. Knowledge-intensive organizations such as IT companies have one HR associate for every 1000 employees. With such a skewed ratio, HR associates meet only a portion of the total associates, the number being set by their performance targets. The choice of associates to meet is mostly arbitrary and by convenience. There is no definitive process derived from business imperatives, nor an enabling system that presents HR persons with a complete digital profile.

Figure 1: K-Elements framework in Employee Connects

In our earlier publication, Embedding K-Elements in HR Processes, it was highlighted that data obtained from HR processes is not consolidated and is seldom used in real time [10]. Though HR increasingly plays the role of a strategic partner and also contributes to operational management, there is often no formal framework for capturing, consolidating, and continually refining the tacit data arising from HR touch points. Major HR touch points are mentoring, career counseling, staffing, grievance sessions, and exit interviews. At each of these interactions, pertinent tacit details about the employee are elicited and used in the context of the process. These tacit details are mostly soft data and are subjective and time-specific in nature. The paper proposes selecting a set of associates in line with a specific business area in order to identify sample profiles for that area. Examples of business areas include predicting associates likely to leave (the subject of this paper), high-potentials, transfers, counseling, and associates likely to become disenchanted or candidates for right-sizing.

4 Framework for the Proposed System
The K-Elements framework in Employee Connects is shown in Figure 1. Employees who initiate separation from the organization are reached out to in exit interview sessions. Here the employee's feedback regarding the organization and the primary reason for leaving are discussed. These tacit details are recorded by HR and consolidated along with the employee's demographic and job-related details.
These exit details for a certain period are generated and preprocessed to remove inconsistencies and missing values. Classifier algorithms are applied to the data and the employee profiles are classified. Once the best classifier algorithms are identified and applied to the data, the attributes of employees falling under different categories and groups can be studied. The employee exit details are analyzed on the basis of the reason furnished for leaving the organization, and the classifier algorithm categorizes the profiles into relevant groups. A few sample profiles representing each group can be obtained. Thus, the sample profiles of employees most likely to exit the company can be constructed. These digital personae, or sample profiles, can be used as a reference to select the next set of employees for employee connect sessions: employees whose profiles are similar to the digital profiles identified by the classifier algorithm.

5 Results and Discussion
Employee exit interview details were collected from a leading IT organization during the year 2013. The dataset consists of 2572 records with 13 attributes (12 input attributes, 1 target attribute), which are listed in Table 1. The employee exit dataset is loaded into the WEKA data mining tool after preprocessing. Then 8 classification algorithms, namely ID3, J48, Naïve Bayes, Bayesian Network, KStar, IBK, Random Tree and Random Forest, are applied. The results of these 8 classification algorithms are given in Table 2, which reports accuracy, precision, recall, F-measure and ROC value. The performance of the classifiers is compared on accuracy to identify the best classifier. Here, IBK, an instance-based classifier, and Random Tree, a randomized tree-based classifier, have been identified as the best classifiers for this employee exit interview dataset, as they classified with an accuracy of 97.74%.

Table 1: Attribute List of Exit Interview Data Set
S.No  Attribute
1     Gender
2     Grade
3     Age
4     Designation
5     Business Unit
6     Resignation Quarter
7     Account/Customer
8     Total Experience
9     Performance Rating on confirmation
10    Last Performance Rating
11    Qualification
12    Role
13    Primary reason for leaving the organization

Table 2: Results of various classifiers
Classifiers   Accur   Preci   Reca   F_M   ROC
ID3
J48
Naive Bayes
Bayes Net
KStar
IBK
R_Tree
R_Forest
Legend: Accur = Accuracy; Preci = Precision; Reca = Recall; F_M = F-Measure; ROC = Receiver Operating Characteristic; R_Tree = Random Tree; R_Forest = Random Forest

With these sample profiles as a reference, employees whose digital persona matches the attrition-likely profile can be identified and connected with for proactive engagement. Thus the randomness and lack of structure of Employee Connects sessions can be mitigated and potential attrition can be arrested considerably.

5.1 IBK (K-Nearest Neighbor)
IBK is a K-Nearest Neighbor classifier that uses a distance metric for classification. It is an instance-based classifier in which the class of a test instance is determined from the classes of the training instances most similar to it, with similarity measured by a distance function.
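The instance-based idea described above can be illustrated with a minimal Python sketch. It is not WEKA's IBK implementation: the function names, the two numeric attributes, the toy records and the class label "Personal reasons" are all hypothetical, and the attributes are assumed to be already normalized to the range 0 to 1.

```python
import math
from collections import Counter

def euclidean(q, p):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((qi - pi) ** 2 for qi, pi in zip(q, p)))

def knn_predict(train_X, train_y, query, k=3):
    """Return the majority class among the k training rows nearest to query."""
    # Rank every training row by its distance to the query instance.
    ranked = sorted(zip(train_X, train_y),
                    key=lambda row: euclidean(query, row[0]))
    # Majority vote over the class labels of the k closest rows.
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy data: (normalized age, normalized total experience)
train_X = [(0.10, 0.20), (0.15, 0.25), (0.80, 0.90), (0.85, 0.80), (0.20, 0.10)]
train_y = ["Better career opportunities", "Better career opportunities",
           "Personal reasons", "Personal reasons",
           "Better career opportunities"]

prediction = knn_predict(train_X, train_y, query=(0.12, 0.22), k=3)
print(prediction)
```

In this toy case the three nearest records all carry the same label, so the vote is unanimous. On real exit data, categorical attributes such as Designation or Role would need an encoding or a different distance function before a sketch like this could be applied.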
This algorithm uses normalized distances for all attributes. An object is classified by a majority vote of its neighbors, with the object being assigned to the class most common among its K nearest neighbors.

Pseudocode for the K-Nearest Neighbor classification algorithm:
Input: HRMS training dataset.
Output: Predicted class of the query instance.
Step 1: Determine the parameter K.
Step 2: Compute the distance between the query instance and all the training instances using the Euclidean distance, where n is the number of attributes:
$$D(q, p) = \sqrt{\sum_{i=1}^{n} (q_i - p_i)^2} \qquad (1)$$
Step 3: Sort the training records by distance.
Step 4: Find the nearest neighbors based on the Kth minimum distance.
Step 5: Get the categories of the sorted training records that fall within the K nearest.
Step 6: Predict the class of the query instance as the majority category among the nearest neighbors.

5.2 Random Tree
Random Tree builds decision trees from randomly selected attribute subsets. In the ensemble form described here, the classifier consists of many decision trees and outputs the class that is the mode of the classes output by the individual trees. Each tree is fully grown and not pruned. It runs efficiently on large databases. For prediction, a new record is pushed down the tree and is assigned the label of the training sample in the terminal node it ends up in. This procedure is iterated over all trees in the ensemble, and the majority vote of all trees is reported as the random forest prediction.

Pseudocode for the Random Tree algorithm:
Input: HRMS training dataset.
Output: Stereotyping classification of employees.
Step 1: For 1 to N do (N: number of employee records in HRMS dataset D)
Step 2: Select m input attributes at random from the n attributes in dataset D.
Step 3: Find the best split point among the m attributes according to a purity measure based on the Gini index:
$$G = \frac{2 \sum_{i=1}^{n} i\, y_i}{n \sum_{i=1}^{n} y_i} - \frac{n + 1}{n} \qquad (2)$$
Step 4: Split the node into two nodes at the split point.
Step 5: Repeat the above steps for different sets of records to construct the possible decision trees.
Step 6: Combine all constructed trees into a single forest for classifying stereotyping digital profiles of HRMS data.

5.3 Sample Rules from Random Tree Classifier
If Designation = Middle Level Manager1 and Gender = Male and Resignation Quarter = QN YYYY and Rating on Confirmation = Average and Role = Design Lead then class = Better career opportunities

A. Accuracy
Accuracy of a classifier is defined as the percentage of the dataset correctly classified by the method. The accuracy of all the classifiers used for classifying the employee exit interview dataset is represented graphically in Figure 2.
$$\text{Accuracy} = \frac{\text{No. of correctly classified samples}}{\text{Total No. of samples in the class}} \qquad (3)$$

Figure 2: Accuracy of various Classification Algorithms

B. Recall
Recall of the classifier is defined as the percentage of positive instances correctly predicted out of all the instances that are actually positive.
$$\text{Recall} = \frac{\text{TruePositive}}{\text{TruePositive} + \text{FalseNegative}} \qquad (4)$$

C. Precision
Precision of the classifier is defined as the percentage of actually positive instances among all the instances that were classified as positive.
$$\text{Precision} = \frac{\text{TruePositive}}{\text{TruePositive} + \text{FalsePositive}} \qquad (5)$$

D. F-Measure
F-Measure is a measure of a test's accuracy and can be interpreted as a weighted (harmonic) average of the precision and recall.
$$\text{F-Measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \qquad (6)$$

E. ROC
The Receiver Operating Characteristic curve (or ROC curve) is a graphical plot of the true positive rate against the false positive rate for the different possible cut-points of a test. The closer the curve follows the left-hand border and then the top border of the ROC space, the more accurate the classifier. The terms positive and negative refer to the classifier's prediction, and the terms true and false refer to whether that prediction matches the actual class. The precision, recall, F-measure and ROC value for the IBK and Random Tree classifiers are depicted in Figure 3 and Figure 4. Further, the digital personas thus constructed for attrition-likely employees can be applied to the entire workforce to predict attrition-likely associates. Additionally, an appropriate engagement strategy can be formulated to suit the possible reasons identified by the classifier algorithm. In future, this approach can be iteratively improved and scaled up as and when better employee connects data is generated and recorded.

Figure 3: Performance Analysis of IBK classifier
Figure 4: Performance Analysis of Random Forest classifier

6 Conclusions and Future Work
The employee exit interview details have been classified successfully through the IBK and Random Tree classifiers in this work. These classifiers have categorized the employee profiles into relevant groups based on the reason furnished for leaving the organization. Thus, the sample profiles of attrition-likely employees can be deduced and formed from these groups. The following are the research outcomes:
- Design of the K-Elements framework in Employee Connects.
- Identification of the best classifier algorithm for employee exit data.
- Performance analysis of the best classifier on metrics such as Precision, Recall, F-Measure and ROC.

References:
[1] Alao D. and Adeyemo A. B., Analyzing Employee Attrition Using Decision Tree Algorithms, Computing, Information Systems & Development Informatics, Vol. 4, No. 1, 2013.
[2] Hamidah J, AbdulRazak H, and Zulaiha A. O, Towards Applying Data Mining Techniques for Talent Managements, International Conference on Computer Engineering and Applications, IPCSIT Vol. 2, IACSIT Press.
[3] Hsin-Yun Chang, Employee Turnover: A Novel Prediction Solution with Effective Feature Selection, WSEAS Transactions on Information Science and Applications, Vol. 6, No. 3, 2009, pp.
[4] Jayanthi, R., Goyal, D.P., Ahson, S.I., Data Mining Techniques for Better Decisions in Human Resource Management Systems, International Journal of Business Information Systems, Vol. 3, No. 5, 2008, pp.
[5] Jim Clifton, State of the Global Workplace, Gallup Inc, 2013, Available: WorkplaceReport_2013.pdf
[6] Kahn, William A, Psychological Conditions of Personal Engagement and Disengagement at Work, Academy of Management Journal, Vol. 33, No. 4, 1990, pp.
[7] Kim Harrison, Good Communication can hugely lift employee engagement, Cutting Edge PR, 2013, Available:
[8] Kishore Kumar. R, Poonkuzhali. G, Sudhakar. P, "Comparative Study on Spam Classifier using Data Mining Techniques", Proceedings of International Multiconference on Engineers and Computer Scientist, Vol. 1.
[9] Poonkuzhali. S, Geetha Ramani, Kishore Kumar. R, "Efficient Classifier for TP53 Mutants using Feature Relevance Analysis", Proceedings of International Multiconference on Engineers and Computer Scientist, Vol. 1.
[10] Suceendran. K. M, Saravanan Ramachandran, Sarukesi K, Embedding K-Elements in HR Processes, Advances in Engineering, Science and Management (ICAESM).
[11] Susan M Heathfield, 4 Tips for Effective Exit Interviews, Human Resources, Available: employment-ends/fl/4-tips-for-effective-exit-Interviews.htm
[12] Sylvia Vorhauser-Smith, How the Best Places to Work are Nailing Employee Engagement, August 2013, Available: /08/14/how-the-best-places-to-work-are-nailing-employee-engagement/
[13] Tamizharasi. K, Dr. UmaRani, Employee Turnover Analysis with Application of Data Mining Methods, International Journal of Computer Science and Information Technologies, Vol. 5, No. 1, 2014, pp.
[14] Thomas R Cutler, Employee Engagement, retention and communication, Business Excellence, August. Available:
[15] Tom O'Byrne, History of employee engagement - from satisfaction to sustainability, HR Zone, 2013, Available:
[16] Weka Tool, Available:
Knowledge Discovery and Data Mining
Knowledge Discovery and Data Mining Unit # 6 Sajjad Haider Fall 2014 1 Evaluating the Accuracy of a Classifier Holdout, random subsampling, crossvalidation, and the bootstrap are common techniques for
International Journal of Computer Science Trends and Technology (IJCST) Volume 3 Issue 3, May-June 2015
RESEARCH ARTICLE OPEN ACCESS Data Mining Technology for Efficient Network Security Management Ankit Naik [1], S.W. Ahmad [2] Student [1], Assistant Professor [2] Department of Computer Science and Engineering
Data Mining: Overview. What is Data Mining?
Data Mining: Overview What is Data Mining? Recently * coined term for confluence of ideas from statistics and computer science (machine learning and database methods) applied to large databases in science,
Classification algorithm in Data mining: An Overview
Classification algorithm in Data mining: An Overview S.Neelamegam #1, Dr.E.Ramaraj *2 #1 M.phil Scholar, Department of Computer Science and Engineering, Alagappa University, Karaikudi. *2 Professor, Department
Workshop: Predictive Analytics to Understand and Control Flight Risk
Workshop: Predictive Analytics to Understand and Control Flight Risk Data science for deeper insights and more accurate predictions Peter Louch Founder and CEO peter.louch@ (917) 443-4572 Agenda Introductions
Lecture: Mon 13:30 14:50 Fri 9:00-10:20 ( LTH, Lift 27-28) Lab: Fri 12:00-12:50 (Rm. 4116)
Business Intelligence and Data Mining ISOM 3360: Spring 203 Instructor Contact Office Hours Course Schedule and Classroom Course Webpage Jia Jia, ISOM Email: [email protected] Office: Rm 336 (Lift 3-) Begin
Data Mining Classification: Decision Trees
Data Mining Classification: Decision Trees Classification Decision Trees: what they are and how they work Hunt s (TDIDT) algorithm How to select the best split How to handle Inconsistent data Continuous
Predicting earning potential on Adult Dataset
MSc in Computing, Business Intelligence and Data Mining stream. Business Intelligence and Data Mining Applications Project Report. Predicting earning potential on Adult Dataset Submitted by: xxxxxxx Supervisor:
E-commerce Transaction Anomaly Classification
E-commerce Transaction Anomaly Classification Minyong Lee [email protected] Seunghee Ham [email protected] Qiyi Jiang [email protected] I. INTRODUCTION Due to the increasing popularity of e-commerce
ENSEMBLE DECISION TREE CLASSIFIER FOR BREAST CANCER DATA
ENSEMBLE DECISION TREE CLASSIFIER FOR BREAST CANCER DATA D.Lavanya 1 and Dr.K.Usha Rani 2 1 Research Scholar, Department of Computer Science, Sree Padmavathi Mahila Visvavidyalayam, Tirupati, Andhra Pradesh,
How To Make A Credit Risk Model For A Bank Account
TRANSACTIONAL DATA MINING AT LLOYDS BANKING GROUP Csaba Főző [email protected] 15 October 2015 CONTENTS Introduction 04 Random Forest Methodology 06 Transactional Data Mining Project 17 Conclusions
Data Mining for Knowledge Management. Classification
1 Data Mining for Knowledge Management Classification Themis Palpanas University of Trento http://disi.unitn.eu/~themis Data Mining for Knowledge Management 1 Thanks for slides to: Jiawei Han Eamonn Keogh
Mobile Phone APP Software Browsing Behavior using Clustering Analysis
Proceedings of the 2014 International Conference on Industrial Engineering and Operations Management Bali, Indonesia, January 7 9, 2014 Mobile Phone APP Software Browsing Behavior using Clustering Analysis
Keywords data mining, prediction techniques, decision making.
Volume 5, Issue 4, April 2015 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Analysis of Datamining
Predictive Data modeling for health care: Comparative performance study of different prediction models
Predictive Data modeling for health care: Comparative performance study of different prediction models Shivanand Hiremath [email protected] National Institute of Industrial Engineering (NITIE) Vihar
Classification of Bad Accounts in Credit Card Industry
Classification of Bad Accounts in Credit Card Industry Chengwei Yuan December 12, 2014 Introduction Risk management is critical for a credit card company to survive in such competing industry. In addition
from Larson Text By Susan Miertschin
Decision Tree Data Mining Example from Larson Text By Susan Miertschin 1 Problem The Maximum Miniatures Marketing Department wants to do a targeted mailing gpromoting the Mythic World line of figurines.
First Semester Computer Science Students Academic Performances Analysis by Using Data Mining Classification Algorithms
First Semester Computer Science Students Academic Performances Analysis by Using Data Mining Classification Algorithms Azwa Abdul Aziz, Nor Hafieza IsmailandFadhilah Ahmad Faculty Informatics & Computing
MACHINE LEARNING BASICS WITH R
MACHINE LEARNING [Hands-on Introduction of Supervised Machine Learning Methods] DURATION 2 DAY The field of machine learning is concerned with the question of how to construct computer programs that automatically
Leveraging Ensemble Models in SAS Enterprise Miner
ABSTRACT Paper SAS133-2014 Leveraging Ensemble Models in SAS Enterprise Miner Miguel Maldonado, Jared Dean, Wendy Czika, and Susan Haller SAS Institute Inc. Ensemble models combine two or more models to
COMPARING NEURAL NETWORK ALGORITHM PERFORMANCE USING SPSS AND NEUROSOLUTIONS
COMPARING NEURAL NETWORK ALGORITHM PERFORMANCE USING SPSS AND NEUROSOLUTIONS AMJAD HARB and RASHID JAYOUSI Faculty of Computer Science, Al-Quds University, Jerusalem, Palestine Abstract This study exploits
MHI3000 Big Data Analytics for Health Care Final Project Report
MHI3000 Big Data Analytics for Health Care Final Project Report Zhongtian Fred Qiu (1002274530) http://gallery.azureml.net/details/81ddb2ab137046d4925584b5095ec7aa 1. Data pre-processing The data given
Towards applying Data Mining Techniques for Talent Mangement
2009 International Conference on Computer Engineering and Applications IPCSIT vol.2 (2011) (2011) IACSIT Press, Singapore Towards applying Data Mining Techniques for Talent Mangement Hamidah Jantan 1,
Practical Data Science with Azure Machine Learning, SQL Data Mining, and R
Practical Data Science with Azure Machine Learning, SQL Data Mining, and R Overview This 4-day class is the first of the two data science courses taught by Rafal Lukawiecki. Some of the topics will be
Course Syllabus. Purposes of Course:
Course Syllabus Eco 5385.701 Predictive Analytics for Economists Summer 2014 TTh 6:00 8:50 pm and Sat. 12:00 2:50 pm First Day of Class: Tuesday, June 3 Last Day of Class: Tuesday, July 1 251 Maguire Building
A Secured Approach to Credit Card Fraud Detection Using Hidden Markov Model
A Secured Approach to Credit Card Fraud Detection Using Hidden Markov Model Twinkle Patel, Ms. Ompriya Kale Abstract: - As the usage of credit card has increased the credit card fraud has also increased
Evaluation and Comparison of Data Mining Techniques Over Bank Direct Marketing
Evaluation and Comparison of Data Mining Techniques Over Bank Direct Marketing Niharika Sharma 1, Arvinder Kaur 2, Sheetal Gandotra 3, Dr Bhawna Sharma 4 B.E. Final Year Student, Department of Computer
Studying Auto Insurance Data
Studying Auto Insurance Data Ashutosh Nandeshwar February 23, 2010 1 Introduction To study auto insurance data using traditional and non-traditional tools, I downloaded a well-studied data from http://www.statsci.org/data/general/motorins.
ISSN: 2321-7782 (Online) Volume 2, Issue 10, October 2014 International Journal of Advance Research in Computer Science and Management Studies
ISSN: 2321-7782 (Online) Volume 2, Issue 10, October 2014 International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online
Data Mining as a tool to Predict the Churn Behaviour among Indian bank customers
Data Mining as a tool to Predict the Churn Behaviour among Indian bank customers Manjit Kaur Department of Computer Science Punjabi University Patiala, India [email protected] Dr. Kawaljeet Singh University
Predicting Student Persistence Using Data Mining and Statistical Analysis Methods
Predicting Student Persistence Using Data Mining and Statistical Analysis Methods Koji Fujiwara Office of Institutional Research and Effectiveness Bemidji State University & Northwest Technical College
A Content based Spam Filtering Using Optical Back Propagation Technique
A Content based Spam Filtering Using Optical Back Propagation Technique Sarab M. Hameed 1, Noor Alhuda J. Mohammed 2 Department of Computer Science, College of Science, University of Baghdad - Iraq ABSTRACT
ElegantJ BI. White Paper. The Competitive Advantage of Business Intelligence (BI) Forecasting and Predictive Analysis
ElegantJ BI White Paper The Competitive Advantage of Business Intelligence (BI) Forecasting and Predictive Analysis Integrated Business Intelligence and Reporting for Performance Management, Operational
A Comparative Analysis of Classification Techniques on Categorical Data in Data Mining
A Comparative Analysis of Classification Techniques on Categorical Data in Data Mining Sakshi Department Of Computer Science And Engineering United College of Engineering & Research Naini Allahabad [email protected]
Monday Morning Data Mining
Monday Morning Data Mining Tim Ruhe Statistische Methoden der Datenanalyse Outline: - data mining - IceCube - Data mining in IceCube Computer Scientists are different... Fakultät Physik Fakultät Physik
Decision Trees from large Databases: SLIQ
Decision Trees from large Databases: SLIQ C4.5 often iterates over the training set How often? If the training set does not fit into main memory, swapping makes C4.5 unpractical! SLIQ: Sort the values
not possible or was possible at a high cost for collecting the data.
Data Mining and Knowledge Discovery Generating knowledge from data Knowledge Discovery Data Mining White Paper Organizations collect a vast amount of data in the process of carrying out their day-to-day
Enhanced Boosted Trees Technique for Customer Churn Prediction Model
IOSR Journal of Engineering (IOSRJEN) ISSN (e): 2250-3021, ISSN (p): 2278-8719 Vol. 04, Issue 03 (March. 2014), V5 PP 41-45 www.iosrjen.org Enhanced Boosted Trees Technique for Customer Churn Prediction
Data Mining and Predictive Modeling for HR Data Mining Techniques for Analyzing Voluntary Turnover. Jason Noriega jason.noriega@sandisk.
Data Mining and Predictive Modeling for HR Data Mining Techniques for Analyzing Voluntary Turnover Jason Noriega [email protected] 1 Agenda Introduction Background and experience. What is Data
Implementation of Data Mining Techniques to Perform Market Analysis
Implementation of Data Mining Techniques to Perform Market Analysis B.Sabitha 1, N.G.Bhuvaneswari Amma 2, G.Annapoorani 3, P.Balasubramanian 4 PG Scholar, Indian Institute of Information Technology, Srirangam,
PharmaSUG2011 Paper HS03
PharmaSUG2011 Paper HS03 Using SAS Predictive Modeling to Investigate the Asthma s Patient Future Hospitalization Risk Yehia H. Khalil, University of Louisville, Louisville, KY, US ABSTRACT The focus of
DATA MINING TECHNIQUES AND APPLICATIONS
DATA MINING TECHNIQUES AND APPLICATIONS Mrs. Bharati M. Ramageri, Lecturer Modern Institute of Information Technology and Research, Department of Computer Application, Yamunanagar, Nigdi Pune, Maharashtra,
An Analysis of Missing Data Treatment Methods and Their Application to Health Care Dataset
P P P Health An Analysis of Missing Data Treatment Methods and Their Application to Health Care Dataset Peng Liu 1, Elia El-Darzi 2, Lei Lei 1, Christos Vasilakis 2, Panagiotis Chountas 2, and Wei Huang
Numerical Algorithms Group
Title: Summary: Using the Component Approach to Craft Customized Data Mining Solutions One definition of data mining is the non-trivial extraction of implicit, previously unknown and potentially useful
Data Mining Application in Direct Marketing: Identifying Hot Prospects for Banking Product
Data Mining Application in Direct Marketing: Identifying Hot Prospects for Banking Product Sagarika Prusty Web Data Mining (ECT 584),Spring 2013 DePaul University,Chicago [email protected] Keywords:
COMP3420: Advanced Databases and Data Mining. Classification and prediction: Introduction and Decision Tree Induction
COMP3420: Advanced Databases and Data Mining Classification and prediction: Introduction and Decision Tree Induction Lecture outline Classification versus prediction Classification A two step process Supervised
Predictive Modeling and Big Data
Predictive Modeling and Presented by Eileen Burns, FSA, MAAA Milliman Agenda Current uses of predictive modeling in the life insurance industry Potential applications of 2 1 June 16, 2014 [Enter presentation
Introduction to Data Mining
Introduction to Data Mining Jay Urbain Credits: Nazli Goharian & David Grossman @ IIT Outline Introduction Data Pre-processing Data Mining Algorithms Naïve Bayes Decision Tree Neural Network Association
Dynamic Predictive Modeling in Claims Management - Is it a Game Changer?
Dynamic Predictive Modeling in Claims Management - Is it a Game Changer? Anil Joshi Alan Josefsek Bob Mattison Anil Joshi is the President and CEO of AnalyticsPlus, Inc. (www.analyticsplus.com)- a Chicago
Organizational Application Managing Employee Retention as a Strategy for Increasing Organizational Competitiveness
Applied H.R.M. Research, 2003, Volume 8, Number 2, pages 63-72 Organizational Application Managing Employee Retention as a Strategy for Increasing Organizational Competitiveness Sunil Ramlall, Ph.D. University
EMPIRICAL STUDY ON SELECTION OF TEAM MEMBERS FOR SOFTWARE PROJECTS DATA MINING APPROACH
EMPIRICAL STUDY ON SELECTION OF TEAM MEMBERS FOR SOFTWARE PROJECTS DATA MINING APPROACH SANGITA GUPTA 1, SUMA. V. 2 1 Jain University, Bangalore 2 Dayanada Sagar Institute, Bangalore, India Abstract- One
Index Contents Page No. Introduction . Data Mining & Knowledge Discovery
Index Contents Page No. 1. Introduction 1 1.1 Related Research 2 1.2 Objective of Research Work 3 1.3 Why Data Mining is Important 3 1.4 Research Methodology 4 1.5 Research Hypothesis 4 1.6 Scope 5 2.
Beating the MLB Moneyline
Beating the MLB Moneyline Leland Chen [email protected] Andrew He [email protected] 1 Abstract Sports forecasting is a challenging task that has similarities to stock market prediction, requiring time-series
How To Solve The Kd Cup 2010 Challenge
A Lightweight Solution to the Educational Data Mining Challenge Kun Liu Yan Xing Faculty of Automation Guangdong University of Technology Guangzhou, 510090, China [email protected] [email protected]
Machine learning for algo trading
Machine learning for algo trading An introduction for nonmathematicians Dr. Aly Kassam Overview High level introduction to machine learning A machine learning bestiary What has all this got to do with
A General Framework for Mining Concept-Drifting Data Streams with Skewed Distributions
A General Framework for Mining Concept-Drifting Data Streams with Skewed Distributions Jing Gao Wei Fan Jiawei Han Philip S. Yu University of Illinois at Urbana-Champaign IBM T. J. Watson Research Center
