Data mining application in banking sector with clustering and classification methods


Proceedings of the 2015 International Conference on Industrial Engineering and Operations Management
Dubai, United Arab Emirates (UAE), March 3-5, 2015

Aslı Çaliş
Gazi University, Department of Industrial Engineering, Ankara, Turkey
aslicalis@gazi.edu.tr

Ahmet Boyaci
Hitit University, Department of Management, Çorum, Turkey
ahmet_boyaci@hotmail.com.tr

Kasım Baynal
Kocaeli University, Department of Industrial Engineering, Kocaeli, Turkey
kbaynal@kocaeli.edu.tr

Abstract

Because of the phenomenal rise in the volume of information, forecasting systems that support strategy development are needed in every field. Data mining techniques are therefore used extensively in banking, as in many other areas. This study, conducted in the banking sector, aims to minimize the risk in credit decisions by analyzing existing personal loan customers and estimating the payment performance of potential customers. It uses the k-means method, a clustering technique, and the decision tree method, a classification model in data mining. SPSS Clementine was used as the data mining software, and an application was carried out to evaluate personal loan customers.

Keywords: classification; clustering; data mining; personal loans; SPSS Clementine

I. INTRODUCTION

Advancements in computer technology have increased both information production and the volume of database systems. Data mining is the process of discovering potentially useful data kept in databases and deriving meaningful patterns from it. In today's consumer-focused markets, businesses face intense and continuous competition. To succeed under these conditions, they have to apply effective, low-cost marketing strategies [9].
Effective marketing strategies require accurate information, and obtaining it requires forward-looking forecasting systems that can analyze data in multiple dimensions. For this reason, data mining techniques are widely used in banking, as in many other fields. Because credit allocation is risky for banks, this study aims to obtain reliable information through data mining, to minimize the risk in decision making, and to identify potential future customers. Clustering and classification models are used. The credit repayment performance of existing individual customers in a first-class branch of one of the largest banks in Turkey's financial sector is assessed with the k-means method, and the repayment behavior of potential future customers is estimated with decision trees.

II. LITERATURE SEARCH

Aşan (2007) aimed to group credit card customers by their socio-economic characteristics. The study first defines individual banking and credit cards, explains the place and importance of these concepts in the country, and then groups credit card holders by clustering analysis. The customers are divided into three clusters according to their socio-economic characteristics, and the clusters are observed to differ on ten socio-economic indicators [2].

Doğan (2008) applied clustering analysis to the financial ratios of commercial banks active in the Turkish banking sector in the period of ( ). The study forms and interprets clusters of the commercial banks active as of that date on the basis of their financial ratios.
The results are compared with those of other analyses performed on the banks. In line with the purpose of clustering analysis, the financial performance of the banks is determined and financially similar banks are identified. The study also examines whether clustering analysis can be used in bank supervision as a technique complementary to the existing ones [7].

Chien and Chen (2008) presented a data mining framework for deriving association rules that link personnel characteristics to work behavior, including job performance and separation from work, in support of personnel selection. They focused on decision trees and association rules, aiming to bridge the gap between data mining and personnel selection and to improve the selection process. Rules for selection decisions were formed with decision tree analysis. Because a considerable share of personnel data is categorical, a CHAID decision tree was used for classification. Lift was used as the criterion for assessing the performance of the classification method and for extracting useful rules. The study covered the recruitment of indirect workers, including engineers and managers, for different business functions of the firm. The results made it possible to derive decision rules relating to personnel performance and separation from work [5].

Hsia et al. (2008) used data mining at a university in Taiwan to analyze course preferences and course completion rates. The student records for the years studied were analyzed with three data mining algorithms: decision tree, link analysis and decision forest. The objective was to use data mining to determine students' current course preferences and their likely future course choices. Decision trees were used to find the course preferences of students, link analysis to determine the correlation between course category and participant profession, and decision forest to determine the probability that participants complete their preferred courses.
In the study, CHAID was used as the decision tree algorithm. Course category and participant profession were taken as the predictor variables, and the status of participants at enrollment as the target variable. After the decision tree was built to find the preferred courses, link analysis was used to find the relationship between course category and participant profession. Finally, the decision forest was used to determine the courses preferred by participants from different sectors [11].

Fu et al. (2007) studied women and men from two countries from the angle of culture, behavior and social loyalty as factors that determine quality of life. CART was used to determine the quality of life of 278 Australian and 398 Taiwanese women and men. Four dependent variables, physical, psychological, spiritual (mental) and environmental health, were measured to capture multi-dimensional quality of life. The independent variables were culture, behavior and social loyalty, together with socio-demographic status and religious and spiritual characteristics. The socio-demographic variables were age, marital status, level of education, current employment status and annual household income. Age was treated as a continuous variable, while the other variables entered the multiple regression analysis as dummy variables. The study concluded that the CART algorithm can be used with parametric data without data transformation, and that one of its major advantages is that it uncovers the hierarchy among the independent variables [10].

Questier et al. (2005) used CART and multivariate regression trees (MRT) for supervised and unsupervised feature selection. The CART method allows supervised modeling with more than one explanatory variable x and a single response variable y.
MRT, in contrast, is derived from CART and can handle more than one response variable y. Applications on artificial and real data sets show that the proposed method can be used effectively for feature selection. When the number of features is reduced, the most important cluster structure is revealed. At the same time, the method improves the cluster structure by removing unnecessary and unrelated features [13].

Albayrak and Yılmaz (2009) carried out a data mining application on data from the Istanbul Stock Exchange (İMKB). Using the annual financial indicators, for the years studied, of 100 enterprises in the industry and service sectors listed in the İMKB 100 index, they applied the decision tree technique. The CHAID algorithm was applied to data derived from the companies' financial information. The positions of the enterprises relative to one another and the most important variables affecting the sector variable were determined [1].

Dolgun et al. (2009) studied the application of data mining to the analysis of unstructured data. Unstructured data were converted into structured form with text and web mining methods, and their contribution to the success of the model after inclusion was analyzed. Models built with C5.0, one of the decision tree algorithms, were compared with one another and the best model was determined [8].
Emel and Taşkın (2005) used a database of customer-level sales behavior from a retail enterprise to perform a detailed sales analysis with relative performance measures. The C&RT decision tree technique was used for the classification-type sales forecasting model. As a result, the customers were divided into classes according to their amount of spending. This made it possible to identify the gaps relative to sales targets and the degree to which different factors contributed to them [9].

Özekes and Çamurcu (2002) carried out a data mining application on classification and prediction. By examining the loans a bank had granted to customers in the past and the loan contracts that had ended, they formed a decision tree and classification rules. Using these rules, they then estimated the repayment status of customers whose loan contracts were still running [12].

III. METHOD/MODEL

The application uses the k-means method, a clustering technique, and the decision tree method, a classification model in data mining. It is carried out in SPSS Clementine. The effect of each variable on the clusters is examined separately with the k-means method, and the existing customers are assessed accordingly. The results of the C5.0 and C&RT algorithms are also compared.

A. K-Means for Clustering

Clustering is one of the basic data processing tasks. It is widely used in customer segmentation and fraud detection. A clustering application involves three tasks [6]:

1. Partitioning the data set into clusters,
2. Verifying the clustering results,
3. Interpreting the clusters.

The objective of clustering models is for the elements within a cluster to resemble each other closely while differing markedly from the elements of other clusters. The records in the database are divided among these clusters. In the k-means algorithm, the value of k may be fixed by the problem or may have to be determined. A clustering criterion, such as the squared error criterion, is required. The algorithm starts by randomly selecting one object to represent each cluster. Each remaining object is assigned to a cluster, and the mean of each cluster is computed according to the clustering criterion. These means become the new cluster centers, and each object is reassigned to the cluster it resembles most.
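The assignment and update steps described above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the implementation used by SPSS Clementine: the function names, the squared Euclidean distance and the sample points are assumptions introduced here.

```python
import random

def squared_distance(p, q):
    """Squared Euclidean distance, used here as the squared error criterion."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))

def k_means(points, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # one randomly chosen object per cluster
    labels = None
    for _ in range(max_iter):
        # Assignment step: each object joins the cluster with the nearest center.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:  # no object changed cluster: stop
            break
        labels = new_labels
        # Update step: each center becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = mean_point(members)
    sse = sum(squared_distance(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, centers, sse

# Hypothetical two-dimensional points, not the study's customer data.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers, sse = k_means(points, k=2)
print(labels, round(sse, 3))
```

The returned sum of squared errors is the quantity that can be compared across different values of k when the number of clusters has to be chosen.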
The cluster means are then recomputed, and the cycle continues until the clusters no longer change or the change falls below the desired error level [4].

B. Decision Trees for Classification

Decision trees are data mining approaches frequently used in classification and estimation. Although other methods, such as neural networks, can also be used for classification, decision trees give decision makers the advantage of being easy to interpret and understand [5]. Decision trees have low cost; are easy to understand, to interpret and to integrate with databases; and have good reliability. For these reasons they are among the most widely used classification techniques.

Classifying data with a decision tree is a two-step process consisting of learning and classification. In the learning step, a known training data set is analyzed by a classification algorithm to build a model. The learned model is expressed as classification rules or as a decision tree. In the classification step, test data is used to determine the accuracy of the classification rules or the decision tree. If the accuracy is acceptable, the rules are used to classify new data.

The order in which the fields of the training data are used in building the decision tree must be determined. The most widely used measure for this purpose is entropy. The higher the entropy of a field, the more uncertain the results obtained by using that field. Therefore, the field with the lowest entropy is placed at the root of the decision tree [12]. Let A be one of the k fields $\{a_1, a_2, \ldots, a_k\}$.
The formula for the entropy of field A is [12]:

$$E(C \mid A) = -\sum_{j=1}^{M_k} p(a_{k,j}) \times \sum_{i=1}^{N} p(c_i \mid a_{k,j}) \log_2 p(c_i \mid a_{k,j}) \qquad (1)$$

where:
$E(C \mid A)$ = entropy of the classification attribute given field A,
$p(a_{k,j})$ = probability that field $a_k$ has value j,
$p(c_i \mid a_{k,j})$ = probability that the class value is $c_i$ when field $a_k$ has value j,
$M_k$ = number of values of field $a_k$; $j = 1, 2, \ldots, M_k$,
$N$ = number of different classes; $i = 1, 2, \ldots, N$,
$k$ = index of the field among the fields considered.

If the elements of a set S are divided categorically into classes $C_1, C_2, C_3, \ldots, C_i$, the information required to determine the class of an element of S is computed with:

$$I(S) = -\left(p_1 \log_2 p_1 + p_2 \log_2 p_2 + \dots + p_i \log_2 p_i\right) \qquad (2)$$

In this formula, $p_i$ is the probability that a random sample belongs to class $C_i$ and is expressed as $|S_i| / |S|$, where $|S_i|$ is the number of samples of S in class $C_i$. The expected information, based on entropy or on the

partitioning of S into subsets according to A, can be expressed as follows:

$$E(A) = \sum_{i=1}^{n} \frac{|S_i|}{|S|} \, I(S_i) \qquad (3)$$

The information gain of branching on field A is then computed with:

$$\mathrm{Gain}(A) = I(S) - E(A) \qquad (4)$$

In other words, Gain(A) is the decrease in entropy that results from knowing the value of field A.

C. C4.5 and C5.0 Algorithms

The most widely used decision tree algorithm is C4.5, an improved version of the ID3 algorithm proposed by Quinlan in 1986. C5.0 is in turn an improved version of C4.5 and is used especially for large data sets. C5.0 uses boosting to increase accuracy, which is why its trees are also known as boosting trees. It is faster than C4.5 and uses memory more efficiently [14]. Even when the two algorithms give the same results, C5.0 tends to produce a smoother decision tree.

D. CART Algorithm

CART continues the line of the AID (Automatic Interaction Detection) decision tree of Morgan and Sonquist and was proposed by Breiman et al. in 1984. It accepts both numerical and nominal data as input and predicted variables and can be used for both classification and regression problems. A CART decision tree has a binary structure: each node is split in two. CART uses the Gini index as its branching criterion and, with no stopping rule during construction, keeps splitting and growing. When no new branching is possible, pruning starts from the tips toward the root. After each pruning step, the candidate tree is evaluated on independently selected test data in order to find the most successful tree [15].

IV. APPLICATION

A. Data

Within the scope of the study, data containing the customer numbers and credit repayment status of the credit customers of the branch were obtained from the relevant unit operating within the General Directorate. Gender, marital status, age, monthly income, spouse's income, education level, home and car ownership, having children, whether the customer receives a salary through the bank, and type of employment were retrieved from the bank's existing system using the customer numbers. Because confidentiality principles were observed, the customer numbers were changed. The data were converted to categorical form. After missing and erroneous records were deleted, the remaining data were entered into Microsoft Excel for preliminary processing. This produced a 200 x 12 matrix covering 200 customers. The data are based on the legal follow-up and normal payment records of the branch's individual customers over a six-month period.

B. Application for Clustering

The most critical issue in clustering analysis is deciding the number of clusters. The researcher must minimize subjectivity in this decision. However, the published literature offers no definitive result on this subject. The best known of the early approaches is the rule:

$$k = (n/2)^{1/2} \qquad (5)$$

where k is the number of clusters and n is the number of units. The rule is recommended for studies based on small samples; with large samples it becomes difficult to obtain sound results [3].
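As a quick check of rule (5), the following sketch evaluates it for a few sample sizes. The function name and the rounding to the nearest integer are assumptions made here for illustration; the source only states the formula.

```python
import math

def rule_of_thumb_k(n_units):
    """Initial number of clusters from k = (n / 2) ** (1 / 2), rounded to an integer."""
    return round(math.sqrt(n_units / 2))

# 200 is the sample size of this study; 50 and 20000 are hypothetical.
for n in (50, 200, 20000):
    print(n, rule_of_thumb_k(n))
```

For 200 units the rule gives 10 clusters, while for a hypothetical sample of 20000 units it would suggest 100 clusters, which illustrates why the rule is recommended only for small samples.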
Two different methods are used in applications to determine the number of clusters. In the first, the number of clusters is found to be 10 from

$$k = (200/2)^{1/2} = 10$$

where 200 is the number of customers. In the second, the number of clusters is increased one at a time from k = 2 to k = 10 and the sum of squared errors is computed for each value. The sum of squared errors for k = 10, the value given by the formula above, is compared with those for the other cluster numbers, and the value of k with the smallest sum of squared errors is accepted as the number of clusters. Table I lists the sum of squared errors for each number of clusters. The number of clusters with the smallest sum of squared errors is 3.

TABLE I. Number of Clusters for K-means and Sum of Squared Errors

The program found that no variable was without effect on the three clusters, so the effects of all variables were examined and the results interpreted. In terms of payment status, this variable is important for all three clusters. As Figure I shows, all customers in the first cluster had problems repaying their loans and were subjected to legal

follow-up. The customers in the second and third clusters had problems repaying their loans at rates of 63.64% and 98.48%, respectively.

Figure I. Effect of Payment Status Variable on Clusters

When the customer profiles are assessed in the light of these data, the first cluster consists mostly of retired male customers aged 45-51 who are primary school graduates, own neither a home nor a car, have a monthly income in the range of TL, and receive their salaries from other banks. All customers in this cluster delayed their loan payments and entered legal follow-up.

The second cluster consists mostly of single customers in the age interval of years who work in the public and private sectors. Unlike the other clusters, it contains more women than men. It is also notable that % of the customers in this cluster do not own a house and do not have normal payment status.

The third cluster consists mostly of public employees and retired male customers with a monthly income in the range of TL who receive their salaries from the bank from which they took their loans. They are in the age interval of years and own a house and a car. A significant portion of the customers in this cluster have a spouse with income, and % of them make their payments regularly.

C. Application for Classification

The application used a data set of 11 independent variables and 1 dependent variable for the 200 individual credit customers. Of the data, 60% was allotted to the training set and the remaining 40% to the test set. The dependent variable, payment status, comprised 100 records in legal follow-up and 100 in normal payment status. This 50% ratio was preserved in both the training and test sets. The model built with SPSS Clementine is shown in Figure II.

Figure II. Model Built with SPSS Clementine

1) Results of C5.0 Algorithm: When customers were classified by payment status with the C5.0 algorithm, the first branching in the decision tree was on education level. In other words, education level appears to be the variable with the greatest effect on payment status. Of the university graduates, 90.48% repaid their loans without delay and 9.52% entered legal follow-up. For customers with primary school and high school education, the tree continued to branch on whether the customer receives a salary through the bank. In this group, all customers who receive their salaries from another bank entered legal follow-up. Under the bank's existing system, loan installments are collected regularly from customers' salary accounts; if an installment is delayed, the account is blocked and the amount is collected. Such a customer is therefore not subjected to follow-up except in exceptional circumstances. The structure of the C5.0 tree is shown in Figure III.
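To illustrate how an entropy-based tree picks its first branching field, the sketch below evaluates equations (2) to (4) on a handful of hypothetical records. It computes plain information gain only; it does not reproduce the exact splitting criterion of C5.0 in SPSS Clementine, and the records, field names and function names are all assumptions made for this example.

```python
import math
from collections import Counter

def information(labels):
    """I(S) = -sum(p_i * log2(p_i)) over the classes present in labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def expected_information(rows, field, target):
    """E(A) = sum(|S_i| / |S| * I(S_i)) over the subsets S_i induced by field."""
    subsets = {}
    for row in rows:
        subsets.setdefault(row[field], []).append(row[target])
    return sum(len(s) / len(rows) * information(s) for s in subsets.values())

def gain(rows, field, target):
    """Gain(A) = I(S) - E(A)."""
    return information([r[target] for r in rows]) - expected_information(rows, field, target)

# Hypothetical toy records, NOT the study's data.
rows = [
    {"education": "university", "salary_at_bank": "yes", "payment": "normal"},
    {"education": "university", "salary_at_bank": "no", "payment": "normal"},
    {"education": "primary", "salary_at_bank": "yes", "payment": "normal"},
    {"education": "primary", "salary_at_bank": "no", "payment": "follow-up"},
    {"education": "high", "salary_at_bank": "no", "payment": "follow-up"},
    {"education": "high", "salary_at_bank": "yes", "payment": "follow-up"},
]

gains = {f: gain(rows, f, "payment") for f in ("education", "salary_at_bank")}
best_field = max(gains, key=gains.get)
print(gains, best_field)
```

On these toy records the field with the larger gain is education, so it would be placed at the root, which mirrors the kind of result reported for the C5.0 tree.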

Figure III. Structure of decision tree belonging to the C5.0 Algorithm

2) Rate of accuracy for the C5.0 Algorithm: The accuracy of the algorithm is 96.67% on the training set and 88.75% on the test set: as Figure IV shows, 4 of the 120 training records and 9 of the 80 test records are incorrectly classified. Since the accuracy is high for both sets, the model can be considered successful.

Figure IV. Rate of accuracy for the C5.0 Algorithm for training and test sets

3) Results of C&RT Algorithm: The C&RT algorithm started branching on the monthly income variable. Customers with a monthly income of 750 TL or less and customers with a monthly income of TL were placed in the same group, and all of the customers in this group who did not have equivalent salaries were in legal follow-up. For customers placed in the other group by monthly income, the second effective variable is age. The structure of the C&RT tree is as follows:

Figure V. Structure of decision tree of the C&RT Algorithm

4) Rate of accuracy belonging to the C&RT Algorithm: The accuracy of the algorithm is 96.67% on the training set and 86.25% on the test set: as Figure VI shows, 4 of the 120 training records and 11 of the 80 test records were incorrectly classified. Since the accuracy is high for both sets, the model can be considered successful.

Figure VI. Rate of Accuracy for the C&RT Algorithm for training and test sets

5) Gains for the C&RT Algorithm: The gains provided by the C&RT algorithm are shown in Table II.

TABLE II. Gains for the C&RT Algorithm

Examining the C&RT algorithm in terms of the gains it provides leads to the following conclusions.
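The accuracy rates of the two trees can be cross-checked from the misclassification counts reported in subsections 2) and 4). The sketch below assumes that the 60/40 split of the 200 records yields exactly 120 training and 80 test records, which is consistent with the 96.67% training accuracy reported for 4 errors; the function and variable names are introduced here for illustration.

```python
def accuracy_pct(n_records, n_misclassified):
    """Percentage of correctly classified records, rounded to two decimals."""
    return round(100 * (n_records - n_misclassified) / n_records, 2)

n_total = 200
n_train = n_total * 60 // 100  # assumed training set size: 120
n_test = n_total - n_train     # assumed test set size: 80

c50_train = accuracy_pct(n_train, 4)   # C5.0: 4 training errors
c50_test = accuracy_pct(n_test, 9)     # C5.0: 9 test errors
crt_train = accuracy_pct(n_train, 4)   # C&RT: 4 training errors
crt_test = accuracy_pct(n_test, 11)    # C&RT: 11 test errors

print(c50_train, c50_test, crt_train, crt_test, c50_test - crt_test)
```

Under these assumptions the two models tie on the training set and differ by 2.5 percentage points on the test set, in line with the comparison drawn in the conclusions.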
Consider the nodes listed in this table for the legal follow-up category of payment status, and take the 16th node as an example:

Node = 2 (of the 120 records in total, 2 fall in the 16th node),
Node (%) = 1.67 (2/120),
Gain = 2 (of the 120 records, 60 belong to customers in legal follow-up; the 16th node contains 2 records, both of which belong to customers in legal follow-up),
Gain (%) = 3.33 [(number of legal follow-up records reaching the node) = 2 / (number of records in legal follow-up) = 60],
Response (%) = 100 [(number of legal follow-up records reaching the node) = 2 / (number of records reaching the node) = 2],
Index (%) = 200 [Response (%) = 100 / ratio of legal follow-up records to all records = (60/120)].

6) Binary Classifier Results: As Table III shows, the highest test-set accuracy, 93.75%, belongs to the Artificial Neural Networks classifier. Logistic Regression ranks second with a rate of %, and C5.0 ranks third with a rate of %, followed by the C&RT Algorithm.

TABLE III. Binary Classifier Results

The clustering analysis determined the number of clusters to be three. In the first cluster, all customers delayed their loan payments and entered legal follow-up. If customers in this cluster apply for credit again, their applications can be rejected, or a mortgage or guarantee can be requested to reduce the risk. In the second cluster, % of the customers do not own a house and do not have normal payment status. For the customers in this cluster who did not delay their installments, housing loans can be encouraged. In the third cluster, spouses have income and % of the customers make their payments regularly.
Customers in this cluster can be treated as special customers and offered cross-sales of internet banking, investment accounts with no account maintenance fee, foreign exchange accounts, credit cards, Rapid Transit System (HGS) devices, and insurance products covering events such as accident, earthquake and fire. Furthermore, if they apply for credit again, special interest reduction policies can be applied. In this way the loyalty of these customers to the bank can be maintained.

According to the classification results, the C5.0 algorithm forms a tree with multiple branches at each node, while the C&RT algorithm generates rules through binary splits. The most important variable in the C5.0 decision tree is education level; for the C&RT algorithm it is monthly income. In terms of accuracy, both models have the same percentage on the training set. On the test set, C5.0 outperforms the C&RT algorithm by 2.5 percentage points. The C5.0 decision tree can therefore be used to estimate the repayment performance of potential customers and to reduce the rate of entry into legal follow-up. Increasing the number of levels of the decision tree may yield higher prediction accuracy.

V. COMPARISON / CONCLUSIONS

To gain a competitive advantage in the sector and to remain in operation over the long term, banks must understand their customers correctly and separate risky customers from the others. This study aimed to analyze existing personal loan customers and to estimate the repayment performance of potential customers.
First, existing customers who closely resemble each other according to predetermined criteria were grouped together with the k-means method, a clustering technique; rules for potential future credit customers were then formed with the decision tree method, a data mining classification model. In the light of these data, the aim is to reduce the rate of entry into legal follow-up.

REFERENCES

[1] Albayrak A. S., Yılmaz Ş. K., Data mining: Decision tree algorithms and an application on data of IMKB, Süleyman Demirel University The Journal of Faculty of Economics and Administrative Sciences, vol. 14, 2009.
[2] Aşan Z., Examining the socioeconomic characteristics of customers using credit cards, with clustering analysis, Dumlupınar University The Journal of Social Sciences, vol. 17, 2007.
[3] Atbaş A. C., A study on determining the cluster number in clustering analysis, Master Thesis, Ankara University, Graduate School of Natural Sciences.
[4] Bilen H., Data mining application for personnel selection and performance evaluation in banking sector, Master Thesis, Gazi University, Graduate School of Natural and Applied Sciences.
[5] Chien C.-F., Chen L.-F., Data mining to improve personnel selection and enhance human capital: A case study in high-technology industry, Expert Systems with Applications, vol. 34, 2008.
[6] Ching W. K., Pong M. K., Advances in data mining and modeling, 1st ed., World Scientific, Hong Kong, China.
[7] Doğan B., Clustering analysis as a tool under the supervision of banks: An application for Turkish banking sector, PhD Thesis, Kadir Has University, Graduate School of Social Sciences, 2008.

[8] Dolgun M. Ö., Özdemir T. G., Oğuz D., Analysis of unstructured data in data mining: Text and web mining, Journal of Statisticians, vol. 2, 2009.
[9] Emel G. G., Taşkın Ç., Decision trees in data mining and a sales analysis application, Eskişehir Osmangazi University Journal of Social Sciences, vol. 6, 2005.
[10] Fu S.-Y. K., Anderson D., Courtney M., Hu W., The relationship between culture, attitude, social networks and quality of life in midlife Australian and Taiwanese citizens, Maturitas, vol. 58, 2007.
[11] Hsia T.-C., Shie A.-J., Chen L.-C., Course planning of extension education to meet market demand by using data mining techniques: an example of Chinkuo Technology University in Taiwan, Expert Systems with Applications, vol. 34, 2008.
[12] Özekes S., Çamurcu A. Y., A classification and prediction application in data mining, Marmara University Journal of Science, vol. 18, pp. 1-17, 2002.
[13] Questier F., Put R., Coomans D., Walczak B., Heyden Y. V., The use of CART and multivariate regression trees for supervised and unsupervised feature selection, Chemometrics and Intelligent Laboratory Systems, vol. 76, 2005.
[14] Sancak S., Comparison of techniques belonging to intrusion detection systems, Master Thesis, Gebze Institute of Technology, Graduate School of Engineering and Sciences, 2008.
[15] Sezer E. A., Bozkır A. S., Yağız S., Gökçeoğlu C., The effect of the depth of decision tree to the prediction capacity in C&RT algorithm: an application on the progress rate of a tunnel boring machine, Symposium of Innovations and Applications in Intelligent Systems, Kayseri, Turkey, June 2010.


More information

Comparing the Results of Support Vector Machines with Traditional Data Mining Algorithms

Comparing the Results of Support Vector Machines with Traditional Data Mining Algorithms Comparing the Results of Support Vector Machines with Traditional Data Mining Algorithms Scott Pion and Lutz Hamel Abstract This paper presents the results of a series of analyses performed on direct mail

More information

Dr. U. Devi Prasad Associate Professor Hyderabad Business School GITAM University, Hyderabad Email: Prasad_vungarala@yahoo.co.in

Dr. U. Devi Prasad Associate Professor Hyderabad Business School GITAM University, Hyderabad Email: Prasad_vungarala@yahoo.co.in 96 Business Intelligence Journal January PREDICTION OF CHURN BEHAVIOR OF BANK CUSTOMERS USING DATA MINING TOOLS Dr. U. Devi Prasad Associate Professor Hyderabad Business School GITAM University, Hyderabad

More information

Predicting the Risk of Heart Attacks using Neural Network and Decision Tree

Predicting the Risk of Heart Attacks using Neural Network and Decision Tree Predicting the Risk of Heart Attacks using Neural Network and Decision Tree S.Florence 1, N.G.Bhuvaneswari Amma 2, G.Annapoorani 3, K.Malathi 4 PG Scholar, Indian Institute of Information Technology, Srirangam,

More information

from Larson Text By Susan Miertschin

from Larson Text By Susan Miertschin Decision Tree Data Mining Example from Larson Text By Susan Miertschin 1 Problem The Maximum Miniatures Marketing Department wants to do a targeted mailing gpromoting the Mythic World line of figurines.

More information

Content Based Data Retrieval on KNN- Classification and Cluster Analysis for Data Mining

Content Based Data Retrieval on KNN- Classification and Cluster Analysis for Data Mining Volume 12 Issue 5 Version 1.0 March 2012 Type: Double Blind Peer Reviewed International Research Journal Publisher: Global Journals Inc. (USA) Online ISSN: & Print ISSN: Abstract - Data mining is sorting

More information