Enhancing Quality of Data using Data Mining Method


JOURNAL OF COMPUTING, VOLUME 2, ISSUE 9, SEPTEMBER 2010, ISSN

Fatemeh Ghorbanpour A., Mir M. Pedram, Kambiz Badie, Mohammad Alishahi

Abstract
Data is an asset for companies and organizations, because data and the information obtained from data analysis play an important role in decision making. The quality of data affects the quality of decisions, and incorrect data leads to incorrect decisions. Recently, a great deal of research has focused on enhancing data quality. Improving data quality through manual inspection is infeasible or very difficult: data quality is a complicated and unstructured concept, data cleansing cannot be done without the help of professional domain experts, and detecting errors requires thorough knowledge of the domain of the data. Therefore, (semi-)automatic data cleansing methods are employed to find and resolve errors and defects in data. Data mining methods are appropriate for enhancing different dimensions of data quality, since they aim at finding abnormal patterns in large datasets. In this paper, a new approach is presented that detects errors in a dataset using fuzzy association rules. The fuzzy association rules are used to build a model intended to capture the structure of the data. Experimental results show the effectiveness of the proposed method in finding errors in datasets.

Index Terms: data quality, data mining, fuzzy association rules

1 INTRODUCTION
The widespread use of data, and of decisions based on data analysis, makes data quality central to business success today. Nevertheless, a study by the Meta Group reveals that 4% of projects based on data analysis fail, and it identifies insufficient data quality, leading to wrong decisions, as one of the main reasons [1]. Traditional methods of cleaning data, however, are rarely applicable.
It is normally infeasible to guarantee data quality by manual inspection, especially when data are collected over long periods of time and through multiple generations of databases. Therefore, (semi-)automatic data cleaning methods have to be used [1]. Since the early 1990s, knowledge discovery in databases (KDD) has become a well-established field of research, and over the years new methods and scalable algorithms have been developed to analyze very large datasets efficiently. Unfortunately, most of this work is oriented toward particular, theoretical problems. Applying data mining methods to improve data quality is a relatively new and promising approach from both the research and the practical viewpoint, and it opens application domains beyond pure data analysis [2].

Several data quality studies have used data mining methods. For example, Brodley and Friedl used data mining to preprocess data in the knowledge discovery process. By filtering out probably mislabeled training data, they significantly reduced misclassifications, although their goal was to improve final classification accuracy rather than to detect errors in the training data [3]. In addition, Hipp et al. proposed a data mining algorithm that extracts the structure of the data; deviations from this structure can then be hypothesized to be errors [4].

F. Ghorbanpour A. is with the Department of Computer Engineering, Islamic Azad University, Science and Research Branch, Tehran, Iran.
M.M. Pedram is with the Department of Computer Engineering, Faculty of Engineering, Tarbiat Moallem University, Karaj/Tehran, Iran.
K. Badie is with the Information Technology Research Center, Telecom Research Center, Tehran, Iran.
M. Alishahi is with the Department of Industrial Engineering, Sharif University of Technology, Tehran, Iran.
Grüning showed that classifiers can be used to detect conflicts in datasets and gave practical recommendations for data correction; this approach uses support vector machines as the classification algorithm [5]. Marcus and Maletic applied different data mining methods, including statistical methods, clustering, pattern-based methods, and ordinal association rules, to automatic error detection in a real dataset. Their experiments showed that ordinal association rules are more efficient than the other methods, but they are appropriate only for datasets in which most attributes are of decimal or date type [6]. In this paper, fuzzy association rules are used to enhance data quality.

The rest of this paper is organized as follows. Section 2 introduces data quality and the reasons for using data mining to improve it. Section 3 explains the proposed method, and Section 4 presents and analyzes the experimental results.

2 DATA QUALITY AND DATA MINING
The definition of data quality depends on the intended purpose, so there is no unique formal definition. In the literature, phrases such as "appropriate for use" or "meeting end-user needs" are used. Following the general definition of quality management, we define data quality as meeting customer needs. Contrary to popular belief, quality does not necessarily mean zero errors [7]. Data have high quality if they are appropriate for their applications, decision making, and planning. Researchers have defined different dimensions of data quality, each reflecting a particular view of quality. The most interesting dimensions are as follows:
Consistency: the rate of violation of the significant rules defined on the dataset.
Availability: the degree to which data are available, and the ease and speed of data retrieval.
Accuracy: the distance between a value v and the value v' of the real-world entity that v is meant to portray.
Completeness: the database includes all of the related entities.
Value added: the degree to which the data are useful.

Data mining is an automatic process for extracting patterns, as implicit knowledge, from a database. It draws on multiple scientific fields simultaneously, such as database technology, artificial intelligence, machine learning, neural networks, statistics, pattern recognition, knowledge-based systems, and information retrieval. Data mining methods are appropriate for improving different dimensions of data quality, since they aim at finding abnormal patterns in large datasets. In 2001, Hipp introduced data quality mining as a new approach from the research and application viewpoint. The goal of data quality mining is to employ data mining methods to detect, quantify, explain, and correct data quality deficiencies in very large databases [4]. In this paper, the accuracy dimension of data quality is addressed using a data mining method.

3 THE PROPOSED IDEA
Among data mining methods, association rules are a very active research topic and have been used widely in many applications, such as market basket analysis, decision support systems, and theft detection.
Data quality is another area where association rules can be useful, because, compared with other data mining methods, association rules are understandable and can clearly discover and describe dependencies between data. Association rules are also independent of each other; therefore, the precision of simpler rules has no effect on the other rules. One drawback of using association rules for data cleaning is that they cannot properly represent associations between quantitative values. For example, in Table I, job and degree are categorical attributes and income is a numerical attribute. If association rules are mined from these data with the Apriori algorithm under a minimum support threshold, only rules involving the BS degree can be found, although there are associations between income and job, and between degree and income: for example, people with a high degree level have a higher income than people with a mean degree level. The Apriori algorithm cannot discover such associations between quantitative values. Since much real-world data contains quantitative values, and errors may occur in such values, the method proposed in [4] cannot be applied to detect these errors. In this paper, the method presented in [4] is extended, and association rules between quantitative values are discovered with the help of fuzzy sets.

3.1 Fuzzy Association Rule
Fuzzy association rules are rules extracted from a fuzzy dataset. A rule of the form A ⇒ B, where A and B are fuzzy sets and A ∩ B = ∅, is called a fuzzy association rule. Using fuzzy concepts has some advantages. First, the discovered association rules are more understandable, because fuzzy sets bridge numerical data values and categorical concepts. Second, fuzzy sets help discover rules between quantitative values.
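To make the definition above concrete, the following Python sketch maps records to fuzzy items and computes the fuzzy support and confidence of a rule X ⇒ Y. It assumes the common sigma-count definition with the minimum t-norm (the support of an itemset is the average, over all records, of the smallest membership degree among its items); this is not necessarily the weighted measure of the algorithm in [9] that the method uses for rule extraction. The fuzzy-item names, the income breakpoints, and the records are hypothetical.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]


def ramp_up(x: float, lo: float, hi: float) -> float:
    """Right-shoulder membership: 0 at or below lo, 1 at or above hi, linear in between."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)


# Hypothetical fuzzy items. The job-level sets mirror the crisp example sets in
# the text (High job level = {manager, deputy}); the income breakpoints are invented.
FUZZY_ITEMS: Dict[str, Callable[[Record], float]] = {
    "high_job": lambda r: 1.0 if r["job"] in {"manager", "deputy"} else 0.0,
    "mean_job": lambda r: 1.0 if r["job"] == "designer" else 0.0,
    "high_income": lambda r: ramp_up(float(r["income"]), 30.0, 40.0),
    "mean_income": lambda r: 1.0 - ramp_up(float(r["income"]), 20.0, 30.0),
}


def fuzzy_support(items: List[str], data: List[Record]) -> float:
    """Sigma-count support: mean over records of the minimum membership of the items."""
    return sum(min(FUZZY_ITEMS[i](r) for i in items) for r in data) / len(data)


def fuzzy_confidence(x: List[str], y: List[str], data: List[Record]) -> float:
    """confidence(X => Y) = support(X and Y) / support(X); 0.0 if X never occurs."""
    sx = fuzzy_support(x, data)
    return fuzzy_support(x + y, data) / sx if sx > 0 else 0.0


# Hypothetical records: five regular ones plus a deputy with a low income.
records: List[Record] = [
    {"job": "designer", "income": 12},
    {"job": "designer", "income": 15},
    {"job": "manager", "income": 40},
    {"job": "manager", "income": 45},
    {"job": "deputy", "income": 50},
    {"job": "deputy", "income": 10},
]

print(round(fuzzy_confidence(["high_job"], ["high_income"], records), 2))  # 0.75
print(round(fuzzy_confidence(["mean_job"], ["mean_income"], records), 2))  # 1.0
```

The low-income deputy is what pulls the confidence of "high job level ⇒ high income" below 1 in this toy data; a crisp Apriori run over the raw income values would not form such a rule at all, because each distinct income value is its own item.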
For example, the following fuzzy sets are defined for the data shown in Table I:
High job level = {manager, deputy}
Mean job level = {designer}
High degree grade = {PHD, MS}
Mean degree grade = {BS, 2th}
High income = {4-5}
Mean income = {-2}

TABLE 1
INSTANCE DATABASE
Trans_id  Job       Degree  Income
1         designer  BS
2         designer  BS      5
3         designer  2th     2
4         manager   PHD     4
5         manager   MS      45
6         deputy    PHD     5

The following fuzzy association rules may be stated using these fuzzy sets:
High job level ⇒ High degree grade
Mean job level ⇒ Mean degree grade
High job level ⇒ High income
Mean job level ⇒ Mean income

Now assume the following transaction:
Transaction_id 7: job = deputy, degree = PHD, income =

Considering the dataset in Table I, an error can be detected in this transaction: a deputy is not expected to have such a low income, since his income should be much higher. The transaction violates the fuzzy rule "High job level ⇒ High income". Note that this error would not be detected by the rules extracted by the Apriori algorithm.

3.2 Proposed Method
The proposed method consists of the following steps:

a) Data preprocessing: in this step, data are converted to a standard format and missing values are handled. For example, the values of a date attribute may be in the form YYYY/MM/DD in some transactions and in the form YYYY-MM-DD in others. This step is necessary to obtain a correct model of the associations.

b) Mapping data to fuzzy values: fuzzy sets and fuzzy membership functions for quantitative data are defined according to the knowledge of experienced domain experts. The data values are then mapped to fuzzy values by the membership functions. Table II shows the fuzzy mapping of the income and job attributes of Table I.

TABLE 2
MAPPING OF TABLE 1 TO FUZZY VALUES
Fuzzytrans_id  High income  Mean income  High job level  Mean job level

c) Fuzzy association rule extraction: fuzzy association rules are extracted with the algorithm presented in [9], with all attributes weighted equally. Only rules with a high confidence level are considered in this step, so the rule set R is restricted to R = {r ∈ R | confidence(r) ≥ min_confidence}, where min_confidence is the minimum confidence. Table III shows some rules extracted from Table II.

TABLE 3
FUZZY ASSOCIATION RULES EXTRACTED FROM TABLE 2
Fuzzy association rule          Confidence
High job level ⇒ High income    .44
Mean job level ⇒ Mean income    .48

d) Consistency check of transactions against the discovered rules: in this step, each transaction is checked for consistency with the discovered rules. A transaction may violate some rules and be consistent with others; it may also not fire some rules at all. Let R be the set of fuzzy association rules and D the database of fuzzy transactions. Let r: X ⇒ Y be a fuzzy association rule, and let μ_X and μ_Y be the membership functions of X and Y. The mapping violate: D × R → [0, 1], which determines whether, and to what degree, a fuzzy transaction T ∈ D violates a rule r ∈ R, is defined in (1). In (1), 1 − μ_Y(T) is the degree to which T violates r: the larger 1 − μ_Y(T), the more inconsistent T is with r. As an example, the consistency check for transaction 7 is shown in Table IV.
$$
\mathrm{violate}: D \times R \to [0,1], \qquad
\mathrm{violate}(T, r) =
\begin{cases}
1 - \mu_Y(T) & \text{if } \mu_X(T) \ge \mathit{rule\_satisfy} \text{ and } \mu_Y(T) < \mathit{rule\_satisfy} \\
0 & \text{otherwise}
\end{cases}
\tag{1}
$$

The parameter rule_satisfy is the threshold for fuzzy rule satisfaction.

TABLE 4
CONSISTENCY CHECK FOR TRANSACTION 7 WITH THE RULES IN TABLE 3
Fuzzytrans_id 7
Noncorrelated rule: Mean job level ⇒ Mean income
Consistent rule: High job level ⇒ High degree grade
Inconsistent rule: High job level ⇒ High income

e) Scoring and ranking the transactions: a score is assigned to each transaction by summing the confidence values of the rules it violates. The score of each transaction is computed as follows:

$$
\mathrm{score}_R : D \to \mathbb{R}, \qquad
\mathrm{score}_R(T) = \sum_{r \in R} \mathrm{confidence}(r)^{p} \cdot \mathrm{violate}(T, r)
\tag{2}
$$

The tuning parameter p ∈ ℝ allows the confidences to be weighted according to their value [4]. For example, the score of transaction 7 is (.44)^p. Transactions whose scores exceed score_threshold, the minimum score threshold, are presented as a list sorted by score, together with the rules they violate or satisfy. Based on this information and her background knowledge, the user decides on the trustworthiness of single transactions or groups of similar transactions, and finally on the quality of the whole dataset. The algorithm implementing the proposed method is shown in Fig. 1.

    Preprocess records  // for enhancing data quality
    Map dataset to fuzzy values
    Generate fuzzy association rules with the determined support and confidence
    For all unmarked transactions in the dataset
        For all generated fuzzy association rules
            If the transaction does not satisfy the rule Then
                Mark the transaction as a possible error and save the confidence of the rule
        End for
        Set the score of the transaction to the sum of the saved confidences
    End for
    Sort the marked transactions by score in descending order
    Output the marked transactions with the consequents of the violated rules

Fig. 1. Proposed algorithm
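Steps d) and e), together with the loop of Fig. 1, can be sketched in Python as follows. The sketch assumes that steps a) to c) have already produced fuzzified transactions, represented here as dictionaries from a fuzzy-item name to a membership degree, with a missing item treated as membership 0. It reads the violation mapping (1) as "1 − μ_Y(T) when μ_X(T) ≥ rule_satisfy and μ_Y(T) < rule_satisfy, else 0" and the score (2) as the sum of confidence(r)^p · violate(T, r). The rules, confidences, and parameter values are hypothetical and are not the settings used in the experiments.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

FuzzyTransaction = Dict[str, float]  # fuzzy item name -> membership degree


@dataclass(frozen=True)
class FuzzyRule:
    antecedent: str  # fuzzy item X
    consequent: str  # fuzzy item Y
    confidence: float


def violate(t: FuzzyTransaction, r: FuzzyRule, rule_satisfy: float) -> float:
    """Degree in [0, 1] to which t violates r: the rule fires (mu_X high enough)
    but its consequent is not satisfied (mu_Y below the threshold)."""
    mu_x = t.get(r.antecedent, 0.0)
    mu_y = t.get(r.consequent, 0.0)
    if mu_x >= rule_satisfy and mu_y < rule_satisfy:
        return 1.0 - mu_y
    return 0.0


def score(t: FuzzyTransaction, rules: List[FuzzyRule], p: float,
          rule_satisfy: float) -> Tuple[float, List[FuzzyRule]]:
    """Sum of confidence(r) ** p * violate(t, r); also returns the violated rules."""
    violated = [r for r in rules if violate(t, r, rule_satisfy) > 0.0]
    total = sum(r.confidence ** p * violate(t, r, rule_satisfy) for r in violated)
    return total, violated


def rank_suspicious(transactions: Dict[int, FuzzyTransaction], rules: List[FuzzyRule],
                    min_confidence: float, p: float, rule_satisfy: float,
                    score_threshold: float) -> List[Tuple[int, float, List[str]]]:
    """Keep confident rules, score every transaction, and return those above the
    threshold in descending score order, with the rules each one violates."""
    kept = [r for r in rules if r.confidence >= min_confidence]
    ranked = []
    for tid, t in transactions.items():
        s, violated = score(t, kept, p, rule_satisfy)
        if s > score_threshold:
            ranked.append((tid, s, [f"{r.antecedent} => {r.consequent}" for r in violated]))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


# Hypothetical rules and fuzzified transactions.
rules = [
    FuzzyRule("high_job", "high_income", 0.9),
    FuzzyRule("high_job", "high_degree", 0.8),
    FuzzyRule("mean_job", "mean_income", 0.85),
    FuzzyRule("mean_job", "mean_degree", 0.5),  # removed by min_confidence
]
transactions = {
    1: {"mean_job": 1.0, "mean_income": 0.9, "mean_degree": 1.0},
    6: {"high_job": 1.0, "high_degree": 1.0, "high_income": 1.0},
    7: {"high_job": 1.0, "high_degree": 1.0, "high_income": 0.0},
    8: {"high_job": 0.7, "high_degree": 0.9, "high_income": 0.2},
}

for tid, s, violated in rank_suspicious(transactions, rules, min_confidence=0.7,
                                        p=2.0, rule_satisfy=0.4, score_threshold=0.3):
    print(tid, round(s, 2), violated)
# 7 0.81 ['high_job => high_income']
# 8 0.65 ['high_job => high_income']
```

Transaction 7 plays the role of the Table IV example: it fires "high job level ⇒ high income" but has no membership in high income, so it receives the full confidence of that rule raised to p. Transaction 8 violates the same rule only partially and therefore ranks lower, while transactions 1 and 6 satisfy or do not fire every kept rule and are not listed.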

4 EVALUATION RESULTS
In this section, the proposed method is evaluated and the results are compared with those of the method presented in [4]. All evaluations were run on a system with a Core 2 Duo 2.26 GHz CPU, 3 GB of RAM, and Windows 7. Precision and recall are commonly used to evaluate the performance of information retrieval and classification algorithms. These measures are defined in (3) and (4) based on the confusion matrix entries shown in Table V:

$$
\text{Precision} = \frac{TP}{TP + FP} \tag{3}
$$

$$
\text{Recall} = \frac{TP}{TP + FN} \tag{4}
$$

In addition to these two measures, the runtimes of the two methods are also compared.

TABLE 5
CONFUSION MATRIX
                                           Real classification
                                           Correct    Incorrect
Transactions classification by the proposed method:
  Correct                                  TP         FP
  Incorrect                                FN         TN

Two datasets were used to evaluate the proposed method: the Adult dataset from the UCI repository and a real dataset of personnel data from an IT company. Adult includes two numerical attributes, called and hpw, and two categorical attributes, called job and degree. The personnel data also includes two numerical attributes, called and valuation score, and two categorical attributes, called job and degree. Some transactions of these datasets were selected and manipulated to create artificial errors. Then, the proposed method and the method presented in [4] were applied to the datasets. In the precision and recall tests, min_support, min_confidence, score_threshold, and min_rule_satisfy were set to .2, .7, .3, and .4, respectively. The tuning parameter was set to 7 in both methods, because Hipp's experimental results indicate that this value works well when it is important that transactions do not violate rules with high confidence [4]. Figure 2 compares the precision of the proposed method with that of the method proposed in [4] as a function of the number of errors created in the Adult and Personnel datasets.
The figure shows that the proposed method outperforms the method proposed in [4], because many of the associations between quantitative data in the datasets are not discovered by the Apriori algorithm and therefore play no role in the ranking: some attribute values are spread over a range, so the rules involving them do not reach the minimum support (as in the example above). The precision curves are wavy because any change in the data changes the support of the rules, which can affect their detection.

Fig. 2. The precision measure for the proposed method and the method described in [4] for two datasets (precision vs. number of errors in the dataset; series: Adult dataset and real dataset, each with fuzzy association rules and with Apriori).

Figure 3 compares the recall of the two methods on the Adult and Personnel datasets; again, the proposed method delivers better results.

Fig. 3. The recall measure for the proposed method and the method described in [4] for two datasets (recall vs. number of errors in the dataset; series: Adult dataset and real dataset, each with fuzzy association rules and with Apriori).

Figures 4 and 5 compare the precision and recall of the proposed method with those of the method proposed in [4] for varying min_support and min_confidence on the Adult dataset. The figures show that increasing min_support or min_confidence decreases precision and recall, because some association rules are then not discovered and play no role in error detection. Therefore, some errors are not detected.
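Equations (3) and (4) reduce to counts over the confusion matrix of Table V. The following sketch computes them from two aligned label lists. Which label counts as positive is passed in as a parameter; the example uses "correct" as our reading of Table V's row and column order, and the six labels are invented for illustration.

```python
from typing import Sequence, Tuple


def confusion_counts(actual: Sequence[str], predicted: Sequence[str],
                     positive: str) -> Tuple[int, int, int, int]:
    """Return (TP, FP, FN, TN) for the given positive label."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1
            else:
                fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision(tp: int, fp: int) -> float:
    """Equation (3); defined as 0.0 when nothing is predicted positive."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp: int, fn: int) -> float:
    """Equation (4); defined as 0.0 when there are no actual positives."""
    return tp / (tp + fn) if tp + fn else 0.0


# Invented labels for six transactions.
actual = ["incorrect", "incorrect", "incorrect", "correct", "correct", "correct"]
predicted = ["incorrect", "incorrect", "correct", "incorrect", "correct", "correct"]
tp, fp, fn, tn = confusion_counts(actual, predicted, positive="correct")
print(tp, fp, fn, tn)                                          # 2 1 1 2
print(round(precision(tp, fp), 3), round(recall(tp, fn), 3))  # 0.667 0.667
```

Guarding the zero denominators matters in this setting: with a high min_support or min_confidence, a method may flag no transactions at all, and the ratio in (3) would otherwise be undefined.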

Fig. 4. The precision and recall measures for the proposed method and the method described in [4] for the Adult dataset with varying min_support (series: precision and recall, each for fuzzy association rules and for Apriori).

Fig. 5. The precision and recall measures for the proposed method and the method described in [4] for the Adult dataset with varying min_confidence (series: precision and recall, each for fuzzy association rules and for Apriori).

Figure 6 shows the runtime of the proposed method and of the method proposed in [4].

Fig. 6. The runtime of the proposed method and the method presented in [4] for the Personnel dataset (runtime vs. number of transactions; series: fuzzy association rules and Apriori).

According to Figure 6, the proposed method runs faster. The main reason is the difference in the runtime of association rule discovery. The fuzzy association rule algorithm is much faster than the Apriori algorithm, because Apriori must check all numerical and categorical values in the dataset to find frequent itemsets, and must compute the support of each itemset, which is time-consuming given the variety of numeric values. The fuzzy association rule algorithm, in contrast, works only with the fuzzy sets, which are fewer in number and defined by the user.

5 CONCLUSION
Automated data quality improvement is necessary for finding incorrect data in large databases. In this paper, a method based on data mining is presented to improve data quality. The proposed approach uses fuzzy association rules to discover hidden rules in datasets. The consistency of transactions with these rules is then checked, and a score is assigned to each transaction; high scores are assigned to transactions suspected of containing defects.
The user's knowledge of the suspicious transactions and background knowledge about the data are then used to determine the accuracy of the transactions and, ultimately, the overall quality of the dataset. Evaluation results show the effectiveness of the proposed method.

REFERENCES
[1] D. Luebbers, U. Grimmer, M. Jarke, "Systematic Development of Data Mining-Based Data Quality Tools," Proc. of the 29th International Conference on Very Large Data Bases (VLDB), Berlin, Germany, 2003.
[2] J. Hipp, M. Müller, J. Hohendorff, F. Naumann, "Rule-Based Measurement of Data Quality in Nominal Data," Proc. of the 12th International Conference on Information Quality (ICIQ), Cambridge, USA, 2007.
[3] C. Brodley, M. Friedl, "Identifying Mislabeled Training Data," Journal of Artificial Intelligence Research, Vol. 11, pp. 131-167, 1999.
[4] J. Hipp, U. Güntzer, U. Grimmer, "Data Quality Mining: Making a Virtue of Necessity," Proc. of the 6th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery (DMKD), Santa Barbara, California, 2001.
[5] F. Grüning, "Data Quality Mining: Employing Classifiers for Assuring Consistent Datasets," Proc. of Information Technologies in Environmental Engineering (ITEE), Springer-Verlag Berlin Heidelberg, 2007.
[6] J. Maletic, A. Marcus, "Data Cleansing: Beyond Integrity Analysis," Proc. of the Conference on Information Quality, MIT, Boston, pp. 200-209, 2000.
[7] J. Geiger, "Data Quality Management: The Most Critical Initiative You Can Implement," Intelligent Solutions, Inc., Boulder, Paper 098-29, 2004.
[8] J. Maletic, A. Marcus, K. Lin, "Ordinal Association Rules for Error Identification in Data Sets," Proc. of the 10th Intl. Conf. on Information and Knowledge Management (CIKM), Atlanta, GA, 2001.
[9] D. Olson, Y. Li, "Mining Fuzzy Weighted Association Rules," Proc. of the 40th Hawaii International Conference on System Sciences, Hawaii, 2007.
[10]

Fatemeh Ghorbanpour A. received B.Sc. and M.Sc. degrees in Computer Engineering (Software) from universities in Iran. Her current research interests include information retrieval and data mining. She is a member of the IEEE.

Dr. M. Mohsen Pedram received his B.Sc., M.Sc., and Ph.D. in electronic engineering from universities in Iran. His major research interests are machine learning, image processing, artificial intelligence, data mining, and pattern recognition. He has published many papers in various fields. At present, he teaches in a Department of Computer Engineering in Iran.

Dr. Kambiz Badie received his B.Sc., M.Sc., and Ph.D. in electronic engineering from the Tokyo Institute of Technology, Japan, majoring in pattern recognition and artificial intelligence. His major research interests are machine learning, cognitive modeling, and systematic knowledge processing in general, and analogical knowledge processing, experience-based modeling, and interpretative modeling in particular, with emphasis on idea and technique generation. He has published many papers in various fields. At present, he is the Director of the IT Faculty at the Iran Telecom Research Center.

Mohammad Alishahi received a B.Sc. degree in Computer Engineering (Software) and an M.Sc. degree in Industrial Engineering from universities in Iran. His current research interests include data mining and management information systems.


More information

How To Cluster

How To Cluster Data Clustering Dec 2nd, 2013 Kyrylo Bessonov Talk outline Introduction to clustering Types of clustering Supervised Unsupervised Similarity measures Main clustering algorithms k-means Hierarchical Main

More information

Experiments in Web Page Classification for Semantic Web

Experiments in Web Page Classification for Semantic Web Experiments in Web Page Classification for Semantic Web Asad Satti, Nick Cercone, Vlado Kešelj Faculty of Computer Science, Dalhousie University E-mail: {rashid,nick,vlado}@cs.dal.ca Abstract We address

More information

Machine Learning in Hospital Billing Management. 1. George Mason University 2. INOVA Health System

Machine Learning in Hospital Billing Management. 1. George Mason University 2. INOVA Health System Machine Learning in Hospital Billing Management Janusz Wojtusiak 1, Che Ngufor 1, John M. Shiver 1, Ronald Ewald 2 1. George Mason University 2. INOVA Health System Introduction The purpose of the described

More information

72. Ontology Driven Knowledge Discovery Process: a proposal to integrate Ontology Engineering and KDD

72. Ontology Driven Knowledge Discovery Process: a proposal to integrate Ontology Engineering and KDD 72. Ontology Driven Knowledge Discovery Process: a proposal to integrate Ontology Engineering and KDD Paulo Gottgtroy Auckland University of Technology Paulo.gottgtroy@aut.ac.nz Abstract This paper is

More information

Performance Analysis of Naive Bayes and J48 Classification Algorithm for Data Classification

Performance Analysis of Naive Bayes and J48 Classification Algorithm for Data Classification Performance Analysis of Naive Bayes and J48 Classification Algorithm for Data Classification Tina R. Patil, Mrs. S. S. Sherekar Sant Gadgebaba Amravati University, Amravati tnpatil2@gmail.com, ss_sherekar@rediffmail.com

More information

How To Identify A Churner

How To Identify A Churner 2012 45th Hawaii International Conference on System Sciences A New Ensemble Model for Efficient Churn Prediction in Mobile Telecommunication Namhyoung Kim, Jaewook Lee Department of Industrial and Management

More information

A new Approach for Intrusion Detection in Computer Networks Using Data Mining Technique

A new Approach for Intrusion Detection in Computer Networks Using Data Mining Technique A new Approach for Intrusion Detection in Computer Networks Using Data Mining Technique Aida Parbaleh 1, Dr. Heirsh Soltanpanah 2* 1 Department of Computer Engineering, Islamic Azad University, Sanandaj

More information

Enhanced Boosted Trees Technique for Customer Churn Prediction Model

Enhanced Boosted Trees Technique for Customer Churn Prediction Model IOSR Journal of Engineering (IOSRJEN) ISSN (e): 2250-3021, ISSN (p): 2278-8719 Vol. 04, Issue 03 (March. 2014), V5 PP 41-45 www.iosrjen.org Enhanced Boosted Trees Technique for Customer Churn Prediction

More information

Introduction to Data Mining

Introduction to Data Mining Introduction to Data Mining Jay Urbain Credits: Nazli Goharian & David Grossman @ IIT Outline Introduction Data Pre-processing Data Mining Algorithms Naïve Bayes Decision Tree Neural Network Association

More information

Association Technique on Prediction of Chronic Diseases Using Apriori Algorithm

Association Technique on Prediction of Chronic Diseases Using Apriori Algorithm Association Technique on Prediction of Chronic Diseases Using Apriori Algorithm R.Karthiyayini 1, J.Jayaprakash 2 Assistant Professor, Department of Computer Applications, Anna University (BIT Campus),

More information

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014 RESEARCH ARTICLE OPEN ACCESS A Survey of Data Mining: Concepts with Applications and its Future Scope Dr. Zubair Khan 1, Ashish Kumar 2, Sunny Kumar 3 M.Tech Research Scholar 2. Department of Computer

More information

A STUDY OF DATA MINING ACTIVITIES FOR MARKET RESEARCH

A STUDY OF DATA MINING ACTIVITIES FOR MARKET RESEARCH 205 A STUDY OF DATA MINING ACTIVITIES FOR MARKET RESEARCH ABSTRACT MR. HEMANT KUMAR*; DR. SARMISTHA SARMA** *Assistant Professor, Department of Information Technology (IT), Institute of Innovation in Technology

More information

Performance Evaluation of Requirements Engineering Methodology for Automated Detection of Non Functional Requirements

Performance Evaluation of Requirements Engineering Methodology for Automated Detection of Non Functional Requirements Performance Evaluation of Engineering Methodology for Automated Detection of Non Functional J.Selvakumar Assistant Professor in Department of Software Engineering (PG) Sri Ramakrishna Engineering College

More information

Chapter 6. The stacking ensemble approach

Chapter 6. The stacking ensemble approach 82 This chapter proposes the stacking ensemble approach for combining different data mining classifiers to get better performance. Other combination techniques like voting, bagging etc are also described

More information

Knowledge-Driven Decision Support System Based on Knowledge Warehouse and Data Mining for Market Management

Knowledge-Driven Decision Support System Based on Knowledge Warehouse and Data Mining for Market Management Knowledge-Driven Decision Support System Based on Knowledge Warehouse and Data Mining for Market Management Dr. Murtadha M. Hamad 1 and Banaz Anwer Qader 2 1,2 College of Computer - Anbar University Anbar

More information

Detection. Perspective. Network Anomaly. Bhattacharyya. Jugal. A Machine Learning »C) Dhruba Kumar. Kumar KaKta. CRC Press J Taylor & Francis Croup

Detection. Perspective. Network Anomaly. Bhattacharyya. Jugal. A Machine Learning »C) Dhruba Kumar. Kumar KaKta. CRC Press J Taylor & Francis Croup Network Anomaly Detection A Machine Learning Perspective Dhruba Kumar Bhattacharyya Jugal Kumar KaKta»C) CRC Press J Taylor & Francis Croup Boca Raton London New York CRC Press is an imprint of the Taylor

More information

Three types of messages: A, B, C. Assume A is the oldest type, and C is the most recent type.

Three types of messages: A, B, C. Assume A is the oldest type, and C is the most recent type. Chronological Sampling for Email Filtering Ching-Lung Fu 2, Daniel Silver 1, and James Blustein 2 1 Acadia University, Wolfville, Nova Scotia, Canada 2 Dalhousie University, Halifax, Nova Scotia, Canada

More information

Database Marketing, Business Intelligence and Knowledge Discovery

Database Marketing, Business Intelligence and Knowledge Discovery Database Marketing, Business Intelligence and Knowledge Discovery Note: Using material from Tan / Steinbach / Kumar (2005) Introduction to Data Mining,, Addison Wesley; and Cios / Pedrycz / Swiniarski

More information

Artificial Neural Network, Decision Tree and Statistical Techniques Applied for Designing and Developing E-mail Classifier

Artificial Neural Network, Decision Tree and Statistical Techniques Applied for Designing and Developing E-mail Classifier International Journal of Recent Technology and Engineering (IJRTE) ISSN: 2277-3878, Volume-1, Issue-6, January 2013 Artificial Neural Network, Decision Tree and Statistical Techniques Applied for Designing

More information

Miracle Integrating Knowledge Management and Business Intelligence

Miracle Integrating Knowledge Management and Business Intelligence ALLGEMEINE FORST UND JAGDZEITUNG (ISSN: 0002-5852) Available online www.sauerlander-verlag.com/ Miracle Integrating Knowledge Management and Business Intelligence Nursel van der Haas Technical University

More information

Introduction. A. Bellaachia Page: 1

Introduction. A. Bellaachia Page: 1 Introduction 1. Objectives... 3 2. What is Data Mining?... 4 3. Knowledge Discovery Process... 5 4. KD Process Example... 7 5. Typical Data Mining Architecture... 8 6. Database vs. Data Mining... 9 7.

More information

A Two-Pass Statistical Approach for Automatic Personalized Spam Filtering

A Two-Pass Statistical Approach for Automatic Personalized Spam Filtering A Two-Pass Statistical Approach for Automatic Personalized Spam Filtering Khurum Nazir Junejo, Mirza Muhammad Yousaf, and Asim Karim Dept. of Computer Science, Lahore University of Management Sciences

More information

BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES

BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES 123 CHAPTER 7 BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES 7.1 Introduction Even though using SVM presents

More information

FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT MINING SYSTEM

FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT MINING SYSTEM International Journal of Innovative Computing, Information and Control ICIC International c 0 ISSN 34-48 Volume 8, Number 8, August 0 pp. 4 FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT

More information

Constrained Classification of Large Imbalanced Data by Logistic Regression and Genetic Algorithm

Constrained Classification of Large Imbalanced Data by Logistic Regression and Genetic Algorithm Constrained Classification of Large Imbalanced Data by Logistic Regression and Genetic Algorithm Martin Hlosta, Rostislav Stríž, Jan Kupčík, Jaroslav Zendulka, and Tomáš Hruška A. Imbalanced Data Classification

More information

Manjeet Kaur Bhullar, Kiranbir Kaur Department of CSE, GNDU, Amritsar, Punjab, India

Manjeet Kaur Bhullar, Kiranbir Kaur Department of CSE, GNDU, Amritsar, Punjab, India Volume 5, Issue 6, June 2015 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Multiple Pheromone

More information

Mining Association Rules: A Database Perspective

Mining Association Rules: A Database Perspective IJCSNS International Journal of Computer Science and Network Security, VOL.8 No.12, December 2008 69 Mining Association Rules: A Database Perspective Dr. Abdallah Alashqur Faculty of Information Technology

More information

ASSOCIATION RULE MINING ON WEB LOGS FOR EXTRACTING INTERESTING PATTERNS THROUGH WEKA TOOL

ASSOCIATION RULE MINING ON WEB LOGS FOR EXTRACTING INTERESTING PATTERNS THROUGH WEKA TOOL International Journal Of Advanced Technology In Engineering And Science Www.Ijates.Com Volume No 03, Special Issue No. 01, February 2015 ISSN (Online): 2348 7550 ASSOCIATION RULE MINING ON WEB LOGS FOR

More information

DATA PREPARATION FOR DATA MINING

DATA PREPARATION FOR DATA MINING Applied Artificial Intelligence, 17:375 381, 2003 Copyright # 2003 Taylor & Francis 0883-9514/03 $12.00 +.00 DOI: 10.1080/08839510390219264 u DATA PREPARATION FOR DATA MINING SHICHAO ZHANG and CHENGQI

More information

An analysis of suitable parameters for efficiently applying K-means clustering to large TCPdump data set using Hadoop framework

An analysis of suitable parameters for efficiently applying K-means clustering to large TCPdump data set using Hadoop framework An analysis of suitable parameters for efficiently applying K-means clustering to large TCPdump data set using Hadoop framework Jakrarin Therdphapiyanak Dept. of Computer Engineering Chulalongkorn University

More information

EFFICIENCY OF DECISION TREES IN PREDICTING STUDENT S ACADEMIC PERFORMANCE

EFFICIENCY OF DECISION TREES IN PREDICTING STUDENT S ACADEMIC PERFORMANCE EFFICIENCY OF DECISION TREES IN PREDICTING STUDENT S ACADEMIC PERFORMANCE S. Anupama Kumar 1 and Dr. Vijayalakshmi M.N 2 1 Research Scholar, PRIST University, 1 Assistant Professor, Dept of M.C.A. 2 Associate

More information

Keywords Data mining, Classification Algorithm, Decision tree, J48, Random forest, Random tree, LMT, WEKA 3.7. Fig.1. Data mining techniques.

Keywords Data mining, Classification Algorithm, Decision tree, J48, Random forest, Random tree, LMT, WEKA 3.7. Fig.1. Data mining techniques. International Journal of Emerging Research in Management &Technology Research Article October 2015 Comparative Study of Various Decision Tree Classification Algorithm Using WEKA Purva Sewaiwar, Kamal Kant

More information

Knowledge Discovery from patents using KMX Text Analytics

Knowledge Discovery from patents using KMX Text Analytics Knowledge Discovery from patents using KMX Text Analytics Dr. Anton Heijs anton.heijs@treparel.com Treparel Abstract In this white paper we discuss how the KMX technology of Treparel can help searchers

More information

Data Mining Applications in Higher Education

Data Mining Applications in Higher Education Executive report Data Mining Applications in Higher Education Jing Luan, PhD Chief Planning and Research Officer, Cabrillo College Founder, Knowledge Discovery Laboratories Table of contents Introduction..............................................................2

More information

CHURN PREDICTION IN MOBILE TELECOM SYSTEM USING DATA MINING TECHNIQUES

CHURN PREDICTION IN MOBILE TELECOM SYSTEM USING DATA MINING TECHNIQUES International Journal of Scientific and Research Publications, Volume 4, Issue 4, April 2014 1 CHURN PREDICTION IN MOBILE TELECOM SYSTEM USING DATA MINING TECHNIQUES DR. M.BALASUBRAMANIAN *, M.SELVARANI

More information

Modeling and Design of Intelligent Agent System

Modeling and Design of Intelligent Agent System International Journal of Control, Automation, and Systems Vol. 1, No. 2, June 2003 257 Modeling and Design of Intelligent Agent System Dae Su Kim, Chang Suk Kim, and Kee Wook Rim Abstract: In this study,

More information

International Journal of Engineering Research-Online A Peer Reviewed International Journal Articles available online http://www.ijoer.

International Journal of Engineering Research-Online A Peer Reviewed International Journal Articles available online http://www.ijoer. REVIEW ARTICLE ISSN: 2321-7758 UPS EFFICIENT SEARCH ENGINE BASED ON WEB-SNIPPET HIERARCHICAL CLUSTERING MS.MANISHA DESHMUKH, PROF. UMESH KULKARNI Department of Computer Engineering, ARMIET, Department

More information

Applied Mathematical Sciences, Vol. 7, 2013, no. 112, 5591-5597 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/10.12988/ams.2013.

Applied Mathematical Sciences, Vol. 7, 2013, no. 112, 5591-5597 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/10.12988/ams.2013. Applied Mathematical Sciences, Vol. 7, 2013, no. 112, 5591-5597 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/10.12988/ams.2013.38457 Accuracy Rate of Predictive Models in Credit Screening Anirut Suebsing

More information

Evaluation & Validation: Credibility: Evaluating what has been learned

Evaluation & Validation: Credibility: Evaluating what has been learned Evaluation & Validation: Credibility: Evaluating what has been learned How predictive is a learned model? How can we evaluate a model Test the model Statistical tests Considerations in evaluating a Model

More information

CLOUD COMPUTING AN EFFICIENT WAY TO PROVIDE FOR IT SERVICE IN IRAN METEOROLOGICAL ORGANIZATION

CLOUD COMPUTING AN EFFICIENT WAY TO PROVIDE FOR IT SERVICE IN IRAN METEOROLOGICAL ORGANIZATION CLOUD COMPUTING AN EFFICIENT WAY TO PROVIDE FOR IT SERVICE IN IRAN METEOROLOGICAL ORGANIZATION Sedigheh Mohammadesmail and *Roghayyeh Masoumpour Amirabadi Department of Library and Information Science,

More information

Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches

Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches PhD Thesis by Payam Birjandi Director: Prof. Mihai Datcu Problematic

More information

BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL

BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL The Fifth International Conference on e-learning (elearning-2014), 22-23 September 2014, Belgrade, Serbia BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL SNJEŽANA MILINKOVIĆ University

More information

Data Mining Application in Direct Marketing: Identifying Hot Prospects for Banking Product

Data Mining Application in Direct Marketing: Identifying Hot Prospects for Banking Product Data Mining Application in Direct Marketing: Identifying Hot Prospects for Banking Product Sagarika Prusty Web Data Mining (ECT 584),Spring 2013 DePaul University,Chicago sagarikaprusty@gmail.com Keywords:

More information

Comparison of K-means and Backpropagation Data Mining Algorithms

Comparison of K-means and Backpropagation Data Mining Algorithms Comparison of K-means and Backpropagation Data Mining Algorithms Nitu Mathuriya, Dr. Ashish Bansal Abstract Data mining has got more and more mature as a field of basic research in computer science and

More information

DMDSS: Data Mining Based Decision Support System to Integrate Data Mining and Decision Support

DMDSS: Data Mining Based Decision Support System to Integrate Data Mining and Decision Support DMDSS: Data Mining Based Decision Support System to Integrate Data Mining and Decision Support Rok Rupnik, Matjaž Kukar, Marko Bajec, Marjan Krisper University of Ljubljana, Faculty of Computer and Information

More information

DATA MINING TECHNOLOGY. Keywords: data mining, data warehouse, knowledge discovery, OLAP, OLAM.

DATA MINING TECHNOLOGY. Keywords: data mining, data warehouse, knowledge discovery, OLAP, OLAM. DATA MINING TECHNOLOGY Georgiana Marin 1 Abstract In terms of data processing, classical statistical models are restrictive; it requires hypotheses, the knowledge and experience of specialists, equations,

More information

Domain Classification of Technical Terms Using the Web

Domain Classification of Technical Terms Using the Web Systems and Computers in Japan, Vol. 38, No. 14, 2007 Translated from Denshi Joho Tsushin Gakkai Ronbunshi, Vol. J89-D, No. 11, November 2006, pp. 2470 2482 Domain Classification of Technical Terms Using

More information

Diagnosis of Students Online Learning Portfolios

Diagnosis of Students Online Learning Portfolios Diagnosis of Students Online Learning Portfolios Chien-Ming Chen 1, Chao-Yi Li 2, Te-Yi Chan 3, Bin-Shyan Jong 4, and Tsong-Wuu Lin 5 Abstract - Online learning is different from the instruction provided

More information

Management Science Letters

Management Science Letters Management Science Letters 4 (2014) 905 912 Contents lists available at GrowingScience Management Science Letters homepage: www.growingscience.com/msl Measuring customer loyalty using an extended RFM and

More information

Semantic Video Annotation by Mining Association Patterns from Visual and Speech Features

Semantic Video Annotation by Mining Association Patterns from Visual and Speech Features Semantic Video Annotation by Mining Association Patterns from and Speech Features Vincent. S. Tseng, Ja-Hwung Su, Jhih-Hong Huang and Chih-Jen Chen Department of Computer Science and Information Engineering

More information

NEW TECHNIQUE TO DEAL WITH DYNAMIC DATA MINING IN THE DATABASE

NEW TECHNIQUE TO DEAL WITH DYNAMIC DATA MINING IN THE DATABASE www.arpapress.com/volumes/vol13issue3/ijrras_13_3_18.pdf NEW TECHNIQUE TO DEAL WITH DYNAMIC DATA MINING IN THE DATABASE Hebah H. O. Nasereddin Middle East University, P.O. Box: 144378, Code 11814, Amman-Jordan

More information

DECISION TREE INDUCTION FOR FINANCIAL FRAUD DETECTION USING ENSEMBLE LEARNING TECHNIQUES

DECISION TREE INDUCTION FOR FINANCIAL FRAUD DETECTION USING ENSEMBLE LEARNING TECHNIQUES DECISION TREE INDUCTION FOR FINANCIAL FRAUD DETECTION USING ENSEMBLE LEARNING TECHNIQUES Vijayalakshmi Mahanra Rao 1, Yashwant Prasad Singh 2 Multimedia University, Cyberjaya, MALAYSIA 1 lakshmi.mahanra@gmail.com

More information

1. Classification problems

1. Classification problems Neural and Evolutionary Computing. Lab 1: Classification problems Machine Learning test data repository Weka data mining platform Introduction Scilab 1. Classification problems The main aim of a classification

More information

Big Data with Rough Set Using Map- Reduce

Big Data with Rough Set Using Map- Reduce Big Data with Rough Set Using Map- Reduce Mr.G.Lenin 1, Mr. A. Raj Ganesh 2, Mr. S. Vanarasan 3 Assistant Professor, Department of CSE, Podhigai College of Engineering & Technology, Tirupattur, Tamilnadu,

More information

Mining changes in customer behavior in retail marketing

Mining changes in customer behavior in retail marketing Expert Systems with Applications 28 (2005) 773 781 www.elsevier.com/locate/eswa Mining changes in customer behavior in retail marketing Mu-Chen Chen a, *, Ai-Lun Chiu b, Hsu-Hwa Chang c a Department of

More information

Information Management course

Information Management course Università degli Studi di Milano Master Degree in Computer Science Information Management course Teacher: Alberto Ceselli Lecture 01 : 06/10/2015 Practical informations: Teacher: Alberto Ceselli (alberto.ceselli@unimi.it)

More information

College information system research based on data mining

College information system research based on data mining 2009 International Conference on Machine Learning and Computing IPCSIT vol.3 (2011) (2011) IACSIT Press, Singapore College information system research based on data mining An-yi Lan 1, Jie Li 2 1 Hebei

More information

In this presentation, you will be introduced to data mining and the relationship with meaningful use.

In this presentation, you will be introduced to data mining and the relationship with meaningful use. In this presentation, you will be introduced to data mining and the relationship with meaningful use. Data mining refers to the art and science of intelligent data analysis. It is the application of machine

More information

Web Usage Association Rule Mining System

Web Usage Association Rule Mining System Interdisciplinary Journal of Information, Knowledge, and Management Volume 6, 2011 Web Usage Association Rule Mining System Maja Dimitrijević The Advanced School of Technology, Novi Sad, Serbia dimitrijevic@vtsns.edu.rs

More information

EXTENDED ANGEL: KNOWLEDGE-BASED APPROACH FOR LOC AND EFFORT ESTIMATION FOR MULTIMEDIA PROJECTS IN MEDICAL DOMAIN

EXTENDED ANGEL: KNOWLEDGE-BASED APPROACH FOR LOC AND EFFORT ESTIMATION FOR MULTIMEDIA PROJECTS IN MEDICAL DOMAIN EXTENDED ANGEL: KNOWLEDGE-BASED APPROACH FOR LOC AND EFFORT ESTIMATION FOR MULTIMEDIA PROJECTS IN MEDICAL DOMAIN Sridhar S Associate Professor, Department of Information Science and Technology, Anna University,

More information

An Analysis of Missing Data Treatment Methods and Their Application to Health Care Dataset

An Analysis of Missing Data Treatment Methods and Their Application to Health Care Dataset P P P Health An Analysis of Missing Data Treatment Methods and Their Application to Health Care Dataset Peng Liu 1, Elia El-Darzi 2, Lei Lei 1, Christos Vasilakis 2, Panagiotis Chountas 2, and Wei Huang

More information

APPLYING GMDH ALGORITHM TO EXTRACT RULES FROM EXAMPLES

APPLYING GMDH ALGORITHM TO EXTRACT RULES FROM EXAMPLES Systems Analysis Modelling Simulation Vol. 43, No. 10, October 2003, pp. 1311-1319 APPLYING GMDH ALGORITHM TO EXTRACT RULES FROM EXAMPLES KOJI FUJIMOTO* and SAMPEI NAKABAYASHI Financial Engineering Group,

More information

Web Mining as a Tool for Understanding Online Learning

Web Mining as a Tool for Understanding Online Learning Web Mining as a Tool for Understanding Online Learning Jiye Ai University of Missouri Columbia Columbia, MO USA jadb3@mizzou.edu James Laffey University of Missouri Columbia Columbia, MO USA LaffeyJ@missouri.edu

More information

Comparing the Results of Support Vector Machines with Traditional Data Mining Algorithms

Comparing the Results of Support Vector Machines with Traditional Data Mining Algorithms Comparing the Results of Support Vector Machines with Traditional Data Mining Algorithms Scott Pion and Lutz Hamel Abstract This paper presents the results of a series of analyses performed on direct mail

More information

A MACHINE LEARNING APPROACH TO FILTER UNWANTED MESSAGES FROM ONLINE SOCIAL NETWORKS

A MACHINE LEARNING APPROACH TO FILTER UNWANTED MESSAGES FROM ONLINE SOCIAL NETWORKS A MACHINE LEARNING APPROACH TO FILTER UNWANTED MESSAGES FROM ONLINE SOCIAL NETWORKS Charanma.P 1, P. Ganesh Kumar 2, 1 PG Scholar, 2 Assistant Professor,Department of Information Technology, Anna University

More information

Static Data Mining Algorithm with Progressive Approach for Mining Knowledge

Static Data Mining Algorithm with Progressive Approach for Mining Knowledge Global Journal of Business Management and Information Technology. Volume 1, Number 2 (2011), pp. 85-93 Research India Publications http://www.ripublication.com Static Data Mining Algorithm with Progressive

More information

IDENTIFIC ATION OF SOFTWARE EROSION USING LOGISTIC REGRESSION

IDENTIFIC ATION OF SOFTWARE EROSION USING LOGISTIC REGRESSION http:// IDENTIFIC ATION OF SOFTWARE EROSION USING LOGISTIC REGRESSION Harinder Kaur 1, Raveen Bajwa 2 1 PG Student., CSE., Baba Banda Singh Bahadur Engg. College, Fatehgarh Sahib, (India) 2 Asstt. Prof.,

More information