A New Instance Weighting Method for Transfer Learning


BARIŞ KOÇER, AHMET ARSLAN
Department of Computer Engineering
Selçuk University
Alaaddin Keykubat Kampusu, Muhendislik Mimarlık Fakültesi, Selçuklu / Konya
TURKEY
bariskocer@selcuk.edu.tr, ahmetarslan@selcuk.edu.tr

Abstract: - The abstract knowledge transferred from one task to another can take the form of parameters, feature representations or training instances. Instance transfer is an intuitively appealing way to transfer knowledge, but the instances of the source task must be reweighted for the target task. We developed a novel instance transfer and weighting method for inductive transfer learning problems using genetic algorithms. The results are better than those of graph transfer [11], one of the state-of-the-art instance transfer methods.

Key-Words: Transfer learning; genetic algorithms; neural networks; instance transfer.

1 Introduction
Traditional supervised machine learning techniques start from scratch with some training data and try to make predictions on data drawn from the same distribution as the training data. If the distribution changes, for example because the training data has become outdated, a traditional learner must be retrained with new training data. In real life, however, it may be very difficult to obtain new training data every time the distribution changes. When solving a new problem, a human draws on past experience instead of searching for large amounts of new training data. This ability is what lets human intelligence adapt quickly to new situations with sparse training data. In everyday life, transfer learning can be described as using past experience related to the current problem in order to solve that problem more quickly. In machine learning, transfer learning can be described as extracting abstract knowledge from one or more tasks and transferring it to another, related task that needs extra training resources. Transfer learning techniques are thus inspired by the human ability to adapt to new situations.
The need for transfer learning in machine learning was first discussed in a workshop on Learning to Learn [1], and a comprehensive survey of transfer learning was prepared by Pan and Yang [2]. There are two kinds of tasks in transfer learning. The first is the source task, from which knowledge is extracted and which usually has enough training data. The second is the target task, which has insufficient labeled or unlabeled training data. The target task therefore needs auxiliary data or knowledge.

1.1. Main Problems of Transfer Learning: A transfer learning method should solve the problems below for efficient transfer of abstract knowledge:
-Determining task relatedness: For the target task to gain performance, the source and target tasks must be related. Otherwise all attempts at knowledge transfer will fail and may even reduce the performance of the target task. This side effect is called negative transfer.
-Determining the knowledge transfer method: This is the main step of a transfer learning method, because here the strategy that transfers as much useful knowledge as possible should be selected.
-Determining the amount of knowledge to transfer: Once the first and second problems are solved, the next step is deciding how much knowledge will be transferred. For example, in the instance transfer approach, if too many instances are transferred from the source task, the target task will be biased toward the source task and negative transfer will occur.
-Enabling reusability: This problem has lower priority than the first three. If the transferred knowledge is in a reusable format, a similar target task can combine it with new knowledge and may achieve better performance.

1.2. Transfer Learning Settings: Transfer learning problems can be categorized by the absence or presence of labeled data for the target or source task.
Inductive transfer learning: When there is a little labeled data for the target task and plenty of labeled or unlabeled data for the source task, the problem is called inductive transfer learning.
Transductive transfer learning: When there is plenty of labeled data for the source task but no labeled data for the target task, the problem is categorized as transductive transfer learning.
Unsupervised transfer learning: In this setting there is no labeled data for either the source or the target task, but there is much more unlabeled data for the source task than for the target task.

1.3. Transfer Approaches: Each transfer learning setting needs different transfer approaches. The approaches can be grouped under three main headings.
Instance transfer: This approach is intuitively appealing and is based on transferring or weighting a suitable part of the labeled source data for the target task. It can be applied in both the inductive and the transductive transfer settings.
Feature representation transfer: Finding a common feature representation that reduces the differences between the source and target tasks is another transfer learning approach, and it can be applied in all transfer learning settings.
Parameter transfer: Discovering shared priors of the source and target tasks, or transferring parameters shared between them, may improve the performance of the target task. This type of approach is called parameter transfer and can only be applied in the inductive transfer learning setting.

The rest of the paper is organized as follows.
In Section 2 related transfer learning work is summarized, in Section 3 the developed method is introduced, in Section 4 the experimental settings and the performance comparison of the method against graph transfer [11] are presented, in Section 5 the experimental results are discussed, and Section 6 presents concluding remarks.

2. Related work: Transfer learning methods have been applied to different problem domains. In reinforcement learning, for example, skill transfer [3], action schema transfer [4] and control knowledge transfer [5] are good examples. We have focused on the instance transfer approach in the inductive transfer setting.

The inductive transfer setting can be illustrated with the indoor wi-fi localization task, in which a model is trained in an indoor environment that is split into fixed cells. The task is simply to predict the cell coordinates of a receiver from the signal strengths of the access points. Transfer learning becomes necessary because the labeled data for indoor wi-fi localization goes out of date easily, either because new obstacles or reflections appear over time [6] or because the hardware of the receiver or the access points changes [7]. Instead of collecting new training data, transfer learning methods can be applied to adapt the old training data to the new situation. Transfer learning can also be used to obtain good performance with sparse labeled data in the wi-fi localization task. For example, training data may be available only for a small part of a very large indoor environment, and a full map of the environment is very difficult to obtain [9]. When transfer learning methods are applied, the amount of labeled data needed to build the localization model is significantly reduced.

We tested our method on a text categorization task. One of the earlier works on text categorization with transfer learning is by Dai et al. [10], which uses an expectation-maximization-based Naïve Bayes classifier. Another method, proposed by Eaton et al. [11], uses a graph-based transferability measure and extracts transfer parameters from the source tasks. Although these text categorization algorithms are relatively old, they are the most recent examples of the instance transfer approach applied to text categorization. Another example for text categorization is proposed by Dai et al. [12]: a modified version of the AdaBoost algorithm that leverages old labeled data with the help of a small amount of newly labeled data to build an accurate classification model.

In this work we have developed a genetic-algorithm-based transfer learning method. The only other example of genetic algorithms used in transfer learning is [13]. In that earlier work, the knowledge to be transferred is not evaluated for the target task and is selected randomly from the solution pool. We adapted that algorithm to the classification task and modified it for better performance.

3. Instance weighting via genetic algorithms: We focus on the instance transfer approach for inductive transfer learning in this work because there is no instance selection or weighting algorithm that uses the power of genetic algorithms. We used a hybrid of genetic algorithms and an artificial neural network (ANN). The hybrid itself is a conventional one: this work does not propose a new GA/ANN hybrid algorithm but an instance weighting approach that uses such a hybrid.

As usual in inductive transfer learning, two different sets of data are used. One is the labeled data of the source tasks, DS, and the other is the labeled data of the target task, DT. The aim of the genetic algorithm is to determine the best ANN weights together with a weight for each DS instance, which the ANN uses as the learning rate for that instance. The fitness of each individual is the predictive accuracy of the ANN, measured on DT, after it has been trained for one epoch on DS (with the weights stored in the individual) plus DT (with weight 1). The weight 1 is used for the DT instances in order to bias the network toward the target task. We chose 1 because it is the largest weight allowed when the random initial weights are drawn. After this one-epoch fitness evaluation we write the updated network weights back into the individual, so we use both the global exploration ability of genetic algorithms and the local exploitation ability of neural network training.

In summary, the DS instance weights and the ANN weights are coded into each individual. When a fitness evaluation is needed, the ANN weights are placed into the corresponding positions of the ANN, and the ANN is trained with the labeled source data DS, using the weight of each instance extracted from the individual as its learning rate. Instances with larger weights therefore affect the fitness evaluation more than instances with smaller weights. The training data of the target task, DT, is also used in the fitness evaluation, with the maximum weight value, in order to bias the network toward the target task.
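The fitness evaluation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a single hidden layer of sigmoid units without bias terms, online backpropagation on squared error, and a chromosome laid out as hidden-layer weights, then output weights, then one weight per DS instance. The function names (`forward`, `train_one_epoch`, `fitness`) and the layout are hypothetical.

```python
import math

def sigmoid(z):
    z = max(-60.0, min(60.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def forward(w_hidden, w_out, x):
    """Return the hidden activations and the single network output."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    out = sigmoid(sum(w * h for w, h in zip(w_out, hidden)))
    return hidden, out

def train_one_epoch(w_hidden, w_out, data, rates):
    """One pass of online backpropagation; rates[i] is the step size of data[i]."""
    for (x, y), lr in zip(data, rates):
        hidden, out = forward(w_hidden, w_out, x)
        delta_out = (y - out) * out * (1.0 - out)
        # Hidden deltas are computed before the output weights are changed.
        delta_hid = [delta_out * w_out[j] * h * (1.0 - h) for j, h in enumerate(hidden)]
        for j, h in enumerate(hidden):
            w_out[j] += lr * delta_out * h
        for j, row in enumerate(w_hidden):
            for i, xi in enumerate(x):
                row[i] += lr * delta_hid[j] * xi

def accuracy(w_hidden, w_out, data):
    """Fraction of instances whose thresholded output matches the 0/1 label."""
    hits = sum((forward(w_hidden, w_out, x)[1] >= 0.5) == (y == 1) for x, y in data)
    return hits / len(data)

def fitness(individual, n_in, n_hid, source, target):
    """Decode an individual, train one epoch on DS + DT, score on DT.

    Assumed layout: n_in*n_hid hidden weights, n_hid output weights,
    then one learning rate per source instance.
    """
    n_net = n_in * n_hid + n_hid
    w_hidden = [list(individual[j * n_in:(j + 1) * n_in]) for j in range(n_hid)]
    w_out = list(individual[n_in * n_hid:n_net])
    source_rates = list(individual[n_net:n_net + len(source)])
    rates = source_rates + [1.0] * len(target)  # DT instances always get weight 1
    train_one_epoch(w_hidden, w_out, source + target, rates)
    # Trained network weights are written back; instance weights are unchanged.
    updated = [w for row in w_hidden for w in row] + w_out + source_rates
    return accuracy(w_hidden, w_out, target), updated
```

Returning the updated chromosome alongside the score mirrors the statement that the network weights of an individual are updated after the one-epoch evaluation. A source instance whose weight is 0 leaves the network untouched, which is how the genetic algorithm can effectively switch off unhelpful source instances.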
So in this hybrid model the neural network is trained with DS + DT, where the maximum available weight of 1 is used for the DT training instances in order to bias the network toward the target task, and fitness values are calculated as the predictive accuracy of the trained network on the DT data set. After a pre-determined number of generations, the individual with the best fitness is trained in an artificial neural network for a pre-determined number of epochs using the DS + DT data set.

Figure 1. Flowchart of proposed instance weighting method.

4. Experimental Setting: The proposed method is tested on a text categorization task. Text categorization is the assignment of a text document to one of a set of pre-determined main categories. We used the hybrid of genetic algorithm and artificial neural network described above. The neural network has 50 input nodes, 100 hidden nodes and 1 output node, so 5100 real-valued weights (50 x 100 + 100 x 1) are coded in each individual for the neural network. The input nodes are binary valued, and each one corresponds to a word selected from DS by WEKA's [14] string-to-word-vector component. We used the 50 most discriminative words, so there are 50 input features. Before the words are determined, all header information is deleted from all documents. Each document in DS is converted to a training instance: if the document contains a given word, the corresponding input feature is set to 1, otherwise it is set to 0. There is only one output feature, which decides whether or not the instance belongs to the desired subcategory.

We compared our results with the graph-based method [11], using almost the same settings. We chose this work for comparison because it is one of the most recent works on the instance transfer approach in the inductive transfer setting. Seven per cent of the twenty newsgroups data set [8] is used, i.e. 70 unique positive and 70 unique negative documents for each task. To prevent interference, negative examples are selected from the first subcategory of each main category, and the positive examples of each task are selected from the remaining subcategories. This gives 13 tasks. When one task is selected as the target task, the remaining tasks are treated as source tasks, and the aim is to transfer and weight instances from these 12 source tasks in order to improve the performance of the target task, which has limited labeled data. All available source task data is coded in the individuals of the genetic algorithm to determine the instance weights.

The population size is set to 50 and the maximum generation count to 20. The crossover rate is 75% and the mutation rate is 1%. After 20 generations, a feed-forward multilayer ANN is trained for 100 epochs by backpropagation on the DS + DT data set, using the neural network weights and the instance weights of the individual with the best fitness. Every trial for a given percentage of DT is repeated 10 times independently, each time selecting the desired percentage of training data randomly from the available training data, and the results in the graphs are the mean values of these independent runs. Test results for the tasks that are also included in [11] are shown in Figures 2 to 5.

Figure 2. Experimental results for comp.windows.x
Figure 3. Experimental results for rec.sport.baseball
Figure 4. Experimental results for sci.space
Figure 5. Experimental results for talk.politics.mideast
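The experimental configuration can be made concrete with a short sketch of the genetic algorithm loop. Only the network size (50-100-1, giving 5100 weights), the population size, the generation count and the crossover and mutation rates come from the text. Everything else is an assumption made for illustration: tournament selection, one-point crossover, mutation by redrawing a gene, the initial range of the network weights, and a fitness function that returns only a score instead of also writing trained weights back into the individual. The function names are hypothetical.

```python
import random

N_IN, N_HID, N_OUT = 50, 100, 1            # network size stated in the text
N_NET = N_IN * N_HID + N_HID * N_OUT       # 5100 connection weights, no bias terms
POP_SIZE, GENERATIONS = 50, 20             # stated in the text
CROSSOVER_RATE, MUTATION_RATE = 0.75, 0.01 # stated in the text

def random_individual(n_source, rng):
    net = [rng.uniform(-1.0, 1.0) for _ in range(N_NET)]     # assumed init range
    inst = [rng.uniform(0.0, 1.0) for _ in range(n_source)]  # instance weights, max 1
    return net + inst

def tournament(pop, scores, rng, k=2):
    """Assumed selection operator: best of k randomly drawn individuals."""
    best = max(rng.sample(range(len(pop)), k), key=lambda i: scores[i])
    return pop[best]

def crossover(a, b, rng):
    """Assumed one-point crossover, applied with probability CROSSOVER_RATE."""
    if rng.random() >= CROSSOVER_RATE:
        return a[:], b[:]
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

def mutate(ind, rng):
    """Assumed mutation: redraw a gene from its initial range."""
    for i in range(len(ind)):
        if rng.random() < MUTATION_RATE:
            ind[i] = rng.uniform(-1.0, 1.0) if i < N_NET else rng.uniform(0.0, 1.0)
    return ind

def evolve(fitness_fn, n_source, seed=0, pop_size=POP_SIZE, generations=GENERATIONS):
    """Return the best individual seen and its fitness."""
    rng = random.Random(seed)
    pop = [random_individual(n_source, rng) for _ in range(pop_size)]
    best, best_score = None, float("-inf")
    for gen in range(generations):
        scores = [fitness_fn(ind) for ind in pop]
        for ind, s in zip(pop, scores):
            if s > best_score:
                best, best_score = ind[:], s
        if gen == generations - 1:
            break  # the last population is evaluated but not bred further
        children = []
        while len(children) < pop_size:
            c1, c2 = crossover(tournament(pop, scores, rng),
                               tournament(pop, scores, rng), rng)
            children += [mutate(c1, rng), mutate(c2, rng)]
        pop = children[:pop_size]
    return best, best_score
```

In the setting described above, `fitness_fn` would be the one-epoch training accuracy on DT. If every document of the 12 source tasks is encoded, `n_source` would be 12 x 140 = 1680, so each individual would hold 5100 + 1680 = 6780 genes; that count is derived from the stated figures rather than reported in the text.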

5. Experimental Results: Although we used 50 features instead of the 100 used by the graph transfer method in [11], the proposed method gives better results on the sci.space and talk.politics.mideast data sets, and better results for high percentages of training data on the comp.windows.x and rec.sport.baseball data sets. As seen in Figure 2 and Figure 3, our method performs worse than graph transfer at low percentages of training data. Graph transfer selects only some of the source tasks for instance transfer, whereas our method uses all available knowledge from the source tasks, so with little training data it may settle on wrong weights and suffer a larger negative effect than graph transfer. This is not always the case, however: in Figures 4 and 5 the proposed method performs better even at low percentages of training data. For high percentages of training data the positive effect is larger, and our method performs better than graph transfer, as seen in Figures 2 to 5.

6. Conclusion: In this work we have demonstrated a novel instance weighting algorithm using genetic algorithms. The proposed instance weighting method also offers solutions to the transfer learning problems described in Section 1.1. Task relatedness is determined by natural selection, and only feasible information is transferred to the target task through weighted instance transfer. The last advantage of the proposed method is reusability: when a similar classification task is encountered, the same solution pool can be evaluated and used.

References
[1] IPS95_LTL/transfer.workshop.1995.html
[2] Pan, S.J., & Yang Q. (2008). A Survey on Transfer Learning. Department of Computer Science and Engineering, Hong Kong University of Science and Technology.
[3] Konidaris, G. and Barto, A. Building portable options: skill transfer in reinforcement learning. In Proceedings of the 20th International Joint Conference on Artificial Intelligence (Hyderabad, India, January 06-12, 2007). R. Sangal, H. Mehta, and R. K. Bagga, Eds. Morgan Kaufmann Publishers, San Francisco, CA.
[4] Cohen, P. R., Chang, Y., and Morrison, C. T. Learning and transferring action schemas. In Proceedings of the 20th International Joint Conference on Artificial Intelligence (Hyderabad, India, January 06-12, 2007). R. Sangal, H. Mehta, and R. K. Bagga, Eds. Morgan Kaufmann Publishers, San Francisco, CA.
[5] Fernández, S., Aler, R., and Borrajo, D. Transferring learned control-knowledge between planners. In Proceedings of the 20th International Joint Conference on Artificial Intelligence (Hyderabad, India, January 06-12, 2007). R. Sangal, H. Mehta, and R. K. Bagga, Eds. Morgan Kaufmann Publishers, San Francisco, CA.
[6] V.W. Zheng, Q. Yang, W. Xiang, and D. Shen, "Transferring Localization Models over Time," Proc. 23rd Assoc. for the Advancement of Artificial Intelligence (AAAI) Conf. Artificial Intelligence, pp , July
[7] V.W. Zheng, S.J. Pan, Q. Yang, and J.J. Pan, "Transferring Multi-Device Localization Models Using Latent Multi-Task Learning," Proc. 23rd Assoc. for the Advancement of Artificial Intelligence (AAAI) Conf. Artificial Intelligence, pp , July
[8] Rennie, J.: 20 Newsgroups data set, sorted by date. Available online at (September 2003)
[9] S.J. Pan, D. Shen, Q. Yang, and J.T. Kwok, "Transferring Localization Models across Space," Proc. 23rd Assoc. for the Advancement of Artificial Intelligence (AAAI) Conf. Artificial Intelligence, pp , July
[10] W. Dai, G. Xue, Q. Yang, and Y. Yu, "Transferring Naive Bayes Classifiers for Text Classification," Proc. 22nd Assoc. for the Advancement of Artificial Intelligence (AAAI) Conf. Artificial Intelligence, pp , July
[11] E. Eaton, M. desJardins, and T. Lane, "Modeling Transfer Relationships between Learning Tasks for Improved Inductive Transfer," Proc. European Conf. Machine Learning and Knowledge Discovery in Databases (ECML/PKDD 08), pp , Sept
[12] W. Dai, Q. Yang, G. Xue, and Y. Yu, "Boosting for Transfer Learning," Proc. 24th Int'l Conf. Machine Learning, pp , June
[13] Koçer, B., Arslan, A. Genetic Transfer Learning. Expert Systems with Applications, vol
[14] Witten, I.H., Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann, San Francisco, CA (2000)

Web Document Clustering

Web Document Clustering Web Document Clustering Lab Project based on the MDL clustering suite http://www.cs.ccsu.edu/~markov/mdlclustering/ Zdravko Markov Computer Science Department Central Connecticut State University New Britain,

More information

DECISION TREE INDUCTION FOR FINANCIAL FRAUD DETECTION USING ENSEMBLE LEARNING TECHNIQUES

DECISION TREE INDUCTION FOR FINANCIAL FRAUD DETECTION USING ENSEMBLE LEARNING TECHNIQUES DECISION TREE INDUCTION FOR FINANCIAL FRAUD DETECTION USING ENSEMBLE LEARNING TECHNIQUES Vijayalakshmi Mahanra Rao 1, Yashwant Prasad Singh 2 Multimedia University, Cyberjaya, MALAYSIA 1 lakshmi.mahanra@gmail.com

More information

Comparison of K-means and Backpropagation Data Mining Algorithms

Comparison of K-means and Backpropagation Data Mining Algorithms Comparison of K-means and Backpropagation Data Mining Algorithms Nitu Mathuriya, Dr. Ashish Bansal Abstract Data mining has got more and more mature as a field of basic research in computer science and

More information

HYBRID PROBABILITY BASED ENSEMBLES FOR BANKRUPTCY PREDICTION

HYBRID PROBABILITY BASED ENSEMBLES FOR BANKRUPTCY PREDICTION HYBRID PROBABILITY BASED ENSEMBLES FOR BANKRUPTCY PREDICTION Chihli Hung 1, Jing Hong Chen 2, Stefan Wermter 3, 1,2 Department of Management Information Systems, Chung Yuan Christian University, Taiwan

More information

Extend Table Lens for High-Dimensional Data Visualization and Classification Mining

Extend Table Lens for High-Dimensional Data Visualization and Classification Mining Extend Table Lens for High-Dimensional Data Visualization and Classification Mining CPSC 533c, Information Visualization Course Project, Term 2 2003 Fengdong Du fdu@cs.ubc.ca University of British Columbia

More information

Chapter 6. The stacking ensemble approach

Chapter 6. The stacking ensemble approach 82 This chapter proposes the stacking ensemble approach for combining different data mining classifiers to get better performance. Other combination techniques like voting, bagging etc are also described

More information

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014 RESEARCH ARTICLE OPEN ACCESS A Survey of Data Mining: Concepts with Applications and its Future Scope Dr. Zubair Khan 1, Ashish Kumar 2, Sunny Kumar 3 M.Tech Research Scholar 2. Department of Computer

More information

Data quality in Accounting Information Systems

Data quality in Accounting Information Systems Data quality in Accounting Information Systems Comparing Several Data Mining Techniques Erjon Zoto Department of Statistics and Applied Informatics Faculty of Economy, University of Tirana Tirana, Albania

More information

A Perspective Analysis of Traffic Accident using Data Mining Techniques

A Perspective Analysis of Traffic Accident using Data Mining Techniques A Perspective Analysis of Traffic Accident using Data Mining Techniques S.Krishnaveni Ph.D (CS) Research Scholar, Karpagam University, Coimbatore, India 641 021 Dr.M.Hemalatha Asst. Professor & Head, Dept

More information

A NEW DECISION TREE METHOD FOR DATA MINING IN MEDICINE

A NEW DECISION TREE METHOD FOR DATA MINING IN MEDICINE A NEW DECISION TREE METHOD FOR DATA MINING IN MEDICINE Kasra Madadipouya 1 1 Department of Computing and Science, Asia Pacific University of Technology & Innovation ABSTRACT Today, enormous amount of data

More information

An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015

An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015 An Introduction to Data Mining for Wind Power Management Spring 2015 Big Data World Every minute: Google receives over 4 million search queries Facebook users share almost 2.5 million pieces of content

More information

Evaluating an Integrated Time-Series Data Mining Environment - A Case Study on a Chronic Hepatitis Data Mining -

Evaluating an Integrated Time-Series Data Mining Environment - A Case Study on a Chronic Hepatitis Data Mining - Evaluating an Integrated Time-Series Data Mining Environment - A Case Study on a Chronic Hepatitis Data Mining - Hidenao Abe, Miho Ohsaki, Hideto Yokoi, and Takahira Yamaguchi Department of Medical Informatics,

More information

CS 2750 Machine Learning. Lecture 1. Machine Learning. http://www.cs.pitt.edu/~milos/courses/cs2750/ CS 2750 Machine Learning.

CS 2750 Machine Learning. Lecture 1. Machine Learning. http://www.cs.pitt.edu/~milos/courses/cs2750/ CS 2750 Machine Learning. Lecture Machine Learning Milos Hauskrecht milos@cs.pitt.edu 539 Sennott Square, x5 http://www.cs.pitt.edu/~milos/courses/cs75/ Administration Instructor: Milos Hauskrecht milos@cs.pitt.edu 539 Sennott

More information

Constrained Classification of Large Imbalanced Data by Logistic Regression and Genetic Algorithm

Constrained Classification of Large Imbalanced Data by Logistic Regression and Genetic Algorithm Constrained Classification of Large Imbalanced Data by Logistic Regression and Genetic Algorithm Martin Hlosta, Rostislav Stríž, Jan Kupčík, Jaroslav Zendulka, and Tomáš Hruška A. Imbalanced Data Classification

More information

PREDICTING STOCK PRICES USING DATA MINING TECHNIQUES

PREDICTING STOCK PRICES USING DATA MINING TECHNIQUES The International Arab Conference on Information Technology (ACIT 2013) PREDICTING STOCK PRICES USING DATA MINING TECHNIQUES 1 QASEM A. AL-RADAIDEH, 2 ADEL ABU ASSAF 3 EMAN ALNAGI 1 Department of Computer

More information

Impelling Heart Attack Prediction System using Data Mining and Artificial Neural Network

Impelling Heart Attack Prediction System using Data Mining and Artificial Neural Network General Article International Journal of Current Engineering and Technology E-ISSN 2277 4106, P-ISSN 2347-5161 2014 INPRESSCO, All Rights Reserved Available at http://inpressco.com/category/ijcet Impelling

More information

An Introduction to Data Mining

An Introduction to Data Mining An Introduction to Intel Beijing wei.heng@intel.com January 17, 2014 Outline 1 DW Overview What is Notable Application of Conference, Software and Applications Major Process in 2 Major Tasks in Detail

More information

Comparative Analysis of EM Clustering Algorithm and Density Based Clustering Algorithm Using WEKA tool.

Comparative Analysis of EM Clustering Algorithm and Density Based Clustering Algorithm Using WEKA tool. International Journal of Engineering Research and Development e-issn: 2278-067X, p-issn: 2278-800X, www.ijerd.com Volume 9, Issue 8 (January 2014), PP. 19-24 Comparative Analysis of EM Clustering Algorithm

More information

Artificial Neural Network, Decision Tree and Statistical Techniques Applied for Designing and Developing E-mail Classifier

Artificial Neural Network, Decision Tree and Statistical Techniques Applied for Designing and Developing E-mail Classifier International Journal of Recent Technology and Engineering (IJRTE) ISSN: 2277-3878, Volume-1, Issue-6, January 2013 Artificial Neural Network, Decision Tree and Statistical Techniques Applied for Designing

More information

Application of Data Mining in Medical Decision Support System

Application of Data Mining in Medical Decision Support System Application of Data Mining in Medical Decision Support System Habib Shariff Mahmud School of Engineering & Computing Sciences University of East London - FTMS College Technology Park Malaysia Bukit Jalil,

More information

Predicting the Risk of Heart Attacks using Neural Network and Decision Tree

Predicting the Risk of Heart Attacks using Neural Network and Decision Tree Predicting the Risk of Heart Attacks using Neural Network and Decision Tree S.Florence 1, N.G.Bhuvaneswari Amma 2, G.Annapoorani 3, K.Malathi 4 PG Scholar, Indian Institute of Information Technology, Srirangam,

More information

The Enron Corpus: A New Dataset for Email Classification Research

The Enron Corpus: A New Dataset for Email Classification Research The Enron Corpus: A New Dataset for Email Classification Research Bryan Klimt and Yiming Yang Language Technologies Institute Carnegie Mellon University Pittsburgh, PA 15213-8213, USA {bklimt,yiming}@cs.cmu.edu

More information

DMDSS: Data Mining Based Decision Support System to Integrate Data Mining and Decision Support

DMDSS: Data Mining Based Decision Support System to Integrate Data Mining and Decision Support DMDSS: Data Mining Based Decision Support System to Integrate Data Mining and Decision Support Rok Rupnik, Matjaž Kukar, Marko Bajec, Marjan Krisper University of Ljubljana, Faculty of Computer and Information

More information

FRAUD DETECTION IN ELECTRIC POWER DISTRIBUTION NETWORKS USING AN ANN-BASED KNOWLEDGE-DISCOVERY PROCESS

FRAUD DETECTION IN ELECTRIC POWER DISTRIBUTION NETWORKS USING AN ANN-BASED KNOWLEDGE-DISCOVERY PROCESS FRAUD DETECTION IN ELECTRIC POWER DISTRIBUTION NETWORKS USING AN ANN-BASED KNOWLEDGE-DISCOVERY PROCESS Breno C. Costa, Bruno. L. A. Alberto, André M. Portela, W. Maduro, Esdras O. Eler PDITec, Belo Horizonte,

More information

Introducing diversity among the models of multi-label classification ensemble

Introducing diversity among the models of multi-label classification ensemble Introducing diversity among the models of multi-label classification ensemble Lena Chekina, Lior Rokach and Bracha Shapira Ben-Gurion University of the Negev Dept. of Information Systems Engineering and

More information

A Serial Partitioning Approach to Scaling Graph-Based Knowledge Discovery

A Serial Partitioning Approach to Scaling Graph-Based Knowledge Discovery A Serial Partitioning Approach to Scaling Graph-Based Knowledge Discovery Runu Rathi, Diane J. Cook, Lawrence B. Holder Department of Computer Science and Engineering The University of Texas at Arlington

More information

Analecta Vol. 8, No. 2 ISSN 2064-7964

Analecta Vol. 8, No. 2 ISSN 2064-7964 EXPERIMENTAL APPLICATIONS OF ARTIFICIAL NEURAL NETWORKS IN ENGINEERING PROCESSING SYSTEM S. Dadvandipour Institute of Information Engineering, University of Miskolc, Egyetemváros, 3515, Miskolc, Hungary,

More information

Facilitating Business Process Discovery using Email Analysis

Facilitating Business Process Discovery using Email Analysis Facilitating Business Process Discovery using Email Analysis Matin Mavaddat Matin.Mavaddat@live.uwe.ac.uk Stewart Green Stewart.Green Ian Beeson Ian.Beeson Jin Sa Jin.Sa Abstract Extracting business process

More information

Keywords Data mining, Classification Algorithm, Decision tree, J48, Random forest, Random tree, LMT, WEKA 3.7. Fig.1. Data mining techniques.

Keywords Data mining, Classification Algorithm, Decision tree, J48, Random forest, Random tree, LMT, WEKA 3.7. Fig.1. Data mining techniques. International Journal of Emerging Research in Management &Technology Research Article October 2015 Comparative Study of Various Decision Tree Classification Algorithm Using WEKA Purva Sewaiwar, Kamal Kant

More information

Random forest algorithm in big data environment

Random forest algorithm in big data environment Random forest algorithm in big data environment Yingchun Liu * School of Economics and Management, Beihang University, Beijing 100191, China Received 1 September 2014, www.cmnt.lv Abstract Random forest

More information

Prediction of Heart Disease Using Naïve Bayes Algorithm

Prediction of Heart Disease Using Naïve Bayes Algorithm Prediction of Heart Disease Using Naïve Bayes Algorithm R.Karthiyayini 1, S.Chithaara 2 Assistant Professor, Department of computer Applications, Anna University, BIT campus, Tiruchirapalli, Tamilnadu,

More information

Dr. U. Devi Prasad Associate Professor Hyderabad Business School GITAM University, Hyderabad Email: Prasad_vungarala@yahoo.co.in

Dr. U. Devi Prasad Associate Professor Hyderabad Business School GITAM University, Hyderabad Email: Prasad_vungarala@yahoo.co.in 96 Business Intelligence Journal January PREDICTION OF CHURN BEHAVIOR OF BANK CUSTOMERS USING DATA MINING TOOLS Dr. U. Devi Prasad Associate Professor Hyderabad Business School GITAM University, Hyderabad

More information
