Machine Learning with WEKA

1 Machine Learning with WEKA. Eibe Frank, Department of Computer Science, University of Waikato, New Zealand. Outline: WEKA: A Machine Learning Toolkit; The Explorer (Classification and Regression, Clustering, Association Rules, Attribute Selection, Data Visualization); The Experimenter; The Knowledge Flow GUI; Conclusions.

2 WEKA: the bird Copyright: Martin Kramer 2/22/2011 University of Waikato 2

3 WEKA: the software. Machine learning/data mining software written in Java (distributed under the GNU General Public License). Used for research, education, and applications. Complements the book Data Mining by Witten & Frank. Main features: a comprehensive set of data pre-processing tools, learning algorithms, and evaluation methods; graphical user interfaces (incl. data visualization); an environment for comparing learning algorithms.

4 WEKA: versions. There are several versions of WEKA. WEKA 3.0: the "book version", compatible with the description in the data mining book. WEKA 3.2: the "GUI version", which adds graphical user interfaces (the book version is command-line only). WEKA 3.3: the "development version", with lots of improvements. This talk is based on the latest snapshot of WEKA 3.3 (soon to be WEKA 3.4).

5 WEKA only deals with flat files. Example:
age
sex { female, ...
chest_pain_type { typ_angina, asympt, non_anginal, ...
cholesterol
exercise_induced_angina { no, ...
class { present, ...
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present
...
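To make the flat-file layout concrete, here is a minimal Python sketch of a reader for a small subset of the ARFF format (header declarations followed by comma-separated rows, with `?` marking a missing value). It is not WEKA's own loader. The relation name `heart_demo` and the completed nominal value lists are assumptions, chosen only to cover the rows visible on the slide.

```python
# Hypothetical ARFF text: the relation name and the closed value lists are
# assumptions; only the four data rows come from the slide.
DEMO_ARFF = """\
@relation heart_demo
@attribute age numeric
@attribute sex {female, male}
@attribute chest_pain_type {typ_angina, asympt, non_anginal}
@attribute cholesterol numeric
@attribute exercise_induced_angina {no, yes}
@attribute class {present, not_present}
@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present
"""


def parse_arff(text):
    """Parse a small ARFF subset: @relation, @attribute, @data, '?' = missing."""
    relation, attributes, rows = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and comments
            continue
        lower = line.lower()
        if in_data:
            values = [v.strip() for v in line.split(",")]
            row = []
            for (_, kind), value in zip(attributes, values):
                if value == "?":
                    row.append(None)          # missing value
                elif kind == "numeric":
                    row.append(float(value))  # numeric attribute
                else:
                    row.append(value)         # nominal attribute, kept as text
            rows.append(row)
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, spec = line.split(None, 2)
            if spec.startswith("{"):
                # nominal attribute: store the list of allowed values
                kind = [v.strip() for v in spec.strip("{}").split(",")]
            else:
                kind = spec.lower()
            attributes.append((name, kind))
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows


relation, attributes, rows = parse_arff(DEMO_ARFF)
```

Each row becomes a plain list with one entry per declared attribute, which is the "flat" structure the slide refers to: one table, no nesting, no links between tables.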

10 Explorer: pre-processing the data. Data can be imported from a file in various formats: ARFF, CSV, C4.5, binary. Data can also be read from a URL or from an SQL database (using JDBC). Pre-processing tools in WEKA are called "filters". WEKA contains filters for discretization, normalization, resampling, attribute selection, transforming and combining attributes, among others.
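As a rough illustration of what two of these filters do to a numeric attribute, the following Python sketch rescales values to [0, 1] and bins them into equal-width intervals. The function names are hypothetical and the code is not WEKA's filter implementation; the cholesterol values are the three non-missing ones from the example data.

```python
def normalize(values):
    """Rescale numeric values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant attribute: nothing to rescale
    return [(v - lo) / (hi - lo) for v in values]


def discretize_equal_width(values, bins):
    """Replace each numeric value by the index of its equal-width bin (0 .. bins-1)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    if width == 0:
        return [0 for _ in values]
    # the maximum value would land in bin `bins`, so clamp it into the last bin
    return [min(int((v - lo) / width), bins - 1) for v in values]


cholesterol = [233, 286, 229]
normalized = normalize(cholesterol)
binned = discretize_equal_width(cholesterol, bins=3)
```

Equal-width binning is only one way to discretize; it is used here because it needs no class labels and fits in a few lines.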

32 Explorer: building classifiers. Classifiers in WEKA are models for predicting nominal or numeric quantities. Implemented learning schemes include decision trees and lists, instance-based classifiers, support vector machines, multi-layer perceptrons, logistic regression, and Bayes nets, among others. Meta-classifiers include bagging, boosting, stacking, error-correcting output codes, and locally weighted learning, among others.
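For readers who have not met an instance-based classifier, here is a minimal Python sketch of k-nearest-neighbour prediction of a nominal class, with a leave-one-out accuracy estimate. It illustrates the idea only and is not WEKA's implementation; the toy data and the function names are assumptions.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Predict the majority class among the k training instances nearest to query.

    train is a list of (feature_tuple, label) pairs.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(data, k=3):
    """Hold out each instance in turn, train on the rest, and count correct predictions."""
    correct = 0
    for i, (features, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        if knn_predict(rest, features, k) == label:
            correct += 1
    return correct / len(data)


# Hypothetical two-class toy data with two numeric attributes.
toy = [
    ((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
    ((5.0, 5.0), "b"), ((5.2, 4.9), "b"), ((4.8, 5.1), "b"),
]
prediction_a = knn_predict(toy, (1.1, 1.0))
prediction_b = knn_predict(toy, (5.1, 5.0))
accuracy = leave_one_out_accuracy(toy)
```

Evaluating on held-out instances rather than on the training data is what makes the accuracy figure meaningful.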

92 Explorer: clustering data. WEKA contains "clusterers" for finding groups of similar instances in a dataset. Implemented schemes are k-means, EM, Cobweb, X-means, and FarthestFirst. Clusters can be visualized and compared to "true" clusters (if given). Evaluation is based on log-likelihood if the clustering scheme produces a probability distribution.
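The first scheme in that list, k-means, can be sketched in a few lines of Python. This is a bare-bones version that takes its starting centroids as an argument so the run is deterministic; it is not WEKA's clusterer, and the toy points are assumptions.

```python
import math


def kmeans(points, centroids, max_iter=100):
    """Alternate assignment and update steps until the centroids stop moving."""
    centroids = [tuple(c) for c in centroids]
    clusters = []
    for _ in range(max_iter):
        # assignment step: each point joins the cluster of its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            idx = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[idx].append(p)
        # update step: each centroid moves to the mean of its members
        new_centroids = []
        for old, members in zip(centroids, clusters):
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical 2-d points forming two obvious groups.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, clusters = kmeans(points, centroids=[points[0], points[3]])
```

The result depends on the starting centroids, which is why practical implementations usually pick them at random and may restart several times.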

108 Explorer: finding associations. WEKA contains an implementation of the Apriori algorithm for learning association rules. It works only with discrete data. It can identify statistical dependencies between groups of attributes, e.g. milk, butter ⇒ bread, eggs (with confidence 0.9 and support 2000). Apriori can compute all rules that have a given minimum support and exceed a given confidence.
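Support and confidence are easy to compute by hand on a small example. The Python sketch below counts them for the milk/butter rule and finds frequent itemsets level by level. It is a simplified stand-in for Apriori (it skips the subset-pruning step of the full algorithm) and is not WEKA's implementation; the five transactions are made up for illustration.

```python
def support(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)


def confidence(lhs, rhs, transactions):
    """Fraction of the transactions containing lhs that also contain rhs."""
    return support(lhs | rhs, transactions) / support(lhs, transactions)


def frequent_itemsets(transactions, min_support):
    """Grow itemsets one item at a time, keeping those with enough support."""
    items = sorted({item for t in transactions for item in t})
    current = [frozenset([i]) for i in items
               if support(frozenset([i]), transactions) >= min_support]
    result = {}
    size = 1
    while current:
        for itemset in current:
            result[itemset] = support(itemset, transactions)
        size += 1
        # candidates are unions of two frequent itemsets that are one item larger
        candidates = {a | b for a in current for b in current if len(a | b) == size}
        current = [c for c in candidates if support(c, transactions) >= min_support]
    return result


# Hypothetical market-basket data.
transactions = [
    frozenset({"milk", "butter", "bread", "eggs"}),
    frozenset({"milk", "butter", "bread", "eggs"}),
    frozenset({"milk", "butter", "bread"}),
    frozenset({"milk", "eggs"}),
    frozenset({"bread"}),
]
rule_support = support(frozenset({"milk", "butter", "bread", "eggs"}), transactions)
rule_confidence = confidence(frozenset({"milk", "butter"}),
                             frozenset({"bread", "eggs"}), transactions)
frequent = frequent_itemsets(transactions, min_support=3)
```

Here the rule milk, butter ⇒ bread, eggs holds in 2 of the 3 transactions that contain milk and butter, so its support is 2 and its confidence is 2/3.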

116 Explorer: attribute selection. A panel that can be used to investigate which (subsets of) attributes are the most predictive ones. Attribute selection methods contain two parts. A search method: best-first, forward selection, random, exhaustive, genetic algorithm, ranking. An evaluation method: correlation-based, wrapper, information gain, chi-squared, among others. Very flexible: WEKA allows (almost) arbitrary combinations of these two.
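One simple pairing of the two parts is "ranking" as the search method with "information gain" as the evaluation method. The Python sketch below scores each nominal attribute by its information gain with respect to the class and sorts the scores. It is an illustration under that assumption, not WEKA's evaluator; the rows reuse the sex, exercise_induced_angina, and class values from the example data.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attr_index, class_index=-1):
    """Class entropy minus the weighted class entropy after splitting on one attribute."""
    gain = entropy([r[class_index] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[attr_index], []).append(r[class_index])
    for subset in groups.values():
        gain -= len(subset) / len(rows) * entropy(subset)
    return gain


def rank_attributes(rows, names):
    """Evaluate every attribute and return (score, name) pairs, best first."""
    scores = [(information_gain(rows, i), name) for i, name in enumerate(names)]
    return sorted(scores, reverse=True)


# (sex, exercise_induced_angina, class) for the four example instances.
rows = [
    ("male", "no", "not_present"),
    ("male", "yes", "present"),
    ("male", "yes", "present"),
    ("female", "no", "not_present"),
]
ranking = rank_attributes(rows, ["sex", "exercise_induced_angina"])
```

On these four rows exercise_induced_angina separates the classes perfectly, so it ranks above sex.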

125 Explorer: data visualization. Visualization is very useful in practice, e.g. it helps to determine the difficulty of the learning problem. WEKA can visualize single attributes (1-d) and pairs of attributes (2-d). To do: rotating 3-d visualizations (Xgobi-style). Class values are color-coded. A "jitter" option deals with nominal attributes (and helps to detect "hidden" data points). There is also a zoom-in function.
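The effect of jitter is easy to see in code: points that share a nominal value are plotted at exactly the same coordinate and hide each other, and a small random offset spreads them apart. The Python sketch below shows the idea only; the function name, the uniform noise, and the fixed seed are assumptions, not WEKA's implementation.

```python
import random


def jitter(values, amount, seed=0):
    """Add uniform noise in [-amount, +amount] to each plotting coordinate."""
    rng = random.Random(seed)  # fixed seed so the plot is reproducible
    return [v + rng.uniform(-amount, amount) for v in values]


# Five instances whose nominal attribute is coded as 1 or 2: without jitter
# they collapse onto only two distinct positions on the axis.
coded = [1, 1, 1, 2, 2]
jittered = jitter(coded, amount=0.2)
```

The noise changes only where points are drawn, not the underlying data.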

138 Conclusion: try it yourself! WEKA is available at [...]. The site also has a list of projects based on WEKA. WEKA contributors: Abdelaziz Mahoui, Alexander K. Seewald, Ashraf M. Kibriya, Bernhard Pfahringer, Brent Martin, Peter Flach, Eibe Frank, Gabi Schmidberger, Ian H. Witten, J. Lindgren, Janice Boughton, Jason Wells, Len Trigg, Lucio de Souza Coelho, Malcolm Ware, Mark Hall, Remco Bouckaert, Richard Kirkby, Shane Butler, Shane Legg, Stuart Inglis, Sylvain Roy, Tony Voyle, Xin Xu, Yong Wang, Zhihai Wang.

An Introduction to WEKA. As presented by PACE

An Introduction to WEKA. As presented by PACE An Introduction to WEKA As presented by PACE Download and Install WEKA Website: http://www.cs.waikato.ac.nz/~ml/weka/index.html 2 Content Intro and background Exploring WEKA Data Preparation Creating Models/

More information

Introduction Predictive Analytics Tools: Weka

Introduction Predictive Analytics Tools: Weka Introduction Predictive Analytics Tools: Weka Predictive Analytics Center of Excellence San Diego Supercomputer Center University of California, San Diego Tools Landscape Considerations Scale User Interface

More information

Introduction Predictive Analytics Tools: Weka, R!

Introduction Predictive Analytics Tools: Weka, R! Introduction Predictive Analytics Tools: Weka, R! Predictive Analytics Center of Excellence San Diego Supercomputer Center University of California, San Diego! Available Data Mining Tools! COTs:! n IBM

More information

WEKA A Machine Learning Workbench for Data Mining

WEKA A Machine Learning Workbench for Data Mining Chapter 1 WEKA A Machine Learning Workbench for Data Mining Eibe Frank, Mark Hall, Geoffrey Holmes, Richard Kirkby, Bernhard Pfahringer, Ian H. Witten Department of Computer Science, University of Waikato,

More information

WEKA Explorer Tutorial

WEKA Explorer Tutorial Machine Learning with WEKA WEKA Explorer Tutorial for WEKA Version 3.4.3 Svetlana S. Aksenova aksenovs@ecs.csus.edu School of Engineering and Computer Science Department of Computer Science California

More information

Université de Montpellier 2 Hugo Alatrista-Salas : hugo.alatrista-salas@teledetection.fr

Université de Montpellier 2 Hugo Alatrista-Salas : hugo.alatrista-salas@teledetection.fr Université de Montpellier 2 Hugo Alatrista-Salas : hugo.alatrista-salas@teledetection.fr WEKA Gallirallus Zeland) australis : Endemic bird (New Characteristics Waikato university Weka is a collection

More information

An Introduction to the WEKA Data Mining System

An Introduction to the WEKA Data Mining System An Introduction to the WEKA Data Mining System Zdravko Markov Central Connecticut State University markovz@ccsu.edu Ingrid Russell University of Hartford irussell@hartford.edu Data Mining "Drowning in

More information

Pentaho Data Mining Last Modified on January 22, 2007

Pentaho Data Mining Last Modified on January 22, 2007 Pentaho Data Mining Copyright 2007 Pentaho Corporation. Redistribution permitted. All trademarks are the property of their respective owners. For the latest information, please visit our web site at www.pentaho.org

More information

Keywords Data mining, Classification Algorithm, Decision tree, J48, Random forest, Random tree, LMT, WEKA 3.7. Fig.1. Data mining techniques.

Keywords Data mining, Classification Algorithm, Decision tree, J48, Random forest, Random tree, LMT, WEKA 3.7. Fig.1. Data mining techniques. International Journal of Emerging Research in Management &Technology Research Article October 2015 Comparative Study of Various Decision Tree Classification Algorithm Using WEKA Purva Sewaiwar, Kamal Kant

More information

WEKA Experiences with a Java Open-Source Project

WEKA Experiences with a Java Open-Source Project Journal of Machine Learning Research 11 (2010) 2533-2541 Submitted 6/10; Revised 8/10; Published 9/10 WEKA Experiences with a Java Open-Source Project Remco R. Bouckaert Eibe Frank Mark A. Hall Geoffrey

More information

Data Mining with Weka

Data Mining with Weka Data Mining with Weka Class 1 Lesson 1 Introduction Ian H. Witten Department of Computer Science University of Waikato New Zealand weka.waikato.ac.nz Data Mining with Weka a practical course on how to

More information

How To Understand How Weka Works

How To Understand How Weka Works More Data Mining with Weka Class 1 Lesson 1 Introduction Ian H. Witten Department of Computer Science University of Waikato New Zealand weka.waikato.ac.nz More Data Mining with Weka a practical course

More information

Open-Source Machine Learning: R Meets Weka

Open-Source Machine Learning: R Meets Weka Open-Source Machine Learning: R Meets Weka Kurt Hornik Christian Buchta Achim Zeileis Weka? Weka is not only a flightless endemic bird of New Zealand (Gallirallus australis, picture from Wekapedia) but

More information

Web Document Clustering

Web Document Clustering Web Document Clustering Lab Project based on the MDL clustering suite http://www.cs.ccsu.edu/~markov/mdlclustering/ Zdravko Markov Computer Science Department Central Connecticut State University New Britain,

More information

Prof. Pietro Ducange Students Tutor and Practical Classes Course of Business Intelligence 2014 http://www.iet.unipi.it/p.ducange/esercitazionibi/

Prof. Pietro Ducange Students Tutor and Practical Classes Course of Business Intelligence 2014 http://www.iet.unipi.it/p.ducange/esercitazionibi/ Prof. Pietro Ducange Students Tutor and Practical Classes Course of Business Intelligence 2014 http://www.iet.unipi.it/p.ducange/esercitazionibi/ Email: p.ducange@iet.unipi.it Office: Dipartimento di Ingegneria

More information

Analysis of WEKA Data Mining Algorithm REPTree, Simple Cart and RandomTree for Classification of Indian News

Analysis of WEKA Data Mining Algorithm REPTree, Simple Cart and RandomTree for Classification of Indian News Analysis of WEKA Data Mining Algorithm REPTree, Simple Cart and RandomTree for Classification of Indian News Sushilkumar Kalmegh Associate Professor, Department of Computer Science, Sant Gadge Baba Amravati

More information

WEKA Explorer User Guide for Version 3-4-3

WEKA Explorer User Guide for Version 3-4-3 WEKA Explorer User Guide for Version 3-4-3 Richard Kirkby Eibe Frank November 9, 2004 c 2002, 2004 University of Waikato Contents 1 Launching WEKA 2 2 The WEKA Explorer 2 Section Tabs................................

More information

Improving spam mail filtering using classification algorithms with discretization Filter

Improving spam mail filtering using classification algorithms with discretization Filter International Association of Scientific Innovation and Research (IASIR) (An Association Unifying the Sciences, Engineering, and Applied Research) International Journal of Emerging Technologies in Computational

More information

An Introduction to Data Mining

An Introduction to Data Mining An Introduction to Intel Beijing wei.heng@intel.com January 17, 2014 Outline 1 DW Overview What is Notable Application of Conference, Software and Applications Major Process in 2 Major Tasks in Detail

More information

How To Predict Web Site Visits

How To Predict Web Site Visits Web Site Visit Forecasting Using Data Mining Techniques Chandana Napagoda Abstract: Data mining is a technique which is used for identifying relationships between various large amounts of data in many

More information

Data Mining Practical Machine Learning Tools and Techniques

Data Mining Practical Machine Learning Tools and Techniques Ensemble learning Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 8 of Data Mining by I. H. Witten, E. Frank and M. A. Hall Combining multiple models Bagging The basic idea

More information

DATA MINING TOOL FOR INTEGRATED COMPLAINT MANAGEMENT SYSTEM WEKA 3.6.7

DATA MINING TOOL FOR INTEGRATED COMPLAINT MANAGEMENT SYSTEM WEKA 3.6.7 DATA MINING TOOL FOR INTEGRATED COMPLAINT MANAGEMENT SYSTEM WEKA 3.6.7 UNDER THE GUIDANCE Dr. N.P. DHAVALE, DGM, INFINET Department SUBMITTED TO INSTITUTE FOR DEVELOPMENT AND RESEARCH IN BANKING TECHNOLOGY

More information

Contents WEKA Microsoft SQL Database

Contents WEKA Microsoft SQL Database WEKA User Manual Contents WEKA Introduction 3 Background information. 3 Installation. 3 Where to get WEKA... 3 Downloading Information... 3 Opening the program.. 4 Chooser Menu. 4-6 Preprocessing... 6-7

More information

WEKA. Machine Learning Algorithms in Java

WEKA. Machine Learning Algorithms in Java WEKA Machine Learning Algorithms in Java Ian H. Witten Department of Computer Science University of Waikato Hamilton, New Zealand E-mail: ihw@cs.waikato.ac.nz Eibe Frank Department of Computer Science

More information

Analytics on Big Data

Analytics on Big Data Analytics on Big Data Riccardo Torlone Università Roma Tre Credits: Mohamed Eltabakh (WPI) Analytics The discovery and communication of meaningful patterns in data (Wikipedia) It relies on data analysis

More information

BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL

BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL The Fifth International Conference on e-learning (elearning-2014), 22-23 September 2014, Belgrade, Serbia BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL SNJEŽANA MILINKOVIĆ University

More information

Data Mining. Knowledge Discovery, Data Warehousing and Machine Learning Final remarks. Lecturer: JERZY STEFANOWSKI

Data Mining. Knowledge Discovery, Data Warehousing and Machine Learning Final remarks. Lecturer: JERZY STEFANOWSKI Data Mining Knowledge Discovery, Data Warehousing and Machine Learning Final remarks Lecturer: JERZY STEFANOWSKI Email: Jerzy.Stefanowski@cs.put.poznan.pl Data Mining a step in A KDD Process Data mining:

More information

8. Machine Learning Applied Artificial Intelligence

8. Machine Learning Applied Artificial Intelligence 8. Machine Learning Applied Artificial Intelligence Prof. Dr. Bernhard Humm Faculty of Computer Science Hochschule Darmstadt University of Applied Sciences 1 Retrospective Natural Language Processing Name

More information

Chapter 6. The stacking ensemble approach

Chapter 6. The stacking ensemble approach 82 This chapter proposes the stacking ensemble approach for combining different data mining classifiers to get better performance. Other combination techniques like voting, bagging etc are also described

More information

DBTech Pro Workshop. Knowledge Discovery from Databases (KDD) Including Data Warehousing and Data Mining. Georgios Evangelidis

DBTech Pro Workshop. Knowledge Discovery from Databases (KDD) Including Data Warehousing and Data Mining. Georgios Evangelidis DBTechNet DBTech Pro Workshop Knowledge Discovery from Databases (KDD) Including Data Warehousing and Data Mining Dimitris A. Dervos dad@it.teithe.gr http://aetos.it.teithe.gr/~dad Georgios Evangelidis

More information

Data Mining & Data Stream Mining Open Source Tools

Data Mining & Data Stream Mining Open Source Tools Data Mining & Data Stream Mining Open Source Tools Darshana Parikh, Priyanka Tirkha Student M.Tech, Dept. of CSE, Sri Balaji College Of Engg. & Tech, Jaipur, Rajasthan, India Assistant Professor, Dept.

More information

More Data Mining with Weka

More Data Mining with Weka More Data Mining with Weka Class 5 Lesson 1 Simple neural networks Ian H. Witten Department of Computer Science University of Waikato New Zealand weka.waikato.ac.nz Lesson 5.1: Simple neural networks Class

More information

DECISION TREE INDUCTION FOR FINANCIAL FRAUD DETECTION USING ENSEMBLE LEARNING TECHNIQUES

DECISION TREE INDUCTION FOR FINANCIAL FRAUD DETECTION USING ENSEMBLE LEARNING TECHNIQUES DECISION TREE INDUCTION FOR FINANCIAL FRAUD DETECTION USING ENSEMBLE LEARNING TECHNIQUES Vijayalakshmi Mahanra Rao 1, Yashwant Prasad Singh 2 Multimedia University, Cyberjaya, MALAYSIA 1 lakshmi.mahanra@gmail.com

More information

DATA MINING USING PENTAHO / WEKA

DATA MINING USING PENTAHO / WEKA DATA MINING USING PENTAHO / WEKA Yannis Angelis Channels & Information Exploitation Division Application Delivery Sector EFG Eurobank 1 Agenda BI in Financial Environments Pentaho Community Platform Weka

More information

Classification of Titanic Passenger Data and Chances of Surviving the Disaster Data Mining with Weka and Kaggle Competition Data

Classification of Titanic Passenger Data and Chances of Surviving the Disaster Data Mining with Weka and Kaggle Competition Data Proceedings of Student-Faculty Research Day, CSIS, Pace University, May 2 nd, 2014 Classification of Titanic Passenger Data and Chances of Surviving the Disaster Data Mining with Weka and Kaggle Competition

More information

Data quality in Accounting Information Systems

Data quality in Accounting Information Systems Data quality in Accounting Information Systems Comparing Several Data Mining Techniques Erjon Zoto Department of Statistics and Applied Informatics Faculty of Economy, University of Tirana Tirana, Albania

More information

A Comparative Study of clustering algorithms Using weka tools

A Comparative Study of clustering algorithms Using weka tools A Comparative Study of clustering algorithms Using weka tools Bharat Chaudhari 1, Manan Parikh 2 1,2 MECSE, KITRC KALOL ABSTRACT Data clustering is a process of putting similar data into groups. A clustering

More information

Waffles: A Machine Learning Toolkit

Waffles: A Machine Learning Toolkit Journal of Machine Learning Research 12 (2011) 2383-2387 Submitted 6/10; Revised 3/11; Published 7/11 Waffles: A Machine Learning Toolkit Mike Gashler Department of Computer Science Brigham Young University

More information

Predicting the Risk of Heart Attacks using Neural Network and Decision Tree

Predicting the Risk of Heart Attacks using Neural Network and Decision Tree Predicting the Risk of Heart Attacks using Neural Network and Decision Tree S.Florence 1, N.G.Bhuvaneswari Amma 2, G.Annapoorani 3, K.Malathi 4 PG Scholar, Indian Institute of Information Technology, Srirangam,

More information

Visualizing class probability estimators

Visualizing class probability estimators Visualizing class probability estimators Eibe Frank and Mark Hall Department of Computer Science University of Waikato Hamilton, New Zealand {eibe, mhall}@cs.waikato.ac.nz Abstract. Inducing classifiers

More information

152 International Journal of Computer Science and Technology. Paavai Engineering College, India

152 International Journal of Computer Science and Technology. Paavai Engineering College, India IJCST Vol. 2, Issue 3, September 2011 Improving the Performance of Data Mining Algorithms in Health Care Data 1 P. Santhi, 2 V. Murali Bhaskaran, 1,2 Paavai Engineering College, India ISSN : 2229-4333(Print)

More information

Comparative Analysis of EM Clustering Algorithm and Density Based Clustering Algorithm Using WEKA tool.

Comparative Analysis of EM Clustering Algorithm and Density Based Clustering Algorithm Using WEKA tool. International Journal of Engineering Research and Development e-issn: 2278-067X, p-issn: 2278-800X, www.ijerd.com Volume 9, Issue 8 (January 2014), PP. 19-24 Comparative Analysis of EM Clustering Algorithm

More information

Analysis Tools and Libraries for BigData

Analysis Tools and Libraries for BigData + Analysis Tools and Libraries for BigData Lecture 02 Abhijit Bendale + Office Hours 2 n Terry Boult (Waiting to Confirm) n Abhijit Bendale (Tue 2:45 to 4:45 pm). Best if you email me in advance, but I

More information

An intelligent Analysis of a City Crime Data Using Data Mining

An intelligent Analysis of a City Crime Data Using Data Mining 2011 International Conference on Information and Electronics Engineering IPCSIT vol.6 (2011) (2011) IACSIT Press, Singapore An intelligent Analysis of a City Crime Data Using Data Mining Malathi. A 1,

More information

RAPIDMINER FREE SOFTWARE FOR DATA MINING, ANALYTICS AND BUSINESS INTELLIGENCE. Luigi Grimaudo 178627 Database And Data Mining Research Group

RAPIDMINER FREE SOFTWARE FOR DATA MINING, ANALYTICS AND BUSINESS INTELLIGENCE. Luigi Grimaudo 178627 Database And Data Mining Research Group RAPIDMINER FREE SOFTWARE FOR DATA MINING, ANALYTICS AND BUSINESS INTELLIGENCE Luigi Grimaudo 178627 Database And Data Mining Research Group Summary RapidMiner project Strengths How to use RapidMiner Operator

More information

THE COMPARISON OF DATA MINING TOOLS

THE COMPARISON OF DATA MINING TOOLS T.C. İSTANBUL KÜLTÜR UNIVERSITY THE COMPARISON OF DATA MINING TOOLS Data Warehouses and Data Mining Yrd.Doç.Dr. Ayça ÇAKMAK PEHLİVANLI Department of Computer Engineering İstanbul Kültür University submitted

More information

COC131 Data Mining - Clustering

COC131 Data Mining - Clustering COC131 Data Mining - Clustering Martin D. Sykora m.d.sykora@lboro.ac.uk Tutorial 05, Friday 20th March 2009 1. Fire up Weka (Waikako Environment for Knowledge Analysis) software, launch the explorer window

More information

CSC 177 Fall 2014 Team Project Final Report

CSC 177 Fall 2014 Team Project Final Report CSC 177 Fall 2014 Team Project Final Report Project Title, Data Mining on Farmers Market Data Instructor: Dr. Meiliu Lu Team Members: Yogesh Isawe Kalindi Mehta Aditi Kulkarni CSc 177 DM Project Cover

More information

What is Data Mining? Data Mining (Knowledge discovery in database) Data mining: Basic steps. Mining tasks. Classification: YES, NO

What is Data Mining? Data Mining (Knowledge discovery in database) Data mining: Basic steps. Mining tasks. Classification: YES, NO What is Data Mining? Data Mining (Knowledge discovery in database) Data Mining: "The non trivial extraction of implicit, previously unknown, and potentially useful information from data" William J Frawley,

More information

A Regression Approach for Forecasting Vendor Revenue in Telecommunication Industries

A Regression Approach for Forecasting Vendor Revenue in Telecommunication Industries A Regression Approach for Forecasting Vendor Revenue in Telecommunication Industries Aida Mustapha *1, Farhana M. Fadzil #2 * Faculty of Computer Science and Information Technology, Universiti Tun Hussein

More information

CHAPTER 6 IMPLEMENTATION OF CONVENTIONAL AND INTELLIGENT CLASSIFIER FOR FLAME MONITORING

CHAPTER 6 IMPLEMENTATION OF CONVENTIONAL AND INTELLIGENT CLASSIFIER FOR FLAME MONITORING 135 CHAPTER 6 IMPLEMENTATION OF CONVENTIONAL AND INTELLIGENT CLASSIFIER FOR FLAME MONITORING 6.1 PROPOSED SETUP FOR FLAME MONITORING IN BOILERS The existing flame monitoring system includes the flame images

More information

Comparative Analysis of Classification Algorithms on Different Datasets using WEKA

Comparative Analysis of Classification Algorithms on Different Datasets using WEKA Volume 54 No13, September 2012 Comparative Analysis of Classification Algorithms on Different Datasets using WEKA Rohit Arora MTech CSE Deptt Hindu College of Engineering Sonepat, Haryana, India Suman

More information

Data Mining Part 5. Prediction

Data Mining Part 5. Prediction Data Mining Part 5. Prediction 5.7 Spring 2010 Instructor: Dr. Masoud Yaghini Outline Introduction Linear Regression Other Regression Models References Introduction Introduction Numerical prediction is

More information

Open-Source Machine Learning: R Meets Weka

Open-Source Machine Learning: R Meets Weka Open-Source Machine Learning: R Meets Weka Kurt Hornik, Christian Buchta, Michael Schauerhuber, David Meyer, Achim Zeileis http://statmath.wu-wien.ac.at/ zeileis/ Weka? Weka is not only a flightless endemic

More information

ASSOCIATION RULE MINING ON WEB LOGS FOR EXTRACTING INTERESTING PATTERNS THROUGH WEKA TOOL

ASSOCIATION RULE MINING ON WEB LOGS FOR EXTRACTING INTERESTING PATTERNS THROUGH WEKA TOOL International Journal Of Advanced Technology In Engineering And Science Www.Ijates.Com Volume No 03, Special Issue No. 01, February 2015 ISSN (Online): 2348 7550 ASSOCIATION RULE MINING ON WEB LOGS FOR

More information

An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015

An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015 An Introduction to Data Mining for Wind Power Management Spring 2015 Big Data World Every minute: Google receives over 4 million search queries Facebook users share almost 2.5 million pieces of content

More information

WEKA KnowledgeFlow Tutorial for Version 3-5-8

WEKA KnowledgeFlow Tutorial for Version 3-5-8 WEKA KnowledgeFlow Tutorial for Version 3-5-8 Mark Hall Peter Reutemann July 14, 2008 c 2008 University of Waikato Contents 1 Introduction 2 2 Features 3 3 Components 4 3.1 DataSources..............................

More information

Implementation of Breiman s Random Forest Machine Learning Algorithm

Implementation of Breiman s Random Forest Machine Learning Algorithm Implementation of Breiman s Random Forest Machine Learning Algorithm Frederick Livingston Abstract This research provides tools for exploring Breiman s Random Forest algorithm. This paper will focus on

More information

K-means Clustering Technique on Search Engine Dataset using Data Mining Tool

K-means Clustering Technique on Search Engine Dataset using Data Mining Tool International Journal of Information and Computation Technology. ISSN 0974-2239 Volume 3, Number 6 (2013), pp. 505-510 International Research Publications House http://www. irphouse.com /ijict.htm K-means

More information

Social Media Mining. Data Mining Essentials

Social Media Mining. Data Mining Essentials Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers

More information

Machine learning for algo trading

Machine learning for algo trading Machine learning for algo trading An introduction for nonmathematicians Dr. Aly Kassam Overview High level introduction to machine learning A machine learning bestiary What has all this got to do with

More information

MS1b Statistical Data Mining

MS1b Statistical Data Mining MS1b Statistical Data Mining Yee Whye Teh Department of Statistics Oxford http://www.stats.ox.ac.uk/~teh/datamining.html Outline Administrivia and Introduction Course Structure Syllabus Introduction to

Machine Learning. Hands-On for Developers and Technical Professionals

Machine Learning. Hands-On for Developers and Technical Professionals Brochure More information from http://www.researchandmarkets.com/reports/2785739/ Machine Learning. Hands-On for Developers and Technical Professionals Description: Dig deep into the data with a hands-on

Data Mining Techniques for Prognosis in Pancreatic Cancer

Data Mining Techniques for Prognosis in Pancreatic Cancer Data Mining Techniques for Prognosis in Pancreatic Cancer by Stuart Floyd A Thesis Submitted to the Faculty of the WORCESTER POLYTECHNIC INSTITUTE In partial fulfillment of the requirements for the Degree

BIOINF 585 Fall 2015 Machine Learning for Systems Biology & Clinical Informatics http://www.ccmb.med.umich.edu/node/1376

BIOINF 585 Fall 2015 Machine Learning for Systems Biology & Clinical Informatics http://www.ccmb.med.umich.edu/node/1376 Course Director: Dr. Kayvan Najarian (DCM&B, kayvan@umich.edu) Lectures: Mondays and Wednesdays, 9:00 AM-10:30 AM, Rm. 2065 Palmer Commons Bldg. Labs: Wednesdays, 10:30 AM-11:30 AM (alternate weeks), Rm.

CLUSTERING AND PREDICTIVE MODELING: AN ENSEMBLE APPROACH

CLUSTERING AND PREDICTIVE MODELING: AN ENSEMBLE APPROACH CLUSTERING AND PREDICTIVE MODELING: AN ENSEMBLE APPROACH Except where reference is made to the work of others, the work described in this thesis is my own or was done in collaboration with my advisory

What is Data Mining? MS4424 Data Mining & Modelling

What is Data Mining? MS4424 Data Mining & Modelling. Lecturer: Dr Iris Yeung Room No: P7509 Tel No: 2788 8566 Email: msiris@cityu.edu.hk 1 Aims To introduce the basic concepts of data mining

Welcome. Data Mining: Updates in Technologies. Xindong Wu. Colorado School of Mines Golden, Colorado 80401, USA

Welcome. Data Mining: Updates in Technologies. Xindong Wu. Colorado School of Mines Golden, Colorado 80401, USA Welcome Xindong Wu Data Mining: Updates in Technologies Dept of Math and Computer Science Colorado School of Mines Golden, Colorado 80401, USA Email: xwu@mines.edu Home Page: http://kais.mines.edu/~xwu/

International Journal of Advanced Computer Technology (IJACT) ISSN:2319-7900 PRIVACY PRESERVING DATA MINING IN HEALTH CARE APPLICATIONS

International Journal of Advanced Computer Technology (IJACT) ISSN:2319-7900 PRIVACY PRESERVING DATA MINING IN HEALTH CARE APPLICATIONS PRIVACY PRESERVING DATA MINING IN HEALTH CARE APPLICATIONS First A. Dr. D. Aruna Kumari, Ph.D.; Second B. Ch. Mounika, Student, Department of ECM, K L University, chittiprolumounika@gmail.com; Third C.

Rule based Classification of BSE Stock Data with Data Mining

Rule based Classification of BSE Stock Data with Data Mining International Journal of Information Sciences and Application. ISSN 0974-2255 Volume 4, Number 1 (2012), pp. 1-9 International Research Publication House http://www.irphouse.com Rule based Classification

Evaluation of Feature Selection Methods for Predictive Modeling Using Neural Networks in Credits Scoring

Evaluation of Feature Selection Methods for Predictive Modeling Using Neural Networks in Credits Scoring Evaluation of Feature Selection Methods for Predictive Modeling Using Neural Networks in Credits Scoring Raghavendra B. K. Dr. M.G.R. Educational and Research Institute, Chennai-95 Email: raghavendra_bk@rediffmail.com

Clustering Connectionist and Statistical Language Processing

Clustering Connectionist and Statistical Language Processing Clustering Connectionist and Statistical Language Processing Frank Keller keller@coli.uni-sb.de Computerlinguistik Universität des Saarlandes Clustering p.1/21 Overview clustering vs. classification supervised

What's Cooking in KNIME

What's Cooking in KNIME What's Cooking in KNIME Thomas Gabriel Copyright 2015 KNIME.com AG Agenda Querying NoSQL Databases Database Improvements & Big Data Querying NoSQL Databases MongoDB & CouchDB

from Larson Text By Susan Miertschin

from Larson Text By Susan Miertschin Decision Tree Data Mining Example from Larson Text By Susan Miertschin Problem The Maximum Miniatures Marketing Department wants to do a targeted mailing promoting the Mythic World line of figurines.

An Overview of Knowledge Discovery Database and Data mining Techniques

An Overview of Knowledge Discovery Database and Data mining Techniques An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,

Azure Machine Learning, SQL Data Mining and R

Azure Machine Learning, SQL Data Mining and R Azure Machine Learning, SQL Data Mining and R Day-by-day Agenda Prerequisites No formal prerequisites. Basic knowledge of SQL Server Data Tools, Excel and any analytical experience helps. Best of all:

Application of Data Mining in Medical Decision Support System

Application of Data Mining in Medical Decision Support System Application of Data Mining in Medical Decision Support System Habib Shariff Mahmud School of Engineering & Computing Sciences University of East London - FTMS College Technology Park Malaysia Bukit Jalil,

Association Rule Mining: Exercises and Answers

Association Rule Mining: Exercises and Answers Association Rule Mining: Exercises and Answers Contains both theoretical and practical exercises to be done using Weka. The exercises are part of the DBTech Virtual Workshop on KDD and BI. Exercise 1.

The Data Mining Process

The Data Mining Process Sequence for Determining Necessary Data. Wrong: Catalog everything you have, and decide what data is important. Right: Work backward from the solution, define the problem explicitly, and map out the data

UNSUPERVISED MACHINE LEARNING TECHNIQUES IN GENOMICS

UNSUPERVISED MACHINE LEARNING TECHNIQUES IN GENOMICS UNSUPERVISED MACHINE LEARNING TECHNIQUES IN GENOMICS Dwijesh C. Mishra I.A.S.R.I., Library Avenue, New Delhi-110 012 dcmishra@iasri.res.in What is Learning? "Learning denotes changes in a system that enable

Model Deployment. Dr. Saed Sayad. University of Toronto 2010 saed.sayad@utoronto.ca. http://chem-eng.utoronto.ca/~datamining/

Model Deployment. Dr. Saed Sayad. University of Toronto 2010 saed.sayad@utoronto.ca. http://chem-eng.utoronto.ca/~datamining/ Model Deployment Dr. Saed Sayad University of Toronto 2010 saed.sayad@utoronto.ca http://chem-eng.utoronto.ca/~datamining/ 1 Model Deployment Creation of the model is generally not the end of the project.

Improving students' learning process by analyzing patterns produced with data mining methods

Improving students' learning process by analyzing patterns produced with data mining methods Improving students' learning process by analyzing patterns produced with data mining methods Lule Ahmedi, Eliot Bytyçi, Blerim Rexha, and Valon Raça Abstract Employing data mining algorithms on previous

KNIME TUTORIAL. Anna Monreale KDD-Lab, University of Pisa Email: annam@di.unipi.it

KNIME TUTORIAL. Anna Monreale KDD-Lab, University of Pisa Email: annam@di.unipi.it KNIME TUTORIAL Anna Monreale KDD-Lab, University of Pisa Email: annam@di.unipi.it Outline Introduction on KNIME KNIME components Exercise: Market Basket Analysis Exercise: Customer Segmentation Exercise:

Studying Auto Insurance Data

Studying Auto Insurance Data Studying Auto Insurance Data Ashutosh Nandeshwar February 23, 2010 1 Introduction To study auto insurance data using traditional and non-traditional tools, I downloaded a well-studied data set from http://www.statsci.org/data/general/motorins.

Data Mining of Web Access Logs

Data Mining of Web Access Logs Data Mining of Web Access Logs A minor thesis submitted in partial fulfilment of the requirements for the degree of Master of Applied Science in Information Technology Anand S. Lalani School of Computer

Structural Health Monitoring Tools (SHMTools)

Structural Health Monitoring Tools (SHMTools) Structural Health Monitoring Tools (SHMTools) Getting Started LANL/UCSD Engineering Institute LA-CC-14-046 © Copyright 2014, Los Alamos National Security, LLC All rights reserved. May 30, 2014 Contents

Practical Data Science with Azure Machine Learning, SQL Data Mining, and R

Practical Data Science with Azure Machine Learning, SQL Data Mining, and R Practical Data Science with Azure Machine Learning, SQL Data Mining, and R Overview This 4-day class is the first of the two data science courses taught by Rafal Lukawiecki. Some of the topics will be

A Perspective Analysis of Traffic Accident using Data Mining Techniques

A Perspective Analysis of Traffic Accident using Data Mining Techniques A Perspective Analysis of Traffic Accident using Data Mining Techniques S.Krishnaveni Ph.D (CS) Research Scholar, Karpagam University, Coimbatore, India 641 021 Dr.M.Hemalatha Asst. Professor & Head, Dept

Knowledge Discovery in Data with FIT-Miner

Knowledge Discovery in Data with FIT-Miner Knowledge Discovery in Data with FIT-Miner Michal Šebek, Martin Hlosta and Jaroslav Zendulka Faculty of Information Technology, Brno University of Technology, Božetěchova 2, Brno {isebek,ihlosta,zendulka}@fit.vutbr.cz

Predicting Students' Final GPA Using Decision Trees: A Case Study

Predicting Students' Final GPA Using Decision Trees: A Case Study Predicting Students' Final GPA Using Decision Trees: A Case Study Mashael A. Al-Barrak and Muna Al-Razgan Abstract Educational data mining is the process of applying data mining tools and techniques to

VCU-TSA at Semeval-2016 Task 4: Sentiment Analysis in Twitter

VCU-TSA at Semeval-2016 Task 4: Sentiment Analysis in Twitter VCU-TSA at Semeval-2016 Task 4: Sentiment Analysis in Twitter Gerard Briones and Kasun Amarasinghe and Bridget T. McInnes, PhD. Department of Computer Science Virginia Commonwealth University Richmond,

Prediction and Diagnosis of Heart Disease by Data Mining Techniques

Prediction and Diagnosis of Heart Disease by Data Mining Techniques Prediction and Diagnosis of Heart Disease by Data Mining Techniques Boshra Bahrami, Mirsaeid Hosseini Shirvani* Department of Computer Engineering, Sari Branch, Islamic Azad University Sari, Iran Boshrabahrami_znu@yahoo.com;

Analysis of Email Fraud Detection Using WEKA Tool

Analysis of Email Fraud Detection Using WEKA Tool Analysis of Email Fraud Detection Using WEKA Tool Author: Tarushi Sharma, M-Tech (Information Technology), CGC Landran Mohali, Punjab, India, Co-Author: Mrs. Amanpreet Kaur (Assistant Professor), CGC Landran

Data Mining. Nonlinear Classification

Data Mining. Nonlinear Classification Data Mining Unit # 6 Sajjad Haider Fall 2014 1 Nonlinear Classification Classes may not be separable by a linear boundary Suppose we randomly generate a data set as follows: X ranges from 0 to 15

Ensembles and PMML in KNIME

Ensembles and PMML in KNIME Ensembles and PMML in KNIME Alexander Fillbrunn 1, Iris Adä 1, Thomas R. Gabriel 2 and Michael R. Berthold 1,2 1 Department of Computer and Information Science Universität Konstanz Konstanz, Germany First.Last@Uni-Konstanz.De

Extension of Decision Tree Algorithm for Stream Data Mining Using Real Data

Extension of Decision Tree Algorithm for Stream Data Mining Using Real Data Fifth International Workshop on Computational Intelligence & Applications IEEE SMC Hiroshima Chapter, Hiroshima University, Japan, November 10, 11 & 12, 2009 Extension of Decision Tree Algorithm for Stream

Bing Liu. Web Data Mining. Exploring Hyperlinks, Contents, and Usage Data. With 177 Figures. Springer

Bing Liu. Web Data Mining. Exploring Hyperlinks, Contents, and Usage Data. With 177 Figures. Springer Bing Liu Web Data Mining Exploring Hyperlinks, Contents, and Usage Data With 177 Figures Springer Table of Contents 1. Introduction 1.1. What is the World Wide Web? 1.2. A Brief History of the Web

A Decision Tree for Weather Prediction

A Decision Tree for Weather Prediction BULETINUL Universității Petrol-Gaze din Ploiești Vol. LXI No. 1/2009 77-82 Seria Matematică - Informatică - Fizică A Decision Tree for Weather Prediction Elia Georgiana Petre Universitatea Petrol-Gaze

First Semester Computer Science Students Academic Performances Analysis by Using Data Mining Classification Algorithms

First Semester Computer Science Students Academic Performances Analysis by Using Data Mining Classification Algorithms First Semester Computer Science Students Academic Performances Analysis by Using Data Mining Classification Algorithms Azwa Abdul Aziz, Nor Hafieza Ismail and Fadhilah Ahmad Faculty Informatics & Computing

IT services for analyses of various data samples

IT services for analyses of various data samples IT services for analyses of various data samples Ján Paralič, František Babič, Martin Sarnovský, Peter Butka, Cecília Havrilová, Miroslava Muchová, Michal Puheim, Martin Mikula, Gabriel Tutoky Technical
