8. Machine Learning Applied Artificial Intelligence

Save this PDF as:
 WORD  PNG  TXT  JPG

Size: px
Start display at page:

Download "8. Machine Learning Applied Artificial Intelligence"

Transcription

1 8. Machine Learning Applied Artificial Intelligence Prof. Dr. Bernhard Humm Faculty of Computer Science Hochschule Darmstadt University of Applied Sciences 1

2 Retrospective Natural Language Processing Name and explain different areas of NLP What are the 7 levels of language understanding? What is tokenizing, sentence splitting, POS tagging, and parsing? What do language resources offer to NLP? Give examples What do NLP frameworks offer? Give examples What do NLP services offer? Give examples 2

3 Agenda Overview ML Applications ML Tasks ML Approaches ML Tools Services / Product Map 3

4 What is Machine Learning (ML)? Generating a model based on inputs and using it for making decisions or predictions ( rather than programming instructions explicitly ) 4

5 Agenda Overview ML Applications ML Tasks ML Approaches ML Tools Services / Product Map 5

6 Applications of ML: Spam filtering Task: classify new s as spam or not spam New s Automatically classified Spam filter ML input Manually classified Corrections 6

7 Stock market analysis Task: make recommendations on buying and selling stocks Current stock values Prediction Recommendation ML input Decision History of stock values 7 Image source: Wikimedia

8 Detecting credit card fraud Task: Detect fraud in credit card payments CC payments Automatically classified Fraud detection ML input Manually classified Corrections 8

9 Recommender systems Task: Recommending customers suitable products Order Recommendation of related products Recommender system ML input Purchasing behaviour of other customers or customer groups 9

10 Agenda Overview ML Applications ML Tasks ML Approaches ML Tools Services / Product Map 10

11 Categories of ML tasks Machine Learning Task Supervised Learning Unsupervised Learning Reinforcement Learning Classification Regression Clustering Feature selection / extraction Topic modeling P.S. Other categorizations / groupings are possible 11

12 Categories of ML tasks Supervised learning Given: Example inputs and desired outputs Goal: Learn a general rule that maps inputs to outputs Unsupervised learning Given: Data inputs (e.g., documents) Goal: Find structure in the inputs Reinforcement learning Setting: An agent interacts with a dynamic environment in which it must perform a goal Goal: Improving the agent s behaviour 12

13 Supervised learning subcategories Classification Given: Training inputs (records) which are divided into two or more classes Goal: Produce model to classify new inputs Examples: spam filter, fraud detection, Regression Given: Training data (records) with continuous (not discrete) output values Goal: Produce model to predict output values for new inputs Example: stock value prediction 13

14 Unsupervised learning subcategories Clustering Given: Set of input records Goal: Identifying clusters (groups of similar records) Example: Customer grouping Feature selection / extraction Given: Set of input records with attributes ( features ) Goal: Find a subset of the original attributes that are equally well suited for classification / clustering tasks Topic modeling Given: Set of text documents Goal: Find abstract topics that occur in several documents and classify documents accordingly 14

15 Agenda Overview ML Applications ML Tasks ML Approaches ML Tools Services / Product Map 15

16 Decision Tree Learning Used for supervised learning (classification, regression) Training input: Training data (records) with output values (discrete or continuous Learning result: decision tree that allows classifying / predicting output values of new data records Example (figure): Decision tree for classfying passengers on the Titanic in survived / died 16 Image source: Wikipedia

17 Artificial Neural Networks (ANN) Inspired by brain / nervous system: - Neurons connected via dentrites - Reduce resistance if fired repeatedly Artificial Neuron: - Weighted inputs - Function, e.g., weighted sum - Filter, e.g, threshold output Artificial Neural Network (ANN): - Input layer, output layer, and possibly intermediate layers of neurons - Training phase: weights are adjusted via known cases - Regognition phase: output is produced for new cases 17 Prof. Source: Dr. Bernhard Ivan Galkin, Humm, U. MASS Darmstadt Lowell University ( of Applied Sciences. )

18 Bayesian Networks Directed acyclic graph (DAG) with: - Nodes: random variables + probability function - Edges: conditional dependencies Example: - Probablility of rain - Sprinkler is turned on if it hasn t rained for a while - Grass is wet if it is raining or the sprinkler is turned on Bayes Network inference allows answering questions like: - What is the probability that it is raining, given the grass is wet? - What is the impact of turning the sprinkler on? 18 Source:

19 Inductive Logic Programming Given: - Set of logic facts (background knowledge), e.g. male(tom), female(eve), parent (Tom, Eve) - Positive and / or negative examples, e.g., daughter (Eve, Tom) Learning goal: - General rules that are consistent with the examples and the background knowledge, e.g., parent(p1, p2) and female(p2) daughter(p2, p1) male female George parent Helen Mary Tom Nancy 19 Eve

20 Agenda Overview ML Applications ML Tasks ML Approaches ML Tools Services / Product Map 20

21 WEKA 21

22 Tasks supported by WEKA Numerous approaches for supervised and unsupervised learning Preprocess Choose and modify the data being acted on Classify Cluster Train and test learning schemes that classify or perform regression Learn clusters for the data Associate Learn association rules for the data Select attributes Select the most relevant attributes in the data Visualize View an interactive 2D plot of the data 22

23 WEKA Datasets Collection of examples Each instance consists of attributes Attribute types: - Nominal (enumeration) - Numeric (real or integer number) - String outlook {sunny, overcast, windy {TRUE, temperature humidity play {yes, sunny,false,85,85,no sunny,true,80,90,no overcast,false,83,86,yes rainy,false,70,96,yes rainy,false,68,80,yes 23

24 WEKA GUI 24

25 Agenda Overview ML Applications ML Tasks ML Approaches ML Tools Services / Product Map 25

26 ML Services Map ML services ML development environments / frameworks ML libraries Web services for for experimenting with different ML approaches and configuring solutions IDEs and frameworks for experimenting with different ML approaches and configuring solutions Algorithms for classification, regression, clustering, feature selection / extraction, tropic modelling, etc. using different approaches, e.g., decision tree learning, Artificial Neural Networks, Bayes networks, inductive logic programming, Support Vector machines, Hidden Markov Chains, etc. 26

27 ML Product Map ML services ML development environments / frameworks bigml, wise.io, procog, ersatz, WEKA, Orange, Shogun, scikt-learn, ML libraries Eblearn, OpenNN, aisolver, CURRENNT, 27

28 ML product map (table) Product ML library ML development environment / framework Java Neural Network Framework Neuroph x x ML service Fast Artificial Neural Network Library eblearn x x Jaden x x OpenNN - Open Neural Networks Library aisolver CURRENNT x x x WEKA x x Orange x x Shogun x x scikit-learn x x bigml wise.io procog ersatz x x x x 28

Social Media Mining. Data Mining Essentials

Social Media Mining. Data Mining Essentials Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers

More information

Chapter 12 Discovering New Knowledge Data Mining

Chapter 12 Discovering New Knowledge Data Mining Chapter 12 Discovering New Knowledge Data Mining Becerra-Fernandez, et al. -- Knowledge Management 1/e -- 2004 Prentice Hall Additional material 2007 Dekai Wu Chapter Objectives Introduce the student to

More information

Analytics on Big Data

Analytics on Big Data Analytics on Big Data Riccardo Torlone Università Roma Tre Credits: Mohamed Eltabakh (WPI) Analytics The discovery and communication of meaningful patterns in data (Wikipedia) It relies on data analysis

More information

Data Mining with Weka

Data Mining with Weka Data Mining with Weka Class 1 Lesson 1 Introduction Ian H. Witten Department of Computer Science University of Waikato New Zealand weka.waikato.ac.nz Data Mining with Weka a practical course on how to

More information

Data Mining Algorithms Part 1. Dejan Sarka

Data Mining Algorithms Part 1. Dejan Sarka Data Mining Algorithms Part 1 Dejan Sarka Join the conversation on Twitter: @DevWeek #DW2015 Instructor Bio Dejan Sarka (dsarka@solidq.com) 30 years of experience SQL Server MVP, MCT, 13 books 7+ courses

More information

Bayesian networks - Time-series models - Apache Spark & Scala

Bayesian networks - Time-series models - Apache Spark & Scala Bayesian networks - Time-series models - Apache Spark & Scala Dr John Sandiford, CTO Bayes Server Data Science London Meetup - November 2014 1 Contents Introduction Bayesian networks Latent variables Anomaly

More information

Web Document Clustering

Web Document Clustering Web Document Clustering Lab Project based on the MDL clustering suite http://www.cs.ccsu.edu/~markov/mdlclustering/ Zdravko Markov Computer Science Department Central Connecticut State University New Britain,

More information

Introduction to Machine Learning Connectionist and Statistical Language Processing

Introduction to Machine Learning Connectionist and Statistical Language Processing Introduction to Machine Learning Connectionist and Statistical Language Processing Frank Keller keller@coli.uni-sb.de Computerlinguistik Universität des Saarlandes Introduction to Machine Learning p.1/22

More information

Machine learning for algo trading

Machine learning for algo trading Machine learning for algo trading An introduction for nonmathematicians Dr. Aly Kassam Overview High level introduction to machine learning A machine learning bestiary What has all this got to do with

More information

Predicting the Risk of Heart Attacks using Neural Network and Decision Tree

Predicting the Risk of Heart Attacks using Neural Network and Decision Tree Predicting the Risk of Heart Attacks using Neural Network and Decision Tree S.Florence 1, N.G.Bhuvaneswari Amma 2, G.Annapoorani 3, K.Malathi 4 PG Scholar, Indian Institute of Information Technology, Srirangam,

More information

Data Mining & Data Stream Mining Open Source Tools

Data Mining & Data Stream Mining Open Source Tools Data Mining & Data Stream Mining Open Source Tools Darshana Parikh, Priyanka Tirkha Student M.Tech, Dept. of CSE, Sri Balaji College Of Engg. & Tech, Jaipur, Rajasthan, India Assistant Professor, Dept.

More information

Data Mining Practical Machine Learning Tools and Techniques

Data Mining Practical Machine Learning Tools and Techniques Data ining Practical achine Learning Tools and Techniques Slides for Chapter 2 of Data ining by I. H. Witten and E. rank Outline Terminology What s a concept Classification, association, clustering, numeric

More information

Implementation of Breiman s Random Forest Machine Learning Algorithm

Implementation of Breiman s Random Forest Machine Learning Algorithm Implementation of Breiman s Random Forest Machine Learning Algorithm Frederick Livingston Abstract This research provides tools for exploring Breiman s Random Forest algorithm. This paper will focus on

More information

Network Machine Learning Research Group. Intended status: Informational October 19, 2015 Expires: April 21, 2016

Network Machine Learning Research Group. Intended status: Informational October 19, 2015 Expires: April 21, 2016 Network Machine Learning Research Group S. Jiang Internet-Draft Huawei Technologies Co., Ltd Intended status: Informational October 19, 2015 Expires: April 21, 2016 Abstract Network Machine Learning draft-jiang-nmlrg-network-machine-learning-00

More information

Machine Learning. Chapter 18, 21. Some material adopted from notes by Chuck Dyer

Machine Learning. Chapter 18, 21. Some material adopted from notes by Chuck Dyer Machine Learning Chapter 18, 21 Some material adopted from notes by Chuck Dyer What is learning? Learning denotes changes in a system that... enable a system to do the same task more efficiently the next

More information

Université de Montpellier 2 Hugo Alatrista-Salas : hugo.alatrista-salas@teledetection.fr

Université de Montpellier 2 Hugo Alatrista-Salas : hugo.alatrista-salas@teledetection.fr Université de Montpellier 2 Hugo Alatrista-Salas : hugo.alatrista-salas@teledetection.fr WEKA Gallirallus Zeland) australis : Endemic bird (New Characteristics Waikato university Weka is a collection

More information

An Introduction to Data Mining

An Introduction to Data Mining An Introduction to Intel Beijing wei.heng@intel.com January 17, 2014 Outline 1 DW Overview What is Notable Application of Conference, Software and Applications Major Process in 2 Major Tasks in Detail

More information

An Introduction to WEKA. As presented by PACE

An Introduction to WEKA. As presented by PACE An Introduction to WEKA As presented by PACE Download and Install WEKA Website: http://www.cs.waikato.ac.nz/~ml/weka/index.html 2 Content Intro and background Exploring WEKA Data Preparation Creating Models/

More information

from Larson Text By Susan Miertschin

from Larson Text By Susan Miertschin Decision Tree Data Mining Example from Larson Text By Susan Miertschin 1 Problem The Maximum Miniatures Marketing Department wants to do a targeted mailing gpromoting the Mythic World line of figurines.

More information

Data Mining. Practical Machine Learning Tools and Techniques. Classification, association, clustering, numeric prediction

Data Mining. Practical Machine Learning Tools and Techniques. Classification, association, clustering, numeric prediction Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 2 of Data Mining by I. H. Witten and E. Frank Input: Concepts, instances, attributes Terminology What s a concept? Classification,

More information

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014 RESEARCH ARTICLE OPEN ACCESS A Survey of Data Mining: Concepts with Applications and its Future Scope Dr. Zubair Khan 1, Ashish Kumar 2, Sunny Kumar 3 M.Tech Research Scholar 2. Department of Computer

More information

Introduction Predictive Analytics Tools: Weka

Introduction Predictive Analytics Tools: Weka Introduction Predictive Analytics Tools: Weka Predictive Analytics Center of Excellence San Diego Supercomputer Center University of California, San Diego Tools Landscape Considerations Scale User Interface

More information

Data mining knowledge representation

Data mining knowledge representation Data mining knowledge representation 1 What Defines a Data Mining Task? Task relevant data: where and how to retrieve the data to be used for mining Background knowledge: Concept hierarchies Interestingness

More information

Knowledge-based systems and the need for learning

Knowledge-based systems and the need for learning Knowledge-based systems and the need for learning The implementation of a knowledge-based system can be quite difficult. Furthermore, the process of reasoning with that knowledge can be quite slow. This

More information

Prof. Pietro Ducange Students Tutor and Practical Classes Course of Business Intelligence 2014 http://www.iet.unipi.it/p.ducange/esercitazionibi/

Prof. Pietro Ducange Students Tutor and Practical Classes Course of Business Intelligence 2014 http://www.iet.unipi.it/p.ducange/esercitazionibi/ Prof. Pietro Ducange Students Tutor and Practical Classes Course of Business Intelligence 2014 http://www.iet.unipi.it/p.ducange/esercitazionibi/ Email: p.ducange@iet.unipi.it Office: Dipartimento di Ingegneria

More information

Learning is a very general term denoting the way in which agents:

Learning is a very general term denoting the way in which agents: What is learning? Learning is a very general term denoting the way in which agents: Acquire and organize knowledge (by building, modifying and organizing internal representations of some external reality);

More information

Analysis Tools and Libraries for BigData

Analysis Tools and Libraries for BigData + Analysis Tools and Libraries for BigData Lecture 02 Abhijit Bendale + Office Hours 2 n Terry Boult (Waiting to Confirm) n Abhijit Bendale (Tue 2:45 to 4:45 pm). Best if you email me in advance, but I

More information

Monday Morning Data Mining

Monday Morning Data Mining Monday Morning Data Mining Tim Ruhe Statistische Methoden der Datenanalyse Outline: - data mining - IceCube - Data mining in IceCube Computer Scientists are different... Fakultät Physik Fakultät Physik

More information

Classification algorithm in Data mining: An Overview

Classification algorithm in Data mining: An Overview Classification algorithm in Data mining: An Overview S.Neelamegam #1, Dr.E.Ramaraj *2 #1 M.phil Scholar, Department of Computer Science and Engineering, Alagappa University, Karaikudi. *2 Professor, Department

More information

What is Data Mining? Data Mining (Knowledge discovery in database) Data mining: Basic steps. Mining tasks. Classification: YES, NO

What is Data Mining? Data Mining (Knowledge discovery in database) Data mining: Basic steps. Mining tasks. Classification: YES, NO What is Data Mining? Data Mining (Knowledge discovery in database) Data Mining: "The non trivial extraction of implicit, previously unknown, and potentially useful information from data" William J Frawley,

More information

BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL

BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL The Fifth International Conference on e-learning (elearning-2014), 22-23 September 2014, Belgrade, Serbia BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL SNJEŽANA MILINKOVIĆ University

More information

Comparative Analysis of EM Clustering Algorithm and Density Based Clustering Algorithm Using WEKA tool.

Comparative Analysis of EM Clustering Algorithm and Density Based Clustering Algorithm Using WEKA tool. International Journal of Engineering Research and Development e-issn: 2278-067X, p-issn: 2278-800X, www.ijerd.com Volume 9, Issue 8 (January 2014), PP. 19-24 Comparative Analysis of EM Clustering Algorithm

More information

Practical Introduction to Machine Learning and Optimization. Alessio Signorini

Practical Introduction to Machine Learning and Optimization. Alessio Signorini <alessio.signorini@oneriot.com> Practical Introduction to Machine Learning and Optimization Alessio Signorini Everyday's Optimizations Although you may not know, everybody uses daily some sort of optimization

More information

Machine Learning. 01 - Introduction

Machine Learning. 01 - Introduction Machine Learning 01 - Introduction Machine learning course One lecture (Wednesday, 9:30, 346) and one exercise (Monday, 17:15, 203). Oral exam, 20 minutes, 5 credit points. Some basic mathematical knowledge

More information

Pentaho Data Mining Last Modified on January 22, 2007

Pentaho Data Mining Last Modified on January 22, 2007 Pentaho Data Mining Copyright 2007 Pentaho Corporation. Redistribution permitted. All trademarks are the property of their respective owners. For the latest information, please visit our web site at www.pentaho.org

More information

Model Deployment. Dr. Saed Sayad. University of Toronto 2010 saed.sayad@utoronto.ca. http://chem-eng.utoronto.ca/~datamining/

Model Deployment. Dr. Saed Sayad. University of Toronto 2010 saed.sayad@utoronto.ca. http://chem-eng.utoronto.ca/~datamining/ Model Deployment Dr. Saed Sayad University of Toronto 2010 saed.sayad@utoronto.ca http://chem-eng.utoronto.ca/~datamining/ 1 Model Deployment Creation of the model is generally not the end of the project.

More information

Supervised Learning (Big Data Analytics)

Supervised Learning (Big Data Analytics) Supervised Learning (Big Data Analytics) Vibhav Gogate Department of Computer Science The University of Texas at Dallas Practical advice Goal of Big Data Analytics Uncover patterns in Data. Can be used

More information

Final Project Report

Final Project Report CPSC545 by Introduction to Data Mining Prof. Martin Schultz & Prof. Mark Gerstein Student Name: Yu Kor Hugo Lam Student ID : 904907866 Due Date : May 7, 2007 Introduction Final Project Report Pseudogenes

More information

Classification and Prediction

Classification and Prediction Classification and Prediction Slides for Data Mining: Concepts and Techniques Chapter 7 Jiawei Han and Micheline Kamber Intelligent Database Systems Research Lab School of Computing Science Simon Fraser

More information

Big Data: The Science of Patterns. Dr. Lutz Hamel Dept. of Computer Science and Statistics hamel@cs.uri.edu

Big Data: The Science of Patterns. Dr. Lutz Hamel Dept. of Computer Science and Statistics hamel@cs.uri.edu Big Data: The Science of Patterns Dr. Lutz Hamel Dept. of Computer Science and Statistics hamel@cs.uri.edu The Blessing and the Curse: Lots of Data Outlook Temp Humidity Wind Play Sunny Hot High Weak No

More information

Evaluation of Machine Learning Techniques for Green Energy Prediction

Evaluation of Machine Learning Techniques for Green Energy Prediction arxiv:1406.3726v1 [cs.lg] 14 Jun 2014 Evaluation of Machine Learning Techniques for Green Energy Prediction 1 Objective Ankur Sahai University of Mainz, Germany We evaluate Machine Learning techniques

More information

Data Mining and Neural Networks in Stata

Data Mining and Neural Networks in Stata Data Mining and Neural Networks in Stata 2 nd Italian Stata Users Group Meeting Milano, 10 October 2005 Mario Lucchini e Maurizo Pisati Università di Milano-Bicocca mario.lucchini@unimib.it maurizio.pisati@unimib.it

More information

Machine Learning: Overview

Machine Learning: Overview Machine Learning: Overview Why Learning? Learning is a core of property of being intelligent. Hence Machine learning is a core subarea of Artificial Intelligence. There is a need for programs to behave

More information

Event driven trading new studies on innovative way. of trading in Forex market. Michał Osmoła INIME live 23 February 2016

Event driven trading new studies on innovative way. of trading in Forex market. Michał Osmoła INIME live 23 February 2016 Event driven trading new studies on innovative way of trading in Forex market Michał Osmoła INIME live 23 February 2016 Forex market From Wikipedia: The foreign exchange market (Forex, FX, or currency

More information

Overview. Introduction to Machine Learning. Definition of Learning. A Sample Data Set. Connectionist and Statistical Language Processing

Overview. Introduction to Machine Learning. Definition of Learning. A Sample Data Set. Connectionist and Statistical Language Processing Overview Introduction to Machine Learning Connectionist and Statistical Language Processing Frank Keller keller@coli.uni-sb.de Computerlinguistik Universität des Saarlandes definition of learning sample

More information

An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015

An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015 An Introduction to Data Mining for Wind Power Management Spring 2015 Big Data World Every minute: Google receives over 4 million search queries Facebook users share almost 2.5 million pieces of content

More information

Index Contents Page No. Introduction . Data Mining & Knowledge Discovery

Index Contents Page No. Introduction . Data Mining & Knowledge Discovery Index Contents Page No. 1. Introduction 1 1.1 Related Research 2 1.2 Objective of Research Work 3 1.3 Why Data Mining is Important 3 1.4 Research Methodology 4 1.5 Research Hypothesis 4 1.6 Scope 5 2.

More information

Neural Networks. Neural network is a network or circuit of neurons. Neurons can be. Biological neurons Artificial neurons

Neural Networks. Neural network is a network or circuit of neurons. Neurons can be. Biological neurons Artificial neurons Neural Networks Neural network is a network or circuit of neurons Neurons can be Biological neurons Artificial neurons Biological neurons Building block of the brain Human brain contains over 10 billion

More information

Neural Networks and Support Vector Machines

Neural Networks and Support Vector Machines INF5390 - Kunstig intelligens Neural Networks and Support Vector Machines Roar Fjellheim INF5390-13 Neural Networks and SVM 1 Outline Neural networks Perceptrons Neural networks Support vector machines

More information

Data Mining Part 5. Prediction

Data Mining Part 5. Prediction Data Mining Part 5. Prediction 5.1 Spring 2010 Instructor: Dr. Masoud Yaghini Outline Classification vs. Numeric Prediction Prediction Process Data Preparation Comparing Prediction Methods References Classification

More information

Predictive Analytics Techniques: What to Use For Your Big Data. March 26, 2014 Fern Halper, PhD

Predictive Analytics Techniques: What to Use For Your Big Data. March 26, 2014 Fern Halper, PhD Predictive Analytics Techniques: What to Use For Your Big Data March 26, 2014 Fern Halper, PhD Presenter Proven Performance Since 1995 TDWI helps business and IT professionals gain insight about data warehousing,

More information

NEURAL NETWORKS IN DATA MINING

NEURAL NETWORKS IN DATA MINING NEURAL NETWORKS IN DATA MINING 1 DR. YASHPAL SINGH, 2 ALOK SINGH CHAUHAN 1 Reader, Bundelkhand Institute of Engineering & Technology, Jhansi, India 2 Lecturer, United Institute of Management, Allahabad,

More information

Using Data Mining for Mobile Communication Clustering and Characterization

Using Data Mining for Mobile Communication Clustering and Characterization Using Data Mining for Mobile Communication Clustering and Characterization A. Bascacov *, C. Cernazanu ** and M. Marcu ** * Lasting Software, Timisoara, Romania ** Politehnica University of Timisoara/Computer

More information

Keywords Data mining, Classification Algorithm, Decision tree, J48, Random forest, Random tree, LMT, WEKA 3.7. Fig.1. Data mining techniques.

Keywords Data mining, Classification Algorithm, Decision tree, J48, Random forest, Random tree, LMT, WEKA 3.7. Fig.1. Data mining techniques. International Journal of Emerging Research in Management &Technology Research Article October 2015 Comparative Study of Various Decision Tree Classification Algorithm Using WEKA Purva Sewaiwar, Kamal Kant

More information

Question 2 Naïve Bayes (16 points)

Question 2 Naïve Bayes (16 points) Question 2 Naïve Bayes (16 points) About 2/3 of your email is spam so you downloaded an open source spam filter based on word occurrences that uses the Naive Bayes classifier. Assume you collected the

More information

How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning

How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning How to use Big Data in Industry 4.0 implementations LAURI ILISON, PhD Head of Big Data and Machine Learning Big Data definition? Big Data is about structured vs unstructured data Big Data is about Volume

More information

Title. Introduction to Data Mining. Dr Arulsivanathan Naidoo Statistics South Africa. OECD Conference Cape Town 8-10 December 2010.

Title. Introduction to Data Mining. Dr Arulsivanathan Naidoo Statistics South Africa. OECD Conference Cape Town 8-10 December 2010. Title Introduction to Data Mining Dr Arulsivanathan Naidoo Statistics South Africa OECD Conference Cape Town 8-10 December 2010 1 Outline Introduction Statistics vs Knowledge Discovery Predictive Modeling

More information

Machine Learning and Data Mining. Fundamentals, robotics, recognition

Machine Learning and Data Mining. Fundamentals, robotics, recognition Machine Learning and Data Mining Fundamentals, robotics, recognition Machine Learning, Data Mining, Knowledge Discovery in Data Bases Their mutual relations Data Mining, Knowledge Discovery in Databases,

More information

Introduction to Machine Learning Lecture 1. Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu

Introduction to Machine Learning Lecture 1. Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu Introduction to Machine Learning Lecture 1 Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu Introduction Logistics Prerequisites: basics concepts needed in probability and statistics

More information

Machine Learning CS 6830. Lecture 01. Razvan C. Bunescu School of Electrical Engineering and Computer Science bunescu@ohio.edu

Machine Learning CS 6830. Lecture 01. Razvan C. Bunescu School of Electrical Engineering and Computer Science bunescu@ohio.edu Machine Learning CS 6830 Razvan C. Bunescu School of Electrical Engineering and Computer Science bunescu@ohio.edu What is Learning? Merriam-Webster: learn = to acquire knowledge, understanding, or skill

More information

D A T A M I N I N G C L A S S I F I C A T I O N

D A T A M I N I N G C L A S S I F I C A T I O N D A T A M I N I N G C L A S S I F I C A T I O N FABRICIO VOZNIKA LEO NARDO VIA NA INTRODUCTION Nowadays there is huge amount of data being collected and stored in databases everywhere across the globe.

More information

THE COMPARISON OF DATA MINING TOOLS

THE COMPARISON OF DATA MINING TOOLS T.C. İSTANBUL KÜLTÜR UNIVERSITY THE COMPARISON OF DATA MINING TOOLS Data Warehouses and Data Mining Yrd.Doç.Dr. Ayça ÇAKMAK PEHLİVANLI Department of Computer Engineering İstanbul Kültür University submitted

More information

Why is Internal Audit so Hard?

Why is Internal Audit so Hard? Why is Internal Audit so Hard? 2 2014 Why is Internal Audit so Hard? 3 2014 Why is Internal Audit so Hard? Waste Abuse Fraud 4 2014 Waves of Change 1 st Wave Personal Computers Electronic Spreadsheets

More information

1. Classification problems

1. Classification problems Neural and Evolutionary Computing. Lab 1: Classification problems Machine Learning test data repository Weka data mining platform Introduction Scilab 1. Classification problems The main aim of a classification

More information

Analysis of WEKA Data Mining Algorithm REPTree, Simple Cart and RandomTree for Classification of Indian News

Analysis of WEKA Data Mining Algorithm REPTree, Simple Cart and RandomTree for Classification of Indian News Analysis of WEKA Data Mining Algorithm REPTree, Simple Cart and RandomTree for Classification of Indian News Sushilkumar Kalmegh Associate Professor, Department of Computer Science, Sant Gadge Baba Amravati

More information

More Data Mining with Weka

More Data Mining with Weka More Data Mining with Weka Class 5 Lesson 1 Simple neural networks Ian H. Witten Department of Computer Science University of Waikato New Zealand weka.waikato.ac.nz Lesson 5.1: Simple neural networks Class

More information

Data Mining of Web Access Logs

Data Mining of Web Access Logs Data Mining of Web Access Logs A minor thesis submitted in partial fulfilment of the requirements for the degree of Master of Applied Science in Information Technology Anand S. Lalani School of Computer

More information

Data Mining Techniques for Prognosis in Pancreatic Cancer

Data Mining Techniques for Prognosis in Pancreatic Cancer Data Mining Techniques for Prognosis in Pancreatic Cancer by Stuart Floyd A Thesis Submitted to the Faculty of the WORCESTER POLYTECHNIC INSTITUE In partial fulfillment of the requirements for the Degree

More information

Attribution. Modified from Stuart Russell s slides (Berkeley) Parts of the slides are inspired by Dan Klein s lecture material for CS 188 (Berkeley)

Attribution. Modified from Stuart Russell s slides (Berkeley) Parts of the slides are inspired by Dan Klein s lecture material for CS 188 (Berkeley) Machine Learning 1 Attribution Modified from Stuart Russell s slides (Berkeley) Parts of the slides are inspired by Dan Klein s lecture material for CS 188 (Berkeley) 2 Outline Inductive learning Decision

More information

Machine Learning Capacity and Performance Analysis and R

Machine Learning Capacity and Performance Analysis and R Machine Learning and R May 3, 11 30 25 15 10 5 25 15 10 5 30 25 15 10 5 0 2 4 6 8 101214161822 0 2 4 6 8 101214161822 0 2 4 6 8 101214161822 100 80 60 40 100 80 60 40 100 80 60 40 30 25 15 10 5 25 15 10

More information

Steven C.H. Hoi School of Information Systems Singapore Management University Email: chhoi@smu.edu.sg

Steven C.H. Hoi School of Information Systems Singapore Management University Email: chhoi@smu.edu.sg Steven C.H. Hoi School of Information Systems Singapore Management University Email: chhoi@smu.edu.sg Introduction http://stevenhoi.org/ Finance Recommender Systems Cyber Security Machine Learning Visual

More information

Introduction to Machine Learning Using Python. Vikram Kamath

Introduction to Machine Learning Using Python. Vikram Kamath Introduction to Machine Learning Using Python Vikram Kamath Contents: 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. Introduction/Definition Where and Why ML is used Types of Learning Supervised Learning Linear Regression

More information

Azure Machine Learning, SQL Data Mining and R

Azure Machine Learning, SQL Data Mining and R Azure Machine Learning, SQL Data Mining and R Day-by-day Agenda Prerequisites No formal prerequisites. Basic knowledge of SQL Server Data Tools, Excel and any analytical experience helps. Best of all:

More information

Data Mining. Knowledge Discovery, Data Warehousing and Machine Learning Final remarks. Lecturer: JERZY STEFANOWSKI

Data Mining. Knowledge Discovery, Data Warehousing and Machine Learning Final remarks. Lecturer: JERZY STEFANOWSKI Data Mining Knowledge Discovery, Data Warehousing and Machine Learning Final remarks Lecturer: JERZY STEFANOWSKI Email: Jerzy.Stefanowski@cs.put.poznan.pl Data Mining a step in A KDD Process Data mining:

More information

Machine Learning and Statistics: What s the Connection?

Machine Learning and Statistics: What s the Connection? Machine Learning and Statistics: What s the Connection? Institute for Adaptive and Neural Computation School of Informatics, University of Edinburgh, UK August 2006 Outline The roots of machine learning

More information

Management Decision Making. Hadi Hosseini CS 330 David R. Cheriton School of Computer Science University of Waterloo July 14, 2011

Management Decision Making. Hadi Hosseini CS 330 David R. Cheriton School of Computer Science University of Waterloo July 14, 2011 Management Decision Making Hadi Hosseini CS 330 David R. Cheriton School of Computer Science University of Waterloo July 14, 2011 Management decision making Decision making Spreadsheet exercise Data visualization,

More information

Predictive Dynamix Inc

Predictive Dynamix Inc Predictive Modeling Technology Predictive modeling is concerned with analyzing patterns and trends in historical and operational data in order to transform data into actionable decisions. This is accomplished

More information

Scikit-Learn GUI. NETSI Team: Abhilash Nair, Sean Dai, Graham Wright, Rohit Kale. Client: Dr. Olufisayo Omojokun

Scikit-Learn GUI. NETSI Team: Abhilash Nair, Sean Dai, Graham Wright, Rohit Kale. Client: Dr. Olufisayo Omojokun Scikit-Learn GUI NETSI Team: Abhilash Nair, Sean Dai, Graham Wright, Rohit Kale Client: Dr. Olufisayo Omojokun Presentation Overview Introduction to Machine Learning Importance of Machine Learning Feasibility

More information

Role of Neural network in data mining

Role of Neural network in data mining Role of Neural network in data mining Chitranjanjit kaur Associate Prof Guru Nanak College, Sukhchainana Phagwara,(GNDU) Punjab, India Pooja kapoor Associate Prof Swami Sarvanand Group Of Institutes Dinanagar(PTU)

More information

Artificial Neural Networks and Support Vector Machines. CS 486/686: Introduction to Artificial Intelligence

Artificial Neural Networks and Support Vector Machines. CS 486/686: Introduction to Artificial Intelligence Artificial Neural Networks and Support Vector Machines CS 486/686: Introduction to Artificial Intelligence 1 Outline What is a Neural Network? - Perceptron learners - Multi-layer networks What is a Support

More information

COMP3420: Advanced Databases and Data Mining. Classification and prediction: Introduction and Decision Tree Induction

COMP3420: Advanced Databases and Data Mining. Classification and prediction: Introduction and Decision Tree Induction COMP3420: Advanced Databases and Data Mining Classification and prediction: Introduction and Decision Tree Induction Lecture outline Classification versus prediction Classification A two step process Supervised

More information

Machine Learning using MapReduce

Machine Learning using MapReduce Machine Learning using MapReduce What is Machine Learning Machine learning is a subfield of artificial intelligence concerned with techniques that allow computers to improve their outputs based on previous

More information

ANALYTICS IN BIG DATA ERA

ANALYTICS IN BIG DATA ERA ANALYTICS IN BIG DATA ERA ANALYTICS TECHNOLOGY AND ARCHITECTURE TO MANAGE VELOCITY AND VARIETY, DISCOVER RELATIONSHIPS AND CLASSIFY HUGE AMOUNT OF DATA MAURIZIO SALUSTI SAS Copyr i g ht 2012, SAS Ins titut

More information

Comparison of K-means and Backpropagation Data Mining Algorithms

Comparison of K-means and Backpropagation Data Mining Algorithms Comparison of K-means and Backpropagation Data Mining Algorithms Nitu Mathuriya, Dr. Ashish Bansal Abstract Data mining has got more and more mature as a field of basic research in computer science and

More information

Prediction of Cancer Count through Artificial Neural Networks Using Incidence and Mortality Cancer Statistics Dataset for Cancer Control Organizations

Prediction of Cancer Count through Artificial Neural Networks Using Incidence and Mortality Cancer Statistics Dataset for Cancer Control Organizations Using Incidence and Mortality Cancer Statistics Dataset for Cancer Control Organizations Shivam Sidhu 1,, Upendra Kumar Meena 2, Narina Thakur 3 1,2 Department of CSE, Student, Bharati Vidyapeeth s College

More information

BIG DATA What it is and how to use?

BIG DATA What it is and how to use? BIG DATA What it is and how to use? Lauri Ilison, PhD Data Scientist 21.11.2014 Big Data definition? There is no clear definition for BIG DATA BIG DATA is more of a concept than precise term 1 21.11.14

More information

A Content based Spam Filtering Using Optical Back Propagation Technique

A Content based Spam Filtering Using Optical Back Propagation Technique A Content based Spam Filtering Using Optical Back Propagation Technique Sarab M. Hameed 1, Noor Alhuda J. Mohammed 2 Department of Computer Science, College of Science, University of Baghdad - Iraq ABSTRACT

More information

Using Artificial Intelligence and Machine Learning Techniques. Some Preliminary Ideas. Presentation to CWiPP 1/8/2013 ICOSS Mark Tomlinson

Using Artificial Intelligence and Machine Learning Techniques. Some Preliminary Ideas. Presentation to CWiPP 1/8/2013 ICOSS Mark Tomlinson Using Artificial Intelligence and Machine Learning Techniques. Some Preliminary Ideas. Presentation to CWiPP 1/8/2013 ICOSS Mark Tomlinson Artificial Intelligence Models Very experimental, but timely?

More information

Reference Books. Data Mining. Supervised vs. Unsupervised Learning. Classification: Definition. Classification k-nearest neighbors

Reference Books. Data Mining. Supervised vs. Unsupervised Learning. Classification: Definition. Classification k-nearest neighbors Classification k-nearest neighbors Data Mining Dr. Engin YILDIZTEPE Reference Books Han, J., Kamber, M., Pei, J., (2011). Data Mining: Concepts and Techniques. Third edition. San Francisco: Morgan Kaufmann

More information

The KDD Process: Applying Data Mining

The KDD Process: Applying Data Mining The KDD Process: Applying Nuno Cavalheiro Marques (nmm@di.fct.unl.pt) Spring Semester 2010/2011 MSc in Computer Science Outline I 1 Knowledge Discovery in Data beyond the Computer 2 by Visualization Lift

More information

DATA MINING TOOL FOR INTEGRATED COMPLAINT MANAGEMENT SYSTEM WEKA 3.6.7

DATA MINING TOOL FOR INTEGRATED COMPLAINT MANAGEMENT SYSTEM WEKA 3.6.7 DATA MINING TOOL FOR INTEGRATED COMPLAINT MANAGEMENT SYSTEM WEKA 3.6.7 UNDER THE GUIDANCE Dr. N.P. DHAVALE, DGM, INFINET Department SUBMITTED TO INSTITUTE FOR DEVELOPMENT AND RESEARCH IN BANKING TECHNOLOGY

More information

Rule based Classification of BSE Stock Data with Data Mining

Rule based Classification of BSE Stock Data with Data Mining International Journal of Information Sciences and Application. ISSN 0974-2255 Volume 4, Number 1 (2012), pp. 1-9 International Research Publication House http://www.irphouse.com Rule based Classification

More information

Bayesian Machine Learning (ML): Modeling And Inference in Big Data. Zhuhua Cai Google, Rice University caizhua@gmail.com

Bayesian Machine Learning (ML): Modeling And Inference in Big Data. Zhuhua Cai Google, Rice University caizhua@gmail.com Bayesian Machine Learning (ML): Modeling And Inference in Big Data Zhuhua Cai Google Rice University caizhua@gmail.com 1 Syllabus Bayesian ML Concepts (Today) Bayesian ML on MapReduce (Next morning) Bayesian

More information

A Comparative Analysis of Classification Techniques on Categorical Data in Data Mining

A Comparative Analysis of Classification Techniques on Categorical Data in Data Mining A Comparative Analysis of Classification Techniques on Categorical Data in Data Mining Sakshi Department Of Computer Science And Engineering United College of Engineering & Research Naini Allahabad sakshikashyap09@gmail.com

More information

Predictive Data modeling for health care: Comparative performance study of different prediction models

Predictive Data modeling for health care: Comparative performance study of different prediction models Predictive Data modeling for health care: Comparative performance study of different prediction models Shivanand Hiremath hiremat.nitie@gmail.com National Institute of Industrial Engineering (NITIE) Vihar

More information

Data Mining. 1 Introduction 2 Data Mining methods. Alfred Holl Data Mining 1

Data Mining. 1 Introduction 2 Data Mining methods. Alfred Holl Data Mining 1 Data Mining 1 Introduction 2 Data Mining methods Alfred Holl Data Mining 1 1 Introduction 1.1 Motivation 1.2 Goals and problems 1.3 Definitions 1.4 Roots 1.5 Data Mining process 1.6 Epistemological constraints

More information

Data quality in Accounting Information Systems

Data quality in Accounting Information Systems Data quality in Accounting Information Systems Comparing Several Data Mining Techniques Erjon Zoto Department of Statistics and Applied Informatics Faculty of Economy, University of Tirana Tirana, Albania

More information

First Semester Computer Science Students Academic Performances Analysis by Using Data Mining Classification Algorithms

First Semester Computer Science Students Academic Performances Analysis by Using Data Mining Classification Algorithms First Semester Computer Science Students Academic Performances Analysis by Using Data Mining Classification Algorithms Azwa Abdul Aziz, Nor Hafieza IsmailandFadhilah Ahmad Faculty Informatics & Computing

More information

EFFICIENCY OF DECISION TREES IN PREDICTING STUDENT S ACADEMIC PERFORMANCE

EFFICIENCY OF DECISION TREES IN PREDICTING STUDENT S ACADEMIC PERFORMANCE EFFICIENCY OF DECISION TREES IN PREDICTING STUDENT S ACADEMIC PERFORMANCE S. Anupama Kumar 1 and Dr. Vijayalakshmi M.N 2 1 Research Scholar, PRIST University, 1 Assistant Professor, Dept of M.C.A. 2 Associate

More information

Improving spam mail filtering using classification algorithms with discretization Filter

Improving spam mail filtering using classification algorithms with discretization Filter International Association of Scientific Innovation and Research (IASIR) (An Association Unifying the Sciences, Engineering, and Applied Research) International Journal of Emerging Technologies in Computational

More information