Machine Learning. Goals. Does it work? Notes


Machine learning

Introduction

All the techniques that we have seen so far allow us to build intelligent systems. The limitation of these systems is that they can only solve the problems they were programmed for. We should consider a system intelligent only if it is also able to observe its environment and learn from it. Real intelligence resides in adaptation: being able to integrate new knowledge, to solve new problems, and to learn from mistakes.

(LSI-FIB-UPC) Artificial Intelligence Term 2009/

Goals

The goal is not to model human learning. The goal is to overcome the limitations of the usual AI applications (KBS, planning, NLP, problem solving, ...):
- Their limit lies in the knowledge that they have.
- Their capabilities cannot reach beyond that limit.

It is not possible to foresee all possible problems from the beginning. We are looking for programs that can adapt without being reprogrammed.

Does it work?

"Does Machine Learning Really Work?", Tom Mitchell, AI Magazine, 1997.

Where can machine learning be applied, and for what?
- Tasks that are very difficult to program (face recognition, voice recognition, ...)
- Adaptable applications (intelligent interfaces, spam filters, recommendation systems, ...)
- Data mining (intelligent data analysis)

Types of machine learning

- Inductive learning: models are built by generalizing from examples. We look for patterns that explain the common characteristics of the examples.
- Deductive learning: deduction is applied to obtain generalizations from a solved example and its explanation.
- Genetic learning: algorithms inspired by the theory of evolution are applied to find general descriptions of groups of examples.
- Connectionist learning: generalization is performed by the adaptation mechanisms of artificial neural networks.

Inductive learning

- It is the area with the largest number of methods.
- Goal: to discover general rules or concepts from a limited set of examples (common patterns).
- It is based on the search for similar characteristics among examples.
- All its methods are based on inductive reasoning.

Inductive reasoning vs. deductive reasoning

Inductive reasoning:
- It obtains general knowledge from specific information.
- The knowledge obtained is new.
- It is not truth-preserving (new information can invalidate the knowledge obtained).
- It has no well-founded theory.

Deductive reasoning:
- It obtains general knowledge from general knowledge.
- The knowledge is not new (it is implicit in the initial knowledge).
- New knowledge cannot invalidate the knowledge already obtained.
- Its basis is mathematical logic.

Inductive learning

- From a formal point of view, its results are invalid.
- We assume that a limited number of examples represents the characteristics of the concept that we want to learn.
- A single counterexample invalidates the results.
- But most human learning is inductive!

Learning as search (I)

- The usual way to view inductive learning is as a search problem.
- The goal is to discover a function or representation that summarizes the characteristics of a set of examples.
- The search space is the set of all possible concepts that can be built.
- There are different ways to perform the search.

Learning as search (II)

- Search space: the language used to describe the concepts, that is, the set of concepts that can be described by the language.
- Search operators: heuristic operators that allow us to explore the space of concepts.
- Heuristic function: a preference function that guides the search (bias).

Types of inductive learning: supervised inductive learning

- Each example is labeled with the concept it belongs to.
- Learning is performed by contrast among concepts.
- A set of heuristics allows us to generate different hypotheses.
- A preference criterion (bias) allows us to choose the most suitable hypothesis for the examples.
- Result: the concept or concepts that best describe the examples.

Types of inductive learning: unsupervised inductive learning

- Examples are not labeled.
- We want to discover a suitable way to cluster the objects.
- Learning is based on the discovery of similarity/dissimilarity among examples.
- A heuristic preference criterion guides the search.
- Result: a partition of the examples and a characterization of the partitions.

Decision trees

- We can learn a concept as the set of questions that allows us to distinguish it from others.
- Using a tree as the representation formalism, we can store and organize these questions.
- Each node of the tree is a question about an attribute.
- The search space is the set of all possible trees of questions.
- This representation is equivalent to a DNF ($2^{2^n}$).
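To see why this space is too large to search exhaustively, the count can be unpacked as follows. This is only an illustration under the assumption that the examples are described by $n$ binary attributes; the slide itself gives just the figure $2^{2^n}$.

$$
\underbrace{2^{n}}_{\text{distinct examples}} \;\Longrightarrow\; \underbrace{2^{2^{n}}}_{\text{distinct concepts}}, \qquad n = 3:\ 2^{8} = 256, \qquad n = 5:\ 2^{32} = 4\,294\,967\,296 .
$$

Each concept labels every one of the $2^n$ possible examples as positive or negative, so the number of concepts doubles with every additional example that can be described.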

Decision trees: bias

- To reduce the computational cost of searching this space we must choose a bias (what kind of concepts are preferred).
- Decision: the tree that gives the minimal description of the goal concept, given a set of examples.
- Reason: such a tree will be the best at predicting new instances (the probability that unnecessary conditions appear is reduced).
- Occam's razor: the hypothesis that introduces the fewest assumptions and postulates the fewest entities is to be preferred.

Algorithms for decision trees

- One of the first algorithms for building decision trees is ID3 (Quinlan 1986).
- It belongs to the family of algorithms for Top-Down Induction of Decision Trees (TDIDT).
- ID3 performs a search in the space of decision trees using a hill-climbing strategy.
- For each level of the tree an attribute is chosen, and the set of examples is split using the values of that attribute. This process is repeated recursively for each partition.
- The selection of the attribute is performed using a heuristic function.

Information theory

Information theory studies, among other things, the coding of messages and the cost of their transmission. If we define a set of messages $M = \{m_1, m_2, \ldots, m_n\}$, each one with probability $P(m_i)$, we can define the quantity of information ($I$) contained in $M$ as:

$$I(M) = -\sum_{i=1}^{n} P(m_i) \log(P(m_i))$$

This value can be interpreted as the information needed to discriminate among the messages of $M$ (the number of bits necessary to code the messages).
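The formula can be tried out with a few lines of Python. This is a minimal sketch, not part of the original slides: the function name `quantity_of_information` is made up for illustration, the logarithm is taken in base 2 so that the result is in bits, and terms with probability 0 are skipped (the usual convention that 0 log 0 = 0).

```python
import math


def quantity_of_information(probabilities):
    """Return I(M) = -sum(P(m_i) * log2(P(m_i))), measured in bits.

    Messages with probability 0 are skipped (convention: 0 * log 0 = 0).
    """
    return sum(-p * math.log2(p) for p in probabilities if p > 0)


# Four equally likely messages need 2 bits each.
uniform = quantity_of_information([0.25, 0.25, 0.25, 0.25])
# A skewed distribution needs fewer bits on average.
skewed = quantity_of_information([0.5, 0.25, 0.125, 0.125])
# A single certain message carries no information.
certain = quantity_of_information([1.0])

print(uniform, skewed, certain)
```

The more uneven the distribution, the lower the value, which is why ID3 prefers splits that leave subsets dominated by one class.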

Quantity of information as heuristic

- We can use an analogy with message coding, assuming that the classes are the messages and that the proportion of examples of each class is their probability.
- A decision tree can be seen as the coding that allows us to discriminate among classes.
- We are looking for the minimal code that discriminates among classes.
- Each attribute is evaluated to decide whether it is part of the code.
- An attribute is better than another if it discriminates better among classes.
- At each level of the tree we have to find the attribute that minimizes the code (that is, minimizes the size of the tree).
- That attribute is the one that leaves the least quantity of information to be covered by the other attributes.
- The choice of an attribute should result in subsets of examples that are biased towards one class.
- We need a measure of the quantity of information not covered by an attribute (entropy, $E$).

Information gain

Quantity of information ($X$: examples, $C$: classification):

$$I(X, C) = -\sum_{c_i \in C} \frac{|c_i|}{|X|} \log\left(\frac{|c_i|}{|X|}\right)$$

Entropy ($A$: attribute, $[A(x) = v_i]$: examples with value $v_i$):

$$E(X, A, C) = \sum_{v_i \in A} \frac{|[A(x) = v_i]|}{|X|} \, I([A(x) = v_i], C)$$

Information gain:

$$G(X, A, C) = I(X, C) - E(X, A, C)$$
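The three measures translate almost directly into code. The following is a minimal sketch rather than a reference implementation: the function names, the dictionary-based example format and the toy data are assumptions made for illustration, and logarithms are taken in base 2.

```python
import math
from collections import Counter


def information(labels):
    """I(X, C): information, in bits, of the class distribution in `labels`."""
    total = len(labels)
    return sum(-(n / total) * math.log2(n / total)
               for n in Counter(labels).values())


def entropy(examples, labels, attribute):
    """E(X, A, C): information left after splitting on `attribute`,
    with each subset weighted by its share of the examples."""
    subsets = {}
    for example, label in zip(examples, labels):
        subsets.setdefault(example[attribute], []).append(label)
    return sum(len(s) / len(labels) * information(s)
               for s in subsets.values())


def gain(examples, labels, attribute):
    """G(X, A, C) = I(X, C) - E(X, A, C)."""
    return information(labels) - entropy(examples, labels, attribute)


# Hypothetical toy data: "shape" separates the classes perfectly,
# "color" does not separate them at all.
examples = [
    {"shape": "round", "color": "red"},
    {"shape": "round", "color": "blue"},
    {"shape": "square", "color": "red"},
    {"shape": "square", "color": "blue"},
]
labels = ["+", "+", "-", "-"]

print(gain(examples, labels, "shape"))  # 1.0
print(gain(examples, labels, "color"))  # 0.0
```

An attribute whose values produce pure subsets recovers all of the information (gain equal to I), while an attribute that leaves every subset as mixed as the original set has a gain of zero.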

Information gain

[Figure: the set of examples X, with quantity of information I(X, C), is split by the values A = v1, A = v2, ... of an attribute A; the subsets leave an entropy E(X, A, C), and G(X, A, C) = I(X, C) - E(X, A, C).]

ID3 algorithm

Algorithm: ID3 (X: Examples, C: Classification, A: Attributes)
    if all examples are from the same class then
        return a leaf with the class name
    else
        Compute the quantity of information of the examples (I)
        foreach attribute in A do
            Compute the entropy (E) and the information gain (G)
        Pick the attribute that maximizes G (a)
        Delete a from the list of attributes (A)
        Generate a root node for the attribute a
        foreach partition generated by the values of the attribute a do
            Tree_i = ID3(X(a = v_i), C(a = v_i), A - a)
            generate a new branch with a = v_i and Tree_i
        return the root node for a

Example (1)

Consider the following set of examples:

Ex.  Eyes   Hair    Height  Class
1    Blue   Blonde  Tall    +
2    Blue   Dark    Medium  +
3    Brown  Dark    Medium  -
4    Green  Dark    Medium  -
5    Green  Dark    Tall    +
6    Brown  Dark    Small   -
7    Green  Blonde  Small   -
8    Blue   Dark    Medium  +
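The pseudocode and the example table can be put together in a short Python sketch. It follows the ID3 steps above, but the data layout (one dictionary per example), the tuple-based tree representation and the majority-class fallback when the attributes run out are assumptions added for illustration; the pseudocode does not cover that last case.

```python
import math
from collections import Counter


def information(labels):
    """I(X, C) in bits for a list of class labels."""
    total = len(labels)
    return sum(-(n / total) * math.log2(n / total)
               for n in Counter(labels).values())


def split(examples, attribute):
    """Group the examples by their value for `attribute`."""
    partitions = {}
    for example in examples:
        partitions.setdefault(example[attribute], []).append(example)
    return partitions


def gain(examples, attribute, target):
    """G(X, A, C) = I(X, C) - E(X, A, C)."""
    remainder = sum(
        len(part) / len(examples) * information([e[target] for e in part])
        for part in split(examples, attribute).values()
    )
    return information([e[target] for e in examples]) - remainder


def id3(examples, attributes, target="class"):
    """Return a class label (leaf) or a pair (attribute, {value: subtree})."""
    labels = [e[target] for e in examples]
    if len(set(labels)) == 1:
        return labels[0]
    if not attributes:
        # Assumption, not in the pseudocode: fall back to the majority class.
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: gain(examples, a, target))
    remaining = [a for a in attributes if a != best]
    return (best, {value: id3(part, remaining, target)
                   for value, part in split(examples, best).items()})


# The eight examples from the table above.
rows = [
    ("blue", "blonde", "tall", "+"), ("blue", "dark", "medium", "+"),
    ("brown", "dark", "medium", "-"), ("green", "dark", "medium", "-"),
    ("green", "dark", "tall", "+"), ("brown", "dark", "small", "-"),
    ("green", "blonde", "small", "-"), ("blue", "dark", "medium", "+"),
]
examples = [dict(zip(("eyes", "hair", "height", "class"), row)) for row in rows]

tree = id3(examples, ["eyes", "hair", "height"])
print(tree)
```

Because the attribute is chosen greedily and never revisited, this is a hill-climbing search: it is fast, but nothing guarantees that the resulting tree is the smallest one consistent with the examples.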

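Example (2)

Logarithms are in base 2, and $0 \log(0)$ is taken to be 0.

$$I(X, C) = -\tfrac{1}{2}\log\left(\tfrac{1}{2}\right) - \tfrac{1}{2}\log\left(\tfrac{1}{2}\right) = 1$$

$$
\begin{aligned}
E(X, \text{eyes}) ={}& \text{(blue)}\ \tfrac{3}{8}\left(-1\log(1) - 0\log(0)\right) \\
&+ \text{(brown)}\ \tfrac{2}{8}\left(-1\log(1) - 0\log(0)\right) \\
&+ \text{(green)}\ \tfrac{3}{8}\left(-\tfrac{1}{3}\log\left(\tfrac{1}{3}\right) - \tfrac{2}{3}\log\left(\tfrac{2}{3}\right)\right) = 0.344
\end{aligned}
$$

$$
\begin{aligned}
E(X, \text{hair}) ={}& \text{(blonde)}\ \tfrac{2}{8}\left(-\tfrac{1}{2}\log\left(\tfrac{1}{2}\right) - \tfrac{1}{2}\log\left(\tfrac{1}{2}\right)\right) \\
&+ \text{(dark)}\ \tfrac{6}{8}\left(-\tfrac{1}{2}\log\left(\tfrac{1}{2}\right) - \tfrac{1}{2}\log\left(\tfrac{1}{2}\right)\right) = 1
\end{aligned}
$$

$$
\begin{aligned}
E(X, \text{height}) ={}& \text{(tall)}\ \tfrac{2}{8}\left(-1\log(1) - 0\log(0)\right) \\
&+ \text{(medium)}\ \tfrac{4}{8}\left(-\tfrac{1}{2}\log\left(\tfrac{1}{2}\right) - \tfrac{1}{2}\log\left(\tfrac{1}{2}\right)\right) \\
&+ \text{(small)}\ \tfrac{2}{8}\left(-0\log(0) - 1\log(1)\right) = 0.5
\end{aligned}
$$

Example (3)

We can see that the attribute eyes is the one that maximizes the function:

$$G(X, \text{eyes}) = 1 - 0.344 = 0.656$$
$$G(X, \text{hair}) = 1 - 1 = 0$$
$$G(X, \text{height}) = 1 - 0.5 = 0.5$$

Example (4)

This attribute generates the first level of the tree:

EYES
    BLUE:  examples 1, 2, 8 (+)
    BROWN: examples 3, 6 (-)
    GREEN: examples 4, 7 (-) and 5 (+)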
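The arithmetic of this first level can be checked mechanically. The sketch below is an illustration only: it re-enters the eight examples as tuples (eyes, hair, height, class), uses base-2 logarithms, and should reproduce the entropies and gains above up to rounding.

```python
import math
from collections import Counter, defaultdict

# The eight examples: (eyes, hair, height, class).
ROWS = [
    ("blue", "blonde", "tall", "+"), ("blue", "dark", "medium", "+"),
    ("brown", "dark", "medium", "-"), ("green", "dark", "medium", "-"),
    ("green", "dark", "tall", "+"), ("brown", "dark", "small", "-"),
    ("green", "blonde", "small", "-"), ("blue", "dark", "medium", "+"),
]
COLUMNS = {"eyes": 0, "hair": 1, "height": 2}


def information(labels):
    """I(X, C) in bits; classes that do not occur contribute nothing."""
    total = len(labels)
    return sum(-(n / total) * math.log2(n / total)
               for n in Counter(labels).values())


def entropy(rows, column):
    """E(X, A, C) for the attribute stored in `column`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[column]].append(row[-1])
    return sum(len(g) / len(rows) * information(g) for g in groups.values())


total = information([row[-1] for row in ROWS])
results = {}
for name, column in COLUMNS.items():
    e = entropy(ROWS, column)
    results[name] = (e, total - e)
    print(f"{name:<6} E = {e:.3f}  G = {total - e:.3f}")
```

Skipping absent classes in `information` is what implements the 0 log(0) = 0 convention used in the hand calculation.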

Example (5)

Now only the node corresponding to the value green contains a mix of classes, so we repeat the process with these examples.

Ex.  Hair    Height  Class
4    Dark    Medium  -
5    Dark    Tall    +
7    Blonde  Small   -

Example (6)

$$I(X, C) = -\tfrac{1}{3}\log\left(\tfrac{1}{3}\right) - \tfrac{2}{3}\log\left(\tfrac{2}{3}\right) = 0.918$$

$$
\begin{aligned}
E(X, \text{hair}) ={}& \text{(blonde)}\ \tfrac{1}{3}\left(-0\log(0) - 1\log(1)\right) \\
&+ \text{(dark)}\ \tfrac{2}{3}\left(-\tfrac{1}{2}\log\left(\tfrac{1}{2}\right) - \tfrac{1}{2}\log\left(\tfrac{1}{2}\right)\right) = 0.666
\end{aligned}
$$

$$
\begin{aligned}
E(X, \text{height}) ={}& \text{(tall)}\ \tfrac{1}{3}\left(-0\log(0) - 1\log(1)\right) \\
&+ \text{(medium)}\ \tfrac{1}{3}\left(-1\log(1) - 0\log(0)\right) \\
&+ \text{(small)}\ \tfrac{1}{3}\left(-0\log(0) - 1\log(1)\right) = 0
\end{aligned}
$$

Example (7)

Now the attribute with the maximum value is height:

$$G(X, \text{hair}) = 0.918 - 0.666 = 0.252$$
$$G(X, \text{height}) = 0.918 - 0 = 0.918$$

Example (8)

The resulting tree discriminates the classes completely:

EYES
    BLUE:  + (examples 1, 2, 8)
    BROWN: - (examples 3, 6)
    GREEN: HEIGHT
        TALL:   + (example 5)
        MEDIUM: - (example 4)
        SMALL:  - (example 7)
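Once built, the tree is used by answering its questions from the root down. The sketch below is illustrative only: the nested-dictionary encoding and the `classify` helper are assumptions, and the leaf labels under HEIGHT are the classes of examples 5, 4 and 7 in the example table.

```python
# The final tree as nested dictionaries: an inner node maps one attribute
# name to {value: subtree}; a leaf is just the class label.
TREE = {
    "eyes": {
        "blue": "+",
        "brown": "-",
        "green": {"height": {"tall": "+", "medium": "-", "small": "-"}},
    }
}


def classify(tree, example):
    """Follow the branch matching the example's values until a leaf is reached."""
    while isinstance(tree, dict):
        attribute, branches = next(iter(tree.items()))
        # Raises KeyError for an attribute value never seen during training.
        tree = branches[example[attribute]]
    return tree


# An unseen combination: green eyes, blonde hair, tall.
print(classify(TREE, {"eyes": "green", "hair": "blonde", "height": "tall"}))  # +
# Hair is never consulted, because the tree does not need it.
print(classify(TREE, {"eyes": "brown", "hair": "blonde", "height": "tall"}))  # -
```

Note that hair does not appear anywhere in the tree: its gain was 0 at the root and lower than that of height in the green branch, so the minimal description of the concept leaves it out.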


More information

Decision Trees. Andrew W. Moore Professor School of Computer Science Carnegie Mellon University. www.cs.cmu.edu/~awm awm@cs.cmu.

Decision Trees. Andrew W. Moore Professor School of Computer Science Carnegie Mellon University. www.cs.cmu.edu/~awm awm@cs.cmu. Decision Trees Andrew W. Moore Professor School of Computer Science Carnegie Mellon University www.cs.cmu.edu/~awm awm@cs.cmu.edu 42-268-7599 Copyright Andrew W. Moore Slide Decision Trees Decision trees

More information

TIETS34 Seminar: Data Mining on Biometric identification

TIETS34 Seminar: Data Mining on Biometric identification TIETS34 Seminar: Data Mining on Biometric identification Youming Zhang Computer Science, School of Information Sciences, 33014 University of Tampere, Finland Youming.Zhang@uta.fi Course Description Content

More information

CS Master Level Courses and Areas COURSE DESCRIPTIONS. CSCI 521 Real-Time Systems. CSCI 522 High Performance Computing

CS Master Level Courses and Areas COURSE DESCRIPTIONS. CSCI 521 Real-Time Systems. CSCI 522 High Performance Computing CS Master Level Courses and Areas The graduate courses offered may change over time, in response to new developments in computer science and the interests of faculty and students; the list of graduate

More information

DATA MINING USING INTEGRATION OF CLUSTERING AND DECISION TREE

DATA MINING USING INTEGRATION OF CLUSTERING AND DECISION TREE DATA MINING USING INTEGRATION OF CLUSTERING AND DECISION TREE 1 K.Murugan, 2 P.Varalakshmi, 3 R.Nandha Kumar, 4 S.Boobalan 1 Teaching Fellow, Department of Computer Technology, Anna University 2 Assistant

More information

Classification algorithm in Data mining: An Overview

Classification algorithm in Data mining: An Overview Classification algorithm in Data mining: An Overview S.Neelamegam #1, Dr.E.Ramaraj *2 #1 M.phil Scholar, Department of Computer Science and Engineering, Alagappa University, Karaikudi. *2 Professor, Department

More information

A STUDY ON DATA MINING INVESTIGATING ITS METHODS, APPROACHES AND APPLICATIONS

A STUDY ON DATA MINING INVESTIGATING ITS METHODS, APPROACHES AND APPLICATIONS A STUDY ON DATA MINING INVESTIGATING ITS METHODS, APPROACHES AND APPLICATIONS Mrs. Jyoti Nawade 1, Dr. Balaji D 2, Mr. Pravin Nawade 3 1 Lecturer, JSPM S Bhivrabai Sawant Polytechnic, Pune (India) 2 Assistant

More information

Data Mining for Business Analytics

Data Mining for Business Analytics Data Mining for Business Analytics Lecture 2: Introduction to Predictive Modeling Stern School of Business New York University Spring 2014 MegaTelCo: Predicting Customer Churn You just landed a great analytical

More information

A Study of Detecting Credit Card Delinquencies with Data Mining using Decision Tree Model

A Study of Detecting Credit Card Delinquencies with Data Mining using Decision Tree Model A Study of Detecting Credit Card Delinquencies with Data Mining using Decision Tree Model ABSTRACT Mrs. Arpana Bharani* Mrs. Mohini Rao** Consumer credit is one of the necessary processes but lending bears

More information

Knowledge Discovery and Data Mining. Structured vs. Non-Structured Data

Knowledge Discovery and Data Mining. Structured vs. Non-Structured Data Knowledge Discovery and Data Mining Unit # 2 1 Structured vs. Non-Structured Data Most business databases contain structured data consisting of well-defined fields with numeric or alphanumeric values.

More information

An Introduction to Advanced Analytics and Data Mining

An Introduction to Advanced Analytics and Data Mining An Introduction to Advanced Analytics and Data Mining Dr Barry Leventhal Henry Stewart Briefing on Marketing Analytics 19 th November 2010 Agenda What are Advanced Analytics and Data Mining? The toolkit

More information

DATA MINING METHODS WITH TREES

DATA MINING METHODS WITH TREES DATA MINING METHODS WITH TREES Marta Žambochová 1. Introduction The contemporary world is characterized by the explosion of an enormous volume of data deposited into databases. Sharp competition contributes

More information

Foundations of Artificial Intelligence. Introduction to Data Mining

Foundations of Artificial Intelligence. Introduction to Data Mining Foundations of Artificial Intelligence Introduction to Data Mining Objectives Data Mining Introduce a range of data mining techniques used in AI systems including : Neural networks Decision trees Present

More information

A Systemic Artificial Intelligence (AI) Approach to Difficult Text Analytics Tasks

A Systemic Artificial Intelligence (AI) Approach to Difficult Text Analytics Tasks A Systemic Artificial Intelligence (AI) Approach to Difficult Text Analytics Tasks Text Analytics World, Boston, 2013 Lars Hard, CTO Agenda Difficult text analytics tasks Feature extraction Bio-inspired

More information

Using Artificial Intelligence to Manage Big Data for Litigation

Using Artificial Intelligence to Manage Big Data for Litigation FEBRUARY 3 5, 2015 / THE HILTON NEW YORK Using Artificial Intelligence to Manage Big Data for Litigation Understanding Artificial Intelligence to Make better decisions Improve the process Allay the fear

More information

Knowledge-based systems and the need for learning

Knowledge-based systems and the need for learning Knowledge-based systems and the need for learning The implementation of a knowledge-based system can be quite difficult. Furthermore, the process of reasoning with that knowledge can be quite slow. This

More information

Supervised and unsupervised learning - 1

Supervised and unsupervised learning - 1 Chapter 3 Supervised and unsupervised learning - 1 3.1 Introduction The science of learning plays a key role in the field of statistics, data mining, artificial intelligence, intersecting with areas in

More information

Decision-Tree Learning

Decision-Tree Learning Decision-Tree Learning Introduction ID3 Attribute selection Entropy, Information, Information Gain Gain Ratio C4.5 Decision Trees TDIDT: Top-Down Induction of Decision Trees Numeric Values Missing Values

More information

FUNDAMENTALS OF MACHINE LEARNING FOR PREDICTIVE DATA ANALYTICS Algorithms, Worked Examples, and Case Studies

FUNDAMENTALS OF MACHINE LEARNING FOR PREDICTIVE DATA ANALYTICS Algorithms, Worked Examples, and Case Studies FreeChapter InformationBasedLearning 2015/6/12 18:02 Page i #1 FUNDAMENTALS OF MACHINE LEARNING FOR PREDICTIVE DATA ANALYTICS Algorithms, Worked Examples, and Case Studies John D. Kelleher Brian Mac Namee

More information

Information Management course

Information Management course Università degli Studi di Milano Master Degree in Computer Science Information Management course Teacher: Alberto Ceselli Lecture 01 : 06/10/2015 Practical informations: Teacher: Alberto Ceselli (alberto.ceselli@unimi.it)

More information

Machine Learning and Statistics: What s the Connection?

Machine Learning and Statistics: What s the Connection? Machine Learning and Statistics: What s the Connection? Institute for Adaptive and Neural Computation School of Informatics, University of Edinburgh, UK August 2006 Outline The roots of machine learning

More information

Self-Improving Supply Chains

Self-Improving Supply Chains Self-Improving Supply Chains Cyrus Hadavi Ph.D. Adexa, Inc. All Rights Reserved January 4, 2016 Self-Improving Supply Chains Imagine a world where supply chain planning systems can mold themselves into

More information

A Partially Supervised Metric Multidimensional Scaling Algorithm for Textual Data Visualization

A Partially Supervised Metric Multidimensional Scaling Algorithm for Textual Data Visualization A Partially Supervised Metric Multidimensional Scaling Algorithm for Textual Data Visualization Ángela Blanco Universidad Pontificia de Salamanca ablancogo@upsa.es Spain Manuel Martín-Merino Universidad

More information

Introduction to Data Mining Techniques

Introduction to Data Mining Techniques Introduction to Data Mining Techniques Dr. Rajni Jain 1 Introduction The last decade has experienced a revolution in information availability and exchange via the internet. In the same spirit, more and

More information

Data Mining Algorithms Part 1. Dejan Sarka

Data Mining Algorithms Part 1. Dejan Sarka Data Mining Algorithms Part 1 Dejan Sarka Join the conversation on Twitter: @DevWeek #DW2015 Instructor Bio Dejan Sarka (dsarka@solidq.com) 30 years of experience SQL Server MVP, MCT, 13 books 7+ courses

More information

8. Machine Learning Applied Artificial Intelligence

8. Machine Learning Applied Artificial Intelligence 8. Machine Learning Applied Artificial Intelligence Prof. Dr. Bernhard Humm Faculty of Computer Science Hochschule Darmstadt University of Applied Sciences 1 Retrospective Natural Language Processing Name

More information

Implementation of Data Mining Techniques for Weather Report Guidance for Ships Using Global Positioning System

Implementation of Data Mining Techniques for Weather Report Guidance for Ships Using Global Positioning System International Journal Of Computational Engineering Research (ijceronline.com) Vol. 3 Issue. 3 Implementation of Data Mining Techniques for Weather Report Guidance for Ships Using Global Positioning System

More information

DATA MINING TECHNIQUES AND APPLICATIONS

DATA MINING TECHNIQUES AND APPLICATIONS DATA MINING TECHNIQUES AND APPLICATIONS Mrs. Bharati M. Ramageri, Lecturer Modern Institute of Information Technology and Research, Department of Computer Application, Yamunanagar, Nigdi Pune, Maharashtra,

More information

How To Perform An Ensemble Analysis

How To Perform An Ensemble Analysis Charu C. Aggarwal IBM T J Watson Research Center Yorktown, NY 10598 Outlier Ensembles Keynote, Outlier Detection and Description Workshop, 2013 Based on the ACM SIGKDD Explorations Position Paper: Outlier

More information

Data Mining for Fun and Profit

Data Mining for Fun and Profit Data Mining for Fun and Profit Data mining is the extraction of implicit, previously unknown, and potentially useful information from data. - Ian H. Witten, Data Mining: Practical Machine Learning Tools

More information

Introduction. A. Bellaachia Page: 1

Introduction. A. Bellaachia Page: 1 Introduction 1. Objectives... 3 2. What is Data Mining?... 4 3. Knowledge Discovery Process... 5 4. KD Process Example... 7 5. Typical Data Mining Architecture... 8 6. Database vs. Data Mining... 9 7.

More information

Data quality in Accounting Information Systems

Data quality in Accounting Information Systems Data quality in Accounting Information Systems Comparing Several Data Mining Techniques Erjon Zoto Department of Statistics and Applied Informatics Faculty of Economy, University of Tirana Tirana, Albania

More information

Attribution. Modified from Stuart Russell s slides (Berkeley) Parts of the slides are inspired by Dan Klein s lecture material for CS 188 (Berkeley)

Attribution. Modified from Stuart Russell s slides (Berkeley) Parts of the slides are inspired by Dan Klein s lecture material for CS 188 (Berkeley) Machine Learning 1 Attribution Modified from Stuart Russell s slides (Berkeley) Parts of the slides are inspired by Dan Klein s lecture material for CS 188 (Berkeley) 2 Outline Inductive learning Decision

More information

Statistics for BIG data

Statistics for BIG data Statistics for BIG data Statistics for Big Data: Are Statisticians Ready? Dennis Lin Department of Statistics The Pennsylvania State University John Jordan and Dennis K.J. Lin (ICSA-Bulletine 2014) Before

More information

203.4770: Introduction to Machine Learning Dr. Rita Osadchy

203.4770: Introduction to Machine Learning Dr. Rita Osadchy 203.4770: Introduction to Machine Learning Dr. Rita Osadchy 1 Outline 1. About the Course 2. What is Machine Learning? 3. Types of problems and Situations 4. ML Example 2 About the course Course Homepage:

More information

Efficient Integration of Data Mining Techniques in Database Management Systems

Efficient Integration of Data Mining Techniques in Database Management Systems Efficient Integration of Data Mining Techniques in Database Management Systems Fadila Bentayeb Jérôme Darmont Cédric Udréa ERIC, University of Lyon 2 5 avenue Pierre Mendès-France 69676 Bron Cedex France

More information

Indian Agriculture Land through Decision Tree in Data Mining

Indian Agriculture Land through Decision Tree in Data Mining Indian Agriculture Land through Decision Tree in Data Mining Kamlesh Kumar Joshi, M.Tech(Pursuing 4 th Sem) Laxmi Narain College of Technology, Indore (M.P) India k3g.kamlesh@gmail.com 9926523514 Pawan

More information

ANALYSIS OF FEATURE SELECTION WITH CLASSFICATION: BREAST CANCER DATASETS

ANALYSIS OF FEATURE SELECTION WITH CLASSFICATION: BREAST CANCER DATASETS ANALYSIS OF FEATURE SELECTION WITH CLASSFICATION: BREAST CANCER DATASETS Abstract D.Lavanya * Department of Computer Science, Sri Padmavathi Mahila University Tirupati, Andhra Pradesh, 517501, India lav_dlr@yahoo.com

More information