Machine Learning and Data Mining. Fundamentals, robotics, recognition



Similar documents
Machine Learning Introduction

Introduction to Data Mining and Machine Learning Techniques. Iza Moise, Evangelos Pournaras, Dirk Helbing

Learning is a very general term denoting the way in which agents:

Data Mining and Exploration. Data Mining and Exploration: Introduction. Relationships between courses. Overview. Course Introduction

Information Management course

Knowledge-based systems and the need for learning

Introduction to Data Mining

not possible or was possible at a high cost for collecting the data.

An Introduction to Data Mining

Example application (1) Telecommunication. Lecture 1: Data Mining Overview and Process. Example application (2) Health

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014

Machine Learning. Chapter 18, 21. Some material adopted from notes by Chuck Dyer

Data Mining for Business Analytics

CS 2750 Machine Learning. Lecture 1. Machine Learning. CS 2750 Machine Learning.

The Scientific Data Mining Process

Data Mining. 1 Introduction 2 Data Mining methods. Alfred Holl Data Mining 1

Nine Common Types of Data Mining Techniques Used in Predictive Analytics

Introduction to Machine Learning Lecture 1. Mehryar Mohri Courant Institute and Google Research

Machine Learning. CS494/594, Fall :10 AM 12:25 PM Claxton 205. Slides adapted (and extended) from: ETHEM ALPAYDIN The MIT Press, 2004

from Larson Text By Susan Miertschin

Data Mining Algorithms Part 1. Dejan Sarka

Introduction to Machine Learning Using Python. Vikram Kamath

Machine Learning: Overview

Data Mining Solutions for the Business Environment

In this presentation, you will be introduced to data mining and the relationship with meaningful use.

Statistics for BIG data

MA2823: Foundations of Machine Learning

A STUDY ON DATA MINING INVESTIGATING ITS METHODS, APPROACHES AND APPLICATIONS

BIOINF 585 Fall 2015 Machine Learning for Systems Biology & Clinical Informatics

Index Contents Page No. Introduction . Data Mining & Knowledge Discovery

: Introduction to Machine Learning Dr. Rita Osadchy

DATA MINING TECHNIQUES AND APPLICATIONS

Data Mining is sometimes referred to as KDD and DM and KDD tend to be used as synonyms

Introduction to Data Mining

Maschinelles Lernen mit MATLAB

Data Mining: Overview. What is Data Mining?

Network Machine Learning Research Group. Intended status: Informational October 19, 2015 Expires: April 21, 2016

Introduction to Data Mining

Machine Learning CS Lecture 01. Razvan C. Bunescu School of Electrical Engineering and Computer Science

Lecture Slides for INTRODUCTION TO. ETHEM ALPAYDIN The MIT Press, Lab Class and literature. Friday, , Harburger Schloßstr.

Machine Learning, Data Mining, and Knowledge Discovery: An Introduction

Data Mining on Social Networks. Dionysios Sotiropoulos Ph.D.

Title. Introduction to Data Mining. Dr Arulsivanathan Naidoo Statistics South Africa. OECD Conference Cape Town 8-10 December 2010.

Lluis Belanche + Alfredo Vellido. Intelligent Data Analysis and Data Mining

Course 395: Machine Learning

Foundations of Artificial Intelligence. Introduction to Data Mining

INTRODUCTION TO MACHINE LEARNING 3RD EDITION

Customer Classification And Prediction Based On Data Mining Technique

DATA MINING TECHNIQUES SUPPORT TO KNOWLEGDE OF BUSINESS INTELLIGENT SYSTEM

An Overview of Database management System, Data warehousing and Data Mining

A Review of Data Mining Techniques

Data Mining for Manufacturing: Preventive Maintenance, Failure Prediction, Quality Control

Specific Usage of Visual Data Analysis Techniques

Chapter 12 Discovering New Knowledge Data Mining

Data Mining Applications in Higher Education

Graduate Co-op Students Information Manual. Department of Computer Science. Faculty of Science. University of Regina

An Overview of Knowledge Discovery Database and Data mining Techniques

ECLT 5810 E-Commerce Data Mining Techniques - Introduction. Prof. Wai Lam

Integrated Data Mining and Knowledge Discovery Techniques in ERP

Predicting required bandwidth for educational institutes using prediction techniques in data mining (Case Study: Qom Payame Noor University)

Perspectives on Data Mining

An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015

Data Mining System, Functionalities and Applications: A Radical Review

Learning outcomes. Knowledge and understanding. Competence and skills

Machine Learning for Data Science (CS4786) Lecture 1

MS1b Statistical Data Mining

Introduction to Artificial Intelligence G51IAI. An Introduction to Data Mining

Data Mining Techniques

Is a Data Scientist the New Quant? Stuart Kozola MathWorks

TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM

IMPROVING DATA INTEGRATION FOR DATA WAREHOUSE: A DATA MINING APPROACH

Software Development Training Camp 1 (0-3) Prerequisite : Program development skill enhancement camp, at least 48 person-hours.

Data Mining Part 5. Prediction

Data Mining and Knowledge Discovery in Databases (KDD) State of the Art. Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland

Data Mining and Machine Learning in Bioinformatics

Process Mining. ^J Springer. Discovery, Conformance and Enhancement of Business Processes. Wil M.R van der Aalst Q UNIVERS1TAT.

Predictive Analytics Techniques: What to Use For Your Big Data. March 26, 2014 Fern Halper, PhD

Data Mining + Business Intelligence. Integration, Design and Implementation

Dynamic Data in terms of Data Mining Streams

Data Mining On Diabetics

Data Mining: An Introduction

Welcome. Data Mining: Updates in Technologies. Xindong Wu. Colorado School of Mines Golden, Colorado 80401, USA

Data Mining for Fun and Profit

Gerard Mc Nulty Systems Optimisation Ltd BA.,B.A.I.,C.Eng.,F.I.E.I

IT and CRM A basic CRM model Data source & gathering system Database system Data warehouse Information delivery system Information users

Machine Learning and Data Analysis overview. Department of Cybernetics, Czech Technical University in Prague.

Comparison of K-means and Backpropagation Data Mining Algorithms

EFFICIENT DATA PRE-PROCESSING FOR DATA MINING

Data Isn't Everything

Data mining and official statistics

Chapter 20: Data Analysis

An Introduction to Advanced Analytics and Data Mining

Obtaining Value from Big Data

Industrial Roadmap for Connected Machines. Sal Spada Research Director ARC Advisory Group

CS Master Level Courses and Areas COURSE DESCRIPTIONS. CSCI 521 Real-Time Systems. CSCI 522 High Performance Computing

PRACTICAL DATA MINING IN A LARGE UTILITY COMPANY

8. Machine Learning Applied Artificial Intelligence

A STUDY OF DATA MINING ACTIVITIES FOR MARKET RESEARCH

Transcription:

Machine Learning and Data Mining Fundamentals, robotics, recognition

Machine Learning, Data Mining, Knowledge Discovery in Data Bases Their mutual relations

Data Mining, Knowledge Discovery in Databases, Machine Learning, & All That Stuff What do these terms mean? Data Mining: Application of algorithms to search for patterns and relationships that may exist in large databases. Because data sets are so large, many relationships are possible. To search this space of possibilities, machine learning techniques are used. Part of the overall knowledge discovery process correct use of term data mining is that it is part of process concerned with finding patterns in data. In industry, data mining is often used for the whole process

Knowledge Discovery in Databases (KDD) Identifying non-trivial, valid, useful patterns in data. What is a pattern? Says something about the probability distribution of variables For example, credit card fraud: Variables may be: amount spent in one day, items purchased, where items where purchased, etc. Prob(fraud lives in Newark and 5 color TV s bought in NYC in 2 hours) = HIGH! What is data (?what are data?): For the purposes of data mining, database rows represent a record or instance columns describe attributes of that instance. like in constructive induction, decision trees, rough sets, etc.

Knowledge discovery usually involves domain expert and data mining analyst. Other steps include gathering, preparing, storing, and accessing data. Also, ultimately understanding the patterns discovered. Steps in Knowledge discovery: (from George Johns) Understand and define problem So that you don t mindless apply inappropriate algorithms and get worthless (or misleading!) results Extract data Data usually already exists in database, but not in the form you need! Also, you need to decide what data you need, this is where expert comes in handy Understand and clean data Make sure you have consistent data Make sure data is not too consistent. (Is what you are trying to predict one of the variables you know?)

Data engineering (perhaps part of understand & clean) Deal with missing variables, rescale data, combine similar attributes Algorithm engineering (this is the machine learning part) Figure out what algorithm to use or write one Run data mining algorithm art and science in this. Use part of data for training, part for testing Adjust parameters Preliminary evaluation of results Refine data and problem Use the results to accomplish some goal.

Machine Learning: This is the algorithm part of the data mining process But machine learning means more: Computer program that improves its performance at some task through experience. - Tom Mitchell (1997 - Machine Learning) A learning system uses sample data to generate an updated basis for improved [performance] on subsequent data from the same source and expresses the new basis in intelligible symbolic form. - Donald Michie (1991 - Computer Journal) Learning denotes changes in the system that are adaptive in the sense that they enable the system to do the same task or tasks drawn from the same population more effectively the next time. Herbert Simon (1983 - Machine Learning I) Last two are from David W. Aha s tutorial on Machine Learning (http://www.aic.nrl.navy.mil/~aha/

What is learning according to Webster? To gain knowledge or understanding of or skill in by study, instruction, or experience To come to be able, to come to realize Robot learning - getting a robot to do all this good stuff

Challenges Specific to Robot Learning Situatedness in the world Noise, imperfection, occlusion, dynamics, hard to model, etc. Real time constraints Slow learners die easily in real world Supervised/Unsupervised mix Often supervised means a bad teacher Simultaneous and multimodal Need to walk and talk (not just either or)

How does this relate to data mining? What we will discuss is really machine learning algorithms Learning task is data driven. For example in checkers learning problem, many board positions are generated. Data is position of pieces and final state of game (win, lose, draw) Learning reduces to searching a large space of hypotheses to find one that best fits observed data. Find patterns or relationships that can be generalized Really the same problem as data mining. Machine learning algorithms deal with how this search is done.

How does this differ from what people have been doing for years? Scientific method: Hypothesis -> Design Experiment -> Gather Data -> Test hypothesis Data mining is more data centric Data may have been gathered as part of an overall process or may be just an accidental by-product. Examples: Data from discount cards at supermarket Credit card information Historical chemical process control data Information on patients gathered by hospitals Visual and chemical information collected by a robot in field

Another important difference: Not only can specific relationships be obtained, but also real knowledge may be deduced by a computer program From observed positions of planets, could Newton s law of gravitation be deduced automatically? Causal inference: Can causal relationships be inferred from statistical data? In fact, data mining techniques (Autoclass) have been use to learn new, useful information from astronomical data. Under certain assumptions, causation can be deduced from statistical data. This is a different type of discovery than determining correctness of model by data fitting.

Let s get more specific - Some common examples of data mining: Credit worthiness: Should a bank make a loan to someone? Data consists of financial attributes Income Debt Credit history Length of time in job Residence Each attribute a set of values (discrete or continuous) You also have a set of instances (cases) These may be used for training (supervised learning) (Sometimes you look for patterns, no training - unsupervised learning) Ultimate goal is to predict the likelihood of default based on a new customer s attributes.

Marketing: Use past buying habits to predict likelihood of customer purchasing some new product Health care Look for causal relationships between environment and disease Investment Predict the stock market? Textual datamining internet Bioinformatics image processing Astronomy Chemistry Speech recognition Machine learning methods applied to signal and image analysis Robotics speech, sensor arrays, images, sensor integration,prediction, recognition

Most data mining methods involve classification and clustering Given a collection of observed attributes (possibly a very high dimensional space), can the value of some target attribute be predicted? Classification: Supervised learning A set of data (instances, attribute values) with known outcome (class) is used to train algorithm New data is classified based upon trained algorithm Clustering: Unsupervised learning Patterns are discovered in data based upon is there significant clustering of cases such that assignment to classes can be made?

Methods: Regression (linear, non-linear, multiple variables) Decision Trees Neural networks Bayesian classifiers Baysian networks Constructive Induction (last lecture) Genetic algorithms Many resources available on Internet A few to start with http://www.kdnuggets.com/ http://www.ai.univie.ac.at/oefai/ml/ml-resources.html http://www.cs.cmu.edu/~tom/ Assignment: Download Enhancements to the Data Mining Process George Johns thesis (now a book as well) Read Chapter 1 What is Data Mining?

Questions and Problems What is your understanding of relations between Machine Learning, Data Mining and Knowledge Discovery in Data Bases Think about interesting data base that can be mined and propose the complete data mining method based on methods that you learned in this class. How data mining can be used in Intelligent Robotics? Propose a robot that will use Knowledge Discovery methods Propose an Internet Robot with Data Mining abilities