Data Mining An introduction

1 Data Mining: An introduction. Devert Alexandre, School of Software Engineering of USTC. 13 February 2012.

2 Table of Contents

3 Purpose: Data mining is looking for data inside data.

4 Purpose: But what's the point of looking for data in data?

5 Purpose: Data mining is looking for small, meaningful data inside a lot of raw data.

6 Table of Contents

7 Dataset: A dataset is a lump of data, usually without much structure.

8 Old Faithful: Old Faithful is a geyser located in Yellowstone National Park, Wyoming, in the United States.

9 Old Faithful: A geyser can teach us a lot about what's going on underground.

10 Old Faithful: Geologists observe the geyser's activity: (1) eruption duration, (2) time since the previous eruption, (3) geyser height. [Figure: plots of duration, interval, and height]

11 Old Faithful: A quick look shows us that Old Faithful is not random. [Figure: plot with axes interval and duration]

12 Old Faithful: A quick look shows us that Old Faithful is not random. [Figure: plot with axes height and interval]

13 Old Faithful: A quick look shows us that Old Faithful is not random. [Figure: plot with axes height and duration]

14 Old Faithful: Geologists would like to know: (1) How can we sum up all these data? (2) Can we learn something new? (3) Can we predict the eruptions? (4) Can we detect anomalies?

15 Planet discovery: Since 1992, astronomers have found direct evidence of planets around other stars. As of 4 February 2012, there were 758 known extrasolar planets around 707 stars.

16 Planet discovery: One way to find planets is the transit method. It works very well!

17 Planet discovery: On 7 March 2009, the Kepler space observatory was launched and put into orbit around the Sun. Kepler applies the transit method to stars.

18 Planet discovery: Kepler's data are not easy to analyse: stars' luminosity is variable; there is no such thing as a perfect sensor; the useful signal level is close to the noise level.

19 Planet discovery: Typical extract of Kepler's data. [Figure: plot of Kepler data]

20 Planet discovery: Kepler returns a lot of data: 100 Gb/month. It would take years of work to look at all the data. The rate of false detections is high. Confirming a planet candidate is expensive.

21 On-demand streaming media: One very popular use of the Internet is watching movies.

22 On-demand streaming media: You have tens of thousands of movies. You have millions of users. How do you recommend to each user movies they will like?

23 On-demand streaming media: So important that the company Netflix offered $1 million to solve that problem.

24 On-demand streaming media: Too much data to search by hand: 78 million past recommendations to analyse. What are the different kinds of users? What factors change users' preferences?

25 More applications: More examples? Financial planning and prediction; discovery of molecules for new drugs; monitoring of large networks; factory monitoring; market studies; ...

26 Table of Contents

27 Problem categories: Several types of problems have been identified: clustering, classification, regression, dimension reduction.

28 Clustering: Putting the elements of a dataset into groups of related elements.

29 Clustering: Clustering tries to find the number of different groups of similar data, and which data belong to which group. There are many clustering algorithms.
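
To make the idea of clustering concrete, here is a minimal sketch of one of the many clustering algorithms, k-means. The slides do not name a specific algorithm, so the choice of k-means, the sample points, and the function name `kmeans` are assumptions made for illustration. Note that k-means only answers "which data belong to which group": the number of groups `k` has to be supplied by the caller.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain k-means on a list of (x, y) tuples; k is chosen by the caller."""
    # Deterministic start (an arbitrary choice): k points spread through the list.
    step = len(points) // k
    centers = [points[i * step] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster ends up empty
                centers[c] = (sum(x for x, _ in members) / len(members),
                              sum(y for _, y in members) / len(members))
    return centers, labels

# Made-up data: two visibly separate groups of points.
points = [(1.0, 1.1), (0.9, 1.0), (1.2, 0.8),
          (5.0, 5.2), (5.1, 4.9), (4.8, 5.0)]
centers, labels = kmeans(points, k=2)
print(labels)   # group index of each point
print(centers)  # one representative point per group
```

Real uses would also have to pick `k`, for example by trying several values and comparing the results.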

30 Applications: Finding groups is a very popular data-mining application: finding groups of customers, automatic suggestion, data fusion, picture segmentation, data compression.

31 Data fusion: Tags are very helpful for searching information, but there are many tags for the same or similar things.

32 Data fusion: Clustering the tags makes the tagging system even more useful.

33 Picture segmentation: Take the colors of a picture and cluster them. Each pixel belongs to a cluster. A cheap and effective processing step for object recognition!

34 Picture segmentation: (1) Take many pictures of faces. (2) Take their colors. (3) Compute clusters of colors. (4) Keep the clusters containing skin-like colors.

35 Picture segmentation: Instead of using just the pixels' color, use local orientation (Gabor filters), Fourier coefficients, ...

36 Picture segmentation

37 Data compression: Find groups of related colors to reduce the number of colors. [Figure: the same picture at 24 bits/pixel, 4 bits/pixel, and 4 bits/pixel dithered]

38 Data compression: Find groups of related colors to reduce the number of colors. [Figure: the same picture at 24 bits/pixel, 4 bits/pixel, and 4 bits/pixel dithered]
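
The color-reduction idea can be sketched as follows. This is only an illustration under stated assumptions: the palette below is hand-picked and hypothetical, whereas in practice it would come from clustering the picture's own colors, and the helper name `nearest` is made up. Each pixel is replaced by the index of its closest palette color, so a 24-bit color becomes a 4-bit index (4 bits allow up to 16 palette entries).

```python
def nearest(color, palette):
    """Index of the palette entry closest to an (r, g, b) color."""
    return min(range(len(palette)),
               key=lambda i: sum((a - b) ** 2
                                 for a, b in zip(color, palette[i])))

# Hypothetical palette standing in for cluster centers found in the picture.
palette = [(0, 0, 0), (255, 255, 255), (200, 30, 30), (30, 30, 200)]
# Made-up pixels of a tiny picture.
pixels = [(10, 5, 0), (250, 250, 240), (190, 40, 35),
          (20, 40, 210), (255, 255, 255)]

indices = [nearest(p, palette) for p in pixels]   # what gets stored
restored = [palette[i] for i in indices]          # what gets displayed

raw_bits = 24 * len(pixels)      # 24 bits/pixel, no palette needed
index_bits = 4 * len(pixels)     # 4 bits/pixel, plus the palette table
print(indices, raw_bits, index_bits)
```

The palette table itself also has to be stored, so the saving only pays off when the picture has many more pixels than palette entries. The scheme is lossy: `restored` is close to `pixels` but not identical.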

39 Classification: You have separated your dataset into 2 or more groups. A classifier tells which group new, incoming data instances belong to. The classifier is built from existing example data. We call all of this classification.

40 Classification: Note the (huge) differences with clustering: we don't search for groups in the data; the groups are already defined; there is a learning step that builds the classifier; there are data that are not from the dataset, and those are what we classify.

41 Example: the example dataset: all the dots; the groups, or classes: blue and yellow; the classifier: the red line; classification: the blue side or the yellow side of the line.

42 Classification: Different algorithms give different information: some say which group the data belong to; others give the probability of belonging to a given group.

43 Classification: Different algorithms work differently: classifier built example by example: iterative or online algorithm; classifier ...
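
As an illustration of a classifier built example by example, here is a minimal sketch of the perceptron, a classic online algorithm that learns a separating line. The slides do not name this algorithm; the choice, the sample data, and the names `train_perceptron` and `classify` are assumptions. Labels are +1 and -1, standing in for the two sides of the line.

```python
def train_perceptron(examples, epochs=10, rate=1.0):
    """examples: list of ((x, y), label) pairs, with label +1 or -1."""
    w = [0.0, 0.0]  # orientation of the line
    b = 0.0         # offset of the line
    for _ in range(epochs):
        for (x, y), label in examples:
            score = w[0] * x + w[1] * y + b
            if label * score <= 0:  # wrong side of the line: nudge the line
                w[0] += rate * label * x
                w[1] += rate * label * y
                b += rate * label
    return w, b

def classify(w, b, point):
    """Tell on which side of the learned line a new point falls."""
    return 1 if w[0] * point[0] + w[1] * point[1] + b > 0 else -1

# Made-up training examples: two already-defined groups.
examples = [((1.0, 1.0), -1), ((1.5, 0.5), -1), ((0.5, 1.5), -1),
            ((4.0, 4.0), 1), ((4.5, 3.5), 1), ((3.5, 4.5), 1)]
w, b = train_perceptron(examples)

# New data, not from the dataset: this is what we classify.
print(classify(w, b, (0.0, 0.0)), classify(w, b, (5.0, 5.0)))
```

This sketch only says which group a point belongs to; it does not give a probability. It also assumes the two groups can be separated by a straight line.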

44 Applications: Obviously, automatic recognition: finding objects in pictures, speech recognition, optical character recognition, biometric identification, document classification, ...

45 Models: Let's say you have a model that produces data. A model can be: a simulation of the system you get data from; equations for something you observe; a relation between some variables of your data.

46 Models: A model to predict the luminosity of a star: size, mass, and energy of the star; number of planets; distance, speed, mass, and size of a planet.

47 Regression: Regression tries to accomplish the following goal: tuning a model so that the model gives the best explanation of some dataset you have.

48 Example: Model for the data: y = ax + b. Dots: regression data. Red line: tuned model.
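
Tuning the model y = ax + b can be sketched with ordinary least squares, which picks the a and b that minimize the sum of squared vertical distances between the dots and the line. The slides do not say which tuning method is used, so least squares is an assumption here, and the data and the name `fit_line` are made up.

```python
def fit_line(xs, ys):
    """Least-squares estimate of a and b in y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

# Made-up regression data: the dots.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.0]

a, b = fit_line(xs, ys)  # the tuned model: the line
print(a, b)
print(a * 5.0 + b)       # the tuned model evaluated at a new x
```

Two coefficients summarize five dots here; whether that summary is good enough depends on how far the dots sit from the line.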

49 Example: Model for the data: y = ax^2 + bx + c. Dots: regression data. Red line: tuned model.

50 Regression: Different algorithms give different information: some find a tuning of the model that matches the data; others say how well a given tuning of the model matches the data.

51 Regression: Regression does not give explanations of data. With enough parameters, a model can reproduce almost any data. A wrong model might be able to generate the data you used for the regression.
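
The warning about wrong models can be illustrated with a small sketch. The data below are made up and roughly follow y = x. A polynomial with as many parameters as data points (built here by Lagrange interpolation, a choice made for this example, as is the name `lagrange`) reproduces the data exactly, yet behaves very differently from the underlying trend outside the data.

```python
def lagrange(xs, ys):
    """Return the polynomial of degree len(xs) - 1 through all the points."""
    def poly(t):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (t - xj) / (xi - xj)
            total += term
        return total
    return poly

# Made-up data: roughly y = x, with small measurement noise.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.1, 0.9, 2.2, 2.8, 4.1]

wrong_model = lagrange(xs, ys)  # 5 parameters for 5 data points

# The wrong model "explains" the regression data perfectly...
fit_errors = [abs(wrong_model(x) - y) for x, y in zip(xs, ys)]
print(max(fit_errors))

# ...but its value at x = 6 is nowhere near the 6 the trend suggests.
print(wrong_model(6.0))
```

A perfect match on the regression data is therefore not evidence that the model is right.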

52 Applications: Prediction, automatic recognition, data compression, anomaly detection.

53 Prediction: If you trust your model: tune your model with past data, then generate future data with the tuned model.

54 Data compression: If your model is smaller than the dataset you used for the regression and is accurate enough, you have a lossy compression scheme!

55 Data compression: 100 dots, but 2 coefficients might be good enough to sum up the data.

56 Anomaly detection & automatic recognition: With some families of models: tune your model to match normal data. Some models can tell how likely some data are to have been generated by the model.
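
One family of models that can say how likely some data are is the Gaussian (normal) distribution. The sketch below tunes it on made-up "normal" readings and flags values the model considers unlikely. The choice of a Gaussian, the three-standard-deviation threshold, and the function names are all assumptions made for illustration.

```python
import math

def fit_gaussian(values):
    """Tune the model: estimate mean and standard deviation."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    return mean, math.sqrt(variance)

def log_likelihood(x, mean, std):
    """Log of the Gaussian density at x: higher means more plausible."""
    return (-0.5 * math.log(2 * math.pi * std ** 2)
            - (x - mean) ** 2 / (2 * std ** 2))

# Made-up readings recorded while everything was normal.
normal_data = [70, 72, 68, 71, 69, 73, 70, 67]
mean, std = fit_gaussian(normal_data)

# Assumed rule: anything less likely than a point 3 std away is an anomaly.
threshold = log_likelihood(mean + 3 * std, mean, std)

def is_anomaly(x):
    return log_likelihood(x, mean, std) < threshold

print(is_anomaly(71), is_anomaly(90))
```

The threshold trades missed anomalies against false alarms, so it would need tuning on real data.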

57 Dimension reduction: Raw data are often in unfamiliar spaces with weird geometries.

58 Dimension reduction: Take a space of handwritten letter shapes: allographs. A space with 1000 dimensions, mapped into 3 dimensions.

59 Dimension reduction: How do we define the distance between 2 shapes? How do we build a meaningful map of the shapes? Where would a new shape be on the map?

60 Dimension reduction: Take the Netflix movie database. How do we define the distance between 2 movies? How do we build a meaningful map of the movies? Where would a new movie be on the map?

61 Dimension reduction: Dimension reduction tries to accomplish the following goal: mapping a dataset into a low-dimensional space such that related data are close and less related data are far apart.
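
As a small illustration of this goal, the sketch below maps 2-dimensional points onto 1 dimension with principal component analysis (PCA), a linear technique that keeps the direction along which the data vary the most. The slides do not name PCA, so its use here is an assumption, as are the sample data and the function names. The closed-form angle used below only applies to the 2-dimensional case.

```python
import math

def principal_axis(points):
    """Center of the points and unit vector of their largest variance (2-D only)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    # Orientation of the main axis of a 2x2 covariance matrix.
    angle = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return (mx, my), (math.cos(angle), math.sin(angle))

def reduce_to_1d(points):
    """Map each (x, y) point to a single coordinate along the main axis."""
    (mx, my), (ux, uy) = principal_axis(points)
    return [(x - mx) * ux + (y - my) * uy for x, y in points]

# Made-up data lying close to a straight line in the plane.
points = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.9), (5.0, 5.1)]
coords = reduce_to_1d(points)
print(coords)  # neighbours in the plane stay neighbours on the line
```

PCA cannot unfold curved or folded datasets; those call for non-linear techniques.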

62 Applications: Dimension reduction is often used as a pre-processing step: data visualization, automatic recognition, data compression.

63 Data visualization: Dimension reduction helps to simplify data while preserving their meaning. Many algorithms would fail on the folded dataset but work well on the unfolded version.

64 Automatic recognition: Use dimension reduction on complex data, then clustering to find groups.

65 Data compression: Dimension reduction techniques can build codebooks. You can approximate each of the 550 members of the Turkish parliament by combining those faces!

66 Table of Contents

67 Real-world data-mining: Real-world data-mining never fits perfectly into one problem category.

68 Real-world data-mining: Real-world data-mining is a blend of tweaked versions of standard algorithms.

69 Real-world data-mining: My goal: giving you the basics to understand real-world data-mining techniques.

The Scientific Data Mining Process

The Scientific Data Mining Process Chapter 4 The Scientific Data Mining Process When I use a word, Humpty Dumpty said, in rather a scornful tone, it means just what I choose it to mean neither more nor less. Lewis Carroll [87, p. 214] In

More information

Introducing Machine Learning

Introducing Machine Learning Introducing Machine Learning What is Machine Learning? Machine learning teaches computers to do what comes naturally to humans and animals: learn from experience. Machine learning algorithms use computational

More information

DATA SCIENCE CONSULTING GIVE YOUR DATA MEANING

DATA SCIENCE CONSULTING GIVE YOUR DATA MEANING DATA SCIENCE CONSULTING GIVE YOUR DATA MEANING GIVE YOUR DATA MEANING: WITH DATA SCIENCE CONSULTING! Comma Data Science Consulting supports in optimizing business challenges with stateof-the-art methods

More information

The Search for Other Earths. Dave Meyer

The Search for Other Earths. Dave Meyer The Search for Other Earths Dave Meyer The distance to Mars (when it is closest to Earth) is 4 light-minutes The Rover evidence is consistent with an ancient north Martian sea (larger than the Great

More information

From Data to next best action, using Predictive Analytics SPSS MODELER

From Data to next best action, using Predictive Analytics SPSS MODELER From Data to next best action, using Predictive Analytics SPSS MODELER Agenda Introduction to Predictive Analytics and Data Mining IBM SPSS Modeler Work Bench Data Preparation and Data Understanding Automated

More information

Hand-drawn Digital Logic Circuit Component Recognition using SVM

Hand-drawn Digital Logic Circuit Component Recognition using SVM Hand-drawn Digital Logic Circuit Component Recognition using SVM Mayuri D. Patare Post Graduate Student Jawaharlal Nehru Engineering College, Aurangabad, Maharashtra, India. Madhuri S. Joshi, PhD Professor

More information

Introduction to Pattern Recognition

Introduction to Pattern Recognition Introduction to Pattern Recognition Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Spring 2009 CS 551, Spring 2009 c 2009, Selim Aksoy (Bilkent University)

More information

CHAPTER 5 SENDER AUTHENTICATION USING FACE BIOMETRICS

CHAPTER 5 SENDER AUTHENTICATION USING FACE BIOMETRICS 74 CHAPTER 5 SENDER AUTHENTICATION USING FACE BIOMETRICS 5.1 INTRODUCTION Face recognition has become very popular in recent years, and is used in many biometric-based security systems. Face recognition

More information

Predicting Short-Range Displacements From Sensor Data

Predicting Short-Range Displacements From Sensor Data Predicting Short-Range Displacements From Sensor Data 1 Introduction Maurice Shih and Jun-Ting Hsieh With the advent of Internet of Things (IoT) and mobile devices, a key question is, how much can we learn

More information

Data, Measurements, Features

Data, Measurements, Features Data, Measurements, Features Middle East Technical University Dep. of Computer Engineering 2009 compiled by V. Atalay What do you think of when someone says Data? We might abstract the idea that data are

More information

2/3/2009. Color. Today! Sensing Color Coding color systems Models of Reflectance Applications

2/3/2009. Color. Today! Sensing Color Coding color systems Models of Reflectance Applications Color Today! Sensing Color Coding color systems Models of Reflectance Applications 1 Color Complexity Many theories, measurement techniques, and standards for colors, yet no one theory of human color perception

More information

Introduction to Data Mining

Introduction to Data Mining Introduction to Data Mining 1 Why Data Mining? Explosive Growth of Data Data collection and data availability Automated data collection tools, Internet, smartphones, Major sources of abundant data Business:

More information

In this presentation, you will be introduced to data mining and the relationship with meaningful use.

In this presentation, you will be introduced to data mining and the relationship with meaningful use. In this presentation, you will be introduced to data mining and the relationship with meaningful use. Data mining refers to the art and science of intelligent data analysis. It is the application of machine

More information

AN EFFICIENT PREPROCESSING AND POSTPROCESSING TECHNIQUES IN DATA MINING

AN EFFICIENT PREPROCESSING AND POSTPROCESSING TECHNIQUES IN DATA MINING INTERNATIONAL JOURNAL OF RESEARCH IN COMPUTER APPLICATIONS AND ROBOTICS ISSN 2320-7345 AN EFFICIENT PREPROCESSING AND POSTPROCESSING TECHNIQUES IN DATA MINING R.Tamilselvi 1, B.Sivasakthi 2, R.Kavitha

More information

Social Media Mining. Data Mining Essentials

Social Media Mining. Data Mining Essentials Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers

More information

How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning

How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning How to use Big Data in Industry 4.0 implementations LAURI ILISON, PhD Head of Big Data and Machine Learning Big Data definition? Big Data is about structured vs unstructured data Big Data is about Volume

More information

Concepts in Machine Learning, Unsupervised Learning & Astronomy Applications

Concepts in Machine Learning, Unsupervised Learning & Astronomy Applications Data Mining In Modern Astronomy Sky Surveys: Concepts in Machine Learning, Unsupervised Learning & Astronomy Applications Ching-Wa Yip cwyip@pha.jhu.edu; Bloomberg 518 Human are Great Pattern Recognizers

More information

An Overview of Knowledge Discovery Database and Data mining Techniques

An Overview of Knowledge Discovery Database and Data mining Techniques An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,

More information

Data Mining for Manufacturing: Preventive Maintenance, Failure Prediction, Quality Control

Data Mining for Manufacturing: Preventive Maintenance, Failure Prediction, Quality Control Data Mining for Manufacturing: Preventive Maintenance, Failure Prediction, Quality Control Andre BERGMANN Salzgitter Mannesmann Forschung GmbH; Duisburg, Germany Phone: +49 203 9993154, Fax: +49 203 9993234;

More information

Extrasolar Planet Detection

Extrasolar Planet Detection Extrasolar Planet Detection 1 Introduction: As of February 20, 2013, 861 exoplanets planets that orbit stars other than our own Sun are known to exist. Additionally, at least 128 stars are known to have

More information

Layout Based Visualization Techniques for Multi Dimensional Data

Layout Based Visualization Techniques for Multi Dimensional Data Layout Based Visualization Techniques for Multi Dimensional Data Wim de Leeuw Robert van Liere Center for Mathematics and Computer Science, CWI Amsterdam, the Netherlands wimc,robertl @cwi.nl October 27,

More information

USING GREY WOLF OPTIMIZER FOR IMAGE REGISTRATION

USING GREY WOLF OPTIMIZER FOR IMAGE REGISTRATION USING GREY WOLF OPTIMIZER FOR IMAGE REGISTRATION Pranjali Rathee 1, Ritu Garg 2, Sonal Meena 3 1,2,3 Department of Computer Science Engineering, Indira Gandhi Delhi Technical University for Women, Kashmere

More information

CHAPTER 1` INTRODUCTION

CHAPTER 1` INTRODUCTION CHAPTER 1` INTRODUCTION 1.1 Introduction In electrical engineering field, image processing is any form of signal processing. For mostly the input is a photograph, which is in picture or video frame; while

More information

PHOTO CLUB MAY 5, 2015 DOTS, PIXELS AND BITS

PHOTO CLUB MAY 5, 2015 DOTS, PIXELS AND BITS PHOTO CLUB MAY 5, 2015 DOTS, PIXELS AND BITS SENSOR SIZE Sensor Size determines the number of Megapixels the camera has. In other words it defines the pixel width and height of the image saved. The actual

More information

Clustering & Visualization

Clustering & Visualization Chapter 5 Clustering & Visualization Clustering in high-dimensional databases is an important problem and there are a number of different clustering paradigms which are applicable to high-dimensional data.

More information

An Automatic License Plate Recognition (ALPR) System. Cpt. eng. Cristian MOLDER Military Technical Academy, Bucharest

An Automatic License Plate Recognition (ALPR) System. Cpt. eng. Cristian MOLDER Military Technical Academy, Bucharest An Automatic License Plate Recognition (ALPR) System Cpt. eng. Cristian MOLDER Military Technical Academy, Bucharest Applications of Automatic License Plate Recognition Parking Access Control Motorway

More information

K-means Clustering Technique on Search Engine Dataset using Data Mining Tool

K-means Clustering Technique on Search Engine Dataset using Data Mining Tool International Journal of Information and Computation Technology. ISSN 0974-2239 Volume 3, Number 6 (2013), pp. 505-510 International Research Publications House http://www. irphouse.com /ijict.htm K-means

More information

I. INTRODUCTION. A. Background

I. INTRODUCTION. A. Background Practical Design of an Automatic License Plate Recognition Using Image Processing Technique Mohammad Zakariya Siam Electrical Engineering Department, ISRA University, Amman-Jordan Abstract: The strategy

More information

Introduction to Data Mining. Chris Clifton Mining of Time Series Data

Introduction to Data Mining. Chris Clifton Mining of Time Series Data Introduction to Data Mining Chris Clifton Mining of Time Series Data Time-series database Mining Time-Series and Sequence Data Consists of sequences of values or events changing with time Data is recorded

More information

TIRE TYPE RECOGNITION THROUGH TREADS PATTERN RECOGNITION AND DOT CODE OCR

TIRE TYPE RECOGNITION THROUGH TREADS PATTERN RECOGNITION AND DOT CODE OCR TIRE TYPE RECOGNITION THROUGH TREADS PATTERN RECOGNITION AND DOT CODE OCR Tasneem Wahdan, Gheith A. Abandah, Alia Seyam, Alaa Awwad Computer Engineering Department The University of Jordan Amman 11942,

More information

Data Mining Practical Machine Learning Tools and Techniques

Data Mining Practical Machine Learning Tools and Techniques Evaluation: the key to success Data Mining Practical Machine Learning Tools and Techniques Slides for Sections 5.1-5.4 Testing and Predicting Performance How predictive is the model we learned? Error on

More information

CPSC 340: Machine Learning and Data Mining. K-Means Clustering Fall 2015

CPSC 340: Machine Learning and Data Mining. K-Means Clustering Fall 2015 CPSC 340: Machine Learning and Data Mining K-Means Clustering Fall 2015 Admin Assignment 1 solutions posted after class. Tutorials for Assignment 2 on Monday. Random Forests Random forests are one of the

More information

Foundations - 2. Periodicity Detection, Time-series Correlation, Burst Detection. Temporal Information Retrieval

Foundations - 2. Periodicity Detection, Time-series Correlation, Burst Detection. Temporal Information Retrieval Foundations - 2 Periodicity Detection, Time-series Correlation, Burst Detection Temporal Information Retrieval Time Series An ordered sequence of values (data points) of variables at equally spaced time

More information

Machine Learning for Data Science (CS4786) Lecture 1

Machine Learning for Data Science (CS4786) Lecture 1 Machine Learning for Data Science (CS4786) Lecture 1 Tu-Th 10:10 to 11:25 AM Hollister B14 Instructors : Lillian Lee and Karthik Sridharan ROUGH DETAILS ABOUT THE COURSE Diagnostic assignment 0 is out:

More information

Knowledge Discovery in Databases. Process Model for KDD

Knowledge Discovery in Databases. Process Model for KDD Knowledge Discovery in Databases Process Model for KDD 1 Characteristics of KDD Interactive Iterative Procedure to extract knowledge from data Knowledge being searched for is implicit previously unknown

More information

Recommendations Worth a Million

Recommendations Worth a Million Recommendations Worth a Million An Introduction to Clustering 15.071x The Analytics Edge Netflix Online DVD rental and streaming video service More than 40 million subscribers worldwide $3.6 billion in

More information

Classification Basic Concepts, Decision Trees, and Model Evaluation

Classification Basic Concepts, Decision Trees, and Model Evaluation Classification Basic Concepts, Decision Trees, and Model Evaluation Jeff Howbert Introduction to Machine Learning Winter 2014 1 Classification definition Given a collection of samples (training set) Each

More information

INTRODUCTION TO MACHINE LEARNING 3RD EDITION

INTRODUCTION TO MACHINE LEARNING 3RD EDITION ETHEM ALPAYDIN The MIT Press, 2014 Lecture Slides for INTRODUCTION TO MACHINE LEARNING 3RD EDITION alpaydin@boun.edu.tr http://www.cmpe.boun.edu.tr/~ethem/i2ml3e CHAPTER 1: INTRODUCTION Big Data 3 Widespread

More information

Predicting Mode of Transport from iphone Accelerometer Data

Predicting Mode of Transport from iphone Accelerometer Data Predicting Mode of Transport from iphone Accelerometer Data Introduction Ben Nham, Kanya Siangliulue, and Serena Yeung In our project, we present a method for offline classification of transportation modes

More information

Nine Common Types of Data Mining Techniques Used in Predictive Analytics

Nine Common Types of Data Mining Techniques Used in Predictive Analytics 1 Nine Common Types of Data Mining Techniques Used in Predictive Analytics By Laura Patterson, President, VisionEdge Marketing Predictive analytics enable you to develop mathematical models to help better

More information

Web Data Mining Trends & Techniques

Web Data Mining Trends & Techniques Web Data Mining Trends & Techniques Authors: Ujwala Manoj Patil & J.B. Patil Publication: August 2012 Team Members : Vishma Shah Pooja Vora Background & Problem Definition Three types of Mining: Data Mining

More information

Introduction to Machine Learning. What is Machine Learning?

Introduction to Machine Learning. What is Machine Learning? Introduction to Machine Learning CS195-5-2003 Thomas Hofmann 2002,2003 Thomas Hofmann CS195-5-2003-01-1 What is Machine Learning? Machine learning deals with the design of computer programs and systems

More information

Name Class Date. Interpreting Clusters and Outliers

Name Class Date. Interpreting Clusters and Outliers Name Class Date 11-2 Linear Best Fit Models Going Deeper Essential question: How can you use a trend line to make a prediction from a scatter plot? A cluster is a set of closely grouped data. Data may

More information

Data Mining Algorithms Part 1. Dejan Sarka

Data Mining Algorithms Part 1. Dejan Sarka Data Mining Algorithms Part 1 Dejan Sarka Join the conversation on Twitter: @DevWeek #DW2015 Instructor Bio Dejan Sarka (dsarka@solidq.com) 30 years of experience SQL Server MVP, MCT, 13 books 7+ courses

More information

Accurate and Cheap Robot Range Finder

Accurate and Cheap Robot Range Finder Accurate and Cheap Robot Range Finder Ivan Papusha December 12, 28 Abstract A novel high-quality distance sensor for robotics applications is proposed. The sensor relies on triangulation with the offset

More information

Lecture Slides for. ETHEM ALPAYDIN The MIT Press,

Lecture Slides for. ETHEM ALPAYDIN The MIT Press, Lecture Slides for ETHEM ALPAYDIN The MIT Press, 2010 alpaydin@boun.edu.tr http://www.cmpe.boun.edu.tr/~ethem/i2ml2e Why Learn? Machine learning is programming computers to optimize a performance criterion

More information

Preserving Class Discriminatory Information by. Context-sensitive Intra-class Clustering Algorithm

Preserving Class Discriminatory Information by. Context-sensitive Intra-class Clustering Algorithm Preserving Class Discriminatory Information by Context-sensitive Intra-class Clustering Algorithm Yingwei Yu, Ricardo Gutierrez-Osuna, and Yoonsuck Choe Department of Computer Science Texas A&M University

More information

The Improved Neural Network Algorithm of License Plate Recognition

The Improved Neural Network Algorithm of License Plate Recognition , pp. 49-54 http://dx.doi.org/10.14257/ijsip.2015.8.5.06 The Improved Neural Network Algorithm of License Plate Recognition Jingwei Dong 1, Meiting Sun 1, Gengrui Liang 2 and Kui Jin 1 1 School of Measure-Control

More information
