# Practical Introduction to Machine Learning and Optimization. Alessio Signorini

Save this PDF as:

Size: px
Start display at page:

## Transcription

1 Practical Introduction to Machine Learning and Optimization Alessio Signorini

2 Everyday's Optimizations Although you may not know, everybody uses daily some sort of optimization technique: Timing your walk to catch a bus Picking the best road to get somewhere Groceries to buy for the week (and where) Organizing flights or vacation Buying something (especially online) Nowadays it is a fundamental tool for almost all corporations (e.g., insurance, groceries, banks,...)

3 Evolution: Successful Optimization Evolution is probably the most successful but least famous optimization system. The strongest species survive while the weakest die. Same goes for reproduction among the same specie. The world is tuning itself. Why do you think some of us are afraid of heights, speed or animals?

4 What is Optimization? Choosing the best element among a set of available alternatives. Sometimes it is sufficient to choose an element that is good enough. Seeking to minimize or maximize a real function by systematically choosing the values of real or integer variables from within an allowed set. First technique (Steepest Descent) invented by Gauss. Linear Programming invented in 1940s.

5 Various Flavors of Optimization I will mostly talk about heuristic techniques (return approximate results), but optimization has many subfields, as for example: Linear/Integer Programming Quadratic/Nonlinear Programming Stochastic/Robust Programming Constraint Satisfaction Heuristics Algorithms Called programming due to US Military programs

6 Machine Learning Automatically learn to recognize complex patterns and make intelligent decisions based on data. Today machine learning has lots of uses: Search Engines Speech and Handwriting Recognition Credit Cards Fraud Detection Computer Vision and Face Recognition Medical Diagnosis

7 Problems Types In a search engine, machine learning tasks can be generally divided in three main groups: Classification or Clustering Divide queries or pages in known groups or groups learned from the data. Examples: adult, news, sports,... Regression Learn to approximate an existing function. Examples: pulse of a page, stock prices,... Ranking Not interested in function value but to relative importance of items. Examples: pages or images ranking,...

8 Algorithms Taxonomy Algorithms for machine learning can be broadly subdivided between: Supervised Learning (e.g., classification) Unsupervised Learning (e.g., clustering) Reinforcement Learning (e.g., driving) Other approaches exists (e.g., semi-supervised learning, transduction learning, ) but the ones above are the most practical ones.

9 Whatever You Do, Get Lots of Data Whatever is the machine learning task, you need three fundamental things: Lots of clean input/example data Good selection of meaningful features A clear goal function (or good approximation) If you have those, there is hope for you. Now you just have to select the appropriate learning method and parameters.

10 Classification Divide objects among a set of known classes. You basically want to assign labels. Simple examples are: Categorize News Articles: sports, politics, Identify Adult or Spam pages Identify the Language: EN, IT, EL,... Features can be: words for text, genes for DNA, time/place/amount for credit cards transactions,...

11 Classification: naïve Bayes Commonly used everywhere, especially in spam filtering. For text classification it is technically a bad choice because it assumes words independence. During training it calculates a statistical model for words and categories. At classification time it uses those statistics to estimate the probability of each category.

12 Classification: DBACL Available under GPL at To train a category given some text use dbacl -l sport.bin sport.txt To classify unknown text use dbacl -U -c sport.bin -c politic.bin article.txt OUTPUT: sport.bin 100% To get negative logarithm of probabilities use dbacl -n -c sport.bin -c politic.bin article.txt OUTPUT: sport.bin politic.bin

13 Classification: Hierarchy When the categories are more than 5 or 6 do not attempt to classify against all of them. Instead, create a hierarchy. For example, first classify among sports and politics, if sports is chosen, then classify among basketball, soccer or golf. Pay attention: a logical hierarchy is not always the best for the classifier. For example, Nascar should go with Autos/Trucks and not sports.

14 Classification: Other Approaches There are many other approaches: Latent Semantic Indexing Neural Networks, Decision Trees Support Vector Machines And many other tools/libraries: Mallet LibSVM Classifier4J To implement, remember: log(x*y) = log(x) + log(y)

15 Clustering The objective of clustering is similar to classification but the labels are not know and need to be learned from the data. For example, you may want to cluster together all the news around the same topic, or similar results after a search. It is very useful in medicine/biology to find nonobvious groups or patterns among items, but also for sites like Pandora or Amazon.

16 Clustering: K-Means Probably the simplest and most famous clustering method. Works reasonably well and is usually fast. Requires to know at priori the number of clusters (i.e., not good for news or results). Define distance measure among items. Euclidean distance sqrt(sum[(pi-qi)^2]) is often a simple option. Not guaranteed to converge to best solution.

17 Clustering: Lloyd's Algorithm Each cluster has a centroid, which is usually the average of its elements. At startup: Partition randomly the objects in N clusters. At each iteration: Recompute centroid for each cluster. Assign each item to closest cluster. Stop after M iterations or when no changes.

18 Clustering: Lloyd's Algorithm Desired Cluster: 3 Items: 1,1,1,3,4,5,6,9,11 Random Centroids: 2, 5.4, 6.2 Iteration1: (2, 5.4, 6.2) [1,1,1,3] [4,5] [6,9,11] Iteration2: (1.5, 4.5, 8.6) [1,1,1] [3,4,5,6] [9,11] Iteration3: (1, 4.5, 10) [1,1,1] [3,4,5,6] [9,11]

19 Clustering: Lloyd's Algorithm Since it is very sensitive to startup assignments, it is sometimes useful to restart multiple times. When cluster numbers is not known but in a certain range, you can execute the algorithm for different N values and pick best solution. Software Available: Apache Mahout Mathlab kmeans

20 Clustering: Min-Hashing Simple and fast algorithm: 1) Create hash (e.g., MD5) of each word 2) Signature = smallest N hashes Example: Similar to what OneRiot has done with its own... 23ce4c f ecb 3ea e5e0 0f ce4c ea e5e0 7562ecb... The signature can be used directly as ID of the cluster. Or consider results as similar if there is a good overlap among signatures.

21 Decision Trees Decision trees are predictive models that map observations to conclusions on its target output. PRODUCT CEO G B BOARD G B B G OK TOBIAS Y N FAIL COMPETITOR Y N OK FAIL FAIL OK

22 Decision Trees After enough examples, it is possible to calculate the frequency of hitting each leaf. CEO G B PRODUCT BOARD G B B G OK 30% TOBIAS Y N FAIL 10% COMPETITOR Y N OK FAIL 30% 10% FAIL OK 10% 10%

23 Decision Trees From the frequencies, it is possible to extrapolate early results in nodes and make decisions early. CEO G B PRODUCT BOARD G B B G OK TOBIAS FAIL COMPETITOR 30% Y N 10% Y N CEO G B OK=60% OK=10% PRODUCT BOARD G B B G OK=30% OK=10% OK TOBIAS FAIL COMPETITOR 30% Y N 10% Y N OK FAIL 30% 10% FAIL OK 10% 10% OK FAIL 30% 10% FAIL OK 10% 10%

24 Decision Trees: Information Gain Most of the algorithms are based on Information Gain, a concept related to the Entropy of Information Theory. At each step, for each variable V left, compute Vi = ( -Pi * log(pi) ) + ( -Ni * log(ni) ) where Pi is the fraction of items labeled positive for variable Vi (e.g., CEO = Good) and Ni is the fraction labeled negative (e.g., CEO = Bad).

25 Decision Trees: C4.5 Available at To train create names and data file GOLF.names Play, Don't Play. outlook: temperature: Humidity: Windy: Then launch c4.5 -t 4 -f GOLF sunny, overcast, rain. continuous. continuous. true, false. GOLF.data sunny, 85, 85, false, Don't Play sunny, 80, 90, true, Don't Play overcast, 83, 78, false, Play rain, 70, 96, false, Play rain, 65, 70, true, Don't Play overcast, 64, 65, true, Play

26 Decision Trees: C4.5 Output Cycle Tree -----Cases Errors size window other window rate other rate total rate % % % % % % % % % % 0 0.0% 0 0.0% outlook = overcast: Play outlook = sunny: humidity <= 80 : Play humidity > 80 : Don't Play outlook = rain: windy = true: Don't Play windy = false: Play Trial Before Pruning After Pruning Size Errors Size Errors Estimate 0 8 0( 0.0%) 8 0( 0.0%) (38.5%) << 1 8 0( 0.0%) 8 0( 0.0%) (38.5%)

27 Support Vector Machine SVM can be used for classification, regression and ranking optimization. It is flexible and usually fast. Attempts to construct a set of hyperplanes which have the largest distance from the closest datapoint of each class. The explanation for regression it is even more complicated. I will skip it here but there are plenty of papers available on the web.

28 SVM: svm-light Available at To train an SVM model create data file RANKING 3 qid:1 1:0.53 2:0.12 3: qid:1 1:0.13 2:0.1 3: qid:1 1:0.27 2:0.5 3: qid:2 1: : qid:2 1:0.87 2:0.12 3:0.45 REGRESSION 1.4 1:0.53 2:0.12 3: :0.13 2:0.1 3: :0.27 2:0.5 3: : : :0.87 2:0.12 3:0.45 Then launch svm_learn pulse.data pulse.model

29 SVM: Other Tools Available There are hundreds of libraries for SVM: LibSVM Algorithm::SVM PyML TinySVM There are executables built on top of most of them and they usually accept the same input format of svm-light. May need a script to extract features importance.

30 Genetic Algorithms Genetic Algorithms are flexible, simple to implement and can be quickly adapted to lots of optimization tasks. They are based on evolutionary biology: strongest species survive, weakest die, offsprings are similar to parents but may have random differences. This kind of algorithms is used wherever there are lots of variables and values and approximated solutions are acceptable (e.g., protein folding).

31 GA: The Basic Algorithm Startup: Create N random solutions Algorithm: 1) compute fitness of solutions 2) breed M new solutions 3) kill M weakest solutions Repeat the algorithm for K iterations or until there is no improvement.

32 GA: The Basic Algorithm During breeding (step 2) remember to follow biology rules: Better individuals are more likely to find a (good) mate Offsprings carry genes from each parent There is always the possibility of some random genetic mutations

33 GA: Relevance Example Each solution is a set of weights for the various attributes (e.g., title weights, content weight,...). The fitness of each solution is given by the delta with editorial judgments (e.g., DCG). During breeding, you may take title and content weight from parent A, description from parent B, When a mutation occurs, the weight of that variable is picked at random.

34 One of the Problems I Work On We have a set of users and for each we know the movies they like among a given set. Extrapolate the set of features (e.g., Julia Roberts, Thriller, Funny,...) that each movie has so that it is liked by all the users which like it. Not interested in what the features are or represent: we are fine with just a bunch of IDs. This could save/improve the life of lots of people.

### Machine Learning using MapReduce

Machine Learning using MapReduce What is Machine Learning Machine learning is a subfield of artificial intelligence concerned with techniques that allow computers to improve their outputs based on previous

### Social Media Mining. Data Mining Essentials

Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers

### 8. Machine Learning Applied Artificial Intelligence

8. Machine Learning Applied Artificial Intelligence Prof. Dr. Bernhard Humm Faculty of Computer Science Hochschule Darmstadt University of Applied Sciences 1 Retrospective Natural Language Processing Name

### International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014

RESEARCH ARTICLE OPEN ACCESS A Survey of Data Mining: Concepts with Applications and its Future Scope Dr. Zubair Khan 1, Ashish Kumar 2, Sunny Kumar 3 M.Tech Research Scholar 2. Department of Computer

### Machine Learning CS 6830. Lecture 01. Razvan C. Bunescu School of Electrical Engineering and Computer Science bunescu@ohio.edu

Machine Learning CS 6830 Razvan C. Bunescu School of Electrical Engineering and Computer Science bunescu@ohio.edu What is Learning? Merriam-Webster: learn = to acquire knowledge, understanding, or skill

### Machine Learning for NLP

Natural Language Processing SoSe 2015 Machine Learning for NLP Dr. Mariana Neves May 4th, 2015 (based on the slides of Dr. Saeedeh Momtazi) Introduction Field of study that gives computers the ability

### D A T A M I N I N G C L A S S I F I C A T I O N

D A T A M I N I N G C L A S S I F I C A T I O N FABRICIO VOZNIKA LEO NARDO VIA NA INTRODUCTION Nowadays there is huge amount of data being collected and stored in databases everywhere across the globe.

### CLASSIFICATION AND CLUSTERING. Anveshi Charuvaka

CLASSIFICATION AND CLUSTERING Anveshi Charuvaka Learning from Data Classification Regression Clustering Anomaly Detection Contrast Set Mining Classification: Definition Given a collection of records (training

### Introduction to Machine Learning Lecture 1. Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu

Introduction to Machine Learning Lecture 1 Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu Introduction Logistics Prerequisites: basics concepts needed in probability and statistics

### Network Machine Learning Research Group. Intended status: Informational October 19, 2015 Expires: April 21, 2016

Network Machine Learning Research Group S. Jiang Internet-Draft Huawei Technologies Co., Ltd Intended status: Informational October 19, 2015 Expires: April 21, 2016 Abstract Network Machine Learning draft-jiang-nmlrg-network-machine-learning-00

### Chapter 12 Discovering New Knowledge Data Mining

Chapter 12 Discovering New Knowledge Data Mining Becerra-Fernandez, et al. -- Knowledge Management 1/e -- 2004 Prentice Hall Additional material 2007 Dekai Wu Chapter Objectives Introduce the student to

### An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015

An Introduction to Data Mining for Wind Power Management Spring 2015 Big Data World Every minute: Google receives over 4 million search queries Facebook users share almost 2.5 million pieces of content

### Machine Learning. CUNY Graduate Center, Spring 2013. Professor Liang Huang. huang@cs.qc.cuny.edu

Machine Learning CUNY Graduate Center, Spring 2013 Professor Liang Huang huang@cs.qc.cuny.edu http://acl.cs.qc.edu/~lhuang/teaching/machine-learning Logistics Lectures M 9:30-11:30 am Room 4419 Personnel

### Machine Learning. Chapter 18, 21. Some material adopted from notes by Chuck Dyer

Machine Learning Chapter 18, 21 Some material adopted from notes by Chuck Dyer What is learning? Learning denotes changes in a system that... enable a system to do the same task more efficiently the next

### Big Data Analytics CSCI 4030

High dim. data Graph data Infinite data Machine learning Apps Locality sensitive hashing PageRank, SimRank Filtering data streams SVM Recommen der systems Clustering Community Detection Web advertising

### ROBERTO BATTITI, MAURO BRUNATO. The LION Way: Machine Learning plus Intelligent Optimization. LIONlab, University of Trento, Italy, Apr 2015

ROBERTO BATTITI, MAURO BRUNATO. The LION Way: Machine Learning plus Intelligent Optimization. LIONlab, University of Trento, Italy, Apr 2015 http://intelligentoptimization.org/lionbook Roberto Battiti

1 1 Introduction Lecturer: Prof. Aude Billard (aude.billard@epfl.ch) Teaching Assistants: Guillaume de Chambrier, Nadia Figueroa, Denys Lamotte, Nicola Sommer 2 2 Course Format Alternate between: Lectures

### Foundations of Artificial Intelligence. Introduction to Data Mining

Foundations of Artificial Intelligence Introduction to Data Mining Objectives Data Mining Introduce a range of data mining techniques used in AI systems including : Neural networks Decision trees Present

### Flat Clustering K-Means Algorithm

Flat Clustering K-Means Algorithm 1. Purpose. Clustering algorithms group a set of documents into subsets or clusters. The cluster algorithms goal is to create clusters that are coherent internally, but

### An Overview of Knowledge Discovery Database and Data mining Techniques

An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,

### Data Mining Essentials

This chapter is from Social Media Mining: An Introduction. By Reza Zafarani, Mohammad Ali Abbasi, and Huan Liu. Cambridge University Press, 2014. Draft version: April 20, 2014. Complete Draft and Slides

### Analytics on Big Data

Analytics on Big Data Riccardo Torlone Università Roma Tre Credits: Mohamed Eltabakh (WPI) Analytics The discovery and communication of meaningful patterns in data (Wikipedia) It relies on data analysis

### Machine Learning in Spam Filtering

Machine Learning in Spam Filtering A Crash Course in ML Konstantin Tretyakov kt@ut.ee Institute of Computer Science, University of Tartu Overview Spam is Evil ML for Spam Filtering: General Idea, Problems.

### Classification Techniques (1)

10 10 Overview Classification Techniques (1) Today Classification Problem Classification based on Regression Distance-based Classification (KNN) Net Lecture Decision Trees Classification using Rules Quality

### Master's projects at ITMO University. Daniil Chivilikhin PhD Student @ ITMO University

Master's projects at ITMO University Daniil Chivilikhin PhD Student @ ITMO University General information Guidance from our lab's researchers Publishable results 2 Research areas Research at ITMO Evolutionary

### Machine Learning. 01 - Introduction

Machine Learning 01 - Introduction Machine learning course One lecture (Wednesday, 9:30, 346) and one exercise (Monday, 17:15, 203). Oral exam, 20 minutes, 5 credit points. Some basic mathematical knowledge

### COMP3420: Advanced Databases and Data Mining. Classification and prediction: Introduction and Decision Tree Induction

COMP3420: Advanced Databases and Data Mining Classification and prediction: Introduction and Decision Tree Induction Lecture outline Classification versus prediction Classification A two step process Supervised

### Using Data Mining for Mobile Communication Clustering and Characterization

Using Data Mining for Mobile Communication Clustering and Characterization A. Bascacov *, C. Cernazanu ** and M. Marcu ** * Lasting Software, Timisoara, Romania ** Politehnica University of Timisoara/Computer

### Data Clustering. Dec 2nd, 2013 Kyrylo Bessonov

Data Clustering Dec 2nd, 2013 Kyrylo Bessonov Talk outline Introduction to clustering Types of clustering Supervised Unsupervised Similarity measures Main clustering algorithms k-means Hierarchical Main

### Learning is a very general term denoting the way in which agents:

What is learning? Learning is a very general term denoting the way in which agents: Acquire and organize knowledge (by building, modifying and organizing internal representations of some external reality);

### Active Learning SVM for Blogs recommendation

Active Learning SVM for Blogs recommendation Xin Guan Computer Science, George Mason University Ⅰ.Introduction In the DH Now website, they try to review a big amount of blogs and articles and find the

### Machine Learning: Overview

Machine Learning: Overview Why Learning? Learning is a core of property of being intelligent. Hence Machine learning is a core subarea of Artificial Intelligence. There is a need for programs to behave

### Introduction to Machine Learning Using Python. Vikram Kamath

Introduction to Machine Learning Using Python Vikram Kamath Contents: 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. Introduction/Definition Where and Why ML is used Types of Learning Supervised Learning Linear Regression

### Final Project Report

CPSC545 by Introduction to Data Mining Prof. Martin Schultz & Prof. Mark Gerstein Student Name: Yu Kor Hugo Lam Student ID : 904907866 Due Date : May 7, 2007 Introduction Final Project Report Pseudogenes

### Contents. Dedication List of Figures List of Tables. Acknowledgments

Contents Dedication List of Figures List of Tables Foreword Preface Acknowledgments v xiii xvii xix xxi xxv Part I Concepts and Techniques 1. INTRODUCTION 3 1 The Quest for Knowledge 3 2 Problem Description

### Classification algorithm in Data mining: An Overview

Classification algorithm in Data mining: An Overview S.Neelamegam #1, Dr.E.Ramaraj *2 #1 M.phil Scholar, Department of Computer Science and Engineering, Alagappa University, Karaikudi. *2 Professor, Department

### Search Taxonomy. Web Search. Search Engine Optimization. Information Retrieval

Information Retrieval INFO 4300 / CS 4300! Retrieval models Older models» Boolean retrieval» Vector Space model Probabilistic Models» BM25» Language models Web search» Learning to Rank Search Taxonomy!

### Mammoth Scale Machine Learning!

Mammoth Scale Machine Learning! Speaker: Robin Anil, Apache Mahout PMC Member! OSCON"10! Portland, OR! July 2010! Quick Show of Hands!# Are you fascinated about ML?!# Have you used ML?!# Do you have Gigabytes

### Machine learning for algo trading

Machine learning for algo trading An introduction for nonmathematicians Dr. Aly Kassam Overview High level introduction to machine learning A machine learning bestiary What has all this got to do with

### Environmental Remote Sensing GEOG 2021

Environmental Remote Sensing GEOG 2021 Lecture 4 Image classification 2 Purpose categorising data data abstraction / simplification data interpretation mapping for land cover mapping use land cover class

### ARTIFICIAL INTELLIGENCE (CSCU9YE) LECTURE 6: MACHINE LEARNING 2: UNSUPERVISED LEARNING (CLUSTERING)

ARTIFICIAL INTELLIGENCE (CSCU9YE) LECTURE 6: MACHINE LEARNING 2: UNSUPERVISED LEARNING (CLUSTERING) Gabriela Ochoa http://www.cs.stir.ac.uk/~goc/ OUTLINE Preliminaries Classification and Clustering Applications

### Maschinelles Lernen mit MATLAB

Maschinelles Lernen mit MATLAB Jérémy Huard Applikationsingenieur The MathWorks GmbH 2015 The MathWorks, Inc. 1 Machine Learning is Everywhere Image Recognition Speech Recognition Stock Prediction Medical

### K-Means Clustering Tutorial

K-Means Clustering Tutorial By Kardi Teknomo,PhD Preferable reference for this tutorial is Teknomo, Kardi. K-Means Clustering Tutorials. http:\\people.revoledu.com\kardi\ tutorial\kmean\ Last Update: July

### DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS

DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS 1 AND ALGORITHMS Chiara Renso KDD-LAB ISTI- CNR, Pisa, Italy WHAT IS CLUSTER ANALYSIS? Finding groups of objects such that the objects in a group will be similar

### Data Mining on Streams

Data Mining on Streams Using Decision Trees CS 536: Machine Learning Instructor: Michael Littman TA: Yihua Wu Outline Introduction to data streams Overview of traditional DT learning ALG DT learning ALGs

### Learning Example. Machine learning and our focus. Another Example. An example: data (loan application) The data and the goal

Learning Example Chapter 18: Learning from Examples 22c:145 An emergency room in a hospital measures 17 variables (e.g., blood pressure, age, etc) of newly admitted patients. A decision is needed: whether

### Azure Machine Learning, SQL Data Mining and R

Azure Machine Learning, SQL Data Mining and R Day-by-day Agenda Prerequisites No formal prerequisites. Basic knowledge of SQL Server Data Tools, Excel and any analytical experience helps. Best of all:

### 14.10.2014. Overview. Swarms in nature. Fish, birds, ants, termites, Introduction to swarm intelligence principles Particle Swarm Optimization (PSO)

Overview Kyrre Glette kyrrehg@ifi INF3490 Swarm Intelligence Particle Swarm Optimization Introduction to swarm intelligence principles Particle Swarm Optimization (PSO) 3 Swarms in nature Fish, birds,

### Data Mining Algorithms Part 1. Dejan Sarka

Data Mining Algorithms Part 1 Dejan Sarka Join the conversation on Twitter: @DevWeek #DW2015 Instructor Bio Dejan Sarka (dsarka@solidq.com) 30 years of experience SQL Server MVP, MCT, 13 books 7+ courses

### Classification and Prediction

Classification and Prediction Slides for Data Mining: Concepts and Techniques Chapter 7 Jiawei Han and Micheline Kamber Intelligent Database Systems Research Lab School of Computing Science Simon Fraser

### Predict Influencers in the Social Network

Predict Influencers in the Social Network Ruishan Liu, Yang Zhao and Liuyu Zhou Email: rliu2, yzhao2, lyzhou@stanford.edu Department of Electrical Engineering, Stanford University Abstract Given two persons

### Data Mining for Knowledge Management. Classification

1 Data Mining for Knowledge Management Classification Themis Palpanas University of Trento http://disi.unitn.eu/~themis Data Mining for Knowledge Management 1 Thanks for slides to: Jiawei Han Eamonn Keogh

### The Data Mining Process

Sequence for Determining Necessary Data. Wrong: Catalog everything you have, and decide what data is important. Right: Work backward from the solution, define the problem explicitly, and map out the data

### Chapter 20: Data Analysis

Chapter 20: Data Analysis Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 20: Data Analysis Decision Support Systems Data Warehousing Data Mining Classification

### Data Mining Practical Machine Learning Tools and Techniques

Some Core Learning Representations Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter of Data Mining by I. H. Witten and E. Frank Decision trees Learning Rules Association rules

### Introduction to Machine Learning. Speaker: Harry Chao Advisor: J.J. Ding Date: 1/27/2011

Introduction to Machine Learning Speaker: Harry Chao Advisor: J.J. Ding Date: 1/27/2011 1 Outline 1. What is machine learning? 2. The basic of machine learning 3. Principles and effects of machine learning

### Machine Learning with MATLAB David Willingham Application Engineer

Machine Learning with MATLAB David Willingham Application Engineer 2014 The MathWorks, Inc. 1 Goals Overview of machine learning Machine learning models & techniques available in MATLAB Streamlining the

### Comparison of K-means and Backpropagation Data Mining Algorithms

Comparison of K-means and Backpropagation Data Mining Algorithms Nitu Mathuriya, Dr. Ashish Bansal Abstract Data mining has got more and more mature as a field of basic research in computer science and

### Recommender Systems: Content-based, Knowledge-based, Hybrid. Radek Pelánek

Recommender Systems: Content-based, Knowledge-based, Hybrid Radek Pelánek 2015 Today lecture, basic principles: content-based knowledge-based hybrid, choice of approach,... critiquing, explanations,...

### A Survey on Intrusion Detection System with Data Mining Techniques

A Survey on Intrusion Detection System with Data Mining Techniques Ms. Ruth D 1, Mrs. Lovelin Ponn Felciah M 2 1 M.Phil Scholar, Department of Computer Science, Bishop Heber College (Autonomous), Trichirappalli,

### Active Learning with Boosting for Spam Detection

Active Learning with Boosting for Spam Detection Nikhila Arkalgud Last update: March 22, 2008 Active Learning with Boosting for Spam Detection Last update: March 22, 2008 1 / 38 Outline 1 Spam Filters

### MA2823: Foundations of Machine Learning

MA2823: Foundations of Machine Learning École Centrale Paris Fall 2015 Chloé-Agathe Azencot Centre for Computational Biology, Mines ParisTech chloe agathe.azencott@mines paristech.fr TAs: Jiaqian Yu jiaqian.yu@centralesupelec.fr

### Université de Montpellier 2 Hugo Alatrista-Salas : hugo.alatrista-salas@teledetection.fr

Université de Montpellier 2 Hugo Alatrista-Salas : hugo.alatrista-salas@teledetection.fr WEKA Gallirallus Zeland) australis : Endemic bird (New Characteristics Waikato university Weka is a collection

### Statistical Databases and Registers with some datamining

Unsupervised learning - Statistical Databases and Registers with some datamining a course in Survey Methodology and O cial Statistics Pages in the book: 501-528 Department of Statistics Stockholm University

### Neural Networks and Back Propagation Algorithm

Neural Networks and Back Propagation Algorithm Mirza Cilimkovic Institute of Technology Blanchardstown Blanchardstown Road North Dublin 15 Ireland mirzac@gmail.com Abstract Neural Networks (NN) are important

### Protein Protein Interaction Networks

Functional Pattern Mining from Genome Scale Protein Protein Interaction Networks Young-Rae Cho, Ph.D. Assistant Professor Department of Computer Science Baylor University it My Definition of Bioinformatics

### DECISION TREE INDUCTION FOR FINANCIAL FRAUD DETECTION USING ENSEMBLE LEARNING TECHNIQUES

DECISION TREE INDUCTION FOR FINANCIAL FRAUD DETECTION USING ENSEMBLE LEARNING TECHNIQUES Vijayalakshmi Mahanra Rao 1, Yashwant Prasad Singh 2 Multimedia University, Cyberjaya, MALAYSIA 1 lakshmi.mahanra@gmail.com

### Machine Learning Big Data using Map Reduce

Machine Learning Big Data using Map Reduce By Michael Bowles, PhD Where Does Big Data Come From? -Web data (web logs, click histories) -e-commerce applications (purchase histories) -Retail purchase histories

### STATISTICA. Clustering Techniques. Case Study: Defining Clusters of Shopping Center Patrons. and

Clustering Techniques and STATISTICA Case Study: Defining Clusters of Shopping Center Patrons STATISTICA Solutions for Business Intelligence, Data Mining, Quality Control, and Web-based Analytics Table

### Clustering & Association

Clustering - Overview What is cluster analysis? Grouping data objects based only on information found in the data describing these objects and their relationships Maximize the similarity within objects

### SPECIAL PERTURBATIONS UNCORRELATED TRACK PROCESSING

AAS 07-228 SPECIAL PERTURBATIONS UNCORRELATED TRACK PROCESSING INTRODUCTION James G. Miller * Two historical uncorrelated track (UCT) processing approaches have been employed using general perturbations

### Using Artificial Intelligence to Manage Big Data for Litigation

FEBRUARY 3 5, 2015 / THE HILTON NEW YORK Using Artificial Intelligence to Manage Big Data for Litigation Understanding Artificial Intelligence to Make better decisions Improve the process Allay the fear

### Knowledge Discovery and Data Mining

Knowledge Discovery and Data Mining Unit # 6 Sajjad Haider Fall 2014 1 Evaluating the Accuracy of a Classifier Holdout, random subsampling, crossvalidation, and the bootstrap are common techniques for

### BIDM Project. Predicting the contract type for IT/ITES outsourcing contracts

BIDM Project Predicting the contract type for IT/ITES outsourcing contracts N a n d i n i G o v i n d a r a j a n ( 6 1 2 1 0 5 5 6 ) The authors believe that data modelling can be used to predict if an

### Machine Learning for Data Science (CS4786) Lecture 1

Machine Learning for Data Science (CS4786) Lecture 1 Tu-Th 10:10 to 11:25 AM Hollister B14 Instructors : Lillian Lee and Karthik Sridharan ROUGH DETAILS ABOUT THE COURSE Diagnostic assignment 0 is out:

### Introduction to Artificial Intelligence G51IAI. An Introduction to Data Mining

Introduction to Artificial Intelligence G51IAI An Introduction to Data Mining Learning Objectives Introduce a range of data mining techniques used in AI systems including : Neural networks Decision trees

### Clustering in Machine Learning. By: Ibrar Hussain Student ID:

Clustering in Machine Learning By: Ibrar Hussain Student ID: 11021083 Presentation An Overview Introduction Definition Types of Learning Clustering in Machine Learning K-means Clustering Example of k-means

### Knowledge-based systems and the need for learning

Knowledge-based systems and the need for learning The implementation of a knowledge-based system can be quite difficult. Furthermore, the process of reasoning with that knowledge can be quite slow. This

### Attribution. Modified from Stuart Russell s slides (Berkeley) Parts of the slides are inspired by Dan Klein s lecture material for CS 188 (Berkeley)

Machine Learning 1 Attribution Modified from Stuart Russell s slides (Berkeley) Parts of the slides are inspired by Dan Klein s lecture material for CS 188 (Berkeley) 2 Outline Inductive learning Decision

### Defending Networks with Incomplete Information: A Machine Learning Approach. Alexandre Pinto alexcp@mlsecproject.org @alexcpsec @MLSecProject

Defending Networks with Incomplete Information: A Machine Learning Approach Alexandre Pinto alexcp@mlsecproject.org @alexcpsec @MLSecProject Agenda Security Monitoring: We are doing it wrong Machine Learning

### Lecture #2. Algorithms for Big Data

Additional Topics: Big Data Lecture #2 Algorithms for Big Data Joseph Bonneau jcb82@cam.ac.uk April 30, 2012 Today's topic: algorithms Do we need new algorithms? Quantity is a quality of its own Joseph

### CS 2750 Machine Learning. Lecture 1. Machine Learning. http://www.cs.pitt.edu/~milos/courses/cs2750/ CS 2750 Machine Learning.

Lecture Machine Learning Milos Hauskrecht milos@cs.pitt.edu 539 Sennott Square, x5 http://www.cs.pitt.edu/~milos/courses/cs75/ Administration Instructor: Milos Hauskrecht milos@cs.pitt.edu 539 Sennott

### ENSEMBLE DECISION TREE CLASSIFIER FOR BREAST CANCER DATA

ENSEMBLE DECISION TREE CLASSIFIER FOR BREAST CANCER DATA D.Lavanya 1 and Dr.K.Usha Rani 2 1 Research Scholar, Department of Computer Science, Sree Padmavathi Mahila Visvavidyalayam, Tirupati, Andhra Pradesh,

### An Introduction to Data Mining

An Introduction to Intel Beijing wei.heng@intel.com January 17, 2014 Outline 1 DW Overview What is Notable Application of Conference, Software and Applications Major Process in 2 Major Tasks in Detail

### Data Mining: Foundation, Techniques and Applications

Data Mining: Foundation, Techniques and Applications Lesson 1b :A Quick Overview of Data Mining Li Cuiping( 李 翠 平 ) School of Information Renmin University of China Anthony Tung( 鄧 锦 浩 ) School of Computing

### Lecture 10: Regression Trees

Lecture 10: Regression Trees 36-350: Data Mining October 11, 2006 Reading: Textbook, sections 5.2 and 10.5. The next three lectures are going to be about a particular kind of nonlinear predictive model,

### Data Mining Classification: Decision Trees

Data Mining Classification: Decision Trees Classification Decision Trees: what they are and how they work Hunt s (TDIDT) algorithm How to select the best split How to handle Inconsistent data Continuous

### Neural Networks Lesson 5 - Cluster Analysis

Neural Networks Lesson 5 - Cluster Analysis Prof. Michele Scarpiniti INFOCOM Dpt. - Sapienza University of Rome http://ispac.ing.uniroma1.it/scarpiniti/index.htm michele.scarpiniti@uniroma1.it Rome, 29

### Machine Learning Capacity and Performance Analysis and R

Machine Learning and R May 3, 11 30 25 15 10 5 25 15 10 5 30 25 15 10 5 0 2 4 6 8 101214161822 0 2 4 6 8 101214161822 0 2 4 6 8 101214161822 100 80 60 40 100 80 60 40 100 80 60 40 30 25 15 10 5 25 15 10

### Practical Data Science with Azure Machine Learning, SQL Data Mining, and R

Practical Data Science with Azure Machine Learning, SQL Data Mining, and R Overview This 4-day class is the first of the two data science courses taught by Rafal Lukawiecki. Some of the topics will be

### IBM SPSS Direct Marketing 23

IBM SPSS Direct Marketing 23 Note Before using this information and the product it supports, read the information in Notices on page 25. Product Information This edition applies to version 23, release

### Predictive Dynamix Inc

Predictive Modeling Technology Predictive modeling is concerned with analyzing patterns and trends in historical and operational data in order to transform data into actionable decisions. This is accomplished

### Understanding Network Traffic An Introduction to Machine Learning in Networking

Understanding Network Traffic An Introduction to Machine Learning in Networking Pedro CASAS FTW - Communication Networks Group Vienna, Austria 3rd TMA PhD School Department of Telecommunications AGH University

### Monday Morning Data Mining

Monday Morning Data Mining Tim Ruhe Statistische Methoden der Datenanalyse Outline: - data mining - IceCube - Data mining in IceCube Computer Scientists are different... Fakultät Physik Fakultät Physik

### Data Mining Practical Machine Learning Tools and Techniques

Ensemble learning Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 8 of Data Mining by I. H. Witten, E. Frank and M. A. Hall Combining multiple models Bagging The basic idea

### MS1b Statistical Data Mining

MS1b Statistical Data Mining Yee Whye Teh Department of Statistics Oxford http://www.stats.ox.ac.uk/~teh/datamining.html Outline Administrivia and Introduction Course Structure Syllabus Introduction to

### Detection. Perspective. Network Anomaly. Bhattacharyya. Jugal. A Machine Learning »C) Dhruba Kumar. Kumar KaKta. CRC Press J Taylor & Francis Croup

Network Anomaly Detection A Machine Learning Perspective Dhruba Kumar Bhattacharyya Jugal Kumar KaKta»C) CRC Press J Taylor & Francis Croup Boca Raton London New York CRC Press is an imprint of the Taylor

### Map-Reduce for Machine Learning on Multicore

Map-Reduce for Machine Learning on Multicore Chu, et al. Problem The world is going multicore New computers - dual core to 12+-core Shift to more concurrent programming paradigms and languages Erlang,