Obtaining Value from Big Data

Transcription:

Obtaining Value from Big Data. Course Notes in Transparency Format. Technology basics for data scientists, Spring 2014. Jordi Torres, UPC - BSC. www.jorditorres.eu @JordiTorresBCN

Data deluge, is it enough?

Data = Information?

Prediction using data models. Information is non-actionable knowledge.

Obtaining value from data. The world is becoming instrumented and interconnected, and we can take advantage of this if we can process the data in real time. Data cannot be taken at face value. [Diagram: Data, Information, Knowledge, with Value and Volume axes.] Information is non-actionable knowledge.

Why Learn? Machine learning is programming computers to optimize a performance criterion using example data or past experience. There is no need to learn to calculate payroll. Learning is used when: human expertise does not exist; humans are unable to explain their expertise; the solution changes over time; the solution needs to be adapted to particular cases. Source: Lecture Notes for E. Alpaydın (2010), Introduction to Machine Learning, 2e, The MIT Press (V1.0).

What We Talk About When We Talk About Learning. Learning general models from data of particular examples. Data is cheap and abundant (data warehouses, ); knowledge is expensive and scarce. Example in retail, from customer transactions to consumer behavior: people who bought A also bought B (www.amazon.com). Build a model that is a good and useful approximation to the data. Source: Alpaydın (2010).

Where can it be applied? Retail: market basket analysis, customer relationship management (CRM). Finance: credit scoring, fraud detection. Manufacturing: control, robotics, troubleshooting. Medicine: medical diagnosis. Telecommunications: spam filters, intrusion detection. Bioinformatics: motifs, alignment. Web mining: search engines. Smart cities: city planning. And dozens more. Source: Alpaydın (2010).

Obtaining value from data. In my opinion, a big challenge (and an important research area) is scale: most algorithms work well on thousands of records, but at the moment they are impractical for thousands of millions.

What is Machine Learning? Optimize a performance criterion using example data or past experience. Statistics vs. computer science? Role of statistics: inference from a sample. Role of computer science: efficient algorithms to solve the optimization problem, and to represent and evaluate the model for inference. Source: Alpaydın (2010).

Example: Learning Associations. Basket analysis: $P(Y \mid X)$ is the probability that somebody who buys X also buys Y, where X and Y are products/services. Example: $P(\text{chips} \mid \text{beer}) = 0.7$. Source: Alpaydın (2010).
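As a minimal sketch of how such a conditional probability might be estimated, the code below counts co-occurrences in a list of baskets. The function name `conditional_probability` and the toy transactions are invented for illustration and do not come from the slides.

```python
def conditional_probability(baskets, x, y):
    """Estimate P(y | x): among baskets containing x, the fraction that also contain y."""
    with_x = [basket for basket in baskets if x in basket]
    if not with_x:
        raise ValueError(f"no basket contains {x!r}")
    return sum(1 for basket in with_x if y in basket) / len(with_x)


# Hypothetical toy transactions, invented for illustration only.
baskets = [
    {"beer", "chips"},
    {"beer", "chips", "salsa"},
    {"beer", "bread"},
    {"chips", "salsa"},
    {"beer", "chips", "bread"},
]

# Four baskets contain beer and three of those also contain chips.
p_chips_given_beer = conditional_probability(baskets, "beer", "chips")
print(p_chips_given_beer)  # 0.75
```

A production system would also track how often each item set appears at all, since a high conditional probability computed from very few baskets is unreliable.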

Example: Classification. Credit scoring: differentiating between low-risk and high-risk customers from their income and savings. Discriminant: IF income > $\theta_1$ AND savings > $\theta_2$ THEN low-risk ELSE high-risk. Source: Alpaydın (2010).
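The discriminant can be sketched in a few lines. The rule in `classify` follows the slide; the threshold search in `fit_thresholds` is only an assumed brute-force stand-in for a real learning algorithm, and the labelled examples are made up for illustration.

```python
from itertools import product


def classify(income, savings, theta1, theta2):
    """Apply the slide's rule: low-risk only if both thresholds are exceeded."""
    if income > theta1 and savings > theta2:
        return "low-risk"
    return "high-risk"


def fit_thresholds(examples):
    """Return the (theta1, theta2) pair with the fewest training errors.

    Brute-force search over thresholds taken from the observed values.
    """
    incomes = sorted({income for income, _, _ in examples})
    savings = sorted({saving for _, saving, _ in examples})
    # Include one candidate below the minimum so "everyone passes" is possible.
    candidates1 = [incomes[0] - 1] + incomes
    candidates2 = [savings[0] - 1] + savings

    def errors(thresholds):
        t1, t2 = thresholds
        return sum(classify(i, s, t1, t2) != label for i, s, label in examples)

    return min(product(candidates1, candidates2), key=errors)


# Hypothetical (income, savings, label) examples, invented for illustration.
examples = [
    (20, 5, "high-risk"),
    (25, 30, "high-risk"),
    (60, 4, "high-risk"),
    (55, 20, "low-risk"),
    (70, 35, "low-risk"),
    (80, 25, "low-risk"),
]

theta1, theta2 = fit_thresholds(examples)
print(theta1, theta2, classify(90, 50, theta1, theta2))
```

The search is quadratic in the number of distinct values, which is fine for a toy example but is exactly the kind of approach that stops being practical on very large datasets.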

Example: Classification Applications. Also known as pattern recognition. Face recognition: pose, lighting, occlusion (glasses, beard), make-up, hair style. Character recognition: different handwriting styles. Speech recognition: temporal dependency. Medical diagnosis: from symptoms to illnesses. Biometrics: recognition/authentication using physical and/or behavioral characteristics (face, iris, signature, etc.). Source: Alpaydın (2010).

Example: Regression. Price of a used car. $x$: car attributes; $y$: price. Model: $y = wx + w_0$. Source: Alpaydın (2010).
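One common way to fit such a line is ordinary least squares; the slide does not name a fitting method, so treat that choice as an assumption. The sketch uses a single attribute (car age in years) and invented prices in thousands, chosen to lie exactly on a line so the result is easy to check.

```python
def fit_line(xs, ys):
    """Least-squares estimates of w and w0 in y = w * x + w0 for one attribute."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    w = sxy / sxx
    w0 = mean_y - w * mean_x
    return w, w0


# Hypothetical data: car age in years and price in thousands (illustrative only).
ages = [1, 2, 4, 6, 8]
prices = [18.5, 17.0, 14.0, 11.0, 8.0]

w, w0 = fit_line(ages, prices)
predicted_price = w * 5 + w0  # predicted price of a five-year-old car
print(round(w, 3), round(w0, 3), round(predicted_price, 3))
```

With several car attributes, $x$ becomes a vector and the same idea generalizes to multiple linear regression.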

Machine Learning Usefulness. Uses of supervised learning. Prediction of future cases: use the rule to predict the output for future inputs. Knowledge extraction: the rule is easy to understand. Compression: the rule is simpler than the data it explains. Outlier detection: exceptions that are not covered by the rule, e.g., fraud. Source: Alpaydın (2010).

Machine Learning: an impressive world!

Decision Trees (case study bigml)

Right questions? Tech problem or business problem? What to look for in the data? How to model the data? Where to start? Effective analysis depends more on asking the right question or designing a good experiment than on tools and techniques.

DATA vs MODEL. Large datasets provide the opportunity to take advantage of effective results from coupling large datasets with relatively simple algorithms. http://strata.oreilly.com/2012/11/four-data-themes-to-watch-from-strata-hadoop-world-2012.html?imm_mid=09b70d&cmp=em-strata-newsletters-nov14-direct#more-52859