# Machine Learning, Capacity and Performance Analysis, and R


Machine Learning and R (May 3, 11)

## Introduction

- Machine learning: a brief introduction to machine learning and data mining (what, why and how)
- How can this be applied to capacity and performance analysis?
- Data-driven patterns
- Example: utilization profiling in R
  - Data transformation
  - Model construction and test
  - Model deployment

[Figure: one-page report for server rdcuxsrv277.insidelive.net showing hourly CPU and memory utilization by day of the month (May, June, July), P95 and average CPU utilization with forecasts, and the server configuration.]

## Machine Learning: Definition

There are many definitions; here are a couple.

- Tom M. Mitchell provided a widely quoted definition: "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E." [1]
- The field of machine learning studies the design of computer programs able to induce patterns, regularities, or rules from past experiences. A learner (a computer program) processes data representing past experiences and tries either to develop an appropriate response to future data or to describe, in some meaningful way, the data seen. [2]

## Example: Handwritten Digits

Elements of machine learning:

- Task T: recognizing and classifying handwritten words within images
- Performance P: percent of words correctly classified
- Experience E: a database of handwritten words with given classifications

## Methods: Supervised Learning

Supervised learning [4]:

- Use a labeled (known) set of data to build models that perform classification or regression.
- Use the model on new data to make predictions or describe the data.

Supervised algorithms:

- Linear regression
- Trees
- Neural networks
- Support vector machines

## Methods: Unsupervised Learning

Unsupervised learning [4]: find hidden structure in unlabeled (unknown) data.

Unsupervised algorithms:

- K-means
- K-nearest neighbor (more commonly used as a supervised classifier)
- Hierarchical clustering
- Association rules
- Principal components

## Machine Learning Is a Process

Like application development. CRISP-DM, for example [3]:

1. Business understanding
2. Data understanding
3. Data preparation
4. Modeling
5. Evaluation
6. Deployment

Repeat as needed.

## Applications

Some notable applications:

- Spam filtering: Yahoo
- Fraud / anomaly detection: credit cards
- Stock predictions / trading models
- Recommendation: Netflix
- Social network analysis: Facebook
- Internet search: Google

## And You Can Make Money!

Data mining competitions:

- Netflix: \$1M
- Heritage Health Prize: \$3M
- Kaggle

## Capacity Planning and Performance Analysis

A simplified view of capacity planning:

- Collect hourly and daily statistics, Monday through Friday.
- Compute peak and average daily utilization.
- Fit a simple linear regression to peak and average utilization.
- Extrapolate days into the future.
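The simplified procedure above can be sketched in base R as follows. This is an illustration under stated assumptions, not the code behind the talk's report:

- `daily_stats()` and `forecast_util()` are hypothetical names.
- P95 stands in for the daily peak, matching the P95/AVG lines of the one-page report.
- F30 to F180 are taken to mean 30 to 180 calendar days past the last observation.
- The example data is synthetic.

```r
# Assumed approach: summarise hourly CPU samples into weekday daily P95 and
# average, fit a straight line to each, and extrapolate (hypothetical helpers).
daily_stats <- function(stamp, cpu) {
  day <- as.Date(stamp)
  keep <- as.integer(format(day, "%u")) <= 5   # ISO weekday: 1 = Monday, 5 = Friday
  day <- day[keep]
  cpu <- cpu[keep]
  data.frame(
    day = sort(unique(day)),
    p95 = as.numeric(tapply(cpu, day, quantile, probs = 0.95, names = FALSE)),
    avg = as.numeric(tapply(cpu, day, mean))
  )
}

forecast_util <- function(day, util, horizons = c(30, 60, 90, 180)) {
  elapsed <- as.numeric(day - min(day))            # days since first observation
  fit <- lm(util ~ elapsed)                        # simple linear regression
  future <- data.frame(elapsed = max(elapsed) + horizons)
  setNames(as.numeric(predict(fit, newdata = future)), paste0("F", horizons))
}

# Synthetic example: 8 weeks of hourly samples rising 0.1 % per calendar day
stamp <- seq(as.POSIXct("2011-03-07 00:00", tz = "UTC"), by = "hour",
             length.out = 24 * 56)
cpu <- 50 + 0.1 * as.numeric(as.Date(stamp) - as.Date("2011-03-07"))
daily <- daily_stats(stamp, cpu)
f95 <- forecast_util(daily$day, daily$p95)
favg <- forecast_util(daily$day, daily$avg)
f95
```

The synthetic series rises exactly 0.1 % per day, so both fits recover that slope. The forecasts come out as 58.3, 61.3, 64.3 and 73.3 %. With real data, the scatter around the line matters as much as the slope.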

## Capacity Planning: Simplified View

[Figure: one-page server utilization, forecast, and configuration report for rdcuxsrv277.insidelive.net, developed using R. Panels: hourly CPU and memory utilization by day of the month for May, June and July; P95 and average % CPU utilization by day, with forecasts; server configuration.]

| % CPU forecast | F30 | F60 | F90 | F180 |
|---|---|---|---|---|
| P95 | 85.3 | 87.8 | 90.3 | 97.7 |
| AVG | 57.3 | 58.8 | 60.2 | 64.5 |

Configuration: Environment: QA; Make: UNIX; OS: SunOS; Number of CPUs: 16.

## Capacity Planning: Simplified View (cont.)

Put all forecasts into a spreadsheet and sort by the 30-, 60-, 90-, or 180-day forecast to find the most utilized servers.
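A minimal sketch of the spreadsheet step follows. It assumes the forecasts have already been collected into one data frame. The server names, the values and the `top_servers()` helper are all hypothetical.

```r
# Hypothetical forecast table (made-up values): one row per server with the
# P95 forecasts for 30, 60, 90 and 180 days ahead.
forecasts <- data.frame(
  server = c("srv01", "srv02", "srv03", "srv04"),
  F30  = c(62.0, 88.1, 45.3, 79.9),
  F60  = c(63.5, 90.2, 44.8, 84.0),
  F90  = c(65.1, 92.4, 44.2, 88.3),
  F180 = c(69.8, 98.9, 42.5, 100.6),
  stringsAsFactors = FALSE
)

# Sort by the chosen horizon, highest forecast first, and keep the top n rows.
top_servers <- function(forecasts, horizon = "F90", n = 10) {
  ranked <- forecasts[order(forecasts[[horizon]], decreasing = TRUE), ]
  head(ranked, n)
}

top_servers(forecasts, horizon = "F180", n = 2)

# Write the full ranking to a CSV file for a spreadsheet (example file name).
write.csv(top_servers(forecasts, "F90", n = nrow(forecasts)),
          file.path(tempdir(), "forecasts_by_F90.csv"), row.names = FALSE)
```

Sorting on a longer horizon favours servers with steeper trends. Here srv04 overtakes srv02 at F180, so the choice of horizon changes the list.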

## Capacity Planning: Simplified View (cont.)

This works well for stable business applications and environments. With a small number of servers, capacity planning for critical servers is straightforward.

## Capacity Planning: Real World

With 3k, 4k, 5k, or 10k+ servers, capacity planning is very difficult. Why?

- Many different environments: production, test, BCP, QA
- Many different applications: database, web server, quant, Hadoop, etc.
- Many different hardware platforms: Unix, Linux, Windows, VMware, etc.
- Things are not stable and well formed: different applications have different utilization profiles, for different reasons.
- Exceptions occur.

## How Can Machine Learning Help?

- Lots and lots of data.
- Systems have many different components, and each component has its own function and collection of metrics that determine performance.
- No problem is in isolation; many different sets of data need to be correlated.
- Many different relationships: server, storage, database, application...
- Lots of historical data (capacity database?).
- Many repeating and familiar patterns.

## How Can Machine Learning Help? Elements of Machine Learning

- Task T: classify resource utilization/performance
- Performance P: a filtered list of key utilization classes; enhanced monitoring; smart alerting
- Experience E: historical data from a capacity database
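To make Performance P concrete, here is a hypothetical sketch that turns per-server predictions into a filtered alert list. The server names are made up. The class names are the pattern groups defined on the next slide. Which classes trigger an alert is an assumption.

```r
# Hypothetical classifier output: one predicted pattern class per server.
predictions <- data.frame(
  server = c("srv01", "srv02", "srv03", "srv04", "srv05"),
  class  = c("Normal", "ShiftUp", "Cyclic", "TrendUp", "ShiftDown"),
  stringsAsFactors = FALSE
)

# Assumed alert policy: only upward shifts and upward trends need attention.
alert_classes <- c("ShiftUp", "TrendUp")
alerts <- predictions[predictions$class %in% alert_classes, ]
alerts
```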

## Utilization Patterns

Resource utilization has patterns. They are visual clues to performance and future utilization. These patterns can be grouped:

- Normal: flat utilization, stable environment; could be consistently high, middle, or low in its utilization.
- Cyclic: highly variable workload, maybe with some consistency, such as month-end or quant load.
- Trend increasing: organic growth in the utilization.
- Trend decreasing: reduced workload, application retirement.
- Shift upward: sharp increase in processing; could be related to broken processes, cluster failover to a standby server, and/or new application deployment.
- Shift downward: sharp decrease in processing; could be related to fixing processes, fail-back, or application retirement.
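The six groups can be illustrated with synthetic prototypes. The sketch below is an assumption, not the talk's `prototypes.r`. Each pattern is a simple signal (flat, sine, ramp or mid-window step) over 130 weekday points, plus Gaussian noise, clipped to 0-100 %. The function name `make_pattern()` and every shape parameter are hypothetical.

```r
# Hypothetical prototype generator; shapes, magnitudes and noise are assumptions.
make_pattern <- function(type, n = 130, base = 40, noise_sd = 2) {
  i <- seq_len(n)
  second_half <- i > n / 2
  signal <- switch(type,
    Normal    = rep(base, n),                        # flat
    Cyclic    = base + 20 * sin(2 * pi * i / 21),    # roughly monthly (21 weekdays)
    TrendUp   = base + 30 * i / n,                   # organic growth
    TrendDown = base + 30 * (1 - i / n),             # declining workload
    ShiftUp   = base + ifelse(second_half, 30, 0),   # step up at mid-window
    ShiftDown = base + ifelse(second_half, 0, 30),   # step down at mid-window
    stop("unknown pattern type: ", type)
  )
  # add noise and keep values on a 0-100 % utilization scale
  pmin(pmax(signal + rnorm(n, sd = noise_sd), 0), 100)
}

set.seed(42)
types <- c("Normal", "Cyclic", "TrendUp", "TrendDown", "ShiftUp", "ShiftDown")
examples <- sapply(types, make_pattern)   # 130 x 6 matrix, one column per type
```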

## Utilization Patterns (cont.)

## What Is R?

- Open-source statistical programming language
- Great visualization packages
- Many different modeling packages
- Many different machine learning packages
- Almost a complete solution for building machine learning tools; scaling is an issue, i.e., the problem has to fit in memory.

## Support Vector Machine

## Data Considerations

- Performance and utilization data is a time series: rows of DateTime, Server (e.g., webserver) and AverageCPU.
- The ML data format is a matrix with the general form Y, X1, X2, ..., Xn.
- The time series therefore needs to be converted to a matrix, with one row per server (e.g., webserver) holding its class Y and the values X1, X2, X3, X4, X5, ...
- Data needs to be consistent and well formed: no missing or bad data.
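Here is a base R sketch of the conversion. It assumes a long-format extract with one row per server per day and one class label per server. The AverageCPU column follows the slide; the server names, values and `to_feature_matrix()` are hypothetical. The function stops rather than build a matrix with gaps, in line with the no-missing-data requirement.

```r
# Hypothetical long-format extract: one row per server per day.
long <- data.frame(
  server     = rep(c("webserver1", "dbserver1"), each = 3),
  day        = rep(as.Date("2011-03-07") + 0:2, times = 2),
  AverageCPU = c(20, 25, 30, 60, 62, 58),
  class      = rep(c("TrendUp", "Normal"), each = 3),
  stringsAsFactors = FALSE
)

to_feature_matrix <- function(long) {
  # Pivot: rows = servers, columns = days in chronological order
  x <- tapply(long$AverageCPU, list(long$server, long$day), mean)
  if (anyNA(x)) stop("missing server/day samples: data is not well formed")
  colnames(x) <- paste0("X", seq_len(ncol(x)))
  # One label per server (the first one recorded for that server)
  labels <- long$class[match(rownames(x), long$server)]
  data.frame(class = factor(labels), x, row.names = rownames(x))
}

wide <- to_feature_matrix(long)   # columns: class, X1, X2, X3
wide
```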

## SVM Demonstration

- Data is generated: `prototypes.r`, `helperfunctions.r` (`createData`, `confusionm`, `printMissClassified`)
- `demo_1.R` builds an initial SVM model, tunes the model, and classifies new data with the model.
- `demo_2.R` improves the accuracy of the initial model.

## Generated Data

- Datasets contain 100 of each type of pattern, i.e., 600 servers.
- There are 130 X data points (features) representing 180 days, Monday through Friday.
- The data is randomly generated.
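The 130-feature figure follows from the weekday filter. A 180-day window that starts on a Monday contains 25 full weeks (125 weekdays) plus five more days, Monday to Friday, for 130 in total. The sketch below checks that count and lays out the 600 labels. It assumes the window starts on a Monday; the start date itself is an arbitrary example.

```r
# Assumption: the window starts on a Monday (2011-01-03 is only an example date).
days <- seq(as.Date("2011-01-03"), by = "day", length.out = 180)
weekday_days <- days[as.integer(format(days, "%u")) <= 5]   # Monday-Friday only
length(weekday_days)   # 130 features per server

classes <- c("Cyclic", "Normal", "ShiftDown", "ShiftUp", "TrendDown", "TrendUp")
labels <- factor(rep(classes, each = 100))                  # 100 servers per pattern
length(labels)         # 600 servers
feature_names <- paste0("X", seq_along(weekday_days))       # X1 ... X130
```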

## First Predictive Model: Create the First Model

```r
library(e1071)               # svm(), tune.svm()
source("helperfunctions.r")  # createData(), confusionm(), printMissClassified()

### OUT OF THE BOX
## GET DATA
Ynew <- dget("Y_7")
data <- createData(Ynew)
## SPLIT TO x AND y
x <- subset(data, select = -class)
y <- data$class
## BUILD MODEL
model <- svm(class ~ ., data = data)
summary(model)
```

```
Call:
svm(formula = class ~ ., data = data)

Parameters:
   SVM-Type:  C-classification
 SVM-Kernel:  radial
       cost:  1
      gamma:

Number of Support Vectors:  545

Number of Classes:  6

Levels:
 Cyclic Normal ShiftDown ShiftUp TrendDown TrendUp
```

## First Predictive Model: Check the Model's Accuracy

```r
## PREDICTIONS
pred <- predict(model, x)
# CHECK ACCURACY
confusionm(pred, y)
```

`confusionm()` prints:

- the predicted values (Yp) and the actual values (Y) for the six classes (Cyclic, Normal, ShiftDown, ShiftUp, TrendDown, TrendUp);
- the confusion matrix of Y against Yp;
- the overall accuracy.
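The slides do not show the source of `confusionm()`. A hypothetical stand-in built on base R's `table()` computes the same kind of summary. It returns a confusion matrix of actual against predicted classes and the accuracy, i.e. the share of observations on the diagonal. The labels in the example are made up.

```r
# Hypothetical equivalent of confusionm(): confusion matrix plus accuracy.
confusion_accuracy <- function(pred, actual) {
  levs <- union(as.character(actual), as.character(pred))
  cm <- table(Actual = factor(actual, levels = levs),
              Predicted = factor(pred, levels = levs))
  list(matrix = cm, accuracy = sum(diag(cm)) / sum(cm))
}

actual <- c("Cyclic", "Cyclic", "Normal", "ShiftUp")
pred   <- c("Normal", "Cyclic", "Normal", "TrendUp")
res <- confusion_accuracy(pred, actual)
res$matrix
res$accuracy   # 0.5: two of the four labels match
```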

## First Predictive Model: Model Tuning

```r
obj <- tune.svm(class ~ ., data = data, gamma = 2^(-1:1), cost = 2^(2:4))
summary(obj)
```

The summary describes the parameter tuning of `svm`:

- the sampling method (10-fold cross validation);
- the best parameters (gamma and cost);
- the best performance;
- detailed performance results, with gamma, cost, error and dispersion for each combination.
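`tune.svm()` reports cross-validated error and dispersion for each gamma/cost pair, i.e. the mean error across folds and its standard deviation. The sketch below spells out that computation in base R so the numbers are easier to interpret. It is not e1071's implementation:

- The folds are stratified by class, a choice made for this sketch only.
- `fit_predict` is a placeholder. A real run would put a call such as `svm(class ~ ., data = train, gamma = params$gamma, cost = params$cost)` followed by `predict()` there.
- The majority-class model and the toy data exist only so the code runs without e1071.

```r
# Assign each row to one of k folds, stratified by class label.
stratified_folds <- function(y, k = 10) {
  y <- as.character(y)
  folds <- integer(length(y))
  for (cls in unique(y)) {
    idx <- which(y == cls)
    # spread this class over folds 1..k as evenly as possible, in random order
    folds[idx] <- rep_len(seq_len(k), length(idx))[sample.int(length(idx))]
  }
  folds
}

# Mean error and its standard deviation over folds for one parameter setting.
# fit_predict(train, test, params) must return predicted labels for `test`.
cv_error <- function(data, folds, params, fit_predict) {
  errs <- sapply(sort(unique(folds)), function(f) {
    train <- data[folds != f, , drop = FALSE]
    test  <- data[folds == f, , drop = FALSE]
    pred  <- fit_predict(train, test, params)
    mean(as.character(pred) != as.character(test$class))
  })
  c(error = mean(errs), dispersion = sd(errs))
}

# Same grid as the tune.svm() call above
grid <- expand.grid(gamma = 2^(-1:1), cost = 2^(2:4))

# Placeholder model for demonstration only: predicts the most common training class
majority <- function(train, test, params) {
  rep(names(which.max(table(train$class))), nrow(test))
}

set.seed(1)
toy <- data.frame(class = factor(rep(c("Cyclic", "Normal", "ShiftUp"), each = 20)),
                  x1 = rnorm(60))
folds <- stratified_folds(toy$class, k = 10)
results <- t(sapply(seq_len(nrow(grid)), function(i)
  cv_error(toy, folds, grid[i, ], majority)))
cbind(grid, round(results, 3))
```

There are ten folds of six rows each, two per class. The majority-class placeholder is wrong on two thirds of every test fold, so every grid row shows error 0.667 and dispersion 0. A real SVM would make these columns differ across gamma and cost.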

## First Predictive Model: Re-run the Training Data with Tuned Parameters

```r
### AFTER TUNING
## NEW MODEL WITH COST AND GAMMA
model <- svm(class ~ ., data = data, cost = 2.25, gamma = 0.01)
## REDO THE PREDICTION
pred <- predict(model, x)
# CHECK ACCURACY
confusionm(pred, y)
```

The output has the same layout as before: predicted and actual values, the confusion matrix, and the accuracy.

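## First Predictive Model: Misclassification Analysis

```r
printMissClassified(pred, y)
```

The misclassified observations (predicted vs. actual) are:

- two Cyclic servers predicted as Normal;
- two ShiftUp servers predicted as TrendUp;
- two ShiftDown servers predicted as TrendDown.

Observation 586 is one of them.

To inspect a single observation, plot its utilization series. This example uses row 156 and assumes column 1 holds the class:

```r
# Plot the utilization values (columns 2 onward) of observation 156
plot(t(data[156, 2:ncol(data)]), type = "l", main = "Debug", ylab = "%util",
     ylim = c(0, 100))  # upper y limit assumed: percent utilization scale
```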
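The slides do not show the source of `printMissClassified()` either. A hypothetical base R equivalent lists the index, predicted class and actual class of every mismatch. Tabulating the pairs shows which patterns get confused, such as the shifts taken for trends above. The example labels are made up.

```r
# Hypothetical equivalent of printMissClassified(): one row per mismatch.
list_misclassified <- function(pred, actual) {
  wrong <- which(as.character(pred) != as.character(actual))
  data.frame(Predicted = as.character(pred)[wrong],
             Actual = as.character(actual)[wrong],
             observation = wrong,
             stringsAsFactors = FALSE)
}

actual <- c("Cyclic", "ShiftUp", "Normal", "ShiftDown")
pred   <- c("Normal", "TrendUp", "Normal", "TrendDown")
miss <- list_misclassified(pred, actual)
miss
# Count each actual -> predicted confusion
table(paste(miss$Actual, "->", miss$Predicted))
```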

## First Predictive Model: Use the Model to Classify New Data

```r
### NEW DATA
## READ IN DATA THAT THE MODEL HAS NOT SEEN
Ynew <- dget("Y_6")
data <- createData(Ynew)
## SPLIT TO x AND y
x <- subset(data, select = -class)
y <- data$class
## PREDICT CLASS USING THE PREVIOUSLY CREATED MODEL
pred <- predict(model, x)
# CHECK ACCURACY
confusionm(pred, y)
```

The output again shows predicted and actual values, the confusion matrix, and the accuracy, now on unseen data.

## How Can We Improve on These Results?

- More tuning of the cost and gamma parameters?
- Is the data representative of the real world?
- More training data in the model?
- Less training data in the model?
- Model overfitting?
- Are we using the right algorithm, the right way?

## Improved Predictive Model: Add More Data to the Model Build

```r
### OUT OF THE BOX
## ADD MORE DATA TO THE MODEL BUILD
# Stack data sets Y_7, Y_5, Y_4, Y_3, Y_2 and Y_1 in that order (Y_6 is not used here)
files <- c("Y_7", "Y_5", "Y_4", "Y_3", "Y_2", "Y_1")
data <- do.call(rbind, lapply(files, function(f) createData(dget(f))))
## SPLIT TO x AND y
x <- subset(data, select = -class)
y <- data$class
```

## Improved Predictive Model: Add More Data to the Model Build (cont.)

```r
## BUILD MODEL
model <- svm(class ~ ., data = data)
summary(model)
```

```
Call:
svm(formula = class ~ ., data = data)

Parameters:
   SVM-Type:  C-classification
 SVM-Kernel:  radial
       cost:  1
      gamma:

Number of Support Vectors:  2475

Number of Classes:  6

Levels:
 Cyclic Normal ShiftDown ShiftUp TrendDown TrendUp
```

## Improved Predictive Model: Add More Data to the Model Build (cont.)

```r
## PREDICTIONS
pred <- predict(model, x)
# CHECK ACCURACY
confusionm(pred, y)
```

`confusionm()` again prints the predicted and actual values, the confusion matrix, and the accuracy, now for the larger training set.

## Improved Predictive Model: Tune the New Model

```r
### AFTER TUNING
## NEW MODEL WITH COST AND GAMMA
model <- svm(class ~ ., data = data, cost = 2.25, gamma = 0.01)
## REDO THE PREDICTION
pred <- predict(model, x)
# CHECK ACCURACY
confusionm(pred, y)
```

The output has the same layout: predicted and actual values, the confusion matrix, and the accuracy.

## Improved Predictive Model: Use the New Model to Classify New Data

```r
### NEW DATA
## READ IN DATA THAT THE MODEL HAS NOT SEEN
Ynew <- dget("Y_6")
data <- createData(Ynew)
## SPLIT TO x AND y
x <- subset(data, select = -class)
y <- data$class
## PREDICT CLASS USING THE PREVIOUSLY CREATED MODEL
pred <- predict(model, x)
# CHECK ACCURACY
confusionm(pred, y)
```

The output shows predicted and actual values, the confusion matrix, and the accuracy of the improved model on unseen data.

## Model Deployment Considerations

- Training data must be created from real data; lots of labeling is required.
- Patterns and labeling need to be consistent with objectives.
- Training data and new data need to be well formed and consistent.
- Consider rare observations: anomaly detection.
- Experimentation is required; there is no one best solution, and there are always trade-offs.
- Continually monitor model performance: is the real world drifting?
- Have model measurement and validation processes.
- Put new models under change control: what, why and how.
- Models are guides, not the answer.
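Two checks from the monitoring bullet can be sketched in base R. Both are assumptions about how monitoring might be done, and both helper names are hypothetical:

- `check_drift()` compares accuracy on newly labelled servers with a baseline accuracy.
- `class_mix_shift()` measures how far the mix of predicted classes has moved, using total variation distance. It needs no new labels.

The labels and thresholds in the example are made up.

```r
# Flag a drop in accuracy on newly labelled data (baseline and tolerance are assumptions).
check_drift <- function(pred, actual, baseline_accuracy, tolerance = 0.05) {
  current <- mean(as.character(pred) == as.character(actual))
  list(current_accuracy = current,
       drifting = current < baseline_accuracy - tolerance)
}

# Total variation distance between two predicted-class mixes:
# 0 = identical proportions, 1 = no classes in common.
class_mix_shift <- function(baseline_pred, recent_pred) {
  levs <- union(as.character(baseline_pred), as.character(recent_pred))
  p <- prop.table(table(factor(baseline_pred, levels = levs)))
  q <- prop.table(table(factor(recent_pred, levels = levs)))
  0.5 * sum(abs(p - q))
}

# Example: 17 of 20 recent predictions are correct (accuracy 0.85)
actual <- rep(c("Normal", "TrendUp", "ShiftUp", "Cyclic"), each = 5)
pred <- actual
pred[c(6, 11, 16)] <- "Normal"
drift <- check_drift(pred, actual, baseline_accuracy = 0.95)
drift$drifting   # TRUE: 0.85 is below 0.95 - 0.05

shift <- class_mix_shift(baseline_pred = actual, recent_pred = pred)
shift            # 0.15: Normal grew from 25 % to 40 % of predictions
```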

## Thank You!

Example code and slides available: ??

## References

1. Wikipedia
2. Slobodan Vucetic, http://
3. CRoss Industry Standard Process for Data Mining (CRISP-DM)
4. Trevor Hastie, Robert Tibshirani, Jerome Friedman, *The Elements of Statistical Learning: Data Mining, Inference, and Prediction*.


An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,

### Comparing the Results of Support Vector Machines with Traditional Data Mining Algorithms

Comparing the Results of Support Vector Machines with Traditional Data Mining Algorithms Scott Pion and Lutz Hamel Abstract This paper presents the results of a series of analyses performed on direct mail

### An Introduction to Data Mining

An Introduction to Intel Beijing wei.heng@intel.com January 17, 2014 Outline 1 DW Overview What is Notable Application of Conference, Software and Applications Major Process in 2 Major Tasks in Detail

### Scalable Machine Learning to Exploit Big Data for Knowledge Discovery

Scalable Machine Learning to Exploit Big Data for Knowledge Discovery Una-May O Reilly MIT MIT ILP-EPOCH Taiwan Symposium Big Data: Technologies and Applications Lots of Data Everywhere Knowledge Mining

### Clustering Connectionist and Statistical Language Processing

Clustering Connectionist and Statistical Language Processing Frank Keller keller@coli.uni-sb.de Computerlinguistik Universität des Saarlandes Clustering p.1/21 Overview clustering vs. classification supervised

### Make Better Decisions Through Predictive Intelligence

IBM SPSS Modeler Professional Make Better Decisions Through Predictive Intelligence Highlights Easily access, prepare and model structured data with this intuitive, visual data mining workbench Rapidly

### Maschinelles Lernen mit MATLAB

Maschinelles Lernen mit MATLAB Jérémy Huard Applikationsingenieur The MathWorks GmbH 2015 The MathWorks, Inc. 1 Machine Learning is Everywhere Image Recognition Speech Recognition Stock Prediction Medical

Performance Workload Design The goal of this paper is to show the basic principles involved in designing a workload for performance and scalability testing. We will understand how to achieve these principles

### Predictive Analytics Techniques: What to Use For Your Big Data. March 26, 2014 Fern Halper, PhD

Predictive Analytics Techniques: What to Use For Your Big Data March 26, 2014 Fern Halper, PhD Presenter Proven Performance Since 1995 TDWI helps business and IT professionals gain insight about data warehousing,

### MA2823: Foundations of Machine Learning

MA2823: Foundations of Machine Learning École Centrale Paris Fall 2015 Chloé-Agathe Azencot Centre for Computational Biology, Mines ParisTech chloe agathe.azencott@mines paristech.fr TAs: Jiaqian Yu jiaqian.yu@centralesupelec.fr

### HT2015: SC4 Statistical Data Mining and Machine Learning

HT2015: SC4 Statistical Data Mining and Machine Learning Dino Sejdinovic Department of Statistics Oxford http://www.stats.ox.ac.uk/~sejdinov/sdmml.html Bayesian Nonparametrics Parametric vs Nonparametric

### Banking Analytics Training Program

Training (BAT) is a set of courses and workshops developed by Cognitro Analytics team designed to assist banks in making smarter lending, marketing and credit decisions. Analyze Data, Discover Information,

### Chapter 20: Data Analysis

Chapter 20: Data Analysis Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 20: Data Analysis Decision Support Systems Data Warehousing Data Mining Classification

### CloudRank-D:A Benchmark Suite for Private Cloud Systems

CloudRank-D:A Benchmark Suite for Private Cloud Systems Jing Quan Institute of Computing Technology, Chinese Academy of Sciences and University of Science and Technology of China HVC tutorial in conjunction

### Data Mining for Business Intelligence. Concepts, Techniques, and Applications in Microsoft Office Excel with XLMiner. 2nd Edition

Brochure More information from http://www.researchandmarkets.com/reports/2170926/ Data Mining for Business Intelligence. Concepts, Techniques, and Applications in Microsoft Office Excel with XLMiner. 2nd

### MS1b Statistical Data Mining

MS1b Statistical Data Mining Yee Whye Teh Department of Statistics Oxford http://www.stats.ox.ac.uk/~teh/datamining.html Outline Administrivia and Introduction Course Structure Syllabus Introduction to

### Predictive Data modeling for health care: Comparative performance study of different prediction models

Predictive Data modeling for health care: Comparative performance study of different prediction models Shivanand Hiremath hiremat.nitie@gmail.com National Institute of Industrial Engineering (NITIE) Vihar

### Machine Learning and Data Mining. Fundamentals, robotics, recognition

Machine Learning and Data Mining Fundamentals, robotics, recognition Machine Learning, Data Mining, Knowledge Discovery in Data Bases Their mutual relations Data Mining, Knowledge Discovery in Databases,

### Predictive Modeling Techniques in Insurance

Predictive Modeling Techniques in Insurance Tuesday May 5, 2015 JF. Breton Application Engineer 2014 The MathWorks, Inc. 1 Opening Presenter: JF. Breton: 13 years of experience in predictive analytics

### Data Mining for Fun and Profit

Data Mining for Fun and Profit Data mining is the extraction of implicit, previously unknown, and potentially useful information from data. - Ian H. Witten, Data Mining: Practical Machine Learning Tools

### Supervised Learning (Big Data Analytics)

Supervised Learning (Big Data Analytics) Vibhav Gogate Department of Computer Science The University of Texas at Dallas Practical advice Goal of Big Data Analytics Uncover patterns in Data. Can be used

### DECISION TREE INDUCTION FOR FINANCIAL FRAUD DETECTION USING ENSEMBLE LEARNING TECHNIQUES

DECISION TREE INDUCTION FOR FINANCIAL FRAUD DETECTION USING ENSEMBLE LEARNING TECHNIQUES Vijayalakshmi Mahanra Rao 1, Yashwant Prasad Singh 2 Multimedia University, Cyberjaya, MALAYSIA 1 lakshmi.mahanra@gmail.com

### Machine Learning. CUNY Graduate Center, Spring 2013. Professor Liang Huang. huang@cs.qc.cuny.edu

Machine Learning CUNY Graduate Center, Spring 2013 Professor Liang Huang huang@cs.qc.cuny.edu http://acl.cs.qc.edu/~lhuang/teaching/machine-learning Logistics Lectures M 9:30-11:30 am Room 4419 Personnel

### Machine Learning in Python with scikit-learn. O Reilly Webcast Aug. 2014

Machine Learning in Python with scikit-learn O Reilly Webcast Aug. 2014 Outline Machine Learning refresher scikit-learn How the project is structured Some improvements released in 0.15 Ongoing work for

### Monday Morning Data Mining

Monday Morning Data Mining Tim Ruhe Statistische Methoden der Datenanalyse Outline: - data mining - IceCube - Data mining in IceCube Computer Scientists are different... Fakultät Physik Fakultät Physik

### Data Mining is sometimes referred to as KDD and DM and KDD tend to be used as synonyms

Data Mining Techniques forcrm Data Mining The non-trivial extraction of novel, implicit, and actionable knowledge from large datasets. Extremely large datasets Discovery of the non-obvious Useful knowledge

### Data Warehousing and Data Mining in Business Applications

133 Data Warehousing and Data Mining in Business Applications Eesha Goel CSE Deptt. GZS-PTU Campus, Bathinda. Abstract Information technology is now required in all aspect of our lives that helps in business

### Machine Learning in Spam Filtering

Machine Learning in Spam Filtering A Crash Course in ML Konstantin Tretyakov kt@ut.ee Institute of Computer Science, University of Tartu Overview Spam is Evil ML for Spam Filtering: General Idea, Problems.

### Identifying SPAM with Predictive Models

Identifying SPAM with Predictive Models Dan Steinberg and Mikhaylo Golovnya Salford Systems 1 Introduction The ECML-PKDD 2006 Discovery Challenge posed a topical problem for predictive modelers: how to

### W6.B.1. FAQs CS535 BIG DATA W6.B.3. 4. If the distance of the point is additionally less than the tight distance T 2, remove it from the original set

http://wwwcscolostateedu/~cs535 W6B W6B2 CS535 BIG DAA FAQs Please prepare for the last minute rush Store your output files safely Partial score will be given for the output from less than 50GB input Computer

### IN THE CITY OF NEW YORK Decision Risk and Operations. Advanced Business Analytics Fall 2015

Advanced Business Analytics Fall 2015 Course Description Business Analytics is about information turning data into action. Its value derives fundamentally from information gaps in the economic choices