Machine Learning Capacity and Performance Analysis and R


1 Machine Learning and R May 3, 11

2 Introduction
- Machine Learning
  - Brief Introduction to Machine Learning and Data Mining: What, Why and How
  - How can this be applied to Capacity and Performance Analysis: data driven, patterns
- Example: Utilization Profiling in R
  - Data Transformation
  - Model Construction and Test
  - Model Deployment
[Figure: one-page utilization report for rdcuxsrv277.insidelive.net, with CPU and Memory panels by hour and day of the month for May, June, and July, and a % CPU utilization forecast (P95 CPU, AVG CPU).]

3 Machine Learning: Definition
There are many; here are a couple.
Definition: Tom M. Mitchell provided a widely quoted definition: "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E." [1]
Definition: The field of machine learning studies the design of computer programs able to induce patterns, regularities, or rules from past experiences. A learner (a computer program) processes data representing past experiences and tries to either develop an appropriate response to future data, or describe in some meaningful way the data seen. [2]

4 Example: Handwritten Digits
Elements of Machine Learning:
- Task T: recognizing and classifying handwritten words within images
- Performance P: percent of words correctly classified
- Experience E: a database of handwritten words with given classifications

5 Methods
Supervised learning [4]:
- Use a labeled (known) set of data to build models to perform classification or regression
- Use the model on new data to make predictions or describe the data
Supervised algorithms:
- Linear Regression
- Trees
- Neural Networks
- Support Vector Machines

6 Methods
Unsupervised learning [4]:
- Find hidden structure in unlabeled (unknown) data
Unsupervised algorithms:
- K-means
- K Nearest Neighbor (usually classed as a supervised method)
- Hierarchical Clustering
- Association Rules
- Principal Components

7 Machine Learning is a Process
Like application development. CRISP-DM, for example [3]:
- Business Understanding
- Data Understanding
- Data Preparation
- Modeling
- Evaluation
- Deployment
REPEAT AS NEEDED

8 Applications
Some notable applications:
- Spam Filtering: Yahoo
- Fraud / Anomaly Detection: Credit Card
- Stock Predictions / Trading Models
- Recommendation: Netflix
- Social Network Analysis: Facebook
- Internet Search: Google

9 And you can make money!
Data Mining Competitions:
- Netflix: $1M
- Heritage Health Prize: $3M
- Kaggle

10 Capacity Planning and Performance Analysis:
A simplified view of capacity planning:
- Hourly and daily stats, Monday thru Friday
- Peak and average daily utilization
- Simple linear regression on peak and average utilization
- Extrapolate days into the future
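The regression-and-extrapolate step above can be sketched in a few lines of base R. This is a minimal illustration on synthetic data, not the presenter's code: the data frame `util`, its columns, and the F30/F60/F90/F180 horizon names are assumptions chosen to match the forecast labels used on the report slide.

```r
# Hypothetical daily utilization summary for one server (synthetic data).
set.seed(42)
days <- 1:90
util <- data.frame(
  day = days,
  p95 = 70 + 0.08 * days + rnorm(length(days), sd = 2),
  avg = 50 + 0.04 * days + rnorm(length(days), sd = 2)
)

# Fit one simple linear regression per metric.
fit_p95 <- lm(p95 ~ day, data = util)
fit_avg <- lm(avg ~ day, data = util)

# Extrapolate 30, 60, 90, and 180 days past the last observation.
horizons <- c(F30 = 30, F60 = 60, F90 = 90, F180 = 180)
future <- data.frame(day = max(util$day) + unname(horizons))

forecast <- data.frame(
  horizon = names(horizons),
  p95 = round(unname(predict(fit_p95, newdata = future)), 1),
  avg = round(unname(predict(fit_avg, newdata = future)), 1)
)
print(forecast)
```

A straight-line fit like this will happily forecast beyond 100% utilization, so in practice the result is a ranking signal rather than a literal prediction.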

11 Capacity Planning Simplified View:
[Figure: one-page report for rdcuxsrv277.insidelive.net. Panels: CPU and Memory by hour and day of the month for May, June, and July; % CPU utilization by day (P95 CPU, AVG CPU) with forecast.]
Forecast:
  P95: F30=85.3 F60=87.8 F90=90.3 F180=97.7
  AVG: F30=57.3 F60=58.8 F90=60.2 F180=64.5
Configuration:
  Environment: QA
  Make: UNIX
  OS: SunOS
  OS Version: G
  Number of CPU: 16
  Total Memory:
* One-page server utilization, forecast, and configuration report developed using R

12 Capacity Planning Simplified View:
Put all forecasts into a spreadsheet and sort by the 30-, 60-, 90-, or 180-day forecast to find the most highly utilized servers.
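The same sort can be done directly in R before exporting anything to a spreadsheet. The sketch below assumes a data frame with one row per server and one column per forecast horizon; the server names, the values, and the helper `top_by` are all made up for illustration.

```r
# Hypothetical forecast table: one row per server, one column per horizon.
forecasts <- data.frame(
  server = c("srv-a", "srv-b", "srv-c"),
  F30  = c(85.3, 40.1, 92.0),
  F60  = c(87.8, 41.0, 93.5),
  F90  = c(90.3, 42.2, 95.1),
  F180 = c(97.7, 45.0, 99.9),
  stringsAsFactors = FALSE
)

# Return the n servers with the highest forecast at the chosen horizon.
top_by <- function(df, horizon, n = 10) {
  head(df[order(df[[horizon]], decreasing = TRUE), ], n)
}

top90 <- top_by(forecasts, "F90", n = 2)
print(top90)
```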

13 Capacity Planning Simplified View:
- Works well for stable business applications and environments.
- With a small number of servers, capacity planning for critical servers is straightforward.

14 Capacity Planning: Real World
With 3k, 4k, 5k, or 10k+ servers, capacity planning is very difficult. Why?
- Many different environments: Production, Test, BCP, QA
- Many different applications: database, web server, Quant, Hadoop, etc.
- Many different hardware platforms: Unix, Linux, Windows, VMware, etc.
- Things are not stable and well formed: different applications have different utilization profiles, for different reasons.
- Exceptions occur.

15 How can Machine Learning help?
- Lots and lots of data
- Systems have many different components, and each component has its own function and collection of metrics that determine performance
- No problem is in isolation; there are many different sets of data that need to be correlated
- Many different relationships: server, storage, database, application...
- Lots of historical data (Capacity Database?)
- Many repeating and familiar patterns

16 How can Machine Learning help?
Elements of Machine Learning:
- Task T: classify resource utilization/performance
- Performance P: filtered list of key utilization classes; enhanced monitoring; smart alerting
- Experience E: historical data from a capacity database

17 Utilization Patterns
Resource utilization has patterns. They are visual clues as to performance and future utilization. These patterns can be grouped:
- Normal: a flat utilization, stable environment; could be consistently high, middle, or low in its utilization.
- Cyclic: highly variable workload, maybe with some consistency like month-end or quant load.
- Trend Increasing: organic growth in the utilization.
- Trend Decreasing: reduced workload, application retirement.
- Shift Upward: sharp increase in processing; could be related to broken processes, cluster failover to a stand-by server, and/or new application deployment.
- Shift Downward: sharp decrease in processing; could be related to fixing processes, fail-back, or application retirement.

18 Utilization Patterns

19 What is R?
- Open source statistical programming language
- Great visualization packages
- Many different modeling packages
- Many different machine learning packages
- Almost a complete solution for building machine learning tools; scaling is an issue, i.e. the problem has to fit in memory.

20 Support Vector Machine

21 Data Considerations
- Performance and utilization data is time series, with columns DateTime, Server, and AverageCPU (one row per observation of a server, e.g. a web server).
- ML data format is a matrix with the general form Y, X1, X2, ..., Xn.
- Need to convert the time series to a matrix: one row per server, with columns Y, X1, X2, X3, X4, X5, ...
- Data needs to be consistent and well formed: no missing or bad data.
[Sample table rows were not preserved in this transcript.]
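One way to do the time-series-to-matrix conversion in base R is sketched below. The column names follow the slide (DateTime, Server, AverageCPU); the server names, dates, and values are invented, and the first output column is called Y only to mirror the slide's layout.

```r
# Hypothetical long-format capacity data: one row per server per day.
ts_long <- data.frame(
  DateTime = rep(seq(as.Date("2011-01-03"), by = "day", length.out = 5), times = 2),
  Server = rep(c("webserver1", "webserver2"), each = 5),
  AverageCPU = c(91, 92, 90, 93, 94, 20, 22, 21, 19, 23),
  stringsAsFactors = FALSE
)

# Sort so that each server's observations are in time order.
ts_long <- ts_long[order(ts_long$Server, ts_long$DateTime), ]

# Split into one numeric vector per server and check they are the same length,
# since a matrix cannot hold ragged or missing series.
per_server <- split(ts_long$AverageCPU, ts_long$Server)
stopifnot(length(unique(lengths(per_server))) == 1, !anyNA(ts_long$AverageCPU))

# Stack the vectors as rows: one row per server, one column per time step.
wide <- do.call(rbind, per_server)
colnames(wide) <- paste0("X", seq_len(ncol(wide)))

ml <- data.frame(Y = rownames(wide), wide, row.names = NULL,
                 stringsAsFactors = FALSE)
print(ml)
```

The explicit length and missing-value check reflects the slide's last point: servers with gaps have to be repaired or dropped before this step.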

22 SVM Demonstration
- Data is generated: prototypes.r
- helperfunctions.r: createData, confusionm, printMissClassified
- demo 1.R: builds an initial SVM model, tunes the model, and classifies new data with the model
- demo 2.R: improves the accuracy of the initial model

23 Generated Data
- Datasets contain 100 of each type of pattern, i.e. 600 servers
- There are 130 X data points/features per server, representing 180 days, Monday thru Friday (about 26 weeks of 5 weekdays)
- Randomly generated...
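A dataset of this shape can be approximated as follows. This is only a sketch: the presenter's prototypes.r is not shown, so the function `make_pattern`, the baseline levels, the noise level, and the 22-day cycle length are all assumptions chosen to produce the six pattern classes described earlier.

```r
set.seed(1)
n_points <- 130   # weekday observations per server
per_class <- 100  # servers per pattern type

# Hypothetical generator: a noisy % utilization series of the given type.
make_pattern <- function(type, n = n_points) {
  day <- seq_len(n)
  base <- switch(type,
    Normal    = rep(50, n),
    Cyclic    = 50 + 20 * sin(2 * pi * day / 22),
    TrendUp   = 30 + 40 * day / n,
    TrendDown = 70 - 40 * day / n,
    ShiftUp   = ifelse(day > n / 2, 70, 30),
    ShiftDown = ifelse(day > n / 2, 30, 70),
    stop("unknown pattern type: ", type)
  )
  # Add noise and clamp to the valid 0-100 range.
  pmin(pmax(base + rnorm(n, sd = 3), 0), 100)
}

types <- c("Cyclic", "Normal", "ShiftDown", "ShiftUp", "TrendDown", "TrendUp")
labels <- rep(types, each = per_class)

# One row per server, one column per day.
features <- t(vapply(labels, make_pattern, numeric(n_points), USE.NAMES = FALSE))
colnames(features) <- paste0("X", seq_len(n_points))

synthetic <- data.frame(class = factor(labels), features)
table(synthetic$class)
```

Plotting a few rows, for example `plot(features[1, ], type = "l")`, is a quick check that each class looks like its description.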

24 First Predictive Model
Create the first model

    library(e1071)  # provides svm() and tune.svm()

    ### OUT OF THE BOX
    ## GET DATA
    Ynew <- dget("Y7")
    data <- createData(Ynew)
    ## SPLIT TO x and Y
    x <- subset(data, select = -class)
    y <- data$class
    ## BUILD MODEL
    model <- svm(class ~ ., data = data)
    summary(model)

    Call:
    svm(formula = class ~ ., data = data)

    Parameters:
       SVM-Type:  C-classification
     SVM-Kernel:  radial
           cost:  1
          gamma:

    Number of Support Vectors:  545
     ( )

    Number of Classes:  6

    Levels:
     Cyclic Normal ShiftDown ShiftUp TrendDown TrendUp

25 First Predictive Model
Check the model's accuracy

    > ## PREDICTIONS
    > pred <- predict(model, x)
    > # CHECK ACCURACY:
    > confusionm(pred, y)

[Output: predicted values (Yp) and actual values (Y) per class, the confusion matrix of Y by Yp over Cyclic, Normal, ShiftDown, ShiftUp, TrendDown, and TrendUp, and the accuracy. The numeric values were not preserved in this transcript.]
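The source of the `confusionm` helper is not included in the slides. A minimal stand-in that produces the same kind of summary (a Y-by-Yp confusion matrix and an overall accuracy) can be written with base R's `table()`; the function name `confusion_summary` and the toy labels below are hypothetical.

```r
# Hypothetical stand-in for the confusionm() helper used in the slides.
confusion_summary <- function(pred, actual) {
  # Use a shared set of levels so the matrix is square even when a class
  # is never predicted.
  lv <- union(levels(factor(actual)), levels(factor(pred)))
  cm <- table(Y = factor(actual, levels = lv), Yp = factor(pred, levels = lv))
  list(matrix = cm, accuracy = sum(diag(cm)) / sum(cm))
}

# Toy example: six observations, two of them misclassified.
actual <- c("Cyclic", "Cyclic", "Normal", "Normal", "TrendUp", "ShiftUp")
pred   <- c("Cyclic", "Normal", "Normal", "Normal", "TrendUp", "TrendUp")

res <- confusion_summary(pred, actual)
print(res$matrix)
print(res$accuracy)
```

Off-diagonal cells show which classes are confused with each other, which is the information the misclassification analysis later in the deck relies on.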

26 First Predictive Model
Model Tuning:

    > obj <- tune.svm(class ~ ., data = data, gamma = 2^(-1:1), cost = 2^(2:4))
    > summary(obj)

    Parameter tuning of 'svm':
    - sampling method: 10-fold cross validation
    - best parameters: gamma, cost
    - best performance:
    - Detailed performance results: gamma, cost, error, dispersion

[The numeric values in this output were not preserved in this transcript.]
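For readers without the generated datasets, the tuning step can be tried on R's built-in `iris` data. The sketch below assumes the e1071 package is installed and skips the tuning if it is not; `iris` and `Species` stand in for the slides' `data` and `class`, and the seed is arbitrary.

```r
# Assumption: e1071 (which provides svm() and tune.svm()) is installed.
best <- NULL
if (requireNamespace("e1071", quietly = TRUE)) {
  library(e1071)
  set.seed(7)  # cross-validation folds are assigned at random

  # Grid search over gamma and cost with 10-fold cross-validation (the default).
  obj <- tune.svm(Species ~ ., data = iris,
                  gamma = 2^(-1:1), cost = 2^(2:4))
  best <- obj$best.parameters   # one-row data frame with columns gamma and cost
  print(best)
  print(obj$best.performance)   # cross-validated error for the best pair

  # Refit using the selected values, as in the "after tuning" step.
  model <- svm(Species ~ ., data = iris,
               gamma = best$gamma, cost = best$cost)
}
```

The grid only covers the values listed, so if the best pair sits on the edge of the grid it is usually worth widening the range and tuning again.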

27 First Predictive Model
Re-run the training data with tuned parameters

    > ### AFTER TUNING
    > ## NEW MODEL WITH COST AND GAMMA
    > model <- svm(class ~ ., data = data, cost = 2.25, gamma = .01)
    > ## RE-DO THE PREDICTION
    > pred <- predict(model, x)
    > # CHECK ACCURACY
    > confusionm(pred, y)

[Output: predicted values (Yp) and actual values (Y) per class, the confusion matrix of Y by Yp over Cyclic, Normal, ShiftDown, ShiftUp, TrendDown, and TrendUp, and the accuracy. The numeric values were not preserved in this transcript.]

28 First Predictive Model
Misclassification Analysis

    > printMissClassified(pred, y)
      Predicted  Actual     observation
      Normal     Cyclic
      Normal     Cyclic
      TrendUp    ShiftUp
      TrendUp    ShiftUp
      TrendDown  ShiftDown
      TrendDown  ShiftDown  586
    > plot(t(data[156, 2:]), type = "l", main = "Debug", ylab = "%util", ylim = c(0, ))

[Most observation numbers, the upper column index in the plot call, and the upper y-axis limit were not preserved in this transcript.]

29 First Predictive Model
Use the model to classify new data

    ### NEW DATA
    ## READ IN DATA THAT MODEL HAS NOT SEEN
    Ynew <- dget("Y6")
    data <- createData(Ynew)
    ## SPLIT TO X and Y
    x <- subset(data, select = -class)
    y <- data$class
    ## PREDICT CLASS USING PREVIOUSLY CREATED MODEL
    pred <- predict(model, x)
    # CHECK ACCURACY
    confusionm(pred, y)

[Output: predicted values (Yp) and actual values (Y) per class, the confusion matrix of Y by Yp over Cyclic, Normal, ShiftDown, ShiftUp, TrendDown, and TrendUp, and the accuracy. The numeric values were not preserved in this transcript.]

30 How can we improve on these results?
- More tuning of the cost and gamma parameters?
- Is the data representative of the real world?
- More training data in the model?
- Less training data in the model?
- Model overfitting?
- Are we using the right algorithm, the right way?

31 Improved Predictive Model
Add more data to the Model Build

    > ### OUT OF THE BOX
    > ## ADD MORE DATA TO THE MODEL BUILD
    > Ynew <- dget("Y7")
    > d <- createData(Ynew)
    > data <- d
    > Ynew <- dget("Y5")
    > d <- createData(Ynew)
    > data <- rbind(data, d)
    > Ynew <- dget("Y4")
    > d <- createData(Ynew)
    > data <- rbind(data, d)
    > Ynew <- dget("Y3")
    > d <- createData(Ynew)
    > data <- rbind(data, d)
    > Ynew <- dget("Y2")
    > d <- createData(Ynew)
    > data <- rbind(data, d)
    > Ynew <- dget("Y1")
    > d <- createData(Ynew)
    > data <- rbind(data, d)
    > ## SPLIT TO x and Y
    > x <- subset(data, select = -class)
    > y <- data$class

32 Improved Predictive Model
Add more data to the Model Build, cont.

    > ## BUILD MODEL
    > model <- svm(class ~ ., data = data)
    > summary(model)

    Call:
    svm(formula = class ~ ., data = data)

    Parameters:
       SVM-Type:  C-classification
     SVM-Kernel:  radial
           cost:  1
          gamma:

    Number of Support Vectors:  2475
     ( )

    Number of Classes:  6

    Levels:
     Cyclic Normal ShiftDown ShiftUp TrendDown TrendUp

33 Improved Predictive Model
Add more data to the Model Build, cont.

    > ## PREDICTIONS
    > pred <- predict(model, x)
    > # CHECK ACCURACY:
    > confusionm(pred, y)

[Output: predicted values (Yp) and actual values (Y) per class, the confusion matrix of Y by Yp over Cyclic, Normal, ShiftDown, ShiftUp, TrendDown, and TrendUp, and the accuracy. The numeric values were not preserved in this transcript.]

34 Improved Predictive Model
Tune the new model

    > ### AFTER TUNING
    > ## NEW MODEL WITH COST AND GAMMA
    > model <- svm(class ~ ., data = data, cost = 2.25, gamma = .01)
    > ## RE-DO THE PREDICTION
    > pred <- predict(model, x)
    > # CHECK ACCURACY
    > confusionm(pred, y)

[Output: predicted values (Yp) and actual values (Y) per class, the confusion matrix of Y by Yp over Cyclic, Normal, ShiftDown, ShiftUp, TrendDown, and TrendUp, and the accuracy. The numeric values were not preserved in this transcript.]

35 Improved Predictive Model
Use the new model to classify new data

    > ### NEW DATA
    > ## READ IN DATA THAT MODEL HAS NOT SEEN
    > Ynew <- dget("Y6")
    > data <- createData(Ynew)
    > ## SPLIT TO X and Y
    > x <- subset(data, select = -class)
    > y <- data$class
    > ## PREDICT CLASS USING PREVIOUSLY CREATED MODEL
    > pred <- predict(model, x)
    > # CHECK ACCURACY
    > confusionm(pred, y)

[Output: predicted values (Yp) and actual values (Y) per class, the confusion matrix of Y by Yp over Cyclic, Normal, ShiftDown, ShiftUp, TrendDown, and TrendUp, and the accuracy. The numeric values were not preserved in this transcript.]

36 Model Deployment Considerations
- Need to create the training data from real data; lots of labeling required.
- Patterns and labeling need to be consistent with objectives.
- Training data and new data need to be well formed and consistent.
- Need to consider rare observations: anomaly detection.
- Experimentation is required; there is no one best solution. There are always trade-offs.
- Continually monitor model performance: is the real world drifting?
- Need to have model measurement and validation processes.
- Change control of a new model: what, why, and how.
- Models are guides, not the answer.

37 Thank You!
Phone:
Example code and slides available: ??

38 References:
[1] Wikipedia
[2] Vucetic, Slobodan, http://
[3] CRoss Industry Standard Process for Data Mining (CRISP-DM)
[4] Trevor Hastie, Robert Tibshirani, Jerome Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction.


More information

Introduction to Data Mining

Introduction to Data Mining Introduction to Data Mining Jay Urbain Credits: Nazli Goharian & David Grossman @ IIT Outline Introduction Data Pre-processing Data Mining Algorithms Naïve Bayes Decision Tree Neural Network Association

More information

Machine Learning CS 6830. Lecture 01. Razvan C. Bunescu School of Electrical Engineering and Computer Science bunescu@ohio.edu

Machine Learning CS 6830. Lecture 01. Razvan C. Bunescu School of Electrical Engineering and Computer Science bunescu@ohio.edu Machine Learning CS 6830 Razvan C. Bunescu School of Electrical Engineering and Computer Science bunescu@ohio.edu What is Learning? Merriam-Webster: learn = to acquire knowledge, understanding, or skill

More information

Classification algorithm in Data mining: An Overview

Classification algorithm in Data mining: An Overview Classification algorithm in Data mining: An Overview S.Neelamegam #1, Dr.E.Ramaraj *2 #1 M.phil Scholar, Department of Computer Science and Engineering, Alagappa University, Karaikudi. *2 Professor, Department

More information

A semi-supervised Spam mail detector

A semi-supervised Spam mail detector A semi-supervised Spam mail detector Bernhard Pfahringer Department of Computer Science, University of Waikato, Hamilton, New Zealand Abstract. This document describes a novel semi-supervised approach

More information

8. Machine Learning Applied Artificial Intelligence

8. Machine Learning Applied Artificial Intelligence 8. Machine Learning Applied Artificial Intelligence Prof. Dr. Bernhard Humm Faculty of Computer Science Hochschule Darmstadt University of Applied Sciences 1 Retrospective Natural Language Processing Name

More information

Social Media Mining. Data Mining Essentials

Social Media Mining. Data Mining Essentials Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers

More information

Inciting Cloud Virtual Machine Reallocation With Supervised Machine Learning and Time Series Forecasts. Eli M. Dow IBM Research, Yorktown NY

Inciting Cloud Virtual Machine Reallocation With Supervised Machine Learning and Time Series Forecasts. Eli M. Dow IBM Research, Yorktown NY Inciting Cloud Virtual Machine Reallocation With Supervised Machine Learning and Time Series Forecasts Eli M. Dow IBM Research, Yorktown NY What is this talk about? This is a 30 minute technical talk about

More information

How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning

How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning How to use Big Data in Industry 4.0 implementations LAURI ILISON, PhD Head of Big Data and Machine Learning Big Data definition? Big Data is about structured vs unstructured data Big Data is about Volume

More information

Risk pricing for Australian Motor Insurance

Risk pricing for Australian Motor Insurance Risk pricing for Australian Motor Insurance Dr Richard Brookes November 2012 Contents 1. Background Scope How many models? 2. Approach Data Variable filtering GLM Interactions Credibility overlay 3. Model

More information

Chapter 12 Discovering New Knowledge Data Mining

Chapter 12 Discovering New Knowledge Data Mining Chapter 12 Discovering New Knowledge Data Mining Becerra-Fernandez, et al. -- Knowledge Management 1/e -- 2004 Prentice Hall Additional material 2007 Dekai Wu Chapter Objectives Introduce the student to

More information

Data Mining and Exploration. Data Mining and Exploration: Introduction. Relationships between courses. Overview. Course Introduction

Data Mining and Exploration. Data Mining and Exploration: Introduction. Relationships between courses. Overview. Course Introduction Data Mining and Exploration Data Mining and Exploration: Introduction Amos Storkey, School of Informatics January 10, 2006 http://www.inf.ed.ac.uk/teaching/courses/dme/ Course Introduction Welcome Administration

More information

Analysis Tools and Libraries for BigData

Analysis Tools and Libraries for BigData + Analysis Tools and Libraries for BigData Lecture 02 Abhijit Bendale + Office Hours 2 n Terry Boult (Waiting to Confirm) n Abhijit Bendale (Tue 2:45 to 4:45 pm). Best if you email me in advance, but I

More information

Knowledge Discovery and Data Mining

Knowledge Discovery and Data Mining Knowledge Discovery and Data Mining Unit # 11 Sajjad Haider Fall 2013 1 Supervised Learning Process Data Collection/Preparation Data Cleaning Discretization Supervised/Unuspervised Identification of right

More information

Introduction to Data Mining and Machine Learning Techniques. Iza Moise, Evangelos Pournaras, Dirk Helbing

Introduction to Data Mining and Machine Learning Techniques. Iza Moise, Evangelos Pournaras, Dirk Helbing Introduction to Data Mining and Machine Learning Techniques Iza Moise, Evangelos Pournaras, Dirk Helbing Iza Moise, Evangelos Pournaras, Dirk Helbing 1 Overview Main principles of data mining Definition

More information

Using multiple models: Bagging, Boosting, Ensembles, Forests

Using multiple models: Bagging, Boosting, Ensembles, Forests Using multiple models: Bagging, Boosting, Ensembles, Forests Bagging Combining predictions from multiple models Different models obtained from bootstrap samples of training data Average predictions or

More information

An Overview of Knowledge Discovery Database and Data mining Techniques

An Overview of Knowledge Discovery Database and Data mining Techniques An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,

More information

Analysis of Tweets for Prediction of Indian Stock Markets

Analysis of Tweets for Prediction of Indian Stock Markets Analysis of Tweets for Prediction of Indian Stock Markets Phillip Tichaona Sumbureru Department of Computer Science and Engineering, JNTU College of Engineering Hyderabad, Kukatpally, Hyderabad-500 085,

More information

COMP 598 Applied Machine Learning Lecture 21: Parallelization methods for large-scale machine learning! Big Data by the numbers

COMP 598 Applied Machine Learning Lecture 21: Parallelization methods for large-scale machine learning! Big Data by the numbers COMP 598 Applied Machine Learning Lecture 21: Parallelization methods for large-scale machine learning! Instructor: (jpineau@cs.mcgill.ca) TAs: Pierre-Luc Bacon (pbacon@cs.mcgill.ca) Ryan Lowe (ryan.lowe@mail.mcgill.ca)

More information

An Introduction to Data Mining

An Introduction to Data Mining An Introduction to Intel Beijing wei.heng@intel.com January 17, 2014 Outline 1 DW Overview What is Notable Application of Conference, Software and Applications Major Process in 2 Major Tasks in Detail

More information

How To Cluster

How To Cluster Data Clustering Dec 2nd, 2013 Kyrylo Bessonov Talk outline Introduction to clustering Types of clustering Supervised Unsupervised Similarity measures Main clustering algorithms k-means Hierarchical Main

More information

How To Make A Credit Risk Model For A Bank Account

How To Make A Credit Risk Model For A Bank Account TRANSACTIONAL DATA MINING AT LLOYDS BANKING GROUP Csaba Főző csaba.fozo@lloydsbanking.com 15 October 2015 CONTENTS Introduction 04 Random Forest Methodology 06 Transactional Data Mining Project 17 Conclusions

More information

Maschinelles Lernen mit MATLAB

Maschinelles Lernen mit MATLAB Maschinelles Lernen mit MATLAB Jérémy Huard Applikationsingenieur The MathWorks GmbH 2015 The MathWorks, Inc. 1 Machine Learning is Everywhere Image Recognition Speech Recognition Stock Prediction Medical

More information

Learning Example. Machine learning and our focus. Another Example. An example: data (loan application) The data and the goal

Learning Example. Machine learning and our focus. Another Example. An example: data (loan application) The data and the goal Learning Example Chapter 18: Learning from Examples 22c:145 An emergency room in a hospital measures 17 variables (e.g., blood pressure, age, etc) of newly admitted patients. A decision is needed: whether

More information

Discovering, Not Finding. Practical Data Mining for Practitioners: Level II. Advanced Data Mining for Researchers : Level III

Discovering, Not Finding. Practical Data Mining for Practitioners: Level II. Advanced Data Mining for Researchers : Level III www.cognitro.com/training Predicitve DATA EMPOWERING DECISIONS Data Mining & Predicitve Training (DMPA) is a set of multi-level intensive courses and workshops developed by Cognitro team. it is designed

More information

MA2823: Foundations of Machine Learning

MA2823: Foundations of Machine Learning MA2823: Foundations of Machine Learning École Centrale Paris Fall 2015 Chloé-Agathe Azencot Centre for Computational Biology, Mines ParisTech chloe agathe.azencott@mines paristech.fr TAs: Jiaqian Yu jiaqian.yu@centralesupelec.fr

More information

New Ensemble Combination Scheme

New Ensemble Combination Scheme New Ensemble Combination Scheme Namhyoung Kim, Youngdoo Son, and Jaewook Lee, Member, IEEE Abstract Recently many statistical learning techniques are successfully developed and used in several areas However,

More information

A Study Of Bagging And Boosting Approaches To Develop Meta-Classifier

A Study Of Bagging And Boosting Approaches To Develop Meta-Classifier A Study Of Bagging And Boosting Approaches To Develop Meta-Classifier G.T. Prasanna Kumari Associate Professor, Dept of Computer Science and Engineering, Gokula Krishna College of Engg, Sullurpet-524121,

More information

Predictive Analytics Techniques: What to Use For Your Big Data. March 26, 2014 Fern Halper, PhD

Predictive Analytics Techniques: What to Use For Your Big Data. March 26, 2014 Fern Halper, PhD Predictive Analytics Techniques: What to Use For Your Big Data March 26, 2014 Fern Halper, PhD Presenter Proven Performance Since 1995 TDWI helps business and IT professionals gain insight about data warehousing,

More information

Machine Learning over Big Data

Machine Learning over Big Data Machine Learning over Big Presented by Fuhao Zou fuhao@hust.edu.cn Jue 16, 2014 Huazhong University of Science and Technology Contents 1 2 3 4 Role of Machine learning Challenge of Big Analysis Distributed

More information

Data Mining. SPSS Clementine 12.0. 1. Clementine Overview. Spring 2010 Instructor: Dr. Masoud Yaghini. Clementine

Data Mining. SPSS Clementine 12.0. 1. Clementine Overview. Spring 2010 Instructor: Dr. Masoud Yaghini. Clementine Data Mining SPSS 12.0 1. Overview Spring 2010 Instructor: Dr. Masoud Yaghini Introduction Types of Models Interface Projects References Outline Introduction Introduction Three of the common data mining

More information

Apache Hama Design Document v0.6

Apache Hama Design Document v0.6 Apache Hama Design Document v0.6 Introduction Hama Architecture BSPMaster GroomServer Zookeeper BSP Task Execution Job Submission Job and Task Scheduling Task Execution Lifecycle Synchronization Fault

More information

Machine Learning. CUNY Graduate Center, Spring 2013. Professor Liang Huang. huang@cs.qc.cuny.edu

Machine Learning. CUNY Graduate Center, Spring 2013. Professor Liang Huang. huang@cs.qc.cuny.edu Machine Learning CUNY Graduate Center, Spring 2013 Professor Liang Huang huang@cs.qc.cuny.edu http://acl.cs.qc.edu/~lhuang/teaching/machine-learning Logistics Lectures M 9:30-11:30 am Room 4419 Personnel

More information

What is Data Mining? Data Mining (Knowledge discovery in database) Data mining: Basic steps. Mining tasks. Classification: YES, NO

What is Data Mining? Data Mining (Knowledge discovery in database) Data mining: Basic steps. Mining tasks. Classification: YES, NO What is Data Mining? Data Mining (Knowledge discovery in database) Data Mining: "The non trivial extraction of implicit, previously unknown, and potentially useful information from data" William J Frawley,

More information

Predictive Modeling Techniques in Insurance

Predictive Modeling Techniques in Insurance Predictive Modeling Techniques in Insurance Tuesday May 5, 2015 JF. Breton Application Engineer 2014 The MathWorks, Inc. 1 Opening Presenter: JF. Breton: 13 years of experience in predictive analytics

More information

Data Mining for Fun and Profit

Data Mining for Fun and Profit Data Mining for Fun and Profit Data mining is the extraction of implicit, previously unknown, and potentially useful information from data. - Ian H. Witten, Data Mining: Practical Machine Learning Tools

More information

MS1b Statistical Data Mining

MS1b Statistical Data Mining MS1b Statistical Data Mining Yee Whye Teh Department of Statistics Oxford http://www.stats.ox.ac.uk/~teh/datamining.html Outline Administrivia and Introduction Course Structure Syllabus Introduction to

More information

Comparison of K-means and Backpropagation Data Mining Algorithms

Comparison of K-means and Backpropagation Data Mining Algorithms Comparison of K-means and Backpropagation Data Mining Algorithms Nitu Mathuriya, Dr. Ashish Bansal Abstract Data mining has got more and more mature as a field of basic research in computer science and

More information

Data Mining for Knowledge Management. Classification

Data Mining for Knowledge Management. Classification 1 Data Mining for Knowledge Management Classification Themis Palpanas University of Trento http://disi.unitn.eu/~themis Data Mining for Knowledge Management 1 Thanks for slides to: Jiawei Han Eamonn Keogh

More information

Data Mining is sometimes referred to as KDD and DM and KDD tend to be used as synonyms

Data Mining is sometimes referred to as KDD and DM and KDD tend to be used as synonyms Data Mining Techniques forcrm Data Mining The non-trivial extraction of novel, implicit, and actionable knowledge from large datasets. Extremely large datasets Discovery of the non-obvious Useful knowledge

More information

Machine Learning in Python with scikit-learn. O Reilly Webcast Aug. 2014

Machine Learning in Python with scikit-learn. O Reilly Webcast Aug. 2014 Machine Learning in Python with scikit-learn O Reilly Webcast Aug. 2014 Outline Machine Learning refresher scikit-learn How the project is structured Some improvements released in 0.15 Ongoing work for

More information

Data Mining for Business Intelligence. Concepts, Techniques, and Applications in Microsoft Office Excel with XLMiner. 2nd Edition

Data Mining for Business Intelligence. Concepts, Techniques, and Applications in Microsoft Office Excel with XLMiner. 2nd Edition Brochure More information from http://www.researchandmarkets.com/reports/2170926/ Data Mining for Business Intelligence. Concepts, Techniques, and Applications in Microsoft Office Excel with XLMiner. 2nd

More information

Comparing the Results of Support Vector Machines with Traditional Data Mining Algorithms

Comparing the Results of Support Vector Machines with Traditional Data Mining Algorithms Comparing the Results of Support Vector Machines with Traditional Data Mining Algorithms Scott Pion and Lutz Hamel Abstract This paper presents the results of a series of analyses performed on direct mail

More information

Make Better Decisions Through Predictive Intelligence

Make Better Decisions Through Predictive Intelligence IBM SPSS Modeler Professional Make Better Decisions Through Predictive Intelligence Highlights Easily access, prepare and model structured data with this intuitive, visual data mining workbench Rapidly

More information

Chapter 7. Feature Selection. 7.1 Introduction

Chapter 7. Feature Selection. 7.1 Introduction Chapter 7 Feature Selection Feature selection is not used in the system classification experiments, which will be discussed in Chapter 8 and 9. However, as an autonomous system, OMEGA includes feature

More information

Scalable Machine Learning to Exploit Big Data for Knowledge Discovery

Scalable Machine Learning to Exploit Big Data for Knowledge Discovery Scalable Machine Learning to Exploit Big Data for Knowledge Discovery Una-May O Reilly MIT MIT ILP-EPOCH Taiwan Symposium Big Data: Technologies and Applications Lots of Data Everywhere Knowledge Mining

More information

not possible or was possible at a high cost for collecting the data.

not possible or was possible at a high cost for collecting the data. Data Mining and Knowledge Discovery Generating knowledge from data Knowledge Discovery Data Mining White Paper Organizations collect a vast amount of data in the process of carrying out their day-to-day

More information

Car Insurance. Prvák, Tomi, Havri

Car Insurance. Prvák, Tomi, Havri Car Insurance Prvák, Tomi, Havri Sumo report - expectations Sumo report - reality Bc. Jan Tomášek Deeper look into data set Column approach Reminder What the hell is this competition about??? Attributes

More information