Predictive Analytics. Omer Mimran, Spring 2015. Challenges in Modern Data Centers Management, Spring 2015


1 Predictive Analytics. Omer Mimran, Spring 2015. Challenges in Modern Data Centers Management, Spring 2015

2 Information provided in these slides is for educational purposes only.

3 Agenda
  Motivation: predicting the jobs' resource requirements
  Background and challenges: predictive analytics, data-stream mining (DSM)
  System overview
  DSM algorithms: regression tree, Hoeffding tree, Multiple Sliding Windows (MSW)
  Summary & conclusions

4 Motivation

5 Reminder: RM lectures I-III
  Each job comes with resource requirements, e.g., 2 cores x 8 GB
    Specified by the user submitting the job, based on their experience, etc.
  The scheduler picks the job (RM-I) and matches it with a server (RM-II)
    Best fit, worst fit, etc.

6 Why do we need predictive analytics?
  What if the jobs (users) request too many resources?
    E.g., 8 GB requested while in practice the job only uses 4 GB of memory
    A very common problem that results in a huge waste of resources ($$ loss)
    The waste remains even if resource matching is done optimally (RM-II lecture)
  Our goal (predictive analytics)
    Predict the actual resource usage of the jobs (focusing on memory)
    Forward this information to the scheduler to do the matching
    More jobs fit in -> higher throughput -> $$ saving

7 Background and challenges

8 Predictive analytics
  Predictive analytics: encompasses a variety of techniques from statistics, modeling, machine learning, and data mining that analyze current and historical facts to make predictions about future, or otherwise unknown, events. (Nyce, Charles, 2007)
  Machine learning: field of study that gives computers the ability to learn without being explicitly programmed. (Arthur Samuel, 1959)
  An introduction to data mining / machine learning
  General methodology (CRISP-DM):
  1. Divide the data into 3 sets (training, testing, validation)
  2. Use the training set to create models and the testing set to measure performance
  3. Use the validation set to select the best model and test model generalization
  4. Use the model for prediction
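Step 1 of the methodology above can be sketched in a few lines. This is a minimal illustration only: the function name `three_way_split`, the 60/20/20 proportions, and the fixed seed are assumptions made for the example, not values given in the slides.

```python
import random


def three_way_split(records, train_frac=0.6, test_frac=0.2, seed=0):
    """Shuffle records and split them into training, testing and validation sets.

    The fractions are illustrative assumptions; whatever remains after the
    training and testing parts becomes the validation set.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_test = int(len(shuffled) * test_frac)
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    validation = shuffled[n_train + n_test:]
    return train, test, validation


train, test, validation = three_way_split(range(100))
```

Models would then be fitted on `train`, compared on `test`, and the chosen model checked once on `validation`, following the order the slide describes.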

9 Data-stream mining (DSM)
  Data stream: continuous (endless) and rapid incoming data
  Idea: apply machine-learning techniques online, on the data stream
  Key challenges:
  1. Performance: it is infeasible to store or train on all the data, so each sample is processed once
  2. Quality: expected to perform at least as well as non-stream models
  3. Adaptability: the stream is non-stationary, so the underlying model must be altered accordingly
  4. Availability: the model must be available for prediction at all times
  (Bifet et al., 2010; Domingos & Hulten, 2001; Aggarwal, 2007; Gama & Rodrigues, 2007; Gaber et al., 2005; Babcock et al., 2002)

10 Adaptivity challenge: concept drift
  Concept drift: scenarios in which the distribution of a certain population changes over time; hence, statistical inference is affected (Kelly et al., 1999)
  Concept-drift types:
  1. Sudden: easier to detect, with fewer examples
  2. Gradual: harder to detect, often mistaken for random noise
  3. Incremental: occurs over a long period of time
  4. Recurring contexts: appear in a cyclic manner
  (Tsymbal, 2004; Gama & Castillo, 2006; Zliobaite, 2009)
  Possible treatments:
  1. Resetting the training data (Klinkenberg, 2004; Cohen et al., 2008; Zliobaite, 2009)
  2. Training a shadow model (Domingos & Hulten, 2000; Ikonomovska & Gama, 2008; Bifet & Gavaldà, 2009)
  3. Using an ensemble (Tsymbal et al., 2008; Ouyang et al., 2009)

11 Concept drift in reality
  Figure: bursts in jobs' core and memory requirements
  Source: Ohad Shai, Edi Shmueli, and Dror G. Feitelson, Heuristics for resource matching in Intel's compute farm. In Job Scheduling Strategies for Parallel Processing, Walfredo Cirne and Narayan Desai (eds.), Springer-Verlag, 2013

12 Performance challenge: sliding windows
  Using time windows
    A common technique in stream mining
    Better performance
    Also addresses concept drift
  Time-window types:
  1. Landmark window: maintains data starting from an identified relevant point
  2. Tilted window: maintains all data within a window at different aggregate scales
  3. Sliding window: only recent examples are stored in the window
  (Gama & Rodrigues, 2007)

13 Performance challenge: sliding windows
  The problem: how to set the window size?
    Too short -> lower statistical validity and stability
    Too long -> slow adaptation, with negative impact on quality
  Example (figure): the accuracy of protein-structure prediction, using KNN with sliding windows of varying length (Chen, Kurgan, & Ruan, 2006)

14 System overview (figure, callout 1)

15 System overview: input from the users (1)
  Job characteristics
    User, project, priority, command-line, resource requirements, etc.
    Data only known at submission time
    Categorical variables with many possible values

16 System overview (figure, callouts 1 and 2)

17 System overview: output of the model (2)
  Prediction example
    If command = A and project = Tablet then memory = 4 GB
    If command = B and project = Mobile then memory = 6 GB
    If priority = 1 and user team = uncore then memory = 2 GB
    If project = ServerX then memory = 16 GB

18 System overview (figure)

19 System overview: output of the scheduler (3)
  The scheduler matches the jobs with machines/servers (RM-II lecture)
    Using the predicted values (not the original values specified by the user)
    More jobs fit in -> higher throughput -> $$ saving

20 System overview (figure)

21 System overview: input to the model (4)
  Job characteristics
    User, project, priority, command-line, etc.
  Actual resources consumed by the jobs, e.g., memory

22 Performance measurements & objective
  Measurements are calculated per job, once the job has completed and its data is available in the DB
    Actual runtime/memory consumption is compared with the prediction
  Objective: maximum savings + minimum of 95% accuracy
    I.e., minimize resource waste while ensuring that 95% of the jobs are not under-estimated (otherwise they might be killed by the scheduler)
  Measurements (calculated for all jobs that got a memory prediction):
    $\text{Accuracy} = \frac{\text{number of jobs with memory consumed} < \text{memory prediction}}{\text{number of jobs}}$
    Saving: computed from job runtime, memory prediction, and memory requested by user
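The two measurements can be sketched as below. The accuracy function follows the definition on the slide. The saving function is an assumption: the slide's formula layout did not survive transcription, so the sketch uses one plausible reading (the runtime-weighted share of user-requested memory that the prediction frees). The field names and the sample jobs are hypothetical.

```python
def accuracy(jobs):
    """Fraction of jobs whose consumed memory stayed below the prediction."""
    covered = sum(1 for j in jobs if j["consumed_gb"] < j["predicted_gb"])
    return covered / len(jobs)


def saving(jobs):
    """ASSUMED definition: runtime-weighted share of requested memory freed.

    The slide only names the inputs (job runtime, memory prediction, memory
    requested by user); how they are combined here is a guess.
    """
    requested = sum(j["runtime_h"] * j["requested_gb"] for j in jobs)
    predicted = sum(j["runtime_h"] * j["predicted_gb"] for j in jobs)
    return (requested - predicted) / requested


# Hypothetical completed jobs, as they might be read from the DB.
jobs = [
    {"runtime_h": 2.0, "requested_gb": 8.0, "predicted_gb": 4.0, "consumed_gb": 3.5},
    {"runtime_h": 1.0, "requested_gb": 8.0, "predicted_gb": 6.0, "consumed_gb": 6.5},
    {"runtime_h": 1.0, "requested_gb": 16.0, "predicted_gb": 8.0, "consumed_gb": 7.0},
    {"runtime_h": 4.0, "requested_gb": 4.0, "predicted_gb": 2.0, "consumed_gb": 1.0},
]

print(accuracy(jobs))  # 3 of the 4 jobs stay below their prediction
print(saving(jobs))
```

In this sample the second job is under-estimated, so accuracy is 0.75, below the 95% objective; a real deployment would have to predict more conservatively and accept a lower saving.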

23 DSM algorithms

24 Challenge
  Data available for learning
  1. Job characteristics: user, project, command-line, etc.
  2. Actual resources consumed by the jobs, e.g., memory
  Output
    Predict resource consumption for future incoming jobs

25 DSM algorithms: regression tree idea
  Fast Incremental Regression Tree with Drift Detection (FIRT-DD) (Ikonomovska et al., 2009)
  New job: Priority = 1, NumOfLoops = 2
  Tree (figure): split on Priority <= 5 / Priority > 5, followed by a split on NumOfLoops = 0 / NumOfLoops > 0
  New job memory prediction = 4 GB

26 DSM algorithms: regression tree steps
  1. Construct a tree, using the Chernoff bound to compare the standard deviation reduction (SDR) of all possible values as the split criterion
    Example: split the node using Priority value 5
    All candidate variables' values are tested; Priority value 5 is found to reduce the standard deviation the most
    Resulting branches: Priority <= 5, Priority > 5
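The SDR split criterion from step 1 can be sketched as follows. This is a batch illustration under simplifying assumptions: it scans the stored samples of a single numeric attribute and omits the Chernoff-bound check that decides when enough samples have been seen. The function names and sample values are hypothetical.

```python
from statistics import pstdev


def sdr(targets, left, right):
    """Standard deviation reduction obtained by splitting targets into left/right."""
    n = len(targets)
    return (pstdev(targets)
            - len(left) / n * pstdev(left)
            - len(right) / n * pstdev(right))


def best_threshold(samples):
    """samples: list of (attribute_value, target).

    Tries every observed attribute value (except the largest, which would
    leave the right branch empty) as a split 'attribute <= threshold' and
    returns (threshold, sdr) for the split with the highest SDR.
    """
    targets = [t for _, t in samples]
    best = (None, 0.0)
    for thr in sorted({a for a, _ in samples})[:-1]:
        left = [t for a, t in samples if a <= thr]
        right = [t for a, t in samples if a > thr]
        gain = sdr(targets, left, right)
        if gain > best[1]:
            best = (thr, gain)
    return best


# Hypothetical (priority, memory in GB) observations.
samples = [(1, 2.0), (3, 2.1), (5, 1.9), (6, 8.0), (8, 8.2), (9, 7.9)]
threshold, reduction = best_threshold(samples)
print(threshold)  # the priority value chosen as the split point
```

With these values, the jobs with priority up to 5 use about 2 GB and the rest about 8 GB, so splitting at 5 removes almost all of the spread, mirroring the slide's Priority <= 5 / Priority > 5 example.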

27 DSM algorithms: regression tree steps
  2. The sliding window size is a pre-defined parameter
    Example (leaf under Priority <= 5, NumOfLoops = 0): sliding window size = 5
    Jobs in window: 1, 4, 1, 5, 2
    New job value 3 is added, oldest job value 1 is discarded
    Jobs in window: 4, 1, 5, 2, 3
    Re-calculate the prediction: the median changes from 2 to 3
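The window update in this example can be reproduced with a fixed-length queue. This is a minimal sketch, assuming the leaf's prediction is the median of the values currently in its window, as the slide's numbers suggest.

```python
from collections import deque
from statistics import median

# Window of size 5 holding the most recent job values at one leaf.
window = deque([1, 4, 1, 5, 2], maxlen=5)
before = median(window)

# Appending to a full deque drops the oldest value (the leading 1).
window.append(3)
after = median(window)

print(list(window), before, after)
```

Because `maxlen` is fixed, each new job costs constant memory per leaf, which is what makes the pre-defined window practical on an endless stream.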

28 DSM algorithms: regression tree steps
  3. Adaptivity
    Track the error rate using the statistical Page-Hinkley (PH) test
    Grow a shadow sub-tree and replace the original once the shadow's accuracy is better
    Figure: a node with high error (Priority <= 5) whose sub-tree (NumOfLoops = 0 / NumOfLoops > 0) is compared by error rate against a shadow sub-tree (CommandNum <= 10 / CommandNum > 10)
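A common textbook form of the Page-Hinkley test is sketched below to show how a rising error rate can be flagged. It is not taken from the slides or from the FIRT-DD implementation; the class name, the parameter values (`delta`, `threshold`) and the synthetic error stream are assumptions for illustration.

```python
class PageHinkley:
    """Page-Hinkley test for detecting an increase in the mean of a stream."""

    def __init__(self, delta=0.005, threshold=5.0):
        self.delta = delta          # magnitude of change that is tolerated
        self.threshold = threshold  # alarm threshold (often written lambda)
        self.n = 0
        self.mean = 0.0             # running mean of the observed values
        self.cumulative = 0.0       # cumulative deviation from the running mean
        self.minimum = 0.0          # smallest cumulative deviation seen so far

    def update(self, x):
        """Add one observation; return True when a change is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cumulative += x - self.mean - self.delta
        self.minimum = min(self.minimum, self.cumulative)
        return self.cumulative - self.minimum > self.threshold


# Synthetic absolute errors: stable at 0.1, then jumping to 1.0 (sudden drift).
errors = [0.1] * 50 + [1.0] * 50
detector = PageHinkley()
alarms = [detector.update(e) for e in errors]
first_alarm = alarms.index(True)
print(first_alarm)  # position in the stream where drift is first signalled
```

In a tree, an alarm like this at a node would be the trigger for growing the shadow sub-tree described above.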

29 DSM algorithms: Hoeffding tree idea
  Hoeffding Adaptive Tree (HAT) (Bifet et al., 2009)
  New job: Project = A, CommandType = X
  Tree (figure):
    Project = A
      Command Type = Y: pass
      Command Type = X: fail
    Project = B: fail
  New job predicted to fail

30 Entropy & information gain
  $\text{Entropy}(S) = -\sum_{i=1}^{n} p_i \log_2 p_i$
  Data (Weather: Go to the beach):
    Sunny: Yes, Yes, Yes, No
    Overcast: Yes, Yes, No, No
    Rain: No, No, No, No
  P(Beach = Yes) = 5/12, P(Beach = No) = 7/12
  $\text{Entropy}(\text{Beach}) = -\frac{5}{12}\log_2\frac{5}{12} - \frac{7}{12}\log_2\frac{7}{12} = 0.98$
  P(Beach = Yes | Weather = Sunny) = 3/4, P(Beach = No | Weather = Sunny) = 1/4
  $\text{Entropy}(S_{sunny}) = -\frac{3}{4}\log_2\frac{3}{4} - \frac{1}{4}\log_2\frac{1}{4} = 0.81$
  $\text{Entropy}(S_{overcast}) = 1$
  $\text{Entropy}(S_{rain}) = 0$

31 Entropy & information gain
  Entropy(Beach) = 0.98; Entropy(S_sunny) = 0.81; Entropy(S_overcast) = 1; Entropy(S_rain) = 0
  P(sunny) = P(overcast) = P(rain) = 4/12
  $\text{Entropy}(\text{Beach} \mid \text{Weather}) = P(sunny)\,\text{Entropy}(S_{sunny}) + P(overcast)\,\text{Entropy}(S_{overcast}) + P(rain)\,\text{Entropy}(S_{rain}) = \frac{4}{12}(0.81) + \frac{4}{12}(1) + \frac{4}{12}(0) = 0.6$
  By knowing the weather, how much information have I gained?
  $\text{Gain} = \text{Entropy}(X) - \text{Entropy}(X \mid Y)$
  $\text{Entropy}(\text{Beach}) - \text{Entropy}(\text{Beach} \mid \text{Weather}) = 0.98 - 0.6 = 0.38$
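The beach example can be checked numerically. The sketch below recomputes the entropy and the information gain from the twelve rows of the table; the function names are illustrative.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(rows):
    """rows: list of (attribute_value, label). Returns Entropy(X) - Entropy(X | Y)."""
    labels = [label for _, label in rows]
    groups = {}
    for value, label in rows:
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# The twelve (Weather, Go to the beach) rows from the slide.
rows = ([("Sunny", "Yes")] * 3 + [("Sunny", "No")]
        + [("Overcast", "Yes")] * 2 + [("Overcast", "No")] * 2
        + [("Rain", "No")] * 4)

beach_entropy = entropy([label for _, label in rows])
gain = information_gain(rows)
print(round(beach_entropy, 2), round(gain, 2))
```

Computed without intermediate rounding, the gain is about 0.376, which rounds to the same 0.38 the slide obtains from the rounded values 0.98 and 0.6.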

32 DSM algorithms: Hoeffding tree steps
  1. Construct a tree using information gain as the split criterion and the Hoeffding bound statistical test as the stopping condition
    Example: split the node using the Project variable (branches Project = A, Project = B)
    Information gain G is calculated for all candidate variables
    If G(best attribute) - G(2nd best attribute) > ε, split the leaf on the best attribute
    ε = Hoeffding bound statistic
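The split test can be sketched using the Hoeffding bound formula given later in the deck, ε = sqrt(R² ln(1/δ) / (2n)). The function names, the gain values, and the choices R = 1 and δ = 0.05 are assumptions for the example.

```python
from math import log, sqrt


def hoeffding_bound(value_range, delta, n):
    """epsilon = sqrt(R^2 * ln(1/delta) / (2n))."""
    return sqrt(value_range ** 2 * log(1 / delta) / (2 * n))


def should_split(best_gain, second_gain, value_range, delta, n):
    """Split only when the best attribute beats the runner-up by more than epsilon."""
    return best_gain - second_gain > hoeffding_bound(value_range, delta, n)


# Hypothetical gains for the two best attributes at a leaf.
print(should_split(0.38, 0.20, value_range=1.0, delta=0.05, n=12))   # too few examples
print(should_split(0.38, 0.20, value_range=1.0, delta=0.05, n=200))  # enough examples
```

The same gap of 0.18 is not trusted after 12 examples (ε is about 0.35) but is after 200 (ε is about 0.09), which is how the bound lets the tree wait until a split is statistically justified.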

33 DSM algorithms: Hoeffding tree steps
  2. The sliding window size is dynamic (discussed later)
    Example (leaf under Project = A, Command Type = X): sliding window size = 5
    Jobs in window: +, +, +, -, -
    New job - is added, oldest job + is discarded
    Jobs in window: +, +, -, -, -
    Re-calculate the prediction: it changes from pass to fail

34 DSM algorithms: Hoeffding tree steps
  3. Adaptivity
    A. Window size change, similar to MSW (discussed later)
    B. Alternate tree: after a concept drift in the data stream, followed by a stable period, a new alternate tree is generated
      Track the error rate on the new concept
      If the new tree's error is smaller for time T, replace the trees

35 DSM algorithms: MSW idea
  Multiple Sliding Windows (MSW) (Mimran & Even, 2014)
  New job: Project = A, CommandType = X
  Profiles (figure):
    Project = A
      Command Type = Y: 2 GB
      Command Type = X: 4 GB
    Project = B
      Command Type = Y: 4 GB
      Command Type = X: 10 GB
  New job memory prediction is 4 GB
  The MSW variable set is [Project], [Command Type]

36 DSM algorithms: MSW
  1. Before training the model, find the set of variables that impact memory consumption
    The method uses forward selection, minimizing the variance and the number of profiles:
    Candidate variables | Variable rank | Selected variable set
    A, B, C, D, E, F, G | 1, 2, 3, 4, 3, 2, 1 | D
    A, B, C, E, F, G | 6, 5, 4, 3, 2, 1 | D, A
    B, C, E, F, G | 4, 6, 8, 10, 20 | D, A, G
    B, C, E, F | 3, 1, 1, 2 | D, A, G, B
    C, E, F | -1, -2, 0 | D, A, G, B
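Forward selection over candidate variables can be sketched as below. The score is the unnormalized DDS criterion from the backup slide (the weighted within-profile variance, sum over profiles of σ_j² N_j / N). Everything else is an assumption: the stopping rule (`min_gain`), the omission of the profile-count normalization, the field names, and the sample jobs.

```python
from collections import defaultdict
from statistics import pvariance


def dds(jobs, variables, target):
    """Weighted within-profile variance: sum_j sigma_j^2 * N_j / N."""
    profiles = defaultdict(list)
    for job in jobs:
        profiles[tuple(job[v] for v in variables)].append(job[target])
    return sum(pvariance(vals) * len(vals) for vals in profiles.values()) / len(jobs)


def forward_select(jobs, candidates, target, min_gain=1e-9):
    """Greedily add the variable that lowers DDS most; stop when nothing helps."""
    selected = []
    current = dds(jobs, selected, target)
    remaining = list(candidates)
    while remaining:
        scores = {v: dds(jobs, selected + [v], target) for v in remaining}
        best = min(scores, key=scores.get)
        if current - scores[best] <= min_gain:
            break  # assumed stopping rule: no further variance reduction
        selected.append(best)
        remaining.remove(best)
        current = scores[best]
    return selected


# Hypothetical jobs: memory depends on project and command, not on user.
jobs = [
    {"project": p, "command": c, "user": u, "mem": m}
    for (p, c, m) in [("A", "X", 4), ("A", "Y", 2), ("B", "X", 10), ("B", "Y", 4)]
    for u in ("u1", "u2")
]

chosen = forward_select(jobs, ["user", "project", "command"], "mem")
print(chosen)
```

Here `user` is never selected because grouping by it leaves the variance unchanged, while `project` and `command` together bring the within-profile variance to zero. The normalized criterion on the backup slide additionally penalizes the number of profiles, which this sketch leaves out.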

37 DSM algorithms: MSW
  Variable set selection illustration (figure)

38 DSM algorithms: MSW
  2. Set a sliding window per profile
  Figure labels: Predict, Label

39 DSM algorithms: MSW
  3. Use any given prediction function within the window
    Objective: maximum saving + minimum 95% accuracy
    Chosen strategy: linear prediction function (φ = 0.95, C = 0.1)
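Steps 2 and 3 can be sketched as one sliding window per profile with a pluggable prediction function. The slides do not define the linear prediction function or the exact roles of φ and C, so the sketch substitutes a stand-in: the φ-quantile of the window, inflated by a safety margin. The class name, the parameter `margin`, the fallback for unseen profiles, and the sample values are all hypothetical.

```python
from collections import defaultdict, deque
from math import ceil


class ProfileWindows:
    """One fixed-size sliding window of consumed memory per profile."""

    def __init__(self, variables, window_size=100, phi=0.95, margin=0.1):
        self.variables = variables  # e.g. ["project", "command"]
        self.phi = phi              # quantile used by the stand-in predictor
        self.margin = margin        # ASSUMED safety margin on top of the quantile
        self.windows = defaultdict(lambda: deque(maxlen=window_size))

    def _key(self, job):
        return tuple(job[v] for v in self.variables)

    def observe(self, job, consumed_gb):
        """Record the memory a completed job actually consumed."""
        self.windows[self._key(job)].append(consumed_gb)

    def predict(self, job, fallback_gb):
        """Predict memory for a new job; fall back to the user's request
        when the profile has no history yet."""
        window = self.windows.get(self._key(job))
        if not window:
            return fallback_gb
        ordered = sorted(window)
        rank = ceil(self.phi * len(ordered)) - 1  # nearest-rank quantile
        return ordered[rank] * (1 + self.margin)


model = ProfileWindows(["project", "command"], window_size=5)
for gb in [3.0, 3.5, 4.0, 3.2, 3.8]:
    model.observe({"project": "A", "command": "X"}, gb)

known = model.predict({"project": "A", "command": "X"}, fallback_gb=8.0)
unknown = model.predict({"project": "B", "command": "Y"}, fallback_gb=8.0)
print(known, unknown)
```

Aiming high within the window is one way to trade some saving for the 95% accuracy objective; any other function of the window contents could be swapped into `predict`.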

40 DSM algorithms: MSW
  4. Set the window size dynamically, using a change detector
    Example of concept-drift management for a window with 850 jobs. The sub-window size parameter is 200 and the confidence levels are 97.5%, 95%, 90%, 90%
    Steps (figure):
      Division into sub-windows
      1st change detection comparison, 97.5% confidence level
      2nd change detection comparison, 95% confidence level
      3rd change detection comparison, 90% confidence level
    Flow when the 2nd comparison is statistically significant:
      The 1st comparison is not statistically significant -> go to the next sub-windows
      The 2nd comparison is statistically significant -> prune the window
      The new sliding window keeps the recent observations; the older observations are dropped

41 DSM algorithms: MSW
  4. Set the window size dynamically, using a change detector
    Change detector function: Hoeffding bound
      The Hoeffding bound (Hoeffding, 1963), also known as the additive Chernoff bound:
      $\varepsilon = \sqrt{\frac{R^2 \ln(1/\delta)}{2n}}$
      R: the range of the variable
      (1 - δ): the statistical confidence
      n: the number of examples
    Alternate function: Kolmogorov-Smirnov test (figure courtesy of Wikipedia)
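The pruning flow can be sketched with the Hoeffding bound as the change detector. The comparison scheme is an assumption, since the slides show it only as a figure: the newest sub-window's mean is compared with each successively older sub-window's mean, using the listed confidence levels in order, and the window is cut at the first significant difference. The function names, the memory values and the range R = 16 are hypothetical.

```python
from math import log, sqrt


def hoeffding_bound(value_range, delta, n):
    """epsilon = sqrt(R^2 * ln(1/delta) / (2n))."""
    return sqrt(value_range ** 2 * log(1 / delta) / (2 * n))


def prune_window(window, sub_size, confidences, value_range):
    """window: observations ordered oldest -> newest.

    ASSUMED scheme: walk backwards one sub-window at a time; if an older
    sub-window's mean differs from the newest one by more than epsilon,
    drop that sub-window and everything older.
    """
    newest = window[-sub_size:]
    newest_mean = sum(newest) / len(newest)
    end = len(window) - sub_size  # index where the newest sub-window starts
    for confidence in confidences:
        start = end - sub_size
        if start < 0:
            break  # not enough older data for a full sub-window
        older = window[start:end]
        older_mean = sum(older) / len(older)
        eps = hoeffding_bound(value_range, 1 - confidence, sub_size)
        if abs(newest_mean - older_mean) > eps:
            return window[end:]
        end = start
    return window


# 850 jobs: the oldest 450 used 8 GB, the newest 400 use 4 GB (sudden drift).
window = [8.0] * 450 + [4.0] * 400
kept = prune_window(window, sub_size=200,
                    confidences=[0.975, 0.95, 0.90, 0.90], value_range=16.0)
print(len(kept))
```

As in the slide's flow, the first comparison finds no change, the second one does, and the window shrinks to the 400 recent jobs. A window with no drift is returned unchanged.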

42 Summary & conclusions
  Model | Model type | Sliding windows | Window size | Adaptivity | Change detector
  Incremental Online Info-fuzzy Network (IOLIN) (Cohen et al., 2008) | classification | 1 window | heuristic | update network | accuracy degradation
  Fast Incremental Regression Tree with Drift Detection (FIRT-DD) (Ikonomovska et al., 2009) | regression | multiple windows | user-defined | shadow model | error PH test
  Concept-adapting Very Fast Decision Trees (CVFDT) (Hulten et al., 2001) | classification | 1 window | user-defined | shadow model | Hoeffding bound
  Hoeffding Adaptive Tree (HAT) (Bifet et al., 2009) | classification | multiple windows | dynamic | modify window | Hoeffding bound
  Multiple Sliding Windows (MSW) (Mimran & Even, 2014) | classification, regression | multiple windows | dynamic | modify window | Hoeffding bound

43 MSW in production: predicting jobs' memory usage
  Deploying the model improved throughput by 10%
    By allowing the scheduler to fit more jobs on the available resources

44 Can we do the same for the jobs' runtime?
  Jobs' runtime behavior is more chaotic than their memory behavior
    Some jobs get killed upon startup, e.g., due to configuration issues
    Jobs sharing the same CPU create contention that impacts runtime
    Environmental issues impact runtime, e.g., file system slowness
    Non-uniform server configurations with different CPU speeds
    Hyper-threading, etc.
  Conclusion: the existing variables do not sufficiently reduce the variance

45 Can we do the same for the jobs' runtime?
  Proposed approach: predict the extremes
  MAX (improved throughput by 5%)
    0.5% of the jobs consume ~10% of the resources, with a high failure rate
    Predict the outliers using large windows and kill them
  MIN (not implemented yet)
    ~50% of the jobs run for less than 5 minutes
    Predict whether a job's run time is short, for better scheduling use cases

46 References
  Cohen, L., Avrahami, G., Last, M., & Kandel, A. (2008, September). Info-fuzzy algorithms for mining dynamic data streams. Applied Soft Computing, 8(4).
  Ikonomovska, E., Gama, J., Sebastião, R., & Gjorgjevik, D. (2009). Regression Trees from Data Streams with Drift Detection. Discovery Science, Lecture Notes in Computer Science. Springer Berlin / Heidelberg.
  Hulten, G., Spencer, L., & Domingos, P. (2001). Mining time-changing data streams. In Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining (KDD '01). New York: ACM.
  Bifet, A., & Gavaldà, R. (2009). Adaptive Learning from Evolving Data Streams. In N. Adams, C. Robardet, A. Siebes, & J.-F. Boulicaut (Eds.), Advances in Intelligent Data Analysis VIII, Lecture Notes in Computer Science (Vol. 5772). Berlin / Heidelberg: Springer.
  Mimran, O., & Even, A. (2014). Data Stream Mining With Multiple Sliding Windows For Continuous Prediction. Proceedings of the European Conference on Information Systems (ECIS). AISeL.

47 Thank You

48 Backup

49 MSW feature selection
  Data Dictionary Selection (DDS) criterion:
    $DDS = \sum_{j=1}^{J} \sigma_j^2 N_j / N$
    N: the total number of observations
    J: the number of profiles considered
    σ_j: the standard deviation of profile j
    N_j: the number of observations in profile j
    P: the number of profiles generated (i.e., distinct value combinations)
  Normalized DDS criterion:
    $V_0 = \sigma^2; \quad P_0 = 1; \quad V_i = \sum_{j=1}^{J} \sigma_j^2 N_j / N; \quad DDS_i = \frac{V_i - V_{i-1}}{P_i - P_{i-1}}$
    i: the step number
    P_i: the number of profiles in step i
    N, J, σ_j, N_j: as above
  Normalized DDS criterion with minimum support α:
    $V_0 = \sigma^2; \quad P_0 = 1; \quad V_i = \sum_{j=1}^{n} \begin{cases} \sigma_j^2 N_j / N & N_j \ge \alpha \\ 0 & \text{otherwise} \end{cases}$
    $DDS_i$ combines $\frac{V_i - V_{i-1}}{P_i - P_{i-1}}$ with the supported share of observations $\frac{1}{N} \sum_{j=1}^{n} \begin{cases} N_j & N_j \ge \alpha \\ 0 & \text{otherwise} \end{cases}$

50 DSM algorithms: MSW
  Multiple Sliding Windows (MSW) (Mimran & Even, 2014)
  MSW strategy:
    Find a variable set that divides the data into a minimal set of profiles (clusters) with minimal variance (done once)
    Set a sliding window per profile
    Use any given prediction function within the windows
    Set the window size dynamically, using a change detector


Identifying At-Risk Students Using Machine Learning Techniques: A Case Study with IS 100 Identifying At-Risk Students Using Machine Learning Techniques: A Case Study with IS 100 Erkan Er Abstract In this paper, a model for predicting students performance levels is proposed which employs three

More information

A Case of Study on Hadoop Benchmark Behavior Modeling Using ALOJA-ML

A Case of Study on Hadoop Benchmark Behavior Modeling Using ALOJA-ML www.bsc.es A Case of Study on Hadoop Benchmark Behavior Modeling Using ALOJA-ML Josep Ll. Berral, Nicolas Poggi, David Carrera Workshop on Big Data Benchmarks Toronto, Canada 2015 1 Context ALOJA: framework

More information

Standardization and Its Effects on K-Means Clustering Algorithm

Standardization and Its Effects on K-Means Clustering Algorithm Research Journal of Applied Sciences, Engineering and Technology 6(7): 399-3303, 03 ISSN: 040-7459; e-issn: 040-7467 Maxwell Scientific Organization, 03 Submitted: January 3, 03 Accepted: February 5, 03

More information

Information Management course

Information Management course Università degli Studi di Milano Master Degree in Computer Science Information Management course Teacher: Alberto Ceselli Lecture 01 : 06/10/2015 Practical informations: Teacher: Alberto Ceselli (alberto.ceselli@unimi.it)

More information

Decision-Tree Learning

Decision-Tree Learning Decision-Tree Learning Introduction ID3 Attribute selection Entropy, Information, Information Gain Gain Ratio C4.5 Decision Trees TDIDT: Top-Down Induction of Decision Trees Numeric Values Missing Values

More information

Knowledge Discovery and Data Mining

Knowledge Discovery and Data Mining Knowledge Discovery and Data Mining Unit # 6 Sajjad Haider Fall 2014 1 Evaluating the Accuracy of a Classifier Holdout, random subsampling, crossvalidation, and the bootstrap are common techniques for

More information

Chapter 12 Discovering New Knowledge Data Mining

Chapter 12 Discovering New Knowledge Data Mining Chapter 12 Discovering New Knowledge Data Mining Becerra-Fernandez, et al. -- Knowledge Management 1/e -- 2004 Prentice Hall Additional material 2007 Dekai Wu Chapter Objectives Introduce the student to

More information

A comparative study of data mining (DM) and massive data mining (MDM)

A comparative study of data mining (DM) and massive data mining (MDM) A comparative study of data mining (DM) and massive data mining (MDM) Prof. Dr. P K Srimani Former Chairman, Dept. of Computer Science and Maths, Bangalore University, Director, R & D, B.U., Bangalore,

More information

TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM

TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM Thanh-Nghi Do College of Information Technology, Cantho University 1 Ly Tu Trong Street, Ninh Kieu District Cantho City, Vietnam

More information

The basic data mining algorithms introduced may be enhanced in a number of ways.

The basic data mining algorithms introduced may be enhanced in a number of ways. DATA MINING TECHNOLOGIES AND IMPLEMENTATIONS The basic data mining algorithms introduced may be enhanced in a number of ways. Data mining algorithms have traditionally assumed data is memory resident,

More information

Machine Learning Capacity and Performance Analysis and R

Machine Learning Capacity and Performance Analysis and R Machine Learning and R May 3, 11 30 25 15 10 5 25 15 10 5 30 25 15 10 5 0 2 4 6 8 101214161822 0 2 4 6 8 101214161822 0 2 4 6 8 101214161822 100 80 60 40 100 80 60 40 100 80 60 40 30 25 15 10 5 25 15 10

More information

Big Data Mining Services and Knowledge Discovery Applications on Clouds

Big Data Mining Services and Knowledge Discovery Applications on Clouds Big Data Mining Services and Knowledge Discovery Applications on Clouds Domenico Talia DIMES, Università della Calabria & DtoK Lab Italy talia@dimes.unical.it Data Availability or Data Deluge? Some decades

More information

An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015

An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015 An Introduction to Data Mining for Wind Power Management Spring 2015 Big Data World Every minute: Google receives over 4 million search queries Facebook users share almost 2.5 million pieces of content

More information

ETL PROCESS IN DATA WAREHOUSE

ETL PROCESS IN DATA WAREHOUSE ETL PROCESS IN DATA WAREHOUSE OUTLINE ETL : Extraction, Transformation, Loading Capture/Extract Scrub or data cleansing Transform Load and Index ETL OVERVIEW Extraction Transformation Loading ETL ETL is

More information

INCREMENTAL AGGREGATION MODEL FOR DATA STREAM CLASSIFICATION

INCREMENTAL AGGREGATION MODEL FOR DATA STREAM CLASSIFICATION INCREMENTAL AGGREGATION MODEL FOR DATA STREAM CLASSIFICATION S. Jayanthi 1 and B. Karthikeyan 2 1 Department of Computer Science and Engineering, Karpagam University, Coimbatore, India 2 Dhanalakshmi Srinivsan

More information

Philosophies and Advances in Scaling Mining Algorithms to Large Databases

Philosophies and Advances in Scaling Mining Algorithms to Large Databases Philosophies and Advances in Scaling Mining Algorithms to Large Databases Paul Bradley Apollo Data Technologies paul@apollodatatech.com Raghu Ramakrishnan UW-Madison raghu@cs.wisc.edu Johannes Gehrke Cornell

More information

Adaptive Model Rules from Data Streams

Adaptive Model Rules from Data Streams Adaptive Model Rules from Data Streams Ezilda Almeida 1, Carlos Ferreira 1, and João Gama 1,2 1 LIAAD-INESC TEC, University of Porto ezildacv@gmail.com, cgf@isep.ipp.pt 2 Faculty of Economics, University

More information

IDENTIFYING BANK FRAUDS USING CRISP-DM AND DECISION TREES

IDENTIFYING BANK FRAUDS USING CRISP-DM AND DECISION TREES IDENTIFYING BANK FRAUDS USING CRISP-DM AND DECISION TREES Bruno Carneiro da Rocha 1,2 and Rafael Timóteo de Sousa Júnior 2 1 Bank of Brazil, Brasília-DF, Brazil brunorocha_33@hotmail.com 2 Network Engineering

More information

How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning

How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning How to use Big Data in Industry 4.0 implementations LAURI ILISON, PhD Head of Big Data and Machine Learning Big Data definition? Big Data is about structured vs unstructured data Big Data is about Volume

More information

Data Mining for Knowledge Management. Classification

Data Mining for Knowledge Management. Classification 1 Data Mining for Knowledge Management Classification Themis Palpanas University of Trento http://disi.unitn.eu/~themis Data Mining for Knowledge Management 1 Thanks for slides to: Jiawei Han Eamonn Keogh

More information

SIGMOD RWE Review Towards Proximity Pattern Mining in Large Graphs

SIGMOD RWE Review Towards Proximity Pattern Mining in Large Graphs SIGMOD RWE Review Towards Proximity Pattern Mining in Large Graphs Fabian Hueske, TU Berlin June 26, 21 1 Review This document is a review report on the paper Towards Proximity Pattern Mining in Large

More information

An Overview of Knowledge Discovery Database and Data mining Techniques

An Overview of Knowledge Discovery Database and Data mining Techniques An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,

More information

Data Preprocessing. Week 2

Data Preprocessing. Week 2 Data Preprocessing Week 2 Topics Data Types Data Repositories Data Preprocessing Present homework assignment #1 Team Homework Assignment #2 Read pp. 227 240, pp. 250 250, and pp. 259 263 the text book.

More information

Load Balancing to Save Energy in Cloud Computing

Load Balancing to Save Energy in Cloud Computing presented at the Energy Efficient Systems Workshop at ICT4S, Stockholm, Aug. 2014 Load Balancing to Save Energy in Cloud Computing Theodore Pertsas University of Manchester United Kingdom tpertsas@gmail.com

More information

Professor Anita Wasilewska. Classification Lecture Notes

Professor Anita Wasilewska. Classification Lecture Notes Professor Anita Wasilewska Classification Lecture Notes Classification (Data Mining Book Chapters 5 and 7) PART ONE: Supervised learning and Classification Data format: training and test data Concept,

More information

CSE 326: Data Structures B-Trees and B+ Trees

CSE 326: Data Structures B-Trees and B+ Trees Announcements (4//08) CSE 26: Data Structures B-Trees and B+ Trees Brian Curless Spring 2008 Midterm on Friday Special office hour: 4:-5: Thursday in Jaech Gallery (6 th floor of CSE building) This is

More information

Using multiple models: Bagging, Boosting, Ensembles, Forests

Using multiple models: Bagging, Boosting, Ensembles, Forests Using multiple models: Bagging, Boosting, Ensembles, Forests Bagging Combining predictions from multiple models Different models obtained from bootstrap samples of training data Average predictions or

More information

Data Mining with Weka

Data Mining with Weka Data Mining with Weka Class 1 Lesson 1 Introduction Ian H. Witten Department of Computer Science University of Waikato New Zealand weka.waikato.ac.nz Data Mining with Weka a practical course on how to

More information

Scalable Machine Learning - or what to do with all that Big Data infrastructure

Scalable Machine Learning - or what to do with all that Big Data infrastructure - or what to do with all that Big Data infrastructure TU Berlin blog.mikiobraun.de Strata+Hadoop World London, 2015 1 Complex Data Analysis at Scale Click-through prediction Personalized Spam Detection

More information

BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL

BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL The Fifth International Conference on e-learning (elearning-2014), 22-23 September 2014, Belgrade, Serbia BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL SNJEŽANA MILINKOVIĆ University

More information

D A T A M I N I N G C L A S S I F I C A T I O N

D A T A M I N I N G C L A S S I F I C A T I O N D A T A M I N I N G C L A S S I F I C A T I O N FABRICIO VOZNIKA LEO NARDO VIA NA INTRODUCTION Nowadays there is huge amount of data being collected and stored in databases everywhere across the globe.

More information

Binary Coded Web Access Pattern Tree in Education Domain

Binary Coded Web Access Pattern Tree in Education Domain Binary Coded Web Access Pattern Tree in Education Domain C. Gomathi P.G. Department of Computer Science Kongu Arts and Science College Erode-638-107, Tamil Nadu, India E-mail: kc.gomathi@gmail.com M. Moorthi

More information

Name: Srinivasan Govindaraj Title: Big Data Predictive Analytics

Name: Srinivasan Govindaraj Title: Big Data Predictive Analytics Name: Srinivasan Govindaraj Title: Big Data Predictive Analytics Please note the following IBM s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice

More information

Informationsaustausch für Nutzer des Aachener HPC Clusters

Informationsaustausch für Nutzer des Aachener HPC Clusters Informationsaustausch für Nutzer des Aachener HPC Clusters Paul Kapinos, Marcus Wagner - 21.05.2015 Informationsaustausch für Nutzer des Aachener HPC Clusters Agenda (The RWTH Compute cluster) Project-based

More information

Random forest algorithm in big data environment

Random forest algorithm in big data environment Random forest algorithm in big data environment Yingchun Liu * School of Economics and Management, Beihang University, Beijing 100191, China Received 1 September 2014, www.cmnt.lv Abstract Random forest

More information

Gerard Mc Nulty Systems Optimisation Ltd gmcnulty@iol.ie/0876697867 BA.,B.A.I.,C.Eng.,F.I.E.I

Gerard Mc Nulty Systems Optimisation Ltd gmcnulty@iol.ie/0876697867 BA.,B.A.I.,C.Eng.,F.I.E.I Gerard Mc Nulty Systems Optimisation Ltd gmcnulty@iol.ie/0876697867 BA.,B.A.I.,C.Eng.,F.I.E.I Data is Important because it: Helps in Corporate Aims Basis of Business Decisions Engineering Decisions Energy

More information

Web Document Clustering

Web Document Clustering Web Document Clustering Lab Project based on the MDL clustering suite http://www.cs.ccsu.edu/~markov/mdlclustering/ Zdravko Markov Computer Science Department Central Connecticut State University New Britain,

More information

HiBench Introduction. Carson Wang (carson.wang@intel.com) Software & Services Group

HiBench Introduction. Carson Wang (carson.wang@intel.com) Software & Services Group HiBench Introduction Carson Wang (carson.wang@intel.com) Agenda Background Workloads Configurations Benchmark Report Tuning Guide Background WHY Why we need big data benchmarking systems? WHAT What is

More information

131-1. Adding New Level in KDD to Make the Web Usage Mining More Efficient. Abstract. 1. Introduction [1]. 1/10

131-1. Adding New Level in KDD to Make the Web Usage Mining More Efficient. Abstract. 1. Introduction [1]. 1/10 1/10 131-1 Adding New Level in KDD to Make the Web Usage Mining More Efficient Mohammad Ala a AL_Hamami PHD Student, Lecturer m_ah_1@yahoocom Soukaena Hassan Hashem PHD Student, Lecturer soukaena_hassan@yahoocom

More information

A Review of Online Decision Tree Learning Algorithms

A Review of Online Decision Tree Learning Algorithms A Review of Online Decision Tree Learning Algorithms Johns Hopkins University Department of Computer Science Corbin Rosset June 17, 2015 Abstract This paper summarizes the most impactful literature of

More information

Azure Machine Learning, SQL Data Mining and R

Azure Machine Learning, SQL Data Mining and R Azure Machine Learning, SQL Data Mining and R Day-by-day Agenda Prerequisites No formal prerequisites. Basic knowledge of SQL Server Data Tools, Excel and any analytical experience helps. Best of all:

More information

Load Balancing. Load Balancing 1 / 24

Load Balancing. Load Balancing 1 / 24 Load Balancing Backtracking, branch & bound and alpha-beta pruning: how to assign work to idle processes without much communication? Additionally for alpha-beta pruning: implementing the young-brothers-wait

More information

Fingerprinting the datacenter: automated classification of performance crises

Fingerprinting the datacenter: automated classification of performance crises Fingerprinting the datacenter: automated classification of performance crises Peter Bodík 1,3, Moises Goldszmidt 3, Armando Fox 1, Dawn Woodard 4, Hans Andersen 2 1 RAD Lab, UC Berkeley 2 Microsoft 3 Research

More information

Active Learning SVM for Blogs recommendation

Active Learning SVM for Blogs recommendation Active Learning SVM for Blogs recommendation Xin Guan Computer Science, George Mason University Ⅰ.Introduction In the DH Now website, they try to review a big amount of blogs and articles and find the

More information

Oracle Database 11g Comparison Chart

Oracle Database 11g Comparison Chart Key Feature Summary Express 10g Standard One Standard Enterprise Maximum 1 CPU 2 Sockets 4 Sockets No Limit RAM 1GB OS Max OS Max OS Max Database Size 4GB No Limit No Limit No Limit Windows Linux Unix

More information

Distributed forests for MapReduce-based machine learning

Distributed forests for MapReduce-based machine learning Distributed forests for MapReduce-based machine learning Ryoji Wakayama, Ryuei Murata, Akisato Kimura, Takayoshi Yamashita, Yuji Yamauchi, Hironobu Fujiyoshi Chubu University, Japan. NTT Communication

More information

Why Ensembles Win Data Mining Competitions

Why Ensembles Win Data Mining Competitions Why Ensembles Win Data Mining Competitions A Predictive Analytics Center of Excellence (PACE) Tech Talk November 14, 2012 Dean Abbott Abbott Analytics, Inc. Blog: http://abbottanalytics.blogspot.com URL:

More information

Australian Journal of Basic and Applied Sciences. Big Data Streaming Using Adaptive Machine Learning And Mining Algorithms

Australian Journal of Basic and Applied Sciences. Big Data Streaming Using Adaptive Machine Learning And Mining Algorithms AENSI Journals Australian Journal of Basic and Applied Sciences ISSN:1991-8178 Journal home page: www.ajbasweb.com Big Data Streaming Using Adaptive Machine Learning And Mining Algorithms 1 Samson Immanuel

More information

Data Mining Application in Direct Marketing: Identifying Hot Prospects for Banking Product

Data Mining Application in Direct Marketing: Identifying Hot Prospects for Banking Product Data Mining Application in Direct Marketing: Identifying Hot Prospects for Banking Product Sagarika Prusty Web Data Mining (ECT 584),Spring 2013 DePaul University,Chicago sagarikaprusty@gmail.com Keywords:

More information

Predicting earning potential on Adult Dataset

Predicting earning potential on Adult Dataset MSc in Computing, Business Intelligence and Data Mining stream. Business Intelligence and Data Mining Applications Project Report. Predicting earning potential on Adult Dataset Submitted by: xxxxxxx Supervisor:

More information

Predicting the Risk of Heart Attacks using Neural Network and Decision Tree

Predicting the Risk of Heart Attacks using Neural Network and Decision Tree Predicting the Risk of Heart Attacks using Neural Network and Decision Tree S.Florence 1, N.G.Bhuvaneswari Amma 2, G.Annapoorani 3, K.Malathi 4 PG Scholar, Indian Institute of Information Technology, Srirangam,

More information

Predicting borrowers chance of defaulting on credit loans

Predicting borrowers chance of defaulting on credit loans Predicting borrowers chance of defaulting on credit loans Junjie Liang (junjie87@stanford.edu) Abstract Credit score prediction is of great interests to banks as the outcome of the prediction algorithm

More information

Newsletter 4/2013 Oktober 2013. www.soug.ch

Newsletter 4/2013 Oktober 2013. www.soug.ch SWISS ORACLE US ER GRO UP www.soug.ch Newsletter 4/2013 Oktober 2013 Oracle 12c Consolidation Planer Data Redaction & Transparent Sensitive Data Protection Oracle Forms Migration Oracle 12c IDENTITY table

More information

PLANET: Massively Parallel Learning of Tree Ensembles with MapReduce. Authors: B. Panda, J. S. Herbach, S. Basu, R. J. Bayardo.

PLANET: Massively Parallel Learning of Tree Ensembles with MapReduce. Authors: B. Panda, J. S. Herbach, S. Basu, R. J. Bayardo. PLANET: Massively Parallel Learning of Tree Ensembles with MapReduce Authors: B. Panda, J. S. Herbach, S. Basu, R. J. Bayardo. VLDB 2009 CS 422 Decision Trees: Main Components Find Best Split Choose split

More information

Learning Example. Machine learning and our focus. Another Example. An example: data (loan application) The data and the goal

Learning Example. Machine learning and our focus. Another Example. An example: data (loan application) The data and the goal Learning Example Chapter 18: Learning from Examples 22c:145 An emergency room in a hospital measures 17 variables (e.g., blood pressure, age, etc) of newly admitted patients. A decision is needed: whether

More information