Predictive Analytics
Omer Mimran
Challenges in Modern Data Centers Management, Spring 2015
Information provided in these slides is for educational purposes only
Agenda
- Motivation: predicting the jobs' resource requirements
- Background and challenges: predictive analytics, data-stream mining (DSM)
- System overview
- DSM algorithms: regression tree, Hoeffding tree, multiple sliding windows (MSW)
- Summary & conclusions
Motivation
Reminder: RM lectures I-III
- Each job comes with resource requirements, e.g., 2 cores x 8GB
- Requirements are specified by the user submitting the job, based on their experience, etc.
- The scheduler picks the job (RM-I) and matches it with a server (RM-II): best fit, worst fit, etc.
Why do we need predictive analytics?
- What if the jobs (users) request too many resources, e.g., 8GB while in practice the job only uses 4GB of memory?
- This is a very common problem, resulting in a huge waste of resources ($$ loss), even if resource matching was done optimally (RM-II lecture)
Our goal (predictive analytics):
- Provide predictions of the actual resource usage of the jobs (focusing on memory)
- Forward this information to the scheduler to do the matching
- More jobs fit in -> higher throughput -> $$ saving
Background and challenges
Predictive analytics
- Predictive analytics "encompasses a variety of techniques from statistics, modeling, machine learning, and data mining that analyze current and historical facts to make predictions about future, or otherwise unknown, events" (Nyce, 2007)
- Machine learning: "field of study that gives computers the ability to learn without being explicitly programmed" (Arthur Samuel, 1959)
An introduction to data mining / machine learning. General methodology (CRISP-DM):
1. Divide the data into 3 sets (training, testing, validation)
2. Use the training set to create models and the testing set to measure performance
3. Use the validation set to select the best model and test model generalization
4. Use the model for prediction
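As a minimal illustration of step 1 of this methodology, here is a sketch in plain Python/numpy; the 60/20/20 ratio and the array inputs are illustrative assumptions, not from the slides:

```python
import numpy as np

def split_dataset(X, y, train=0.6, test=0.2, seed=0):
    """Shuffle and divide the data into training, testing, and validation
    sets (step 1 of the methodology). X, y are numpy arrays; the 60/20/20
    ratio is an illustrative assumption."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_train, n_test = int(train * len(X)), int(test * len(X))
    tr = idx[:n_train]
    te = idx[n_train:n_train + n_test]
    va = idx[n_train + n_test:]          # remainder -> validation
    return (X[tr], y[tr]), (X[te], y[te]), (X[va], y[va])
```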
Data-stream mining (DSM)
- Data stream: continuous (endless) and rapidly incoming data
- Idea: apply machine-learning techniques online, on the data stream
Key challenges:
1. Performance: infeasible to store/train on all the data; each sample is processed once
2. Quality: expected to perform at least as well as non-stream (batch) models
3. Adaptability: the stream is non-stationary, so the underlying model must be altered accordingly
4. Availability: must be available for prediction at all times
(Bifet et al., 2010; Domingos & Hulten, 2001; Aggarwal, 2007; Gama & Rodrigues, 2007; Gaber et al., 2005; Babcock et al., 2002)
Adaptivity challenge: concept drift
- Concept drift: scenarios in which the distribution of a certain population changes over time, so statistical inference is affected (Kelly et al., 1999)
Concept-drift types (Tsymbal, 2004; Gama & Castillo, 2006; Zliobaite, 2009):
1. Sudden: easier to detect, with fewer examples
2. Gradual: harder to detect, often mistaken for random noise
3. Incremental: occurs over a long period of time
4. Recurring contexts: appear in a cyclic manner
Possible treatments:
1. Resetting the training data (Klinkenberg, 2004; Cohen et al., 2008; Zliobaite, 2009)
2. Training a shadow model (Domingos & Hulten, 2000; Ikonomovska & Gama, 2008; Bifet & Gavaldà, 2009)
3. Using an ensemble (Tsymbal et al., 2008; Ouyang et al., 2009)
Concept drift in reality
Figure: bursts in jobs' core and memory requirements. Source: Ohad Shai, Edi Shmueli, and Dror G. Feitelson, "Heuristics for resource matching in Intel's compute farm," in Job Scheduling Strategies for Parallel Processing, Walfredo Cirne and Narayan Desai (eds.), Springer-Verlag, 2013.
Performance challenge: sliding windows
- Using time windows is a common technique in stream mining: it improves performance and also addresses concept drift
Time-window types (Gama & Rodrigues, 2007):
1. Landmark window: maintain data starting from an identified relevant point
2. Tilted window: maintain all data within a window, at different aggregate scales
3. Sliding window: only recent examples are stored in the window
Performance challenge: sliding windows
The problem: how to set the window size?
- Too short -> lower statistical validity and stability
- Too long -> slow adaptation, with negative impact on quality
Example: the accuracy of protein-structure prediction, using KNN with sliding windows of varying length (Chen, Kurgan, & Ruan, 2006)
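A minimal sketch of a sliding-window predictor makes the tradeoff concrete (plain Python; predicting the window mean is an illustrative choice, not the algorithm used later in these slides):

```python
from collections import deque

class SlidingWindowPredictor:
    """Keep only the most recent `size` labels and predict from them.
    A short window adapts quickly but is statistically noisy; a long
    window is stable but slow to follow concept drift."""
    def __init__(self, size):
        self.window = deque(maxlen=size)   # oldest examples fall off automatically

    def observe(self, label):
        self.window.append(label)

    def predict(self):
        # Illustrative choice: predict the window mean.
        return sum(self.window) / len(self.window) if self.window else None
```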
System overview
System overview: input from the users (1)
Job characteristics:
- User, project, priority, command-line, resource requirements, etc.
- Data only known at submission time
- Categorical variables with many possible values
System overview: output of the model (2)
Prediction example:
- If command = A and project = Tablet then memory = 4GB
- If command = B and project = Mobile then memory = 6GB
- If priority = 1 and user team = uncore then memory = 2GB
- If project = ServerX then memory = 16GB
System overview: output of the scheduler (3)
- The scheduler matches the jobs with machines/servers (RM-II lecture), using the predicted values (not the original values specified by the user)
- More jobs fit in -> higher throughput -> $$ saving
System overview: input to the model (4)
- Job characteristics: user, project, priority, command-line, etc.
- Actual resources consumed by the jobs, e.g., memory
Performance measurements & objective
- Measurements are calculated per job once it is completed and available in the DB, comparing actual runtime/memory consumption vs. the prediction
- Objective: maximum saving + a minimum of 95% accuracy, i.e., minimize resource waste while ensuring that 95% of the jobs will not be under-estimated (otherwise they might be killed by the scheduler)
Measurements (calculated over all jobs that got a memory prediction):
- Accuracy = (number of jobs with memory consumed < memory prediction) / (number of jobs)
- Saving = sum over jobs of runtime x (memory requested by user - memory prediction)
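A sketch of how these two measurements could be computed from completed-job records; the field names are illustrative assumptions, and the saving formula follows the reconstruction above:

```python
def accuracy(jobs):
    """Fraction of predicted jobs whose actual memory consumption stayed
    below the prediction (the 95% objective refers to this number)."""
    ok = sum(1 for j in jobs if j["mem_used"] < j["mem_predicted"])
    return ok / len(jobs)

def saving(jobs):
    """Runtime-weighted memory released by predicting below the user request."""
    return sum(j["runtime"] * (j["mem_requested"] - j["mem_predicted"])
               for j in jobs)

# Example record (illustrative field names):
# {"runtime": 3600, "mem_requested": 8, "mem_predicted": 5, "mem_used": 4}
```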
DSM algorithms
Challenge
Data available for learning:
1. Job characteristics: user, project, command-line, etc.
2. Actual resources consumed by the jobs, e.g., memory
Output:
- Predict the resource consumption of future incoming jobs
DSM algorithms: regression tree idea
Fast Incremental Regression Tree with Drift Detection (FIRT-DD) (Ikonomovska et al., 2009)
Example: a new job with Priority = 1 and NumOfLoops = 2 is routed down the tree (Priority <= 5, then NumOfLoops > 0) to a leaf whose memory prediction is 4GB.
DSM algorithms: regression tree steps
1. Construct a tree using the Chernoff bound, comparing the standard deviation reduction (SDR) of all possible values as the split criterion
Example: all candidate variables' values are tested; Priority value 5 is found to best reduce the standard deviation, so the node is split into Priority <= 5 and Priority > 5.
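A minimal sketch of the SDR computation for one candidate split (numpy-based and illustrative, not the FIRT-DD implementation; the algorithm additionally uses the Chernoff bound to decide when the observed best split is statistically reliable):

```python
import numpy as np

def sdr(y, left_mask):
    """Standard deviation reduction achieved by splitting the labels y
    into a left subset (mask True) and a right subset (mask False).
    Assumes both subsets are non-empty."""
    left, right = y[left_mask], y[~left_mask]
    weighted = (len(left) * left.std() + len(right) * right.std()) / len(y)
    return y.std() - weighted

# Evaluate a candidate split such as "Priority <= 5" and keep the best one:
# gain = sdr(memory, priority <= 5)
```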
DSM algorithms: regression tree steps
2. The sliding window size is a pre-defined parameter
Example (window size = 5): the window at a leaf holds job values 1, 4, 1, 5, 2. A new job value 3 is added and the oldest value 1 is discarded, leaving 4, 1, 5, 2, 3; the leaf prediction is re-calculated as the window median (= 3).
DSM algorithms: regression tree steps
3. Adaptivity
- Track the error rate using the statistical Page-Hinkley (PH) test
- On high error, grow a shadow sub-tree (e.g., splitting on CommandNum <= 10), compare error rates, and replace the original sub-tree once the shadow's accuracy is better
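A minimal sketch of the Page-Hinkley test in its standard formulation (the delta and lam threshold values are illustrative assumptions):

```python
class PageHinkley:
    """Standard Page-Hinkley test: signal a change when the monitored
    error stream drifts upward. delta (tolerated deviation) and lam
    (alarm threshold) are illustrative values."""
    def __init__(self, delta=0.005, lam=50.0):
        self.delta, self.lam = delta, lam
        self.n, self.mean, self.cum, self.min_cum = 0, 0.0, 0.0, 0.0

    def update(self, error):
        self.n += 1
        self.mean += (error - self.mean) / self.n     # running mean of the error
        self.cum += error - self.mean - self.delta    # cumulative deviation
        self.min_cum = min(self.min_cum, self.cum)
        return self.cum - self.min_cum > self.lam     # True -> drift detected
```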
DSM algorithms: Hoeffding tree idea
Hoeffding Adaptive Tree (HAT) (Bifet et al., 2009)
Example: a new job with Project = A and CommandType = X is routed down a classification tree (Project = A, then Command Type = X) and predicted to fail.
Entropy & information gain
Entropy(S) = -\sum_{i=1}^{n} p_i \log_2(p_i)

Example: 12 days, with attribute Weather and label "Go to the beach":
- Sunny (4 days): Yes, Yes, Yes, No
- Overcast (4 days): Yes, Yes, No, No
- Rain (4 days): No, No, No, No
P(Beach = Yes) = 5/12, P(Beach = No) = 7/12
Entropy(Beach) = -(5/12)log2(5/12) - (7/12)log2(7/12) = 0.98
Entropy(S_sunny) = -(3/4)log2(3/4) - (1/4)log2(1/4) = 0.81
Entropy(S_overcast) = 1
Entropy(S_rain) = 0

By knowing the weather, how much information have we gained?
With P(sunny) = P(overcast) = P(rain) = 4/12:
Entropy(Beach | Weather) = P(sunny) Entropy(S_sunny) + P(overcast) Entropy(S_overcast) + P(rain) Entropy(S_rain) = (4/12)(0.81) + (4/12)(1) + (4/12)(0) = 0.6
Gain(X, Y) = Entropy(X) - Entropy(X | Y)
Gain = Entropy(Beach) - Entropy(Beach | Weather) = 0.98 - 0.6 = 0.38
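The same computation as a short Python sketch, reproducing the beach example above:

```python
from math import log2

def entropy(counts):
    """Entropy of a distribution given as class counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c)

h_beach = entropy([5, 7])                                 # 0.98
# Weather partitions as (yes, no) counts: sunny, overcast, rain
parts = [(3, 1), (2, 2), (0, 4)]
h_cond = sum(sum(p) / 12 * entropy(p) for p in parts)     # 0.60
print(h_beach - h_cond)                                   # gain = 0.38
```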
DSM algorithms: Hoeffding tree steps
1. Construct a tree using information gain as the split criterion and the Hoeffding bound statistical test as the stopping condition
- Information gain is calculated for all candidate variables
- If G(best attribute) - G(2nd best) > ε, split the leaf on the best attribute (ε = the Hoeffding bound statistic)
Example: the node is split using the Project variable (Project = A / Project = B).
DSM algorithms: Hoeffding tree steps
2. The sliding window size is dynamic (discussed later...)
Example (window size = 5): the window holds labels +, +, +, -, -. A new job labeled - is added and the oldest + is discarded, leaving +, +, -, -, -; the prediction is re-calculated.
DSM algorithms: Hoeffding tree steps
3. Adaptivity
A. Window size change: similar to MSW (discussed later...)
B. Alternate tree: after a concept drift in the data stream, followed by a stable period, a new alternate tree is generated
- Track the error rate on the new concept
- If the new tree's error is smaller for time T, replace the trees
DSM algorithms: MSW idea
Multiple Sliding Windows (MSW) (Mimran & Even, 2014)
Example: with the variable set [Project], [Command Type], the data is divided into profiles whose memory predictions are 2GB, 4GB, 4GB, and 10GB. A new job with Project = A and Command Type = X falls into the profile whose memory prediction is 4GB.
DSM algorithms: MSW
1. Before training the model, find the set of variables that impact memory consumption
The method used is forward selection, minimizing variance and the number of profiles (see the sketch below the table):

Candidate variables | Variable rank       | Selected variable set
A, B, C, D, E, F, G | 1, 2, 3, 4, 3, 2, 1 | D
A, B, C, E, F, G    | 6, 5, 4, 3, 2, 1    | D, A
B, C, E, F, G       | 4, 6, 8, 10, 20     | D, A, G
B, C, E, F          | 3, 1, 1, 2          | D, A, G, B
C, E, F             | -1, -2, 0           | D, A, G, B (no further improvement)
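A minimal sketch of forward selection in this spirit; the `score` ranking function is a hypothetical stand-in for the slide's criterion combining variance reduction and profile count:

```python
def forward_select(candidates, score):
    """Greedily add the best-ranked variable until no remaining candidate
    improves the criterion. score(subset) is a hypothetical ranking
    function (higher is better)."""
    selected = []
    while candidates:
        best_score, best_var = max((score(selected + [v]), v) for v in candidates)
        if best_score <= 0:        # e.g. ranks -1, -2, 0 in the final step above
            break
        selected.append(best_var)
        candidates.remove(best_var)
    return selected

# With a suitable score, forward_select(list("ABCDEFG"), score)
# would reproduce the trace D -> D,A -> D,A,G -> D,A,G,B shown above.
```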
DSM algorithms: MSW
Figure: variable set selection illustration.
DSM algorithms: MSW
2. Set a sliding window per profile; each profile's window is used to predict the label for incoming jobs in that profile
DSM algorithms: MSW
3. Use any given prediction function within the window
- Objective: maximum saving + a minimum of 95% accuracy
- Chosen strategy: linear prediction function (φ = 0.95, C = 0.1)
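The slides do not spell the prediction function out; one plausible instantiation, assuming φ is a target quantile of the window and C a safety-padding fraction (both readings are assumptions, not the published MSW function), is:

```python
def predict_memory(window, phi=0.95, c=0.1):
    """Hypothetical reading of the 'linear prediction function': take the
    phi-quantile of the memory values in the window and pad it by a
    fraction c as a safety margin. Illustrative only."""
    vals = sorted(window)
    q = vals[min(int(phi * len(vals)), len(vals) - 1)]  # empirical phi-quantile
    return q * (1 + c)
```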
DSM algorithms: MSW
4. Set the window size dynamically, using a change detector
Example: concept-drift management of a window with 850 jobs; the sub-window size parameter is 200 and the confidence levels are 97.5%, 95%, 90%, 90%.
- Division into sub-windows: 200, 200, 200, 250
- 1st change-detection comparison (97.5% confidence level): the newest 200 jobs vs. the older 650
- 2nd comparison (95% confidence level): the newest 400 vs. the older 450
- 3rd comparison (90% confidence level): the newest 600 vs. the older 250
Flow when the 2nd comparison is statistically significant: the 1st comparison is not significant -> go to the next sub-windows; the 2nd comparison is significant -> prune the window. The new sliding window keeps the newest 400 jobs; the older 450 observations are discarded.
DSM algorithms: MSW
4. Set the window size dynamically, using a change detector
Change-detector function: the Hoeffding bound (Hoeffding, 1963), also known as the additive Chernoff bound:
\epsilon = \sqrt{ \frac{R^2 \ln(1/\delta)}{2n} }
- R: the range of the variable
- (1 - δ): the statistical confidence
- n: the number of examples
Alternate function: the Kolmogorov-Smirnov test (image courtesy of Wikipedia, http://en.wikipedia.org/wiki/file:ks_example.png)
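A sketch tying the bound to the sub-window comparisons on the previous slide: two sub-windows are declared statistically different when their means differ by more than ε (illustrative, not the production implementation):

```python
import math

def hoeffding_eps(value_range, delta, n):
    """Hoeffding bound: with confidence 1 - delta, the mean of n samples
    of a variable with range value_range lies within eps of the true mean."""
    return math.sqrt(value_range ** 2 * math.log(1 / delta) / (2 * n))

def means_differ(older, recent, value_range, delta):
    """Declare the two sub-windows different (-> prune the window) when
    their means differ by more than the bound at the smaller sample size."""
    n = min(len(older), len(recent))
    eps = hoeffding_eps(value_range, delta, n)
    mean = lambda xs: sum(xs) / len(xs)
    return abs(mean(older) - mean(recent)) > eps
```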
Summary & conclusions

Model | Model type | Sliding windows | Window size | Adaptivity | Change detector
Incremental Online Info-fuzzy Network (IOLIN) (Cohen et al., 2008) | classification | 1 window | heuristic | update network | accuracy degradation
Fast Incremental Regression Tree with Drift Detection (FIRT-DD) (Ikonomovska et al., 2009) | regression | multiple windows | user-defined | shadow model | error PH test
Concept-adapting Very Fast Decision Trees (CVFDT) (Hulten et al., 2001) | classification | 1 window | user-defined | shadow model | Hoeffding bound
Hoeffding Adaptive Tree (HAT) (Bifet et al., 2009) | classification | multiple windows | dynamic | modify window | Hoeffding bound
Multiple Sliding Windows (MSW) (Mimran & Even, 2014) | classification & regression | multiple windows | dynamic | modify window | Hoeffding bound
MSW in production: predicting jobs' memory usage
- Deploying the model improved throughput by 10%, by allowing the scheduler to fit more jobs on the available resources
Can we do the same for the jobs' runtime?
Jobs' runtime behavior is more chaotic compared to memory:
- Some jobs get killed upon startup, e.g., due to configuration issues
- Jobs sharing the same CPU create contention, impacting runtime
- Environmental issues impact runtime, e.g., file-system slowness
- Non-uniform server configurations: different CPU speeds, hyper-threading, etc.
Conclusion: the existing variables do not sufficiently reduce the variance
Can we do the same for the jobs' runtime?
Proposed approach: predict the extremes
- MAX (improved throughput by 5%): 0.5% of jobs consume ~10% of the resources, with a high failure rate; predict these outliers using large windows and kill them
- MIN (not implemented yet): ~50% of jobs run less than 5 minutes; predict whether a job's runtime is short, for better scheduling use cases
References
- Cohen, L., Avrahami, G., Last, M., & Kandel, A. (2008). Info-fuzzy algorithms for mining dynamic data streams. Applied Soft Computing, 8(4), 1283-1294.
- Ikonomovska, E., Gama, J., Sebastião, R., & Gjorgjevik, D. (2009). Regression trees from data streams with drift detection. In Discovery Science, Lecture Notes in Computer Science (pp. 121-135). Berlin/Heidelberg: Springer.
- Hulten, G., Spencer, L., & Domingos, P. (2001). Mining time-changing data streams. In Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '01) (pp. 97-106). New York: ACM.
- Bifet, A., & Gavaldà, R. (2009). Adaptive learning from evolving data streams. In N. Adams, C. Robardet, A. Siebes, & J.-F. Boulicaut (Eds.), Advances in Intelligent Data Analysis VIII, Lecture Notes in Computer Science (Vol. 5772, pp. 249-260). Berlin/Heidelberg: Springer.
- Mimran, O., & Even, A. (2014). Data stream mining with multiple sliding windows for continuous prediction. In Proceedings of the European Conference on Information Systems (ECIS). AISeL.
Thank You
Backup
MSW feature selection
Data Dictionary Selection (DDS) criterion:
DDS = \sum_{j=1}^{J} \sigma_j^2 N_j / N
- N: the total number of observations
- J: the number of profiles considered
- σ_j: the standard deviation of profile j
- N_j: the number of observations in profile j

Normalized DDS criterion:
V_0 = \sigma^2; \quad P_0 = 1; \quad V_i = \sum_{j=1}^{J} \sigma_j^2 N_j / N; \quad DDS_i = \frac{V_i - V_{i-1}}{P_i - P_{i-1}}
- P_i: the number of profiles generated in step i (i.e., distinct value combinations)
- i: the step number

Normalized DDS criterion with minimum support α (only profiles with N_j >= α contribute):
V_0 = \sigma^2; \quad P_0 = 1; \quad V_i = \sum_{j:\, N_j \ge \alpha} \sigma_j^2 N_j / N
DDS_i = \frac{V_i - V_{i-1}}{(P_i - P_{i-1}) \cdot \sum_{j:\, N_j \ge \alpha} N_j / N}
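A sketch of computing one step of the normalized criterion under the reconstruction above (profile grouping via a dict; illustrative, not the published implementation):

```python
import statistics
from collections import defaultdict

def dds_step(rows, variables, label, alpha=1):
    """One step of the normalized DDS criterion: group the rows into
    profiles (distinct value combinations of the selected variables) and
    return (V_i, P_i). Profiles with fewer than alpha observations are
    excluded, as in the minimum-support variant."""
    groups = defaultdict(list)
    for r in rows:
        groups[tuple(r[v] for v in variables)].append(r[label])
    n = len(rows)
    v_i = sum(statistics.pvariance(g) * len(g)
              for g in groups.values() if len(g) >= alpha) / n
    return v_i, len(groups)   # DDS_i = (V_i - V_{i-1}) / (P_i - P_{i-1})
```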
DSM algorithms: MSW
Multiple Sliding Windows (MSW) (Mimran & Even, 2014)
MSW strategy:
- Find a variable set that divides the data into a minimal set of profiles (clusters) with minimal variance (done once)
- Set a sliding window per profile
- Use any given prediction function within the windows
- Set the window size dynamically, using a change detector