A Practical Method for Estimating Performance Degradation on Multicore Processors, and its Application to HPC Workloads
Tyler Dwyer, Alexandra Fedorova, Sergey Blagodurov, Mark Roth, Fabien Gaud, Jian Pei (2012)
Presented by Sameer Wadgaonkar, Department of Computer & Information Sciences, University of Delaware
Motivation
Performance degradation: when multiple programs run on a modern multicore processor, they compete for shared resources. Performance degradation measures how much slower each program runs compared to running alone on the same system. Degradation as high as 200% has been measured, costing both time and power.
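The definition above can be sketched directly. This is a minimal illustration with made-up timings, not measurements from the paper:

```python
# Illustrative sketch: per-program performance degradation from wall-clock
# times. The 10 s / 30 s figures below are invented, not paper data.

def degradation_pct(solo_time, shared_time):
    """Percent slowdown of a program when co-scheduled vs. running alone."""
    return (shared_time - solo_time) / solo_time * 100.0

# A program that takes 10 s alone but 30 s when sharing the processor
# has degraded by 200%.
print(degradation_pct(10.0, 30.0))  # 200.0
```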
Introduction
In this paper the authors propose:
1. A methodology for modeling performance degradation on multicore systems using machine learning.
2. An evaluation of the strengths and limitations of the resulting model.
3. A confidence predictor that signals when the model is unable to produce an accurate estimate.
4. A demonstration of how the model can be applied to improve performance fidelity and save energy in an HPC setting.
Model
Testing platforms: two systems, one Intel and one AMD, were used for building and testing the model. The models were built using exactly the same procedure on both systems. The system parameters were as follows.
Model
Decision-tree learning was used to develop the model. The nodes of the decision tree are attributes with their individual threshold values. Performance degradation was calculated as the relative slowdown of co-scheduled execution over solo execution, i.e. degradation (%) = (T_shared − T_solo) / T_solo × 100, and this was computed for every instance in the dataset. This procedure yielded 340 attributes per core, i.e. 4 × 340 = 1360 attributes from the hardware event counters on the Intel system. Weka, a machine-learning toolkit, was used for attribute selection; its correlation-based feature subset selection (CfsSubsetEval) reduced the set from 340 attributes per core to 19 per core on the Intel system.
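The idea behind correlation-based feature selection can be sketched as follows. This is a simplified stand-in for Weka's CfsSubsetEval, not its actual algorithm: rank attributes by correlation with the degradation target, then greedily skip attributes that are nearly collinear with one already chosen. All names and thresholds here are illustrative assumptions:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def select_attributes(columns, target, keep=2, redundancy=0.95):
    """Greedy CFS-style pick: prefer attributes correlated with the target,
    skip ones nearly collinear with an already-chosen attribute."""
    ranked = sorted(columns, key=lambda name: -abs(pearson(columns[name], target)))
    chosen = []
    for name in ranked:
        if len(chosen) == keep:
            break
        if all(abs(pearson(columns[name], columns[c])) < redundancy for c in chosen):
            chosen.append(name)
    return chosen
```

For example, if two event counters track each other almost perfectly, only one of them survives selection, which is how 340 attributes per core can shrink to 19.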
Model
List of attributes selected by attribute selection for the Intel system.
Model
The authors tried all modeling procedures available in Weka and compared them. REPTree was chosen because it yielded the highest accuracy. Regression-tree mode was used instead of classification-tree mode, and bagging was applied to lower the error rate further.
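Bagging over regression trees can be sketched in a few lines. This is a toy illustration, not REPTree: the base learner here is a one-split regression stump on a single feature, and all data is synthetic. The bagging step itself (bootstrap resample, train, average) is the standard technique the slide refers to:

```python
import random

def train_stump(data):
    """Fit a one-split regression stump (a heavily simplified stand-in for
    Weka's REPTree) on (x, y) pairs by minimizing squared error."""
    best = None
    for t in sorted(set(x for x, _ in data)):
        left = [y for x, y in data if x <= t]
        right = [y for x, y in data if x > t]
        if not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    if best is None:  # degenerate sample: all x values identical
        m = sum(y for _, y in data) / len(data)
        return lambda x: m
    _, t, lm, rm = best
    return lambda x, t=t, lm=lm, rm=rm: lm if x <= t else rm

def bagged_predict(data, x, n_trees=25, seed=0):
    """Bagging: train each stump on a bootstrap resample of the data,
    then average the individual predictions."""
    rng = random.Random(seed)
    preds = []
    for _ in range(n_trees):
        sample = [rng.choice(data) for _ in data]  # bootstrap resample
        preds.append(train_stump(sample)(x))
    return sum(preds) / len(preds)
```

Averaging many trees trained on resampled data smooths out the variance of any single tree, which is why bagging lowered the error rate here.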
Model
Root of the decision tree for the Intel system. The number under each attribute is the value used for branching.
Results
Difference between the actual and predicted degradation for the best, median, and worst predicted co-schedules for each primary benchmark. The rightmost chart shows the co-schedules when the confidence predictor is applied.
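One simple way a confidence predictor can work is shown below. This is an assumption-laden sketch, not the paper's actual predictor: it flags low confidence when a new attribute vector falls outside the range of values seen during training, on the intuition that the model extrapolates poorly there:

```python
# Hedged sketch (not the paper's mechanism): flag low confidence when an
# input lies outside the attribute ranges observed in training.

def fit_ranges(training_rows):
    """Record per-attribute (min, max) over the training set."""
    cols = list(zip(*training_rows))
    return [(min(c), max(c)) for c in cols]

def confident(ranges, row, slack=0.1):
    """True if every attribute is within its training range, widened by
    `slack` (fraction of the range) on each side."""
    for (lo, hi), v in zip(ranges, row):
        pad = (hi - lo) * slack
        if not (lo - pad <= v <= hi + pad):
            return False
    return True
```

Filtering out low-confidence estimates is what shrinks the worst-case error in the rightmost chart: predictions the model is unlikely to get right are simply not used.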
Results
Baseline cluster scheduling policies: best-fit and min-collocation. Best-fit allocates a job's processes onto all available cores of a node, using additional nodes if needed; if a single job does not fill a node's cores, the remaining cores are filled with processes from another job. Min-collocation schedules no more than one job per node, as long as unused nodes are available. The balanced scheduler is based on the model described above.
Job allocation across nodes
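The two baseline policies can be sketched as simple placement functions. Node counts, core counts, and job sizes below are illustrative assumptions; the real schedulers operate on live cluster state. Both sketches assume the cluster has enough cores for all processes:

```python
# Toy sketches of the two baseline policies. Assumes total processes fit
# within num_nodes * cores_per_node.

def best_fit(jobs, num_nodes, cores_per_node):
    """Pack processes densely: fill each node's cores in order, mixing jobs
    on a node when one job does not fill it."""
    nodes = [[] for _ in range(num_nodes)]
    node = 0
    for job, nprocs in jobs:
        for _ in range(nprocs):
            while len(nodes[node]) == cores_per_node:
                node += 1
            nodes[node].append(job)
    return nodes

def min_collocation(jobs, num_nodes, cores_per_node):
    """Spread jobs out: prefer the least-loaded (ideally empty) node, so no
    two jobs share a node while unused nodes remain."""
    nodes = [[] for _ in range(num_nodes)]
    for job, nprocs in jobs:
        remaining = nprocs
        while remaining:
            target = min(range(num_nodes), key=lambda i: len(nodes[i]))
            take = min(cores_per_node - len(nodes[target]), remaining)
            nodes[target].extend([job] * take)
            remaining -= take
    return nodes
```

With two 3-process jobs on 4-core nodes, best-fit puts a process of job B on job A's node, while min-collocation gives each job its own node, which is exactly the contention trade-off the balanced scheduler tries to navigate.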
Results
Performance and energy consumption. Experiment 1: improved performance fidelity.
Results
Performance and energy consumption. Experiment 2: improved power efficiency.
Conclusion
The study investigated the effectiveness of machine learning in modeling contention-induced performance degradation. The proposed model can run on a live workload without prior knowledge of the applications and without running them in isolation. The model estimates degradation to within 16% of its true value, and the confidence predictor successfully flags when the model is likely to produce an inaccurate estimate, reducing the maximum error.
Questions?