Intelligent Heuristic Construction with Active Learning
William F. Ogilvie, Pavlos Petoumenos, Zheng Wang, Hugh Leather
The University of Edinburgh
Space is BIG! (Image: Hubble Ultra-Deep Field.) Only a tiny region of space is shown; despite this, many galaxies, each with billions of stars. Relevance to heuristics?
Optimisation spaces are MUCH BIGGER!!! 10^82: atoms in the Universe. 10^400: combinations of GCC optimisations. We can't pick from 10^400 options, so we use rough heuristics instead. These are traditionally hard-coded and can take a year to perfect. As if that wasn't bad enough...
...the problem is even worse than that! Heuristics are inherently tied to the underlying hardware, so each architectural change requires them to be re-tuned. Most compilers support many different platforms. It is very difficult to keep up, and getting harder: we already have out-of-date compilers.
Machine Learning to the rescue? Leverage machine learning techniques to create heuristics. Well suited to the problem: lots of interesting research, and the results can be better than humans'. But it's also incredibly slow to learn. We demonstrate how it's possible to accelerate training, creating a heuristic which maps workload to processor.
Quick Detour: Machine Learning 101. Classification involves forming a correlation between the features of an object and its label. (Diagram: examples → Machine Learning Algorithm → Model; feature values → best heuristic value.)
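As a hedged illustration of the pipeline on this slide, here is a minimal sketch: a toy nearest-centroid classifier stands in for the machine learning algorithm, and all names and data are illustrative, not from the paper.

```python
# Toy classification: learn a mapping from feature vectors to labels.
# A nearest-centroid model stands in for the "Machine Learning Algorithm".

def train(examples):
    """examples: list of (features, label). Returns per-label centroids."""
    sums, counts = {}, {}
    for x, y in examples:
        s = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            s[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in s] for y, s in sums.items()}

def predict(model, x):
    """Label of the nearest centroid (squared Euclidean distance)."""
    return min(model, key=lambda y: sum((a - b) ** 2 for a, b in zip(model[y], x)))

# Two clusters of feature values labelled with the best device.
model = train([((1, 1), "CPU"), ((2, 1), "CPU"), ((8, 9), "GPU"), ((9, 8), "GPU")])
print(predict(model, (1.5, 1.0)))  # → CPU
print(predict(model, (9.0, 9.0)))  # → GPU
```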
Training a Heuristic: thousands of examples (plotted by input value 1 and input value 2) are fed to a Machine Learning Algorithm, which produces a mathematical model separating CPU-best from GPU-best points.
Using a Heuristic: the features of an unseen workload are fed to the mathematical model, which predicts the best processor (CPU or GPU).
So what's wrong with this? The traditional approach, almost universally adopted. (Plot: examples scattered over feature 1 and feature 2.)
Well, we actually only needed these! (Plot: a small subset of the examples.)
So this was a complete waste of time! Random sampling inevitably leads to redundancy. (Plot: the remaining, redundant examples.)
How much time was wasted? The correctness of the labels is tied to heuristic quality; i.e., consistently wrong labels lead to a wrong model. Sound data is essential, but very expensive. E.g., are inputs X, Y, Z faster on CPU or GPU? 1. Run the program on the CPU using X, Y, Z. 2. Run the program on the GPU using X, Y, Z. 3. GOTO 1 until a statistical difference is observed.
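The labelling loop above can be sketched as follows. This is an illustrative assumption, not the paper's exact procedure: the confidence-interval test, thresholds, and timing callables are all made up.

```python
import random
import statistics

def label_example(run_cpu, run_gpu, min_runs=5, max_runs=50, z=1.96):
    """Profile both devices until their ~95% confidence intervals separate.
    run_cpu/run_gpu are callables returning one timed execution (seconds)."""
    cpu, gpu = [], []
    for _ in range(max_runs):
        cpu.append(run_cpu())
        gpu.append(run_gpu())
        if len(cpu) < min_runs:
            continue  # need a few samples before stdev means anything

        def ci(xs):
            m = statistics.mean(xs)
            h = z * statistics.stdev(xs) / len(xs) ** 0.5
            return m - h, m + h

        (clo, chi), (glo, ghi) = ci(cpu), ci(gpu)
        if chi < glo:
            return "CPU"  # CPU's interval lies entirely below GPU's
        if ghi < clo:
            return "GPU"
    # Budget exhausted: fall back to comparing the means.
    return "CPU" if statistics.mean(cpu) < statistics.mean(gpu) else "GPU"

# Simulated timings: the GPU is clearly faster for this workload.
random.seed(0)
print(label_example(lambda: random.gauss(2.0, 0.1),
                    lambda: random.gauss(1.0, 0.1)))  # → GPU
```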
Compile-time Heuristics are Even Slower. Labelling one single example requires iterative compilation: compile the code using different optimisation values; profile repeatedly to make a statistically sound determination; only then associate the best optimisation with the code features. (Diagram: .c → competing .exe variants → best optimisation wins.)
What do we do about it? We cannot know where the informative examples lie, but we can let the algorithm make an educated guess. You and I do not learn in a random, unstructured way; we build up our knowledge gradually and iteratively. Perhaps we should let the algorithm do the same?
Active Supervised Learning. Passive (random): thousands of random examples → Machine Learning Algorithm → final model. Active (iterative): a few random examples → ML Algorithm → intermediate model → completion reached? If no, carefully select an example and repeat; if yes, keep the final model.
How do we know when it's complete? Many criteria, including: time elapsed, loop iterations, cross-validation.
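A minimal sketch of such a completion test, combining the criteria listed above; every threshold and parameter name here is illustrative.

```python
import time

def should_stop(start_time, iteration, cv_accuracy,
                budget_s=3600.0, max_iters=200, target_acc=0.95):
    """Completion test combining the criteria on the slide:
    wall-clock budget, iteration cap, and cross-validation accuracy."""
    return (time.time() - start_time >= budget_s
            or iteration >= max_iters
            or cv_accuracy >= target_acc)

t0 = time.time()
print(should_stop(t0, 10, 0.80))   # → False (no criterion met yet)
print(should_stop(t0, 200, 0.80))  # → True  (iteration cap reached)
print(should_stop(t0, 10, 0.97))   # → True  (accuracy target met)
```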
What about selecting examples? Many algorithms are available; we used Query by Committee. Easier to show than to tell.
We start with a few random examples. We form multiple intermediate models, each with a distinct algorithm: a committee of different models.
Here the committee disagrees, but we use this to our advantage: disagreement regions hold the greatest potential to improve the collective knowledge, so we learn from there! So what example do we learn from next? We ask each model to predict the label of random unseen examples drawn from the feature space.
Broadly, the committee will agree... but we're interested in disagreement! Disagreement inevitably occurs around class boundaries. We select one of these examples to label properly, then rebuild the intermediate models. Notice the region of disagreement has shrunk; eventually the distinct models will converge.
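The whole walkthrough can be sketched as a small Query-by-Committee loop. This is a pure-Python illustration only: the committee here is three k-NN models with different k, not the 12 distinct algorithms used in the paper, and the oracle, data, and round counts are all made up.

```python
import random

def knn_predict(train, x, k):
    """Majority label among the k nearest training points (2-D features)."""
    near = sorted(train, key=lambda p: (p[0][0] - x[0])**2 + (p[0][1] - x[1])**2)[:k]
    votes = [y for _, y in near]
    return max(set(votes), key=votes.count)

def disagreement(labeled, x, ks=(1, 3, 5)):
    """Vote spread of a committee of k-NN models with different k."""
    votes = [knn_predict(labeled, x, k) for k in ks]
    return len(set(votes))

def qbc_active_learn(oracle, candidates, seed_points, rounds=20):
    """Query-by-Committee: repeatedly label the candidate the committee
    disagrees on most, then retrain (k-NN "retrains" implicitly)."""
    labeled = [(x, oracle(x)) for x in seed_points]
    pool = [x for x in candidates if x not in seed_points]
    for _ in range(rounds):
        if not pool:
            break
        x = max(pool, key=lambda c: disagreement(labeled, c))
        pool.remove(x)
        labeled.append((x, oracle(x)))  # the expensive profiling happens here
    return labeled

# Hypothetical ground truth: the GPU wins when the feature sum exceeds 10.
oracle = lambda x: "GPU" if x[0] + x[1] > 10 else "CPU"
random.seed(1)
cands = [(random.uniform(0, 10), random.uniform(0, 10)) for _ in range(100)]
model = qbc_active_learn(oracle, cands, seed_points=[(0.0, 0.0), (10.0, 10.0)])
print(knn_predict(model, (9.0, 9.0), 1))  # → GPU
```

Queries concentrate where the committee's votes split, i.e. near the class boundary, so far fewer labels are spent on redundant regions than with random sampling.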
Experimental Setup. We demonstrate the technique by creating an important heuristic: map each workload to the fastest device, CPU or GPU. This is a much-studied problem, and choosing poorly can drastically degrade performance. Specifically: given inputs for Rodinia HotSpot, PathFinder, SRAD, and Matrix Multiplication, is it faster to use OpenMP (CPU) or OpenCL (GPU)? We compared the number of training examples required to obtain a high-accuracy heuristic using passive versus active learning.
A few gory details (most in the paper). We measured the accuracy of the randomly-trained vs. QBC-trained classifier using 500 test examples. Intel Core i7 7770 @ 3.4GHz (8 HW threads); NVIDIA GeForce GTX Titan (6GB). 12 distinct committee members; 1 random example to begin; 10,000 candidate examples; 200 loop iterations.
Random Training Examples. (Plot: CPU and GPU sample points over two program input parameters.)
QBC Chosen Training Examples. (Plot: CPU and GPU sample points over two program input parameters.) Same accuracy, but quicker.
Lights, Camera, Action... (Animation: region of disagreement over time; shape of model over time.) Shows the ib1 algorithm refining a HotSpot model over time, using training examples chosen by a committee.
It works: 3x faster on average!
Summary. We desperately need a fast, reliable method to generate heuristics. Current implementations rely on learning randomly, and randomness is problematic because of labelling costs. We show active learning is much more efficient: 3x faster at creating heuristics that map program inputs to the best processor in a heterogeneous system.