A DoE/RSM-based Strategy for an Efficient Design Space Exploration targeted to CMPs


Gianluca Palermo, Cristina Silvano, Vittorio Zaccaria
Politecnico di Milano, Dipartimento di Elettronica e Informazione
{gpalermo, silvano,

Abstract. Application-specific MPSoCs are usually designed with a platform-based approach, where a wide range of customizable parameters must be tuned to find the best trade-offs in terms of the selected figures of merit (such as energy, delay and area). This optimization phase is called Design Space Exploration (DSE) and it generally consists of a Multi-Objective Optimization (MOO) problem with multiple constraints. In this paper, an efficient DSE methodology for application-specific MPSoCs is presented. The methodology is efficient because it determines a suitable set of candidate architectures with as few system simulations as possible, combining Design of Experiments (DoE) and Response Surface Modeling (RSM) strategies.¹

1 Introduction

Customizable MPSoCs supported by parallel programming represent an emerging computing paradigm for application-specific processors. They offer the best compromise in terms of a stable hardware platform that is software programmable and therefore customizable, upgradeable and extensible. In this sense, the MPSoC paradigm minimizes the risk of missing the time-to-market deadline while allowing for greater efficiency through architecture customization and software compilation techniques.

For these architectures, the platform-based design approach [1] is widely used to design application-specific architectures that meet time-to-market constraints. In this scenario, configurable simulation models are used to accurately tune the on-chip architectures and to meet the target application requirements in terms of performance, battery lifetime and area. The Design Space Exploration (DSE) phase is used to tune the configurable system parameters and it generally consists of a multi-objective optimization problem.
The DSE problem consists of exploring a large design space made up of several parameters at the system and micro-architectural levels. Although several heuristic techniques have been proposed to address this problem so far, they are all characterized by low efficiency in identifying the Pareto front of feasible solutions. Evolutionary and sensitivity-based algorithms are among the most notable state-of-the-art techniques [2-4].

2 An application-specific DSE methodology

In this paper, we present an application-specific design space exploration strategy leveraging Design of Experiments (DoE) and Response Surface Modeling (RSM) techniques. Once the objective functions associated with the system have been identified, the proposed methodology allows the efficient identification of an approximate Pareto set of candidate architectures by evaluating as few system configurations as possible. This is a notable achievement because, for complex SoCs, evaluating the objective function f(x) of a single system configuration x (be it either performance or power consumption) currently means hours or days of simulation under a realistic workload.

¹ This work was supported in part by the EC under grant MULTICUBE FP

DESIGN OF EXPERIMENTS. The term Design of Experiments (DoE) [5] identifies the planning of an information-gathering experimentation campaign in which a set of variable parameters can be tuned. In this paper, we define an experiment as an actual simulation of the target system. The reason for using DoEs is that the designer is very often interested in the effects that tuning some parameters has on the system response. Design of experiments is a discipline with very broad application across the natural and social sciences; it encompasses a set of techniques whose main goal is the screening and analysis of the system behavior with a small number of simulations. DoE plans differ in the layout of the selected design points in the design space. Although several designs of experiments have been proposed in the literature, we use here the most traditional ones, which we leverage in the construction of our efficient design space exploration methodology:

Random. In this case, design space configurations are picked randomly by following a Probability Density Function (PDF). In our methodology, we use a uniformly distributed PDF.

Full factorial. In statistics, a factorial experiment [5] is an experiment whose design consists of two or more parameters, each with discrete possible values or levels, and whose experimental units take on all possible combinations of these levels across all such parameters. Such an experiment allows studying the effect of each parameter on the response variable, as well as the effects of interactions between parameters on the response variable. In this paper, we consider a 2-level full factorial DoE, where the only levels considered are the minimum and maximum of each parameter.

Central composite design. A Central Composite Design [5] is an experimental design specifically targeted to the construction of second-order (quadratic) response surfaces without requiring a three-level factorial DoE.

Box-Behnken. The Box-Behnken design [5] is suitable for quadratic models; its parameter combinations lie at the centers of the edges of the process space, plus one design point with all the parameters at the center. Its primary advantage is that the parameter combinations avoid extreme values taken at the same time (in contrast with the central composite design).

RESPONSE SURFACE METHODS. Response Surface Modeling techniques determine an analytical dependence between several design parameters and one or more response variables. The working principle of RSM is to use a set of simulations generated by a DoE to obtain a response model. A typical RSM flow involves a training phase, in which known data (the training set) is used to identify the RSM configuration, and a prediction phase, in which the RSM is used to forecast the unknown system response. RSMs are an effective tool for analytically predicting the behavior of the system platform without resorting to a system simulation; they represent the core of the presented methodology. The RSM models used in the presented methodology are:
Linear regression. Linear regression is a regression method that models a linear relationship between a dependent response function $f$ and some independent variables $x_i$, $i = 1, \ldots, p$, plus a random term $\varepsilon$. In this work we apply regression by also taking into account the interactions between the parameters as well as the quadratic behavior with respect to a single parameter.

Shepard's interpolation. Shepard's technique is a well-known method for multivariate interpolation. It is also called the inverse distance weighting (IDW) method because the value of the response function at unknown points is the sum of the values of the response function at known points, weighted with the inverse of the distance.

Artificial Neural Networks. Artificial neural networks (ANNs) [6] represent a powerful and flexible method for generalized nonlinear regression. The ANN approximation function $f$ is defined, recursively, as a function of other, linearly combined functions $f_i$:

$$ f(x) = \Theta\left( \sum_i w_i f_i(x) \right) \qquad (1) $$

The function $\Theta$ is called the activation function, while the coefficients $w_i$ are called weights. The functions $f_i$ can be recursively defined as in Equation 1 in order to create a layered structure.

Radial Basis Functions. Radial basis functions (RBF) are a widely used interpolation/approximation model [7]. The interpolation function is built on a set of training configurations $x_j$ as follows:

$$ f(x) = \sum_{j=1}^{n} \lambda_j \, \phi(\lVert x - x_j \rVert) \qquad (2) $$

where $\phi$ is a scalar distance function, $\lambda_j$ are the weights of the RBF and $n$ is the number of samples in the training set.

THE PROPOSED DESIGN FLOW. The proposed strategy is called Response Surface-based Pareto Iterative Refinement (ReSPIR). It is based on the concept of iterative refinement of the approximate Pareto set by using the predictions given by an RSM model. The methodology is parametric in terms of the DoE and RSM technique, as well as the maximum number of simulations to be run (see Algorithm 1).
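As an illustration of two of the building blocks described above, the following is a minimal Python sketch of a 2-level full factorial plan and of a Shepard (inverse distance weighting) predictor. It is not the authors' implementation: the function names `full_factorial_2level` and `shepard_predict`, the `power` exponent and the normalization of the weights are assumptions made for the example, following the usual textbook form of IDW.

```python
import itertools
import math


def full_factorial_2level(bounds):
    """Return every corner of the design space.

    bounds maps a parameter name to its (min, max) pair; each parameter
    takes only these two levels, as in a 2-level full factorial DoE.
    """
    names = list(bounds)
    return [dict(zip(names, combo))
            for combo in itertools.product(*(bounds[n] for n in names))]


def shepard_predict(x, samples, power=2.0):
    """Inverse-distance-weighted estimate of the response at point x.

    samples is a list of (point, value) pairs obtained from simulations.
    The exponent `power` is an assumption; the text only says that known
    values are weighted with the inverse of the distance.
    """
    num = 0.0
    den = 0.0
    for point, value in samples:
        d = math.dist(x, point)
        if d == 0.0:
            return value  # x is a known point: return its measured value
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den


# Two parameters with two levels each give 2^2 = 4 configurations.
plan = full_factorial_2level({"processors": (2, 16), "issue_width": (1, 8)})

# A point halfway between two known samples gets the mean of their values.
known = [((0.0,), 0.0), ((2.0,), 4.0)]
estimate = shepard_predict((1.0,), known)
```

The plan grows as 2^p with the number of parameters p, which is why such a DoE is only used to obtain a coarse initial view of the space before a response surface takes over.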
Initially (step 2), the DoE plan is used to pick the set of initial configurations corresponding to the plan of simulations to be run. This step provides an initial coarse view of the target design space, by running the simulations to obtain the actual measurements $f$ associated with $F_0$. In the successive steps, $F_0$ represents the archive containing significant information about all the architectural configurations simulated so far. At the first iteration, provided that the maxnsim value is greater than the DoE size, the condition in step 5 is met and the while loop body is entered. The RSM technique (step 7) is thus trained with the current archive $F_0$. The response surface model generates a prediction archive $R_0$, which is then filtered for Pareto configurations in step 8. Next (step 9), the simulations associated with the Pareto set $R_1$ are run; the result is put into the intermediate archive $F_1$ and the coverage with respect to $F_0$ is computed. The algorithm iterates until either this coverage value reaches 0 or the number of simulations has reached the maximum maxnsim. In the case of reiteration, the freshly generated Pareto points in $F_1$ are merged with $F_0$ to improve the prediction accuracy of the RSM. Finally (step 12), $F_0$ is Pareto filtered by pruning all the non-feasible configurations; the resulting archive is the approximate solution to our Design Space Exploration problem.

Algorithm 1. The RSM-Supported Iterative Pareto Refinement Design Space Exploration Flow
Require: DOE, RSM, maxnsim
 1: nsim = 0
 2: Generate and run the simulations from DOE. Update nsim accordingly. Put results into F_0.
 3: cov = 100%
 4: F_1 = {}
 5: while (cov > 0) ∧ (nsim < maxnsim) do
 6:    F_0 = F_0 ∪ F_1
 7:    Train RSM with the content of F_0 and compute a prediction R_0, ∀x ∈ X
 8:    R_1 = Ψ(R_0)
 9:    Generate and run the simulations associated with the configurations in R_1. Update nsim accordingly. Put results into F_1.
10:    cov = χ(F_1, F_0)
11: end while
12: return Ψ(F_0) by pruning non-feasible configurations.

3 Validation of ReSPIR

To validate the presented ReSPIR methodology, we applied it to the customization of a symmetric shared-memory multiprocessor architecture for the execution of a set of standard benchmarks derived from the SPLASH-2 suite. We focused our analysis on the architectural parameters listed in Table 1, which constitute a design space consisting of $|X| = 2^{17}$ alternative configurations. To evaluate the system metrics (execution time and energy consumption), we leveraged the Sesc [8] simulation tool, a fast simulator for chip-multiprocessor architectures that is able to provide energy and performance results for a given application. Within Sesc, the energy consumption computation is supported by the CACTI [9] and WATTCH [10] models.
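The loop of Algorithm 1 can be sketched in Python as follows. This is a hedged illustration under several assumptions rather than the authors' implementation: the response surface is a stand-in inverse-distance-weighted predictor (`idw_model`), the coverage function χ is not defined in the text and is assumed here to be the share of newly simulated points that are not dominated by the current archive, feasibility constraints are omitted, and `toy_simulate` is a hypothetical two-objective function that replaces a real simulator. Unlike step 12 as written, the sketch merges the last batch F_1 into F_0 before the final filtering so that no simulated point is discarded.

```python
import math


def dominates(a, b):
    """True if a is no worse than b in every objective and better in one (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_filter(archive):
    """Psi: keep the configurations whose objective vectors are not dominated."""
    return {c: f for c, f in archive.items()
            if not any(dominates(g, f) for g in archive.values())}


def coverage(new, old):
    """chi (assumed definition): share of new points not dominated by, or equal to, old ones."""
    if not new:
        return 0.0
    useful = sum(1 for f in new.values()
                 if not any(dominates(g, f) or g == f for g in old.values()))
    return useful / len(new)


def idw_model(archive, power=2.0):
    """Stand-in RSM: inverse-distance-weighted prediction of every objective."""
    def predict(x):
        if x in archive:
            return archive[x]
        weighted = [(1.0 / math.dist(x, c) ** power, f) for c, f in archive.items()]
        total = sum(w for w, _ in weighted)
        n_obj = len(weighted[0][1])
        return tuple(sum(w * f[k] for w, f in weighted) / total for k in range(n_obj))
    return predict


def respir(space, simulate, doe, max_nsim):
    f0 = {x: simulate(x) for x in doe}               # step 2: run the DoE plan
    nsim = len(f0)
    cov, f1 = 1.0, {}                                # steps 3-4
    while cov > 0 and nsim < max_nsim:               # step 5
        f0.update(f1)                                # step 6: F_0 = F_0 U F_1
        predict = idw_model(f0)                      # step 7: train, predict all of X
        r0 = {x: predict(x) for x in space}
        r1 = pareto_filter(r0)                       # step 8: predicted Pareto set
        todo = [x for x in r1 if x not in f0][:max_nsim - nsim]
        f1 = {x: simulate(x) for x in todo}          # step 9: simulate new points only
        nsim += len(f1)
        cov = coverage(f1, f0)                       # step 10
    f0.update(f1)                                    # deviation: keep the last batch
    return pareto_filter(f0), nsim                   # step 12 (no feasibility pruning)


def toy_simulate(x):
    """Hypothetical (delay, energy) pair; larger parameters are faster but costlier."""
    a, b = x
    return (10.0 / (1 + a) + 0.5 * (7 - b), 1.0 + 1.5 * a + 0.5 * b)


space = [(a, b) for a in range(8) for b in range(8)]
corners = [(0, 0), (0, 7), (7, 0), (7, 7)]           # 2-level full factorial DoE
front, nsim = respir(space, toy_simulate, corners, max_nsim=20)
```

Skipping configurations that are already in the archive keeps the simulation count honest and guarantees termination: an iteration either spends budget on new points or produces an empty F_1, which drives the assumed coverage to 0.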
To give a fair comparison of ReSPIR with other state-of-the-art heuristics, we introduce a Multi-Objective Simulated Annealing (MOSA) derived from [11] and a Multi-Objective Genetic Algorithm (NSGA-II) derived from [12]. Each of these heuristics is parametrized in terms of variables such as the initial population, the number of iteration steps, the set of permutation probabilities and other algorithm-specific parameters. Generally, calibrating these parameters is a very difficult task that depends strongly on the problem domain; moreover, since the heuristics are inherently random², each combination of heuristic variables should be evaluated more than once in order to infer a more general trend. The number of runs for each algorithm is set such that the actual performance of the algorithm (in terms of ADRS, Average Distance from Reference Set) reaches an average asymptotic value. Practically speaking, this resulted in more than a hundred evaluations for each heuristic.

² This is true also for ReSPIR whenever we use a random DoE or a neural network where the initial weights are chosen randomly.

Table 1. Design space for the shared-memory multiprocessor platform

Parameter                      Min.    Max.
# Processors                   2       16
Processor issue width          1       8
L1 instruction cache size      2K      16K
L1 data cache size             2K      16K
L2 private cache size          32K     256K
L1 instruction cache assoc.    1w      8w
L1 data cache assoc.           1w      8w
L2 private cache assoc.        1w      8w
I/D/L2 block size

We underline that we are focused on obtaining a good approximation of the exact Pareto set (ADRS 1%) by executing as few simulations as possible, i.e., by simulating less than 3.5% of the entire design space. As a consequence, each strategy has been run with an upper bound on the number of simulations equal to 3.5% of the entire design space; the resulting Pareto front has been validated against the reference, exact Pareto front of the target architecture³. Concerning ReSPIR, we focus on the overall performance by presenting a collapsed view of all the combinations of DoE and RSM, without breaking out the actual algorithm performance for each DoE and RSM. This is because what we want to demonstrate is the quality of the presented ReSPIR exploration strategy and not the quality of a particular DoE or RSM.

Figures 1(a), 1(b) and 1(c) show the average ADRS of the approximated Pareto fronts with respect to the exact Pareto front, by varying the size of the design space analyzed. The figures also show the estimated ADRS standard deviation. We can note that the MOSA algorithm is the worst heuristic in terms of average ADRS, starting from 18% for 1% of the design space and decreasing to 10% for 2.5% of the design space. NSGA-II and ReSPIR reach, respectively, 5% and 2.5% for the same percentage of the design space.
From the point of view of the standard deviation, too, the MOSA algorithm lags behind NSGA-II, reaching 4% at the upper bound of the design space, where NSGA-II and ReSPIR obtain around 0.5%.

³ The reference Pareto front has been computed with a full-search algorithm, thus it is the exact Pareto front.

4 Conclusions

In this paper, we presented ReSPIR, a design space exploration methodology that leverages the traditional DoE paradigm and RSM techniques combined with a powerful way of considering customized application constraints. The design of experiments phase generates an initial plan of experiments, which are used to create a coarse view of the target design space; then a set of response surface extraction techniques is used to identify non-feasible configurations and refine the Pareto configurations. This process is repeated iteratively until a target criterion, e.g. the number of simulations, is satisfied.
[Figure 1: three panels, (a) MOSA, (b) NSGA-II and (c) ReSPIR, each plotting mean(adrs) and std(adrs) on the ADRS axis against the percentage of design space analyzed.]

Fig. 1. Average ADRS (with standard deviation) and percentage of the design space analyzed by (a) MOSA, (b) NSGA-II and (c) ReSPIR

References

1. K. Keutzer, S. Malik, A. R. Newton, J. Rabaey, and A. Sangiovanni-Vincentelli. System level design: Orthogonalization of concerns and platform-based design. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 19(12), December.
2. Gianluca Palermo, Cristina Silvano, and Vittorio Zaccaria. Multi-objective design space exploration of embedded system. Journal of Embedded Computing, 1(3).
3. Giuseppe Ascia, Vincenzo Catania, Alessandro G. Di Nuovo, Maurizio Palesi, and Davide Patti. Efficient design space exploration for application specific systems-on-a-chip. Journal of Systems Architecture, 53(10).
4. Giovanni Beltrame, Dario Bruschi, Donatella Sciuto, and Cristina Silvano. Decision-theoretic exploration of multiprocessor platforms. In Proceedings of CODES+ISSS: International Conference on Hardware-Software Codesign and System Synthesis.
5. T. J. Santner, B. Williams, and W. Notz. The Design and Analysis of Computer Experiments. Springer-Verlag.
6. C. Bishop. Neural Networks for Pattern Recognition. Oxford University Press.
7. M. J. D. Powell. The theory of radial basis functions. In W. Light (ed.), Advances in Numerical Analysis II: Wavelets, Subdivision, and Radial Basis Functions. University Press.
8. Jose Renau, Basilio Fraguela, James Tuck, Wei Liu, Milos Prvulovic, Luis Ceze, Smruti Sarangi, Paul Sack, Karin Strauss, and Pablo Montesinos. SESC simulator, January.
9. S. Wilton and N. Jouppi. CACTI: An Enhanced Cache Access and Cycle Time Model. Volume 31.
10. David Brooks, Vivek Tiwari, and Margaret Martonosi. Wattch: a framework for architectural-level power analysis and optimizations. In Proceedings ISCA 2000: International Symposium on Computer Architecture, pages 83-94.
11. A. Jaszkiewicz and P. Czyżak. Pareto simulated annealing - a metaheuristic technique for multiple-objective combinatorial optimisation. Journal of Multi-Criteria Decision Analysis, (7):34-47, April.
12. K. Deb, S. Agrawal, A. Pratap, and T. Meyarivan. A Fast and Elitist Multi-Objective Genetic Algorithm: NSGA-II. Proceedings of the Parallel Problem Solving from Nature VI Conference, 2000.