A DoE/RSM-based Strategy for an Efficient Design Space Exploration targeted to CMPs




Gianluca Palermo, Cristina Silvano, Vittorio Zaccaria
Politecnico di Milano - Dipartimento di Elettronica e Informazione
E-mail: {gpalermo, silvano, zaccaria}@elet.polimi.it

Abstract. Application-specific MPSoCs are usually designed with a platform-based approach, in which a wide range of customizable parameters must be tuned to find the best trade-offs among the selected figures of merit (such as energy, delay and area). This optimization phase is called Design Space Exploration (DSE) and it generally consists of a Multi-Objective Optimization (MOO) problem with multiple constraints. In this paper, an efficient DSE methodology for application-specific MPSoCs is presented. The methodology is efficient because it determines a suitable set of candidate architectures with as few system simulations as possible, combining Design of Experiments (DoE) and Response Surface Modeling (RSM) strategies.¹

1 Introduction

Customizable MPSoCs supported by parallel programming represent an emerging computing paradigm for application-specific processors. They offer the best compromise in terms of a stable hardware platform that is software programmable, and thus customizable, upgradeable and extensible. In this sense, the MPSoC paradigm minimizes the risk of missing the time-to-market deadline while allowing for greater efficiency thanks to architecture customization and software compilation techniques. For these architectures, the platform-based design approach [1] is widely used to design application-specific architectures that meet time-to-market constraints. In this scenario, configurable simulation models are used to accurately tune the on-chip architectures and to meet the target application requirements in terms of performance, battery lifetime and area.
The Design Space Exploration (DSE) phase tunes the configurable system parameters and generally consists of a multi-objective optimization problem. It requires exploring a large design space spanning several parameters at the system and micro-architectural levels. Although several heuristic techniques have been proposed to address this problem, they are all characterized by low efficiency in identifying the Pareto front of feasible solutions. Evolutionary and sensitivity-based algorithms are among the most notable state-of-the-art techniques [2-4].

2 An application specific DSE methodology

In this paper, we present an application-specific design space exploration strategy leveraging Design of Experiments (DoE) and Response Surface Modeling (RSM) techniques. Once the objective functions associated with the system have been identified, the proposed methodology allows the efficient identification of an approximate Pareto set of candidate architectures by evaluating as few system configurations as possible. This is a notable achievement because, nowadays, evaluating the objective function f(x) of a single system configuration x (whether performance or power consumption) takes hours or days of simulation under a realistic workload for complex SoCs.

¹ This work was supported in part by the EC under grant MULTICUBE FP7-216693.

DESIGN OF EXPERIMENTS. The term Design of Experiments (DoE) [5] identifies the planning of an information-gathering experimentation campaign in which a set of variable parameters can be tuned. In this paper, we define an experiment as an actual simulation of the target system. DoEs are motivated by the fact that the designer is very often interested in the effects of tuning some parameter on the system response. Design of experiments is a discipline with very broad application across the natural and social sciences; it encompasses a set of techniques whose main goal is the screening and analysis of the system behavior with a small number of simulations. DoE plans differ in the layout of the selected design points in the design space. Although several designs of experiments have been proposed in the literature, we use here the most traditional ones, which we leverage in the construction of our efficient design space exploration methodology:

Random. Design space configurations are picked randomly according to a Probability Density Function (PDF). In our methodology, we use a uniformly distributed PDF.

Full factorial. In statistics, a factorial experiment [5] is an experiment whose design consists of two or more parameters, each with discrete possible values or levels, and whose experimental units take on all possible combinations of these levels across all such parameters.
Such an experiment allows studying the effect of each parameter on the response variable, as well as the effects of interactions between parameters. In this paper, we consider a 2-level full factorial DoE, where the only levels considered are the minimum and maximum of each parameter.

Central composite design. A Central Composite Design [5] is an experimental design specifically targeted at the construction of second-order (quadratic) response surfaces without requiring a three-level factorial DoE.

Box-Behnken. The Box-Behnken design [5] is suitable for quadratic models; parameter combinations are placed at the centers of the edges of the process space, plus one design point with all the parameters at the center. Its primary advantage is that the parameter combinations avoid extreme values taken at the same time (in contrast with the central composite design).

RESPONSE SURFACE METHODS. Response Surface Modeling techniques determine an analytical dependence between several design parameters and one or more response variables. The working principle of RSM is to use a set of simulations generated by a DoE in order to obtain a response model. A typical RSM flow involves a training phase, in which known data (the training set) is used to identify the RSM configuration, and a prediction phase, in which the RSM is used to forecast the unknown system response. RSMs are an effective tool for analytically predicting the behavior of the system platform without resorting to a system simulation; they represent the core of the presented methodology. The RSM models used in the presented methodology are:

Linear regression. Linear regression models a linear relationship between a dependent response function $f$ and some independent variables $x_i$, $i = 1, \ldots, p$, plus a random term $\varepsilon$. In this work we apply regression by also taking into account the interactions between parameters as well as quadratic behavior with respect to a single parameter.

Shepard's interpolation. Shepard's technique is a well-known method for multivariate interpolation. It is also called the inverse distance weighting (IDW) method because the value of the response function at an unknown point is the sum of the values of the response function at known points, weighted by the inverse of the distance.

Artificial Neural Networks. Artificial neural networks (ANNs) [6] represent a powerful and flexible method for generalized non-linear regression. The ANN approximation function $f$ is defined, recursively, as a function of other, linearly combined functions $f_i$:

$$f(x) = \Theta\Big(\sum_i w_i f_i(x)\Big) \qquad (1)$$

The function $\Theta$ is called the activation function, while the coefficients $w_i$ are called weights. The functions $f_i$ can be recursively defined as in Equation 1 in order to create a layered structure.

Radial Basis Functions. Radial basis functions (RBF) are a widely used interpolation/approximation model [7]. The interpolation function is built on a set of training configurations $x_j$ as follows:

$$f(x) = \sum_{j=1}^{n} \lambda_j \, \phi(\lVert x - x_j \rVert) \qquad (2)$$

where $\phi$ is a scalar distance function, $\lambda_j$ are the weights of the RBF and $n$ is the number of samples in the training set.

THE PROPOSED DESIGN FLOW. The proposed strategy is called Response Surface-based Pareto Iterative Refinement (ReSPIR). It is based on the iterative refinement of the approximate Pareto set using the predictions given by the RSM. The methodology is parametric in terms of the DoE and RSM techniques, as well as the maximum number of simulations to be run (see Algorithm 1).
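To make the ReSPIR strategy concrete, the following is a minimal Python sketch of the refinement loop, using a uniformly random DoE and Shepard's interpolation as the RSM. It is an illustration under stated assumptions, not the authors' implementation: the function names (`respir`, `idw_predict`, `pareto`, `toy_simulate`), the toy two-parameter design space and its analytic "simulator" are hypothetical, all objectives are assumed to be minimized, and the coverage measure is assumed to be the fraction of newly simulated points that are not dominated by the current front, since the text does not define it.

```python
import itertools
import random

def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto(archive):
    """Pareto filter: keep the non-dominated {configuration: objectives} entries."""
    return {c: f for c, f in archive.items()
            if not any(dominates(g, f) for g in archive.values())}

def idw_predict(x, archive, power=2.0):
    """Shepard (inverse distance weighting) prediction of the objectives at x."""
    if x in archive:                      # known point: return the measured value
        return archive[x]
    total, sums = 0.0, None
    for c, f in archive.items():
        w = 1.0 / sum((a - b) ** 2 for a, b in zip(x, c)) ** (power / 2.0)
        total += w
        sums = [w * v for v in f] if sums is None else [s + w * v for s, v in zip(sums, f)]
    return tuple(s / total for s in sums)

def respir(space, simulate, doe_size, max_nsim, seed=0):
    """Iterative Pareto refinement; returns (approximate Pareto archive, nsim)."""
    rng = random.Random(seed)
    f0 = {x: simulate(x) for x in rng.sample(space, doe_size)}  # random DoE
    nsim, cov, f1 = len(f0), 1.0, {}
    while cov > 0 and nsim < max_nsim:
        f0.update(f1)                                  # merge fresh results
        r0 = {x: idw_predict(x, f0) for x in space}    # train + predict on all of X
        r1 = pareto(r0)                                # predicted Pareto set
        # Simplification: skip configurations already simulated, respect the budget.
        todo = [x for x in r1 if x not in f0][: max_nsim - nsim]
        f1 = {x: simulate(x) for x in todo}
        nsim += len(f1)
        # Assumed coverage: share of new points not dominated by the current front.
        front = list(pareto(f0).values())
        fresh = [f for f in f1.values() if not any(dominates(g, f) for g in front)]
        cov = len(fresh) / len(f1) if f1 else 0.0
    f0.update(f1)  # simplification: keep the last batch before the final filter
    return pareto(f0), nsim

# Hypothetical toy problem: (cores, cache) -> (delay, energy), both minimized.
SPACE = list(itertools.product([1, 2, 4, 8, 16], [1, 2, 4, 8]))

def toy_simulate(x):
    cores, cache = x
    return (100.0 / cores + 20.0 / cache, 3.0 * cores + 2.0 * cache)
```

Calling `respir(SPACE, toy_simulate, doe_size=5, max_nsim=15)` returns the approximate Pareto archive together with the number of simulations actually spent. Swapping `idw_predict` for another regressor, or the random sample for a factorial plan, changes the DoE/RSM pairing without touching the loop, which reflects the parametric nature of the methodology.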
Initially (step 2), the DoE plan is used to pick the set of initial configurations corresponding to the plan of simulations to be run. This step provides an initial coarse view of the target design space by running the simulations to obtain the actual measurements $f$, which are stored in $F_0$. In the successive steps, $F_0$ represents the archive containing significant information about all the architectural configurations simulated so far. At the first iteration, provided that maxnsim is greater than the DoE size, the condition in step 5 is met and the while loop body is entered. The RSM (step 7) is thus trained with the current archive $F_0$. The response surface model generates a prediction archive $R_0$, which is then filtered for Pareto configurations in step 8. Next (step 9), the simulations associated with the Pareto set $R_1$ are run; the result is put into the intermediate archive $F_1$ and the coverage with respect to $F_0$ is computed. The algorithm iterates until either this coverage value reaches 0 or the number of simulations has reached the maximum maxnsim. In the case of reiteration, the freshly generated Pareto points in $F_1$ are merged with $F_0$ to improve the prediction accuracy of the RSM. Finally (step 12), $F_0$ is Pareto filtered by pruning all the non-feasible configurations; the resulting archive is the approximate solution to our Design Space Exploration problem.

Algorithm 1 The RSM-Supported Iterative Pareto Refinement Design Space Exploration Flow
Require: DOE, RSM, maxnsim
1: nsim = 0
2: Generate and run the simulations from DOE. Update nsim accordingly. Put results into $F_0$.
3: cov = 100%
4: $F_1 = \{\}$
5: while (cov > 0) $\wedge$ (nsim < maxnsim) do
6:   $F_0 = F_0 \cup F_1$
7:   Train RSM with the content of $F_0$ and compute a prediction $R_0$, $\forall x \in X$
8:   $R_1 = \Psi(R_0)$
9:   Generate and run the simulations associated with the configurations in $R_1$. Update nsim accordingly. Put results into $F_1$.
10:  cov = $\chi(F_1, F_0)$
11: end while
12: return $\Psi(F_0)$ by pruning non-feasible configurations.

3 Validation of ReSPIR

To validate the presented ReSPIR methodology, we applied it to the customization of a symmetric shared-memory multiprocessor architecture for the execution of a set of standard benchmarks derived from the SPLASH-2 suite. We focused our analysis on the architectural parameters listed in Table 1, which constitute a design space of $|X| = 2^{17}$ alternative configurations. To evaluate the system metrics (execution time and energy consumption), we leveraged the Sesc [8] simulation tool, a fast simulator for chip-multiprocessor architectures that provides energy and performance results for a given application. Within Sesc, the energy consumption computation is supported by the CACTI [9] and WATTCH [10] models.
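The size of the design space can be checked with a short calculation. Table 1 gives only the minimum and maximum of each parameter; the sketch below assumes that every parameter takes power-of-two values between its bounds, which is not stated in the text but is consistent with the reported count. The dictionary keys are hypothetical names for the Table 1 rows.

```python
from math import prod

def pow2_levels(lo, hi):
    """Power-of-two values from lo to hi inclusive (assumed level spacing)."""
    levels, v = [], lo
    while v <= hi:
        levels.append(v)
        v *= 2
    return levels

# (min, max) pairs transcribed from Table 1; cache sizes expressed in bytes.
TABLE_1 = {
    "processors": (2, 16),
    "issue_width": (1, 8),
    "l1_icache_size": (2 * 1024, 16 * 1024),
    "l1_dcache_size": (2 * 1024, 16 * 1024),
    "l2_cache_size": (32 * 1024, 256 * 1024),
    "l1_icache_assoc": (1, 8),
    "l1_dcache_assoc": (1, 8),
    "l2_cache_assoc": (1, 8),
    "block_size": (16, 32),
}

levels = {name: pow2_levels(lo, hi) for name, (lo, hi) in TABLE_1.items()}
design_space_size = prod(len(v) for v in levels.values())
```

Under this assumption, eight parameters have four levels each and the block size has two, so `design_space_size` is 4^8 · 2 = 2^17 = 131072, matching the stated $|X|$.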
To give a fair comparison of ReSPIR with other state-of-the-art heuristics, we introduce a Multi-Objective Simulated Annealing (MOSA) derived from [11] and a Multi-Objective Genetic Algorithm (NSGA-II) derived from [12]. Each of these heuristics is parametrized in terms of variables such as the initial population, the number of iteration steps, the set of permutation probabilities and other algorithm-specific parameters. Generally, calibrating these parameters is a very difficult task that depends strongly on the problem domain; moreover, since the heuristics are inherently random², each combination of heuristic variables should be evaluated more than once in order to infer a more general trend. The number of runs for each algorithm is set such that the actual performance of the algorithm (in terms of ADRS, Average Distance from Reference Set) reaches an average asymptotic value. Practically speaking, this resulted in more than a hundred evaluations for each heuristic.

² This is also true for ReSPIR whenever we use a random DoE or a neural network whose initial weights are chosen randomly.

Table 1. Design space for the shared-memory multi-processor platform

Parameter                    Min.   Max.
# Processors                 2      16
Processor issue width        1      8
L1 instruction cache size    2K     16K
L1 data cache size           2K     16K
L2 private cache size        32K    256K
L1 instruction cache assoc.  1w     8w
L1 data cache assoc.         1w     8w
L2 private cache assoc.      1w     8w
I/D/L2 block size            16     32

We underline that we are focused on obtaining a good approximation of the exact Pareto set (target ADRS of 1%) by executing as few simulations as possible, i.e., by simulating less than 3.5% of the entire design space. As a consequence, each strategy has been run with an upper bound on the number of simulations equal to 3.5% of the entire design space; the resulting Pareto front has been validated against the reference, exact Pareto front of the target architecture³. Concerning ReSPIR, we focus on the overall performance by presenting a collapsed view over all the combinations of DoEs and RSMs, without breaking out the actual algorithm performance for each DoE and RSM. This is because what we want to demonstrate is the quality of the presented ReSPIR exploration strategy and not the quality of a particular DoE or RSM.

Figures 1(a), 1(b) and 1(c) show the average ADRS of the approximated Pareto fronts with respect to the exact Pareto front, as the size of the analyzed portion of the design space varies. The figures also show the estimated ADRS standard deviation. The MOSA algorithm is the worst heuristic in terms of average ADRS, starting from 18% for 1% of the design space and decreasing to 10% for 2.5% of the design space. NSGA-II and ReSPIR reach 5% and 2.5%, respectively, for the same percentage of the design space.
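The text uses ADRS without spelling out its formula. The sketch below follows a definition commonly used in the design space exploration literature, which is an assumption here: for each point of the reference (exact) Pareto set, take the smallest worst-case relative degradation offered by any point of the approximate set, then average over the reference set. Objectives are assumed to be minimized and strictly positive; the function name `adrs` is illustrative.

```python
def adrs(reference, approx):
    """Average Distance from Reference Set (assumed definition).

    reference, approx: non-empty lists of objective tuples, minimized and > 0.
    Returns 0.0 when every reference point is matched or dominated by approx.
    """
    def delta(r, a):
        # Worst relative degradation of a with respect to r, floored at zero.
        return max(0.0, *((aj - rj) / rj for rj, aj in zip(r, a)))

    return sum(min(delta(r, a) for a in approx) for r in reference) / len(reference)
```

With this convention, an ADRS of 0.025 means that, on average, the closest approximate point is at most 2.5% worse than a reference Pareto point on its worst objective.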
Also from the point of view of the standard deviation, the MOSA algorithm lags behind NSGA-II, reaching 4% at the upper bound of the analyzed design space, where NSGA-II and ReSPIR obtain around 0.5%.

³ The reference Pareto front has been computed with a full-search algorithm; thus it is the exact Pareto front.

4 Conclusions

In this paper, we presented ReSPIR, a design space exploration methodology that leverages the traditional DoE paradigm and RSM techniques, combined with a powerful way of considering customized application constraints. The design of experiments phase generates an initial plan of experiments, which is used to create a coarse view of the target design space; then a set of response surface extraction techniques is used to identify non-feasible configurations and refine the Pareto configurations. This process is repeated iteratively until a target criterion, e.g. the number of simulations, is satisfied.

Fig. 1. Average ADRS (with standard deviation) and percentage of the design space analyzed by (a) MOSA, (b) NSGA-II and (c) ReSPIR. [Plots not reproduced: each panel shows mean(adrs) and std(adrs) on the ADRS axis against the percentage of design space analyzed.]

References

1. K. Keutzer, S. Malik, A. R. Newton, J. Rabaey, and A. Sangiovanni-Vincentelli. System level design: Orthogonalization of concerns and platform-based design. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 19(12):1523-1543, December 2000.
2. Gianluca Palermo, Cristina Silvano, and Vittorio Zaccaria. Multi-objective design space exploration of embedded system. Journal of Embedded Computing, 1(3):305-316, 2006.
3. Giuseppe Ascia, Vincenzo Catania, Alessandro G. Di Nuovo, Maurizio Palesi, and Davide Patti. Efficient design space exploration for application specific systems-on-a-chip. Journal of Systems Architecture, 53(10):733-750, 2007.
4. Giovanni Beltrame, Dario Bruschi, Donatella Sciuto, and Cristina Silvano. Decision-theoretic exploration of multiprocessor platforms. In Proceedings of CODES+ISSS: International Conference on Hardware-Software Codesign and System Synthesis, pages 205-210, 2006.
5. T. J. Santner, B. Williams, and W. Notz. The Design and Analysis of Computer Experiments. Springer-Verlag, 2003.
6. C. Bishop. Neural Networks for Pattern Recognition. Oxford University Press, 2002.
7. M. J. D. Powell. The theory of radial basis functions. In Advances in Numerical Analysis II: Wavelets, Subdivision, and Radial Basis Functions, W. Light (ed.), pages 105-210. University Press, 1992.
8. Jose Renau, Basilio Fraguela, James Tuck, Wei Liu, Milos Prvulovic, Luis Ceze, Smruti Sarangi, Paul Sack, Karin Strauss, and Pablo Montesinos. SESC simulator, January 2005. http://sesc.sourceforge.net.
9. S. Wilton and N. Jouppi. CACTI: An Enhanced Cache Access and Cycle Time Model. volume 31, pages 677-688, 1996.
10. David Brooks, Vivek Tiwari, and Margaret Martonosi. Wattch: a framework for architectural-level power analysis and optimizations. In Proceedings ISCA 2000: International Symposium on Computer Architecture, pages 83-94, 2000.
11. P. Czyzak and A. Jaszkiewicz. Pareto simulated annealing - a metaheuristic technique for multiple-objective combinatorial optimisation. Journal of Multi-Criteria Decision Analysis, (7):34-47, April 1998.
12. K. Deb, S. Agrawal, A. Pratap, and T. Meyarivan. A Fast and Elitist Multi-Objective Genetic Algorithm: NSGA-II. Proceedings of the Parallel Problem Solving from Nature VI Conference, pages 849-858, 2000.