The SimpleScalar Macro Tool (SSMT)
Carsten M. van der Hoeven, B.H.H. (Ben) Juurlink, Demid Borodin
Computer Engineering Laboratory
Faculty of Electrical Engineering, Mathematics, and Computer Science
Delft University of Technology
Mekelweg 4, 2628 CD Delft, The Netherlands
[email protected], {benj,demid}@ce.et.tudelft.nl

Abstract

SimpleScalar is a simulation tool set that is widely used in computer architecture research. Among other things, it provides a mechanism to synthesize new instructions without having to modify the assembler, by annotating existing instructions in the assembly files. This mechanism, however, is rather error-prone, especially for a SimpleScalar novice. For this reason two tools have already been developed: the SimpleScalar Instruction Tool (SSIT) and the SimpleScalar Architecture Tool (SSAT). With SSIT, a developer can add functional units and new instructions to SimpleScalar, and the programmer can use readable mnemonics in assembly, which makes programming ISA extensions less error-prone and easier. SSAT enables the developer to extend the number of registers in the SimpleScalar simulator and use them in assembly. This paper presents the SimpleScalar Macro Tool (SSMT), a tool that enables the developer to write ISA extensions in the C programming language instead of in assembly. This makes extending the SimpleScalar toolset with new instructions much easier, because the developer can concentrate on writing optimal code without having to deal with register allocation and instruction ordering for the existing ISA. Moreover, experiments conducted on several kernels show that SSMT does not introduce a significant performance overhead and in some cases even improves the performance of ISA extensions by up to 6% compared to hand-written assembly code.

Keywords: processor simulator, instruction set architecture, macro programming interface

I.
Introduction

Simulation is often used to evaluate newly designed computer architectures. SimpleScalar [1, 2] is a very popular processor simulator toolset that enables a developer to effectively implement, simulate and evaluate ISA extensions. To ease development, a version of the GNU C compiler is provided along with the toolset. SimpleScalar has been used to evaluate many new architectural designs, such as novel branch predictors [3, 4], cache organizations [5, 6], instruction set extensions [7, 8] and fault tolerance schemes [9, 10]. The SimpleScalar toolset provides a mechanism to extend the instruction set architecture, but this mechanism is quite error-prone and difficult to use, especially for a SimpleScalar novice: the developer has to write annotated assembly instructions wherever the ISA extensions are to be executed, and also has to do the register allocation for these extensions, since the provided GNU C compiler does not recognise them. Therefore, two other tools have been designed at the Delft University of Technology: the SimpleScalar Instruction Tool (SSIT) and the SimpleScalar Architecture Tool (SSAT) [11, 12]. SSIT adds functional units and new instructions to SimpleScalar and allows the user to write readable assembly code instead of annotated assembly code, which makes writing optimized assembly much less error-prone and easier. SSAT allows the developer to extend the number of existing registers, alias existing registers, define new registers that are not already present in the SimpleScalar processor, and use these in assembly. Although SSIT and SSAT substantially simplify the work of a developer, ISA extensions still need to be written in assembly language, where the developer has to take everything into account, including register allocation and instruction ordering.
To aid the developer in this effort, the SimpleScalar Macro Tool has been developed. This tool allows the developer to write ISA extensions directly in C code and leave the register allocation and instruction ordering to the C compiler. This paper first discusses SSMT in detail in Section II, where Section II-A covers the SSIT configuration file, Section II-B discusses the internals of SSMT and Section II-C shows a small example. Section III covers the results of the experiments that were conducted to test SSMT and to evaluate the performance impact of the macro interface. Finally, Section IV provides a
short summary.

II. SSMT

The SimpleScalar Macro Tool (SSMT) is designed to work within the tool chain that SSIT and SSAT are already part of. Unlike SSIT and SSAT, which process assembly code to turn readable mnemonics into annotated instructions recognized by the assembler, SSMT is used before the C code is compiled into assembly. SSMT reads the SSIT configuration file and generates a header file with C macros that can be used throughout any C code that is to be optimized with the newly designed ISA extensions described in the configuration file. This is depicted in Figure 1. This section first describes the SSIT configuration file, followed by the internals of SSMT and a small example.

A. SSIT configuration file

In order to run SSIT and SSMT properly, a configuration file needs to be provided. The contents and format of this file have been defined by the developers of SSIT, but are also very suitable for providing SSMT with the appropriate information. An SSIT configuration file consists of a number of sections of one of two types: (1) FUNCTIONAL UNITS sections, in which new functional units can be defined, and (2) MACHINE.DEF sections, in which a developer can define new instructions. Since the FUNCTIONAL UNITS sections are not used by SSMT, they are not discussed further. The MACHINE.DEF sections are constructed as follows:

#define <INSTRUCTION NAME>_IMPL { \
  <functional implementation of the new instruction in C> \
}
DEFINST(<instruction name>, <operands>,
        <name of annotated instruction>,
        <FU class>, <iflags>,
        <odep1>, <odep2>,
        <idep1>, <idep2>, <idep3>)

where
- <instruction name> is the name of the new instruction as a string.
- <operands> is a string that specifies the instruction operand fields (the instruction format).
- <name of annotated instruction> is the name of an existing SimpleScalar instruction which will be annotated to pass the new instruction through the SimpleScalar assembler.
- <FU class> is the resource class required by the new instruction.
- <iflags> are instruction flags, used by SimpleScalar for fast decoding.
- <odep1>, <odep2> are two output dependency designators. They specify which registers are modified by the instruction and are used to determine data dependencies.
- <idep1>, <idep2>, <idep3> are input dependency designators. They specify which registers are read by the instruction.

For more information on SSIT, please refer to the SSIT manual [12].

B. Internals

Figure 1 depicts how SSMT is used in combination with SSIT and SSAT. The SSIT configuration file is described in Section II-A and in [12]. This section describes how that configuration file is used to generate the header file with the macros. The data necessary to construct the macros is taken from each instruction definition in the SSIT configuration file: from each MACHINE.DEF section, the fields <instruction name>, <operands>, <odep1>, <odep2>, <idep1>, <idep2> and <idep3> are extracted. From this information the macros are constructed. The general form of these macros is:

SSMT_<INSTRUCTION NAME>_XXXXX(ARGUMENTS)

where SSMT_ is a prefix that ensures that macros generated with SSMT do not clash with user-defined macros. <INSTRUCTION NAME> is the name of the instruction in capitals. If an instruction name contains dots (spaces and underscores are not allowed), an underscore takes the place of each dot. XXXXX is a sequence of letters indicating the types of the arguments that follow. Each X stands for one of the following possibilities:
- Nothing: if the number of arguments is less than five, the remaining letters are omitted and do not appear in the macro name.
- V: the argument is a variable name.
- R: the argument indicates a register. The register name must be given as a string for correct results; more on this below.
- I: for instructions that need an immediate operand, the letter I indicates that the argument must be an immediate value.

ARGUMENTS is a list of arguments in the same order in which they are needed by the instruction the macro refers to. The types of these arguments must match the letters in the macro name; compiler errors will occur when the types are incorrect, although no explicit type checking is done.

Fig. 1: The entire tool chain around SimpleScalar, including SSIT, SSAT and SSMT.

The need for differentiation by letters comes from the fact that the macros are regular C macros: each macro name must be unique and, given the number of possibilities, this is the most convenient way to ensure that all macros are unique. The distinction between variable names and register names (strings) originates from the possibility to extend the number of registers or define new registers with SSAT. Since the compiler does not recognize these new registers, it cannot apply automatic register allocation to them. A developer therefore needs to do manual register allocation for extended or new registers by using register names, but can leave the register allocation to the compiler for existing registers by using variable names. Apart from these, two kinds of macros always have the same form, due to minor differences in the implementation of some instructions:
- Comment macro: a macro that has COMMENT as its instruction name and takes only a string as an argument. No identification letters are needed.
- Load and store instructions: instructions that operate on memory always have the following macros to access them:

SSMT_<NAME>_RV(register_name, pointer_to_variable)
SSMT_<NAME>_RVI(register_name, pointer_to_variable, immediate_address_offset)

Because custom load and store instructions are only written when there is a need for them (namely, new types of registers defined with SSAT), these macros take the name of a register and the address of a variable (which can then be loaded into the new register). The immediate variant supports the memory offset addressing that SimpleScalar provides, since the translation from macro to assembly does not add an offset automatically.

The macros described above are regular C macros.
This means that they are simply replaced during the pre-processing of the C file: each macro is replaced by a piece of extended inline assembly containing the name of the instruction and all the variables.

C. Example

This section demonstrates the use of the macros in C code. The example consists of snippets taken from four different files. The first snippet is taken from the SSIT configuration file:

#define AVG_IMPL { \
  sword_t result = (GPR(RS)+GPR(RT)) >> 1; \
  SET_GPR(RD, result); \
}
DEFINST( "avg", "d,s,t", "add", avg_fu, F_ICOMP,
         DGPR(RD), DNA, DGPR(RS), DGPR(RT), DNA )

This code defines a new instruction that calculates the average of two input arguments and writes it to the output register. In the instruction definition, the name of the new instruction, avg in this case, comes first, then the operand layout, followed by the existing instruction (add) that is annotated to make the new instruction available in the simulator. Next comes the functional unit, followed by an instruction flag. Last come the output and input dependency designators, one of each being unused (DNA). For more information, see Section II-A or [12]. Given the SSIT configuration file with the instruction definition stated above, SSMT generates the following in the header file:

SSMT_AVG_VVV(out0,in0,in1) asm("avg %0,%1,%2":"=r" (out0):"r" (in0),"r" (in1))
SSMT_AVG_VVR(out0,in0,in1) asm("avg %0,%1," in1 "":"=r" (out0):"r" (in0))
SSMT_AVG_VRV(out0,in0,in1) asm("avg %0," in0 ",%1":"=r" (out0):"r" (in1))
SSMT_AVG_VRR(out0,in0,in1) asm("avg %0," in0 "," in1 "":"=r" (out0):)
SSMT_AVG_RVV(out0,in0,in1) asm("avg " out0 ",%0,%1"::"r" (in0),"r" (in1))
SSMT_AVG_RVR(out0,in0,in1) asm("avg " out0 ",%0," in1 ""::"r" (in0))
SSMT_AVG_RRV(out0,in0,in1) asm("avg " out0 "," in0 ",%0"::"r" (in1))
SSMT_AVG_RRR(out0,in0,in1) asm("avg " out0 "," in0 "," in1 ""::)

Since the new instruction takes three arguments, and each argument can be either a normal variable placed in an ordinary register or a variable placed in a register that is new, extended or aliased, the total number of possibilities is eight. An example of a C source file in which a generated macro is used:

#include "ssmt.h"

int main(int argc, char ** argv)
{
  short a = 6, b = 4;
  SSMT_AVG_VVV(a, a, b);
  return 0;
}

Here, the macro SSMT_AVG_VVV takes the variables a and b and stores the result in a.
Note that the example source file does not print its result; at the end, variable a should contain the value 5. The last part of the example is a small piece of the code generated by gcc with optimization level 2 enabled:

  ...
  li $5, 0x...
  li $2, 0x...
  #APP
  avg $5,$5,$2
  #NO_APP
  ...

As can be seen in the assembly code, the variables a and b are loaded into registers before the new instruction is executed. The code that loads the variables into registers is generated automatically by GCC.

III. Experimental results

To test the impact of the macro tool on the performance of applications using ISA extensions, eight multimedia kernels that had been rewritten in assembly to use an adapted form of the MMX extensions were rewritten once more in C code using the macros created by SSMT. The rewritten C code is compiled with GCC and run through SSIT and SSAT to obtain executable code. These C versions of the kernels are then compared to their hand-written counterparts and to the original C code. First the kernels themselves are discussed, followed by the results of the experiments.

A. Kernels

In order to make a proper comparison, a total of eight multimedia kernels were changed to use the macros from the header file created with SSMT. All kernels are optimized for the use of MMX instructions and thus work on vectors of four or eight bytes at a time. Three of the kernels come in two forms that are essentially the same: a small version that works on matrices of eight by eight or eight by sixteen bytes, and a large version that works on vectors and matrices with much larger dimensions, to simulate the effect of using MMX instructions more accurately, because, for example, the relative overhead is decreased.

A.1 Add image

This kernel adds two images of the same size and stores the result in one of the input images. The images are matrices of unsigned bytes, thus limiting each value to between 0 and 255.
Saturation is done in the MMX instructions. The small version of the Add image kernel adds two images of 8 x 8 bytes; the large version adds two images of 704 x 576 bytes. These images are converted
from Windows Bitmap files so that they can be read in properly by the initialization part of the kernel.

A.2 DCT

The DCT kernel computes the Discrete Cosine Transform. In this particular case the algorithm works on a matrix of 704 x 576 bytes and stores the result in a separate matrix.

A.3 IDCT

To reverse the DCT, this kernel computes the Inverse Discrete Cosine Transform. Again the kernel operates on a matrix of 704 x 576 bytes, and the result is stored in a matrix different from the original one.

A.4 Matrix transpose

This kernel calculates the transpose of a matrix. Like the kernel that adds two images, it has two versions: the small version transposes an 8 x 8 matrix, the large version a matrix of 704 x 576.

A.5 Matrix-vector multiplication

Matrix-vector multiplication is a frequently used operation. This kernel also has two versions: the small one works on a matrix of 8 x 16 bytes and a vector of 16 bytes; the large one works on a matrix of 704 x 576 bytes and a vector of 576 bytes.

B. Comparison results

This section discusses the results of the comparison between hand-written code and C code using the macros. The full comparison is between the original C code, C code optimized with GCC optimization level 2, hand-written assembly, C code using macros, and C code using macros optimized with GCC optimization level 2. As a measurement, the total number of cycles needed to execute a kernel is taken and compared across the versions. The results of the experiment are shown in Figure 2. From this figure it is clear that the hand-written assembly code performs significantly better than the original C code; however, optimizing the C code with GCC already gives a significant speedup, which in half the cases is even faster than the unoptimized C code that uses the macro interface.
Optimizing the macro-using C code with GCC gives performance fairly comparable to hand-written assembly; in some cases the optimized macro-using C code is even better than the hand-written assembly. To make the relation between the performance of the hand-written assembly and the optimized C code using the macro interface clearer, both values are normalized to the number of cycles of the hand-written version and shown in Figure 3. This figure shows that, with the exception of one case, the performance of the optimized C code rivals that of the hand-written code. The differences in performance can be partially explained as follows:
- FOR loops. The way a FOR loop is translated into assembly generally depends on how it is written in C: an initialization or loop condition involving a variable, for example, gives a different result than the same loop with values known at compile time. GCC makes these kinds of decisions based on what is known at compile time, without regard to the context of the loop, which the compiler cannot take into account. A human programmer, however, can take the context into account and therefore write either a more efficient FOR loop or, unknowingly, a less efficient one.
- Registers. When it comes to registers, the compiler is usually better than human programmers. Where GCC has no problem taking all available registers into account, a human programmer has far more difficulty doing so. A human programmer will therefore often accept a decrease in performance in exchange for code that is more understandable (and thus easier to write), whereas GCC generates code that is less readable but more efficient in its use of registers.

IV. Conclusion

This paper has presented the SimpleScalar Macro Tool.
SSMT allows a developer to use new instructions written for the SimpleScalar simulator in a high-level programming language by means of a macro interface. The macro interface simplifies the use of ISA extensions when programming applications that use them. Although the programmer is not completely relieved from programming in assembly language, SSMT does make the job easier by taking register allocation and all other programming not related to the ISA extensions off the programmer's mind. Moreover,
SSMT usually does not introduce the significant performance decrease that is often associated with the use of a macro(-like) interface. In some cases the macro-using C code even outperforms the hand-written code.

Fig. 2: Number of execution cycles needed to execute a kernel, normalized to the execution time of the unoptimized original C code.
Fig. 3: Number of execution cycles needed to execute a kernel, normalized to the number of cycles needed by the hand-written assembly.

References

[1] Todd Austin, Eric Larson, and Dan Ernst. SimpleScalar: An Infrastructure for Computer System Modeling. Computer, 35(2):59-67.
[2] Doug Burger and Todd M. Austin. The SimpleScalar Tool Set, Version 2.0. SIGARCH Comput. Archit. News, 25(3):13-25.
[3] Daniel A. Jimenez. Piecewise Linear Branch Prediction. In Proc. 32nd Annual Int. Symp. on Computer Architecture (ISCA-05), Washington, DC, USA. IEEE Computer Society.
[4] Renju Thomas, Manoj Franklin, Chris Wilkerson, and Jared Stark. Improving Branch Prediction by Dynamic Dataflow-based Identification of Correlated Branches From a Large Global History. In Proc. 30th Annual Int. Symp. on Computer Architecture (ISCA-03), New York, NY, USA. ACM Press.
[5] Serkan Ozdemir, Debjit Sinha, Gokhan Memik, Jonathan Adams, and Hai Zhou. Yield-Aware Cache Architectures. In Proc. 39th Annual IEEE/ACM Int. Symp. on Microarchitecture (MICRO-39), pages 15-25, Washington, DC, USA. IEEE Computer Society.
[6] Chuanjun Zhang. Balanced Cache: Reducing Conflict Misses of Direct-Mapped Caches. In Proc. 33rd Annual Int. Symp. on Computer Architecture (ISCA-06), Washington, DC, USA. IEEE Computer Society.
[7] D. Cheresiz, B.H.H. Juurlink, S. Vassiliadis, and H.A.G. Wijshoff. The CSI Multimedia Architecture. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, pages 1-13, January.
[8] Asadollah Shahbahrami, Ben Juurlink, Demid Borodin, and Stamatis Vassiliadis. Avoiding Conversion and Rearrangement Overhead in SIMD Architectures. International Journal of Parallel Programming, June.
[9] Eric Rotenberg. AR-SMT: A Microarchitectural Approach to Fault Tolerance in Microprocessors. In FTCS-29, pages 84-91, Madison, Wisconsin, USA, June. IEEE Computer Society Press.
[10] D. Borodin, B.H.H. Juurlink, and S. Vassiliadis. Instruction-Level Fault Tolerance Configurability. In IC-SAMOS VII: Int. Conf. on Embedded Computer Systems: Architectures, Modeling, and Simulation.
[11] B.H.H. (Ben) Juurlink, Demid Borodin, Roel J. Meeuws, Gerard Th. Aalbers, and Hugo Leisink. The SimpleScalar Instruction Tool (SSIT) and the SimpleScalar Architecture Tool (SSAT). Unpublished manuscript, available at demid/ssiat/.
[12] SSIT, SSAT and SSIAT webpage. nl/~demid/ssiat/.
Parallel Computing Benson Muite [email protected] http://math.ut.ee/ benson https://courses.cs.ut.ee/2014/paralleel/fall/main/homepage 3 November 2014 Hadoop, Review Hadoop Hadoop History Hadoop Framework
Robot Task-Level Programming Language and Simulation
Robot Task-Level Programming Language and Simulation M. Samaka Abstract This paper presents the development of a software application for Off-line robot task programming and simulation. Such application
Artificial Intelligence. Class: 3 rd
Artificial Intelligence Class: 3 rd Teaching scheme: 4 hours lecture credits: Course description: This subject covers the fundamentals of Artificial Intelligence including programming in logic, knowledge
Performance Measurement of Dynamically Compiled Java Executions
Performance Measurement of Dynamically Compiled Java Executions Tia Newhall and Barton P. Miller University of Wisconsin Madison Madison, WI 53706-1685 USA +1 (608) 262-1204 {newhall,bart}@cs.wisc.edu
GUJARAT TECHNOLOGICAL UNIVERSITY Computer Engineering (07) BE 1st To 8th Semester Exam Scheme & Subject Code
GUJARAT TECHNOLOGICAL UNIVERSITY Computer Engineering (07) BE 1st To 8th Semester Scheme & EVALUATION SCHEME Continuous (Theory) (E) Evaluation Practical (I) (Practical) (E) Process(M) MAX MIN MAX MIN
Degree Reduction of Interval SB Curves
International Journal of Video&Image Processing and Network Security IJVIPNS-IJENS Vol:13 No:04 1 Degree Reduction of Interval SB Curves O. Ismail, Senior Member, IEEE Abstract Ball basis was introduced
Big-data Analytics: Challenges and Opportunities
Big-data Analytics: Challenges and Opportunities Chih-Jen Lin Department of Computer Science National Taiwan University Talk at 台 灣 資 料 科 學 愛 好 者 年 會, August 30, 2014 Chih-Jen Lin (National Taiwan Univ.)
AES Power Attack Based on Induced Cache Miss and Countermeasure
AES Power Attack Based on Induced Cache Miss and Countermeasure Guido Bertoni, Vittorio Zaccaria STMicroelectronics, Advanced System Technology Agrate Brianza - Milano, Italy, {guido.bertoni, vittorio.zaccaria}@st.com
Embedded Programming in C/C++: Lesson-1: Programming Elements and Programming in C
Embedded Programming in C/C++: Lesson-1: Programming Elements and Programming in C 1 An essential part of any embedded system design Programming 2 Programming in Assembly or HLL Processor and memory-sensitive
Computation of Forward and Inverse MDCT Using Clenshaw s Recurrence Formula
IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 51, NO. 5, MAY 2003 1439 Computation of Forward Inverse MDCT Using Clenshaw s Recurrence Formula Vladimir Nikolajevic, Student Member, IEEE, Gerhard Fettweis,
5MD00. Assignment Introduction. Luc Waeijen 16-12-2014
5MD00 Assignment Introduction Luc Waeijen 16-12-2014 Contents EEG application Background on EEG Early Seizure Detection Algorithm Implementation Details Super Scalar Assignment Description Tooling (simple
Large-Scale Data Sets Clustering Based on MapReduce and Hadoop
Journal of Computational Information Systems 7: 16 (2011) 5956-5963 Available at http://www.jofcis.com Large-Scale Data Sets Clustering Based on MapReduce and Hadoop Ping ZHOU, Jingsheng LEI, Wenjun YE
More on Pipelining and Pipelines in Real Machines CS 333 Fall 2006 Main Ideas Data Hazards RAW WAR WAW More pipeline stall reduction techniques Branch prediction» static» dynamic bimodal branch prediction
Image Compression through DCT and Huffman Coding Technique
International Journal of Current Engineering and Technology E-ISSN 2277 4106, P-ISSN 2347 5161 2015 INPRESSCO, All Rights Reserved Available at http://inpressco.com/category/ijcet Research Article Rahul
Active Learning SVM for Blogs recommendation
Active Learning SVM for Blogs recommendation Xin Guan Computer Science, George Mason University Ⅰ.Introduction In the DH Now website, they try to review a big amount of blogs and articles and find the
Atmel AVR4027: Tips and Tricks to Optimize Your C Code for 8-bit AVR Microcontrollers. 8-bit Atmel Microcontrollers. Application Note.
Atmel AVR4027: Tips and Tricks to Optimize Your C Code for 8-bit AVR Microcontrollers Features Atmel AVR core and Atmel AVR GCC introduction Tips and tricks to reduce code size Tips and tricks to reduce
Introduction to MIPS Assembly Programming
1 / 26 Introduction to MIPS Assembly Programming January 23 25, 2013 2 / 26 Outline Overview of assembly programming MARS tutorial MIPS assembly syntax Role of pseudocode Some simple instructions Integer
Benchmark Hadoop and Mars: MapReduce on cluster versus on GPU
Benchmark Hadoop and Mars: MapReduce on cluster versus on GPU Heshan Li, Shaopeng Wang The Johns Hopkins University 3400 N. Charles Street Baltimore, Maryland 21218 {heshanli, shaopeng}@cs.jhu.edu 1 Overview
Fuzzy Active Queue Management for Assured Forwarding Traffic in Differentiated Services Network
Fuzzy Active Management for Assured Forwarding Traffic in Differentiated Services Network E.S. Ng, K.K. Phang, T.C. Ling, L.Y. Por Department of Computer Systems & Technology Faculty of Computer Science
1/20/2016 INTRODUCTION
INTRODUCTION 1 Programming languages have common concepts that are seen in all languages This course will discuss and illustrate these common concepts: Syntax Names Types Semantics Memory Management We
CHAPTER 1 INTRODUCTION
1 CHAPTER 1 INTRODUCTION 1.1 MOTIVATION OF RESEARCH Multicore processors have two or more execution cores (processors) implemented on a single chip having their own set of execution and architectural recourses.
Processor Architectures
ECPE 170 Jeff Shafer University of the Pacific Processor Architectures 2 Schedule Exam 3 Tuesday, December 6 th Caches Virtual Memory Input / Output OperaKng Systems Compilers & Assemblers Processor Architecture
Levels of Programming Languages. Gerald Penn CSC 324
Levels of Programming Languages Gerald Penn CSC 324 Levels of Programming Language Microcode Machine code Assembly Language Low-level Programming Language High-level Programming Language Levels of Programming
On Demand Loading of Code in MMUless Embedded System
On Demand Loading of Code in MMUless Embedded System Sunil R Gandhi *. Chetan D Pachange, Jr.** Mandar R Vaidya***, Swapnilkumar S Khorate**** *Pune Institute of Computer Technology, Pune INDIA (Mob- 8600867094;
Chapter 1. Dr. Chris Irwin Davis Email: [email protected] Phone: (972) 883-3574 Office: ECSS 4.705. CS-4337 Organization of Programming Languages
Chapter 1 CS-4337 Organization of Programming Languages Dr. Chris Irwin Davis Email: [email protected] Phone: (972) 883-3574 Office: ECSS 4.705 Chapter 1 Topics Reasons for Studying Concepts of Programming
Parallel and Distributed Computing Programming Assignment 1
Parallel and Distributed Computing Programming Assignment 1 Due Monday, February 7 For programming assignment 1, you should write two C programs. One should provide an estimate of the performance of ping-pong
International Language Character Code
, pp.161-166 http://dx.doi.org/10.14257/astl.2015.81.33 International Language Character Code with DNA Molecules Wei Wang, Zhengxu Zhao, Qian Xu School of Information Science and Technology, Shijiazhuang
Instruction Set Architecture (ISA)
Instruction Set Architecture (ISA) * Instruction set architecture of a machine fills the semantic gap between the user and the machine. * ISA serves as the starting point for the design of a new machine
Linux Driver Devices. Why, When, Which, How?
Bertrand Mermet Sylvain Ract Linux Driver Devices. Why, When, Which, How? Since its creation in the early 1990 s Linux has been installed on millions of computers or embedded systems. These systems may
X86-64 Architecture Guide
X86-64 Architecture Guide For the code-generation project, we shall expose you to a simplified version of the x86-64 platform. Example Consider the following Decaf program: class Program { int foo(int
Best practices for efficient HPC performance with large models
Best practices for efficient HPC performance with large models Dr. Hößl Bernhard, CADFEM (Austria) GmbH PRACE Autumn School 2013 - Industry Oriented HPC Simulations, September 21-27, University of Ljubljana,
A Review of Customized Dynamic Load Balancing for a Network of Workstations
A Review of Customized Dynamic Load Balancing for a Network of Workstations Taken from work done by: Mohammed Javeed Zaki, Wei Li, Srinivasan Parthasarathy Computer Science Department, University of Rochester
Analysis of Compression Algorithms for Program Data
Analysis of Compression Algorithms for Program Data Matthew Simpson, Clemson University with Dr. Rajeev Barua and Surupa Biswas, University of Maryland 12 August 3 Abstract Insufficient available memory
Introducción. Diseño de sistemas digitales.1
Introducción Adapted from: Mary Jane Irwin ( www.cse.psu.edu/~mji ) www.cse.psu.edu/~cg431 [Original from Computer Organization and Design, Patterson & Hennessy, 2005, UCB] Diseño de sistemas digitales.1
Design of a NAND Flash Memory File System to Improve System Boot Time
International Journal of Information Processing Systems, Vol.2, No.3, December 2006 147 Design of a NAND Flash Memory File System to Improve System Boot Time Song-Hwa Park*, Tae-Hoon Lee*, and Ki-Dong
A Predictive Model for Cache-Based Side Channels in Multicore and Multithreaded Microprocessors
A Predictive Model for Cache-Based Side Channels in Multicore and Multithreaded Microprocessors Leonid Domnitser, Nael Abu-Ghazaleh and Dmitry Ponomarev Department of Computer Science SUNY-Binghamton {lenny,
Video-Conferencing System
Video-Conferencing System Evan Broder and C. Christoher Post Introductory Digital Systems Laboratory November 2, 2007 Abstract The goal of this project is to create a video/audio conferencing system. Video
340368 - FOPR-I1O23 - Fundamentals of Programming
Coordinating unit: 340 - EPSEVG - Vilanova i la Geltrú School of Engineering Teaching unit: 723 - CS - Department of Computer Science Academic year: Degree: 2015 BACHELOR'S DEGREE IN INFORMATICS ENGINEERING
Bob Boothe. Education. Research Interests. Teaching Experience
Bob Boothe Computer Science Dept. University of Southern Maine 96 Falmouth St. P.O. Box 9300 Portland, ME 04103--9300 (207) 780-4789 email: [email protected] 54 Cottage Park Rd. Portland, ME 04103 (207)
Java Virtual Machine: the key for accurated memory prefetching
Java Virtual Machine: the key for accurated memory prefetching Yolanda Becerra Jordi Garcia Toni Cortes Nacho Navarro Computer Architecture Department Universitat Politècnica de Catalunya Barcelona, Spain
A Lab Course on Computer Architecture
A Lab Course on Computer Architecture Pedro López José Duato Depto. de Informática de Sistemas y Computadores Facultad de Informática Universidad Politécnica de Valencia Camino de Vera s/n, 46071 - Valencia,
CPU Organization and Assembly Language
COS 140 Foundations of Computer Science School of Computing and Information Science University of Maine October 2, 2015 Outline 1 2 3 4 5 6 7 8 Homework and announcements Reading: Chapter 12 Homework:
An Introduction to Assembly Programming with the ARM 32-bit Processor Family
An Introduction to Assembly Programming with the ARM 32-bit Processor Family G. Agosta Politecnico di Milano December 3, 2011 Contents 1 Introduction 1 1.1 Prerequisites............................. 2
Data Structures For IP Lookup With Bursty Access Patterns
Data Structures For IP Lookup With Bursty Access Patterns Sartaj Sahni & Kun Suk Kim sahni, kskim @cise.ufl.edu Department of Computer and Information Science and Engineering University of Florida, Gainesville,
Chapter 12: Multiprocessor Architectures. Lesson 01: Performance characteristics of Multiprocessor Architectures and Speedup
Chapter 12: Multiprocessor Architectures Lesson 01: Performance characteristics of Multiprocessor Architectures and Speedup Objective Be familiar with basic multiprocessor architectures and be able to
Automatic Logging of Operating System Effects to Guide Application-Level Architecture Simulation
Automatic Logging of Operating System Effects to Guide Application-Level Architecture Simulation Satish Narayanasamy, Cristiano Pereira, Harish Patil, Robert Cohn, and Brad Calder Computer Science and
Chapter 07: Instruction Level Parallelism VLIW, Vector, Array and Multithreaded Processors. Lesson 05: Array Processors
Chapter 07: Instruction Level Parallelism VLIW, Vector, Array and Multithreaded Processors Lesson 05: Array Processors Objective To learn how the array processes in multiple pipelines 2 Array Processor
LONG BEACH CITY COLLEGE MEMORANDUM
LONG BEACH CITY COLLEGE MEMORANDUM DATE: May 5, 2000 TO: Academic Senate Equivalency Committee FROM: John Hugunin Department Head for CBIS SUBJECT: Equivalency statement for Computer Science Instructor
CSC230 Getting Starting in C. Tyler Bletsch
CSC230 Getting Starting in C Tyler Bletsch What is C? The language of UNIX Procedural language (no classes) Low-level access to memory Easy to map to machine language Not much run-time stuff needed Surprisingly
