How to Benchmark Code Execution Times on Intel IA-32 and IA-64 Instruction Set Architectures

Size: px
Start display at page:

Download "How to Benchmark Code Execution Times on Intel IA-32 and IA-64 Instruction Set Architectures"

Transcription

1 White Paper Gabriele Paoloni Linux Kernel/Embedded Software Engineer Intel Corporation How to Benchmark Code Execution Times on Intel IA-32 and IA-64 Instruction Set Architectures September

2 Abstract This paper provides precise methods to measure the clock cycles spent when executing a certain C code on a Linux* environment with a generic Intel architecture processor (either 32 bits or 64 bits). 2

3 Contents 1 0BIntroduction Purpose/Scope Assumptions Terminology Problem Description Introduction Problems with RDTSC Instruction in C Inline Assembly Variance Introduced by CPUID and Improvements with RTDSCP Instruction Problems with the CPUID Instruction Code Analysis Evaluation of the First Benchmarking Method Improvements Using RDTSCP Instruction The Improved Benchmarking Method Evaluation of the Improved Benchmarking Method An Alternative Method for Architecture Not Supporting RDTSCP Evaluation of the Alternative Method Resolution of the Benchmarking Methodologies Resolution with RDTSCP Resolution with the Alternative Method BSummary Appendix Reference List Figures Figure 1. Minimum Value Behavior Graph Figure 2. Variance Behavior Graph Figure 3. Minimum Value Behavior Graph Figure 4. Variance Behavior Graph Figure 5. Minimum Value Behavior Graph Figure 6. Variance Behavior Graph Figure 7. Minimum Value Behavior Graph Figure 8. Variance Behavior Graph Figure 9. Variance Behavior Graph

4 Tables Figure 10. Variance Behavior Graph Table 1. List of Terms

5 1 0BIntroduction 1.1 Purpose/Scope The purpose of this document is to provide software developers with precise methods to measure the clock cycles required to execute specific C code in a Linux environment running on a generic Intel architecture processor. These methods can be very useful in a CPU-benchmarking context, in a code-optimization context, and also in an OS-tuning context. In all these cases, the developer is interested in knowing exactly how many clock cycles are elapsed while executing code. At the time of this writing, the best description of how to benchmark code execution can be found in [1]. Unfortunately, many problems were encountered while using this method. This paper describes the problems and proposes two separate solutions. 1.2 Assumptions In this paper, all the results shown were obtained by running tests on a platform whose BIOS was optimized by removing every factor that could cause indeterminism. All power optimization, Intel Hyper-Threading technology, frequency scaling and turbo mode functionalities were turned off. The OS used was opensuse* 11.2 (linux ). 1.3 Terminology Table 1 lists the terms used in this document. Table 1. List of Terms Term Description CPU Central Processing Unit IA32 Intel 32-bit Architecture IA64 Intel 64-bit Architecture GCC GNU* Compiler Collection ICC Intel C/C++ Compiler 5

6 Term RDTSCP Description Read Time-Stamp Counter and Processor ID IA assembly instruction RTDSC Read Time-Stamp Counter and Processor ID IA assembly instruction 6

7 2 Problem Description This section explains the issues involved in reading the timestamp register and discusses the correct methodology to return precise and reliable clock cycles measurements. It is expected that readers have knowledge of basic GCC and ICC compiling techniques, basic Intel assembly syntax, and AT&T* assembly syntax. Those not interested in the problem description and method justification can skip this section and go to Section (if their platform supports the RDTSCP instruction) or Section (if not) to acquire the code. 2.1 Introduction Intel CPUs have a timestamp counter to keep track of every cycle that occurs on the CPU. Starting with the Intel Pentium processor, the devices have included a per-core timestamp register that stores the value of the timestamp counter and that can be accessed by the RDTSC and RDTSCP assembly instructions. When running a Linux OS, the developer can check if his CPU supports the RDTSCP instruction by looking at the flags field of /proc/cpuinfo ; if rdtscp is one of the flags, then it is supported. 2.2 Problems with RDTSC Instruction in C Inline Assembly Assume that you are working in a Linux environment, and are compiling by using GCC. You have C code and want to know how many clock cycles are spent to execute the code itself or just a part of it. To make sure that our measurements are not tainted by any kind of interrupt (including scheduling preemption), we are going to write a kernel module where we guarantee the exclusive ownership of the CPU when executing the code that we want to benchmark. To understand the practical implementation, let s consider the following dummy kernel module; it simply calls a function that is taking a pointer as input and is setting the pointed value to 1. We want to measure how many clock cycles it takes to call such a function: #include <linux/module.h> #include <linux/kernel.h> #include <linux/init.h> #include <linux/hardirq.h> #include <linux/preempt.h> #include <linux/sched.h> void inline measured_function(volatile int *var) { (*var) = 1; 7

8 } static int init hello_start(void) { unsigned long flags; uint64_t start, end; int variable = 0; unsigned cycles_low, cycles_high, cycles_low1, cycles_high1; printk(kern_info "Loading test module...\n"); preempt_disable(); /*we disable preemption on our CPU*/ raw_local_irq_save(flags); /*we disable hard interrupts on our CPU*/ /*at this stage we exclusively own the CPU*/ asm volatile ( "RDTSC\n\t" "mov %%eax, %1\n\t": "=r" (cycles_high), "=r" (cycles_low)); measured_function(&variable); asm volatile ( "RDTSC\n\t" "mov %%eax, %1\n\t": "=r" (cycles_high1), "=r" (cycles_low1)); raw_local_irq_restore(flags); /*we enable hard interrupts on our CPU*/ preempt_enable();/*we enable preemption*/ start = ( ((uint64_t)cycles_high << 32) cycles_low ); end = ( ((uint64_t)cycles_high1 << 32) cycles_low1 ); printk(kern_info "\n function execution time is %llu clock cycles", (endstart)); } return 0; static void exit hello_end(void) { printk(kern_info "Goodbye Mr.\n"); } module_init(hello_start); module_exit(hello_end); The RDTSC instruction loads the high-order 32 bits of the timestamp register into EDX, and the low-order 32 bits into EAX. A bitwise OR is performed to reconstruct and store the register value into a local variable. In the code above, to guarantee the exclusive ownership of the CPU before performing the measure, we disable the preemption (preempt_disable()) and we disable the hard interrupts (raw_local_irq_save()). Then we call the RDTSC assembly instruction to read the timestamp register. We call our function (measured_function()), and we read the timestamp register again (RDTSC) to see how many clock cycles have been elapsed since the first read. The two variables 8

9 start and end store the timestamp register values at the respective times of the RDTSC calls. Finally, we print the measurement on the screen. Logically the code above makes sense, but if we try to compile it, we could get segmentation faults or some weird results. This is because we didn t consider a few issues that are related to the RDTSC instruction itself and to the Intel Architecture: Register Overwriting RDTSC instruction, once called, overwrites the EAX and EDX registers. In the inline assembly that we presented above, we didn t declare any clobbered register. Basically we have to push those register statuses onto the stack before calling RDTSC and popping them afterwards. The practical solution for that is to write the inline assembly as follows (note bold items): asm volatile ("RDTSC\n\t" "mov %%eax, %1\n\t": "=r" (cycles_high), "=r" (cycles_low):: %eax, %edx ); In case we are using an IA64 platform rather than an IA32, in the list of clobbered registers we have to replace %eax, %edx with %rax, %rdx. In fact, in the Intel 64 and IA-32 Architectures Software Developer s Manual Volume 2B ([3]), it states that On processors that support the Intel 64 architecture, the high-order 32 bits of each of RAX and RDX are cleared. Out of Order Execution Starting with the Intel Pentium processor, most Intel CPUs support out-of-order execution of the code. The purpose is to optimize the penalties due to the different instruction latencies. Unfortunately this feature does not guarantee that the temporal sequence of the single compiled C instructions will respect the sequence of the instruction themselves as written in the source C file. When we call the RDTSC instruction, we pretend that that instruction will be executed exactly at the beginning and at the end of code being measured (i.e., we don t want to measure compiled code executed outside of the RDTSC calls or executed in between the calls themselves). The solution is to call a serializing instruction before calling the RDTSC one. A serializing instruction is an instruction that forces the CPU to complete every preceding instruction of the C code before continuing the program execution. By doing so we guarantee that only the code that is under measurement will be executed in between the RDTSC calls and that no part of that code will be executed outside the calls. The complete list of available serializing instructions on IA64 and IA32 can be found in the Intel 64 and IA-32 Architectures Software Developer s Manual Volume 3A [4]. Reading this manual, we find that CPUID can be executed at any privilege level to serialize instruction execution with no effect on program flow, except that the EAX, EBX, ECX and EDX registers are modified. Accordingly, the 9

10 natural choice to avoid out of order execution would be to call CPUID just before both RTDSC calls; this method works but there is a lot of variance (in terms of clock cycles) that is intrinsically associated with the CPUID instruction execution itself. This means that to guarantee serialization of instructions, we lose in terms of measurement resolution when using CPUID. A quantitative analysis about this is presented in Section An important consideration that we have to make is that the CPUID instruction overwrites EAX, EBX, ECX, and EDX registers. So we have to add EBX and ECX to the list of clobbered registers mentioned in Register Overwriting above. If we are using an IA64 rather than an IA32 platform, in the list of clobbered registers we have to replace "%eax", "%ebx", "%ecx", "%edx" with "%rax", "%rbx", "%rcx", "%rdx". In fact, in the Intel 64 and IA-32 Architectures Software Developer s Manual Volume 2A ([3]), it states that On Intel 64 processors, CPUID clears the high 32 bits of the RAX/RBX/RCX/RDX registers in all modes. Overhead in Calling CPUID and RDTSC When we call the instructions to capture the clock (the serializing one plus RDTSC) an overhead (in terms of clock cycles) is associated with the calls themselves; such overhead has to be measured and subtracted from the measurement of the code we are interested in. Later in this paper, we show how to properly measure the overhead involved in taking the measurement itself. 10

11 3 Variance Introduced by CPUID and Improvements with RTDSCP Instruction This section shows that if, from one side, the CPUID instruction guarantees no code cross-contamination, then, from a measurement perspective, the other can introduce a variance in terms of clock cycles that is too high to guarantee an acceptable measurement resolution. To solve this issue, we use an alternative implementation using the RTDSCP instruction. 3.1 Problems with the CPUID Instruction Let s consider the code shown in the Appendix. Later in this paper we reference numbered code lines in the appendix to help avoid duplication of code. Also, in this case, the code has been written in kernel space to guarantee the exclusive ownership of the CPU Code Analysis Ln98: Init function of the kernel module. Ln101: Here we declare **times double pointer. This pointer is allocated with a double array of memory (ln108 to ln122) of size BOUND_OF_LOOP*SIZE_OF_STAT: the meaning of these two values is explained later in this paper. The purpose of **times is to store all the time measurements (clock cycles). Ln102/ln103: The pointers *variances and *min_values are declared. Those pointers are used to respectively store the array of the variances and the array of minimum values of different ensembles of measures. The memory for both arrays is allocated at lines 124 to 134. Ln137: Filltimes function is called. Such function is defined at ln12; it is the core function of our code. Its purpose is to calculate the execution times of the code under measurement and to fill accordingly the **times double array. Ln19 to Ln30: In these lines we are consecutively calling the inline assembly instructions used just afterwards in the code to calculate the times. The purpose of this is to warm up the instruction cache to avoid spurious measurements due to cache effects in the first iterations of the following loop. 11

12 Ln33/34: Here we have two nested loops inside which the measurements take place. There are two reasons for having two loops for the following scenarios: When there is no function to be measured - in this case we are evaluating the statistical nature of the offset associated with the process of taking the measure itself. The inner loop is used to calculate statistic values such as minimum, maximum deviation from the minimum, variance; the outer loop is used to evaluate the ergodicity of the method taking the measures. When evaluating a function duration - the outer loop is used to increase step by step the complexity of the function itself in such a way to evaluate the goodness of the measuring method itself (in terms of clock cycles resolution). Ln38/39: Here we get the exclusive ownership of the CPU (see Section 2.2). Ln41 to Ln51: Here we implement the inline assembly code used to take the measurement. This is the part that along with this paper can change evaluation techniques and introduce improvements in the method. Ln53/54: We release the ownership of the CPU (see Section 2.2). Ln68: We fill the times array with the measured time. Ln139: At this stage the **times array is entirely filled with the calculated values. Following the array there are two nested loops: the inner one (ln145 to ln150) goes from zero to (SIZE_OF_STAT-1) and is used to calculate the minimum value and the maximum deviation from the minimum (max - min) for a certain ensemble of measures; the external one (ln139) is used to go across different ensembles. On the same ensemble the variance is calculated (ln160) and is stored in the array of variances. An accumulator (tot_var) is used to calculate the total variance (calculated also on the outer loop) of all the measurements. spurious (ln156) is a counter that is increased whenever between contiguous ensembles the minimum value of the previous is bigger than the one that follows. It is a useful index in case we are evaluating a function whose complexity is increasing along the external loop: a more complex function has to take more cycles to be executed; if the minimum measured value is smaller than the one measured on the ensemble for the less complex function, there is something wrong (we will see later how this index is useful). Finally, at ln168/169, the variance of the variances is calculated, and the variance of the minimum values. Both are needed to evaluate the ergodicity of the measurement process (if the process is ergodic the variance of variances tends to zero and, in this specific case, also the variance of the minimum value) Evaluation of the First Benchmarking Method Having built the kernel module using the code in the Appendix, we load this module and look at the kernel log ( dmesg ). The output is as follows: Loading hello module... loop_size:0 >>>> variance(cycles): 85; max_deviation: 80 ;min time:

13 loop_size:1 >>>> variance(cycles): 85; max_deviation: 56 ;min time: 456 loop_size:2 >>>> variance(cycles): 85; max_deviation: 96 ;min time: 456 loop_size:997 >>>> variance(cycles): 85; max_deviation: 92 ;min time: 456 loop_size:998 >>>> variance(cycles): 85; max_deviation: 100 ;min time: 452 loop_size:999 >>>> variance(cycles): 85; max_deviation: 60 ;min time: 452 total number of spurious min values = 262 total variance = 48 absolute max deviation = 636 variance of variances = 2306 variance of minimum values = 118 The loop_size index refers to the external loop (ln33); accordingly each row of the log shows, for a certain ensemble of measures, the variance, the maximum deviation and the minimum measured time (all of them in clock cycles). At the end of the log there are: the number of spurious minimum values (that in this case is meaningless and can be neglected): the total variance (the average of the variances in each row); the absolute maximum deviation (the maximum value amongst the max deviations of all the rows); the variance of variances and the variance of the minimum values. Looking at the results, it is clear that this method is not reliable for benchmarking. There are different reasons for this: The minimum value is not constant between different ensembles (the variance of the minimum values is 118 cycles). This means that we cannot evaluate the cost of calling the benchmarking function itself. When we are benchmarking a function we want to be able to subtract the cost of calling the benchmarking function itself from the measurement of the function to be benchmarked. This cost is the minimum possible number of cycles that it takes to call the benchmarking function (i.e., the min times in the rows of the kernel log above). Basically, in this case, each statistic is performed over 100,000 samples. The fact that over 100,000 samples an absolute minimum cannot be determined means that we cannot calculate the cost to be subtracted when benchmarking any function. A solution could be to increase the number of samples till we always get the same minimum value, but this is practically too costly since the developer would have to wait quite a lot for orders of magnitude greater than 10^5 samples. 13

14 The total variance is 48 cycles. This means that this method would introduce an uncertainty on the measure (standard deviation) of 6.9 cycles. If the developer wanted to have an average error on the measure less than 5%, it would mean that he cannot benchmark functions whose execution is shorter than 139 clock cycles. If the average desired error was less than 1%, he couldn t benchmark functions that take less than 690 cycles! The variance itself is not constant between different ensembles: the variance of the variances is 2306 cycles (i.e., a standard error on the variance of 48 cycles that is as big as the total variance itself!). This means that the standard deviation varies between measurements (i.e., the measuring process itself is not ergodic) and the error on the measure cannot be identified. A graphic view of both variances and minimum values behavior between different ensembles is shown in Figure 1 and Figure 2: Figure 1. Minimum Value Behavior Graph 1 graph clock cycles Series ensembles 14

15 Figure 2. Variance Behavior Graph 2 graph clock cycles Series ensembles 3.2 Improvements Using RDTSCP Instruction The RDTSCP instruction is described in the Intel 64 and IA-32 Architectures Software Developer s Manual Volume 2B ([3]) as an assembly instruction that, at the same time, reads the timestamp register and the CPU identifier. The value of the timestamp register is stored into the EDX and EAX registers; the value of the CPU id is stored into the ECX register ( On processors that support the Intel 64 architecture, the high order 32 bits of each of RAX, RDX, and RCX are cleared ). What is interesting in this case is the pseudo serializing property of RDTSCP. The manual states: The RDTSCP instruction waits until all previous instructions have been executed before reading the counter. However, subsequent instructions may begin execution before the read operation is performed. This means that this instruction guarantees that everything that is above its call in the source code is executed before the instruction itself is called. It cannot, however, guarantee that for optimization purposes the CPU will not execute, before the RDTSCP call, instructions that, in the source code, are placed after the RDTSCP function call itself. If this happens, a contamination caused by instructions in the source code that come after the RDTSCP will occur in the code under measurement.. 15

16 The problem is graphically described as follows: asm volatile ( "CPUID\n\t"/*serialize*/ "RDTSC\n\t"/*read the clock*/ "mov %%eax, %1\n\t": "=r" (cycles_high), "=r" (cycles_low):: "%rax", "%rbx", "%rcx", "%rdx"); /* Call the function to benchmark */ This out of order execution corrupts the*** asm volatile( "RDTSCP\n\t"/*read the clock*/ "mov %%eax, %1\n\t": "=r" (cycles_high1), "=r" (cycles_low1):: "%rax", "%rcx", "%rdx"); /*do other things*/ If we find a way to avoid the undesired behavior described above we can avoid calling the serializing CPUID instruction between the two timestamp register reads The Improved Benchmarking Method The solution to the problem presented in Section 0 is to add a CPUID instruction just after the RDTPSCP and the two mov instructions (to store in memory the value of edx and eax). The implementation is as follows: asm volatile ("CPUID\n\t" "RDTSC\n\t" "mov %%eax, %1\n\t": "=r" (cycles_high), "=r" (cycles_low):: "%rax", "%rbx", "%rcx", "%rdx"); /***********************************/ /*call the function to measure here*/ /***********************************/ asm volatile("rdtscp\n\t" "mov %%eax, %1\n\t" "CPUID\n\t": "=r" (cycles_high1), "=r" (cycles_low1):: "%rax", "%rbx", "%rcx", "%rdx"); In the code above, the first CPUID call implements a barrier to avoid out-of-order execution of the instructions above and below the RDTSC instruction. Nevertheless, this call does not affect the measurement since it comes before the RDTSC (i.e., before the timestamp register is read). The first RDTSC then reads the timestamp register and the value is stored in memory. Then the code that we want to measure is executed. If the code is a call to a function, it is recommended to declare such function as inline so that from an assembly perspective there is no overhead in calling the function itself. 16

17 The RDTSCP instruction reads the timestamp register for the second time and guarantees that the execution of all the code we wanted to measure is completed. The two mov instructions coming afterwards store the edx and eax registers values into memory. Both instructions are guaranteed to be executed after RDTSC (i.e., they don t corrupt the measure) since there is a logical dependency between RDTSCP and the register edx and eax (RDTSCP is writing those registers and the CPU is obliged to wait for RDTSCP to be finished before executing the two mov ). Finally a CPUID call guarantees that a barrier is implemented again so that it is impossible that any instruction coming afterwards is executed before CPUID itself (and logically also before RDTSCP). With this method we avoid to call a CPUID instruction in between the reads of the real-time registers (avoiding all the problems described in Section 3.1) Evaluation of the Improved Benchmarking Method In reference to the code presented in Section 3.1, we replace the previous benchmarking method with the new one, i.e., we replace ln19 to ln54 in the Appendix with the following code: asm volatile ("CPUID\n\t" "RDTSC\n\t" "mov %%eax, %1\n\t": "=r" (cycles_high), "=r" (cycles_low):: "%rax", "%rbx", "%rcx", "%rdx"); asm volatile("rdtscp\n\t" "mov %%eax, %1\n\t" "CPUID\n\t": "=r" (cycles_high1), "=r" (cycles_low1):: "%rax", "%rbx", "%rcx", "%rdx"); asm volatile ("CPUID\n\t" "RDTSC\n\t" "mov %%eax, %1\n\t": "=r" (cycles_high), "=r" (cycles_low):: "%rax", "%rbx", "%rcx", "%rdx"); asm volatile("rdtscp\n\t" "mov %%eax, %1\n\t" "CPUID\n\t": "=r" (cycles_high1), "=r" (cycles_low1):: "%rax", "%rbx", "%rcx", "%rdx"); for (j=0; j<bound_of_loop; j++) { for (i =0; i<size_of_stat; i++) { variable = 0; preempt_disable(); raw_local_irq_save(flags); asm volatile ("CPUID\n\t" "RDTSC\n\t" 17

18 "mov %%eax, %1\n\t": "=r" (cycles_high), "=r" (cycles_low):: "%rax", "%rbx", "%rcx", "%rdx"); /***********************************/ /*call the function to measure here*/ /***********************************/ asm volatile("rdtscp\n\t" "mov %%eax, %1\n\t" "CPUID\n\t": "=r" (cycles_high1), "=r" (cycles_low1):: "%rax", "%rbx", "%rcx", "%rdx"); raw_local_irq_restore(flags); preempt_enable(); If we perform the same analysis as in Section 3.2.1, we obtain a kernel log as follows: Loading hello module... loop_size:0 >>>> variance(cycles): 2; max_deviation: 4 ;min time: 44 loop_size:1 >>>> variance(cycles): 3; max_deviation: 4 ;min time: 44 loop_size:2 >>>> variance(cycles): 3; max_deviation: 4 ;min time: 44 loop_size:997 >>>> variance(cycles): 2; max_deviation: 4 ;min time: 44 loop_size:998 >>>> variance(cycles): 3; max_deviation: 4 ;min time: 44 loop_size:999 >>>> variance(cycles): 3; max_deviation: 4 ;min time: 44 total number of spurious min values = 0 total variance = 2 absolute max deviation = 104 variance of variances = 0 variance of minimum values = 0 In this case, the minimum time does not change between different ensembles (it is always the same along all the 1000 repetitions of each ensemble); this means that the overhead of calling the benchmarking function itself can be exactly determined. The total variance is 2 cycles, i.e., the standard error on the measure is 1,414 cycles (before it was 6,9 cycles). 18

19 Both the variance of variances and the variance of the minimum values are zero. This means that this improved benchmarking method is completely ergodic (between different ensembles the maximum fluctuation of the variance is 1 clock cycle and the minimum value is perfectly constant). This is the most important characteristic that we need for a method to be suitable for benchmarking purposes. For completeness, Figure 3 and Figure 4 show the same graphic analysis as done above in Section Figure 3. Minimum Value Behavior Graph 3 graph clock cycles minimum value ensembles 19

20 Figure 4. Variance Behavior Graph 4 graph4 clock cycles ensembles Variance In Figure 3 we can see that the minimum value is perfectly constant between ensembles; in Figure 4 the variance is either equal to 2 or 3 clock cycles An Alternative Method for Architecture Not Supporting RDTSCP This section presents an alternative method to benchmark code execution cycles for architectures that do not support the RDTSCP instruction. Such a method is not as good as the one presented in Section 3.2.1, but it is still much better than the one using CPUID to serialize code execution. In this method between the two timestamp register reads we serialize the code execution by writing the control register CR0. Regarding the code in the Appendix, the developer should replace ln19 to ln54 with the following: asm volatile( "CPUID\n\t" "RDTSC\n\t" "mov %%eax, %1\n\t": "=r" (cycles_high), "=r" (cycles_low):: "%rax", "%rbx", "%rcx", "%rdx"); asm volatile( "mov %%cr0, %%rax\n\t" "mov %%rax, %%cr0\n\t" "RDTSC\n\t" "mov %%eax, %1\n\t": "=r" (cycles_high1), "=r" (cycles_low1):: "%rax", "%rdx"); 20

21 asm volatile( "CPUID\n\t" "RDTSC\n\t" "mov %%eax, %1\n\t": "=r" (cycles_high), "=r" (cycles_low):: "%rax", "%rbx", "%rcx", "%rdx"); asm volatile( "mov %%cr0, %%rax\n\t" "mov %%rax, %%cr0\n\t" "RDTSC\n\t" "mov %%eax, %1\n\t": "=r" (cycles_high1), "=r" (cycles_low1):: "%rax", "%rdx"); asm volatile( "CPUID\n\t" "RDTSC\n\t" "mov %%eax, %1\n\t": "=r" (cycles_high), "=r" (cycles_low):: "%rax", "%rbx", "%rcx", "%rdx"); asm volatile( "mov %%cr0, %%rax\n\t" "mov %%rax, %%cr0\n\t" "RDTSC\n\t" "mov %%eax, %1\n\t": "=r" (cycles_high1), "=r" (cycles_low1):: "%rax", "%rdx"); for (j=0; j<bound_of_loop; j++) { for (i =0; i<size_of_stat; i++) { variable = 0; preempt_disable(); raw_local_irq_save(flags); asm volatile ("CPUID\n\t"::: "%rax", "%rbx", "%rcx", "%rdx"); asm volatile ("RDTSC\n\t" "mov %%eax, %1\n\t": "=r" (cycles_high), "=r" (cycles_low):: "%rax", "%rdx"); /*call the function to measure here*/ asm volatile("mov %%cr0, %%rax\n\t" "mov %%rax, %%cr0\n\t" "RDTSC\n\t" "mov %%eax, %1\n\t": "=r" (cycles_high1), "=r" (cycles_low1):: "%rax", "%rdx"); raw_local_irq_restore(flags); preempt_enable(); In the code above, first we have the repetition (three times) of the instructions called in the body of the following nested loops; this is just to warm up the instructions cache. Then the body of the loop is executed. We: First take the exclusive ownership of the CPU (preempt_disable(), raw_local_irq_save()) Call CPUID to serialize 21

22 Read the timestamp register the first time by RDTSC and store the value in memory Read the value of the control register CR0 into RAX register Write the value of RAX back to CR0 (this instruction serializes) Read the timestamp register the second time by RDTSC and store the value in memory Release the CPU ownership (raw_local_irq_restore, preempt_enable) Evaluation of the Alternative Method As done in Section and Section 3.2.2, hereafter we present the statistical analysis of the method. The resulting kernel log is as follows: loop_size:2 >>>> variance(cycles): 3; max_deviation: 4 ;min time: 208 loop_size:3 >>>> variance(cycles): 3; max_deviation: 4 ;min time: 208 loop_size:4 >>>> variance(cycles): 3; max_deviation: 4 ;min time: 208 loop_size:998 >>>> variance(cycles): 4; max_deviation: 4 ;min time: 208 loop_size:999 >>>> variance(cycles): 3; max_deviation: 4 ;min time: 208 total number of spurious min values = 0 total variance = 3 absolute max deviation = 176 variance of variances = 0 variance of minimum values = 0 In the log we see that the total variance of this method is 3 cycles rather than 2 and the absolute max deviation is 176 cycles rather than 104 cycles. This means that the standard error on the measure is 1,73 rather than 1,414 and the maximum error is increased as well. Nevertheless, we still have met the ergodicity requirements since the variance does not change between different ensembles (the maximum fluctuation is 1 clock cycle) as well as the minimum value; both the variance of the variances and the variance of the minimum values are zero. Such a method may be suitable for benchmarking whenever the RDTSCP instruction is not available on the CPU. As done previously, the following graphs show the behavior of the minimum values and of the variances between different ensembles. 22

23 Figure 5. Minimum Value Behavior Graph 5 graph clock cycles minimum value ensembles Figure 6. Variance Behavior Graph 6 graph6 clock cycles ensembles variance In Figure 5, we can see how the minimum value is perfectly constant between ensembles. In Figure 6, we have the variance being either equal to 3 or 4 clock cycles. 23

24 3.3 Resolution of the Benchmarking Methodologies In this section we analyze the resolution of our benchmarking methods, focusing on an Intel Core i7 processor-based platform, which is representative of a highend server solution and supports the RDTSCP instruction and out of order execution. For resolution, we mean the minimum number of sequential assembly instructions that a method is able to benchmark (appreciate). The purpose of this evaluation is mostly intended to define the methodology to calculate the resolution rather than to analyze the resolutions themselves. The resolution itself is, in fact, strictly dependant on the target machine and the developer is advised to run the following test before starting to benchmark to evaluate the benchmarking limits of the platform. With reference to the code presented in the Appendix, the developer should make the proper replacements according to the benchmarking method he intends to use (Section if RDTSCP is supported, Section otherwise). Also, the following code should be added at ln11: void inline measured_loop(unsigned int n, volatile int *var) { int k; for (k=0; k<n; k++) (*var)= 1; } and the following in place of /*call the function to measure here*/ : measured_loop(j, &variable); By doing so, the external loop of the two nested ones (j index) is intended to increase step by step the complexity of the function to benchmark; more in details between two consecutive ensembles it increases the parameter that determines the size of the measured loop, thus adding exactly one assembly instruction to the code under benchmark Resolution with RDTSCP According to the guidelines in Section 3.3, we effect the recommended replacements in the code for the method using RDTSCP instruction, we build the kernel module, and we call insmod from the shell. The resulting kernel log is as follows: loop_size:0 >>>> variance(cycles): 3; max_deviation: 16 ;min time: 44 loop_size:1 >>>> variance(cycles): 3; max_deviation: 16 ;min time: 44 loop_size:2 >>>> variance(cycles): 3; max_deviation: 48 ;min time: 44 loop_size:3 >>>> variance(cycles): 4; max_deviation: 16 ;min time: 44 24

25 loop_size:4 >>>> variance(cycles): 3; max_deviation: 32 ;min time: 44 loop_size:5 >>>> variance(cycles): 0; max_deviation: 36 ;min time: 44 loop_size:6 >>>> variance(cycles): 3; max_deviation: 28 ;min time: 48 loop_size:7 >>>> variance(cycles): 4; max_deviation: 32 ;min time: 48 loop_size:8 >>>> variance(cycles): 3; max_deviation: 16 ;min time: 48 loop_size:9 >>>> variance(cycles): 2; max_deviation: 48 ;min time: 48 loop_size:10 >>>> variance(cycles): 0; max_deviation: 28 ;min time: 48 loop_size:11 >>>> variance(cycles): 3; max_deviation: 64 ;min time: 52 loop_size:994 >>>> variance(cycles): 3; max_deviation: 4 ;min time: 2036 loop_size:995 >>>> variance(cycles): 0; max_deviation: 4 ;min time: 2036 loop_size:996 >>>> variance(cycles): 3; max_deviation: 4 ;min time: 2040 loop_size:997 >>>> variance(cycles): 21; max_deviation: 4 ;min time: 2044 loop_size:998 >>>> variance(cycles): 22; max_deviation: 112 ;min time: 2048 loop_size:999 >>>> variance(cycles): 23; max_deviation: 160 ;min time: 2048 total number of spurious min values = 0 total variance = 1 absolute max deviation = 176 variance of variances = 2 variance of minimum values = As seen, each row is an ensemble of measures for a specific size of measured_loop ; going down row by row, the complexity of the measured_loop function is increased exactly by one assembly instruction. Accordingly we see that min time increases as we scroll down the rows. Now let s look at the final part of the log: the total number of spurious min values is zero: this means that as we increase the complexity of the measured loop, the minimum measured time monotonically increases (if we perform enough measures we can exactly determine the minimum number of clock cycles that it takes to run a certain function) the total variance is 1 cycle: that means that the overall standard error is 1 cycle (that is very good) 25

26 the absolute max deviation is 176 cycles: from a benchmarking perspective it is not very important; instead this parameter is fundamental to evaluate the capabilities of the system to meet real-time constraints (we will not pursue this further since it is out of the scope of this paper). The variance of the variances is 2 cycles: this index is very representative of how reliable our benchmarking method is (i.e., the variance of the measures does not vary according to the complexity of the function under benchmark). Finally the variance of the minimum values is completely meaningless and useless in this context and can be neglected (this index was crucial for Section 0) From a resolution perspective we can see that we have the min value that is constant (44 cycles) between 0 and 5 measured assembly instructions and between 6 and 10 assembly instructions (48 clock cycles). Then it increases very regularly by four cycles every two assembly instructions. So unless the function under benchmark is very small, in this case the resolution of this benchmarking method is two assembly instructions (that is the minimum variation in the code complexity that can be revealed). For completeness, the following graph (Graph 7) shows the minimum values, and the next graph (Graph 8) shows the variances. Figure 7. Minimum Value Behavior Graph 7 graph clock cycles min value ensembles 26

27 Figure 8. Variance Behavior Graph 8 graph8 clock cycles ensembles variance Resolution with the Alternative Method According to what we did in Section 3.3.1, we run the same test using the alternative benchmarking method presented in Section The resulting kernel log is as follows: loop_size:0 >>>> variance(cycles): 3; max_deviation: 88 ;min time: 208 loop_size:1 >>>> variance(cycles): 0; max_deviation: 16 ;min time: 208 loop_size:2 >>>> variance(cycles): 4; max_deviation: 56 ;min time: 208 loop_size:3 >>>> variance(cycles): 0; max_deviation: 20 ;min time: 212 loop_size:4 >>>> variance(cycles): 3; max_deviation: 36 ;min time: 212 loop_size:5 >>>> variance(cycles): 3; max_deviation: 36 ;min time: 216 loop_size:6 >>>> variance(cycles): 4; max_deviation: 36 ;min time: 216 loop_size:7 >>>> variance(cycles): 0; max_deviation: 68 ;min time: 220 loop_size:994 >>>> variance(cycles): 28; max_deviation: 112 ;min time:

28 loop_size:995 >>>> variance(cycles): 0; max_deviation: 0 ;min time: 2216 loop_size:996 >>>> variance(cycles): 28; max_deviation: 4 ;min time: 2216 loop_size:997 >>>> variance(cycles): 0; max_deviation: 112 ;min time: 2216 loop_size:998 >>>> variance(cycles): 28; max_deviation: 116 ;min time: 2220 loop_size:999 >>>> variance(cycles): 0; max_deviation: 0 ;min time: 2224 total number of spurious min values = 0 total variance = 1 absolute max deviation = 220 variance of variances = 2 variance of minimum values = With this method we achieved results as good as the previous ones. The only difference is the absolute maximum deviation that here is slightly higher; this does not affect the quality of the method from a benchmarking perspective. For completeness, the following graphs present behaviors of the variance and the minimum value. Figure 9. Variance Behavior Graph 9 graph clock cycles min value ensembles 28

29 Figure 10. Variance Behavior Graph 10 graph10 clock cycles ensembles variance 29

30 4 5BSummary In Section and Section we showed two suitable methods for benchmarking the execution time of a generic C/C++ function running on an IA32/IA64 platform. The former should be chosen if the RDTSCP instruction is available; if not, the other one can be used. Whenever taking a measurement, the developer should perform the following steps: 1. Run the tests in Section or Section (according to the platform). 2. Analyze the variance of the variances and the variance of the minimum values to validate the method on his platform. If the values that the user obtains are not satisfactory, he may have to change the BIOS settings or the BIOS itself. 3. Calculate the resolution that the method is able to guarantee. 4. Make the measurement and subtract the offset (additional cost of calling the measuring function itself) that the user will have calculated before (minimum value from Section or Section 3.2.4). A couple final considerations should be made: Counter Overflow: The timestamp register is 64 bit. On a single overflow, we encounter no problems since we are making a difference between unsigned int and the results would be still correct. The problem arises if the duration of the code under measurement takes longer than 2^64 cycles. For a 1-GHz CPU, that would mean that your code should take longer than (2^64)/(10^9) = seconds ~ 585 years So it shouldn t be a problem or, if it is, the developer won t still be alive to see it! 32- vs. 64-Bit Architectures: Particular attention must be paid to the 64-bit registers used in the code presented in this paper. Whenever working with a 32b platform, the code presented is still valid, but whatever occurrence of rax, rbx, rcx, rdx has to be replaced respectively with eax, ebx, ecx, edx. The Intel Embedded Design Center provides qualified developers with web-based access to technical resources. Access Intel Confidential design materials, step-by-step guidance, application reference solutions, training, Intel s tool loaner program, and connect with an e-help desk and the embedded community. Design Fast. Design Smart. Get started today. 43Hhttp://intel.com/embedded/edc. 30

31 5 Appendix 1 #include <linux/module.h> 2 #include <linux/kernel.h> 3 #include <linux/init.h> 4 #include <linux/hardirq.h> 5 #include <linux/preempt.h> 6 #include <linux/sched.h> 7 8 #define SIZE_OF_STAT #define BOUND_OF_LOOP #define UINT64_MAX ( ULL) void inline Filltimes(uint64_t **times) { 13 unsigned long flags; 14 int i, j; 15 uint64_t start, end; 16 unsigned cycles_low, cycles_high, cycles_low1, cycles_high1; 17 volatile int variable = 0; asm volatile ("CPUID\n\t" 20 "RDTSC\n\t" "mov %%eax, %1\n\t": "=r" (cycles_high), "=r" (cycles_low):: "%rax", "%rbx", "%rcx", "%rdx"); 23 asm volatile ("CPUID\n\t" 24 "RDTSC\n\t" 25 "CPUID\n\t" 26 "RDTSC\n\t" "mov %%eax, %1\n\t": "=r" (cycles_high), "=r" (cycles_low):: "%rax", "%rbx", "%rcx", "%rdx"); 29 asm volatile ("CPUID\n\t" 30 "RDTSC\n\t"::: "%rax", "%rbx", "%rcx", "%rdx"); for (j=0; j<bound_of_loop; j++) { 34 for (i =0; i<size_of_stat; i++) { variable = 0; preempt_disable(); 39 raw_local_irq_save(flags); asm volatile ( 42 "CPUID\n\t" 43 "RDTSC\n\t" "mov %%eax, %1\n\t": "=r" (cycles_high), "=r" (cycles_low):: "%rax", "%rbx", "%rcx", "%rdx"); 46 /*call the function to measure here*/ 47 asm volatile( 48 "CPUID\n\t" 49 "RDTSC\n\t" 50 31

32 51 "mov %%eax, %1\n\t": "=r" (cycles_high1), "=r" (cycles_low1):: "%rax", "%rbx", "%rcx", "%rdx"); raw_local_irq_restore(flags); 54 preempt_enable(); start = ( ((uint64_t)cycles_high << 32) cycles_low ); end = ( ((uint64_t)cycles_high1 << 32) cycles_low1 ); if ( (end - start) < 0) { 62 printk(kern_err "\n\n>>>>>>>>>>>>>> CRITICAL ERROR IN TAKING THE TIME!!!!!!\n loop(%d) stat(%d) start = %llu, end = %llu, variable = %u\n", j, i, start, end, variable); 63 times[j][i] = 0; 64 } 65 else 66 { 67 times[j][i] = end - start; 68 } 69 } 70 } 71 return; 72} 73uint64_t var_calc(uint64_t *inputs, int size) 74{ 75 int i; 76 uint64_t acc = 0, previous = 0, temp_var = 0; 77 for (i=0; i< size; i++) { 78 if (acc < previous) goto overflow; 79 previous = acc; 80 acc += inputs[i]; 81 } 82 acc = acc * acc; 83 if (acc < previous) goto overflow; 84 previous = 0; 85 for (i=0; i< size; i++){ 86 if (temp_var < previous) goto overflow; 87 previous = temp_var; 88 temp_var+= (inputs[i]*inputs[i]); 89 } 90 temp_var = temp_var * size; 91 if (temp_var < previous) goto overflow; 92 temp_var =(temp_var - acc)/(((uint64_t)(size))*((uint64_t)(size))); 93 return (temp_var); 94overflow: 95 printk(kern_err "\n\n>>>>>>>>>>>>>> CRITICAL OVERFLOW ERROR IN var_calc!!!!!!\n\n"); 96 return -EINVAL; 97} 98static int init hello_start(void) 99{ 100 int i = 0, j = 0, spurious = 0, k =0; 101 uint64_t **times; 102 uint64_t *variances; 32

33 103 uint64_t *min_values; 104 uint64_t max_dev = 0, min_time = 0, max_time = 0, prev_min =0, tot_var=0, max_dev_all=0, var_of_vars=0, var_of_mins=0; printk(kern_info "Loading hello module...\n"); times = kmalloc(bound_of_loop*sizeof(uint64_t*), GFP_KERNEL); 109 if (!times) { 110 printk(kern_err "unable to allocate memory for times\n"); 111 return 0; 112 } for (j=0; j<bound_of_loop; j++) { 115 times[j] = kmalloc(size_of_stat*sizeof(uint64_t), GFP_KERNEL); 116 if (!times[j]) { 117 printk(kern_err "unable to allocate memory for times[%d]\n", j); 118 for (k=0; k<j; k++) 119 kfree(times[k]); 120 return 0; 121 } 122 } variances = kmalloc(bound_of_loop*sizeof(uint64_t), GFP_KERNEL); 125 if (!variances) { 126 printk(kern_err "unable to allocate memory for variances\n"); 127 return 0; 128 } min_values = kmalloc(bound_of_loop*sizeof(uint64_t), GFP_KERNEL); 131 if (!min_values) { 132 printk(kern_err "unable to allocate memory for min_values\n"); 133 return 0; 134 } Filltimes(times); for (j=0; j<bound_of_loop; j++) { max_dev = 0; 142 min_time = 0; 143 max_time = 0; for (i =0; i<size_of_stat; i++) { 146 if ((min_time == 0) (min_time > times[j][i])) 147 min_time = times[j][i]; 148 if (max_time < times[j][i]) 149 max_time = times[j][i]; 150 } max_dev = max_time - min_time; 153 min_values[j] = min_time; if ((prev_min!= 0) && (prev_min > min_time)) 156 spurious++; 157 if (max_dev > max_dev_all) 158 max_dev_all = max_dev; variances[j] = var_calc(times[j], SIZE_OF_STAT); 33

34 161 tot_var += variances[j]; printk(kern_err "loop_size:%d >>>> variance(cycles): %llu; max_deviation: %llu ;min time: %llu", j, variances[j], max_dev, min_time); prev_min = min_time; 166 } var_of_vars = var_calc(variances, BOUND_OF_LOOP); 169 var_of_mins = var_calc(min_values, BOUND_OF_LOOP); printk(kern_err "\n total number of spurious min values = %d", spurious); 172 printk(kern_err "\n total variance = %llu", (tot_var/bound_of_loop)); 173 printk(kern_err "\n absolute max deviation = %llu", max_dev_all); 174 printk(kern_err "\n variance of variances = %llu", var_of_vars); 175 printk(kern_err "\n variance of minimum values = %llu", var_of_mins); for (j=0; j<bound_of_loop; j++) { 178 kfree(times[j]); 179 } 180 kfree(times); 181 kfree(variances); 182 kfree(min_values); 183 return 0; 184} static void exit hello_end(void) 187{ 188 printk(kern_info "Goodbye Mr.\n"); 189} module_init(hello_start); 192module_exit(hello_end); 34

35 6 Reference List [1] Using the RDTSC Instruction for Performance Monitoring [2] GCC Inline Assembly HowTo [3] Intel 64 and IA-32 Architectures Software Developer s Manual Volume 2B [4] Intel 64 and IA-32 Architectures Software Developer s Manual Volume 2A [5] The Linux Kernel Module Programming Guide 35

36 Author Gabriele Paoloni is an Embedded/Linux software engineer in the Intel Architecture Group. 36

37 INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. UNLESS OTHERWISE AGREED IN WRITING BY INTEL, THE INTEL PRODUCTS ARE NOT DESIGNED NOR INTENDED FOR ANY APPLICATION IN WHICH THE FAILURE OF THE INTEL PRODUCT COULD CREATE A SITUATION WHERE PERSONAL INJURY OR DEATH MAY OCCUR. Intel may make changes to specifications and product descriptions at any time, without notice. This paper is for informational purposes only. THIS DOCUMENT IS PROVIDED "AS IS" WITH NO WARRANTIES WHATSOEVER, INCLUDING ANY WARRANTY OF MERCHANTABILITY, NONINFRINGEMENT, FITNESS FOR ANY PARTICULAR PURPOSE, OR ANY WARRANTY OTHERWISE ARISING OUT OF ANY PROPOSAL, SPECIFICATION OR SAMPLE. Intel disclaims all liability, including liability for infringement of any proprietary rights, relating to use of information in this specification. No license, express or implied, by estoppel or otherwise, to any intellectual property rights is granted herein. Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance. Buyers should consult other sources of information to evaluate the performance of systems or components they are considering purchasing. For more information on performance tests and on the performance of Intel products, Go to: 4Hhttp:// BunnyPeople, Celeron, Celeron Inside, Centrino, Centrino Inside, Core Inside, i960, Intel, the Intel logo, Intel AppUp, Intel Atom, Intel Atom Inside, Intel Core, Intel Inside, the Intel Inside logo, Intel NetBurst, Intel NetMerge, Intel NetStructure, Intel SingleDriver, Intel SpeedStep, Intel Sponsors of Tomorrow., the Intel Sponsors of Tomorrow. logo, Intel StrataFlash, Intel Viiv, Intel vpro, Intel XScale, InTru, the InTru logo, InTru soundmark, Itanium, Itanium Inside, MCS, MMX, Moblin, Pentium, Pentium Inside, skoool, the skoool logo, Sound Mark, The Journey Inside, vpro Inside, VTune, Xeon, and Xeon Inside are trademarks of Intel Corporation in the U.S. and other countries. *Other names and brands may be claimed as the property of others. Copyright 2010 Intel Corporation. All rights reserved. 37

Using the RDTSC Instruction for Performance Monitoring

Using the RDTSC Instruction for Performance Monitoring Using the Instruction for Performance Monitoring http://developer.intel.com/drg/pentiumii/appnotes/pm1.htm Using the Instruction for Performance Monitoring Information in this document is provided in connection

More information

Upgrading Intel AMT 5.0 drivers to Linux kernel v2.6.31

Upgrading Intel AMT 5.0 drivers to Linux kernel v2.6.31 White Paper Zerene Sangma Platform Application Engineer Intel Corporation Upgrading Intel AMT 5.0 drivers to Linux kernel v2.6.31 For Intel Q45 and Intel GM45 based embedded platforms June 2010 323961

More information

ARM* to Intel Atom Microarchitecture - A Migration Study

ARM* to Intel Atom Microarchitecture - A Migration Study White Paper Mark Oliver Senior Systems Engineer Intel Corporation ARM* to Intel Atom Microarchitecture - A Migration Study November 2011 326398-001 1 Introduction At Intel, our engineers do not perform

More information

CPU performance monitoring using the Time-Stamp Counter register

CPU performance monitoring using the Time-Stamp Counter register CPU performance monitoring using the Time-Stamp Counter register This laboratory work introduces basic information on the Time-Stamp Counter CPU register, which is used for performance monitoring. The

More information

Internal LVDS Dynamic Backlight Brightness Control

Internal LVDS Dynamic Backlight Brightness Control White Paper Ho Nee Shen Senior Software Engineer Intel Corporation Chan Swee Tat System Engineer Intel Corporation Internal LVDS Dynamic Backlight Brightness Control A platform and software design using

More information

Intel Platform Controller Hub EG20T

Intel Platform Controller Hub EG20T Intel Platform Controller Hub EG20T General Purpose Input Output (GPIO) Driver for Windows* Order Number: 324257-002US Legal Lines and Disclaimers INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION

More information

Using GStreamer for hardware accelerated video decoding on Intel Atom Processor E6xx series

Using GStreamer for hardware accelerated video decoding on Intel Atom Processor E6xx series White Paper Abhishek Girotra Graphics SW TME Intel Corporation Using GStreamer for hardware accelerated video decoding on Intel Atom Processor E6xx series September 2010 324294 Contents Executive Summary...3

More information

DDR2 x16 Hardware Implementation Utilizing the Intel EP80579 Integrated Processor Product Line

DDR2 x16 Hardware Implementation Utilizing the Intel EP80579 Integrated Processor Product Line Utilizing the Intel EP80579 Integrated Processor Product Line Order Number: 320296-002US Legal Lines and Disclaimers INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE,

More information

PHYSICAL CORES V. ENHANCED THREADING SOFTWARE: PERFORMANCE EVALUATION WHITEPAPER

PHYSICAL CORES V. ENHANCED THREADING SOFTWARE: PERFORMANCE EVALUATION WHITEPAPER PHYSICAL CORES V. ENHANCED THREADING SOFTWARE: PERFORMANCE EVALUATION WHITEPAPER Preface Today s world is ripe with computing technology. Computing technology is all around us and it s often difficult

More information

Cyber Security Framework: Intel s Implementation Tools & Approach

Cyber Security Framework: Intel s Implementation Tools & Approach Cyber Security Framework: Intel s Implementation Tools & Approach Tim Casey Senior Strategic Risk Analyst @timcaseycyber NIST Workshop #6 October 29, 2014 Intel s Goals in Using the CSF Establish alignment

More information

Debugging Machine Check Exceptions on Embedded IA Platforms

Debugging Machine Check Exceptions on Embedded IA Platforms White Paper Ashley Montgomery Platform Application Engineer Intel Corporation Tian Tian Platform Application Engineer Intel Corporation Debugging Machine Check Exceptions on Embedded IA Platforms July

More information

Processor Reorder Buffer (ROB) Timeout

Processor Reorder Buffer (ROB) Timeout White Paper Ai Bee Lim Senior Platform Application Engineer Embedded Communications Group Performance Products Division Intel Corporation Jack R Johnson Senior Platform Application Engineer Embedded Communications

More information

Accessing the Real Time Clock Registers and the NMI Enable Bit

Accessing the Real Time Clock Registers and the NMI Enable Bit White Paper Sam Fleming Technical Marketing Engineer Intel Corporation Accessing the Real Time Clock Registers and the NMI Enable Bit A Study in I/O Locations 0x70-0x77 January 2009 321088 1 Executive

More information

White Paper David Hibler Jr Platform Solutions Engineer Intel Corporation. Considerations for designing an Embedded IA System with DDR3 ECC SO-DIMMs

White Paper David Hibler Jr Platform Solutions Engineer Intel Corporation. Considerations for designing an Embedded IA System with DDR3 ECC SO-DIMMs White Paper David Hibler Jr Platform Solutions Engineer Intel Corporation Considerations for designing an Embedded IA System with DDR3 ECC SO-DIMMs September 2012 326491 Executive Summary What are ECC

More information

Intel EP80579 Software for Security Applications on Intel QuickAssist Technology Cryptographic API Reference

Intel EP80579 Software for Security Applications on Intel QuickAssist Technology Cryptographic API Reference Intel EP80579 Software for Security Applications on Intel QuickAssist Technology Cryptographic API Reference Automatically generated from sources, May 19, 2009. Reference Number: 320184, Revision -003

More information

Intel IXP42X Product Line of Network Processors and IXC1100 Control Plane Processor: Spread-Spectrum Clocking to Reduce EMI

Intel IXP42X Product Line of Network Processors and IXC1100 Control Plane Processor: Spread-Spectrum Clocking to Reduce EMI Intel IXP42X Product Line of Network Processors and IXC1100 Control Plane Processor: Spread-Spectrum Clocking to Reduce EMI Application Note July 2004 Document Number: 254028-002 INFORMATION IN THIS DOCUMENT

More information

Embedded Controller Usage in Low Power Embedded Designs

Embedded Controller Usage in Low Power Embedded Designs White Paper Matthew Lee Platform Application Engineer Intel Corporation Embedded Controller Usage in Low Power Embedded Designs An Overview September 2011 326133 Executive Summary Embedded Controllers

More information

CPU Monitoring With DTS/PECI

CPU Monitoring With DTS/PECI White Paper Michael Berktold Thermal/Mechanical Engineer Tian Tian Technical Marketing Engineer Intel Corporation CPU Monitoring With DTS/PECI September 2010 322683 Executive Summary This document describes

More information

IT@Intel. Memory Sizing for Server Virtualization. White Paper Intel Information Technology Computer Manufacturing Server Virtualization

IT@Intel. Memory Sizing for Server Virtualization. White Paper Intel Information Technology Computer Manufacturing Server Virtualization White Paper Intel Information Technology Computer Manufacturing Server Virtualization Memory Sizing for Server Virtualization Intel IT has standardized on 16 gigabytes (GB) of memory for dual-socket virtualization

More information

CT Bus Clock Fallback for Linux Operating Systems

CT Bus Clock Fallback for Linux Operating Systems CT Bus Clock Fallback for Linux Operating Systems Demo Guide August 2005 05-1900-002 INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL

More information

Enabling new usage models for Intel Embedded Platforms

Enabling new usage models for Intel Embedded Platforms White Paper David Galus Graphics Platform Application Engineer Kirk Blum Graphics Solution Architect Intel Corporation Hybrid Multimonitor Support Enabling new usage models for Intel Embedded Platforms

More information

Intel Rapid Storage Technology (Intel RST) in Linux*

Intel Rapid Storage Technology (Intel RST) in Linux* White Paper Ramu Ramakesavan (Technical Marketing Engineer) Patrick Thomson (Software Engineer) Intel Corporation Intel Rapid Storage Technology (Intel RST) in Linux* August 2011 326024 Executive Summary

More information

Performance monitoring with Intel Architecture

Performance monitoring with Intel Architecture Performance monitoring with Intel Architecture CSCE 351: Operating System Kernels Lecture 5.2 Why performance monitoring? Fine-tune software Book-keeping Locating bottlenecks Explore potential problems

More information

Enhanced Intel SpeedStep Technology for the Intel Pentium M Processor

Enhanced Intel SpeedStep Technology for the Intel Pentium M Processor Enhanced Intel SpeedStep Technology for the Intel Pentium M Processor White Paper March 2004 Order Number: 301170-001 INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE,

More information

Considerations for Designing an Embedded Intel Architecture System with System Memory Down

Considerations for Designing an Embedded Intel Architecture System with System Memory Down White Paper David Hibler Jr Technical Marketing Engineer Intel Corporation Considerations for Designing an Embedded Intel Architecture System with System Memory Down August 2009 322506 Executive Summary

More information

Software PBX Performance on Intel Multi- Core Platforms - a Study of Asterisk* White Paper

Software PBX Performance on Intel Multi- Core Platforms - a Study of Asterisk* White Paper Software PBX Performance on Intel Multi- Core Platforms - a Study of Asterisk* Order Number: 318862-001US Legal Lines and Disclaimers INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS.

More information

Developing secure software A practical approach

Developing secure software A practical approach Developing secure software A practical approach Juan Marcelo da Cruz Pinto Security Architect Legal notice Intel Active Management Technology requires the computer system to have an Intel(R) AMT-enabled

More information

Intel Embedded Virtualization Manager

Intel Embedded Virtualization Manager White Paper Kelvin Lum Fee Foon Kong Platform Application Engineer, ECG Penang Intel Corporation Kam Boon Hee (Thomas) Marketing Development Manager, ECG Penang Intel Corporation Intel Embedded Virtualization

More information

White Paper Amy Chong Yew Ee Online Sales Account Manager APAC Online Sales Center Intel Corporation. BOM Cost Reduction by Removing S3 State

White Paper Amy Chong Yew Ee Online Sales Account Manager APAC Online Sales Center Intel Corporation. BOM Cost Reduction by Removing S3 State White Paper Amy Chong Yew Ee Online Sales Account Manager APAC Online Sales Center Intel Corporation BOM Cost Reduction by Removing S3 State May 2011 325448 Executive Summary In today s embedded design,

More information

Page Modification Logging for Virtual Machine Monitor White Paper

Page Modification Logging for Virtual Machine Monitor White Paper Page Modification Logging for Virtual Machine Monitor White Paper This document is intended only for VMM or hypervisor software developers and not for application developers or end-customers. Readers are

More information

Video Encoding on Intel Atom Processor E38XX Series using Intel EMGD and GStreamer

Video Encoding on Intel Atom Processor E38XX Series using Intel EMGD and GStreamer White Paper Lim Siew Hoon Graphics Software Engineer Intel Corporation Kumaran Kalaiyappan Graphics Software Engineer Intel Corporation Tay Boon Wooi Graphics Software Engineer Intel Corporation Video

More information

Implementing Multiple Displays with IEGD Multi-GPU - Multi-Monitor Mode on Intel Atom Processor with Intel System Controller Hub US15W Chipset

Implementing Multiple Displays with IEGD Multi-GPU - Multi-Monitor Mode on Intel Atom Processor with Intel System Controller Hub US15W Chipset White Paper Chris Wojslaw Lead Technical Marketing Engineer Kirk Blum Graphics Solution Architect Intel Corporation Implementing Multiple Displays with IEGD Multi-GPU - Multi-Monitor Mode on Intel Atom

More information

Intel Core TM i7-660ue, i7-620le/ue, i7-610e, i5-520e, i3-330e and Intel Celeron Processor P4505, U3405 Series

Intel Core TM i7-660ue, i7-620le/ue, i7-610e, i5-520e, i3-330e and Intel Celeron Processor P4505, U3405 Series Intel Core TM i7-660ue, i7-620le/ue, i7-610e, i5-520e, i3-330e and Intel Celeron Processor P4505, U3405 Series Datasheet Addendum Specification Update Document Number: 323179 Legal Lines and Disclaimers

More information

Contents -------- Overview and Product Contents -----------------------------

Contents -------- Overview and Product Contents ----------------------------- ------------------------------------------------------------------------ Intel(R) Threading Building Blocks - Release Notes Version 2.0 ------------------------------------------------------------------------

More information

Intel(R) IT Director User's Guide

Intel(R) IT Director User's Guide Intel(R) IT Director User's Guide Table of Contents Disclaimer and Legal Information... 1 Introduction... 3 To set up Intel IT Director:... 3... 3 System Configuration... 5... 5 Settings Page: Overview...

More information

Development for Mobile Devices Tools from Intel, Platform of Your Choice!

Development for Mobile Devices Tools from Intel, Platform of Your Choice! Development for Mobile Devices Tools from Intel, Platform of Your Choice! Sergey Lunev, Intel Corporation HTML5 Tools Development Manager Optional: Download App Preview Android bit.ly/1i8vegl ios bit.ly/1a3w7bk

More information

Inside Linux* graphics

Inside Linux* graphics White Paper Ragav Gopalan Platform Applications Engineer Intel Corporation Inside Linux* graphics Understanding the components, upgrading drivers, and advanced use cases April 2011 325350 Inside Linux*

More information

Intel Processor Serial Number

Intel Processor Serial Number APPLICATION NOTE Intel Processor Serial Number March 1999 ORDER NUMBER: 245125-001 Information in this document is provided in connection with Intel products. No license, express or implied, by estoppel

More information

64-Bit NASM Notes. Invoking 64-Bit NASM

64-Bit NASM Notes. Invoking 64-Bit NASM 64-Bit NASM Notes The transition from 32- to 64-bit architectures is no joke, as anyone who has wrestled with 32/64 bit incompatibilities will attest We note here some key differences between 32- and 64-bit

More information

Keil C51 Cross Compiler

Keil C51 Cross Compiler Keil C51 Cross Compiler ANSI C Compiler Generates fast compact code for the 8051 and it s derivatives Advantages of C over Assembler Do not need to know the microcontroller instruction set Register allocation

More information

Device Management API for Windows* and Linux* Operating Systems

Device Management API for Windows* and Linux* Operating Systems Device Management API for Windows* and Linux* Operating Systems Library Reference September 2004 05-2222-002 INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS

More information

Dual-Core Processors on Dell-Supported Operating Systems

Dual-Core Processors on Dell-Supported Operating Systems Dual-Core Processors on Dell-Supported Operating Systems With the advent of Intel s dual-core, in addition to existing Hyper-Threading capability, there is confusion regarding the number of that software

More information

Xen in Embedded Systems. Ray Kinsella Senior Software Engineer Embedded and Communications Group Intel Corporation

Xen in Embedded Systems. Ray Kinsella Senior Software Engineer Embedded and Communications Group Intel Corporation Xen in Embedded Systems Ray Kinsella Senior Software Engineer Embedded and Communications Group Intel Corporation Legal Notices INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS.

More information

Fast CRC Computation for Generic Polynomials Using PCLMULQDQ Instruction

Fast CRC Computation for Generic Polynomials Using PCLMULQDQ Instruction White Paper Vinodh Gopal Erdinc Ozturk Jim Guilford Gil Wolrich Wajdi Feghali Martin Dixon IA Architects Intel Corporation Deniz Karakoyunlu PhD Student Worcester Polytechnic Institute Fast CRC Computation

More information

How To Install An Intel System Studio 2015 For Windows* For Free On A Computer Or Mac Or Ipa (For Free)

How To Install An Intel System Studio 2015 For Windows* For Free On A Computer Or Mac Or Ipa (For Free) Intel System Studio 2015 for Windows* Installation Guide and Release Notes Installation Guide and Release Notes for Windows* Host and Windows* target Document number: 331182-002US 8 October 2014 Contents

More information

Intel Media Server Studio - Metrics Monitor (v1.1.0) Reference Manual

Intel Media Server Studio - Metrics Monitor (v1.1.0) Reference Manual Intel Media Server Studio - Metrics Monitor (v1.1.0) Reference Manual Overview Metrics Monitor is part of Intel Media Server Studio 2015 for Linux Server. Metrics Monitor is a user space shared library

More information

Processing Multiple Buffers in Parallel to Increase Performance on Intel Architecture Processors

Processing Multiple Buffers in Parallel to Increase Performance on Intel Architecture Processors White Paper Vinodh Gopal Jim Guilford Wajdi Feghali Erdinc Ozturk Gil Wolrich Martin Dixon IA Architects Intel Corporation Processing Multiple Buffers in Parallel to Increase Performance on Intel Architecture

More information

Intel Server Raid Controller. RAID Configuration Utility (RCU)

Intel Server Raid Controller. RAID Configuration Utility (RCU) Intel Server Raid Controller RAID Configuration Utility (RCU) Revision 1.1 July 2000 Revision History Date Rev Modifications 02/13/00 1.0 Initial Release 07/20/00 1.1 Update to include general instructions

More information

Lecture 7: Machine-Level Programming I: Basics Mohamed Zahran (aka Z) [email protected] http://www.mzahran.com

Lecture 7: Machine-Level Programming I: Basics Mohamed Zahran (aka Z) mzahran@cs.nyu.edu http://www.mzahran.com CSCI-UA.0201-003 Computer Systems Organization Lecture 7: Machine-Level Programming I: Basics Mohamed Zahran (aka Z) [email protected] http://www.mzahran.com Some slides adapted (and slightly modified)

More information

Measuring Cache and Memory Latency and CPU to Memory Bandwidth

Measuring Cache and Memory Latency and CPU to Memory Bandwidth White Paper Joshua Ruggiero Computer Systems Engineer Intel Corporation Measuring Cache and Memory Latency and CPU to Memory Bandwidth For use with Intel Architecture December 2008 1 321074 Executive Summary

More information

Using Windows* 7/Windows Embedded Standard 7* with Platforms Based on the Intel Atom Processor Z670/Z650 and Intel SM35 Express Chipset

Using Windows* 7/Windows Embedded Standard 7* with Platforms Based on the Intel Atom Processor Z670/Z650 and Intel SM35 Express Chipset Using Windows* 7/Windows Embedded Standard 7* with Platforms Based on the Intel Atom Processor Z670/Z650 and Intel SM35 Express Chipset Quick Start Guide December 2011 Document Number: 326555-001 Introduction

More information

Intel Virtualization Technology FlexMigration Application Note

Intel Virtualization Technology FlexMigration Application Note Intel Virtualization Technology FlexMigration Application Note This document is intended only for VMM or hypervisor software developers and not for application developers or end-customers. Readers are

More information

Software Solutions for Multi-Display Setups

Software Solutions for Multi-Display Setups White Paper Bruce Bao Graphics Application Engineer Intel Corporation Software Solutions for Multi-Display Setups January 2013 328563-001 Executive Summary Multi-display systems are growing in popularity.

More information

IT@Intel. Comparing Multi-Core Processors for Server Virtualization

IT@Intel. Comparing Multi-Core Processors for Server Virtualization White Paper Intel Information Technology Computer Manufacturing Server Virtualization Comparing Multi-Core Processors for Server Virtualization Intel IT tested servers based on select Intel multi-core

More information

White Paper Amit Aneja Platform Architect Intel Corporation. Xen* Hypervisor Case Study - Designing Embedded Virtualized Intel Architecture Platforms

White Paper Amit Aneja Platform Architect Intel Corporation. Xen* Hypervisor Case Study - Designing Embedded Virtualized Intel Architecture Platforms White Paper Amit Aneja Platform Architect Intel Corporation Xen* Hypervisor Case Study - Designing Embedded Virtualized Intel Architecture Platforms March 2011 325258 Executive Summary Virtualization technology

More information

Power Benefits Using Intel Quick Sync Video H.264 Codec With Sorenson Squeeze

Power Benefits Using Intel Quick Sync Video H.264 Codec With Sorenson Squeeze Power Benefits Using Intel Quick Sync Video H.264 Codec With Sorenson Squeeze Whitepaper December 2012 Anita Banerjee Contents Introduction... 3 Sorenson Squeeze... 4 Intel QSV H.264... 5 Power Performance...

More information

An Architecture to Deliver a Healthcare Dial-tone

An Architecture to Deliver a Healthcare Dial-tone An Architecture to Deliver a Healthcare Dial-tone Using SOA for Healthcare Data Interoperability Joe Natoli Platform Architect Intel SOA Products Division April 2008 Legal Notices This presentation is

More information

Intel Media SDK Library Distribution and Dispatching Process

Intel Media SDK Library Distribution and Dispatching Process Intel Media SDK Library Distribution and Dispatching Process Overview Dispatching Procedure Software Libraries Platform-Specific Libraries Legal Information Overview This document describes the Intel Media

More information

Embedded Programming in C/C++: Lesson-1: Programming Elements and Programming in C

Embedded Programming in C/C++: Lesson-1: Programming Elements and Programming in C Embedded Programming in C/C++: Lesson-1: Programming Elements and Programming in C 1 An essential part of any embedded system design Programming 2 Programming in Assembly or HLL Processor and memory-sensitive

More information

Intel Virtualization Technology FlexMigration Application Note

Intel Virtualization Technology FlexMigration Application Note Intel Virtualization Technology FlexMigration Application Note This document is intended only for VMM or hypervisor software developers and not for application developers or end-customers. Readers are

More information

COLO: COarse-grain LOck-stepping Virtual Machine for Non-stop Service

COLO: COarse-grain LOck-stepping Virtual Machine for Non-stop Service COLO: COarse-grain LOck-stepping Virtual Machine for Non-stop Service Eddie Dong, Yunhong Jiang 1 Legal Disclaimer INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE,

More information

Intel Media SDK Features in Microsoft Windows 7* Multi- Monitor Configurations on 2 nd Generation Intel Core Processor-Based Platforms

Intel Media SDK Features in Microsoft Windows 7* Multi- Monitor Configurations on 2 nd Generation Intel Core Processor-Based Platforms Intel Media SDK Features in Microsoft Windows 7* Multi- Monitor Configurations on 2 nd Generation Intel Core Processor-Based Platforms Technical Advisory December 2010 Version 1.0 Document Number: 29437

More information

Hexadecimal Object File Format Specification

Hexadecimal Object File Format Specification Hexadecimal Object File Format Specification Revision A January 6, 1988 This specification is provided "as is" with no warranties whatsoever, including any warranty of merchantability, noninfringement,

More information

Breakthrough AES Performance with. Intel AES New Instructions

Breakthrough AES Performance with. Intel AES New Instructions White Paper Breakthrough AES Performance with Intel AES New Instructions Kahraman Akdemir, Martin Dixon, Wajdi Feghali, Patrick Fay, Vinodh Gopal, Jim Guilford, Erdinc Ozturk, Gil Wolrich, Ronen Zohar

More information

Intel RAID Controller Troubleshooting Guide

Intel RAID Controller Troubleshooting Guide Intel RAID Controller Troubleshooting Guide A Guide for Technically Qualified Assemblers of Intel Identified Subassemblies/Products Intel order number C18781-001 September 2, 2002 Revision History Troubleshooting

More information

Intel EP80579 Software for IP Telephony Applications on Intel QuickAssist Technology

Intel EP80579 Software for IP Telephony Applications on Intel QuickAssist Technology Intel EP80579 Software for IP Telephony Applications on Intel QuickAssist Technology Linux* Tuning Guide September 2008 Order Number: 320524-001US Legal Lines and Disclaimers INFORMATION IN THIS DOCUMENT

More information

Intel Data Direct I/O Technology (Intel DDIO): A Primer >

Intel Data Direct I/O Technology (Intel DDIO): A Primer > Intel Data Direct I/O Technology (Intel DDIO): A Primer > Technical Brief February 2012 Revision 1.0 Legal Statements INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE,

More information

Intel Server Board S3420GPRX Intel Server System SR1630GPRX Intel Server System SR1630HGPRX

Intel Server Board S3420GPRX Intel Server System SR1630GPRX Intel Server System SR1630HGPRX Server WHQL Testing Services Enterprise Platforms and Services Division Intel Server Board S3420GPRX Intel Server System SR1630GPRX Intel Server System SR1630HGPRX Rev 1.0 Server Test Submission (STS)

More information

Intel Dialogic System Software for PCI Products on Windows

Intel Dialogic System Software for PCI Products on Windows Intel Dialogic System Software for PCI Products on Windows Administration Guide November 2003 05-1910-001 INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS

More information

Performance Evaluation and Optimization of A Custom Native Linux Threads Library

Performance Evaluation and Optimization of A Custom Native Linux Threads Library Center for Embedded Computer Systems University of California, Irvine Performance Evaluation and Optimization of A Custom Native Linux Threads Library Guantao Liu and Rainer Dömer Technical Report CECS-12-11

More information

Intel Virtualization Technology (VT) in Converged Application Platforms

Intel Virtualization Technology (VT) in Converged Application Platforms Intel Virtualization Technology (VT) in Converged Application Platforms Enabling Improved Utilization, Change Management, and Cost Reduction through Hardware Assisted Virtualization White Paper January

More information

Specification Update. January 2014

Specification Update. January 2014 Intel Embedded Media and Graphics Driver v36.15.0 (32-bit) & v3.15.0 (64-bit) for Intel Processor E3800 Product Family/Intel Celeron Processor * Release Specification Update January 2014 Notice: The Intel

More information

How To Get A Client Side Virtualization Solution For Your Financial Services Business

How To Get A Client Side Virtualization Solution For Your Financial Services Business SOLUTION BRIEF Financial Services Industry 2nd Generation Intel Core i5 vpro and Core i7 vpro Processors Benefits of Client-Side Virtualization A Flexible, New Solution for Improving Manageability, Security,

More information

Three Paths to Faster Simulations Using ANSYS Mechanical 16.0 and Intel Architecture

Three Paths to Faster Simulations Using ANSYS Mechanical 16.0 and Intel Architecture White Paper Intel Xeon processor E5 v3 family Intel Xeon Phi coprocessor family Digital Design and Engineering Three Paths to Faster Simulations Using ANSYS Mechanical 16.0 and Intel Architecture Executive

More information

Haswell Cryptographic Performance

Haswell Cryptographic Performance White Paper Sean Gulley Vinodh Gopal IA Architects Intel Corporation Haswell Cryptographic Performance July 2013 329282-001 Executive Summary The new Haswell microarchitecture featured in the 4 th generation

More information

Introduction to PCI Express Positioning Information

Introduction to PCI Express Positioning Information Introduction to PCI Express Positioning Information Main PCI Express is the latest development in PCI to support adapters and devices. The technology is aimed at multiple market segments, meaning that

More information

Intel Cloud Builder Guide: Cloud Design and Deployment on Intel Platforms

Intel Cloud Builder Guide: Cloud Design and Deployment on Intel Platforms EXECUTIVE SUMMARY Intel Cloud Builder Guide Intel Xeon Processor-based Servers Red Hat* Cloud Foundations Intel Cloud Builder Guide: Cloud Design and Deployment on Intel Platforms Red Hat* Cloud Foundations

More information

Jorix kernel: real-time scheduling

Jorix kernel: real-time scheduling Jorix kernel: real-time scheduling Joris Huizer Kwie Min Wong May 16, 2007 1 Introduction As a specialized part of the kernel, we implemented two real-time scheduling algorithms: RM (rate monotonic) and

More information

MCA Enhancements in Future Intel Xeon Processors June 2013

MCA Enhancements in Future Intel Xeon Processors June 2013 MCA Enhancements in Future Intel Xeon Processors June 2013 Reference Number: 329176-001, Revision: 1.0 INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR

More information

EMC Business Continuity for Microsoft SQL Server Enabled by SQL DB Mirroring Celerra Unified Storage Platforms Using iscsi

EMC Business Continuity for Microsoft SQL Server Enabled by SQL DB Mirroring Celerra Unified Storage Platforms Using iscsi EMC Business Continuity for Microsoft SQL Server Enabled by SQL DB Mirroring Applied Technology Abstract Microsoft SQL Server includes a powerful capability to protect active databases by using either

More information

(Refer Slide Time: 00:01:16 min)

(Refer Slide Time: 00:01:16 min) Digital Computer Organization Prof. P. K. Biswas Department of Electronic & Electrical Communication Engineering Indian Institute of Technology, Kharagpur Lecture No. # 04 CPU Design: Tirning & Control

More information

An Implementation Of Multiprocessor Linux

An Implementation Of Multiprocessor Linux An Implementation Of Multiprocessor Linux This document describes the implementation of a simple SMP Linux kernel extension and how to use this to develop SMP Linux kernels for architectures other than

More information

Intel 8086 architecture

Intel 8086 architecture Intel 8086 architecture Today we ll take a look at Intel s 8086, which is one of the oldest and yet most prevalent processor architectures around. We ll make many comparisons between the MIPS and 8086

More information

Memory Management Outline. Background Swapping Contiguous Memory Allocation Paging Segmentation Segmented Paging

Memory Management Outline. Background Swapping Contiguous Memory Allocation Paging Segmentation Segmented Paging Memory Management Outline Background Swapping Contiguous Memory Allocation Paging Segmentation Segmented Paging 1 Background Memory is a large array of bytes memory and registers are only storage CPU can

More information

RAID Basics Training Guide

RAID Basics Training Guide RAID Basics Training Guide Discover a Higher Level of Performance RAID matters. Rely on Intel RAID. Table of Contents 1. What is RAID? 2. RAID Levels RAID 0 RAID 1 RAID 5 RAID 6 RAID 10 RAID 0+1 RAID 1E

More information

A Study of Performance Monitoring Unit, perf and perf_events subsystem

A Study of Performance Monitoring Unit, perf and perf_events subsystem A Study of Performance Monitoring Unit, perf and perf_events subsystem Team Aman Singh Anup Buchke Mentor Dr. Yann-Hang Lee Summary Performance Monitoring Unit, or the PMU, is found in all high end processors

More information

Intel Desktop Board DG31GL

Intel Desktop Board DG31GL Intel Desktop Board DG31GL Basic Certified Motherboard Logo Program (MLP) Report 4/1/2008 Purpose: This report describes the DG31GL Motherboard Logo Program testing run conducted by Intel Corporation.

More information

Intel Server Board S3420GPV

Intel Server Board S3420GPV Server WHQL Testing Services Enterprise Platforms and Services Division Intel Server Board S3420GPV Rev 1.0 Server Test Submission (STS) Report For the Microsoft Windows Logo Program (WLP) Dec. 30 th,

More information

Intel Storage System SSR212CC Enclosure Management Software Installation Guide For Red Hat* Enterprise Linux

Intel Storage System SSR212CC Enclosure Management Software Installation Guide For Red Hat* Enterprise Linux Intel Storage System SSR212CC Enclosure Management Software Installation Guide For Red Hat* Enterprise Linux Order Number: D58855-002 Disclaimer Information in this document is provided in connection with

More information

Atmel AVR4027: Tips and Tricks to Optimize Your C Code for 8-bit AVR Microcontrollers. 8-bit Atmel Microcontrollers. Application Note.

Atmel AVR4027: Tips and Tricks to Optimize Your C Code for 8-bit AVR Microcontrollers. 8-bit Atmel Microcontrollers. Application Note. Atmel AVR4027: Tips and Tricks to Optimize Your C Code for 8-bit AVR Microcontrollers Features Atmel AVR core and Atmel AVR GCC introduction Tips and tricks to reduce code size Tips and tricks to reduce

More information

Intel X38 Express Chipset Memory Technology and Configuration Guide

Intel X38 Express Chipset Memory Technology and Configuration Guide Intel X38 Express Chipset Memory Technology and Configuration Guide White Paper January 2008 Document Number: 318469-002 INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE,

More information

Programing the Microprocessor in C Microprocessor System Design and Interfacing ECE 362

Programing the Microprocessor in C Microprocessor System Design and Interfacing ECE 362 PURDUE UNIVERSITY Programing the Microprocessor in C Microprocessor System Design and Interfacing ECE 362 Course Staff 1/31/2012 1 Introduction This tutorial is made to help the student use C language

More information

Intel Desktop Board DG43RK

Intel Desktop Board DG43RK Intel Desktop Board DG43RK Specification Update December 2010 Order Number: E92421-003US The Intel Desktop Board DG43RK may contain design defects or errors known as errata, which may cause the product

More information

IA-32 Intel Architecture Software Developer s Manual

IA-32 Intel Architecture Software Developer s Manual IA-32 Intel Architecture Software Developer s Manual Volume 1: Basic Architecture NOTE: The IA-32 Intel Architecture Software Developer s Manual consists of three volumes: Basic Architecture, Order Number

More information

Computer Organization and Architecture

Computer Organization and Architecture Computer Organization and Architecture Chapter 11 Instruction Sets: Addressing Modes and Formats Instruction Set Design One goal of instruction set design is to minimize instruction length Another goal

More information

RAID and Storage Options Available on Intel Server Boards and Systems

RAID and Storage Options Available on Intel Server Boards and Systems and Storage Options Available on Intel Server Boards and Systems Revision 1.0 March, 009 Revision History and Storage Options Available on Intel Server Boards and Systems Revision History Date Revision

More information

Steps to Migrating to a Private Cloud

Steps to Migrating to a Private Cloud Deploying and Managing Private Clouds The Essentials Series Steps to Migrating to a Private Cloud sponsored by Introduction to Realtime Publishers by Don Jones, Series Editor For several years now, Realtime

More information