System Configuration and Programming Considerations for High Performance Embedded Systems on Multicore x86/64-Based Systems


Daron Underwood, CTO, IntervalZero, Inc.
400 Fifth Avenue, Fourth Floor, Waltham, MA

Introduction

It is no secret that more and more companies are looking for ways to take advantage of the powerful, low-cost General Purpose Processors (GPPs) available today, in order to reduce the cost of product design, development and manufacturing, and to quicken their time-to-market. Although these processors are very powerful, developers must have some basic understanding of the hardware architecture in order to squeeze out every last bit of performance and use these systems for extremely high-performance products. This paper focuses on the areas where the maximum benefit can be realized to optimize the determinism of these multicore systems. The main areas of focus are:

- System configuration: hardware, firmware, and software
- Multicore/multithreading programming
- Memory, I/O and cache

The performance increases and scalability of SMP are immediately attractive to anyone developing embedded systems. On one end, there are those who want to take advantage of the ability to create systems that are functionally segregated, yet tightly integrated. On the other are those who need extremely high performance. Then there is everything in between. As attractive as all these design possibilities are on the new multicore systems, they do come at a price. That price turns out to be the need to manage the resource usage of these robust processors, whose features are implemented to increase performance for normal general purpose use, whether in consumer or server systems. In many cases, the functional requirements of embedded systems, real-time/deterministic or tightly-bounded high performance, are adversely impacted by these very mechanisms. Although this seems to be a showstopper, there is a silver lining. With some effort to understand these mechanisms, as well as the software and configuration techniques available to manipulate them to one's advantage, the pay-off can be tremendous in terms of performance and scalability.
There was a common opinion that the software developer's free lunch went away when multicore, lower-frequency processor design was chosen as the way to move processor technology forward. This refers to the fact that an unchanged program could previously benefit from the increasing processor frequencies of new single-core designs. The thinking was that the free lunch was now going to cost money, as work, in the form of software redesign, was needed to realize performance gains. In some regards I think that was true, however, only for software that existed during the transition. If you put the right effort into making your new designs multicore-aware and design the software to be scalable, I would argue that the free lunch still exists, and maybe even more so. Maybe it's time to come up with a new term for the era of multicore. Perhaps we do have to pay a little now, but for that small price, you get an unlimited dinner buffet. It is time for embedded systems to truly be in a position to take advantage of Moore's Law, by developing systems that can readily scale to a performance doubling roughly every 18 months.

Best Practices / IZ-DOC-0120-R2

System Configuration

It is extremely important to be able to configure a system to optimize it for the specific embedded application. One of the biggest black boxes in a PC system is the hardware configuration, which is controlled for the most part by the Basic Input Output System (BIOS). It is critical to select a system that allows the most granular configuration of processor and chipset features from BIOS setup. Without this level of control, it can sometimes be a daunting task to qualify that a system is capable of providing the level of performance and determinism required by the embedded application. There are many features of a system that can have significant effects on performance relative to an embedded application. The system features with the most potential impact on embedded application performance fall into four functional areas:

- Legacy device support
- CPU configuration
- Power management
- Memory configuration

Legacy Device Support

One area that has been seen to inject some unavoidable, although usually small, System Management Interrupt (SMI) activity in today's systems is support for legacy devices, typically the use of a USB mouse and keyboard at boot time. On some systems with this support enabled, there are unavoidable processor interrupts attributable to the System Management Interrupts required to handle these devices. This can cause significant variation in the otherwise near-constant duration of certain code paths. It is recommended to disable support for legacy USB in BIOS. The drawback is that on some systems the keyboard will not be recognized during boot, so breaking into BIOS setup after disabling this is not supported; if needed, a PS/2 keyboard can be used to break in. However, on many newer systems, the BIOS appears to handle the keyboard correctly even when legacy support is disabled.

CPU Configuration

Hyper-threading is specific to Intel in the x86/x64 architecture.
More generically known as Simultaneous Multithreading (SMT), it is a second hardware thread that can be executed on a processor core. The goal of SMT is to keep the core busy more of the time by executing a second thread when the initial thread is waiting on some resource (known as a stall) to continue execution. This should effectively give some performance increase; however, it is not that straightforward, as the threads share many processor components such as L1 and L2 cache, execution units, etc. Due to this sharing, a level of jitter is introduced into execution that may be too high in terms of determinism for some embedded systems. It is recommended that, for any application considering processors with hyper-threading, the developer qualify that the system can operate within the required time bounds of the real-time application. One should also note that current versions of the 32-bit Windows operating system do not allow the selection of which processors the OS can execute on. This can be problematic for a Windows real-time extension, as it can only operate on the cores that Windows was instructed to ignore. Windows will enumerate cores using all physical cores first, followed by the HT cores. Given a system with a quad-core hyper-threaded processor, if Windows is directed to use only 4 cores, Windows will select the primary hardware thread from each core and enumerate them as processors 0-3. This would leave the 4 hyper-threads for the real-time extension. Since this could lead to an increase in cache conflicts, performance of the real-time applications can be significantly impacted. It is recommended that hyper-threading be disabled, or that Windows be configured to use fewer cores, ensuring that there is at least one primary thread available to the real-time system. The figures below show examples of how cores would be enumerated on a Windows 7 32-bit OS running on a quad-core hyper-threaded system. Below, the red indicates RTSS cores. The gray indicates L1/L2 cache sharing, where a core and its hyper-thread are owned by different subsystems. This configuration would potentially have a higher incidence of cache collisions on the 4th and 5th cores due to sharing of the caches with Windows processes running on core 0 and core 1.

From the diagrams that follow, you can see that this method is employed even when adding additional processor sockets to the system. This has the potential for even more impact, as there are more cores/threads that could have shared caches. 64-bit Windows does enumerate cores differently. Although specific processors currently cannot be selected, 64-bit Windows does enumerate the 2 threads of a hyper-threaded core sequentially. This enumeration method gives the system designer much more flexibility in how processors are divided between Windows and the real-time extension, since it avoids many of the cache conflicts associated with sharing the threads of a single core, as noted above for the 32-bit Windows processor enumeration method. The key difference is that the shared resource is pushed out to the much larger L3 cache, which is further away physically and logically from the cores. The benefit is that the potential for Windows activities to dirty the more critical L1/L2 caches is significantly reduced. The figures below illustrate the 64-bit Windows enumeration method. Note the second diagram, which shows no sharing of the L1/L2 caches with a 2+6 configuration on an 8-logical-core system.

Power Management Configuration

There are a few power management features that can really impact the consistent performance level of today's systems. This section focuses only on the Intel versions of these features; however, in many cases they have counterparts in processors from other manufacturers.

SpeedStep

SpeedStep was one of the first technologies introduced by Intel to attempt to reduce the amount of power consumed by processors when there was not enough work to keep them busy. SpeedStep dynamically controls both the processor voltage and frequency to reduce unneeded power and heat.

This power savings is great and very effective for typical consumer and even server systems; however, it can wreak havoc on an embedded system. Consider the case where a thread performing a specific function has been optimized to execute with a worst-case time of 100 microseconds (µs). Put another way, the system designer is 100% certain that the execution of that thread's code path will always take less than 100 µs. Using a typical safety margin of, say, 20%, the system should be capable of running this thread every 120 µs without ever overrunning that cycle time. This is by definition very deterministic, and exactly the type of performance that embedded systems developers require. However, if SpeedStep or similar technologies are enabled, the execution of the thread may incur drastic variances in execution time as a result of the processor's voltage and frequency being dynamically ratcheted up and down. Due to this, it is recommended that embedded systems disable the SpeedStep setting in BIOS if possible.

Turbo Boost Technology

Turbo Boost is a technology that allows processors to operate more efficiently by dynamically increasing the processor frequency when the operating system requests the highest available power state. Like SpeedStep, Turbo Boost causes the frequency to be essentially unknown and nondeterministic. However, unlike SpeedStep, an increase in frequency tends not to have a negative effect on the thread execution times in the scenario above. In fact, that thread would typically execute in less time than average and therefore not pose an overrun risk. That said, it is recommended that the developer understand the base operating frequency of the processor and design to it, such that Turbo Boost does not affect the processing and performance of the embedded application.

C-States

CPU states, or C-States for short, are another mechanism designed to reduce the consumption of power in a system.
Unlike the power states (P-States), which are used to control the clock frequency and voltage levels, C-States describe the functional components of the CPU that can be turned off when not in use: if it is not on, it is not drawing power and generating excess heat. There are many C-States, which can vary by processor and vendor. C0 is the special state assigned when the processor is fully turned on. C-States are typically grouped by what they turn off. C1 through C3 essentially cut the clock signal to the CPU or some of its internal components, while C4 through C6 typically work by cutting voltage levels. Then there are hybrid states that define a combination of both. The drawback to C-States in embedded systems comes down to one thing: is the function of the CPU that is needed at any given time available immediately for use? The answer, with C-States enabled, is a big MAYBE. This is because, as clock signals and/or voltage levels are cut or reduced to the CPU and its internal components, it takes time to wake these subsystems up for use, and that time can vary significantly. It is difficult, if not impossible, to design a high-performing, highly deterministic embedded system under those conditions. Thankfully, the use of C-States can be disabled, typically in BIOS, or even with some hints to the operating system. Either way, it is recommended that these power saving states be disabled, or at a minimum reduced in use, so that the system can run as predictably as possible.

Memory Configuration

There are generally two memory configurations available on today's systems. These are typically referred to as Symmetric Multiprocessing (SMP), or Uniform Memory Access (UMA), and Non-Uniform Memory Architecture (NUMA). Typically the memory architecture to use can be selected from BIOS; in most cases, if this option is not available, the SMP architecture is employed. SMP memory architecture simply means that all memory is equally addressable from any core and there is no concept of locality; the SMP memory configuration can be thought of as truly global in nature. The NUMA architecture, on the other hand, is logically a hybrid of local and global memory. All memory is globally accessible; however, memory has locality associated with processing cores. NUMA memory can easily be thought of as another level of cache beyond L3 that makes use of the physical placement of system memory (RAM) to allow faster access to memory that is closer (more local) to a given processor/core.

Memory, I/O and Cache

This is probably the most technical section of this paper, as it is extremely important to understand how these hardware subsystems affect the performance of executing code. The use and tuning of these subsystems can be the difference between an embedded system that is highly deterministic and one that is not. Everything done on the computer requires memory of some kind, whether RAM, I/O ports, device memory, etc. The problem with memory, as everyone is aware, is that it is significantly slower to access than the speeds at which processors execute instructions. Since this gap has not been overcome, not even on today's advanced systems, processor vendors have developed multi-level memory hierarchies that include fast memory close to the processor, known as cache. The problem with cache is that it is expensive, small and fixed in size (amount, not physical dimensions), and needs to be on the same die as the processor to minimize the physical distance required for the fastest possible access time. This, along with improvements in RAM speed, sizes and configuration (UMA/NUMA), is helping embedded system designers optimize these commodity systems for their needs. The cost, however, is the requirement of knowing how this memory hierarchy functions. Let's take a look at memory from the bottom up; that is, we'll start with the memory that is closest to the processor and work out from there.

Cache Architecture

(Level 1/2/3 is used interchangeably with L1/2/3 below.) It is critical that system designers and developers understand the cache architecture of the processors they are working with. Knowing this can make the difference not only in delivering a product on schedule, but in ending up with a product that meets and even exceeds design requirements. A great paper that goes into depth about memory architecture is "What Every Programmer Should Know About Memory" by Ulrich Drepper at Red Hat.

As shown earlier, most x86/64 systems today have a 3-level hierarchical cache architecture. The Level 1 cache is the smallest and closest to the processor, and is split into an instruction cache and a data cache of equal size. The Level 2 cache is per processor core like L1, but is a unified cache containing both instructions and data; L2 is typically larger than L1. On hyper-threaded cores, the L1 and L2 caches are shared by both hardware threads. Finally, there is the Level 3 cache, also referred to as smart cache. This cache is considerably larger than L1 and L2, but like L2 it is a unified cache. L3 is a processor-shared cache, meaning all cores/hardware threads in a processor package have equal use of it. The graphics in the CPU configuration section above depict the logical hierarchy of the cache as described here. Not to over-simplify things too much, but when data is required by the processor to execute an instruction, that data is read and moved into L1 cache. When that data is no longer needed and other data is required for the processor to continue execution, the old data is evicted to the next level of cache away from the processor. This eviction logic can continue until the data exists only in system memory (RAM). If the data is required again, a lengthy fetch from system memory must be performed. However, if the data was never evicted in the first place, access to it is quite fast, as the cached version is still valid. Understanding the cache architecture and its operational logic for ensuring data validity across multiple cores is the best way to reliably predict where data will be when you need it, and thus to ensure deterministic execution of real-time threads. Without knowing this, at least at an operational level, you are simply allowing the system, rather than you, to determine the way your applications execute.
Embedded system design is about understanding the hardware and software at a level that allows as complete control of the system as possible, and on today's systems that really does start with the cache.

Multicore/Multithreaded Programming

The biggest issue with programming on multicore systems is the need for data concurrency across all the cores. We have found that a significant portion of latency lies in the allocation and use of memory and in the impact that cache mechanisms have on performance. The good news is that there are design rules and techniques you can use both to minimize these types of performance hits and to tune a system for optimal performance. Intel documents this in the Intel(R) 64 and IA-32 Architectures Optimization Reference Manual. It is recommended that this manual be used in determining best practices for developing critical areas of software where specific performance requirements exist. Given that this document is over 700 pages, an example of a best practice that we recommend is a technique that improves the very types of thread execution latency associated with false sharing and out-of-order processing.

False Sharing

The main culprit of latency found by a study focused on latency issues reported by IntervalZero customers was the sharing of modified data, and false sharing, between threads executing on different processors. Based on the Intel optimization manual noted above, memory sharing under these conditions should be designed carefully to avoid these types of performance penalties. This has all come about as modern CPUs have become much faster than modern memory systems. This disparity in speed, more than two orders of magnitude, has resulted in the multi-megabyte caches found on modern CPUs.
As the manual describes in the section "Prevent Sharing of Modified Data and False-Sharing": on an Intel Core Duo processor, or a processor based on Intel Core microarchitecture, sharing of modified data incurs a performance penalty when a thread running on one core tries to read or write data that is currently present in modified state in the first-level cache of the other core. This causes eviction of the modified cache line back into memory and a read of it into the first-level cache of the other core. The latency of such a cache line transfer is much higher than using data in the immediate first-level or second-level cache.

In the process of application design for high-performance/deterministic functionality, cache impact must be considered.

1. The first decision is whether sharing memory between threads is really required.
2. If required, the second decision is whether the data in the shared memory should be protected.
3. If protection is required, use synchronization objects to protect the shared memory.
4. If deterministic latency of accessing the shared memory region is required, threads should avoid modifying the data within a cache line or within a sector.

Following is a code sample that demonstrates this jitter. The jitter is due to the sharing of adjacent memory that is modified by threads on other processors. The writing of the adjacent data causes cache eviction, which requires a trip to memory to reload the cache for subsequent reads. Sample 1 is a code snippet showing how deterministic latency of accessing the shared memory region is affected by the system cache. The sample creates multiple threads and assigns one thread per core. Each thread repeatedly reads 4-byte data from one offset of the shared memory region, and writes into another offset of the shared memory region. The sample measures the time taken within certain loops. The problem is that each thread may modify data within the cache line of other threads. As a result, there may be different latency among the measurements.
Sample 1

    // INPUTS/OUTPUTS typedefs and the SAMPLES_PER_TIMER, NUM_INPUTS,
    // NUM_OUTPUTS constants are defined elsewhere in the project.
    static ULONG Inputs[SAMPLES_PER_TIMER][NUM_INPUTS];
    static ULONG Outputs[SAMPLES_PER_TIMER][NUM_OUTPUTS];

    typedef struct _MEMORY {
        INPUTS*  volatile Inputs;
        OUTPUTS* volatile Outputs;
        HANDLE   hTimer;
        ULONG    loopCount;
        ULONG    data[1];
    } MEMORY, *PMEMORY;

    void _cdecl wmain(int argc, wchar_t **argv, wchar_t **envp)
    {
        for (ULONG i = 0; i < g_usedCores; i++)
        {
            // Every thread's descriptor points at the SAME static arrays,
            // so all threads share (and modify) adjacent memory.
            PMEMORY pMemory = (PMEMORY)malloc(sizeof(MEMORY));
            pMemory->Inputs  = &Inputs;
            pMemory->Outputs = &Outputs;

            // Set up a thread affined to each core
            HANDLE hThread = CreateThread(NULL, 0, InnerLoop, pMemory, CREATE_SUSPENDED, NULL);
            SetThreadAffinityMask(hThread, affinity);
            SetThreadPriority(hThread, RT_PRIORITY_MAX);
            ResumeThread(hThread);
            CloseHandle(hThread);
        }
    }

    VOID RTFCNDCL InnerLoop(PVOID pArgument)
    {
        PMEMORY pMemory = (PMEMORY)pArgument;
        LARGE_INTEGER start = {0};
        LARGE_INTEGER stop  = {0};
        LARGE_INTEGER diff  = {0};
        ULONG i     = g_numDataPointsPerCycle;
        ULONG max   = 0;
        ULONG min   = (ULONG)-1;
        ULONG curUs = 0;

        // Work loop variables
        int sampleCount, OutCount, InCount;

        for ( ; i > 0; i--)
        {
            _asm {
                rdtsc
                lea ebx, start
                mov [ebx], eax
                mov [ebx+4], edx
            }

            // Do work.
            for (sampleCount = 0; sampleCount < SAMPLES_PER_TIMER; sampleCount++)
            {
                for (OutCount = 0; OutCount < NUM_OUTPUTS; OutCount++)
                {
                    (*pMemory->Outputs)[sampleCount][OutCount] = 0;
                    for (InCount = 0; InCount < NUM_INPUTS; InCount++)
                        (*pMemory->Outputs)[sampleCount][OutCount] +=
                            (*pMemory->Inputs)[sampleCount][InCount];
                }
            }

            _asm {
                rdtsc
                lea ebx, stop
                mov [ebx], eax
                mov [ebx+4], edx
            }

            // Calculate. The 1,000,000 multiplier (ticks to microseconds)
            // was dropped from the original listing; freq holds the TSC
            // frequency and is initialized elsewhere.
            diff.QuadPart = stop.QuadPart - start.QuadPart;
            curUs = (ULONG)((diff.QuadPart * 1000000) / freq.QuadPart);
            if (curUs > max) max = curUs;
            if (curUs < min) min = curUs;
            if (StoreData(pMemory, curUs, min, max))
                break;
        }

        // Shouldn't return from here.
        OutputData(pMemory);
    }

Below is a sample of the timing results from running Sample 1. It clearly shows a wide amount of jitter due to these cache evictions of shared memory.

    Thread:27 Cur: 402 Min: 401 Max: 432
    Thread:27 Cur: 403 Min: 401 Max: 440
    Thread:27 Cur: 402 Min: 401 Max: 428
    Thread:27 Cur: 403 Min: 401 Max: 428
    Thread:27 Cur: 404 Min: 401 Max: 439
    Thread:27 Cur: 403 Min: 401 Max: 431
    Thread:27 Cur: 401 Min: 401 Max: 447
    Thread:27 Cur: 404 Min: 401 Max: 614
    Thread:27 Cur: 513 Min: 401 Max: 649
    Thread:27 Cur: 403 Min: 401 Max: 665

The jitter issue above is nearly eliminated by using separated memory, as in Sample 2, shown below. Unlike Sample 1, where one memory area is allocated in a single call and its pointer is passed to each thread, Sample 2 allocates memory for each thread/core. Each of these separate allocations ensures that the memory blocks are far enough away from each other (physically in memory) that there will be no cache line sharing.

Sample 2

    void _cdecl wmain(int argc, wchar_t **argv, wchar_t **envp)
    {
        for (ULONG i = 0; i < g_usedCores; i++)
        {
            // Each thread gets its own separately allocated input/output buffers.
            PMEMORY pMemory = (PMEMORY)malloc(sizeof(MEMORY));
            pMemory->Inputs  = (INPUTS*)malloc(sizeof(ULONG) * SAMPLES_PER_TIMER * NUM_INPUTS);
            pMemory->Outputs = (OUTPUTS*)malloc(sizeof(ULONG) * SAMPLES_PER_TIMER * NUM_OUTPUTS);

            // Set up a thread affined to each core
            HANDLE hThread = CreateThread(NULL, 0, InnerLoop, pMemory, CREATE_SUSPENDED, NULL);
            SetThreadAffinityMask(hThread, affinity);
            SetThreadPriority(hThread, RT_PRIORITY_MAX);
            ResumeThread(hThread);
            CloseHandle(hThread);
        }
    }

Below is a sample of the timing results from the execution of Sample 2. It clearly shows a significant improvement in the amount of jitter, due to the separate memory allocations ensuring non-adjacency of the data. Here the jitter is within 1 µs using the same inner-loop code; the only difference is the way memory was allocated.

    Thread:61 Cur: 402 Min: 401 Max: 402
    Thread:61 Cur: 402 Min: 401 Max: 402
    Thread:61 Cur: 402 Min: 401 Max: 402
    Thread:61 Cur: 401 Min: 401 Max: 402
    Thread:61 Cur: 401 Min: 401 Max: 402
    Thread:61 Cur: 401 Min: 401 Max: 402
    Thread:61 Cur: 401 Min: 401 Max: 402
    Thread:61 Cur: 401 Min: 401 Max: 402
    Thread:61 Cur: 402 Min: 401 Max: 402
    Thread:61 Cur: 402 Min: 401 Max: 402

Although there are many areas for optimization, as the Intel(R) 64 and IA-32 Architectures Optimization Reference Manual describes, we are confident that the case of sharing of modified data and false sharing is the biggest contributor to the latencies that have been reported to IntervalZero.

Out of Order Execution

Another area of unexpected latency is introduced by a sophisticated and complex mechanism employed in today's modern processors known as out-of-order execution. This feature is yet another mechanism added to reduce processor stalls arising from the mismatch between processing speed and memory access. Reordering can occur in two places: in the hardware, and in the compiler. You will potentially need to address one or both of these to make sure you are controlling the execution of your threads in a predictable way. Memory fences can be used to address hardware out-of-order execution. A memory fence forces all reads and/or writes preceding the fence to complete before the instructions following the fence are executed. Which operations are forced is determined by the type of fence used: a read fence, a write fence, or a full fence.
The reason this can be so important is that out-of-order execution may cause cache evictions to occur more frequently, depending on the nature of the memory allocation in the program. An example of this is the code snippet below:

Sample 3

    // Initial conditions:
    int x = 0, y = 0;

    // Thread A, started first:
    while (x == 0)  // Spin until x is non-0.
        ;
    std::cout << y;

    // Thread B, started later:
    y = 42;
    x = 1;

Assume that Thread A and Thread B run on two different cores. Under certain conditions this code can output 0. How can this be? Because the 42 was buffered in the other core's cache write buffer, and the write to x was seen by Thread A before the write to y, even though no instruction reordering happened!

Also, cache write buffers are not flushed by the inter-processor cache coherency logic. To enforce the correct operation, a write fence (or a full fence) should be used in Thread B between the writes to y and x, paired with a read fence after the spin loop in Thread A. In the case of compiler reordering, each compiler has its own way of indicating that operations should not be optimized and thereby reordered; you should consult the compiler documentation. Keep in mind that the compiler cannot prevent runtime processor instruction reordering.

Code Branching

Although it seems common sense, code branching should be avoided in code where high performance and determinism are required. Code branching can cause a cache invalidation on both the initial branch and the return, which can add a considerable amount of latency that is difficult to quantify. Because of this, code branching under these conditions should be avoided, or at least minimized as much as possible.

Conclusion

This article was designed to put the spotlight on specific areas of multicore system design and their use as platforms for embedded systems. The areas of system configuration, memory architecture, and programming were addressed, but this is not a definitive list. The information presented here should be used as a primer by embedded systems engineers to start the systematic thought process of how today's systems are architected, and how the off-the-shelf designs of these systems can be used to develop some of today's highest-performing, deterministic applications: anything from machine tool control, to digital media processing, to cutting-edge medical instruments that require a rich interactive user interface with the power, performance and scalability that only a multicore RTOS platform can provide. So take this information, use it as a starting point of understanding, and go build the platforms of tomorrow, starting with the foundations of the platforms being built today.


More information

How System Settings Impact PCIe SSD Performance

How System Settings Impact PCIe SSD Performance How System Settings Impact PCIe SSD Performance Suzanne Ferreira R&D Engineer Micron Technology, Inc. July, 2012 As solid state drives (SSDs) continue to gain ground in the enterprise server and storage

More information

Why Computers Are Getting Slower (and what we can do about it) Rik van Riel Sr. Software Engineer, Red Hat

Why Computers Are Getting Slower (and what we can do about it) Rik van Riel Sr. Software Engineer, Red Hat Why Computers Are Getting Slower (and what we can do about it) Rik van Riel Sr. Software Engineer, Red Hat Why Computers Are Getting Slower The traditional approach better performance Why computers are

More information

Intel Data Direct I/O Technology (Intel DDIO): A Primer >

Intel Data Direct I/O Technology (Intel DDIO): A Primer > Intel Data Direct I/O Technology (Intel DDIO): A Primer > Technical Brief February 2012 Revision 1.0 Legal Statements INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE,

More information

Chapter 1 Computer System Overview

Chapter 1 Computer System Overview Operating Systems: Internals and Design Principles Chapter 1 Computer System Overview Eighth Edition By William Stallings Operating System Exploits the hardware resources of one or more processors Provides

More information

Parallel Algorithm Engineering

Parallel Algorithm Engineering Parallel Algorithm Engineering Kenneth S. Bøgh PhD Fellow Based on slides by Darius Sidlauskas Outline Background Current multicore architectures UMA vs NUMA The openmp framework Examples Software crisis

More information

Power Efficiency Comparison: Cisco UCS 5108 Blade Server Chassis and IBM FlexSystem Enterprise Chassis

Power Efficiency Comparison: Cisco UCS 5108 Blade Server Chassis and IBM FlexSystem Enterprise Chassis White Paper Power Efficiency Comparison: Cisco UCS 5108 Blade Server Chassis and IBM FlexSystem Enterprise Chassis White Paper March 2014 2014 Cisco and/or its affiliates. All rights reserved. This document

More information

Operating Systems 4 th Class

Operating Systems 4 th Class Operating Systems 4 th Class Lecture 1 Operating Systems Operating systems are essential part of any computer system. Therefore, a course in operating systems is an essential part of any computer science

More information

Technical Paper. Moving SAS Applications from a Physical to a Virtual VMware Environment

Technical Paper. Moving SAS Applications from a Physical to a Virtual VMware Environment Technical Paper Moving SAS Applications from a Physical to a Virtual VMware Environment Release Information Content Version: April 2015. Trademarks and Patents SAS Institute Inc., SAS Campus Drive, Cary,

More information

An Implementation Of Multiprocessor Linux

An Implementation Of Multiprocessor Linux An Implementation Of Multiprocessor Linux This document describes the implementation of a simple SMP Linux kernel extension and how to use this to develop SMP Linux kernels for architectures other than

More information

The Shortcut Guide to Balancing Storage Costs and Performance with Hybrid Storage

The Shortcut Guide to Balancing Storage Costs and Performance with Hybrid Storage The Shortcut Guide to Balancing Storage Costs and Performance with Hybrid Storage sponsored by Dan Sullivan Chapter 1: Advantages of Hybrid Storage... 1 Overview of Flash Deployment in Hybrid Storage Systems...

More information

2

2 1 2 3 4 5 For Description of these Features see http://download.intel.com/products/processor/corei7/prod_brief.pdf The following Features Greatly affect Performance Monitoring The New Performance Monitoring

More information

Virtuoso and Database Scalability

Virtuoso and Database Scalability Virtuoso and Database Scalability By Orri Erling Table of Contents Abstract Metrics Results Transaction Throughput Initializing 40 warehouses Serial Read Test Conditions Analysis Working Set Effect of

More information

Delivering Quality in Software Performance and Scalability Testing

Delivering Quality in Software Performance and Scalability Testing Delivering Quality in Software Performance and Scalability Testing Abstract Khun Ban, Robert Scott, Kingsum Chow, and Huijun Yan Software and Services Group, Intel Corporation {khun.ban, robert.l.scott,

More information

Embedded Parallel Computing

Embedded Parallel Computing Embedded Parallel Computing Lecture 5 - The anatomy of a modern multiprocessor, the multicore processors Tomas Nordström Course webpage:: Course responsible and examiner: Tomas

More information

A Powerful solution for next generation Pcs

A Powerful solution for next generation Pcs Product Brief 6th Generation Intel Core Desktop Processors i7-6700k and i5-6600k 6th Generation Intel Core Desktop Processors i7-6700k and i5-6600k A Powerful solution for next generation Pcs Looking for

More information

SAS Business Analytics. Base SAS for SAS 9.2

SAS Business Analytics. Base SAS for SAS 9.2 Performance & Scalability of SAS Business Analytics on an NEC Express5800/A1080a (Intel Xeon 7500 series-based Platform) using Red Hat Enterprise Linux 5 SAS Business Analytics Base SAS for SAS 9.2 Red

More information

AMD PhenomII. Architecture for Multimedia System -2010. Prof. Cristina Silvano. Group Member: Nazanin Vahabi 750234 Kosar Tayebani 734923

AMD PhenomII. Architecture for Multimedia System -2010. Prof. Cristina Silvano. Group Member: Nazanin Vahabi 750234 Kosar Tayebani 734923 AMD PhenomII Architecture for Multimedia System -2010 Prof. Cristina Silvano Group Member: Nazanin Vahabi 750234 Kosar Tayebani 734923 Outline Introduction Features Key architectures References AMD Phenom

More information

CPU Scheduling Outline

CPU Scheduling Outline CPU Scheduling Outline What is scheduling in the OS? What are common scheduling criteria? How to evaluate scheduling algorithms? What are common scheduling algorithms? How is thread scheduling different

More information

CPU performance monitoring using the Time-Stamp Counter register

CPU performance monitoring using the Time-Stamp Counter register CPU performance monitoring using the Time-Stamp Counter register This laboratory work introduces basic information on the Time-Stamp Counter CPU register, which is used for performance monitoring. The

More information

0408 - Avoid Paying The Virtualization Tax: Deploying Virtualized BI 4.0 The Right Way. Ashish C. Morzaria, SAP

0408 - Avoid Paying The Virtualization Tax: Deploying Virtualized BI 4.0 The Right Way. Ashish C. Morzaria, SAP 0408 - Avoid Paying The Virtualization Tax: Deploying Virtualized BI 4.0 The Right Way Ashish C. Morzaria, SAP LEARNING POINTS Understanding the Virtualization Tax : What is it, how it affects you How

More information

Power Efficiency Comparison: Cisco UCS 5108 Blade Server Chassis and Dell PowerEdge M1000e Blade Enclosure

Power Efficiency Comparison: Cisco UCS 5108 Blade Server Chassis and Dell PowerEdge M1000e Blade Enclosure White Paper Power Efficiency Comparison: Cisco UCS 5108 Blade Server Chassis and Dell PowerEdge M1000e Blade Enclosure White Paper March 2014 2014 Cisco and/or its affiliates. All rights reserved. This

More information

Performance Tuning Guidelines for PowerExchange for Microsoft Dynamics CRM

Performance Tuning Guidelines for PowerExchange for Microsoft Dynamics CRM Performance Tuning Guidelines for PowerExchange for Microsoft Dynamics CRM 1993-2016 Informatica LLC. No part of this document may be reproduced or transmitted in any form, by any means (electronic, photocopying,

More information

Measuring Cache and Memory Latency and CPU to Memory Bandwidth

Measuring Cache and Memory Latency and CPU to Memory Bandwidth White Paper Joshua Ruggiero Computer Systems Engineer Intel Corporation Measuring Cache and Memory Latency and CPU to Memory Bandwidth For use with Intel Architecture December 2008 1 321074 Executive Summary

More information

Run-Time Scheduling Support for Hybrid CPU/FPGA SoCs

Run-Time Scheduling Support for Hybrid CPU/FPGA SoCs Run-Time Scheduling Support for Hybrid CPU/FPGA SoCs Jason Agron jagron@ittc.ku.edu Acknowledgements I would like to thank Dr. Andrews, Dr. Alexander, and Dr. Sass for assistance and advice in both research

More information

An Oracle White Paper July 2011. Oracle Primavera Contract Management, Business Intelligence Publisher Edition-Sizing Guide

An Oracle White Paper July 2011. Oracle Primavera Contract Management, Business Intelligence Publisher Edition-Sizing Guide Oracle Primavera Contract Management, Business Intelligence Publisher Edition-Sizing Guide An Oracle White Paper July 2011 1 Disclaimer The following is intended to outline our general product direction.

More information

Agenda. Context. System Power Management Issues. Power Capping Overview. Power capping participants. Recommendations

Agenda. Context. System Power Management Issues. Power Capping Overview. Power capping participants. Recommendations Power Capping Linux Agenda Context System Power Management Issues Power Capping Overview Power capping participants Recommendations Introduction of Linux Power Capping Framework 2 Power Hungry World Worldwide,

More information

Building Applications Using Micro Focus COBOL

Building Applications Using Micro Focus COBOL Building Applications Using Micro Focus COBOL Abstract If you look through the Micro Focus COBOL documentation, you will see many different executable file types referenced: int, gnt, exe, dll and others.

More information

Multiprocessor Scheduling and Scheduling in Linux Kernel 2.6

Multiprocessor Scheduling and Scheduling in Linux Kernel 2.6 Multiprocessor Scheduling and Scheduling in Linux Kernel 2.6 Winter Term 2008 / 2009 Jun.-Prof. Dr. André Brinkmann Andre.Brinkmann@uni-paderborn.de Universität Paderborn PC² Agenda Multiprocessor and

More information

SYSTEM ecos Embedded Configurable Operating System

SYSTEM ecos Embedded Configurable Operating System BELONGS TO THE CYGNUS SOLUTIONS founded about 1989 initiative connected with an idea of free software ( commercial support for the free software ). Recently merged with RedHat. CYGNUS was also the original

More information

In-Memory Databases Algorithms and Data Structures on Modern Hardware. Martin Faust David Schwalb Jens Krüger Jürgen Müller

In-Memory Databases Algorithms and Data Structures on Modern Hardware. Martin Faust David Schwalb Jens Krüger Jürgen Müller In-Memory Databases Algorithms and Data Structures on Modern Hardware Martin Faust David Schwalb Jens Krüger Jürgen Müller The Free Lunch Is Over 2 Number of transistors per CPU increases Clock frequency

More information

ClearPath MCP Software Series Compatibility Guide

ClearPath MCP Software Series Compatibility Guide ClearPath Software Series Compatibility Guide Overview The ClearPath Software Series is designed to deliver new cost and performance competitive attributes and to continue to advance environment attributes

More information

HyperThreading Support in VMware ESX Server 2.1

HyperThreading Support in VMware ESX Server 2.1 HyperThreading Support in VMware ESX Server 2.1 Summary VMware ESX Server 2.1 now fully supports Intel s new Hyper-Threading Technology (HT). This paper explains the changes that an administrator can expect

More information

Enhancing SQL Server Performance

Enhancing SQL Server Performance Enhancing SQL Server Performance Bradley Ball, Jason Strate and Roger Wolter In the ever-evolving data world, improving database performance is a constant challenge for administrators. End user satisfaction

More information

Which ARM Cortex Core Is Right for Your Application: A, R or M?

Which ARM Cortex Core Is Right for Your Application: A, R or M? Which ARM Cortex Core Is Right for Your Application: A, R or M? Introduction The ARM Cortex series of cores encompasses a very wide range of scalable performance options offering designers a great deal

More information

Lecture 11: Multi-Core and GPU. Multithreading. Integration of multiple processor cores on a single chip.

Lecture 11: Multi-Core and GPU. Multithreading. Integration of multiple processor cores on a single chip. Lecture 11: Multi-Core and GPU Multi-core computers Multithreading GPUs General Purpose GPUs Zebo Peng, IDA, LiTH 1 Multi-Core System Integration of multiple processor cores on a single chip. To provide

More information

Parallel Programming Survey

Parallel Programming Survey Christian Terboven 02.09.2014 / Aachen, Germany Stand: 26.08.2014 Version 2.3 IT Center der RWTH Aachen University Agenda Overview: Processor Microarchitecture Shared-Memory

More information

Real-time KVM from the ground up

Real-time KVM from the ground up Real-time KVM from the ground up KVM Forum 2015 Rik van Riel Red Hat Real-time KVM What is real time? Hardware pitfalls Realtime preempt Linux kernel patch set KVM & qemu pitfalls KVM configuration Scheduling

More information

Intel 64 and IA-32 Architectures Software Developer s Manual

Intel 64 and IA-32 Architectures Software Developer s Manual Intel 64 and IA-32 Architectures Software Developer s Manual Volume 3B: System Programming Guide, Part 2 NOTE: The Intel 64 and IA-32 Architectures Software Developer's Manual consists of eight volumes:

More information

RevoScaleR Speed and Scalability

RevoScaleR Speed and Scalability EXECUTIVE WHITE PAPER RevoScaleR Speed and Scalability By Lee Edlefsen Ph.D., Chief Scientist, Revolution Analytics Abstract RevoScaleR, the Big Data predictive analytics library included with Revolution

More information

CHAPTER 1 INTRODUCTION

CHAPTER 1 INTRODUCTION 1 CHAPTER 1 INTRODUCTION 1.1 MOTIVATION OF RESEARCH Multicore processors have two or more execution cores (processors) implemented on a single chip having their own set of execution and architectural recourses.

More information

Unifying Information Security

Unifying Information Security Unifying Information Security CLEARSWIFT SECURE Gateways VMware Deployment Guide Version 3.3 1 Introduction The Clearswift SECURE Web and Email Gateways are multi-process, multi-threaded applications,

More information

Intel Core i3-2310m Processor (3M Cache, 2.10 GHz)

Intel Core i3-2310m Processor (3M Cache, 2.10 GHz) Intel Core i3-2310m Processor All Essentials Memory Specifications Essentials Status Launched Compare w (0) Graphics Specifications Launch Date Q1'11 Expansion Options Package Specifications Advanced Technologies

More information

10.04.2008. Thomas Fahrig Senior Developer Hypervisor Team. Hypervisor Architecture Terminology Goals Basics Details

10.04.2008. Thomas Fahrig Senior Developer Hypervisor Team. Hypervisor Architecture Terminology Goals Basics Details Thomas Fahrig Senior Developer Hypervisor Team Hypervisor Architecture Terminology Goals Basics Details Scheduling Interval External Interrupt Handling Reserves, Weights and Caps Context Switch Waiting

More information

361 Computer Architecture Lecture 14: Cache Memory

361 Computer Architecture Lecture 14: Cache Memory 1 361 Computer Architecture Lecture 14 Memory cache.1 The Motivation for s Memory System Processor DRAM Motivation Large memories (DRAM) are slow Small memories (SRAM) are fast Make the average access

More information

Magento & Zend Benchmarks Version 1.2, 1.3 (with & without Flat Catalogs)

Magento & Zend Benchmarks Version 1.2, 1.3 (with & without Flat Catalogs) Magento & Zend Benchmarks Version 1.2, 1.3 (with & without Flat Catalogs) 1. Foreword Magento is a PHP/Zend application which intensively uses the CPU. Since version 1.1.6, each new version includes some

More information

Parallels Virtuozzo Containers

Parallels Virtuozzo Containers Parallels Virtuozzo Containers White Paper Top Ten Considerations For Choosing A Server Virtualization Technology www.parallels.com Version 1.0 Table of Contents Introduction... 3 Technology Overview...

More information

Informatica Ultra Messaging SMX Shared-Memory Transport

Informatica Ultra Messaging SMX Shared-Memory Transport White Paper Informatica Ultra Messaging SMX Shared-Memory Transport Breaking the 100-Nanosecond Latency Barrier with Benchmark-Proven Performance This document contains Confidential, Proprietary and Trade

More information

Know your Cluster Bottlenecks and Maximize Performance

Know your Cluster Bottlenecks and Maximize Performance Know your Cluster Bottlenecks and Maximize Performance Hands-on training March 2013 Agenda Overview Performance Factors General System Configuration - PCI Express (PCIe) Capabilities - Memory Configuration

More information

Low Power AMD Athlon 64 and AMD Opteron Processors

Low Power AMD Athlon 64 and AMD Opteron Processors Low Power AMD Athlon 64 and AMD Opteron Processors Hot Chips 2004 Presenter: Marius Evers Block Diagram of AMD Athlon 64 and AMD Opteron Based on AMD s 8 th generation architecture AMD Athlon 64 and AMD

More information

Parallel Processing and Software Performance. Lukáš Marek

Parallel Processing and Software Performance. Lukáš Marek Parallel Processing and Software Performance Lukáš Marek DISTRIBUTED SYSTEMS RESEARCH GROUP http://dsrg.mff.cuni.cz CHARLES UNIVERSITY PRAGUE Faculty of Mathematics and Physics Benchmarking in parallel

More information

Resource Utilization of Middleware Components in Embedded Systems

Resource Utilization of Middleware Components in Embedded Systems Resource Utilization of Middleware Components in Embedded Systems 3 Introduction System memory, CPU, and network resources are critical to the operation and performance of any software system. These system

More information

9/26/2011. What is Virtualization? What are the different types of virtualization.

9/26/2011. What is Virtualization? What are the different types of virtualization. CSE 501 Monday, September 26, 2011 Kevin Cleary kpcleary@buffalo.edu What is Virtualization? What are the different types of virtualization. Practical Uses Popular virtualization products Demo Question,

More information

Hitachi Virtage Embedded Virtualization Hitachi BladeSymphony 10U

Hitachi Virtage Embedded Virtualization Hitachi BladeSymphony 10U Hitachi Virtage Embedded Virtualization Hitachi BladeSymphony 10U Datasheet Brings the performance and reliability of mainframe virtualization to blade computing BladeSymphony is the first true enterprise-class

More information

Real-Time Operating Systems for MPSoCs

Real-Time Operating Systems for MPSoCs Real-Time Operating Systems for MPSoCs Hiroyuki Tomiyama Graduate School of Information Science Nagoya University http://member.acm.org/~hiroyuki MPSoC 2009 1 Contributors Hiroaki Takada Director and Professor

More information

IA-64 Application Developer s Architecture Guide

IA-64 Application Developer s Architecture Guide IA-64 Application Developer s Architecture Guide The IA-64 architecture was designed to overcome the performance limitations of today s architectures and provide maximum headroom for the future. To achieve

More information

AirWave 7.7. Server Sizing Guide

AirWave 7.7. Server Sizing Guide AirWave 7.7 Server Sizing Guide Copyright 2013 Aruba Networks, Inc. Aruba Networks trademarks include, Aruba Networks, Aruba Wireless Networks, the registered Aruba the Mobile Edge Company logo, Aruba

More information

Virtual SAN Design and Deployment Guide

Virtual SAN Design and Deployment Guide Virtual SAN Design and Deployment Guide TECHNICAL MARKETING DOCUMENTATION VERSION 1.3 - November 2014 Copyright 2014 DataCore Software All Rights Reserved Table of Contents INTRODUCTION... 3 1.1 DataCore

More information

Multilevel Load Balancing in NUMA Computers

Multilevel Load Balancing in NUMA Computers FACULDADE DE INFORMÁTICA PUCRS - Brazil http://www.pucrs.br/inf/pos/ Multilevel Load Balancing in NUMA Computers M. Corrêa, R. Chanin, A. Sales, R. Scheer, A. Zorzo Technical Report Series Number 049 July,

More information

Performance Testing. Configuration Parameters for Performance Testing

Performance Testing. Configuration Parameters for Performance Testing Optimizing an ecommerce site for performance on a global scale requires additional oversight, budget, dedicated technical resources, local expertise, and specialized vendor solutions to ensure that international

More information

Hardware Compatibility List

Hardware Compatibility List The devices in the following list are approved for use with SoundCheck. Other devices may be compatible but have not been tested and verified as compatible. We do not support any hardware that is not listed

More information

Multicore Programming with LabVIEW Technical Resource Guide

Multicore Programming with LabVIEW Technical Resource Guide Multicore Programming with LabVIEW Technical Resource Guide 2 INTRODUCTORY TOPICS UNDERSTANDING PARALLEL HARDWARE: MULTIPROCESSORS, HYPERTHREADING, DUAL- CORE, MULTICORE AND FPGAS... 5 DIFFERENCES BETWEEN

More information

DIABLO TECHNOLOGIES MEMORY CHANNEL STORAGE AND VMWARE VIRTUAL SAN : VDI ACCELERATION

DIABLO TECHNOLOGIES MEMORY CHANNEL STORAGE AND VMWARE VIRTUAL SAN : VDI ACCELERATION DIABLO TECHNOLOGIES MEMORY CHANNEL STORAGE AND VMWARE VIRTUAL SAN : VDI ACCELERATION A DIABLO WHITE PAPER AUGUST 2014 Ricky Trigalo Director of Business Development Virtualization, Diablo Technologies

More information

Real-time processing the basis for PC Control

Real-time processing the basis for PC Control Beckhoff real-time kernels for DOS, Windows, Embedded OS and multi-core CPUs Real-time processing the basis for PC Control Beckhoff employs Microsoft operating systems for its PCbased control technology.

More information

The Motherboard Chapter #5

The Motherboard Chapter #5 The Motherboard Chapter #5 Amy Hissom Key Terms Advanced Transfer Cache (ATC) A type of L2 cache contained within the Pentium processor housing that is embedded on the same core processor die as the CPU

More information

Reducing Cost and Complexity with Industrial System Consolidation

Reducing Cost and Complexity with Industrial System Consolidation WHITE PAPER Multi- Virtualization Technology Industrial Automation Reducing Cost and Complexity with Industrial System Consolidation Virtualization on multi-core Intel vpro processors helps lower overall

More information

Operating System Impact on SMT Architecture

Operating System Impact on SMT Architecture Operating System Impact on SMT Architecture The work published in An Analysis of Operating System Behavior on a Simultaneous Multithreaded Architecture, Josh Redstone et al., in Proceedings of the 9th

More information

Operating System Overview. Otto J. Anshus

Operating System Overview. Otto J. Anshus Operating System Overview Otto J. Anshus A Typical Computer CPU... CPU Memory Chipset I/O bus ROM Keyboard Network A Typical Computer System CPU. CPU Memory Application(s) Operating System ROM OS Apps

More information

Memory-Centric Database Acceleration

Memory-Centric Database Acceleration Memory-Centric Database Acceleration Achieving an Order of Magnitude Increase in Database Performance A FedCentric Technologies White Paper September 2007 Executive Summary Businesses are facing daunting

More information

MODULE 3 VIRTUALIZED DATA CENTER COMPUTE

MODULE 3 VIRTUALIZED DATA CENTER COMPUTE MODULE 3 VIRTUALIZED DATA CENTER COMPUTE Module 3: Virtualized Data Center Compute Upon completion of this module, you should be able to: Describe compute virtualization Discuss the compute virtualization

More information

Real-Time Systems Prof. Dr. Rajib Mall Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur

Real-Time Systems Prof. Dr. Rajib Mall Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Real-Time Systems Prof. Dr. Rajib Mall Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture No. # 26 Real - Time POSIX. (Contd.) Ok Good morning, so let us get

More information

IBM CELL CELL INTRODUCTION. Project made by: Origgi Alessandro matr. 682197 Teruzzi Roberto matr. 682552 IBM CELL. Politecnico di Milano Como Campus

IBM CELL CELL INTRODUCTION. Project made by: Origgi Alessandro matr. 682197 Teruzzi Roberto matr. 682552 IBM CELL. Politecnico di Milano Como Campus Project made by: Origgi Alessandro matr. 682197 Teruzzi Roberto matr. 682552 CELL INTRODUCTION 2 1 CELL SYNERGY Cell is not a collection of different processors, but a synergistic whole Operation paradigms,

More information

Nexenta Performance Scaling for Speed and Cost

Nexenta Performance Scaling for Speed and Cost Nexenta Performance Scaling for Speed and Cost Key Features Optimize Performance Optimize Performance NexentaStor improves performance for all workloads by adopting commodity components and leveraging

More information

QLIKVIEW ARCHITECTURE AND SYSTEM RESOURCE USAGE

QLIKVIEW ARCHITECTURE AND SYSTEM RESOURCE USAGE QLIKVIEW ARCHITECTURE AND SYSTEM RESOURCE USAGE QlikView Technical Brief April 2011 www.qlikview.com Introduction This technical brief covers an overview of the QlikView product components and architecture

More information

Advanced Computer Architecture-CS501. Computer Systems Design and Architecture 2.1, 2.2, 3.2

Advanced Computer Architecture-CS501. Computer Systems Design and Architecture 2.1, 2.2, 3.2 Lecture Handout Computer Architecture Lecture No. 2 Reading Material Vincent P. Heuring&Harry F. Jordan Chapter 2,Chapter3 Computer Systems Design and Architecture 2.1, 2.2, 3.2 Summary 1) A taxonomy of

More information

Putting it all together: Intel Nehalem. http://www.realworldtech.com/page.cfm?articleid=rwt040208182719

Putting it all together: Intel Nehalem. http://www.realworldtech.com/page.cfm?articleid=rwt040208182719 Putting it all together: Intel Nehalem http://www.realworldtech.com/page.cfm?articleid=rwt040208182719 Intel Nehalem Review entire term by looking at most recent microprocessor from Intel Nehalem is code

More information

Fall 2009. Lecture 1. Operating Systems: Configuration & Use CIS345. Introduction to Operating Systems. Mostafa Z. Ali. mzali@just.edu.

Fall 2009. Lecture 1. Operating Systems: Configuration & Use CIS345. Introduction to Operating Systems. Mostafa Z. Ali. mzali@just.edu. Fall 2009 Lecture 1 Operating Systems: Configuration & Use CIS345 Introduction to Operating Systems Mostafa Z. Ali mzali@just.edu.jo 1-1 Chapter 1 Introduction to Operating Systems An Overview of Microcomputers

More information

Rambus Smart Data Acceleration

Rambus Smart Data Acceleration Rambus Smart Data Acceleration Back to the Future Memory and Data Access: The Final Frontier As an industry, if real progress is to be made towards the level of computing that the future mandates, then

More information

An Overview of Stack Architecture and the PSC 1000 Microprocessor

An Overview of Stack Architecture and the PSC 1000 Microprocessor An Overview of Stack Architecture and the PSC 1000 Microprocessor Introduction A stack is an important data handling structure used in computing. Specifically, a stack is a dynamic set of elements in which

More information

Application Performance Testing Basics

Application Performance Testing Basics Application Performance Testing Basics ABSTRACT Todays the web is playing a critical role in all the business domains such as entertainment, finance, healthcare etc. It is much important to ensure hassle-free

More information

Computer Organization and Components

Computer Organization and Components Computer Organization and Components IS5, fall 25 Lecture : Pipelined Processors ssociate Professor, KTH Royal Institute of Technology ssistant Research ngineer, University of California, Berkeley Slides

More information

what operations can it perform? how does it perform them? on what kind of data? where are instructions and data stored?

what operations can it perform? how does it perform them? on what kind of data? where are instructions and data stored? Inside the CPU how does the CPU work? what operations can it perform? how does it perform them? on what kind of data? where are instructions and data stored? some short, boring programs to illustrate the

More information

Performance Management for Cloudbased STC 2012

Performance Management for Cloudbased STC 2012 Performance Management for Cloudbased Applications STC 2012 1 Agenda Context Problem Statement Cloud Architecture Need for Performance in Cloud Performance Challenges in Cloud Generic IaaS / PaaS / SaaS

More information

Improve Business Productivity and User Experience with a SanDisk Powered SQL Server 2014 In-Memory OLTP Database

Improve Business Productivity and User Experience with a SanDisk Powered SQL Server 2014 In-Memory OLTP Database WHITE PAPER Improve Business Productivity and User Experience with a SanDisk Powered SQL Server 2014 In-Memory OLTP Database 951 SanDisk Drive, Milpitas, CA 95035 www.sandisk.com Table of Contents Executive

More information

A Brief Tutorial on Power Management in Computer Systems. David Chalupsky, Emily Qi, & Ilango Ganga Intel Corporation March 13, 2007

A Brief Tutorial on Power Management in Computer Systems. David Chalupsky, Emily Qi, & Ilango Ganga Intel Corporation March 13, 2007 A Brief Tutorial on Power Management in Computer Systems David Chalupsky, Emily Qi, & Ilango Ganga Intel Corporation March 13, 2007 Objective & Agenda Objective: Establish a common foundation for EEESG

More information

LOOKING FOR AN AMAZING PROCESSOR. Product Brief 6th Gen Intel Core Processors for Desktops: S-series

LOOKING FOR AN AMAZING PROCESSOR. Product Brief 6th Gen Intel Core Processors for Desktops: S-series Product Brief 6th Gen Intel Core Processors for Desktops: Sseries LOOKING FOR AN AMAZING PROCESSOR for your next desktop PC? Look no further than 6th Gen Intel Core processors. With amazing performance

More information

FUSION iocontrol HYBRID STORAGE ARCHITECTURE 1 WWW.FUSIONIO.COM

FUSION iocontrol HYBRID STORAGE ARCHITECTURE 1 WWW.FUSIONIO.COM 1 WWW.FUSIONIO.COM FUSION iocontrol HYBRID STORAGE ARCHITECTURE Contents Contents... 2 1 The Storage I/O and Management Gap... 3 2 Closing the Gap with Fusion-io... 4 2.1 Flash storage, the Right Way...

More information

CS:APP Chapter 4 Computer Architecture. Wrap-Up. William J. Taffe Plymouth State University. using the slides of

CS:APP Chapter 4 Computer Architecture. Wrap-Up. William J. Taffe Plymouth State University. using the slides of CS:APP Chapter 4 Computer Architecture Wrap-Up William J. Taffe Plymouth State University using the slides of Randal E. Bryant Carnegie Mellon University Overview Wrap-Up of PIPE Design Performance analysis

More information

You re not alone if you re feeling pressure

You re not alone if you re feeling pressure How the Right Infrastructure Delivers Real SQL Database Virtualization Benefits The amount of digital data stored worldwide stood at 487 billion gigabytes as of May 2009, and data volumes are doubling

More information

Technical Note. Micron NAND Flash Controller via Xilinx Spartan -3 FPGA. Overview. TN-29-06: NAND Flash Controller on Spartan-3 Overview

Technical Note. Micron NAND Flash Controller via Xilinx Spartan -3 FPGA. Overview. TN-29-06: NAND Flash Controller on Spartan-3 Overview Technical Note TN-29-06: NAND Flash Controller on Spartan-3 Overview Micron NAND Flash Controller via Xilinx Spartan -3 FPGA Overview As mobile product capabilities continue to expand, so does the demand

More information