Process-Level Virtualization for Runtime Adaptation of Embedded Software
Kim Hazelwood
Department of Computer Science
University of Virginia

ABSTRACT

Modern processor architectures call for software that is highly tuned to an unpredictable operating environment. Process-level virtualization systems allow existing software to adapt to the operating environment, including resource contention and other dynamic events, by modifying the application instructions at runtime. While these systems are becoming widespread in the general-purpose computing communities, various challenges have prevented widespread adoption on resource-constrained devices, with memory and performance overheads being at the forefront. In this paper, we discuss the advantages and opportunities of runtime adaptation of embedded software. We also describe some of the existing dynamic binary modification tools that can be used to perform runtime adaptation, and discuss the challenges of balancing memory overheads and performance when developing these tools for embedded platforms.

Categories and Subject Descriptors: C.3 [Computer Systems Organization]: Special-Purpose and Application-Based Systems (real-time and embedded systems); D.3.4 [Programming Languages]: Processors (optimization, run-time environments)

General Terms: Design, Experimentation, Measurement, Performance

Keywords: embedded systems, virtualization software, runtime adaptation, dynamic binary optimization

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. DAC '11, San Diego, California, USA. Copyright 2011 ACM.

1. INTRODUCTION

The recent proliferation of processor architectures that incorporate multiple and potentially heterogeneous processing elements has resulted in an unintended consequence. More now than ever before, software developers must factor in the fine details of the underlying architecture in order to achieve improved performance on the latest generation of microprocessors. Furthermore, runtime inefficiencies inevitably arise due to the unpredictable environment that a given application is exposed to at runtime. For instance, the fact that multiple processing elements share resources, such as caches or buses, means that significant contention may arise at runtime that a software developer could never have predicted; in fact, software development time is the wrong time to try to predict such runtime contention. In addition, the specific load on any accelerator or graphics processor will often determine whether a given calculation should occur on the main CPU or any accelerator hardware. What we want is software that is capable of adapting to the runtime environment that it encounters, and system and hardware support for making this runtime adaptation possible.

While the general-purpose computing community has begun to embrace the importance and potential of runtime adaptation, the need to adapt to the runtime environment is especially important in the case of embedded systems. Aside from the new complexities introduced by multiple cores and heterogeneity, several longstanding challenges have always existed on embedded systems that make them the ideal platform for adaptation. Power, energy, and battery life concerns are often at the forefront of the challenges for embedded systems, and these concerns often trump performance as a design constraint. These factors are in a constant state of flux. Meanwhile, at the software level, the specific implementation of a software application tends to be fixed at runtime regardless of the dramatic changes that may occur in terms of energy or battery life.
Often, it is possible to implement an alternative, lower-energy calculation in place of the standard implementation, but developers would not want to (and should not) implement the lower-energy option permanently, as it would undoubtedly suffer performance consequences, and performance will always remain a concern, often for correctness reasons (e.g., real-time applications).

Process-level virtualization [10] is a promising opportunity for enabling runtime adaptation of software. By placing a virtualization layer between the running application and the underlying system (be it an operating system or simply bare metal), there is the opportunity to inspect and potentially modify every instruction that executes on the system. This includes shared library code, dynamically-generated code, and even self-modifying code. Changes to the code can be as simple as inserting extra profiling instructions or as complex as translating all instructions from one architecture to another. Meanwhile, all of these changes can be made transparent to the application, such that any introspection provides identical (or at least similar enough) results to a native execution, including bug-for-bug compatibility.

Virtualizing a single process at a time has several distinct advantages over virtualizing the entire system and all running applications (below an operating system, for instance). The dynamic-compilation-oriented approach means that the virtualization layer has a global view of the code of the running application, including information about potential execution phases and events, even before they occur. The ability to focus on application-specific profiles combined with environmental information simplifies the task of adaptation.

Runtime adaptation using process-level virtualization comes with a set of distinct challenges, however, and the challenges are especially acute in the embedded systems domain. The act of modifying any application requires processing resources, and when the modification is performed at runtime, this modification competes for resources with the running application itself. This overhead can be difficult to overcome. As a result, the motivation for adapting the software must be worth the overhead incurred to perform the adaptation, and only in rare cases will a motivation of performance improvement alone actually pay off. The memory footprint of the virtualization software is another major challenge. In many cases, the virtualization software has been known to greatly exceed the size of the guest application, particularly since many known optimizations trade memory overhead for speed. Even for the case where the virtualization system can be designed to be compact and efficient enough for an embedded platform, an additional challenge exists that is specific to systems that virtualize at the process level. A system that supports parallel multitasking will potentially have multiple applications and therefore multiple virtualization engines running at once.
This will result in significant memory and performance overheads that make it difficult to scale the solution to a large number of concurrent applications on multiple cores (although an argument can be made that scaling along this dimension is a low priority for embedded systems, at least for the near future).

Along with each of these challenges is a set of unique opportunities. It is much more feasible to provide hardware and system support for runtime adaptation in the embedded domain than it is in the general-purpose computing domain for a variety of reasons including, but not limited to, the rapid hardware design cycle. In the general-purpose community, there is significant pressure for virtualization systems to run on stock hardware and systems software. In fact, the ideal case of developing adaptive systems by co-designing the hardware and software layers would be far too disruptive for general-purpose computing, while it is the norm for embedded systems.

Figure 1: This figure demonstrates the problem of contention for shared resources. When the applications run alone on a multicore processor, they perform up to 3X faster than when other applications are running on the other cores.

The remaining sections of this paper elaborate on the benefits and challenges of building adaptive software on embedded systems. Section 2 discusses some motivating cases where adapting software to the operating environment can be very beneficial. Section 3 discusses some software systems that make runtime adaptation of running processes possible. These systems perform dynamic binary modification to achieve the goal of adapting legacy software. Section 4 discusses several of the challenges encountered when building dynamic binary modification tools for embedded systems, while also presenting some of the solutions that have been discussed and/or implemented.
Finally, Section 5 summarizes and makes the case for additional research into adaptive software for embedded systems.

2. ADAPTING EMBEDDED SOFTWARE

Building software that can adapt to an ever-changing runtime environment is particularly important for embedded systems. Battery life, power considerations, security, and code compatibility between dramatically changing architectures are all first-order design constraints in this domain. In this section, we describe several scenarios that call for adaptive software solutions.

2.1 Shared Resource Contention

Most modern architectures incorporate several processing elements on the same die, and the resulting processors are termed multicore processors. One feature of nearly all known multicore designs is that they incorporate shared structures between the multiple processing cores. Some cores will share an L2 cache; others will share a bus. The problem that arises is that running applications can now be significantly affected by the other applications that happen to be running on a given machine. Meanwhile, the application was designed and optimized assuming an unloaded system (e.g., the memory and cache placement has not been designed to play nicely with other applications). While it would be possible to build software that plays nicely with other applications, we really want software to be greedy whenever it can, and to play nicely when it has to.

Figure 1 demonstrates the significant impact of resource contention in multicore systems. For the SPEC 2006 integer benchmark suite, the performance of a particular application can vary by up to 3X depending on whether the application was run in isolation or alongside other applications. Note that all applications were pinned to their respective cores, so the applications were not competing for the cores themselves, but instead for the shared structures between the cores. (While these particular results were gathered on a general-purpose machine, resource contention should be expected to occur on any processor where structures are shared between cores.) Since it is nearly impossible to predict, at design time, whether (and to what extent) other tasks will be running alongside a given application, the issue of resource contention should be managed at runtime. Meanwhile, an ideal place to manage resource contention is in a layer that is acutely aware of the details of the running application.

Figure 2: This figure demonstrates the manufacturing heterogeneity that can occur between cores on the same chip. While the temperature pattern observed while running the bwaves benchmark followed similar trends, the raw temperatures varied by 5°C.

2.2 Processor Heterogeneity

The challenge of adapting to multiple processing cores is difficult enough even if we assume identical capabilities between the various cores. The world gets even more complicated, however, if we consider that those processing cores may vary in several dimensions [5]. Heterogeneity is cropping up in modern multicore systems for two fundamental reasons. First, manufacturing defects often result in seemingly identical cores that in reality have wildly varying behavior in terms of performance, power, or even correctness. Figure 2 illustrates the temperature heterogeneity observed between cores on the same chip: core 0 consistently ran hotter than core 1, even though temperatures should have been identical. Second, many multicore systems are being designed with one or more specialized processing elements or accelerators, often right on the same chip. For both causes of processor heterogeneity, the challenge is scheduling tasks on the available processing resources without a priori knowledge about availability and capability.
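As a loose illustration of this kind of runtime decision, the choice among code variants can be deferred until the environment has been probed. The Python sketch below is purely conceptual: the variant names and the load probe are invented, and a real adaptation engine would of course operate on machine code rather than Python functions.

```python
# Minimal sketch (not from the paper): dispatching among per-target code
# variants at runtime. Variant names and the load probe are hypothetical.

def sum_squares_cpu(xs):
    # Baseline variant assumed to run on the main core.
    return sum(x * x for x in xs)

def sum_squares_accel(xs):
    # Stand-in for a variant regenerated for an accelerator ISA.
    total = 0
    for x in xs:
        total += x * x
    return total

VARIANTS = {"cpu": sum_squares_cpu, "accel": sum_squares_accel}

def probe_accelerator_load():
    # Placeholder: a real engine would query occupancy or contention here.
    return 0.9  # pretend the accelerator is currently busy

def dispatch(xs):
    # Pick a variant using information only available at runtime.
    target = "cpu" if probe_accelerator_load() > 0.5 else "accel"
    return VARIANTS[target](xs)
```

Both variants compute the same result, so the dispatcher is free to switch between them as occupancy changes; correctness is preserved while the placement decision stays dynamic.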
It also becomes necessary to ensure that the software instructions themselves support relocation from one core to another, particularly in the case of accelerators that feature a different instruction set than the main core. While static assignments are possible, they are not resilient to the effects of runtime occupancy and contention for processing resources. Meanwhile, shipping multi-versioned code that is capable of running on either device is likely to suffer from memory consumption concerns. A runtime adaptation engine can dynamically regenerate the instructions to run correctly and efficiently on a variety of hardware designs.

2.3 Power-Aware Computing

Battery life, temperature, and reliability concerns are all first-class design constraints for embedded systems, and all can be categorized as a form of power-aware computing. Unfortunately, it is difficult, if not impossible, to predict whether any of these concerns will become critical when designing and optimizing software. Meanwhile, there are steps that can be taken at the software level to mitigate each of these concerns. For instance, it is possible to generate instruction sequences that are more stable than others [22], which will improve reliability, possibly at the expense of performance. Given the potential performance degradation, it is clear that such code should only be generated when reliability concerns reach a certain level, which will generally not be known before runtime. Similar code generation trade-offs occur for addressing temperature or battery life concerns, but again, the need for such code is best deferred until runtime. By dynamically adapting the software on an as-needed basis, it is possible to achieve the best of both worlds.

2.4 Privacy and Security

Program shepherding is an effective technique for observing the runtime behavior of software, detecting anomalies, and enforcing a set of rules that prevent common exploits [16].
This technique works by inspecting each instruction prior to executing it, and determining whether it follows a suspicious pattern, such as writing to the stack. Since the technique can be applied to an unmodified program binary, it has been leveraged by several industrial products. On an embedded system, security and privacy are equally if not more important than on a standard high-performance platform. Therefore, applying program shepherding in the embedded domain can be quite beneficial, assuming that it is still possible to provide full code coverage for the dynamic paths, and that the performance and memory overhead is not prohibitive. The former assumption can be proven for most architectures, while the latter assumption is still an open question that will be discussed in Section 4.

2.5 Code Compatibility

New processors are regularly released with extensions to the instruction-set architecture. This presents several challenges to both architects and software vendors. First, architects need an effective way to measure the utility of any new instructions they propose, prior to committing to the actual hardware changes. For instance, it is often helpful to design and optimize the compiler algorithms that would be used to generate the new instructions before committing to building the hardware to support those new instructions. Unfortunately, having a compiler actually generate the new instructions means that the binary cannot be executed until the hardware is available. Otherwise, any attempt to run the program will result in an illegal instruction fault. This catch-22 can be resolved by adapting the software to recognize and emulate the behavior of any new instructions that are unsupported by the underlying hardware.
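The emulation approach can be sketched at a high level: operations the "hardware" does not implement are intercepted and dispatched to software handlers instead of faulting. The toy Python interpreter below uses invented opcode names and a single accumulator; it is a conceptual sketch, not the mechanism of any particular tool.

```python
# Toy sketch of backward compatibility via emulation (opcodes invented):
# when the interpreter core hits an operation it does not implement,
# a software handler emulates it rather than raising a fault.

CORE_OPS = {
    "ADD": lambda acc, x: acc + x,
    "SUB": lambda acc, x: acc - x,
}

# Emulation routines for a hypothetical ISA extension.
EMULATED_OPS = {
    "MULADD": lambda acc, x: acc * x + acc,  # pretend fused instruction
}

def execute(program):
    acc = 0
    for op, operand in program:
        if op in CORE_OPS:
            acc = CORE_OPS[op](acc, operand)
        elif op in EMULATED_OPS:
            acc = EMULATED_OPS[op](acc, operand)  # emulate, do not fault
        else:
            raise RuntimeError("illegal instruction: " + op)
    return acc
```

The same hook point where `EMULATED_OPS` is consulted is also a natural place to attach profiling counters for the new instructions, which mirrors the feedback loop described above.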
At the same time, the application can be adapted to introduce profiling code to determine the dynamic instruction count and other metrics of the new instructions (as discussed in Section 2.6), which will provide valuable feedback to the architecture and compiler teams. Once new instructions have been approved and introduced to the ISA, a second challenge is faced by software vendors,
who must now decide whether to include those instructions in their shipped applications. Including the instructions can significantly improve performance on systems that have hardware support for those instructions. Yet, illegal instruction faults will occur on systems that do not support the instructions. An elegant solution to both of these challenges is to adapt the software at runtime. This technique enables backward compatibility, allowing application binaries that contain new instructions to execute on older hardware that does not support those instructions. Meanwhile, it also provides an opportunity for forward compatibility by introducing new instructions to existing binaries on-the-fly, improving the performance of applications running on newer hardware that contains the ISA extensions. Taken to the extreme, a similar runtime adaptation engine could provide full binary translation capabilities, allowing software to run on a completely incompatible platform.

2.6 Program Analysis and Debugging

Understanding the dynamic behavior and bottlenecks of running software is important for software developers and system designers alike. An effective way to achieve this goal is to modify the running program to report any feature of interest, such as instruction mix profiles, dynamic code coverage, or memory allocations and deallocations. Yet, building these features into the source code of the application is problematic for several reasons. Aside from the fact that the process would be tedious and error-prone, it would also be deceptive, because it would miss all of the time spent executing instructions in shared library routines or dynamically-generated code. Therefore, a great way to analyze a program's dynamic behavior is to modify the program's execution stream to interleave profile-collection and analysis routines that will be triggered as the program executes.
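In spirit, this interleaving is what Python's built-in tracing hook provides for Python code: an analysis routine invoked alongside execution, without modifying the program source. The small sketch below counts line-level execution events for a workload function; the workload itself is invented for illustration.

```python
import sys

# Loosely analogous sketch: Python's tracing hook interleaves an analysis
# routine with execution, much as a dynamic binary modifier injects
# profiling code into an instruction stream.
counts = {}

def tracer(frame, event, arg):
    if event == "line":
        key = (frame.f_code.co_name, frame.f_lineno)
        counts[key] = counts.get(key, 0) + 1
    return tracer  # keep tracing inside this frame

def workload(n):
    total = 0
    for i in range(n):
        total += i
    return total

sys.settrace(tracer)      # begin "instrumented" execution
result = workload(10)
sys.settrace(None)        # detach the analysis routine
```

After the run, `counts` holds a per-line execution profile of `workload`, gathered without touching the workload's source, the same transparency property the binary-level tools provide.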
Software profiling and debugging is critical for embedded systems software, and the same infrastructures that provide runtime adaptation can also be used as robust and thorough program analysis tools.

3. RUNTIME ADAPTATION TOOLS

A number of virtualization tools exist today that enable running software to be adapted by an external user. These tools exist both at the system level (a single copy that runs below an operating system) and at the process level (one copy per guest application that runs above an operating system, if present). Focusing on process-level virtualization tools, most run exclusively on general-purpose architectures [2, 4, 17, 18, 20, 21] such as x86. However, a few exceptions exist, and several tools have been ported to run on resource-constrained or embedded architectures. The Pin dynamic binary instrumentation tool, for instance, runs on ARM [11] and the Intel Atom. Meanwhile, the Strata dynamic binary translator runs on ARM [19]. And finally, DELI runs on the LX architecture [7]. The process-level virtualization systems listed are often categorized as dynamic binary modifiers, virtual execution environments, or software dynamic translators, depending on the preference of the particular researcher, but they all operate in a similar fashion. Each system operates directly on the guest program binary, with no need to recompile, relink, or even access the source code of the guest application. This allows even legacy or proprietary software to be adapted to the runtime environment.

Figure 3: A runtime adaptation engine will continuously monitor and adapt the program image (often caching the modified code for efficiency).

The virtualization system begins by taking control of execution before the guest program launches. Next, it inspects and potentially modifies every instruction (or series of instructions) in the guest application just prior to executing that instruction.
The system is able to modify shared library code, dynamically generated code, and even self-modifying code: everything except kernel code, since it operates at the process level. Process-level virtualization systems can naturally be leveraged as a runtime adaptation engine, since they allow software to be continuously profiled and modified, as shown in Figure 3. If a region of code is determined to be frequently executed and in need of adaptation, the system will alter the code and will begin to execute the altered code in lieu of the original code. Finally, to improve performance, the system will often cache the altered code for reuse during a single dynamic run. The fact that these systems operate dynamically (at runtime) means that only the portions of the software that execute will be modified.

4. IMPLEMENTATION CHALLENGES

Performing software adaptation at runtime requires the use of a toolkit that can be complex to develop. Achieving correct functionality in the toolkit is relatively straightforward: there are a variety of known techniques that have been leveraged by similar systems to ensure robust functionality. The true challenge lies in making the system run efficiently, particularly in a resource-constrained environment.

4.1 Maintaining Control

One of the first goals of a runtime adaptation engine is to maintain complete control of the execution of a guest application, from the first to the last instruction. Security applications, for instance, require that every instruction is inspected and potentially modified prior to executing it. Challenges include acquiring control (injecting the adaptation engine into the guest application) and maintaining control across branches, calls, and runtime events.
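The monitor/adapt/cache/execute cycle, and the way an engine retains control by having every control transfer return to it, can be sketched as a toy dispatch loop. The block representation and names below are invented for illustration; real engines operate on machine-code basic blocks, not Python callables.

```python
# Minimal sketch of a monitor/adapt/cache/execute dispatch loop.
# A "block" is a callable taking state and returning (state, next_block_id);
# next_block_id of None means the program exits.

code_cache = {}    # original block id -> adapted (translated) version
exec_counts = {}   # profile gathered by the injected monitoring code

def translate(block_id, block_fn):
    # "Adaptation" here just wraps the block with a monitoring counter.
    def adapted(state):
        exec_counts[block_id] = exec_counts.get(block_id, 0) + 1  # monitor
        return block_fn(state)  # then run the block's original work
    return adapted

def run(blocks, entry, state):
    pc = entry
    while pc is not None:
        if pc not in code_cache:              # translate on first touch only
            code_cache[pc] = translate(pc, blocks[pc])
        state, pc = code_cache[pc](state)     # execute from the code cache
    return state                              # control returns to the engine
```

Because every block is fetched through the engine, control is never lost, and because translation happens lazily, only code that actually executes is ever modified, matching the behavior described above.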
Indirect branches, where the branch target is stored in a register or memory and thus varies at runtime, are particularly challenging to handle efficiently [8, 15], and on architectures like the ARM, any instruction can write to the program counter register, which ultimately results in a branch. Finally, self-modifying code presents a challenge on most platforms, but is actually straightforward to handle on architectures like ARM that require explicit synchronization instructions whenever instructions are overwritten [11].
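One common way to soften the cost of indirect branches is a software lookup table that maps original branch targets to their translated counterparts, with a miss falling back to a slow path that performs the translation and installs the mapping. The sketch below is a minimal model of that idea, not the API of any particular tool.

```python
# Sketch of an indirect-branch target table: map original targets to
# translated targets; a miss takes the slow path once, then later dynamic
# instances of the same target hit the fast path. Names are illustrative.

ibtc = {}  # original target address -> translated target address

def slow_lookup(target, translate):
    # Slow path: translate (or locate) the code for this target, then
    # install the mapping so later lookups are a single table probe.
    translated = translate(target)
    ibtc[target] = translated
    return translated

def resolve_indirect(target, translate):
    hit = ibtc.get(target)
    return hit if hit is not None else slow_lookup(target, translate)
```

In a real translator the "table probe" is emitted as inline code at the branch site, so the common case never leaves the code cache; the dictionary here stands in for that inline lookup.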
Figure 4: This figure depicts a control-flow graph of a code snippet (left) and the corresponding trace region that would be generated (right).

4.2 Balancing Memory and Performance

Another design decision faced by developers of a runtime adaptation engine is whether the engine will behave like an interpreter or like a just-in-time compiler. Interpretation-based approaches perform a lookup into a table that indicates the behavior and added functionality required for each instruction; this approach works best for short-running programs, as it has a higher performance overhead but a lower memory overhead. Just-in-time compilation approaches will generate new versions of the code with all extra functionality inlined, and will execute that new code in lieu of the old code. This approach works well for longer-running applications, where the cost of transforming the code will be amortized. For JIT-based approaches, the next decision is whether to include a code cache in the design. The code cache will store previously modified code to facilitate reuse throughout execution, and the use of a code cache has been shown to improve performance by up to several orders of magnitude. However, the memory overhead of a code cache is significant as well, sometimes up to 5X the memory footprint of the guest application [12]. The high memory overheads are the result of the need to maintain a complete directory of the code cache contents and certain features of the resident code, the need to incorporate trampolines and other auxiliary code to maintain control, and the need to track whether the code has been patched in any way. The memory overhead pays for itself in performance improvements on general-purpose systems, but the balance on embedded systems is much more intricate, and therefore has required special attention [9]. Within the code cache, one design challenge is how to store the modified code.
Most systems form traces (single-entry, multiple-exit regions) because they are more conducive to optimizations, and they tend to improve performance over storing individual basic blocks. Caching traces uses more memory than caching basic blocks, but it allows fewer entries to be present in the code cache directory (because multiple blocks can be encompassed in one trace), and the directory size reduction saves memory. With the formation of traces comes the challenge of trace selection [1, 6, 13, 14]. The longer the trace, the less likely the tail blocks will be executed, but when the speculation is correct, a longer trace will result in a performance boost. But of course, longer traces use more memory. Finally, with any bounded-size code cache comes the challenge of handling cache eviction and replacement policies [3, 9, 12, 23, 24, 25]. Standard hardware replacement strategies do not apply because the eviction unit is a trace, which varies significantly in size. Therefore, a traditional LRU policy cannot be implemented, since the LRU element may not free enough space, and thus a contiguous victim trace must be taken as well. While general-purpose CPU implementations have been able to support features such as unbounded code caches, this is not an option on an embedded platform. Each of the challenges discussed in this section represents a standard memory-performance tradeoff. Most runtime adaptation engines have been developed for general-purpose systems, and therefore the memory-performance balance tends to be skewed toward performance. At the extreme, this means that they occupy too much memory to work at all on an embedded platform. However, with careful tuning and additional research, these systems can be made to execute efficiently in both domains.
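Trace-granularity eviction can be illustrated with a small model: because victims vary in size, the cache keeps evicting adjacent traces (here, oldest-first stands in for "contiguous") until the incoming trace fits. This is a simplified sketch under that assumption, not the policy of any particular system.

```python
from collections import OrderedDict

# Sketch of a bounded code cache holding variable-sized traces. A single
# LRU victim may not free enough space, so neighbouring traces are evicted
# together until the new trace fits (FIFO order models contiguity here).

class TraceCache:
    def __init__(self, capacity):
        self.capacity = capacity      # total bytes available
        self.used = 0
        self.traces = OrderedDict()   # trace id -> size in bytes

    def insert(self, trace_id, size):
        if size > self.capacity:
            raise ValueError("trace larger than the whole cache")
        # Evict the oldest traces until the new trace fits; because traces
        # vary in size, one victim is often not enough.
        while self.used + size > self.capacity:
            _, victim_size = self.traces.popitem(last=False)
            self.used -= victim_size
        self.traces[trace_id] = size
        self.used += size
```

The key departure from hardware caches is visible in the `while` loop: eviction continues, possibly removing several resident traces, until enough contiguous room exists for the incoming trace.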
Therefore, it is important for embedded systems researchers and practitioners to be aware of the adaptation opportunities enabled by these systems, and to also be willing to invest their time, resources, and creativity to efficiently design and/or leverage these systems.

5. SUMMARY

Runtime adaptation is a promising opportunity to resolve many pressing computing challenges today, including resource contention, power, code compatibility, and security. The need for adaptive software is especially true in the case of embedded systems, where each of these concerns is magnified. Several process-level virtualization frameworks are available that enable effective runtime adaptation of software. Unfortunately, a set of key challenges remains that has prevented widespread acceptance of these frameworks by the embedded systems community. Most of the major challenges fall under the realm of the standard memory-performance tradeoff. In the general-purpose computing domain, developers have tended to favor performance at the expense of memory consumption. However, such an approach is inappropriate in the embedded domain, and there is still much work to be done to find that ideal balance.

Acknowledgments

This work is funded by NSF Grants CNS , CNS , CNS , and gift awards from Google and Microsoft. The implementation of Pin for ARM was a group effort involving several people including Artur Klauser, Robert Muth, and Robert Cohn. Several of my students contributed to this paper: Dan Upton gathered the resource contention and temperature heterogeneity data, and Apala Guha developed several of the memory-management algorithms for ARM. Finally, I wish to thank the organizers of this DAC special session for providing this opportunity: Christoph Kirsch and Sorin Lerner.

6. REFERENCES

[1] J. Baiocchi, B. R. Childers, J. W. Davidson, J. D. Hiser, and J. Misurda. Fragment cache management for dynamic binary translators in embedded systems with scratchpad.
In Proceedings of the International Conference on Compilers, Architecture, and Synthesis for Embedded Systems, CASES '07, pages 75-84, Salzburg, Austria, 2007.
[2] V. Bala, E. Duesterwald, and S. Banerjia. Dynamo: A transparent dynamic optimization system. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '00, pages 1-12, Vancouver, BC, Canada, June 2000.
[3] D. Bruening and S. Amarasinghe. Maintaining consistency and bounding capacity of software code caches. In Proceedings of the 3rd Annual IEEE/ACM International Symposium on Code Generation and Optimization, CGO '05, pages 74-85, San Jose, CA, March 2005.
[4] D. Bruening, T. Garnett, and S. Amarasinghe. An infrastructure for adaptive dynamic optimization. In Proceedings of the 1st Annual IEEE/ACM International Symposium on Code Generation and Optimization, CGO '03, San Francisco, CA, March 2003.
[5] A. Cohen and E. Rohou. Processor virtualization and split compilation for heterogeneous multicore embedded systems. In Proceedings of the 47th Design Automation Conference, DAC '10, Anaheim, California, 2010. ACM.
[6] D. Davis and K. Hazelwood. Improving region selection through loop completion. In Proceedings of the ASPLOS Workshop on Runtime Environments, Systems, Layering, and Virtualized Environments, RESoLVE '11, Newport Beach, CA, March 2011.
[7] G. Desoli, N. Mateev, E. Duesterwald, P. Faraboschi, and J. A. Fisher. DELI: A new run-time control point. In Proceedings of the 35th Annual ACM/IEEE International Symposium on Microarchitecture, MICRO-35, Istanbul, Turkey, 2002.
[8] B. Dhanasekaran and K. Hazelwood. Improving indirect branch translation in dynamic binary translators. In Proceedings of the ASPLOS Workshop on Runtime Environments, Systems, Layering, and Virtualized Environments, RESoLVE '11, Newport Beach, CA, March 2011.
[9] A. Guha, K. Hazelwood, and M. L. Soffa. Balancing memory and performance through selective flushing of software code caches. In Proceedings of the International Conference on Compilers, Architectures and Synthesis for Embedded Systems, CASES '10, pages 1-10, Scottsdale, AZ, October 2010.
[10] K. Hazelwood.
Dynamic Binary Modification: Tools, Techniques, and Applications. Morgan & Claypool Publishers, March 2011.
[11] K. Hazelwood and A. Klauser. A dynamic binary instrumentation engine for the ARM architecture. In Proceedings of the International Conference on Compilers, Architectures, and Synthesis for Embedded Systems, CASES '06, Seoul, Korea, October 2006.
[12] K. Hazelwood and M. D. Smith. Managing bounded code caches in dynamic binary optimization systems. Transactions on Code Generation and Optimization, 3(3), September 2006.
[13] D. J. Hiniker, K. Hazelwood, and M. D. Smith. Improving region selection in dynamic optimization systems. In Proceedings of the 38th Annual International Symposium on Microarchitecture, MICRO-38, Barcelona, Spain, November 2005.
[14] J. D. Hiser, D. Williams, A. Filipi, J. W. Davidson, and B. R. Childers. Evaluating fragment construction policies for SDT systems. In Proceedings of the 2nd International Conference on Virtual Execution Environments, VEE '06, Ottawa, Ontario, Canada, 2006.
[15] J. D. Hiser, D. Williams, W. Hu, J. W. Davidson, J. Mars, and B. R. Childers. Evaluating indirect branch handling mechanisms in software dynamic translation systems. In Proceedings of the 5th Annual IEEE/ACM International Symposium on Code Generation and Optimization, CGO '07, pages 61-73, San Jose, CA, March 2007.
[16] V. Kiriansky, D. Bruening, and S. Amarasinghe. Secure execution via program shepherding. In Proceedings of the 11th USENIX Security Symposium, San Francisco, CA, August 2002.
[17] J. Lu, H. Chen, P.-C. Yew, and W.-C. Hsu. Design and implementation of a lightweight dynamic optimization system. Journal of Instruction-Level Parallelism, 6(1):1-24, April 2004.
[18] C.-K. Luk, R. Cohn, R. Muth, H. Patil, A. Klauser, G. Lowney, S. Wallace, V. J. Reddi, and K. Hazelwood. Pin: Building customized program analysis tools with dynamic instrumentation.
In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 05, pages , Chicago, IL, June [19] R. W. Moore, J. A. Baiocchi, B. R. Childers, J. W. Davidson, and J. D. Hiser. Addressing the challenges of dbt for the arm architecture. In Proceedings of the ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems, LCTES 09, pages , Dublin, Ireland, [20] N. Nethercote and J. Seward. Valgrind: a framework for heavyweight dynamic binary instrumentation. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 07, pages , San Diego, CA, June [21] K. Scott, N. Kumar, S. Velusamy, B. Childers, J. W. Davidson, and M. L. Soffa. Retargetable and reconfigurable software dynamic translation. In Proceedings of the 1st Annual IEEE/ACM International Symposium on Code Generation and Optimization, CGO 03, pages 36 47, San Francisco, CA, March [22] D. Williams, A. Sanyal, D. Upton, J. Mars, S. Ghosh,, and K. Hazelwood. A cross-layer approach to heterogeneity and reliability. In Proceedings of the 7th ACM/IEEE International Conference on Formal Methods and Models for Co-Design, MEMOCODE 09, pages 88 97, Cambridge, MA, July [23] L. Zhang and C. Krintz. Adaptive code unloading for resource-constrained jvms. In Proceedings of the ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems, LCTES 04, pages , Washington, DC, [24] L. Zhang and C. Krintz. Profile-driven code unloading for resource-constrained jvms. In Proceedings of the 3rd International Symposium on Principles and Practice of Programming in Java, PPPJ 04, pages 83 90, Las Vegas, Nevada, [25] S. Zhou, B. R. Childers, and M. L. Soffa. Planning for code buffer management in distributed virtual execution environments. In Proceedings of the 1st ACM/USENIX International Conference on Virtual Execution Environments, VEE 05, pages , Chicago, IL, 2005.
