Micro-architectural Characterization of Desktop Cloud Workloads
Tao Jiang, Rui Hou, Lixin Zhang, Ke Zhang, Licheng Chen, Mingyu Chen, Ninghui Sun
State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, China
Graduate University of Chinese Academy of Sciences, Beijing, China
{jiangtao, hourui, zhanglixin, zhangke, chenlicheng, cmy, snh}

Abstract

Desktop cloud replaces traditional desktop computers with completely virtualized systems from the cloud. It is becoming one of the fastest growing segments in the cloud computing market. However, as far as we know, little work has been done to understand the behavior of desktop clouds. On one hand, desktop cloud workloads differ from conventional data center workloads in that they are rich in interactive operations, and they differ from traditional non-virtualized desktop workloads in that they carry an extra layer in the software stack: the hypervisor. On the other hand, desktop cloud servers are mostly built with conventional commodity processors. While such processors are well optimized for traditional desktop and high-performance computing workloads, their effectiveness for desktop cloud workloads remains to be studied. As an attempt to shed some light on the effectiveness of conventional general-purpose processors on desktop cloud workloads, we have studied the behavior of desktop cloud workloads and compared it with that of SPEC CPU2006, TPC-C, PARSEC, and CloudSuite. We evaluate a Xen-based virtualization platform. The performance results reveal that desktop cloud workloads have significantly different characteristics from SPEC CPU2006, TPC-C, and PARSEC, but perform similarly to the data center scale-out benchmarks from CloudSuite. In particular, desktop cloud workloads have a high instruction cache miss rate (12.7% on average), a high percentage of kernel instructions (23% on average), and low IPC (0.36 on average).
They also have much higher TLB miss rates and lower off-chip memory bandwidth utilization than traditional benchmarks. Our experimental numbers indicate that the effectiveness of existing commodity processors is quite low for desktop cloud workloads. In this paper, we provide some preliminary discussion of potential architectural and micro-architectural enhancements. We hope that the performance numbers presented in this paper will give some insights to the designers of desktop cloud systems.

I. INTRODUCTION

Desktop cloud, a.k.a. virtual desktop infrastructure, replaces traditional desktops and workstations (e.g., [24, 29]) with virtual machines (VMs) running on centralized data center servers. It is being adopted by more and more organizations as the office computing platform and is becoming one of the fastest growing segments in the cloud computing market. With desktop cloud, users can access the same desktop, with their own applications and data, without being tied to a single client device. Desktop cloud allows IT administrators to maintain a centralized environment and to respond quickly to the changing needs of users and the business. It allocates resources to users on an as-needed basis and allows users to share resources, reducing the number of hardware units an organization must purchase, and in turn the total cost of ownership. The servers providing desktop cloud services are typically built with off-the-shelf commodity processors, most notably general-purpose x86 server processors. Commercial general-purpose processors adopt many optimization techniques, such as out-of-order super-scalar execution and multi-level cache hierarchies with sophisticated coherence protocols [3, 4], that have greatly improved single-thread performance.
As data center applications have diversified, these processors have also adopted many additional features, such as hardware-assisted virtualization [7], efficient voltage and frequency scaling [19, 20], many-core architectures [18], and wider I/O and memory bandwidth [17], to meet the needs of the increasing variety of data center applications. Desktop cloud workloads differ from traditional non-virtualized applications and from data center scale-out applications in that they run in virtualized environments and contain abundant interactive operations. One key question is whether server processors designed mostly for traditional applications can support desktop cloud services efficiently. As an attempt to answer this question, we have evaluated typical desktop cloud workloads in a real deployment. Our evaluation captures the behavior of the entire software stack (application, OS, and hypervisor) via source code instrumentation, hardware performance counters, and a custom-made memory trace collection device. To put the performance numbers of desktop cloud workloads in perspective, we have also evaluated some traditional benchmarks: SPEC CPU2006 [23] as single-threaded ILP (Instruction Level Parallelism) benchmarks, TPC-C [26] as a transactional TLP (Thread Level Parallelism) benchmark, PARSEC [15] as parallel TLP benchmarks, and CloudSuite [28] as data center scale-out benchmarks. We find that desktop cloud workloads exhibit significantly different characteristics from SPEC CPU2006, TPC-C, and PARSEC. While they also differ from the CloudSuite benchmarks, that difference is relatively smaller because the two share some characteristics. The major insights derived from our evaluation are as follows.

The cache hierarchy has uneven performance. Desktop cloud workloads in our experiments exhibit high I-Cache (instruction cache) miss rates and high L2 cache miss rates, but very low LLC (Last-Level Cache, L3 in our experiments) miss rates. These numbers indicate that even though the LLC is very effective, the I-Cache is not sufficiently large and the L2 is extremely ineffective.

Both the instruction TLB and the data TLB perform poorly. Desktop cloud workloads running in the virtualized environment suffer from much higher numbers of instruction and data TLB misses than those running in a non-virtualized environment. These numbers imply that the TLBs are inadequate for the extra demands brought by virtualization.

Off-chip memory bandwidth consumption is quite low. To our surprise, desktop cloud workloads do not generate a high number of memory accesses. While this might change once the inefficiencies in other parts of the system are addressed, it shows the potential for managing memory bandwidth for improved power efficiency.

Desktop cloud workloads typically do not require full cache coherence. We have observed that the cache-to-cache communication pattern of desktop cloud workloads follows a star topology, with one central core and point-to-point communication between the central core and each of the remaining cores. There is little or no communication among the non-central cores. To the best of our knowledge, no mainline processor uses a star topology among its on-chip caches. Their on-chip caches are normally connected through a bus, a ring, or a crossbar that optimizes cache-to-cache communication between any two cores.

The rest of the paper is organized as follows. Section II discusses related work. Section III describes the experimental methodology and tools. Section IV presents the experimental results. Section V briefly brings up future work and concludes this paper.

II. RELATED WORK

Many researchers have studied the performance and resource utilization of data center computer systems with both traditional high-performance benchmarks and cloud computing benchmarks [1, 6, 11, 12]. Ebbers et al.
[34] evaluated a virtual Linux desktop service created using eyeOS, an open-source web desktop, running on an IBM z-series server. They focused on the scalability of the system. Kochut et al. [2] analyzed the CPU and memory usage of desktops running on a native OS, with the intent of optimizing desktop cloud management. Tickoo et al. [14] demonstrated that a single parallel performance model does not fit the virtualized environment of data center servers, and proposed an alternative performance model. They performed a detailed case study with the vConsolidate benchmark [13] and investigated its core and cache contention effects. SPECvirt_sc2010 [36] is SPEC's first benchmark addressing the performance evaluation of data center servers used in virtualized server consolidation. It includes typical server-side workloads such as SPECweb2005 and SPECmail2009. VMmark [37] from VMware is a virtualization platform benchmark. It includes a variety of platform-level workloads such as dynamic relocation, cloning, and deployment of VMs. These two benchmarks are not used in our study because our experiments focus on client-side workloads. Some researchers have used scale-out benchmarks to study today's dominant system architectures. Ferdman et al. [5] used hardware performance counters to study the CloudSuite benchmarks. They discovered that existing systems are inefficient for running these benchmarks. To the best of our knowledge, this paper is the first attempt to characterize desktop cloud in a real deployment. Our evaluation captures the behavior of the entire software stack via source code instrumentation, hardware performance counters, and a custom-made hardware-based memory trace collection tool. We also compare the corresponding results with traditional ILP and TLP benchmarks and data center scale-out benchmarks.

III. METHODOLOGY

A. Hardware platform

Our experiments are performed on a 10-node Supermicro X86 server.
Each node contains two Intel Xeon E5620 2.40GHz processors, 32GB of DDR3 memory, and two Gigabit Ethernet ports. Table I lists the major configuration parameters of each node. As in many real systems, large page support is not enabled on our platform. The number of nodes used by each benchmark varies according to the requirements of the workloads and is disclosed in the corresponding subsections below. We also attempted to perform our study on an ATOM-based platform and an ARM-based platform, but we were unable to get Xen running in full-virtualization mode on these two platforms. Our attempt to get more support from the vendors was also unsuccessful.

TABLE I: Hardware configuration.
CPU: Intel Xeon E5620, 2.40GHz
Sockets: 2
Cores per socket: 4
Threads per core: 1 (Hyper-Threading disabled)
L1I: 32 KB, 4-way
L1D: 32 KB, 8-way
L2: 256 KB, 8-way
LLC (L3): 12 MB, 16-way
Memory: 32 GB DDR3-800
BIOS configuration: Hyper-Threading disabled; Turbo-Boost disabled; hardware-assisted virtualization enabled; hardware prefetchers enabled, unless indicated otherwise

B. Workloads

We evaluate a live desktop cloud and compare it with the CloudSuite, SPEC CPU2006, TPC-C, and PARSEC benchmarks.
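The DDR3-800 memory in Table I fixes the off-chip bandwidth ceiling that the measurements in Section IV are compared against. A minimal back-of-the-envelope sketch of that arithmetic; the single-channel view matches the one-DIMM-per-processor trace-collection setup described in Section III-C and is our own simplification, not a number stated in the paper:

```python
# Peak bandwidth of one DDR3-800 channel: transfer rate x bus width.
# DDR3-800 performs 800 mega-transfers/s over a 64-bit (8-byte) data bus.
transfers_per_sec = 800e6
bus_width_bytes = 8
peak_bytes_per_sec = transfers_per_sec * bus_width_bytes
print(peak_bytes_per_sec / 1e9)  # 6.4 GB/s per channel

# Section IV-H reports 714 MB/s for four VMs, i.e. roughly one-ninth of
# this single-channel peak, consistent with "less than one-eighth".
print(714e6 / peak_bytes_per_sec)
```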
1) Desktop cloud workloads: There are a number of commercial desktop cloud offerings on the market, including Desktone [30], Citrix's XenDesktop [31], VMware's View [33], and IBM Smart Business Desktop Cloud [24]. The platform we use is based on the open-source Xen [21]. It serves as the office computing platform for the students and staff members of our department.

Fig. 1: Structure of the desktop cloud being tested (VM servers, storage servers, and a management server, serving PC, laptop, and thin-client endpoints over the network).

The structure of our desktop cloud environment is shown in Figure 1. The hardware platform includes a number of VM servers, several storage servers, and one management server. All of them are connected through a 32-port Gigabit switch. For the guest OS on Xen, we use CentOS 5.5 with the Linux kernel for Domain0 (privileged, responsible for handling I/O operations), CentOS 5.5 with the Linux kernel for para-virtualized DomainUs (non-privileged virtual machines), and Windows XP for fully virtualized DomainUs. Unless noted otherwise, we allocate 4GB of memory for each Domain0 and 2GB for each DomainU. Each host node with 32GB of memory can thus support 14 DomainUs. This empirical configuration ensures that all VMs get a sufficient amount of memory while allowing as many VMs as possible to share one physical processor. It is the result of a stress test and the corresponding satisfaction measurements of real users. Even though our stress test shows that each node can support up to 42 DomainUs before it slows to a crawl, 14 DomainUs per node is the maximum number of VMs supported without users complaining about constant slowdowns and unresponsiveness. In our platform, each machine has eight cores. Core 0 is dedicated to Domain0, and the remaining cores are used for DomainUs, with each core running two DomainUs. Each DomainU is allocated one virtual CPU (VCPU), 30GB of disk space, and one RTL Mb virtual Ethernet card.
RDP (Remote Desktop Protocol) is used as the communication protocol between clients and servers for Windows XP desktops, and XDMCP (X Display Manager Control Protocol) is used for Linux GNOME desktops. In order to guarantee the validity of the performance numbers, the evaluation is done without the awareness of the desktop users. Table II gives a brief description of typical operations performed by desktop users.

TABLE II: Common operations of desktop cloud users.
Watching Video: Watching videos online
Surfing Web: Viewing webpages and processing s
Web Downloading: Downloading files directly or through P2P
Office Work: Editing Word, Excel, and PowerPoint files
Browsing PDF: Viewing PDF files
Compressing & Decompressing: Compressing and decompressing files
Copying File: Copying files to another disk partition
Anti-virus: Running anti-virus software

The most common daily operations recorded in our environment are editing files (mostly MS Word and PowerPoint files), viewing files (mostly PDF and MS Word), surfing the Web, watching videos, copying files, running anti-virus software, downloading files, running clients, and compressing and decompressing files. These workloads are similar to the View Planner benchmark [35] used to evaluate VMware VDI deployments.

Fig. 2: IPC trends.

Despite the various optimizations proposed to accelerate VM migrations, we have found that migrations are rarely observed in our environment. In addition, we have observed that different physical nodes behave similarly when serving the same number of users. Without loss of generality, we have randomly selected one node for detailed evaluation. We have recorded the IPC of the selected node over a full day, as shown in Figure 2.
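The "stable state" visible in Figure 2 can be identified mechanically from periodic IPC samples by flagging windows whose variation falls below a threshold. A minimal sketch; the one-sample-per-minute series and the 5% coefficient-of-variation threshold are illustrative choices of ours, not the paper's actual data:

```python
# Flag sample indices where the trailing `window` IPC samples are "stable",
# i.e. their coefficient of variation (stddev / mean) is at most max_cv.
from statistics import mean, pstdev

def stable_regions(samples, window=6, max_cv=0.05):
    """Return indices whose trailing `window` samples have CV <= max_cv."""
    stable = []
    for i in range(window, len(samples) + 1):
        w = samples[i - window:i]
        m = mean(w)
        if m > 0 and pstdev(w) / m <= max_cv:
            stable.append(i - 1)
    return stable

# Illustrative samples: ramp-up after 9:00 AM, then a steady plateau.
ipc = [0.05, 0.12, 0.20, 0.28, 0.33, 0.35, 0.36, 0.36, 0.35, 0.36, 0.37, 0.36]
print(stable_regions(ipc))  # indices of the minutes judged stable
```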
It can be seen from the figure that the IPC becomes stable (from point A to point B) within six minutes after people start to work at 9:00 AM, and it remains stable until 11:30 AM (point C), when people start to head out for lunch. During our experiments, we randomly select three time samples in the stable state and report the arithmetic means in the paper. During the stable state, the utilization rate of the core running Domain0 is 24.7% and the utilization
rate of the cores running DomainUs is 21.3%. To enable the measurement of cache-to-cache transfers, we set up a special test environment with only one Domain0 and two DomainUs. We modified the BIOS to activate only two cores in each of the two processors within a node. Xen Domain0 uses one processor, while the two DomainUs are pinned to two cores of the other processor.

Fig. 3: IPC. Fig. 4: Pipeline stall breakdown.

2) Data center benchmarks: CloudSuite consists of six benchmarks representing common data center applications of today. They are run with the native inputs.

a) Data Serving: A 10GB Yahoo! Cloud Serving Benchmark (YCSB) dataset is used as the input to Cassandra [27], which is configured with a 16GB Java heap and an 800MB young generation. The server load is generated by a YCSB client that sends 10,000 operation requests per second following a Zipfian distribution with a 50:50 write-to-read ratio.

b) Media Streaming: 20 Java processes are set up, each starting 50 client threads that issue requests simultaneously. To minimize the impact of the network, our setup is limited to low bit-rate video streams, with GetMediumLow set to 30 and GetShortLow set to 70.

c) Web Serving: One node is used as the web server and another as the database server. 200 concurrent users send requests to the web server with a 50-second ramp-up time, a 1000-second steady-state time, and a 30-second ramp-down time.

d) Data Analytics: A 30GB Wikipedia data set is used. The machine learning library (Mahout) is provided by Apache and is built on top of the Hadoop MapReduce framework. The default machine learning algorithm, Bayes, is used.

e) Web Search: One node is used as the index-processing server, with a 3GB index and 8GB of data. Another is used as the front-end server, which accepts requests and forwards them to the index-processing server.
f) Software Testing: Cloud9 [22] is deployed on five nodes: one load balancer and four workers that execute software-testing tasks automatically. The iteration cycle is set to 1000 seconds.

3) SPEC CPU2006, TPC-C, and PARSEC benchmarks: All SPEC CPU2006 benchmarks are run with the reference input set, and each run is pinned to a specified processor core via the thread affinity commands. TPC-C uses MySQL with 40 warehouses. The load is generated by 32 clients running on one node. The ramp-up time is 60 seconds, which is sufficient for the load to reach a steady state. All PARSEC benchmarks are run with the native input.

C. Measurement tools

For desktop cloud workloads, we use Xenoprof [10], a system-wide profiler for Xen-based virtualization systems, to profile the Xen hypervisor and the guest OS. We use oprofile [25] to collect processor hardware performance counters. The Xeon E5620 processor allows only four performance counters to be collected at a time, so we use time-multiplexing to collect the 20+ events we need. One drawback of hardware performance counters is that they cannot identify the virtual machine that owns a memory reference, and they cannot measure the interval between memory references, both of which are important for fully understanding the performance of the memory system. To better understand the memory behavior, we use the enhanced Hyper Memory Trace Tracker (HMTT) [8], a platform-independent memory trace monitoring system that sits between the memory controller and the DIMM on the board and records all off-chip memory references. Due to the limited number of HMTT devices available to us, we install only one dual-ranked DDR3-800 DIMM per processor when collecting memory access traces. In each run, HMTT collects memory references for a duration of 180 seconds after the machine reaches the stable state.
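Turning such a trace into the bandwidth and inter-reference-interval numbers reported in Section IV-H is straightforward post-processing. A minimal sketch, assuming a trace of (timestamp in ns, physical address) records and 64-byte cache-line transfers; the record layout and the toy trace are illustrative, not HMTT's actual format:

```python
# Post-processing sketch for an off-chip memory reference trace.
# Each record is (timestamp in ns, physical address); each reference
# moves one 64-byte cache line.
LINE_BYTES = 64

def bandwidth_mb_s(trace):
    """Average bandwidth over the trace duration, in MB/s."""
    span_ns = trace[-1][0] - trace[0][0]
    return (len(trace) * LINE_BYTES) / (span_ns / 1e9) / 1e6

def intervals_ns(trace):
    """Time between consecutive references (Figure 15-style data)."""
    return [b[0] - a[0] for a, b in zip(trace, trace[1:])]

trace = [(0, 0x1000), (200, 0x1040), (1200, 0x2000), (1400, 0x9000)]
print(bandwidth_mb_s(trace))  # 4 lines * 64 B over 1400 ns
print(intervals_ns(trace))    # [200, 1000, 200]
```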
We instrument the Xen source code with a number of software counters to record the events that we want to trace. The overhead is negligible.

IV. EVALUATION AND ANALYSIS

This section presents the performance numbers that we have collected. Figure 3 shows the IPC of the benchmarks
we have tested. IPC is calculated based on the number of architectural instructions, not the number of micro-ops. In this figure, dom0 on the X-axis is the IPC of the processor core hosting Xen Domain0, and DomU is the arithmetic mean of the IPC of all cores hosting Xen DomainUs. To simplify the figure, CloudSuite mean is the arithmetic mean of the CloudSuite benchmarks, and SPEC INT mean/SPEC FP mean represent the arithmetic means of all SPEC CPU INT2006/FP2006 benchmarks. It can be observed from the figure that desktop cloud workloads show lower IPC than traditional benchmarks. For conventional CPU-intensive applications, the IPC can reach 2 or above. In contrast, the IPC of desktop cloud workloads is only 0.36 on average. Considering that the Xeon E5620 has a theoretical peak IPC of 4.0, an IPC of 0.36 is a strong indication that the processor is being used ineffectively. As a first attempt to understand the cause of the low IPC, we break down the pipeline stall cycles in Figure 4. The figure shows that the register allocation stalls, reservation station full stalls, reorder buffer full stalls, and branch misprediction stalls of desktop cloud workloads are actually lower than those of the other benchmarks. They are therefore not the cause of the low IPC of desktop cloud workloads. The rest of this section shows that the caches and TLBs are the real culprits.

A. I-Cache

The efficiency of instruction fetch directly affects pipeline utilization. The L1 and L2 instruction reference misses per kilo-instruction (MPKI) are shown in Figure 5, in which domuX represents the DomainU hosted by core X.

Fig. 5: L1 and L2 instruction misses per kilo-instruction.

Because the domuXs show similar results in most of our runs, only their arithmetic means are listed in the rest of this section. Figure 5 shows that the I-Cache miss rates of Domain0 and DomainU are much higher than those of the SPEC CPU2006 benchmarks.
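The metrics plotted in Figures 3, 5, 9, and 11 are simple ratios over raw counter values; a minimal sketch of that arithmetic, where the counter values are illustrative numbers chosen only to land near the averages quoted in this paper:

```python
# Deriving the metrics used in this section from raw counter values.
# The counts below are made up for illustration, not measured data.
def ipc(instructions_retired, core_cycles):
    """Instructions per cycle, as in Figure 3."""
    return instructions_retired / core_cycles

def mpki(misses, instructions_retired):
    """Misses per kilo-instruction, as in Figures 5, 9, and 11."""
    return misses * 1000 / instructions_retired

def miss_rate(misses, references):
    """Fraction of references that miss, as in Figures 6-8."""
    return misses / references

counters = {"inst_retired": 2_000_000, "cpu_clk": 5_500_000,
            "l1i_misses": 25_000, "l1i_refs": 200_000}
print(ipc(counters["inst_retired"], counters["cpu_clk"]))      # ~0.36
print(mpki(counters["l1i_misses"], counters["inst_retired"]))  # 12.5
print(miss_rate(counters["l1i_misses"], counters["l1i_refs"])) # 0.125
```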
Specifically, the I-Cache miss rate of DomainU (23% on average) is 34 times higher than that of SPEC INT2006 and 100 times higher than that of SPEC FP2006. The I-Cache miss rate of DomainU is also higher than that of the CloudSuite benchmarks. Such high I-Cache miss rates can be expected to cause frequent pipeline stalls and significant performance degradation. The L2 instruction reference miss rate of desktop cloud workloads is also 27 times higher than that of the traditional ILP benchmarks. Such a high L2 miss rate leads to a high I-Cache miss penalty, causing further performance degradation.

While we are unable to inspect the contents of the I-Cache on a real machine to fully understand the true cause of the high number of I-Cache misses, we believe one of the main reasons is the frequent context switching in Xen. The Xen hypervisor uses a split device driver model: the real device driver is located in Domain0, and the device driver of a DomainU needs to interact with Domain0 to complete I/O requests. Under full virtualization, every device access (normally through memory-mapped registers) must be intercepted and emulated by the hypervisor. A typical I/O operation needs a number of traps to operate a device, leading to a long execution path. In a virtualized environment, the desktop workloads issue many network and disk operations, which inevitably cause many context switches (between the hypervisor, Domain0, and DomainU). In addition, when multiple DomainUs run on the same physical core, switching between different DomainUs causes even more context switches. We instrument Xen to collect the number of context switches. Table III shows the vmexits (switches from DomainU to the hypervisor), including 7156 switches per second caused by I/O instructions and 6607 caused by accesses to the Advanced Programmable Interrupt Controller (APIC), plus 1002 inter-DomainU switches per second, when running two VMs on one core. While dedicating one physical core to a single VM could reduce the number of context switches, it contradicts the essence of virtualization: resource sharing. Running more VMs on one core can be expected to cause even more context switches.

TABLE III: Number of context switches per second.
Vmexits caused by I/O instructions: 7156
Vmexits caused by APIC accesses: 6607
Inter-DomainU domain switches: 1002

Fig. 6: L1 data cache miss rate.

B. L1 D-Cache

Figure 6 displays the L1 data cache miss rates. It shows that the average L1 data cache miss rate of Xen (10.2%) is slightly higher than that of the other benchmarks, while the miss rate of Domain0 is similar to that of the other benchmarks. Compared with traditional applications, neither shows much performance degradation, which indicates that the existing L1 data cache works well with desktop cloud workloads.

C. TLB

The TLB (translation look-aside buffer) is a common hardware structure for accelerating virtual-to-physical address translation. A TLB miss triggers a process called a page walk to load the associated translation into the TLB. To better support virtualization, the Intel Nehalem architecture introduced a Virtual Processor ID (VPID) in its TLBs. The VPID avoids flushing TLB entries when switching from one VM to another.

Fig. 7: ITLB and DTLB miss rates ((a) ITLB, (b) DTLB). Fig. 8: L2 cache miss rate. Fig. 9: L2 cache misses per kilo-instruction. Fig. 10: The number of cache-to-cache data transfers. Fig. 11: LLC misses per kilo-instruction.

Figure 7 presents the ITLB and DTLB miss rates. It shows that desktop cloud workloads have several times higher ITLB and DTLB miss rates than the other benchmarks. Babka et al.
[16] have shown that the average TLB miss penalty is larger than the average L2 cache miss penalty, even with hardware page walks. A TLB miss rate of 4.5% may seem small, but it can noticeably degrade overall performance. One of the main reasons for the high TLB miss rates is that many live contexts compete for the same set of TLB entries, causing many conflict misses.

D. L2 cache

Figure 8 displays the L2 miss rates. It shows that the average L2 miss rate of Domain0 is lower than that of the other benchmarks. In contrast, the average L2 miss rate of DomainU (68%) is significantly higher than that of the other benchmarks. Figure 9 presents the L2 MPKI. It shows that the L2 MPKI of DomainU is much greater than that of the traditional ILP and TLP benchmarks, while the L2 MPKI of Domain0 is much lower than that of the other benchmarks. These results indicate that while the L2 cache works well for Domain0, it is not adequate for DomainU.

E. Cache-to-cache data transfers

The percentage of L2 misses that hit another on-chip L2 cache (i.e., cache-to-cache transfers) is used as the metric for core-to-core communication. Figure 10 shows that there are very few cache-to-cache transfers among different DomainUs. Only about 0.02% of the L2 cache misses of a DomainU get data from a sibling L2 cache belonging to another DomainU running on the same chip. In contrast, the core-to-core communication between Domain0 and the DomainUs is high, similar to that of PARSEC and TPC-C.

F. Last Level Cache

The last-level cache is the last line of defense in mitigating the speed gap between the processor and off-chip memory. Most of today's server processors devote a large chip area to the LLC. Figure 11 shows that LLC misses occur rarely for all benchmarks. These results indicate that the existing (large) LLC works wonderfully for all benchmarks, even though the L1 instruction cache and the L2 cache do not work as well.

Fig. 12: Impact of hardware prefetchers.

G. Prefetcher

In order to reduce the number of cache misses, many processors use hardware prefetchers to load cache lines before they are requested. Such mechanisms have been proven to perform well with traditional benchmarks [9]. Figure 12 presents the TLB miss rates, cache miss rates, and IPC of DomainU with the hardware prefetcher on and off. The results show that enabling the prefetcher noticeably increases the miss rate of each cache level and degrades performance. This is a strong indication that the hardware prefetcher fails in its role of reducing the number of cache misses. We speculate that frequent context switching disrupts the access patterns, rendering the prefetcher completely ineffective.

H. Off-chip memory references

We use HMTT to collect off-chip memory reference traces of desktop cloud workloads and several memory-intensive benchmarks from SPEC CPU2006. Figure 13 shows the memory bandwidth consumed by each tested benchmark. In this figure, 4vm on xen means four VMs (DomainUs) running
on Xen in parallel, while 1vm on xen means only one VM (DomainU) running on Xen. It shows that even with four VMs, the Xen-based platform uses only 714MB/s, which is less than one-eighth of the peak bandwidth. Most of the SPEC CPU2006 benchmarks use much more memory bandwidth, with the exceptions of bzip2, h264ref, and povray. The average memory bandwidth consumed by SPEC CPU2006 is three times higher than that of Xen with four VMs.

Fig. 13: Memory bandwidth consumption. Desktop cloud (left) and SPEC CPU2006 (right). Fig. 14: Memory bandwidth of 4 VMs on Xen. Fig. 15: Memory reference interval of 4 VMs on Xen.

To better understand the dynamic behavior of memory references, Figure 14 plots the consumed memory bandwidth over time. Each point in this figure represents one millisecond. The results show that the consumed memory bandwidth does not fluctuate much over time, with only a few spikes. The average is below 1GB/s, only a few points reach 3GB/s, and the highest is only 4GB/s. These numbers indicate that the VMs do not access memory frequently and do not consume much memory bandwidth. We select 10-microsecond trace segments and compute the time interval between every two consecutive memory accesses. Figure 15 shows that memory references are sparse for desktop cloud workloads.

I. User/System breakdown

Fig. 16: User/System breakdown of different benchmarks.

Figure 16 shows the percentage of kernel and application instructions. Desktop cloud workloads have a much higher percentage of kernel (Xen and guest OS) instructions (ranging from 23% to 33%) than the traditional ILP benchmarks. In particular, SPEC CPU2006 rarely executes kernel instructions (less than 1%). In contrast, the CloudSuite benchmarks execute mostly in kernel mode (58% on average). These results demonstrate that the efficiency of kernel code plays a major role in the performance of desktop cloud workloads and the CloudSuite benchmarks.

J. Desktop workloads on native OS

To illustrate the impact of virtualization on desktop workloads, we run a few representative operations of the desktop workloads in a native OS environment. The results are shown in Table IV. Compared with running in the virtualized environment, desktop workloads running on the native OS have much higher IPC, much lower numbers of L1I/L2/LLC misses, and much lower TLB miss rates. In particular, virtualization increases the number of L1 instruction
misses by five times and the TLB miss rate by over 12 times. These numbers are a strong indication that virtualization can have a dramatic performance impact on desktop workloads.

TABLE IV: Performance with and without virtualization (IPC, L1I/L2/LLC MPKI, and TLB miss rates; native OS vs. DomU).
ITLB miss rate: 0.09% (native OS) vs. 1.30% (DomU)
DTLB miss rate: 0.35% (native OS) vs. 4.36% (DomU)

K. Discussion

Our experiments clearly demonstrate that the prevailing processor architecture, optimized for traditional desktop and high-performance computing applications, does not serve desktop cloud workloads well. Further optimizations are needed to run desktop cloud workloads efficiently. For instance, Domain0 and the DomainUs have quite different characteristics: Domain0 is in charge of handling all I/O operations, while a DomainU runs a guest OS and user applications. Such a functional split between Domain0 and DomainU is common in virtualization environments; Hyper-V [32] adopts it as well. This calls for both functional and micro-architectural heterogeneity among on-chip cores. In addition, the desktop cloud workloads in our experiments exhibit high I-Cache and L2 cache miss rates but very low LLC miss rates. These results indicate that opportunities exist for cache hierarchy optimizations. The core-to-core communication pattern of desktop cloud workloads follows a star-like topology, with Domain0 as the central core and point-to-point communication between the Domain0 core and each of the DomainU cores; there is little or no communication between the DomainU cores. Efficient support for this pattern requires a new on-chip topology or special extensions to existing topologies such as the bus, ring, crossbar, and mesh.

V. CONCLUSION AND FUTURE WORK

As desktop cloud becomes more and more popular, it is important to understand whether existing systems can handle desktop cloud workloads efficiently.
In this paper, we use a large set of performance numbers gathered from a real desktop cloud to show that modern processors are ineffective in many ways for desktop cloud workloads. During our study, we used instrumented code, performance counters, and a special trace-collection device to analyze desktop cloud workloads and the SPEC CPU2006, TPC-C, PARSEC, and CloudSuite benchmarks. We have collected performance numbers for the I-Cache, L1 D-Cache, TLBs, L2 cache, LLC, cache-to-cache transfers, off-chip memory references, and hardware prefetchers. Our experimental results demonstrate that a processor architecture optimized for traditional benchmarks is not efficient for desktop cloud workloads. Compared to traditional benchmarks, desktop cloud workloads have much higher I-Cache miss rates, much higher L2 miss rates, and several times higher ITLB/DTLB miss rates. In addition, desktop cloud workloads have excellent LLC performance, very few cache-to-cache transfers, and high percentages of kernel instructions. The drawback of measuring a real system is that it is difficult to obtain detailed enough information to uncover the root causes of some of the problems. We are building a software simulation platform to assist in performing in-depth analysis. We are also using the empirical results to guide the design of an efficient desktop cloud system.

ACKNOWLEDGMENT

We would like to thank many group members for their help and feedback on the experimental setup and paper writing. We are grateful to the anonymous reviewers for their encouraging comments. This work is supported by the Strategic Priority Research Program of the Chinese Academy of Sciences, Grant No. XDA.

REFERENCES

[1] N. E. Jerger, D. Vantrease, and M. Lipasti, "An Evaluation of Server Consolidation Workloads for Multi-Core Designs," in Proceedings of the 2007 IEEE 10th International Symposium on Workload Characterization (IISWC 07), Washington, DC, USA, 2007.
[2] A. Kochut, K. Beaty, H. Shaikh, and D. G.
Shea, Desktop workload study with implications for desktop cloud resource optimization, in Parallel Distributed Processing, Workshops and Phd Forum (IPDPSW 10), 2010 IEEE International Symposium on, 2010, pp [3] J. L. Hennessy and D. A. Patterson, Computer Architecture - A Quantitative Approach, 4th ed. Morgan Kaufmann, [4] J. Sharkey and D. Ponomarev, Balancing ILP and TLP in SMT Architectures through Out-of-Order Instruction Dispatch, in Proceedings of the 2006 International Conference on Parallel Processing (ICPP 06), Washington, DC, USA, 2006, pp [5] Michael Ferdman, A. Adileh, O. Kocberber, S. Volos, M. Alisafaee, D. Jevdjic, C. Kaynak, A. D. Popescu, A. Ailamaki, and B. Falsafi, Clearing the Clouds: A Study of Emerging Scale-out Workloads on Modern Hardware, in 17th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 12), [6] H. Xi, J. Zhan, Z. Jia, X. Hong, L. Wang, L. Zhang, N. Sun, and G. Lu, Characterization of real workloads of web search engines, in 2011 IEEE International Symposium on Workload Characterization (IISWC 11), 2011, pp [7] K. Adams and O. Agesen, A comparison of software and hardware techniques for x86 virtualization, in Proceedings of the 12th international conference on Architectural Support for Programming Languages and Operating Sys- 9
10 tems (ASPLOS 06), New York, NY, USA, 2006, pp [8]Y.Bao,M.Chen,Y.Ruan,L.Liu,J.Fan,Q.Yuan,B. Song, and J. Xu, HMTT: a platform independent fullsystem memory trace monitoring system, SIGMETRICS Perform. Eval. Rev., vol. 36, no. 1, pp , [9] S. Palacharla and R. E. Kessler, Evaluating stream buffers as a secondary cache replacement, in Proceedings of the 21st annual International Symposium on Computer Architecture (ISCA 94), Los Alamitos, CA, USA, 1994, pp [10] A. Menon, J. R. Santos, Y. Turner, G. (John) Janakiraman, and W. Zwaenepoel, Diagnosing performance overheads in the xen virtual machine environment, in Proceedings of the 1st ACM/USENIX international conference on Virtual execution environments (VEE 05), New York, NY, USA, 2005, pp [11] Rodrigo N. Calheiros, Rajiv Ranjan, Anton Beloglazov, Cesar A. F. De Rose, and Rajkumar Buyya, CloudSim: A Toolkit for Modeling and Simulation of Cloud Computing Environments and Evaluation of Resource Provisioning Algorithms, Software: Practice and Experience (SPE), Volume 41, Number 1, pp , ISSN: , Wiley Press, New York, USA, January, [12] Rodrigo Calheiros, Rajiv Ranjan and Rajkumar Buyya, Virtual Machine Provisioning Based on Analytical Performance and QoS in Cloud Computing Environments, Proceedings of the 40th International Conference on Parallel Processing (ICPP 11), Taipei, Taiwan, September 13-16, [13] J. Casazza, Redefining Server Performance Characterization for Virtualization Benchmarking, Intel Technology Journal, vol. 10, no. 03, Aug [14] O. Tickoo, R. Iyer, R. Illikkal, and D. Newell, Modeling virtual machine performance: challenges and approaches, SIGMETRICS Perform. Eval. Rev., vol. 37, no. 3, pp , [15] C. Bienia, S. Kumar, J. P. Singh, and K. Li, The PAR- SEC benchmark suite: characterization and architectural implications, in Proceedings of the 17th international conference on Parallel Architectures and Compilation Techniques (PACT 08), New York, NY, USA, 2008, pp [16] V. Babka and P. 
Tuma, Investigating Cache Parameters of x86 Family Processors, in Proceedings of the 2009 SPEC Benchmark Workshop on Computer Performance Evaluation and Benchmarking, Berlin, Heidelberg, 2009, pp [17] L. A. Barroso and U. Hlzle. The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines. Synthesis Lectures on Computer Architecture, Jan [18] P. Gepner and M. F. Kowalik, Multi-Core Processors: New Way to Achieve High System Performance, in Proceedings of the International Symposium on Parallel Computing in Electrical Engineering (PARELEC 06), Washington, DC, USA, 2006, pp [19] Ofri Wechsler. Inside Intel Core Microarchitecture: Setting New Standards for Energy-Efficient Performance. Technology@Intel Magazine, March [20] R. Raghavendra, P. Ranganathan, V. Talwar, Z. Wang, and X. Zhu, No power struggles: coordinated multi-level power management for the data center, in Proceedings of the 13th international conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 08), New York, NY, USA, 2008, pp [21] P. Barham, B. Dragovic, K. Fraser, S. Hand, T. Harris, A. Ho, R. Neugebauer, I. Pratt, and A. Warfield, Xen and the art of virtualization, in Proceedings of the nineteenth ACM Symposium on Operating Systems Principles (SOSP 03), New York, NY, USA, 2003, pp [22] L. Ciortea, C. Zamfir, S. Bucur, V. Chipounov, and G. Candea, Cloud9: a software testing service, SIGOPS Oper. Syst. Rev., vol. 43, no. 4, pp. 5-10, [23] SPEC CPU2006 Benchmark Suite. [24] IBM Smart Business Desktop Cloud. [25] OProfile Tools. [26] TPC-C. [27] The Apache Cassandra Project. [28] CloudSuite. [29] Desktop Cloud Solutions of HuaWei. [30] Desktone Desktop Cloud. cloud/. [31] Citrix XenDesktop. [32] Hyper-V. [33] ware View (ware VDI). [34] IBM Redbooks: Performance Test of Virtual Linux Desktop Cloud Services on System Z. [35] ware View Planner. planner. [36] SPECvirt sc2010: sc2010/. [37] mark: 10