Analysis of VDI Workload Characteristics

Jeongsook Park, Youngchul Kim and Youngkyun Kim
Electronics and Telecommunications Research Institute
218 Gajeong-ro, Yuseong-gu, Daejeon, 305-700, Republic of Korea
{jungsp,kimyc,kimyoung}@etri.re.kr

ABSTRACT
Sizing a VDI system is an important but very difficult part of building one. The difficulty stems from several key features of VDI systems, such as I/O storms, dominant small random writes, and unintentional I/O blending effects, which together produce unique and complicated workload characteristics. Performance analysis and capacity planning therefore depend mainly on real measurement tools such as IOmark, which are costly. To address this limitation, this paper analyzes and characterizes the VDI workload. The analysis helps us understand VDI workload characteristics precisely and estimate performance easily without any servers or tools.

KEYWORDS
VDI, self-similarity, workload characterization, performance evaluation, statistical analysis

1 INTRODUCTION
Virtual Desktop Infrastructure (VDI) has recently become one of the key technologies in the IT industry. VDI is a concept in which PC resources such as CPU, memory, and storage are centrally managed on the server side, and end users at thin or zero clients access the server as if they were using their own PCs. It has many benefits, such as improved security through centralized data management and energy savings. These benefits appeal to IT businesses and lead them to consider introducing a VDI system. At the same time, centralized resource management can cause complicated problems. Technically, storage is known to be the most critical performance bottleneck because almost all I/O operations from multiple users access the storage simultaneously. Moreover, the VDI workload is very complicated and unpredictable, which makes VDI system sizing impossible with simple calculation [1].
Many storage vendors therefore rely mainly on real measurement tools such as IOmark and LoginVSI when they publish the performance of their products [2, 3]. However, these tools are expensive too, because they require building a test system and purchasing tool licenses. To reduce this cost and effort, it is better to introduce workload modeling and performance analysis. Workload modeling is an effective alternative that reduces the cost and duration of performance evaluation. Workload models have a number of advantages over traces in terms of adjustment, controlled modification, repetition, and stationarity [4]. To build such a model, we first have to understand VDI workload characteristics. This paper therefore aims to analyze them.

The rest of the paper is organized as follows. Section 2 describes related work, focusing on general workload modeling methods and existing VDI measurement tools. Section 3 surveys typical VDI workload characteristics. Section 4 analyzes VDI workload characteristics in detail. Section 5 shows, using an open-source estimation tool, that the workload is self-similar. Finally, Section 6 summarizes the paper.

ISBN: 978-1-941968-10-9 2015 SDIWC
2 RELATED WORKS
In this section, we describe related work on measurement tools and on workload modeling.

2.1 Measurement tools for VDI system
Conventional storage sizing is generally based on the required IOPS per user type, network capacity, disk IOPS as determined by disk RPM, the write penalty of the RAID type, the ratio of reads to writes, and so on [5]. This simple method is not suitable for capacity planning of a VDI system because of the characteristics of the VDI workload. An experiment with IOmark on a system sized by this calculation showed that the sizing was not sufficient even though IOPS and bandwidth were adequate [1]. VDI workload characteristics are not well known apart from several key features, so we generally have to depend on measurement tools such as LoginVSI, IOmark, and SPC [2, 3, 6]. Among them, IOmark is regarded as the most effective in terms of cost and workload characteristics, as shown in Table 1.

Table 1. Comparison among VDI performance tools

                     IOmark            LoginVSI          RAWC                   SPC
Test range           Storage           Entire system     Entire system          Storage
Workload             100% real VDI     100% real VDI     100% real VDI          Non-VDI
Cost                 Cost to license   Cost to license   VMware partners only   Cost to license & publish
Equipment required   Low               High              High                   Low
Setup time           Low               High              High                   Low

IOmark measures storage performance in terms of response time [2]. In the absence of standardized tools, it is widely used as a benchmark of storage performance in VDI environments. It measures the average response time under a workload specific to VDI systems. The IOmark tool consists of multiple virtual machines running the IOmark client software and one server running the IOmark main server software. IOmark therefore also requires many system resources when evaluating a large system, although it needs fewer resources than LoginVSI.

2.2 Workload modeling approaches
As subsection 2.1 shows, IOmark is neither the best nor the simplest approach.
Workload modeling can be a better and simpler method in this case. Analytical performance modeling is about finding the 10% of the system that explains 90% of its behavior. This requires first understanding and modeling the workload characteristics. Workload modeling is the attempt to create a simple and general model, which can then be used to generate synthetic workloads at will. Feitelson explains workload modeling methods in detail [4].

To derive a workload model correctly, self-similarity has to be understood and taken into account first. The study of self-similarity in computer workloads was pioneered by Leland et al. [7], who showed that Ethernet traffic is self-similar. Self-similarity has since been found in many phenomena, including network traffic, processor behavior, and file operations [8, 9, 10]. Self-similarity is about scaling. The workload includes bursts of increased activity, and similar-looking bursts appear at many different time scales. Thus the workload appears similar to itself when viewed at different scales. The mathematical definition and more information can be found in Feitelson's book [4].

3 VDI WORKLOAD CHARACTERISTICS
Understanding the characteristics of the VDI workload is the first step in developing a VDI system, because the workload has unique features and storage performance depends heavily on them. There are three main characteristics: small random writes, I/O storms, and I/O blending effects.

First, it is generally known that more than 80% of the traffic from VMs, and in some cases more than 90%, consists of small random writes, which can degrade storage performance. The workload issued by each VM in the steady state may be trivial, ranging from 3-7 IOPS to 20-30 IOPS [11]. Although the load fluctuates with the situation, 50 IOPS is enough even in the worst case. However, the workload becomes a burden on the system as the number of active users increases, because the sum of many trivial loads is no longer trivial. Specifically, the workload from VMs consists almost entirely of reads and writes to the storage, which makes storage the most critical bottleneck. This is acceptable as long as the total load on the system in the steady state is even.

Second, system performance may deteriorate severely under I/O storms, which are short-term heavy loads such as simultaneous logins at the start of the workday, simultaneous virus scans, and Windows updates. The workload during an I/O storm may amount to about 4-6 times that of the steady state, and 10 times in the worst case, so it can heavily degrade system performance [12].

The last point is the I/O blending effect, which is caused by mixing traffic from multiple VMs at the hypervisor level. It can increase the randomness of the workload at the storage.

4 WORKLOAD CHARACTERIZATION

4.1 Data collection
Workload modeling always starts with measured data about the workload. However, we could not find any publicly available VDI traces, so we extracted traces from IOmark runs instead. Figure 1 indicates the location at which the data traces were extracted.
As shown in Figure 1, each VM on the host side generates I/O operations to the storage, which is connected via a 1 Gbps network. The setup simulates from one user to 32 users.

Figure 1. Location of data trace collection

The storage in the lower area is based on our own software, VDI-FS. It is a distributed file system composed of an MDS, DSs, and clients. The VDI-FS client is located on the hypervisor, and we captured the traces there. Table 2 summarizes the attributes of our VDI trace data.

Table 2. Summary of the VDI trace data

Data information                   Value
Data generation tool               IOmark
Location                           Hypervisor in the host
Data size (1 stream, 16 streams)   7,700 KB, 61,705 KB (inoid, type, size, time)
Time duration                      1 hour
Capturing tool                     VDI-FS client log
Operation types                    READ, WRITE
Number of operations               131,071 (one user), 1,107,174 (16 users)

To obtain the VDI workload characteristics of IOmark, we conducted performance measurements with it. For this purpose we constructed a test system with 29 servers, as shown in Figure 2. Because of the resources required to run IOmark, we installed 8 VMs on each host. Each host generates the workload of 512 VDI steady-state users.
Figure 2. Testing environment

The workload characteristics of IOmark are shown in Figure 3. They can be expressed in terms of average IOPS, the ratio of reads to writes, and average bandwidth. Every user requires 12 IOPS, and the ratio of READ to WRITE is 5:5. This load is not much higher than the workload figures described in Section 3, so favorable characteristics are expected.

Figure 3. Workload characteristics: (a) average IOPS, (b) ratio of READ:WRITE, (c) average bandwidth (Mbps)

4.2 Characteristic analysis of collected data
We analyzed the basic characteristics of the collected data. Figure 4 shows the behavior of one VDI user. The traces are composed of READ and WRITE operations. We arranged the trace data exactly in the order in which records appear in the log file, so some inter-arrival times are negative. These values were converted to positive ones to obtain a proper time sequence. Graph (a) of Figure 4 shows the original data trace, where the x-axis is time and the y-axis is the size of the data in bytes. Request sizes range from 512 B to 128 KB. To understand the distribution of data sizes, we plotted the histogram shown in graph (b). READ sizes are spread fairly evenly, whereas 128 KB WRITEs are dominant.
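The preprocessing described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the tooling used for the paper: it assumes each record is a tuple in the (inoid, type, size, time) order listed in Table 2, that times are integer milliseconds, and that "converted to positive" means taking the absolute value of each gap. The function names and the sample records are hypothetical.

```python
from collections import Counter


def inter_arrival_times(records):
    """Gaps between consecutive records in log order, made non-negative.

    Assumption: a negative gap (record logged out of time order) is
    replaced by its absolute value.
    """
    times = [time for (_inoid, _op, _size, time) in records]
    return [abs(later - earlier) for earlier, later in zip(times, times[1:])]


def size_histogram(records):
    """Count requests per (operation type, request size) pair."""
    return Counter((op, size) for (_inoid, op, size, _time) in records)


# Hypothetical sample records: (inoid, type, size in bytes, time in ms).
sample = [
    (101, "WRITE", 131072, 0),
    (101, "READ", 4096, 20),
    (102, "WRITE", 131072, 15),  # logged out of time order
    (103, "READ", 512, 50),
    (101, "WRITE", 131072, 90),
]

gaps = inter_arrival_times(sample)
hist = size_histogram(sample)
```

Keeping the operation type in the histogram key makes it easy to compare READ and WRITE size distributions separately, which is the comparison drawn in graph (b).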
Figure 4. I/O pattern of one VDI user: (a) data trace, (b) data sizes, (c) data aggregation

To check whether the trace is self-similar, we aggregated the data into coarser time units of 1 second, 10 seconds, and 30 seconds. Graph (c) shows the aggregated time series. Intuitively, the trace appears self-similar because many peaks remain at every level of aggregation. Figure 5 shows the behavior of 16 VDI users. It has characteristics similar to those of one user, except for higher density.

Figure 5. I/O patterns of 16 VDI users: (a) data trace, (b) data sizes, (c) data aggregation
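The aggregation step above can be sketched as follows. The sketch is illustrative only and makes several assumptions: events are hypothetical (time in seconds, size in bytes) pairs, bins are fixed-width, and aggregation averages non-overlapping blocks. It also includes a simple variance-time estimate of the Hurst parameter H, the first of the methods named in Section 5, based on the standard relation Var(X^(m)) ~ m^(2H-2). It is not the SELFIS implementation, and all function names are hypothetical.

```python
import math
import random


def bin_bytes(events, bin_seconds, duration):
    """Sum transferred bytes per time bin; events are (time_s, size_bytes)."""
    n_bins = math.ceil(duration / bin_seconds)
    bins = [0] * n_bins
    for time_s, size in events:
        index = min(int(time_s // bin_seconds), n_bins - 1)
        bins[index] += size
    return bins


def aggregate(series, m):
    """Average non-overlapping blocks of m samples (aggregated series)."""
    blocks = len(series) // m
    return [sum(series[i * m:(i + 1) * m]) / m for i in range(blocks)]


def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def hurst_variance_time(series, levels):
    """Least-squares fit of log Var(X^(m)) against log m.

    The fitted slope equals 2H - 2, so H = 1 + slope / 2.
    """
    xs = [math.log(m) for m in levels]
    ys = [math.log(variance(aggregate(series, m))) for m in levels]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
             / sum((x - x_mean) ** 2 for x in xs))
    return 1 + slope / 2


# Hypothetical events: (seconds since start, bytes transferred).
events = [(0.5, 100), (1.2, 50), (9.9, 10)]
per_second = bin_bytes(events, 1, 10)
per_ten_seconds = bin_bytes(events, 10, 10)

# Synthetic independent noise, used only as a sanity check of the estimator.
rng = random.Random(1)
noise = [rng.gauss(0.0, 1.0) for _ in range(20000)]
h_noise = hurst_variance_time(noise, [1, 2, 4, 8, 16, 32, 64])
```

For independent noise the estimate is expected to land near 0.5, whereas a self-similar trace should give a value between 0.5 and 1. A real trace would replace the synthetic series with the per-second byte counts produced by `bin_bytes`.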
5 ESTIMATION
The next step is to verify whether the data in Section 4 are self-similar. This is done by estimating the Hurst parameter, H. If H is in the range 0.5 < H < 1, the process is self-similar. Estimation methods are either time-based or frequency-based. The rescaled-range (R/S) method and the variance-time method are time-based. Frequency-based methods are related to the spectral analysis of time series, where the data are first transformed to the frequency domain using a Fourier transform [4]. We used the open tool SELFIS, developed by Karagiannis [13], and selected the variance-time method, the R/S method, and Whittle's estimator as the estimators.

Figure 6 shows the estimated H values, which indicate the self-similarity of the data series, for one user. The data are self-similar because the H values were 0.851, 0.926, and 0.936 respectively, all within the range 0.5 < H < 1.

Figure 6. Estimation of H value (one user)

The same characteristics are also evident for 16 users, as shown in Figure 7.

Figure 7. Estimation of H value (16 users)

Table 3 summarizes the estimated self-similarity values. All of the values are in the range 0.5 < H < 1.0 and close to 1, which means that the data traces have strong self-similarity.

Table 3. Estimation result of self-similarity

                H value (1 user)   H value (16 users)
Variance time   0.851              0.843
R/S             0.926              0.877
Whittle         0.936              0.934

6 CONCLUSION
In this paper, we analyzed VDI workload characteristics using traces from IOmark. We used IOmark traces because no public traces are available. The analysis showed strong self-similarity rather than a Markovian process. Self-similarity is prominent in IT environments, including network traffic, file systems, and Web server behavior. This characteristic will be used in workload modeling for the performance analysis of VDI systems.
Performance analysis based on the results of this paper will be an effective, low-cost alternative to performance measurement for capacity planning.

ACKNOWLEDGEMENT
This work was supported by the IT R&D program of MKE/KEIT. [10041730, A Cloud File System Development for Massive Virtual Desktop Infrastructure Service]
REFERENCES
[1] Jeongsook Park, Cheiyol Kim, Youngchul Kim, Sangmin Lee, and Youngkyun Kim, Storage sizing issue of VDI system, SoftTech2014, May 2014.
[2] http://www.iomark.org/category/site-content/downloads, IOmark Theory of Operations, Evaluator Group Inc., 2012.
[3] http://www.loginvsi.com/pdf/documentation/v3/login-vsi-37-documentation.pdf, Documentation Login VSI 3.7, Login VSI Inc., 2012.
[4] Dror G. Feitelson, Workload modeling for computer systems performance evaluation, Cambridge University Press, 2014.
[5] http://www.emc.com/collateral/software/whitepapers/h11096-vdi-sizing-wp.pdf, Sizing EMC VNX Series for VDI Workload, White Paper, EMC, September 2012.
[6] http://www.storageperformance.org/specs/spc-2C_SPC-2CE_v1.4.pdf, SPC Benchmark 2C (SPC-2C) / Energy Extension (SPC-2C/E), SPC, 2013.
[7] Will E. Leland, Murad S. Taqqu, Walter Willinger, and Daniel V. Wilson, On the Self-Similar Nature of Ethernet Traffic, IEEE/ACM Transactions on Networking, Vol.2, No.1, February 1994.
[8] Steven D. Gribble, Gurmeet S. Manku, Drew Roselli, Eric A. Brewer, Timothy J. Gibson, and Ethan L. Miller, Self-similarity in file systems, In SIGMETRICS Conf. Measurement and Modeling of Computer Systems, pp.141-150, June 1998.
[9] Allen B. Downey, A Parallel Workload Model and its Implications for Processor Allocation, Cluster Computing, 1(1), pp.135-145, 1998.
[10] Allen B. Downey, The Structural Cause of File Size Distributions, In 9th Modeling, Analysis and Simulation of Computing and Telecommunication Systems, August 2001.
[11] http://www.tdeig.ch/kvm/vdi/vdi_storage.pdf, VDI & Storage: Deep Impact, PQR, September 2011.
[12] http://www.vmware.com/files/pdf/view_storage_considerations.pdf, Storage Considerations for VMware Horizon View 5.2, White Paper, VMware.
[13] http://www.cs.ucr.edu/~tkarag/selfis/selfis.html, SELFIS: A Short Tutorial, Thomas Karagiannis, 2002.