Benchmarks and Comparisons of Performance for Data Intensive Research

Saad A. Alowayyed

MSc in High Performance Computing
The University of Edinburgh
Year of Presentation: 2012

August 23, 2012

Abstract

The rapid development of technology and the increasing precision of captured data have led to a significant increase in the amount of data, known as big data. This big data is valuable because it is the outcome of efforts such as experiments or discoveries, so interpretation is necessary to obtain information from it. This information cannot be extracted efficiently without effective methods for storing, retrieving, manipulating and analysing the big data. The modest performance of secondary storage stands in the way of these methods; in other words, it exposes the I/O bottlenecks that prevent the I/O system from supplying the CPU with the right amount of data. The right amount of I/O to supply to the CPU was analysed by Gene Amdahl, who derived a ratio widely known as the Amdahl number, which is one of the main criteria used in this project to decide whether or not a system is balanced.

Two systems at the University of Edinburgh are examined: EDIM1 and Eddie. This project aims to decide whether or not EDIM1 and Eddie are balanced and, hence, suitable for data intensive computing. This is achieved by using a number of benchmarks, one for each component, to determine whether the sustained bandwidths are of the same order of magnitude. Moreover, the Amdahl number is calculated both theoretically and experimentally, using three benchmarks with different I/O patterns, two of which were developed during the scope of this project.

From the results, it was found that the two systems are suitable for data intensive computing and can provide a good platform for data intensive programs. The first machine, EDIM1, was designed to be similar to the Amdahl-balanced blades by attaching more disks to a low-power processor. This increases the I/O throughput considerably, so the I/O should keep up with the CPU performance. The second machine, Eddie, is not built specifically for data intensive research, but it can reduce the gap between the I/O and the CPU in the field of general high performance computing.

Contents

Chapter 1 Introduction
Chapter 2 Background
  2.1 Big Data
  2.2 Data Intensive Computing
    2.2.1 Balanced systems
    2.2.2 Raw sequential I/O
    2.2.3 Multi-scaling
  2.3 Styles of Data Intensive Computing
    2.3.1 Data-processing pipelines
    2.3.2 Data warehouse
    2.3.3 Data centres
  2.4 Amdahl-balanced blades
  2.5 Devices
    2.5.1 Technical information
    2.5.2 Machines theoretical values
    2.5.3 Mythical system
Chapter 3 Benchmarks
  3.1 Floating point operations per second
    3.1.1 FLOPS
    3.1.2 LINPACK
    3.1.3 Simple CPU
  3.2 Memory bandwidth
    3.2.1 Stream
    3.2.2 Simple memory
  3.3 File system
    3.3.1 IOZone
    3.3.2 Simple I/O
  3.4 Balanced systems
    3.4.1 Amdahl synthetics benchmarks
Chapter 4 Benchmarks Results and Analysis
  CPU flops rate: EDIM1, Eddie
  Memory flops rate: EDIM1, Eddie
  Disk's flops rate: EDIM1, Eddie
  Amdahl synthetics benchmarks
  4.2 Testing
Chapter 5 Conclusions
Chapter 6 Project analysis
  Project Post-mortem
    Process
    Materials and technology
    What made the project successful
    What needs more improvements
    Lessons learned
  Deviations from the work plan
  Risks
Appendix A Benchmarks results
Appendix B Benchmarks compiling and running
References

List of Tables

Table 2-1: Theoretical (Datasheet) Values
Table 2-2: Machine theoretical values
Table 3-1: The eight kernels in the FLOPS benchmark
Table 3-2: The sizes of the array compared to the data used in LINPACK
Table 4-1: Benchmark results for the EDIM1 and Eddie CPUs
Table 4-2: Benchmark results for the EDIM1 and Eddie main memories
Table 4-3: Benchmark results for the EDIM1 and Eddie disks
Table 4-4: Summaries of test suites completed in CUnit
Table 6-1: The Gantt chart for the project
Table 6-2: Risk analysis

List of Figures

Figure 2-1: Illustration of the steps in the data-processing pipeline
Figure 2-2: Node architecture in EDIM1
Figure 3-1: Illustration of Amdahl synthetics benchmarks
Figure 3-2: Different thread compositions, throughputs and Amdahl numbers
Figure 3-3: Different real program patterns
Figure 3-4: Algorithm for real program I/O pattern benchmark
Figure 4-1: Summary of the EDIM1 CPU-memory flops rates
Figure 4-2: Summary of the Eddie CPU-memory flops rates
Figure 4-3: Throughput for different sets of storage
Figure 4-4: Amdahl numbers for different sets of storage
Figure 4-5: Illustration of Amdahl numbers among different buffer sizes
Figure 4-6: Performance of different sets of storage
Figure 4-7: Amdahl numbers for a combination of disks
Figure 4-8: Amdahl number for different I/O patterns
Figure 4-9: Performance for different I/O patterns
Figure 4-10: Amdahl number for a pattern using different buffer sizes

Acknowledgements

I am sincerely grateful to Dr Adam Carter; this dissertation would not have been possible without his guidance, encouragement and feedback. I owe thanks to Mr Gareth Francis for helping me with problems related to the cluster. I am truly indebted to Dr Orlando Richards from ECDF for his help in exploring new information. I would like to show my gratitude to my family, with special reference to my father, Dr Abdullah Alowayyed, for giving me their support, without which none of this would have happened.

Chapter 1 Introduction

Recently, there has been a great increase in the demand for retrieving, processing, sorting and storing large amounts of raw data. It is a challenge to deal with this amount of data efficiently, especially given its value and the fact that it is the result of considerable effort, such as experiments or discoveries. Storage, on the other hand, has seen a remarkable increase in capacity, and costs have decreased significantly. However, storage performance has not kept pace with CPU performance: in most cases, the CPU can process the data faster than the storage can retrieve or store it. This is widely known as the I/O bottleneck.

The I/O bottleneck is a real difficulty for computer experts, programmers, developers and users. Even when an algorithm is fast and well defined, the I/O slows down the overall execution, and this slowdown becomes sharper as the amount of data or the number of I/O operations increases. Data Intensive Computing (DIC) is precisely about reducing the I/O bottleneck and the performance gap between the CPU and the I/O, by organising the system to store and retrieve data quickly enough that the CPU does not remain idle.

This project is an in-depth examination of two systems, EDIM1 and Eddie, to determine whether or not they are suitable for data intensive computing by analysing their theoretical and experimental values. To make that decision, two main objectives were set. The first is to run a set of benchmarks for the components involved in the CPU-I/O path: the CPU, the main memory, the file system and the disks. The second is to calculate the Amdahl number, because this gives a precise indication of whether or not the system is balanced in terms of the throughput of the I/O system and the performance of the CPU. The Amdahl number is calculated theoretically for each system and is also measured in practice. In 2011, a student at EPCC wrote a benchmark to calculate the Amdahl number [1]. In this project, a different set of I/O patterns is taken into account to make that benchmark more realistic, bringing it closer to the peak or to real applications. Finally, to reach a decision on whether or not a machine is suitable for data intensive computing, the Amdahl number should be close to 1.0; moreover, the sustained bandwidths of all the components, measured by the benchmarks, should be of the same order of magnitude, so that a similar sustained CPU rate can be achieved regardless of the source of the data.

Chapter 2 begins with the basic definitions of big data and data intensive computing. Next, a review of the related literature is included to give a wider picture of the project and of the data intensive computing field. The chapter ends with an overview of the two systems under study and the calculation of their theoretical values. Chapter 3 goes through the set of benchmarks being used and explains how their results are calculated; it categorises the benchmarks by component, following the data path from the CPU to the disks, and ends with a detailed description of the different Amdahl synthetic benchmarks developed in the scope of this project. Chapter 4 presents the results obtained from the benchmarks of the previous chapter, together with the theoretical numbers from Chapter 2, and gathers them to develop hypotheses and an appropriate conclusion answering the research question discussed earlier. In Chapter 5, a final set of conclusions is drawn on whether or not the machines are suitable for data intensive computing and whether highly I/O-bound programs will benefit from them; the chapter closes with recommendations for both systems. Finally, Chapter 6 analyses the project and clarifies the deviations from the work plan and the risks identified in the project preparation period. Appendix A presents the benchmark results and a sample of the test results. Appendix B shows how to obtain the benchmark sources and how to compile and run them with different configurations; one of the test functions is also shown.

Chapter 2 Background

2.1 Big Data

Big Data is defined as large and fast-growing datasets. These datasets are captured from several sources, such as experiments and sensors. As technology develops and becomes more precise, it produces more data, and the storage, search, analysis and availability of this Big Data becomes a problem. The issue doubles every year [2] [3], and perhaps even faster, because of the imminence of the exascale age. Currently, various sources, such as sensors, search engines, social networks and experiments, create 2.5 exabytes of data daily [4]. This is exemplified by Google producing 20 petabytes of data per day and by The European Organization for Nuclear Research (CERN), whose ATLAS experiment generates a petabyte per second [5], as two examples out of many. Obviously, there is great demand for dealing with this big data efficiently.

Moreover, big data will generate more data movement than before. Unfortunately, present-day secondary storage may not cope well, and its modest performance may worsen the I/O bottlenecks; these bottlenecks lead to most of the time being spent moving rather than processing data [6]. However, data intensive computing is now being introduced to increase the ability to analyse and understand the data where it is captured [1]. Moreover, [7] mentions that the increase in processing capabilities is another reason for big data's growth.

2.2 Data Intensive Computing

Data-intensive computing, according to [8], is the fourth basic research paradigm, after experiment, theory and computer simulation, and is urgently required for three reasons. The first is the requirement to store and retrieve large amounts of data efficiently.

The second reason involves saving the energy consumed by this great demand, by using computational resources more effectively through building low-power systems [3]; these systems are described later. Finally, analysing large, distributed amounts of data is not an easy task; the ATLAS experiment, for example, processes 10 petabytes of data every year [9].

Three main issues [2] should be considered when DIC is mentioned: balanced systems, multi-scaling and raw sequential I/O. The next subsections describe these terms in depth.

2.2.1 Balanced systems

Balanced systems can be defined as systems that move data at a speed sufficient to prevent the CPUs from experiencing delays. The performance of a system is limited by its slowest component, and the I/O operations are usually that slowest component, followed by the main memory and a poorly designed cache [6]. Amdahl set out a number of laws [10] that determine whether or not a system is balanced:

The Throughput Balance Law: a system needs one bit of I/O per second per instruction per second; this ratio is known as the Amdahl number. It can be calculated by dividing the I/O bandwidth (the number of bits the system can read per second) by the flops rate, which is the number of floating point operations per second. This number is the main concern of this project.

The Memory Law: one byte of memory per instruction per second. This ratio indicates whether or not there is sufficient memory to undertake the computation, and is known as the Amdahl memory ratio.

The I/O Law: one I/O operation for every 50,000 instructions. Moreover, when there is less than one I/O operation per 50,000 instructions, this ratio reflects the I/O latency.

Amdahl's experience as a computer architect and his observations of real-life applications led to these practical laws [11]. Although the laws are a few decades old, the kind of computation carried out over those decades has not changed in any fundamental way, so they are still valid and easily applied to current real-life applications. In addition to the Throughput Balance Law, the sustained bandwidths of the CPU, memory and I/O being of the same order of magnitude is another indication that a system is in balance.
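To make these laws concrete, the short C program below computes the three ratios from datasheet-style figures. It is an illustrative sketch only: the input values are arbitrary placeholders, not measurements of EDIM1 or Eddie, and the program simply encodes the three laws as stated above.

/* amdahl_ratios.c -- illustrative only: computes the three Amdahl ratios
 * from datasheet-style figures. The numbers below are arbitrary placeholders,
 * not measurements of EDIM1 or Eddie.
 * Build with: gcc -o amdahl_ratios amdahl_ratios.c
 */
#include <stdio.h>

int main(void)
{
    /* Assumed node parameters (placeholders) */
    double instr_per_sec    = 10.0e9;    /* instructions (flops) per second  */
    double io_bytes_per_sec = 1.0e9;     /* sustained I/O bandwidth, bytes/s */
    double memory_bytes     = 8.0e9;     /* main memory per node, bytes      */
    double iops             = 100000.0;  /* I/O operations per second        */

    /* Throughput Balance Law: one bit of I/O per second per instruction/s */
    double amdahl_number = (io_bytes_per_sec * 8.0) / instr_per_sec;

    /* Memory Law: one byte of memory per instruction per second */
    double memory_ratio = memory_bytes / instr_per_sec;

    /* I/O Law: one I/O operation per 50,000 instructions */
    double iops_ratio = iops / (instr_per_sec / 50000.0);

    printf("Amdahl number      : %.3f\n", amdahl_number);
    printf("Amdahl memory ratio: %.3f\n", memory_ratio);
    printf("Amdahl IOPS ratio  : %.3f\n", iops_ratio);
    return 0;
}

With these placeholder values the program prints 0.800, 0.800 and 0.500 respectively; ratios close to 1.0 indicate a balanced node under each law.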

2.2.2 Raw sequential I/O

When considering a large amount of data, several terabytes for example, the memory and cache should be taken out of the performance equation. The best that can be done is to use the drive's own schedulers, such as in-drive scheduling, which schedules the command queue for maximum efficiency. Another mechanism is on-board caching, a simple cache on modern disks that holds data without searching the disk itself [12]; on-board caching is effective in the case of sequential I/O.

The performance of the Hard Disk Drive (HDD) has improved significantly over the years. For example, the first 500 GB hard drive, from Hitachi GST, had a transfer speed of 3 Gb/s [13]; then, in 2011, Seagate produced the first 4.0 terabyte hard drive, with the data transfer boosted up to 80 Gb/s [14]. However, these improvements in size and data transfer are not sufficient when compared with modern developments in CPUs or with the huge increase in data sizes. The problem of modest sequential read performance could be improved by an order of magnitude using Amdahl-balanced blades (discussed in Section 2.4).

2.2.3 Multi-scaling

To address the slowdown caused by I/O performance, one of two scaling approaches must be used: scaling up or scaling out. Scaling up means adding more multi-processors and a high-performance disk array [3]. This method is expensive and yields low productivity, especially for sequential I/O, but it has lower management and partitioning overheads [2]. Scaling out means using lower-power processors and attaching more nodes, each node being attached to one or more disks [1]. Scaling out is cheaper and increases the throughput, thus reducing the bottleneck; on the other hand, it creates a large overhead in disk I/O management and in partitioning the data between the disks. [15] illustrates that scale-up is used in large symmetric multiprocessing (SMP) systems, while scale-out takes the form of clusters.

A number of research outcomes, such as Gray's law [16], suggest that scale-out is the most suitable approach for data intensive computing. In this project, the difference between these two scaling approaches mirrors the difference between the two machines: EDIM1 is scale-out, while Eddie is more scale-up.

2.3 Styles of Data Intensive Computing

Three styles of Data Intensive Computing have been developed over recent years [17]. First, data-processing pipelines initiate a pipeline of processes after capturing the data; this pipeline, shown in Figure 2-1, is used to reduce the amount of data significantly through well-defined steps. Second, the data warehouse is a large, undistributed store of data, while the third style is distributed storage, known as data centres.

2.3.1 Data-processing pipelines

To reach small, easy-to-interpret datasets, data-processing pipelines have three main steps. The first is high-throughput capture: the data are captured from the source, the researchers manipulate them, for example deleting unnecessary content and looking for an easier route to analysis, and the data are then stored, ready for the next step. The second step is analytics: the data are interpreted using a specific, often complicated, algorithm, and this step requires high-performance computers to obtain the results. Finally, in the understanding step, which is the focus of visual analytics, the researchers interpret the analysis results produced in the previous step; the outcomes of the work might finally be visualised.

Figure 2-1: Illustration of the steps in the data-processing pipeline [17] (data volume decreases and information density increases over time as the data move from high-throughput capture through analytics to understanding)

2.3.2 Data warehouse

The data warehouse can be considered the main storage location of the data, or the main archive that draws the complete picture needed to make a decision [18]. Its main function is to store the data until researchers extract and interpret the information. The Sloan Digital Sky Survey (SDSS) [19] is the most important survey of the astronomy age. All the images and information produced by the SDSS telescope are collected in the SDSS SkyServer. The SkyServer, as an example of a data warehouse, can host up to 40 terabytes of raw data, to be interpreted later by astronomers [20]. The SkyServer provides a public search tool, called DR6, which allows the data to be searched after it has been interpreted.

The concept of the data warehouse is similar to cloud computing. Alexander Szalay [2] considers the two similar because both provide access to services, both data and computation, in the same shared, centralised facility. Szalay argues that a data warehouse design has more advantages than a grid-based one, because it allows many users to access the shared datasets. On the other hand, it has been argued that the popularity of using remote grids is questionable [21].

2.3.3 Data centres

The newest data intensive computing style is inspired by the Internet and the concept of data distribution [17]. In this style, the data is stored in distributed data centres, and a specific programming model, such as MapReduce, is used to process these large datasets. An example is the data centres supporting academic research provided by the National Science Foundation, Google and IBM: in 2008, they provided 1,600-processor clusters to give researchers access to distributed sources using Hadoop [22]. Another characteristic of data centres, as suggested by [2], is that, because of the large amount of data, the computation should be brought to where the data is, and not vice versa, thus saving transmission time; avoiding I/O problems and possible data loss are the expected outcomes of this change. In addition, Jim Gray [16] argues that scientific computing, which is in fact becoming more data intensive, would benefit from data centres.

2.4 Amdahl-balanced blades

The increase in data size brings a corresponding increase in power consumption [23]. This, together with the main target of achieving a system with fewer I/O bottlenecks, is what convinced [3] to look for a blade that is energy efficient and delivers a high sequential read throughput. Sequential read is used as the metric because it is the most appropriate measure of I/O throughput, being both intensively used and high in bandwidth. The power consumption problem can be improved by using energy-efficient CPUs [23], such as the Intel 1.6 GHz Atom N330 (used in EDIM1 [24] and GrayWulf [2]). Theoretically, and according to [3], the 1.6 GHz CPU is balanced with 32,000 I/O operations per second, and this is roughly what the Solid State Drives (SSDs) used in GrayWulf can produce. Another example of an SSD, the Crucial RealSSD C300 [25], the drive used in EDIM1, can carry out up to 45,000 I/O operations per second. Another reason for choosing an SSD is its lower access time compared to an HDD; on the other hand, SSDs are much more expensive.

The Amdahl-balanced blade used in GrayWulf [3] consists of a dual-core Atom 330 and is made balanced by using two SSDs. EDIM1 uses a similar architecture to the Amdahl-balanced blades but, instead of a second SSD, three HDDs are used. Comparing the price of one of the SSDs used in GrayWulf with that of the three HDDs used in EDIM1, the EDIM1 disks come out about USD 100 cheaper.

2.5 Devices

In this project, two systems are examined: EDIM1 and Eddie. The following sections first describe their technical details; then the theoretical machine values are computed; finally, a well-balanced mythical system is described.

2.5.1 Technical information

EDIM1

EDIM1 (Edinburgh Data Intensive Machine 1) is an experimental data-intensive cluster, owned by EPCC and the School of Informatics at the University of Edinburgh, containing 120 compute nodes in three racks. Each compute node consists of a dual-core Intel Atom N330 CPU (1.6 GHz) on a Zotac mini-ITX motherboard, all with low power consumption. The flops rate has been increased through the use of the NVIDIA ION GPU. Each node has 4 GB of DDR3 shared main memory, three 2 TB Hitachi Deskstar 7K3000 HDDs and one 256 GB RealSSD C300 SSD [24], as shown in Figure 2-2.

Figure 2-2: Node Architecture in EDIM1 [1]

As noted in the specifications, each node is similar to the Amdahl-balanced blades described in Section 2.4. Table 2-1 summarises the technical information and clearly shows that the nodes do not reach the high flops rates of computational clusters such as Eddie, but are cheaper, more suitable for data intensive work and have a lower power consumption.

Eddie

The Edinburgh Compute and Data Facility (ECDF) has a high performance cluster known as Eddie, with 156 nodes, each consisting of two 8-core Intel Xeon E5645 CPUs (2.4 GHz). Each node comes with 24 GB of DDR3 memory and a 250 GB 7200 rpm SATA drive [26]. Moreover, Eddie is equipped with a special storage system that allows it to supply a massive amount of storage for the University of Edinburgh. The nodes are connected by 1 Gb Ethernet to 48-port switches, each of which connects back over 1 x 10 Gb Ethernet to another network switch. Eight storage servers connect to that switch (also at 10 Gb), and the storage servers connect to the storage via 2 x 8 Gb fibre channel SAN connections [27].

Table 2-1: Theoretical (Datasheet) Values

                               EDIM1                    Eddie
    CPU model                  Intel Atom N330          Intel Xeon E5645
    CPU clock rate             1.6 GHz                  2.4 GHz
    Cores / CPU                2                        8
    CPUs / node                1                        2
    Memory / node              4 GB DDR3                24 GB DDR3
    Secondary storage / node   256 GB SSD + 6 TB HDD    250 GB HDD + storage system

2.5.2 Machines theoretical values

Theoretical flops rate

The theoretical flops rate for a node, widely known as the theoretical peak, can be calculated with the following formula [28]:

    Theoretical flops rate (FLOP/s) = CPU clock rate × instructions per cycle × cores per CPU × CPUs per node    (1)

For EDIM1:

    Theoretical flops rate = 1.6 GHz × 2 × 2 × 1 = 6.4 GFLOP/s

For Eddie:

    Theoretical flops rate = 2.4 GHz × 2 × 8 × 2 = 76.8 GFLOP/s

Unsurprisingly, the gap between the two results provides the first indication that these two machines are built for different purposes. In other words, EDIM1 is specifically built for data intensive research and provides a balance between the CPU and the disks, while Eddie is designed for general purpose high-performance computation.

Theoretical Amdahl Memory Ratio

This is the best indicator of whether or not there is sufficient memory to undertake the computation, and it is based on the Memory Law discussed previously in Section 2.2.1. When the memory ratio is exactly one, there is, theoretically, exactly one byte of memory per instruction per second, which means the system's performance should not suffer from having to re-read data from the disks several times: the main memory has plenty of space, so there is no need to go back to disk. The Amdahl Memory Ratio can be calculated as follows [1]:

    Amdahl Memory Ratio = Memory size per node / CPU rate    (2)

For EDIM1 this gives:

    Amdahl Memory Ratio = 1.28

While Eddie's value is also:

    Amdahl Memory Ratio = 1.28

From these results, one can note that there is more than one byte of memory per instruction per second in both systems. That means both systems' nodes offer more memory than Amdahl observed a real-world program to require.

Theoretical Amdahl IOPS ratio

The I/O, in addition to the memory, produces a bottleneck, widely known as the I/O bottleneck. In HPC, the I/O bottleneck is painful and wastes the CPU's time and energy in waiting. This dilemma is exacerbated in DIC systems because of the massive amount of data and number of I/O operations they handle. Theoretically, it is possible to decide whether or not I/O bottlenecks are likely to occur through the Amdahl IOPS ratio [29]. As with the Amdahl Memory Ratio, a value close to one means that I/O bottlenecks are unlikely and shows that the system can do more I/O than calculation. This value is very important in DIC systems; avoiding as many I/O bottlenecks as possible saves time and power, which are the criteria to bear in mind when purchasing or building the system. To compute the theoretical Amdahl IOPS ratio, it is necessary to calculate the IOPS value before applying the following formula, which follows from the I/O Law's one I/O operation per 50,000 instructions [1]:

    Amdahl IOPS ratio = IOPS / (CPU rate / 50,000)    (3)

To find the IOPS value for an HDD (with the times in milliseconds):

    HDD IOPS = 1000 / (average seek time + average latency)    (4)

According to the technical specification for the RealSSD C300 [25], the random read IOPS is 60,000. Moreover, each node has three Hitachi Deskstar 7K3000 HDDs [30], for which the stated average seek time is 0.5 ms and the average latency is 4.16 ms. From these numbers we get:

    HDD IOPS = 1000 / (0.5 + 4.16) ≈ 215
    SSD IOPS = 60,000
    Total node IOPS ≈ 60,000 + 3 × 215 ≈ 60,645

Finally, the Amdahl IOPS ratio for EDIM1 is:

    Amdahl IOPS ratio ≈ 60,645 / (6.4 × 10^9 / 50,000) = 60,645 / 128,000 ≈ 0.47

For Eddie, there is only one 250 GB HDD per node, with an average seek time of 2.0 ms and an average latency of 4.2 ms. Calculating the HDD IOPS as before gives:

    HDD IOPS = 1000 / (2.0 + 4.2) ≈ 161

Moreover, the ECDF researchers have found a way to store a large amount of data using a storage architecture that retrieves data from two-tier storage. Taking the IOPS of Eddie's file system quoted in [27] into account, the Amdahl IOPS ratio for a single node reading from all disks is:

    Amdahl IOPS ratio = 0.424

To conclude, the Amdahl IOPS ratios for both systems are not particularly close to one, but they are similar to the average results noted in [10]. However, since Eddie relies on a special storage system to deal with parallel I/O operations, this Amdahl IOPS ratio is not reliable and is negligible compared to the whole system's I/O performance.

Theoretical Amdahl number

The most important of Amdahl's numbers is the value that describes the balance between the I/O throughput and the CPU rate. The Throughput Balance Law (Section 2.2.1) states that, for a system to be considered balanced, it should perform exactly one bit of I/O per second per instruction per second. The theoretical Amdahl number is calculated as [1]:

    Amdahl number = Total throughput (bits/s) / CPU rate    (5)

According to the data sheet [25], the SSD throughput is 265 MB/s. Moreover, each node has three local Hitachi Deskstar 7K3000 HDDs [30], whose stated throughput is 162 MB/s each. From these figures, it is possible to calculate the maximum throughput as:

    Maximum throughput = 265 + 3 × 162 = 751 MB/s

Finally, the theoretical Amdahl number for EDIM1 is:

    Amdahl number = 0.98

For Eddie, on the other hand, the ECDF researchers use a special I/O system based on NAS and GPFS to retrieve data from two-tier storage (described in Section 2.5.1). According to [27], the throughput for Eddie is 3.72 GB/s, giving:

    Amdahl number = 0.416

EDIM1 is almost perfectly balanced, because its Amdahl number is so close to 1.0. In the case of Eddie, this ratio is less accurate, owing to the use of a storage system that depends on multiple components, such as the network and fibre channels, each of which may have bottlenecks. To compute this ratio in practice, [1] wrote a benchmark to determine the Amdahl number of the machine, and further synthetic benchmarks with different I/O patterns will be written during this project. To see whether the sustained bandwidths of the CPU, memory and I/O are of the same order of magnitude, simple benchmarks will also be written during the scope of this project. To make the theoretical numbers clearer, Table 2-2 gathers these values and compares EDIM1's and Eddie's theoretical numbers.

Table 2-2: Machine theoretical values

                                  EDIM1              Eddie
    FLOPS rate                    6.4 GFLOP/s        76.8 GFLOP/s
    IOPS / node                   ~60,645 IOPS       storage system [27]
    Disks peak throughput / node  0.751 GB/s         3.72 GB/s
    Amdahl Memory ratio           1.28               1.28
    Amdahl IOPS ratio             ~0.47              0.424
    Amdahl number                 0.98               0.416

2.5.3 Mythical system

To make the Amdahl number easier to understand, a mythical system is assumed here to show the best balance that a system could have. Because the system is mythical, a set of assumptions is made: the CPU sustains its theoretical flops rate, which in the case of the Atom 330 is 6.4 GFLOP/s, and the budget and space are unlimited. To obtain the full benefit from the CPU, the I/O should manage to deliver one 8-byte operand for every floating point operation, which means the desired throughput is:

    6.4 GFLOP/s × 8 bytes = 51.2 GB/s

Thus, to achieve this throughput, the system needs roughly 321 HDDs or 191 SSDs per node. Alternatively, keeping the 3 HDD : 1 SSD ratio used in EDIM1, the system would need 210 HDDs and 70 SSDs per node. Should the system have this mythical I/O subsystem, the throughput would be 51.3 GB/s, which is what the system requires. Finally, the Amdahl number for this mythical system is:

    Amdahl number = 68.85

This is an illustration of how data intensive a computation could be, in theory. However, it is very unlikely that an application would benefit from this ratio; the reason is that Amdahl's idea of what makes a balanced machine in practice is based on his observations of real-life applications, not on theoretical numbers.

Chapter 3 Benchmarks

3.1 Floating point operations per second

Measuring a computer's Floating Point Operations per Second (FLOP/s) to characterise its overall performance is widely used nowadays. For instance, the TOP500 list is compiled using the flops rate from the LINPACK benchmark, and the list also contains the theoretical peak for each system [31]. Although a computer has other components, such as the I/O, memory, cache and communications, the reason for choosing the flops rate as the main measurement criterion is that floating-point operations are computed intensively and used a great deal in scientific computing [32].

The author noted that, because the data intensive field is relatively new, certain terms are used inconsistently amongst authors. For instance, the term FLOPS could refer to the flops rate, the name of the FLOPS benchmark, the unit of floating point operations per second or the abbreviation for floating point operations. Obviously, maintaining consistency in the terminology is important for avoiding confusion.

3.1.1 FLOPS

Before running the LINPACK stress test, a simple single-threaded C program was identified [33]. In this benchmark there are eight computational kernels, each of which carries out several floating-point operations (Table 3-1).

Table 3-1: The eight kernels in the FLOPS benchmark [1], listing the number of FADD, FSUB, FMUL and FDIV operations and the total for each kernel

Thus, the output from each kernel is a flops rate. The average over all the computational kernels is then calculated, giving the performance of one core of one CPU in the machine. Finally, depending on the machine, the following formula is used to calculate the overall performance [1]:

    Flops rate = number of cores per CPU × number of CPUs in a node × average flops rate from the FLOPS benchmark    (6)

3.1.2 LINPACK

In 1979, Jack Dongarra introduced the LINPACK benchmark, which performs a Gaussian elimination using LU decomposition with partial pivoting [34]. Because of the nature of the algorithm, the benchmark can be considered a stress test for the CPU, the on-chip caches and the main memory [35]. This provides a good estimate of how fast the computer can solve real problems. On most powerful CPUs, the LINPACK flops rate is close to the theoretical flops rate [36] calculated in Section 2.5.2.

Unlike the FLOPS benchmark, the LINPACK benchmark uses all the CPUs and cores available in the system, so there is no need to multiply by the number of CPUs and cores to obtain the flops rate. The benchmark asks the user for the size of the array on which to solve the Gaussian elimination; Table 3-2 compares the size of the array with the amount of data used [37]. Moreover, the number of runs must be specified by the user. Finally, the output of this benchmark is the average number of FLOP/s over all runs.

Table 3-2: The sizes of the array compared to the data used in LINPACK [37] (two columns: size of the array and the corresponding data used in GBytes)

3.1.3 Simple CPU

To help find the system's main bottleneck, a set of simple benchmarks will be written. Together with the established benchmarks above, they provide a clear picture of each device's flops rate and of any possible bottlenecks. These simple benchmarks (Sections 3.1.3, 3.2.2 and 3.3.2) are similar to other widely-used benchmarks, but the reason for writing them is to keep each benchmark as simple as possible, so that the real number of FLOP/s observed is well defined.

Three simple benchmarks are written. The first is a simple CPU benchmark, which follows the same concept as the FLOPS benchmark: a variable of type double that fits in the CPU cache is defined, the time to carry out N floating-point operations on it is measured, and the number of FLOP/s is calculated as:

    CPU FLOP/s = number of operations / time

From the maximum FLOP/s value, it is possible to calculate the maximum data rate from memory that would derive the full benefit of the CPU rate. Depending on the computer architecture (a 64-bit processor is used in the example) and the maximum number of FLOP/s calculated above, this rate can be calculated as [1]:

    Data rate = (64 × maximum flops rate) / divisor    (7)

where the divisor converts bits to GBytes. This is the rate at which the bits of data would have to arrive at the processor to ensure that it could maintain its full flops rate.
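As an illustration of the idea, the following C program is a minimal sketch of such a simple CPU benchmark; it is not the code written for this project. It times N floating-point additions on a volatile double, subtracts the time of an empty loop, and reports the resulting FLOP/s. The operation count N_OPS is an arbitrary placeholder.

/* simple_cpu_sketch.c -- a minimal sketch of a simple CPU benchmark of the
 * kind described above; not the project's actual code. The empty-loop time
 * is subtracted, as discussed later in Section 3.4.1.
 * Build with: gcc -O0 -o simple_cpu simple_cpu_sketch.c
 */
#include <stdio.h>
#include <sys/time.h>

#define N_OPS 100000000L   /* number of floating-point additions (assumed) */

static double now(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + tv.tv_usec * 1e-6;
}

int main(void)
{
    volatile double x = 0.0;   /* volatile keeps the adds from being optimised away */
    long i;

    /* Time an empty loop so its overhead can be subtracted */
    double t0 = now();
    for (i = 0; i < N_OPS; i++) { /* empty */ }
    double empty = now() - t0;

    /* Time N_OPS floating-point additions */
    t0 = now();
    for (i = 0; i < N_OPS; i++) {
        x += 1.0;
    }
    double busy = now() - t0;

    double flops = (double)N_OPS / (busy - empty);
    printf("x = %f (printed to defeat optimisation)\n", (double)x);
    printf("CPU rate: %.3f GFLOP/s\n", flops / 1e9);
    return 0;
}

Compiling without optimisation (for example, gcc -O0) keeps both loops intact, so that the empty-loop time can be meaningfully subtracted from the working-loop time.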

3.2 Memory bandwidth

The next main measurement is the memory bandwidth. The memory bandwidth determines how fast the memory can keep up with the flops of the CPU; a fast CPU with a low memory bandwidth therefore leads to a bottleneck. In the following benchmarks, cache effects are not taken into account, because the size of the problem is larger than the cache.

3.2.1 Stream

Another stress test examines the memory rather than the CPU, unlike the previous two benchmarks. The output of this benchmark is the actual data rate that the main memory provides, known as the memory bandwidth [38]. The Stream benchmark was written by John McCalpin and Joe Zagar and uses four operations [39]:

1. Copy:  a[i] = b[i]
2. Scale: a[i] = d * b[i]
3. Sum:   a[i] = b[i] + c[i]
4. Triad: a[i] = b[i] + d * c[i]

These four operations are applied to a number of vectors, which should be much larger than the cache; [40] suggests that the size of each vector should be at least the cache size × number of CPUs × 2. If there is more than one core per CPU, the benchmark should be run on all the cores using the OpenMP version of the benchmark. The benchmark runs a number of times, and the best result is taken. Because of its heavy reliance on memory, the Triad kernel is used to quantify the memory bandwidth [39].

To determine in practice whether or not the memory can deliver data at a speed that avoids a bottleneck, a balance ratio should be calculated; conceptually, it is similar to the Amdahl number. To calculate the balance, one must first calculate how many FLOP/s the memory can support, using the Triad bandwidth from the Stream benchmark:

    Memory FLOP/s = Triad bandwidth (bytes/s) / 8    (8)

The balance ratio is then calculated as described in Section 3.2.2, equation (9).
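The following C sketch illustrates the Triad kernel and the balance arithmetic of equation (8); it is not the Stream benchmark itself, and the array size N is a placeholder that simply needs to be much larger than the cache. It reports the Triad bandwidth and the number of FLOP/s that the memory could feed.

/* triad_sketch.c -- a stripped-down illustration of the Triad kernel and the
 * bandwidth/balance arithmetic described above; it is not the Stream code.
 * Build with: gcc -O2 -fopenmp -o triad triad_sketch.c
 */
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

#define N (1 << 25)   /* ~33M doubles per array; placeholder, must exceed the cache */

int main(void)
{
    double *a = malloc(N * sizeof(double));
    double *b = malloc(N * sizeof(double));
    double *c = malloc(N * sizeof(double));
    double d = 3.0;
    long i;

    if (!a || !b || !c) { fprintf(stderr, "allocation failed\n"); return 1; }
    for (i = 0; i < N; i++) { b[i] = 1.0; c[i] = 2.0; }

    double t0 = omp_get_wtime();
    #pragma omp parallel for
    for (i = 0; i < N; i++)
        a[i] = b[i] + d * c[i];          /* Triad: a[i] = b[i] + d*c[i] */
    double t = omp_get_wtime() - t0;

    /* Triad touches three arrays of doubles: two reads + one write = 24 bytes/element */
    double bytes = 3.0 * N * sizeof(double);
    double bandwidth = bytes / t;                 /* bytes per second */
    double mem_flops = bandwidth / 8.0;           /* FLOP/s the memory can feed, equation (8) */

    printf("Triad bandwidth : %.2f GB/s\n", bandwidth / 1e9);
    printf("Memory FLOP/s   : %.2f GFLOP/s\n", mem_flops / 1e9);
    free(a); free(b); free(c);
    return 0;
}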

3.2.2 Simple memory

The second simple benchmark is the memory benchmark. In this benchmark, a vector of doubles is defined with a size of four times the L2 cache [39], and a number of threads each add 1 to all the elements of the vector. The time is measured and the memory rate is computed as:

    Memory FLOP/s = (size of vector × number of threads) / time

The balance is then calculated, as mentioned earlier for the Amdahl Memory Law, by the following formula [41]:

    Balance = CPU FLOP/s / Memory FLOP/s    (9)

The best result is a balance of exactly 1.0; this shows that the CPU can retrieve data from memory without being idle.

3.3 File system

For data intensive applications, the I/O is crucial, because these applications deal with massive amounts of data retrieved from disks. Data intensive applications are therefore more likely than others to expose a computer's I/O bottlenecks. For this reason, the I/O tests are important for determining the computer's balance.

3.3.1 IOZone

The most popular and widely-used file system benchmark is IOZone. Using a number of operations, multiple file and record sizes and secondary options, the benchmark determines the file system's performance, the CPU usage and the throughput [42] [43]. To ensure reliability, the entire test should be applied to files larger than the node's main memory [43]. Moreover, only the sequential read is considered here, because reads reflect the main throughput, owing to their high bandwidth and wide usage [1] [32].

3.3.2 Simple I/O

The third simple benchmark is a simple I/O benchmark. It calculates the aggregate throughput by reading a number of files sequentially using a number of threads. The number of threads should be equal to the number of files, because each thread reads one file. The user sets the appropriate buffer size, and the program calculates the throughput as:

    Disk throughput (bytes/s) = bytes read / time
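A minimal sketch of this kind of simple I/O benchmark is shown below; it is not the project's code. One thread is created per file named on the command line, each thread reads its file sequentially into a private buffer, and the aggregate throughput is the total number of bytes read divided by the elapsed time. The 4 MB buffer size is an assumption; in the real benchmark the user chooses it.

/* simple_io_sketch.c -- illustrative sketch of a threaded sequential-read
 * throughput test; not the project's benchmark.
 * Build with: gcc -O2 -pthread -o simple_io simple_io_sketch.c
 * Run as:     ./simple_io file1 file2 ...
 */
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <sys/time.h>

#define BUF_SIZE (4 * 1024 * 1024)   /* 4 MB read buffer (assumed) */

struct job { const char *path; long long bytes; };

static double now(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + tv.tv_usec * 1e-6;
}

/* Each thread reads one file sequentially and records how many bytes it read */
static void *reader(void *arg)
{
    struct job *j = arg;
    char *buf = malloc(BUF_SIZE);
    FILE *f = fopen(j->path, "rb");
    size_t n;

    j->bytes = 0;
    if (!f || !buf) { free(buf); if (f) fclose(f); return NULL; }
    while ((n = fread(buf, 1, BUF_SIZE, f)) > 0)
        j->bytes += (long long)n;
    fclose(f);
    free(buf);
    return NULL;
}

int main(int argc, char **argv)
{
    int nfiles = argc - 1, i;
    long long total = 0;

    if (nfiles < 1) { fprintf(stderr, "usage: %s file...\n", argv[0]); return 1; }

    pthread_t *tid = malloc(nfiles * sizeof(pthread_t));
    struct job *jobs = malloc(nfiles * sizeof(struct job));

    double t0 = now();
    for (i = 0; i < nfiles; i++) {
        jobs[i].path = argv[i + 1];
        pthread_create(&tid[i], NULL, reader, &jobs[i]);
    }
    for (i = 0; i < nfiles; i++) {
        pthread_join(tid[i], NULL);
        total += jobs[i].bytes;
    }
    double t = now() - t0;

    printf("Aggregate throughput: %.2f MB/s\n", total / t / 1e6);
    return 0;
}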

3.4 Balanced systems

Finally, the concept of a balanced system is an important one in the data intensive computing field. It describes the situation where there is a balance between the computations and the I/O operations, that is, few I/O bottlenecks. There are a number of laws that define balance, one of which is the Throughput Balance Law (Section 2.2.1). The I/O machine balance ratio, better known as the Amdahl number, expresses the I/O balance as the system requiring one bit of I/O per second per instruction per second (Section 2.5.2). In this report, this theoretical value will be measured using an existing benchmark and a set of modified benchmarks that represent both real and synthetic I/O patterns.

3.4.1 Amdahl synthetics benchmarks

Existing benchmark

Written by Kulkarni [1] as an EPCC dissertation in 2011, the existing benchmark mainly calculates the practical value of the Amdahl number using equation (5) in Section 2.5.2. Figure 3-1 shows the basic structure of the benchmark. The main idea is that it declares a set of buffers, one per I/O thread, and two kinds of threads: I/O threads, which read from a file into a buffer, and compute threads, which are responsible for carrying out certain computations on the data read in. The number of I/O threads is the same as the number of files; the number of compute threads is specified by the user.

Figure 3-1: Illustration of Amdahl synthetics benchmarks [1]

A buffer can be in one of three states: free, loading and ready. At the start of the program all buffers are free. When an I/O thread starts reading data from a file into a buffer, the buffer's status changes to loading; when the buffer is filled, its status changes to ready for computation. The compute threads wait for buffers to become ready, carry out the computations chosen by the user, and turn the buffers back to free when finished. In addition, the benchmark decides whether the system is compute-bound or I/O-bound by measuring the idle time of the compute and I/O threads: when the compute threads spend a long time waiting for buffers to become ready, the system is I/O-bound. More information about the algorithm used and details of how to use the benchmark can be found in [1].
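The following C sketch illustrates the free/loading/ready buffer cycle with one I/O thread, one compute thread and a single shared buffer. It is a simplified illustration of the idea only, not the benchmark described in [1], which uses several buffers, files and threads; the "read" here is simulated by filling the buffer with dummy data.

/* buffer_states_sketch.c -- compact illustration of the free/loading/ready
 * buffer cycle; a simplified sketch, not the benchmark in [1].
 * Build with: gcc -O2 -pthread -o buffer_states buffer_states_sketch.c
 */
#include <stdio.h>
#include <pthread.h>

#define BUF_ELEMS (1 << 20)
#define N_BLOCKS  8                 /* number of blocks to "read" and process */

enum state { FREE, LOADING, READY };

static double buffer[BUF_ELEMS];
static enum state buf_state = FREE;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;

/* I/O thread: waits for a FREE buffer, fills it and marks it READY */
static void *io_thread(void *arg)
{
    int block, i;
    (void)arg;
    for (block = 0; block < N_BLOCKS; block++) {
        pthread_mutex_lock(&lock);
        while (buf_state != FREE)
            pthread_cond_wait(&cond, &lock);
        buf_state = LOADING;
        pthread_mutex_unlock(&lock);

        for (i = 0; i < BUF_ELEMS; i++)      /* stands in for reading from a file */
            buffer[i] = (double)i;

        pthread_mutex_lock(&lock);
        buf_state = READY;
        pthread_cond_signal(&cond);
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

/* Compute thread: waits for a READY buffer, works on it and marks it FREE again */
static void *compute_thread(void *arg)
{
    double sum = 0.0;
    int block, i;
    (void)arg;
    for (block = 0; block < N_BLOCKS; block++) {
        pthread_mutex_lock(&lock);
        while (buf_state != READY)
            pthread_cond_wait(&cond, &lock);
        pthread_mutex_unlock(&lock);

        for (i = 0; i < BUF_ELEMS; i++)      /* the user-chosen computation */
            sum += buffer[i];

        pthread_mutex_lock(&lock);
        buf_state = FREE;
        pthread_cond_signal(&cond);
        pthread_mutex_unlock(&lock);
    }
    printf("checksum: %f\n", sum);
    return NULL;
}

int main(void)
{
    pthread_t io, comp;
    pthread_create(&io, NULL, io_thread, NULL);
    pthread_create(&comp, NULL, compute_thread, NULL);
    pthread_join(io, NULL);
    pthread_join(comp, NULL);
    return 0;
}

Measuring how long each thread spends blocked in the wait loops is, in spirit, how the benchmark decides whether a run is compute-bound or I/O-bound.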

Modified benchmarks

The next set of benchmarks is a modification of the original Amdahl benchmark, in which the I/O patterns are changed. This is used to determine the best number of threads, the best technique and the best buffer sizes to use in a real application in order to reach a good level of balance between the I/O and the compute.

Synthetic benchmark

The synthetic benchmark determines the composition of threads that gives the best Amdahl number (closest to 1.0). It is based on theoretical calculations from the output of the simple I/O benchmark (Section 3.3.2) and the simple CPU benchmark (Section 3.1.3), using different numbers of threads; the main calculation is that of the Amdahl number (Section 2.5.2). Figure 3-2 shows the theoretical flops rates, throughputs and Amdahl numbers for different compositions of threads. Four threads are used because, although any number of threads could be run on the cores, hyper-threading makes each of the node's two cores appear to the operating system as two virtual cores, giving four hardware threads per node.

In Figure 3-2, scenario (a) is useless because no data can reach the application, and (b) is also useless because there are no computations. Scenario (c) wastes I/O because a typical application is unlikely to be able to sustain a throughput of 0.41 GB/s; moreover, this scenario limits the number of FLOP/s that can be carried out. From the fourth possibility, (d), onwards the scenarios look quite balanced: in (d), one could manage just over one flop on each byte read in, if the application can sustain a throughput of 0.30 GB/s. Finally, scenarios (e) and (f) are the most balanced, using one thread for the I/O operations and three compute threads. Using the HDD in (f) gives the lowest throughput of all the scenarios and, as a consequence, the best balance.

To conclude, the closer the Amdahl number is to 1.0, the greater the chance that all components will be put to good use by a typical code. On the other hand, a highly data intensive code benefits from a machine with an Amdahl number greater than one.

Figure 3-2: Different thread compositions, throughputs and Amdahl numbers:
    a. Four compute threads: 1.04 GFLOP/s, 0 GB/s, Amdahl number 0
    b. Four I/O threads: 0 GFLOP/s, Amdahl number infinite
    c. One compute thread and 3 I/O threads: throughput 0.41 GB/s
    d. Two compute threads and 2 I/O threads: 0.39 GFLOP/s, 0.30 GB/s, Amdahl number 6.66
    e. Three compute threads and 1 I/O thread (SSD): 0.95 GFLOP/s, 0.25 GB/s, Amdahl number 2.24
    f. Three compute threads and 1 I/O thread (HDD): throughput 0.15 GB/s, Amdahl number 1.34

A synthetic benchmark is written to take these results from theory to a practical application. The benchmark defines two types of threads: I/O and compute. The number of I/O threads is the same as the number of files, while the number of compute threads is the total number of threads requested minus the number of I/O threads. For each I/O thread there is a corresponding buffer, whose size is entered by the user, which the thread fills repeatedly until it has read the entire corresponding file. The compute threads carry out an operation (floating point addition, for simplicity) the number of times specified by the user.

Real world I/O patterns benchmark

The real world I/O pattern benchmark calculates the balance, using the Amdahl number, and the performance of the system, using I/O patterns similar to those found in real-world applications. The I/O is the medium that connects the application with the real world. In traditional applications, I/O operations were used only at the beginning of a program, to read the data from a file, and at the end, to write the results to a file. However, the reliability of such programs decreases with time because of the increasing likelihood of processor faults. To increase their reliability, it is necessary to write results at some points before the end of the program so that these faults can be tolerated; these intermediate writes are widely known as checkpoints. The main role of checkpoints is to allow the application to be re-started at a given point, for example if an interesting event happens in the simulation. Moreover, the application can be re-started with different parameters for many purposes, such as simulating from that point on in more detail or with more instrumentation; data dumps for visualisation; partial results to allow computational steering; or an output of the system state at a fixed point in simulation time so that dynamic properties can be measured later.

Here, different techniques and their effects are measured to find the most balanced and highest-performing I/O patterns to recommend for use on the machine. These techniques are checkpoints and barriers. Figure 3-3 shows the different I/O patterns considered within the scope of this benchmark. In Figure 3-3, pattern (a) uses both barriers and checkpoints, so it writes twice to the files, with all threads writing at the same time; (b) uses barriers but no checkpoints, so it writes only once to the files. The third possibility, (c), uses checkpoints but no barriers, so it writes twice to the files, at different times for different threads, and the threads therefore finish at different times. Finally, (d) uses neither barriers nor checkpoints, which is the simplest of the scenarios.

Figure 3-3: Different real program patterns:
    a. With checkpoint, with barriers
    b. Without checkpoint, with barriers
    c. With checkpoint, without barriers
    d. Without checkpoint, without barriers
(each pattern reads its input files and writes its output files as described below)

All the patterns begin by reading from the number of files entered by the user; the number of files should be the same as the number of threads. Barriers (patterns a and b) and a checkpoint (patterns a and c) are taken into account as options in this benchmark to differentiate between the patterns. These two options are requested by the user by running the program with --f = 1 to add barriers and --c = 1 to add checkpoints (with 0 disabling the option). For example, to obtain a run that uses a checkpoint but no barriers (scenario c), the user should choose --f = 0 and --c = 1.

The benchmark begins by reading a buffer's worth of data from a file and carrying out a set of operations on the buffer. Next, depending on whether or not the user has requested barriers or checkpoints between the computations, the benchmark takes them into account. Finally, the output is written to separate files: each thread writes its own results to a file named output_[thread_id], where thread_id is the number of the thread, starting from zero. A set of results is then calculated, one of them being the Amdahl number. Figure 3-4 illustrates the algorithm of the real-world I/O pattern benchmark.

In all of the benchmarks developed during the scope of this project, precision in measuring time was an important consideration. To achieve acceptable precision, the time for an empty loop (compiled without optimisations) was subtracted from the time measured for a loop carrying out a number of floating point operations. Moreover, cache effects were turned off by running the code with special flags, and the volatile keyword was applied to the variable being measured to prevent the compiler from optimising it away. The simple benchmarks are run with a number of threads, depending on the machine, and a number of times determined by the user; again, the best time is taken.

Figure 3-4: Algorithm for the real program I/O pattern benchmark
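To make the read/compute/checkpoint/barrier/write cycle of Figure 3-4 concrete, the C sketch below shows one possible skeleton; it is a sketch under assumptions, not the project's benchmark. The --f and --c options are replaced by two compile-time constants, the per-thread input files are assumed to be named input_0, input_1, and so on, and the "computations" are trivial placeholders.

/* io_pattern_sketch.c -- a skeleton of the read/compute/checkpoint/barrier/write
 * cycle shown in Figure 3-4; an illustrative sketch only.
 * Build with: gcc -O2 -pthread -o io_pattern io_pattern_sketch.c
 */
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>

#define N_THREADS    4
#define BUF_SIZE     (1 << 20)
#define USE_BARRIER  1      /* stands in for --f = 1 */
#define USE_CHECKPT  1      /* stands in for --c = 1 */

static pthread_barrier_t barrier;

struct task { int id; };

static void write_file(const char *prefix, int id, const char *buf, size_t n)
{
    char name[64];
    snprintf(name, sizeof(name), "%s_%d", prefix, id);
    FILE *f = fopen(name, "wb");
    if (f) { fwrite(buf, 1, n, f); fclose(f); }
}

static void *worker(void *arg)
{
    struct task *t = arg;
    char name[64], *buf = malloc(BUF_SIZE);
    size_t n = 0, i;

    /* Read: one input file per thread (assumed names input_0, input_1, ...) */
    snprintf(name, sizeof(name), "input_%d", t->id);
    FILE *f = fopen(name, "rb");
    if (f) { n = fread(buf, 1, BUF_SIZE, f); fclose(f); }

    /* First compute phase: a placeholder operation on the buffer */
    for (i = 0; i < n; i++)
        buf[i] = (char)(buf[i] + 1);

    if (USE_BARRIER)                       /* all threads reach this point together */
        pthread_barrier_wait(&barrier);

    if (USE_CHECKPT)                       /* intermediate dump for restart */
        write_file("checkpoint", t->id, buf, n);

    /* Second compute phase, then the final result */
    for (i = 0; i < n; i++)
        buf[i] = (char)(buf[i] ^ 0x5a);

    if (USE_BARRIER)
        pthread_barrier_wait(&barrier);

    write_file("output", t->id, buf, n);   /* each thread writes output_<id> */
    free(buf);
    return NULL;
}

int main(void)
{
    pthread_t tid[N_THREADS];
    struct task tasks[N_THREADS];
    int i;

    pthread_barrier_init(&barrier, NULL, N_THREADS);
    for (i = 0; i < N_THREADS; i++) {
        tasks[i].id = i;
        pthread_create(&tid[i], NULL, worker, &tasks[i]);
    }
    for (i = 0; i < N_THREADS; i++)
        pthread_join(tid[i], NULL);
    pthread_barrier_destroy(&barrier);
    return 0;
}

With USE_BARRIER and USE_CHECKPT both set to 1 this corresponds to pattern (a); setting both to 0 gives the simplest pattern, (d).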


More information

How To Speed Up A Flash Flash Storage System With The Hyperq Memory Router

How To Speed Up A Flash Flash Storage System With The Hyperq Memory Router HyperQ Hybrid Flash Storage Made Easy White Paper Parsec Labs, LLC. 7101 Northland Circle North, Suite 105 Brooklyn Park, MN 55428 USA 1-763-219-8811 www.parseclabs.com info@parseclabs.com sales@parseclabs.com

More information

DIABLO TECHNOLOGIES MEMORY CHANNEL STORAGE AND VMWARE VIRTUAL SAN : VDI ACCELERATION

DIABLO TECHNOLOGIES MEMORY CHANNEL STORAGE AND VMWARE VIRTUAL SAN : VDI ACCELERATION DIABLO TECHNOLOGIES MEMORY CHANNEL STORAGE AND VMWARE VIRTUAL SAN : VDI ACCELERATION A DIABLO WHITE PAPER AUGUST 2014 Ricky Trigalo Director of Business Development Virtualization, Diablo Technologies

More information

Hadoop on the Gordon Data Intensive Cluster

Hadoop on the Gordon Data Intensive Cluster Hadoop on the Gordon Data Intensive Cluster Amit Majumdar, Scientific Computing Applications Mahidhar Tatineni, HPC User Services San Diego Supercomputer Center University of California San Diego Dec 18,

More information

Performance Beyond PCI Express: Moving Storage to The Memory Bus A Technical Whitepaper

Performance Beyond PCI Express: Moving Storage to The Memory Bus A Technical Whitepaper : Moving Storage to The Memory Bus A Technical Whitepaper By Stephen Foskett April 2014 2 Introduction In the quest to eliminate bottlenecks and improve system performance, the state of the art has continually

More information

IPRO ecapture Performance Report using BlueArc Titan Network Storage System

IPRO ecapture Performance Report using BlueArc Titan Network Storage System IPRO ecapture Performance Report using BlueArc Titan Network Storage System Allen Yen, BlueArc Corp Jesse Abrahams, IPRO Tech, Inc Introduction IPRO ecapture is an e-discovery application designed to handle

More information

IT of SPIM Data Storage and Compression. EMBO Course - August 27th! Jeff Oegema, Peter Steinbach, Oscar Gonzalez

IT of SPIM Data Storage and Compression. EMBO Course - August 27th! Jeff Oegema, Peter Steinbach, Oscar Gonzalez IT of SPIM Data Storage and Compression EMBO Course - August 27th Jeff Oegema, Peter Steinbach, Oscar Gonzalez 1 Talk Outline Introduction and the IT Team SPIM Data Flow Capture, Compression, and the Data

More information

Analysis of VDI Storage Performance During Bootstorm

Analysis of VDI Storage Performance During Bootstorm Analysis of VDI Storage Performance During Bootstorm Introduction Virtual desktops are gaining popularity as a more cost effective and more easily serviceable solution. The most resource-dependent process

More information

ioscale: The Holy Grail for Hyperscale

ioscale: The Holy Grail for Hyperscale ioscale: The Holy Grail for Hyperscale The New World of Hyperscale Hyperscale describes new cloud computing deployments where hundreds or thousands of distributed servers support millions of remote, often

More information

1 Storage Devices Summary

1 Storage Devices Summary Chapter 1 Storage Devices Summary Dependability is vital Suitable measures Latency how long to the first bit arrives Bandwidth/throughput how fast does stuff come through after the latency period Obvious

More information

Flash Memory Arrays Enabling the Virtualized Data Center. July 2010

Flash Memory Arrays Enabling the Virtualized Data Center. July 2010 Flash Memory Arrays Enabling the Virtualized Data Center July 2010 2 Flash Memory Arrays Enabling the Virtualized Data Center This White Paper describes a new product category, the flash Memory Array,

More information

Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software

Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software WHITEPAPER Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software SanDisk ZetaScale software unlocks the full benefits of flash for In-Memory Compute and NoSQL applications

More information

HP Z Turbo Drive PCIe SSD

HP Z Turbo Drive PCIe SSD Performance Evaluation of HP Z Turbo Drive PCIe SSD Powered by Samsung XP941 technology Evaluation Conducted Independently by: Hamid Taghavi Senior Technical Consultant June 2014 Sponsored by: P a g e

More information

Sun 8Gb/s Fibre Channel HBA Performance Advantages for Oracle Database

Sun 8Gb/s Fibre Channel HBA Performance Advantages for Oracle Database Performance Advantages for Oracle Database At a Glance This Technical Brief illustrates that even for smaller online transaction processing (OLTP) databases, the Sun 8Gb/s Fibre Channel Host Bus Adapter

More information

HETEROGENEOUS HPC, ARCHITECTURE OPTIMIZATION, AND NVLINK

HETEROGENEOUS HPC, ARCHITECTURE OPTIMIZATION, AND NVLINK HETEROGENEOUS HPC, ARCHITECTURE OPTIMIZATION, AND NVLINK Steve Oberlin CTO, Accelerated Computing US to Build Two Flagship Supercomputers SUMMIT SIERRA Partnership for Science 100-300 PFLOPS Peak Performance

More information

IT Platforms for Utilization of Big Data

IT Platforms for Utilization of Big Data Hitachi Review Vol. 63 (2014), No. 1 46 IT Platforms for Utilization of Big Yasutomo Yamamoto OVERVIEW: The growing momentum behind the utilization of big in social and corporate activity has created a

More information

GPU File System Encryption Kartik Kulkarni and Eugene Linkov

GPU File System Encryption Kartik Kulkarni and Eugene Linkov GPU File System Encryption Kartik Kulkarni and Eugene Linkov 5/10/2012 SUMMARY. We implemented a file system that encrypts and decrypts files. The implementation uses the AES algorithm computed through

More information

Parallel Computing. Benson Muite. benson.muite@ut.ee http://math.ut.ee/ benson. https://courses.cs.ut.ee/2014/paralleel/fall/main/homepage

Parallel Computing. Benson Muite. benson.muite@ut.ee http://math.ut.ee/ benson. https://courses.cs.ut.ee/2014/paralleel/fall/main/homepage Parallel Computing Benson Muite benson.muite@ut.ee http://math.ut.ee/ benson https://courses.cs.ut.ee/2014/paralleel/fall/main/homepage 3 November 2014 Hadoop, Review Hadoop Hadoop History Hadoop Framework

More information

Amadeus SAS Specialists Prove Fusion iomemory a Superior Analysis Accelerator

Amadeus SAS Specialists Prove Fusion iomemory a Superior Analysis Accelerator WHITE PAPER Amadeus SAS Specialists Prove Fusion iomemory a Superior Analysis Accelerator 951 SanDisk Drive, Milpitas, CA 95035 www.sandisk.com SAS 9 Preferred Implementation Partner tests a single Fusion

More information

Cluster Computing at HRI

Cluster Computing at HRI Cluster Computing at HRI J.S.Bagla Harish-Chandra Research Institute, Chhatnag Road, Jhunsi, Allahabad 211019. E-mail: jasjeet@mri.ernet.in 1 Introduction and some local history High performance computing

More information

Benchmark Hadoop and Mars: MapReduce on cluster versus on GPU

Benchmark Hadoop and Mars: MapReduce on cluster versus on GPU Benchmark Hadoop and Mars: MapReduce on cluster versus on GPU Heshan Li, Shaopeng Wang The Johns Hopkins University 3400 N. Charles Street Baltimore, Maryland 21218 {heshanli, shaopeng}@cs.jhu.edu 1 Overview

More information

Communicating with devices

Communicating with devices Introduction to I/O Where does the data for our CPU and memory come from or go to? Computers communicate with the outside world via I/O devices. Input devices supply computers with data to operate on.

More information

HP Smart Array Controllers and basic RAID performance factors

HP Smart Array Controllers and basic RAID performance factors Technical white paper HP Smart Array Controllers and basic RAID performance factors Technology brief Table of contents Abstract 2 Benefits of drive arrays 2 Factors that affect performance 2 HP Smart Array

More information

Binary search tree with SIMD bandwidth optimization using SSE

Binary search tree with SIMD bandwidth optimization using SSE Binary search tree with SIMD bandwidth optimization using SSE Bowen Zhang, Xinwei Li 1.ABSTRACT In-memory tree structured index search is a fundamental database operation. Modern processors provide tremendous

More information

CS 147: Computer Systems Performance Analysis

CS 147: Computer Systems Performance Analysis CS 147: Computer Systems Performance Analysis CS 147: Computer Systems Performance Analysis 1 / 39 Overview Overview Overview What is a Workload? Instruction Workloads Synthetic Workloads Exercisers and

More information

Advanced Knowledge and Understanding of Industrial Data Storage

Advanced Knowledge and Understanding of Industrial Data Storage Dec. 3 rd 2013 Advanced Knowledge and Understanding of Industrial Data Storage By Jesse Chuang, Senior Software Manager, Advantech With the popularity of computers and networks, most enterprises and organizations

More information

Microsoft Exchange Server 2003 Deployment Considerations

Microsoft Exchange Server 2003 Deployment Considerations Microsoft Exchange Server 3 Deployment Considerations for Small and Medium Businesses A Dell PowerEdge server can provide an effective platform for Microsoft Exchange Server 3. A team of Dell engineers

More information

IOmark- VDI. Nimbus Data Gemini Test Report: VDI- 130906- a Test Report Date: 6, September 2013. www.iomark.org

IOmark- VDI. Nimbus Data Gemini Test Report: VDI- 130906- a Test Report Date: 6, September 2013. www.iomark.org IOmark- VDI Nimbus Data Gemini Test Report: VDI- 130906- a Test Copyright 2010-2013 Evaluator Group, Inc. All rights reserved. IOmark- VDI, IOmark- VDI, VDI- IOmark, and IOmark are trademarks of Evaluator

More information

A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM

A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM Sneha D.Borkar 1, Prof.Chaitali S.Surtakar 2 Student of B.E., Information Technology, J.D.I.E.T, sborkar95@gmail.com Assistant Professor, Information

More information

Overview of HPC Resources at Vanderbilt

Overview of HPC Resources at Vanderbilt Overview of HPC Resources at Vanderbilt Will French Senior Application Developer and Research Computing Liaison Advanced Computing Center for Research and Education June 10, 2015 2 Computing Resources

More information

Improve Business Productivity and User Experience with a SanDisk Powered SQL Server 2014 In-Memory OLTP Database

Improve Business Productivity and User Experience with a SanDisk Powered SQL Server 2014 In-Memory OLTP Database WHITE PAPER Improve Business Productivity and User Experience with a SanDisk Powered SQL Server 2014 In-Memory OLTP Database 951 SanDisk Drive, Milpitas, CA 95035 www.sandisk.com Table of Contents Executive

More information

Outline. High Performance Computing (HPC) Big Data meets HPC. Case Studies: Some facts about Big Data Technologies HPC and Big Data converging

Outline. High Performance Computing (HPC) Big Data meets HPC. Case Studies: Some facts about Big Data Technologies HPC and Big Data converging Outline High Performance Computing (HPC) Towards exascale computing: a brief history Challenges in the exascale era Big Data meets HPC Some facts about Big Data Technologies HPC and Big Data converging

More information

APPENDIX 1 USER LEVEL IMPLEMENTATION OF PPATPAN IN LINUX SYSTEM

APPENDIX 1 USER LEVEL IMPLEMENTATION OF PPATPAN IN LINUX SYSTEM 152 APPENDIX 1 USER LEVEL IMPLEMENTATION OF PPATPAN IN LINUX SYSTEM A1.1 INTRODUCTION PPATPAN is implemented in a test bed with five Linux system arranged in a multihop topology. The system is implemented

More information

Big Data Challenges in Bioinformatics

Big Data Challenges in Bioinformatics Big Data Challenges in Bioinformatics BARCELONA SUPERCOMPUTING CENTER COMPUTER SCIENCE DEPARTMENT Autonomic Systems and ebusiness Pla?orms Jordi Torres Jordi.Torres@bsc.es Talk outline! We talk about Petabyte?

More information

The Green Index: A Metric for Evaluating System-Wide Energy Efficiency in HPC Systems

The Green Index: A Metric for Evaluating System-Wide Energy Efficiency in HPC Systems 202 IEEE 202 26th IEEE International 26th International Parallel Parallel and Distributed and Distributed Processing Processing Symposium Symposium Workshops Workshops & PhD Forum The Green Index: A Metric

More information

Qsan Document - White Paper. Performance Monitor Case Studies

Qsan Document - White Paper. Performance Monitor Case Studies Qsan Document - White Paper Performance Monitor Case Studies Version 1.0 November 2014 Copyright Copyright@2004~2014, Qsan Technology, Inc. All rights reserved. No part of this document may be reproduced

More information

Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging

Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging In some markets and scenarios where competitive advantage is all about speed, speed is measured in micro- and even nano-seconds.

More information

Multi-Threading Performance on Commodity Multi-Core Processors

Multi-Threading Performance on Commodity Multi-Core Processors Multi-Threading Performance on Commodity Multi-Core Processors Jie Chen and William Watson III Scientific Computing Group Jefferson Lab 12000 Jefferson Ave. Newport News, VA 23606 Organization Introduction

More information

Energy aware RAID Configuration for Large Storage Systems

Energy aware RAID Configuration for Large Storage Systems Energy aware RAID Configuration for Large Storage Systems Norifumi Nishikawa norifumi@tkl.iis.u-tokyo.ac.jp Miyuki Nakano miyuki@tkl.iis.u-tokyo.ac.jp Masaru Kitsuregawa kitsure@tkl.iis.u-tokyo.ac.jp Abstract

More information

SAS Business Analytics. Base SAS for SAS 9.2

SAS Business Analytics. Base SAS for SAS 9.2 Performance & Scalability of SAS Business Analytics on an NEC Express5800/A1080a (Intel Xeon 7500 series-based Platform) using Red Hat Enterprise Linux 5 SAS Business Analytics Base SAS for SAS 9.2 Red

More information

Can High-Performance Interconnects Benefit Memcached and Hadoop?

Can High-Performance Interconnects Benefit Memcached and Hadoop? Can High-Performance Interconnects Benefit Memcached and Hadoop? D. K. Panda and Sayantan Sur Network-Based Computing Laboratory Department of Computer Science and Engineering The Ohio State University,

More information

Storage Architecture in XProtect Corporate

Storage Architecture in XProtect Corporate Milestone Systems White Paper Storage Architecture in XProtect Corporate Prepared by: John Rasmussen Product Manager XProtect Corporate Milestone Systems Date: 7 February, 2011 Page 1 of 20 Table of Contents

More information

BENCHMARKING CLOUD DATABASES CASE STUDY on HBASE, HADOOP and CASSANDRA USING YCSB

BENCHMARKING CLOUD DATABASES CASE STUDY on HBASE, HADOOP and CASSANDRA USING YCSB BENCHMARKING CLOUD DATABASES CASE STUDY on HBASE, HADOOP and CASSANDRA USING YCSB Planet Size Data!? Gartner s 10 key IT trends for 2012 unstructured data will grow some 80% over the course of the next

More information

Evaluating HDFS I/O Performance on Virtualized Systems

Evaluating HDFS I/O Performance on Virtualized Systems Evaluating HDFS I/O Performance on Virtualized Systems Xin Tang xtang@cs.wisc.edu University of Wisconsin-Madison Department of Computer Sciences Abstract Hadoop as a Service (HaaS) has received increasing

More information

Oracle Database Scalability in VMware ESX VMware ESX 3.5

Oracle Database Scalability in VMware ESX VMware ESX 3.5 Performance Study Oracle Database Scalability in VMware ESX VMware ESX 3.5 Database applications running on individual physical servers represent a large consolidation opportunity. However enterprises

More information

PARALLELS CLOUD STORAGE

PARALLELS CLOUD STORAGE PARALLELS CLOUD STORAGE Performance Benchmark Results 1 Table of Contents Executive Summary... Error! Bookmark not defined. Architecture Overview... 3 Key Features... 5 No Special Hardware Requirements...

More information

Cluster Scalability of ANSYS FLUENT 12 for a Large Aerodynamics Case on the Darwin Supercomputer

Cluster Scalability of ANSYS FLUENT 12 for a Large Aerodynamics Case on the Darwin Supercomputer Cluster Scalability of ANSYS FLUENT 12 for a Large Aerodynamics Case on the Darwin Supercomputer Stan Posey, MSc and Bill Loewe, PhD Panasas Inc., Fremont, CA, USA Paul Calleja, PhD University of Cambridge,

More information

The Evolution of Microsoft SQL Server: The right time for Violin flash Memory Arrays

The Evolution of Microsoft SQL Server: The right time for Violin flash Memory Arrays The Evolution of Microsoft SQL Server: The right time for Violin flash Memory Arrays Executive Summary Microsoft SQL has evolved beyond serving simple workgroups to a platform delivering sophisticated

More information

www.thinkparq.com www.beegfs.com

www.thinkparq.com www.beegfs.com www.thinkparq.com www.beegfs.com KEY ASPECTS Maximum Flexibility Maximum Scalability BeeGFS supports a wide range of Linux distributions such as RHEL/Fedora, SLES/OpenSuse or Debian/Ubuntu as well as a

More information

POSIX and Object Distributed Storage Systems

POSIX and Object Distributed Storage Systems 1 POSIX and Object Distributed Storage Systems Performance Comparison Studies With Real-Life Scenarios in an Experimental Data Taking Context Leveraging OpenStack Swift & Ceph by Michael Poat, Dr. Jerome

More information

Rackspace Cloud Databases and Container-based Virtualization

Rackspace Cloud Databases and Container-based Virtualization Rackspace Cloud Databases and Container-based Virtualization August 2012 J.R. Arredondo @jrarredondo Page 1 of 6 INTRODUCTION When Rackspace set out to build the Cloud Databases product, we asked many

More information

2009 Oracle Corporation 1

2009 Oracle Corporation 1 The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material,

More information

Architecting High-Speed Data Streaming Systems. Sujit Basu

Architecting High-Speed Data Streaming Systems. Sujit Basu Architecting High-Speed Data Streaming Systems Sujit Basu stream ing [stree-ming] verb 1. The act of transferring data to or from an instrument at a rate high enough to sustain continuous acquisition or

More information

White Paper. Recording Server Virtualization

White Paper. Recording Server Virtualization White Paper Recording Server Virtualization Prepared by: Mike Sherwood, Senior Solutions Engineer Milestone Systems 23 March 2011 Table of Contents Introduction... 3 Target audience and white paper purpose...

More information

Infrastructure Matters: POWER8 vs. Xeon x86

Infrastructure Matters: POWER8 vs. Xeon x86 Advisory Infrastructure Matters: POWER8 vs. Xeon x86 Executive Summary This report compares IBM s new POWER8-based scale-out Power System to Intel E5 v2 x86- based scale-out systems. A follow-on report

More information

MaxDeploy Ready. Hyper- Converged Virtualization Solution. With SanDisk Fusion iomemory products

MaxDeploy Ready. Hyper- Converged Virtualization Solution. With SanDisk Fusion iomemory products MaxDeploy Ready Hyper- Converged Virtualization Solution With SanDisk Fusion iomemory products MaxDeploy Ready products are configured and tested for support with Maxta software- defined storage and with

More information

Hadoop: Embracing future hardware

Hadoop: Embracing future hardware Hadoop: Embracing future hardware Suresh Srinivas @suresh_m_s Page 1 About Me Architect & Founder at Hortonworks Long time Apache Hadoop committer and PMC member Designed and developed many key Hadoop

More information

Distributed File System. MCSN N. Tonellotto Complements of Distributed Enabling Platforms

Distributed File System. MCSN N. Tonellotto Complements of Distributed Enabling Platforms Distributed File System 1 How do we get data to the workers? NAS Compute Nodes SAN 2 Distributed File System Don t move data to workers move workers to the data! Store data on the local disks of nodes

More information

D1.2 Network Load Balancing

D1.2 Network Load Balancing D1. Network Load Balancing Ronald van der Pol, Freek Dijkstra, Igor Idziejczak, and Mark Meijerink SARA Computing and Networking Services, Science Park 11, 9 XG Amsterdam, The Netherlands June ronald.vanderpol@sara.nl,freek.dijkstra@sara.nl,

More information

Can Flash help you ride the Big Data Wave? Steve Fingerhut Vice President, Marketing Enterprise Storage Solutions Corporation

Can Flash help you ride the Big Data Wave? Steve Fingerhut Vice President, Marketing Enterprise Storage Solutions Corporation Can Flash help you ride the Big Data Wave? Steve Fingerhut Vice President, Marketing Enterprise Storage Solutions Corporation Forward-Looking Statements During our meeting today we may make forward-looking

More information

Scalable Data Analysis in R. Lee E. Edlefsen Chief Scientist UserR! 2011

Scalable Data Analysis in R. Lee E. Edlefsen Chief Scientist UserR! 2011 Scalable Data Analysis in R Lee E. Edlefsen Chief Scientist UserR! 2011 1 Introduction Our ability to collect and store data has rapidly been outpacing our ability to analyze it We need scalable data analysis

More information

Microsoft SQL Server 2014 Fast Track

Microsoft SQL Server 2014 Fast Track Microsoft SQL Server 2014 Fast Track 34-TB Certified Data Warehouse 103-TB Maximum User Data Tegile Systems Solution Review 2U Design: Featuring Tegile T3800 All-Flash Storage Array http:// www.tegile.com/solutiuons/sql

More information

GraySort and MinuteSort at Yahoo on Hadoop 0.23

GraySort and MinuteSort at Yahoo on Hadoop 0.23 GraySort and at Yahoo on Hadoop.23 Thomas Graves Yahoo! May, 213 The Apache Hadoop[1] software library is an open source framework that allows for the distributed processing of large data sets across clusters

More information

Certification Document bluechip STORAGEline R54300s NAS-Server 03/06/2014. bluechip STORAGEline R54300s NAS-Server system

Certification Document bluechip STORAGEline R54300s NAS-Server 03/06/2014. bluechip STORAGEline R54300s NAS-Server system bluechip STORAGEline R54300s NAS-Server system Executive summary After performing all tests, the Certification Document bluechip STORAGEline R54300s NAS-Server system has been officially certified according

More information

HPC performance applications on Virtual Clusters

HPC performance applications on Virtual Clusters Panagiotis Kritikakos EPCC, School of Physics & Astronomy, University of Edinburgh, Scotland - UK pkritika@epcc.ed.ac.uk 4 th IC-SCCE, Athens 7 th July 2010 This work investigates the performance of (Java)

More information

Amazon EC2 Product Details Page 1 of 5

Amazon EC2 Product Details Page 1 of 5 Amazon EC2 Product Details Page 1 of 5 Amazon EC2 Functionality Amazon EC2 presents a true virtual computing environment, allowing you to use web service interfaces to launch instances with a variety of

More information

Cisco UCS and Fusion- io take Big Data workloads to extreme performance in a small footprint: A case study with Oracle NoSQL database

Cisco UCS and Fusion- io take Big Data workloads to extreme performance in a small footprint: A case study with Oracle NoSQL database Cisco UCS and Fusion- io take Big Data workloads to extreme performance in a small footprint: A case study with Oracle NoSQL database Built up on Cisco s big data common platform architecture (CPA), a

More information

Enabling High performance Big Data platform with RDMA

Enabling High performance Big Data platform with RDMA Enabling High performance Big Data platform with RDMA Tong Liu HPC Advisory Council Oct 7 th, 2014 Shortcomings of Hadoop Administration tooling Performance Reliability SQL support Backup and recovery

More information

Analysis and Optimization of Massive Data Processing on High Performance Computing Architecture

Analysis and Optimization of Massive Data Processing on High Performance Computing Architecture Analysis and Optimization of Massive Data Processing on High Performance Computing Architecture He Huang, Shanshan Li, Xiaodong Yi, Feng Zhang, Xiangke Liao and Pan Dong School of Computer Science National

More information

Evaluation Report: HP Blade Server and HP MSA 16GFC Storage Evaluation

Evaluation Report: HP Blade Server and HP MSA 16GFC Storage Evaluation Evaluation Report: HP Blade Server and HP MSA 16GFC Storage Evaluation Evaluation report prepared under contract with HP Executive Summary The computing industry is experiencing an increasing demand for

More information

Welcome to the unit of Hadoop Fundamentals on Hadoop architecture. I will begin with a terminology review and then cover the major components

Welcome to the unit of Hadoop Fundamentals on Hadoop architecture. I will begin with a terminology review and then cover the major components Welcome to the unit of Hadoop Fundamentals on Hadoop architecture. I will begin with a terminology review and then cover the major components of Hadoop. We will see what types of nodes can exist in a Hadoop

More information

Types of Workloads. Raj Jain. Washington University in St. Louis

Types of Workloads. Raj Jain. Washington University in St. Louis Types of Workloads Raj Jain Washington University in Saint Louis Saint Louis, MO 63130 Jain@cse.wustl.edu These slides are available on-line at: http://www.cse.wustl.edu/~jain/cse567-08/ 4-1 Overview!

More information