Performance Measurement of a High-Performance Computing System Utilized for Electronic Medical Record Management


Kiran George 1 (Corresponding Author), Chien-In Henry Chen 2
1 Computer Engineering Program, California State University, 800 N. State College Blvd., Fullerton, CA; kgeorge@fullerton.edu
2 Department of Electrical Engineering, Wright State University, Dayton, Ohio; henry.chen@wright.edu

Abstract
With the recent mandate requiring all medical records to be computerized, health-care providers face the difficult challenge of quickly and efficiently accessing and sorting through extremely large databases of electronic medical records. Minimizing database query response time calls for an expedited query application running on a high-performance system. High-performance computing (HPC) allows us to ask questions on a scale that was previously impractical; it is a form of computing that uses parallel processing techniques to run programs more efficiently and quickly across multiple processing units. In this paper, a high-performance computing application, expedited electronic medical record query (E2MRQ), running on a GPU cluster against electronic medical record (EMR) databases of the kind used in hospitals around the country, is presented. The application is capable of quickly and efficiently searching through the vast medical databases used by health-care professionals. Performance of the E2MRQ application on the GPU-based HPC cluster was measured, including timing for data transfer within the cluster and timing for basic data operations.

Keywords: Electronic medical records; high performance computing; Graphics Processing Units; database query; cluster computing; parallel processing

1. Introduction
The Health Information Technology for Economic and Clinical Health (HITECH) Act, enacted as part of the American Recovery and Reinvestment Act (ARRA) of 2009 [1], mandates that all medical and health-care providers switch to a digital medical information system, ensuring that all medical records in the United States will be converted into electronic medical records (EMRs). Health Information Exchange (HIE), the electronic mobilization of health-care and medical information across an organization, community, or hospital system, enables doctors and nurses to easily pass medical records and other clinical information among health-care information systems while maintaining the integrity of the information being moved. The overall goal of HIE is to speed up the access and retrieval of medical information so as to provide effective and efficient health care. With patients' EMRs stored in a database, doctors can quickly bring up a patient's medical history and, through the use of complex algorithms, detect whether a patient will encounter a certain medical condition based on that history. However, with all the EMRs stored in a database of some sort, data search can become remarkably time consuming. To make record retrieval as quick and efficient as possible, a large-scale high-performance computing (HPC) cluster can be used to vastly improve database query response time. Research has consistently shown that clusters comprising graphics processing units (GPUs) can significantly outperform comparable CPU-only systems [2][3] while reducing space, heat, power, and cooling requirements, making GPU clusters much more favorable than CPU clusters of similar cost.
However, the challenges involved in setting up such a system are: a) having the correct dataflow, making sure the correct information and data are sent to the correct node/GPU; b) communication between nodes in the cluster, making sure all nodes are in sync, process the correct information, and send the necessary information to and from the head node; and c) making sure the system is time- and cost-efficient so that it maintains a favorable cost-performance ratio. In this paper, a high-performance computing application, expedited electronic medical record query (E2MRQ), on a GPU cluster for EMR databases used in hospitals around the country, is presented. This application is capable of quickly and efficiently searching through the vast medical databases used by health-care professionals. Monetary savings could be realized because timely availability of information to care providers would increase efficiency and accuracy in health care; expediting the processing of large data sets would also mean that health patterns could be recognized more quickly across broader groups of patients.

International Journal of Advancements in Computing Technology (IJACT), Volume 7, Number 1, January

2. High-Performance Computing (HPC) and Cluster Architecture
High-performance computing (HPC) has gained a great deal of popularity in recent years. Its use of parallel processing and cluster computing to run computationally intensive applications more efficiently, reliably, and quickly has attracted scientists, engineers, teachers, and the military. With demand for higher processing speed and power constantly on the rise, HPC systems will soon garner the interest of businesses in every field. The overarching goal of HPC is to solve complex problems with more accuracy and higher speed by offloading computation-intensive tasks to specialized hardware accelerators such as NVIDIA's Tesla graphics processing units (GPUs) and Xilinx FPGAs, while remaining exceptionally efficient. These specialized processors are programmed using the NVIDIA Compute Unified Device Architecture (CUDA) and hardware description languages (HDLs) such as VHDL/Verilog, respectively.
GPUs have a large number of programmable cores (single-instruction, multiple-data (SIMD) architecture), high on-chip memory bandwidth, support for floating-point operations, and ease of programmability through high-level languages and application programming interfaces (APIs). All these properties leverage the value of the GPU as a commercial off-the-shelf (COTS) hardware accelerator. Currently, GPUs are used in a wide array of applications such as medical imaging [4-6], audio and video processing [7][8], and data mining [9][10].

Figure 1(a). Dataflow block diagram of the GPU cluster

An HPC system utilizes parallel processing, cluster computing, or a combination of the two. Parallel processing has one or more processing units (CPUs or GPUs) work on a problem simultaneously, while cluster computing links one or more computers together. The GPU cluster utilized for this application comprises a head (master) node and six compute (slave) nodes (Figure 1; Table 1), together with a Mellanox InfiniScale IV IS5031 QDR 18-port InfiniBand switch, an IOGEAR GCL1808KITU 8-port LCD combo KVM switch, and an APC SUA3000RMT2U 3000 VA / 2700 W Smart-UPS in an APC AR U enclosure. Each node is a SUPERMICRO SYS-7146GT-TRG 4U server that contains two quad-core Intel Xeon E5520 Nehalem processors (2.26 GHz, 8M cache), 12 GB of DDR3 memory, and two NVIDIA Tesla C2050 GPUs (3 GB GDDR5). The motherboards of all the nodes support 4 full-bandwidth PCIe Gen 2 x16 slots, 2 PCIe Gen 2 x4 slots, 1 PCIe Gen 1 x4 slot, and 2 PCI 33 MHz slots.

Figure 1(b). GPU cluster comprising a head node and 6 compute nodes in an APC AR U enclosure

Table 1. Components and Capability of the GPU Cluster
  Head and compute node systems: SUPERMICRO SYS-7046GT-TRF 4U server (OS: CentOS; motherboard supports 4 full-bandwidth PCIe Gen 2 x16 slots, 2 PCIe Gen 2 x4 slots, 1 PCIe Gen 1 x4 slot, and 2 PCI 33 MHz slots)
  CPU for nodes: Quad-core Intel Xeon E5520 Nehalem (8M cache; 2.26 GHz; 5.86 GT/s Intel QPI)
  CPUs per node: 2
  Head and compute node memory: 12 GB
  GPU for compute nodes: NVIDIA Tesla C2050 (448 cores; 1.15 GHz; memory bandwidth: 144 GB/s; dedicated memory: 3 GB GDDR5)
  GPUs per compute node: 2
  Interconnect: Mellanox InfiniBand QDR (ConnectX-2 VPI InfiniBand)
  No. of compute nodes: 6

2.1. Processors
NVIDIA Tesla C2050 GPUs are utilized in all the nodes. Each card has 14 multiprocessors, each with 32 CUDA cores, for a total of 448 CUDA cores per GPU; each core runs at a clock frequency of 1.15 GHz. The cards are capable of reaching up to 515 gigaflops of double-precision floating-point performance and up to 1.03 teraflops of single-precision floating-point performance. Each card comes with 3 GB of GDDR5 memory at 1.5 GHz; the C2050 has a 384-bit memory interface and a memory bandwidth of 144 GB/s [11]. The CPUs used in both head and compute nodes are a pair of Intel Xeon E5520 Nehalem 2.26 GHz quad-core processors [12], connected to the PCI Express interfaces by a QuickPath Interconnect (QPI) running at 5.86 GT/s.

Table 2. Operating System and Other Software on the Cluster
  Operating System: CentOS 5.5
  GPU Software: NVIDIA Developer Drivers
  Compute Software: NVIDIA CUDA and SDK 4.2
  InfiniBand Software: Mellanox OFED
  MPI Software: MVAPICH
  DBMS Software: PostgreSQL

Software (Table 2): NVIDIA CUDA is the software architecture that allows the Tesla and other NVIDIA cards to be used for parallel processing from high-level languages. Mellanox OFED provides the drivers and tools for the InfiniBand adapters and network, and MVAPICH provides the Message Passing Interface (MPI) software; MPI is the component of the E2MRQ application that passes information from the head node to the compute nodes and vice versa. PostgreSQL is an object-relational database management system (DBMS) that uses B-trees to store its elements; the data that E2MRQ processes comes from a PostgreSQL DBMS. Although PostgreSQL is installed on all of the nodes for code compilation, only the head node contains the database, so it is the only node that performs the DBMS operations.
3. E2MRQ Application
The E2MRQ application is designed to enhance database query performance by integrating GPUs into the toolset of the DBMS. GPUs have proven capable of handling massively parallel matrix manipulations. The basic process is as follows: the data set is pulled from storage, conceptualized as matrices, and processed on the GPU. In E2MRQ, the application pulls a large data set from a PostgreSQL DBMS and distributes the data from the head node down to the compute nodes using MPI over the InfiniBand interconnect. Each compute node then passes its data from the CPU to the GPU, where it is processed. The GPUs pass the processed data back to the respective CPUs in the nodes, and then, again using MPI over InfiniBand, each node sends its data back to the head node.
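The head-to-compute-node distribution step can be sketched as follows. This is a minimal Python sketch of the partitioning arithmetic only; the actual application sends each block with MPI_Send() over InfiniBand, and the table and node counts here are illustrative assumptions:

```python
def partition_rows(table, num_nodes):
    """Split a table (list of rows) into equal-sized row blocks, one per
    compute node, mirroring the head node's MPI distribution.
    Assumes len(table) is divisible by num_nodes."""
    rows_per_node = len(table) // num_nodes
    return [table[i * rows_per_node:(i + 1) * rows_per_node]
            for i in range(num_nodes)]

# Illustrative 1024-row table split across 4 compute nodes:
table = [[r] * 8 for r in range(1024)]   # toy width of 8 columns per row
blocks = partition_rows(table, 4)
print(len(blocks), len(blocks[0]))       # 4 blocks of 256 rows each
```

Each block would then be flattened into a single-dimensional array on the receiving node before being copied to device memory.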

3.1. Process (Figure 2)
The PostgreSQL database table is set up on the head node before processing. Since the head node is the only one that carries the PostgreSQL database, E2MRQ retrieves the database and saves it into that node's memory as a single-dimensional array; for GPU memory management, the tables have to be generated on the host (CPU) side as single-dimension arrays. Before that, however, MPI is used to divide the information among the compute nodes that will be used for processing. For example, if four compute nodes are used, the head node distributes the array, via MPI_Send(), in four equal-sized portions, one to each compute node; each compute node then holds an array spanning all 1024 columns but only 256 rows. After the data has been successfully passed to the nodes, each array is copied to the GPUs using the CUDA memory operation cudaMemcpy() to the device (GPU) memory. This happens only once, at the beginning of the application; after this initial memory copy, all of the functions performed on the GPU operate on the data already in device memory. A second memory copy, from device back to host memory, is done after each function for further CPU analysis and for sending the data back to the head node.

Figure 2. Data Management Analysis

The functions in the E2MRQ application that retrieve data from a PostgreSQL database and process it across one or several GPUs include: a) search for a key value in the database; b) sum a vector (column or row); c) sort the database based on a vector; and d) pattern search.

1) Key search within column
The application begins by searching for a pre-defined key value in a column of the database. As the database has been divided into equal sets of rows, each node searches the same column of the database.
A separate, equally sized array is allocated to store the results of searching through the array: when the key is found in any cell, the result array is marked true at the same location. After processing, the result array is copied back to the host.

2) Column and Row Summation
Summation of a column or row is frequently used to assess user counts, inventories, and location data. In this function, the values within a column or row are added together and the result is returned in a dynamically allocated floating-point array. The process starts by dividing the array in half, adding the two halves element-wise, and storing the result in the first half; the second half is discarded and the procedure is repeated until all of the values have been accumulated into the first element of the array. When the summation is completed, the first compute node retrieves the partial sums from the other compute nodes using MPI and adds them to obtain the final result.

3) Radix Sort
In this function, the node's portion of the database is loaded into device memory and sorted using the Thrust library [13] that ships with the CUDA architecture software.
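The halving-based summation can be sketched in plain Python. This is a CPU-side sketch of the reduction logic only; on the GPU, each element-wise addition step is carried out by many threads in parallel, and the array length here is an illustrative power of two:

```python
def halving_sum(values):
    """Sum a list by repeatedly folding the second half onto the first,
    mirroring the GPU reduction. Assumes len(values) is a power of two."""
    vals = list(values)
    n = len(vals)
    while n > 1:
        half = n // 2
        for i in range(half):          # on the GPU: one thread per index i
            vals[i] += vals[i + half]
        n = half                       # second half is discarded
    return vals[0]

# One partial sum per compute node; the first node combines them via MPI.
partials = [halving_sum([1.0] * 256) for _ in range(4)]
total = sum(partials)
print(total)  # 1024.0
```

The fold halves the active array each pass, so a length-n column is reduced in log2(n) steps rather than n-1 sequential additions.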

4) Quick Sort
In this function, the host CPU performs a quicksort on the array. This sorting procedure is done to compare against the result of the radix sort on the GPU.

5) Pattern Search
During the execution of this function, the database is loaded into device memory and a pattern search with key characteristics is conducted. The characteristics are searched one at a time: the first characteristic is searched through the array and, when found, its position is recorded; the next characteristic is then searched for at the adjacent index. This process continues until all the characteristics are found.

Table 3. Timing Results for Basic Integer Operations (timing in ms)
  Key search within column
  Column Summation
  Row Summation
  Radix Sort
  Quick Sort
  Pattern Search (w/ five key characteristics)

Table 4. Comparison of Timing Results for Integer Queries
  Integer Queries: Transfer Time (ms) | GPU Time (ms) | Total Time (ms)
  [14]
  E2MRQ

E2MRQ Performance

1) Data Transfer and Timing
The timing for the transfer of data elements and processing from the head node to the compute node(s) is given in Figure 3. The time it takes the head node to transfer data to host RAM is 1.90 s. Next, depending on the number of nodes in each configuration, the transfer time to the memory of the compute nodes varies from 3.19 ms for a single-node configuration to 3.66 ms for a four-node configuration. The computation time required for pattern search with five key characteristics also varies with the number of nodes in the configuration: it ranges from 1.83 ms for a single node with two GPUs to 0.83 ms for a four-node configuration with 8 GPUs.

2) Timing Performances for Basic Data Operations
The timing results of the basic operations of the E2MRQ application on the cluster configuration with four nodes are given in Table 3. Tests were conducted to obtain the time to search through the entire database of data elements.
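The pattern-search procedure described above, matching the first characteristic and then probing adjacent indices for the rest, can be sketched in Python as follows. This is a sequential sketch of the matching logic only; the GPU version examines candidate start positions in parallel, and the record contents here are illustrative assumptions:

```python
def pattern_search(array, characteristics):
    """Return the start indices at which the full sequence of key
    characteristics occurs at adjacent positions in the array."""
    matches = []
    k = len(characteristics)
    for start in range(len(array) - k + 1):
        # First characteristic found here; check the adjacent indices
        # for the remaining characteristics, in order.
        if all(array[start + j] == characteristics[j] for j in range(k)):
            matches.append(start)
    return matches

record = [3, 7, 1, 4, 1, 5, 9, 1, 4, 1]
print(pattern_search(record, [1, 4, 1]))  # [2, 7]
```

A match is only reported when every characteristic is found at its expected adjacent index, which mirrors the "continue until all the characteristics are found" condition in the text.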
A comparison of the E2MRQ application with a similar project [14], which also has searching as its primary objective, is discussed next. The systems utilized in the two projects are similar, except that in [14] only a single GPU is used for computation, whereas the cluster utilized for the E2MRQ application has 4 compute nodes, each with two GPUs. Both [14] and the E2MRQ application copy the necessary data to the graphics cards before the computations are executed. As integer query is the only case where the two projects truly line up, Table 4 compares the timing results for the E2MRQ application implemented on a single GPU. As can be observed, the E2MRQ application provided over 66% improvement in performance when processing the array.

Figure 3. GPU cluster block diagram illustrating the data flow and timing

4. Conclusion
Performance measurement of a GPU-based HPC cluster utilized for an electronic medical record database query application was presented; timing for data transfer within the cluster and timing for basic data operations were evaluated. Both the data transfer time and the computation time required for pattern search with five key characteristics improved with the number of compute nodes utilized to execute a query. Furthermore, the E2MRQ application demonstrated over 66% improvement in timing compared to a similar project that involved integer query.

5. Future Work
This application can be further expanded upon by: a) allowing it to run on any number of nodes (currently, the application only runs on a number of nodes that is a power of 2); and b) processing queries with an FPGA to see whether there are any performance benefits.

6. References
[1] American Medical Software:
[2] J. Canny and H. Zhao, "Big data analytics with small footprint: squaring the cloud," Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 2013.
[3] M. C. Altinigneli, C. Plant, and C. Böhm, "Massively parallel expectation maximization using graphics processing units," Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 2013.
[4] T. Idzenga, E. Gaburov, W. Vermin, J. Menssen, and C. De Korte, "Fast 2-D ultrasound strain imaging: the benefits of using a GPU," IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control, vol. 61, no. 1.
[5] M. Tanter and M. Fink, "Ultrafast imaging in biomedical ultrasound," IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control, vol. 61, no. 1.
[6] J. R. Ferreira, M. Costa Oliveira, and A. Lage Freitas, "Performance Evaluation of Medical Image Similarity Analysis in a Heterogeneous Architecture," IEEE 27th International Symposium on Computer-Based Medical Systems.
[7] S. Momcilovic, A. Ilic, N. Roma, and L. Sousa, "Dynamic Load Balancing for Real-Time Video Encoding on Heterogeneous CPU+GPU Systems," IEEE Transactions on Multimedia, vol. 16, no. 1.
[8] J. A. Belloch, B. Bank, L. Savioja, A. Gonzalez, and V. Valimaki, "Multi-channel IIR filtering of audio signals using a GPU," 2014 IEEE International Conference on Acoustics, Speech and Signal Processing.
[9] B. Liu, C. Yu, D. Z. Wang, R. C. C. Cheung, and H. Yan, "Design Exploration of Geometric Biclustering for Microarray Data Analysis in Data Mining," IEEE Transactions on Parallel and Distributed Systems, vol. 25, no. 10.
[10] M. Hyun Jo and W. W. Ro, "DPM: Data Partitioning Method for pipelined MapReduce on GPU," 18th IEEE International Symposium on Consumer Electronics, pp. 1-3.
[11] NVIDIA Tesla C2050 GPU:
[12] Intel Xeon E5520 CPU:
[13] Thrust Library:
[14] P. Bakkum and K. Skadron, "Accelerating SQL Database Operations on a GPU with CUDA," GPGPU Conference.


More information

Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software

Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software WHITEPAPER Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software SanDisk ZetaScale software unlocks the full benefits of flash for In-Memory Compute and NoSQL applications

More information

Purchase of High Performance Computing (HPC) Central Compute Resources by Northwestern Researchers

Purchase of High Performance Computing (HPC) Central Compute Resources by Northwestern Researchers Information Technology Purchase of High Performance Computing (HPC) Central Compute Resources by Northwestern Researchers Effective for FY2016 Purpose This document summarizes High Performance Computing

More information

Computer Graphics Hardware An Overview

Computer Graphics Hardware An Overview Computer Graphics Hardware An Overview Graphics System Monitor Input devices CPU/Memory GPU Raster Graphics System Raster: An array of picture elements Based on raster-scan TV technology The screen (and

More information

Overview on Modern Accelerators and Programming Paradigms Ivan Giro7o igiro7o@ictp.it

Overview on Modern Accelerators and Programming Paradigms Ivan Giro7o igiro7o@ictp.it Overview on Modern Accelerators and Programming Paradigms Ivan Giro7o igiro7o@ictp.it Informa(on & Communica(on Technology Sec(on (ICTS) Interna(onal Centre for Theore(cal Physics (ICTP) Mul(ple Socket

More information

How To Build An Ark Processor With An Nvidia Gpu And An African Processor

How To Build An Ark Processor With An Nvidia Gpu And An African Processor Project Denver Processor to Usher in a New Era of Computing Bill Dally January 5, 2011 http://blogs.nvidia.com/2011/01/project-denver-processor-to-usher-in-new-era-of-computing/ Project Denver Announced

More information

www.xenon.com.au STORAGE HIGH SPEED INTERCONNECTS HIGH PERFORMANCE COMPUTING VISUALISATION GPU COMPUTING

www.xenon.com.au STORAGE HIGH SPEED INTERCONNECTS HIGH PERFORMANCE COMPUTING VISUALISATION GPU COMPUTING www.xenon.com.au STORAGE HIGH SPEED INTERCONNECTS HIGH PERFORMANCE COMPUTING GPU COMPUTING VISUALISATION XENON Accelerating Exploration Mineral, oil and gas exploration is an expensive and challenging

More information

Mellanox Academy Online Training (E-learning)

Mellanox Academy Online Training (E-learning) Mellanox Academy Online Training (E-learning) 2013-2014 30 P age Mellanox offers a variety of training methods and learning solutions for instructor-led training classes and remote online learning (e-learning),

More information

High Performance Computing in CST STUDIO SUITE

High Performance Computing in CST STUDIO SUITE High Performance Computing in CST STUDIO SUITE Felix Wolfheimer GPU Computing Performance Speedup 18 16 14 12 10 8 6 4 2 0 Promo offer for EUC participants: 25% discount for K40 cards Speedup of Solver

More information

Introduction to GPGPU. Tiziano Diamanti t.diamanti@cineca.it

Introduction to GPGPU. Tiziano Diamanti t.diamanti@cineca.it t.diamanti@cineca.it Agenda From GPUs to GPGPUs GPGPU architecture CUDA programming model Perspective projection Vectors that connect the vanishing point to every point of the 3D model will intersecate

More information

Simplifying Big Data Deployments in Cloud Environments with Mellanox Interconnects and QualiSystems Orchestration Solutions

Simplifying Big Data Deployments in Cloud Environments with Mellanox Interconnects and QualiSystems Orchestration Solutions Simplifying Big Data Deployments in Cloud Environments with Mellanox Interconnects and QualiSystems Orchestration Solutions 64% of organizations were investing or planning to invest on Big Data technology

More information

Accelerate Cloud Computing with the Xilinx Zynq SoC

Accelerate Cloud Computing with the Xilinx Zynq SoC X C E L L E N C E I N N E W A P P L I C AT I O N S Accelerate Cloud Computing with the Xilinx Zynq SoC A novel reconfigurable hardware accelerator speeds the processing of applications based on the MapReduce

More information

The High Performance Internet of Things: using GVirtuS for gluing cloud computing and ubiquitous connected devices

The High Performance Internet of Things: using GVirtuS for gluing cloud computing and ubiquitous connected devices WS on Models, Algorithms and Methodologies for Hierarchical Parallelism in new HPC Systems The High Performance Internet of Things: using GVirtuS for gluing cloud computing and ubiquitous connected devices

More information

Graphics Cards and Graphics Processing Units. Ben Johnstone Russ Martin November 15, 2011

Graphics Cards and Graphics Processing Units. Ben Johnstone Russ Martin November 15, 2011 Graphics Cards and Graphics Processing Units Ben Johnstone Russ Martin November 15, 2011 Contents Graphics Processing Units (GPUs) Graphics Pipeline Architectures 8800-GTX200 Fermi Cayman Performance Analysis

More information

Retargeting PLAPACK to Clusters with Hardware Accelerators

Retargeting PLAPACK to Clusters with Hardware Accelerators Retargeting PLAPACK to Clusters with Hardware Accelerators Manuel Fogué 1 Francisco Igual 1 Enrique S. Quintana-Ortí 1 Robert van de Geijn 2 1 Departamento de Ingeniería y Ciencia de los Computadores.

More information

Cloud Data Center Acceleration 2015

Cloud Data Center Acceleration 2015 Cloud Data Center Acceleration 2015 Agenda! Computer & Storage Trends! Server and Storage System - Memory and Homogenous Architecture - Direct Attachment! Memory Trends! Acceleration Introduction! FPGA

More information

Stream Processing on GPUs Using Distributed Multimedia Middleware

Stream Processing on GPUs Using Distributed Multimedia Middleware Stream Processing on GPUs Using Distributed Multimedia Middleware Michael Repplinger 1,2, and Philipp Slusallek 1,2 1 Computer Graphics Lab, Saarland University, Saarbrücken, Germany 2 German Research

More information

Introduction to GPU Computing

Introduction to GPU Computing Matthis Hauschild Universität Hamburg Fakultät für Mathematik, Informatik und Naturwissenschaften Technische Aspekte Multimodaler Systeme December 4, 2014 M. Hauschild - 1 Table of Contents 1. Architecture

More information

NVIDIA CUDA Software and GPU Parallel Computing Architecture. David B. Kirk, Chief Scientist

NVIDIA CUDA Software and GPU Parallel Computing Architecture. David B. Kirk, Chief Scientist NVIDIA CUDA Software and GPU Parallel Computing Architecture David B. Kirk, Chief Scientist Outline Applications of GPU Computing CUDA Programming Model Overview Programming in CUDA The Basics How to Get

More information

Intel Xeon Processor E5-2600

Intel Xeon Processor E5-2600 Intel Xeon Processor E5-2600 Best combination of performance, power efficiency, and cost. Platform Microarchitecture Processor Socket Chipset Intel Xeon E5 Series Processors and the Intel C600 Chipset

More information

Oracle Big Data SQL Technical Update

Oracle Big Data SQL Technical Update Oracle Big Data SQL Technical Update Jean-Pierre Dijcks Oracle Redwood City, CA, USA Keywords: Big Data, Hadoop, NoSQL Databases, Relational Databases, SQL, Security, Performance Introduction This technical

More information

HP ProLiant Gen8 vs Gen9 Server Blades on Data Warehouse Workloads

HP ProLiant Gen8 vs Gen9 Server Blades on Data Warehouse Workloads HP ProLiant Gen8 vs Gen9 Server Blades on Data Warehouse Workloads Gen9 Servers give more performance per dollar for your investment. Executive Summary Information Technology (IT) organizations face increasing

More information

Main Memory Data Warehouses

Main Memory Data Warehouses Main Memory Data Warehouses Robert Wrembel Poznan University of Technology Institute of Computing Science Robert.Wrembel@cs.put.poznan.pl www.cs.put.poznan.pl/rwrembel Lecture outline Teradata Data Warehouse

More information

Embedded Systems: map to FPGA, GPU, CPU?

Embedded Systems: map to FPGA, GPU, CPU? Embedded Systems: map to FPGA, GPU, CPU? Jos van Eijndhoven jos@vectorfabrics.com Bits&Chips Embedded systems Nov 7, 2013 # of transistors Moore s law versus Amdahl s law Computational Capacity Hardware

More information

Reconfigurable Architecture Requirements for Co-Designed Virtual Machines

Reconfigurable Architecture Requirements for Co-Designed Virtual Machines Reconfigurable Architecture Requirements for Co-Designed Virtual Machines Kenneth B. Kent University of New Brunswick Faculty of Computer Science Fredericton, New Brunswick, Canada ken@unb.ca Micaela Serra

More information

Overview of HPC Resources at Vanderbilt

Overview of HPC Resources at Vanderbilt Overview of HPC Resources at Vanderbilt Will French Senior Application Developer and Research Computing Liaison Advanced Computing Center for Research and Education June 10, 2015 2 Computing Resources

More information

Oracle Database In-Memory The Next Big Thing

Oracle Database In-Memory The Next Big Thing Oracle Database In-Memory The Next Big Thing Maria Colgan Master Product Manager #DBIM12c Why is Oracle do this Oracle Database In-Memory Goals Real Time Analytics Accelerate Mixed Workload OLTP No Changes

More information

A general-purpose virtualization service for HPC on cloud computing: an application to GPUs

A general-purpose virtualization service for HPC on cloud computing: an application to GPUs A general-purpose virtualization service for HPC on cloud computing: an application to GPUs R.Montella, G.Coviello, G.Giunta* G. Laccetti #, F. Isaila, J. Garcia Blas *Department of Applied Science University

More information

Resource Scheduling Best Practice in Hybrid Clusters

Resource Scheduling Best Practice in Hybrid Clusters Available online at www.prace-ri.eu Partnership for Advanced Computing in Europe Resource Scheduling Best Practice in Hybrid Clusters C. Cavazzoni a, A. Federico b, D. Galetti a, G. Morelli b, A. Pieretti

More information

SGI High Performance Computing

SGI High Performance Computing SGI High Performance Computing Accelerate time to discovery, innovation, and profitability 2014 SGI SGI Company Proprietary 1 Typical Use Cases for SGI HPC Products Large scale-out, distributed memory

More information

Lecture 11: Multi-Core and GPU. Multithreading. Integration of multiple processor cores on a single chip.

Lecture 11: Multi-Core and GPU. Multithreading. Integration of multiple processor cores on a single chip. Lecture 11: Multi-Core and GPU Multi-core computers Multithreading GPUs General Purpose GPUs Zebo Peng, IDA, LiTH 1 Multi-Core System Integration of multiple processor cores on a single chip. To provide

More information

Designed for Maximum Accelerator Performance

Designed for Maximum Accelerator Performance Designed for Maximum Accelerator Performance A dense, accelerated cluster supercomputer that offers 250 teraflops in one rack. This power- and space-efficient system can be combined with Cray s optimized

More information

ultra fast SOM using CUDA

ultra fast SOM using CUDA ultra fast SOM using CUDA SOM (Self-Organizing Map) is one of the most popular artificial neural network algorithms in the unsupervised learning category. Sijo Mathew Preetha Joy Sibi Rajendra Manoj A

More information

2009 Oracle Corporation 1

2009 Oracle Corporation 1 The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material,

More information

Accelerating BIRCH for Clustering Large Scale Streaming Data Using CUDA Dynamic Parallelism

Accelerating BIRCH for Clustering Large Scale Streaming Data Using CUDA Dynamic Parallelism Accelerating BIRCH for Clustering Large Scale Streaming Data Using CUDA Dynamic Parallelism Jianqiang Dong, Fei Wang and Bo Yuan Intelligent Computing Lab, Division of Informatics Graduate School at Shenzhen,

More information

Overview of HPC systems and software available within

Overview of HPC systems and software available within Overview of HPC systems and software available within Overview Available HPC Systems Ba Cy-Tera Available Visualization Facilities Software Environments HPC System at Bibliotheca Alexandrina SUN cluster

More information

Building a Top500-class Supercomputing Cluster at LNS-BUAP

Building a Top500-class Supercomputing Cluster at LNS-BUAP Building a Top500-class Supercomputing Cluster at LNS-BUAP Dr. José Luis Ricardo Chávez Dr. Humberto Salazar Ibargüen Dr. Enrique Varela Carlos Laboratorio Nacional de Supercómputo Benemérita Universidad

More information

HP ProLiant BL660c Gen9 and Microsoft SQL Server 2014 technical brief

HP ProLiant BL660c Gen9 and Microsoft SQL Server 2014 technical brief Technical white paper HP ProLiant BL660c Gen9 and Microsoft SQL Server 2014 technical brief Scale-up your Microsoft SQL Server environment to new heights Table of contents Executive summary... 2 Introduction...

More information

The Foundation for Better Business Intelligence

The Foundation for Better Business Intelligence Product Brief Intel Xeon Processor E7-8800/4800/2800 v2 Product Families Data Center The Foundation for Big data is changing the way organizations make business decisions. To transform petabytes of data

More information

GPU Parallel Computing Architecture and CUDA Programming Model

GPU Parallel Computing Architecture and CUDA Programming Model GPU Parallel Computing Architecture and CUDA Programming Model John Nickolls Outline Why GPU Computing? GPU Computing Architecture Multithreading and Arrays Data Parallel Problem Decomposition Parallel

More information

Advancing Applications Performance With InfiniBand

Advancing Applications Performance With InfiniBand Advancing Applications Performance With InfiniBand Pak Lui, Application Performance Manager September 12, 2013 Mellanox Overview Ticker: MLNX Leading provider of high-throughput, low-latency server and

More information

Optimizing a 3D-FWT code in a cluster of CPUs+GPUs

Optimizing a 3D-FWT code in a cluster of CPUs+GPUs Optimizing a 3D-FWT code in a cluster of CPUs+GPUs Gregorio Bernabé Javier Cuenca Domingo Giménez Universidad de Murcia Scientific Computing and Parallel Programming Group XXIX Simposium Nacional de la

More information

Benchmarking Cassandra on Violin

Benchmarking Cassandra on Violin Technical White Paper Report Technical Report Benchmarking Cassandra on Violin Accelerating Cassandra Performance and Reducing Read Latency With Violin Memory Flash-based Storage Arrays Version 1.0 Abstract

More information

Graphical Processing Units to Accelerate Orthorectification, Atmospheric Correction and Transformations for Big Data

Graphical Processing Units to Accelerate Orthorectification, Atmospheric Correction and Transformations for Big Data Graphical Processing Units to Accelerate Orthorectification, Atmospheric Correction and Transformations for Big Data Amanda O Connor, Bryan Justice, and A. Thomas Harris IN52A. Big Data in the Geosciences:

More information

Parallel Computing. Benson Muite. benson.muite@ut.ee http://math.ut.ee/ benson. https://courses.cs.ut.ee/2014/paralleel/fall/main/homepage

Parallel Computing. Benson Muite. benson.muite@ut.ee http://math.ut.ee/ benson. https://courses.cs.ut.ee/2014/paralleel/fall/main/homepage Parallel Computing Benson Muite benson.muite@ut.ee http://math.ut.ee/ benson https://courses.cs.ut.ee/2014/paralleel/fall/main/homepage 3 November 2014 Hadoop, Review Hadoop Hadoop History Hadoop Framework

More information

Scaling from Workstation to Cluster for Compute-Intensive Applications

Scaling from Workstation to Cluster for Compute-Intensive Applications Cluster Transition Guide: Scaling from Workstation to Cluster for Compute-Intensive Applications IN THIS GUIDE: The Why: Proven Performance Gains On Cluster Vs. Workstation The What: Recommended Reference

More information

Bringing Big Data Modelling into the Hands of Domain Experts

Bringing Big Data Modelling into the Hands of Domain Experts Bringing Big Data Modelling into the Hands of Domain Experts David Willingham Senior Application Engineer MathWorks david.willingham@mathworks.com.au 2015 The MathWorks, Inc. 1 Data is the sword of the

More information

NVIDIA GPUs in the Cloud

NVIDIA GPUs in the Cloud NVIDIA GPUs in the Cloud 4 EVOLVING CLOUD REQUIREMENTS On premises Off premises Hybrid Cloud Connecting clouds New workloads Components to disrupt 5 GLOBAL CLOUD PLATFORM Unified architecture enabled by

More information

Scalability and Classifications

Scalability and Classifications Scalability and Classifications 1 Types of Parallel Computers MIMD and SIMD classifications shared and distributed memory multicomputers distributed shared memory computers 2 Network Topologies static

More information

NVIDIA GeForce GTX 580 GPU Datasheet

NVIDIA GeForce GTX 580 GPU Datasheet NVIDIA GeForce GTX 580 GPU Datasheet NVIDIA GeForce GTX 580 GPU Datasheet 3D Graphics Full Microsoft DirectX 11 Shader Model 5.0 support: o NVIDIA PolyMorph Engine with distributed HW tessellation engines

More information

1 DCSC/AU: HUGE. DeIC Sekretariat 2013-03-12/RB. Bilag 1. DeIC (DCSC) Scientific Computing Installations

1 DCSC/AU: HUGE. DeIC Sekretariat 2013-03-12/RB. Bilag 1. DeIC (DCSC) Scientific Computing Installations Bilag 1 2013-03-12/RB DeIC (DCSC) Scientific Computing Installations DeIC, previously DCSC, currently has a number of scientific computing installations, distributed at five regional operating centres.

More information

Parallel Firewalls on General-Purpose Graphics Processing Units

Parallel Firewalls on General-Purpose Graphics Processing Units Parallel Firewalls on General-Purpose Graphics Processing Units Manoj Singh Gaur and Vijay Laxmi Kamal Chandra Reddy, Ankit Tharwani, Ch.Vamshi Krishna, Lakshminarayanan.V Department of Computer Engineering

More information

SMB Direct for SQL Server and Private Cloud

SMB Direct for SQL Server and Private Cloud SMB Direct for SQL Server and Private Cloud Increased Performance, Higher Scalability and Extreme Resiliency June, 2014 Mellanox Overview Ticker: MLNX Leading provider of high-throughput, low-latency server

More information

Intel Cluster Ready Appro Xtreme-X Computers with Mellanox QDR Infiniband

Intel Cluster Ready Appro Xtreme-X Computers with Mellanox QDR Infiniband Intel Cluster Ready Appro Xtreme-X Computers with Mellanox QDR Infiniband A P P R O I N T E R N A T I O N A L I N C Steve Lyness Vice President, HPC Solutions Engineering slyness@appro.com Company Overview

More information

Infrastructure Matters: POWER8 vs. Xeon x86

Infrastructure Matters: POWER8 vs. Xeon x86 Advisory Infrastructure Matters: POWER8 vs. Xeon x86 Executive Summary This report compares IBM s new POWER8-based scale-out Power System to Intel E5 v2 x86- based scale-out systems. A follow-on report

More information

Xeon+FPGA Platform for the Data Center

Xeon+FPGA Platform for the Data Center Xeon+FPGA Platform for the Data Center ISCA/CARL 2015 PK Gupta, Director of Cloud Platform Technology, DCG/CPG Overview Data Center and Workloads Xeon+FPGA Accelerator Platform Applications and Eco-system

More information

Best Practises for LabVIEW FPGA Design Flow. uk.ni.com ireland.ni.com

Best Practises for LabVIEW FPGA Design Flow. uk.ni.com ireland.ni.com Best Practises for LabVIEW FPGA Design Flow 1 Agenda Overall Application Design Flow Host, Real-Time and FPGA LabVIEW FPGA Architecture Development FPGA Design Flow Common FPGA Architectures Testing and

More information

A Powerful solution for next generation Pcs

A Powerful solution for next generation Pcs Product Brief 6th Generation Intel Core Desktop Processors i7-6700k and i5-6600k 6th Generation Intel Core Desktop Processors i7-6700k and i5-6600k A Powerful solution for next generation Pcs Looking for

More information

CLOUDDMSS: CLOUD-BASED DISTRIBUTED MULTIMEDIA STREAMING SERVICE SYSTEM FOR HETEROGENEOUS DEVICES

CLOUDDMSS: CLOUD-BASED DISTRIBUTED MULTIMEDIA STREAMING SERVICE SYSTEM FOR HETEROGENEOUS DEVICES CLOUDDMSS: CLOUD-BASED DISTRIBUTED MULTIMEDIA STREAMING SERVICE SYSTEM FOR HETEROGENEOUS DEVICES 1 MYOUNGJIN KIM, 2 CUI YUN, 3 SEUNGHO HAN, 4 HANKU LEE 1,2,3,4 Department of Internet & Multimedia Engineering,

More information

Vocera Voice 4.3 and 4.4 Server Sizing Matrix

Vocera Voice 4.3 and 4.4 Server Sizing Matrix Vocera Voice 4.3 and 4.4 Server Sizing Matrix Vocera Server Recommended Configuration Guidelines Maximum Simultaneous Users 450 5,000 Sites Single Site or Multiple Sites Requires Multiple Sites Entities

More information

Copyright 2013, Oracle and/or its affiliates. All rights reserved.

Copyright 2013, Oracle and/or its affiliates. All rights reserved. 1 Oracle SPARC Server for Enterprise Computing Dr. Heiner Bauch Senior Account Architect 19. April 2013 2 The following is intended to outline our general product direction. It is intended for information

More information