Evaluation of Java Message Passing in High Performance Data Analytics
Saliya Ekanayake, Geoffrey Fox
School of Informatics and Computing, Indiana University, Bloomington, Indiana, USA
{sekanaya,

Abstract: In the last few years, Java has gained popularity in processing big data, mostly with the Apache big data stack, a collection of open source frameworks for dealing with abundant data that includes several popular systems such as Hadoop, the Hadoop Distributed File System (HDFS), and Spark. Efforts have been made in the past to introduce Java to High Performance Computing (HPC) as well, but they were not embraced by the community due to performance concerns. However, with continuous improvements in Java performance, increasing interest has been placed on Java message passing support in HPC. We support this idea and show its feasibility in solving real world data analytics problems.

Index Terms: Message passing, Java, HPC

I. INTRODUCTION

Message passing has been the de-facto model for realizing distributed memory parallelism [], where the Message Passing Interface (MPI), with implementations such as MPICH [] and OpenMPI [], has been successful in producing high performance applications []. We use message passing with threads [] in our data analytics applications on Windows High Performance Computing (HPC) environments [6] using [7] and Microsoft's Task Parallel Library (TPL) [8]. However, the number of available Windows HPC systems is limited, and our attempts to run these applications on traditional Linux based HPC clusters using Mono, a cross platform .NET development framework, have been unsuccessful due to poor performance. Therefore, we decided to migrate our applications to Java for two reasons: 1) the productivity offered by Java and its ecosystem; and 2) the emerging success of Java in HPC [9].
In this paper, we present our experience evaluating the performance of our parallel deterministic annealing clustering program (DAVS) [] together with micro benchmark results for two Java message passing frameworks, OpenMPI with its Java binding and FastMPJ [], compared against native OpenMPI and our C# implementation. The results show a clear improvement in application performance over C# and near native performance in micro benchmarks.

II. RELATED WORK

Message passing support for Java can be classified into pure Java implementations and Java bindings for existing native MPI libraries (i.e. wrapper implementations). Pure Java implementations advocate portability, but may not be as efficient as Java bindings that call native MPI (see III.C and []). There are two proposed Application Programming Interfaces (API) for Java message passing: mpijava [] and the Java Grande Forum (JGF) Message Passing interface for Java (MPJ) []. However, there are implementations that follow custom APIs as well [9]. The performance of Java MPI support has been studied with different implementations, most recently in [, ]. The focus of these studies is to evaluate MPI kernel operations, and little or no information is given on applying Java MPI to scientific applications. However, there is increasing interest [] in using MPI with large scale data analytics frameworks such as Apache Hadoop [6]. MR+ [7] is a framework with similar intent, which allows Hadoop MapReduce [8] programs to run on any cluster under any resource manager while also providing the capabilities of MPI.

III. TECHNICAL EVALUATION

The DAVS code is about k lines of C# and, to evaluate its performance on Java, we used a combination of a commercially available code converter [9] and carefully inspected manual rewrites to port the C# code to Java. Furthermore, we performed a series of serial and parallel tests to confirm that correctness was preserved during the migration, prior to evaluating performance.
Our interest in this experiment was to compare application performance on Linux based HPC clusters against results on Windows HPC environments. We noticed from initial runs that two MPI operations, allreduce and send and receive, contribute most of the inter-process communication. Therefore, we extended the evaluation by performing micro benchmarks for these operations, which further supported our choice of Java.
Input: maxmsgsize            // maximum message size in bytes
msgsizel = 89                // messages to be considered large
numitr =                     // iterations for small messages
numitrl =                    // iterations for large messages
skip =                       // skip this many for small messages
skipl =                      // skip this many for large messages
comm = MPI_COMM_WORLD
me = MPI_Comm_rank (comm)
size = MPI_Comm_size (comm)
sbuff[maxmsgsize/]           // float array initialized to .
rbuff[maxmsgsize/]           // float array initialized to .
For i = to i <= maxmsgsize/
    If i > msgsizel
        skip = skipl
        numitr = numitrl
    duration = .
    For j = to j < numitr + skip
        t = MPI_Wtime ()
        MPI_Allreduce (sbuff, rbuff, i, MPI_SUM, comm)
        If j >= skip
            duration += MPI_Wtime () - t
        j = j + 1
    latency = duration / numitr
    avgtime = MPI_Reduce (latency, MPI_SUM, )
    If me ==
        Print (i, avgtime)
    i = i *

Fig. Pseudo code for allreduce benchmark

A. Computer Systems

We used two Indiana University clusters, Madrid and Tempest, and one FutureGrid [] cluster, India, as described below.
Tempest: nodes, each with Intel Xeon E7 CPUs at .GHz with 6 cores, totaling cores per node; 8 GB node memory and a Gbps Infiniband (IB) network connection. It runs Windows Server 8 R HPC Edition version 6. (Build 76: Service Pack ).
Madrid: 8 nodes, each with AMD Opteron 86 CPUs at .GHz with cores, totaling 6 cores per node; 6 GB node memory and a Gbps Ethernet network connection. It runs Red Hat Enterprise Linux Server release 6.
FutureGrid (India): 8 nodes, each with Intel Xeon X CPUs at .66GHz with cores, totaling 8 cores per node; GB node memory and a Gbps IB network connection. It runs Red Hat Enterprise Linux Server release .

B. Software Environments

We used the .NET runtime and for C# based tests. The DAVS Java version uses a novel parallel tasks library, the Habanero Java (HJ) library from Rice University [, ], which requires Java 8. Therefore, we used an early access (EA) release, build .8.-ea-b8.
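The timing discipline in the pseudo code above (warm-up iterations that are skipped, per-iteration timing, averaged latency) can be sketched in plain Java. This is a hypothetical harness, not the paper's benchmark code: `averageLatencyMicros` and the Runnable stand-in for MPI_Allreduce are assumptions made for illustration, so the structure of the measurement, not the communication, is what is shown.

```java
/** Micro-benchmark harness mirroring the allreduce pseudo code: time an
 *  operation repeatedly, discard the first 'skip' warm-up iterations,
 *  and report the average latency of the remaining runs. The timed
 *  operation is a plain Runnable standing in for the MPI call. */
public class LatencyHarness {
    public static double averageLatencyMicros(Runnable op, int iterations, int skip) {
        double duration = 0.0;                  // accumulated time in ns
        for (int j = 0; j < iterations + skip; j++) {
            long t = System.nanoTime();         // analogous to MPI_Wtime()
            op.run();
            if (j >= skip) {                    // count only post-warm-up runs
                duration += System.nanoTime() - t;
            }
        }
        return duration / iterations / 1e3;     // average latency, microseconds
    }

    public static void main(String[] args) {
        float[] sbuff = new float[1024];
        float[] rbuff = new float[1024];
        // Stand-in for MPI_Allreduce(sbuff, rbuff, ..., MPI_SUM, comm):
        // a local element-wise sum, just to exercise the harness.
        double avg = averageLatencyMicros(() -> {
            for (int k = 0; k < sbuff.length; k++) rbuff[k] = sbuff[k] + rbuff[k];
        }, 100, 10);
        System.out.println(avg >= 0.0);         // prints true
    }
}
```

Skipping the first iterations matters because JIT compilation and cache warm-up would otherwise inflate the average, which is why OMB-style benchmarks separate `skip` from `numitr`.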
Early access releases may not be well optimized, but a quick test with threading switched off confirmed that DAVS performs equally well on stable Java 7 and the Java 8 EA build. There have been several message passing frameworks for Java [], but due to the lack of support for IB networks and other drawbacks discussed in [], we decided to evaluate OpenMPI with its Java binding and FastMPJ, which is a pure Java implementation of the mpijava [] specification. OpenMPI's Java binding [] is an adaptation of the original mpijava library []. However, the OpenMPI community has recently introduced major changes to its API and internals, especially removing the MPI.OBJECT type and adding support for direct buffers in Java. These changes happened while we were evaluating DAVS, thus we tested the OpenMPI Java binding in both its original form (nightly snapshot version .9ar888) and its updated form (source tree revision ). We will refer to these as and for simplicity.

C. MPI Micro Benchmarks

We based our experiments on the Ohio MicroBenchmark (OMB) suite [6], which is intended to test the performance of native MPI implementations. Therefore, we implemented the selected allreduce and send and receive tests in all three Java MPI flavors and, in order to test the Java and C# MPI implementations. Fig. presents the pseudo code for the OMB allreduce test. Note that the syntax of the MPI operations does not adhere to a particular language, and the number of actual parameters is cut short for clarity. Also, depending on the language and MPI framework used, implementation details such as the data structures used and buffer allocation differed from one another.

Fig. Performance of MPI allreduce operation [figure: latency vs. message size for C# in Tempest, FastMPJ, and Java/C OpenMPI variants on FutureGrid]
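The direct-buffer support mentioned above matters because a direct ByteBuffer lives outside the Java heap, so native code called through JNI (such as a native MPI library) can access it without copying heap arrays. The following is a JDK-only illustration of that distinction, not OpenMPI binding code; `newSendBuffer` is a hypothetical helper named for this sketch.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Heap vs direct buffers: a direct ByteBuffer is allocated off-heap,
 *  giving native code a stable address, while a heap buffer is backed
 *  by an ordinary Java array that may be copied or moved by the GC. */
public class BufferKinds {
    public static ByteBuffer newSendBuffer(int floats) {
        // nativeOrder() matches the platform byte order native code expects.
        return ByteBuffer.allocateDirect(floats * Float.BYTES)
                         .order(ByteOrder.nativeOrder());
    }

    public static void main(String[] args) {
        ByteBuffer direct = newSendBuffer(4);
        ByteBuffer heap = ByteBuffer.allocate(4 * Float.BYTES);
        for (int i = 0; i < 4; i++) direct.putFloat(i * Float.BYTES, 1.0f);
        System.out.println(direct.isDirect());  // prints true: off-heap, JNI-friendly
        System.out.println(heap.isDirect());    // prints false: array-backed
    }
}
```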
Fig. shows the results of the allreduce benchmark for different MPI implementations with message sizes ranging from bytes (B) to 8 megabytes (MB). These are averaged values over the patterns xx8, xx8, and xx8, where the pattern format is number of threads per process x number of processes per node x number of nodes. The best performance came from the C versions of OpenMPI, but interestingly the Java performance overlaps with these, indicating near zero overhead. The older binding's Java performance is close as well, but shows more overhead than its successor. FastMPJ performance is better than, but slower than, the OpenMPI versions. The slowest performance came with, which might be improved with further tuning, but as our focus was to evaluate the Java versions we did not proceed in this direction. We experienced a similar pattern with the MPI send and receive benchmark (Fig. ), as shown in Fig. .

Input: maxmsgsize            // maximum message size in bytes
msgsizel = 89                // messages to be considered large
numitr =                     // iterations for small messages
numitrl =                    // iterations for large messages
skip =                       // skip this many for small messages
skipl =                      // skip this many for large messages
comm = MPI_COMM_WORLD
me = MPI_Comm_rank (comm)
size = MPI_Comm_size (comm)
sbuff[maxmsgsize]            // byte array initialized to .
rbuff[maxmsgsize]            // byte array initialized to .
For i = to i <= maxmsgsize
    If i > msgsizel
        skip = skipl
        numitr = numitrl
    duration = .
    If me ==
        For j = to j < numitr + skip
            If j == skip
                t = MPI_Wtime ()
            MPI_Send (sbuff, i, , )
            MPI_Recv (rbuff, i, , )
        duration += MPI_Wtime () - t
    Else If me ==
        For j = to j < numitr + skip
            MPI_Recv (rbuff, i, , )
            MPI_Send (sbuff, i, , )
    If me ==
        latency = duration / numitr
        Print (i, latency)
    i = i ==? i

Fig. Pseudo code for send and receive benchmark

Note that message sizes ranged from B to MB for this test, where the B time corresponds to the average latency. We performed the tests in Fig. and Fig.
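The ping-pong pattern in the send and receive pseudo code (one rank sends then receives, the other receives then echoes back, and one-way latency is half the round trip) can be simulated in a single JVM. This is an illustrative sketch only: the two SynchronousQueues are hypothetical stand-ins for MPI_Send and MPI_Recv between two ranks, so the control flow, not real network latency, is what is demonstrated.

```java
import java.util.concurrent.SynchronousQueue;

/** Ping-pong latency pattern: "rank 0" (the main thread) times
 *  send+receive round trips after warm-up, while "rank 1" (a peer
 *  thread) echoes each message back. One-way latency is the measured
 *  round-trip time divided by two. */
public class PingPong {
    public static double oneWayLatencyMicros(int numitr, int skip) {
        try {
            SynchronousQueue<byte[]> toPeer = new SynchronousQueue<>();
            SynchronousQueue<byte[]> fromPeer = new SynchronousQueue<>();
            Thread peer = new Thread(() -> {     // "rank 1": recv, then send back
                try {
                    for (int j = 0; j < numitr + skip; j++) {
                        fromPeer.put(toPeer.take());
                    }
                } catch (InterruptedException ignored) { }
            });
            peer.start();
            byte[] msg = new byte[1024];
            double duration = 0.0;
            for (int j = 0; j < numitr + skip; j++) {  // "rank 0": send, then recv
                long t = System.nanoTime();
                toPeer.put(msg);                 // stands in for MPI_Send
                fromPeer.take();                 // stands in for MPI_Recv
                if (j >= skip) duration += System.nanoTime() - t;
            }
            peer.join();
            return duration / (2.0 * numitr) / 1e3;  // one-way latency in us
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(oneWayLatencyMicros(100, 10) >= 0.0);  // prints true
    }
}
```

Dividing by two assumes the path is symmetric, which is the standard convention in OSU-style latency tests.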
on Tempest and FutureGrid-India, as these were the two clusters with an Infiniband interconnect. However, after identifying as the best performer, we conducted these two benchmarks on Madrid as well to compare performance against a usual Ethernet connection. Fig. and Fig. 6 show the clear advantage of using Infiniband, with performance well above times better for smaller messages and around -8 times better for larger messages compared to Ethernet.

D. Application Performance with MPI

We decided to apply DAVS to the same peak-matching problem [] that its C# variant was used to solve, so we could verify accuracy and compare performance. We performed clustering of the LC-MS data [] in two modes, Charge and Charge, where the former processed 6 points and found on average .k clusters. Charge mode handled 677 points, producing an average of 8k clusters. These modes exercise different execution flows in DAVS, where Charge is more intense in both computation and communication than Charge. DAVS supports threading too, which we have studied separately in section E. Also note that tests based on the OMPI Java versions were done on FutureGrid-India and the C# tests on Tempest in the following figures.

Fig. Performance of MPI send and receive operations [figure: latency vs. message size for C# in Tempest, FastMPJ, and Java/C OpenMPI on FutureGrid]

Fig. Performance of MPI allreduce on Infiniband and Ethernet [figure: C and Java on Madrid (Ethernet) and FutureGrid (Infiniband)]
Fig. DAVS Charge speedup [figure: speedup for patterns xx through xx]

Fig. 6 Performance of MPI send and receive on Infiniband and Ethernet [figure: C and Java on Madrid (Ethernet) and FutureGrid (Infiniband)]

Fig. 7 DAVS Charge performance [figure: patterns xx through x8x]

Fig. 8 DAVS Charge speedup [figure: patterns xx through x8x]

Fig. 7 and Fig. 8 show Charge performance and speedup under different MPI libraries. happens to give the best performance, achieving nearly double the performance in all cases compared to . Charge performance and speedup show similar results, as given in Fig. 9 and Fig. . Note that we could not test pattern x8x due to insufficient memory.

E. Application Performance with Threads

DAVS has the option to utilize threads for its pleasingly parallel shared memory computations. These code segments follow fork-join style forall loops, which are well supported in the HJ library. It is worth noting that the DAVS semantics for parallelism with threads and with message passing are different, such that they complement rather than replace one another. Therefore, comparing the performance of N-way MPI processes versus N-way threads does not make sense. Instead, what is studied in this paper is the performance variance with threads for different numbers of MPI processes.

Fig. 9 DAVS Charge performance [figure: patterns xx through xx]

Fig. DAVS Charge performance with threads [figure: patterns xx8 through x8x8]

Fig. DAVS Charge performance with threads [figure: patterns xx8 through xx8]
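The fork-join forall style described in section E runs each loop iteration independently and joins before continuing. HJ provides this directly; as an analogy only (not HJ code), the same shape can be sketched with the JDK's parallel streams, where `squareAll` is a hypothetical pleasingly parallel computation:

```java
import java.util.stream.IntStream;

/** Fork-join "forall" over independent loop iterations, in the spirit
 *  of the HJ library's forall but written with JDK parallel streams:
 *  iterations fork onto the common pool, and the loop joins (the
 *  terminal forEach returns) only after every index is processed. */
public class ForallSketch {
    public static double[] squareAll(double[] points) {
        double[] out = new double[points.length];
        // Each index writes a distinct slot of 'out', so the
        // iterations are independent and safe to run in parallel.
        IntStream.range(0, points.length).parallel()
                 .forEach(i -> out[i] = points[i] * points[i]);
        return out;
    }

    public static void main(String[] args) {
        double[] r = squareAll(new double[] {1.0, 2.0, 3.0});
        System.out.println(r[2] == 9.0);  // prints true
    }
}
```

The join at the end of each forall is what makes this pattern composable with message passing: MPI calls happen outside the parallel region, on a single thread per process.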
Fig. and Fig. show the performance of the DAVS application with threads in its Charge and Charge modes respectively. We encountered intermittent errors with OMPItrunk when running 8 processes per node and are currently working on a resolution with help from the OpenMPI community. Overall, the and versions show similar performance and are about two times better than . OpenMPI is continuously improving its process binding support, which affects the behavior of threads. Therefore, we still need to perform additional testing to fully comprehend the behavior with threads.

F. Single Node Application Performance

While it is important to understand behavior with MPI and threads on multiple nodes, it is essential to study the single node performance of DAVS in different environments as a baseline.

Fig. DAVS Charge performance on single node [figure: Madrid, FG, Tempest at xx]

Fig. DAVS Charge6 performance on single node [figure: Madrid, FG, Tempest at xx]

Fig. DAVS Charge6 performance on single node with multiple processes [figure: Madrid, FG, Tempest at xx]

Fig. shows Charge serial performance on Madrid, FutureGrid-India, and Tempest. Note that we have shown Java performance with only, as this was the best performer, and the xx pattern does not use any MPI. Fig. shows similar performance for a different DAVS mode, Charge6, again run as xx. Madrid used to be a Windows HPC cluster and we had results from an earlier Charge6 run, which tested the performance of xx. We compare that with the other clusters and MPI frameworks in Fig. .

SUMMARY

Scientific applications written in C, C++, and Fortran have embraced MPI since its inception, and various attempts have been made over the years to establish this relationship for applications written in Java. However, only a few implementations, such as OpenMPI and FastMPJ, are in active development with support for fast interconnect systems. OpenMPI in particular has recently introduced improvements to its Java binding to close the gap between Java and native performance.
The kernel benchmarks we performed agree with this and depict the latest OpenMPI Java binding as the best among the selected Java MPI implementations. The aim of this effort has been to migrate our existing C# based code to Java in the hope of running on traditional HPC clusters while utilizing the rich programming environment of Java. The initial runs of DAVS show promising performance, and we expect to complement this work by adding support for threads in the near future.

ACKNOWLEDGEMENT

We would like to express our sincere gratitude to Prof. Vivek Sarkar and his team at Rice University for giving us access and continuous support for the HJ library, which made it possible to incorporate threads in DAVS. We are equally thankful to Prof. Guillermo López Taboada for giving us free unrestricted access to the commercially available FastMPJ MPI library. We are also thankful to the FutureGrid project and its support team for their support with HPC systems.

REFERENCES

[] Lusk, E. and Yelick, K. Languages for high-productivity computing: the DARPA HPCS language project. Parallel Processing Letters.
[] Argonne National Laboratory. MPICH.
[] Gabriel, E., Fagg, G., Bosilca, G., Angskun, T., Dongarra, J., Squyres, J., Sahay, V., Kambadur, P., Barrett, B., Lumsdaine, A., Castain, R., Daniel, D., Graham, R. and Woodall, T. Open MPI: Goals, Concept, and Design of a Next Generation MPI Implementation. Springer Berlin Heidelberg.
[] Gropp, W. Learning from the Success of MPI. In Proceedings of the 8th International Conference on High Performance Computing. Springer-Verlag.
[] Ekanayake, S. Survey on High Productivity Computing Systems (HPCS) Languages. Pervasive Technology Institute, Indiana University, Bloomington.
[6] Qiu, J., Beason, S., Bae, S.-H., Ekanayake, S. and Fox, G. Performance of Windows Multicore Systems on Threading and MPI. In Proceedings of the IEEE/ACM International Conference on Cluster, Cloud and Grid Computing. IEEE Computer Society.
[7] Gregor, D. and Lumsdaine, A. Design and implementation of a high-performance MPI for C# and the common language infrastructure. In Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (Salt Lake City, UT, USA). ACM.
[8] Leijen, D. and Hall, J. Optimize Managed Code For Multi-Core Machines.
[9] Taboada, G. L., Touriño, J. and Doallo, R. Java for high performance computing: assessment of current research and practice. In Proceedings of the 7th International Conference on Principles and Practice of Programming in Java (Calgary, Alberta, Canada). ACM.
[] Fox, G., Mani, D. R. and Pyne, S. Parallel deterministic annealing clustering and its application to LC-MS data analysis. IEEE.
[] Expósito, R., Ramos, S., Taboada, G., Touriño, J. and Doallo, R. FastMPJ: a scalable and efficient Java message-passing library. Cluster Computing.
[] Carpenter, B., Fox, G., Ko, S.-H. and Lim, S. mpijava: API Specification.
[] Carpenter, B., Getov, V., Judd, G., Skjellum, A. and Fox, G. MPJ: MPI-like message passing for Java. Concurrency: Practice and Experience.
[] Taboada, G. L., Tourino, J. and Doallo, R. Performance analysis of Java message-passing libraries on fast Ethernet, Myrinet and SCI clusters.
[] Squyres, J. Resurrecting MPI and Java.
[6] White, T. Hadoop: The Definitive Guide. Yahoo Press, Second Edition.
[7] Castain, R. H., W. T. MR+ A Technical Overview.
[8] Dean, J. and Ghemawat, S. MapReduce: Simplified Data Processing on Large Clusters. Sixth Symposium on Operating Systems Design and Implementation.
[9] Tangible Software Solutions Inc. C# to Java Converter.
[] von Laszewski, G., Fox, G. C., Wang, F., Younge, A. J., Kulshrestha, A., Pike, G. G., Smith, W., Vöckler, J., Figueiredo, R. J., Fortes, J. and Keahey, K. Design of the FutureGrid experiment management framework.
[] Cavé, V., Zhao, J., Shirako, J. and Sarkar, V. Habanero-Java: the new adventures of old X10. In Proceedings of the 9th International Conference on Principles and Practice of Programming in Java (Kongens Lyngby, Denmark). ACM.
[] Sarkar, V. and Imam, S. M. HJ Library.
[] Taboada, G. L., Ramos, S., Expósito, R. R., Touriño, J. and Doallo, R. Java in the High Performance Computing arena: Research, practice and experience. Sci. Comput. Program.
[] The Open MPI Project. FAQ: Where did the Java interface come from?
[] Baker, M., Carpenter, B., Fox, G., Hoon Ko, S. and Lim, S. mpijava: An object-oriented Java interface to MPI. Springer Berlin Heidelberg, 1999.
[6] The Ohio State University's Network-Based Computing Laboratory (NBCL). OMB (OSU Micro-Benchmarks).
A Flexible Cluster Infrastructure for Systems Research and Software Development
Award Number: CNS-551555 Title: CRI: Acquisition of an InfiniBand Cluster with SMP Nodes Institution: Florida State University PIs: Xin Yuan, Robert van Engelen, Kartik Gopalan A Flexible Cluster Infrastructure
Comparative analysis of mapreduce job by keeping data constant and varying cluster size technique
Comparative analysis of mapreduce job by keeping data constant and varying cluster size technique Mahesh Maurya a, Sunita Mahajan b * a Research Scholar, JJT University, MPSTME, Mumbai, India,[email protected]
A Study on the Scalability of Hybrid LS-DYNA on Multicore Architectures
11 th International LS-DYNA Users Conference Computing Technology A Study on the Scalability of Hybrid LS-DYNA on Multicore Architectures Yih-Yih Lin Hewlett-Packard Company Abstract In this paper, the
Performance of the Cloud-Based Commodity Cluster. School of Computer Science and Engineering, International University, Hochiminh City 70000, Vietnam
Computer Technology and Application 4 (2013) 532-537 D DAVID PUBLISHING Performance of the Cloud-Based Commodity Cluster Van-Hau Pham, Duc-Cuong Nguyen and Tien-Dung Nguyen School of Computer Science and
Accelerating Hadoop MapReduce Using an In-Memory Data Grid
Accelerating Hadoop MapReduce Using an In-Memory Data Grid By David L. Brinker and William L. Bain, ScaleOut Software, Inc. 2013 ScaleOut Software, Inc. 12/27/2012 H adoop has been widely embraced for
Improved LS-DYNA Performance on Sun Servers
8 th International LS-DYNA Users Conference Computing / Code Tech (2) Improved LS-DYNA Performance on Sun Servers Youn-Seo Roh, Ph.D. And Henry H. Fong Sun Microsystems, Inc. Abstract Current Sun platforms
Enhancing Dataset Processing in Hadoop YARN Performance for Big Data Applications
Enhancing Dataset Processing in Hadoop YARN Performance for Big Data Applications Ahmed Abdulhakim Al-Absi, Dae-Ki Kang and Myong-Jong Kim Abstract In Hadoop MapReduce distributed file system, as the input
High Performance Computing in Java and the Cloud
High Performance Computing in Java and the Cloud Computer Architecture Group University of A Coruña (Spain) [email protected] January 21, 2013 Universitat Politècnica de Catalunya, Barcelona Outline Java
Sun Constellation System: The Open Petascale Computing Architecture
CAS2K7 13 September, 2007 Sun Constellation System: The Open Petascale Computing Architecture John Fragalla Senior HPC Technical Specialist Global Systems Practice Sun Microsystems, Inc. 25 Years of Technical
OpenMP Programming on ScaleMP
OpenMP Programming on ScaleMP Dirk Schmidl [email protected] Rechen- und Kommunikationszentrum (RZ) MPI vs. OpenMP MPI distributed address space explicit message passing typically code redesign
Design and Implementation of the Workflow of an Academic Cloud
Design and Implementation of the Workflow of an Academic Cloud Abhishek Gupta, Jatin Kumar, Daniel J Mathew, Sorav Bansal, Subhashis Banerjee and Huzur Saran Indian Institute of Technology, Delhi {cs1090174,cs5090243,mcs112576,sbansal,suban,saran}@cse.iitd.ernet.in
BlobSeer: Towards efficient data storage management on large-scale, distributed systems
: Towards efficient data storage management on large-scale, distributed systems Bogdan Nicolae University of Rennes 1, France KerData Team, INRIA Rennes Bretagne-Atlantique PhD Advisors: Gabriel Antoniu
International Journal of Advance Research in Computer Science and Management Studies
Volume 2, Issue 8, August 2014 ISSN: 2321 7782 (Online) International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online
Performance Monitoring of Parallel Scientific Applications
Performance Monitoring of Parallel Scientific Applications Abstract. David Skinner National Energy Research Scientific Computing Center Lawrence Berkeley National Laboratory This paper introduces an infrastructure
Performance Evaluation of InfiniBand with PCI Express
Performance Evaluation of InfiniBand with PCI Express Jiuxing Liu Server Technology Group IBM T. J. Watson Research Center Yorktown Heights, NY 1598 [email protected] Amith Mamidala, Abhinav Vishnu, and Dhabaleswar
MapReduce Evaluator: User Guide
University of A Coruña Computer Architecture Group MapReduce Evaluator: User Guide Authors: Jorge Veiga, Roberto R. Expósito, Guillermo L. Taboada and Juan Touriño December 9, 2014 Contents 1 Overview
Silviu Panica, Marian Neagul, Daniela Zaharie and Dana Petcu (Romania)
Silviu Panica, Marian Neagul, Daniela Zaharie and Dana Petcu (Romania) Outline Introduction EO challenges; EO and classical/cloud computing; EO Services The computing platform Cluster -> Grid -> Cloud
Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software
WHITEPAPER Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software SanDisk ZetaScale software unlocks the full benefits of flash for In-Memory Compute and NoSQL applications
Fast Analytics on Big Data with H20
Fast Analytics on Big Data with H20 0xdata.com, h2o.ai Tomas Nykodym, Petr Maj Team About H2O and 0xdata H2O is a platform for distributed in memory predictive analytics and machine learning Pure Java,
ECLIPSE Best Practices Performance, Productivity, Efficiency. March 2009
ECLIPSE Best Practices Performance, Productivity, Efficiency March 29 ECLIPSE Performance, Productivity, Efficiency The following research was performed under the HPC Advisory Council activities HPC Advisory
MPI Application Development Using the Analysis Tool MARMOT
MPI Application Development Using the Analysis Tool MARMOT HLRS High Performance Computing Center Stuttgart Allmandring 30 D-70550 Stuttgart http://www.hlrs.de 24.02.2005 1 Höchstleistungsrechenzentrum
Cluster Scalability of ANSYS FLUENT 12 for a Large Aerodynamics Case on the Darwin Supercomputer
Cluster Scalability of ANSYS FLUENT 12 for a Large Aerodynamics Case on the Darwin Supercomputer Stan Posey, MSc and Bill Loewe, PhD Panasas Inc., Fremont, CA, USA Paul Calleja, PhD University of Cambridge,
marlabs driving digital agility WHITEPAPER Big Data and Hadoop
marlabs driving digital agility WHITEPAPER Big Data and Hadoop Abstract This paper explains the significance of Hadoop, an emerging yet rapidly growing technology. The prime goal of this paper is to unveil
Big Data and Apache Hadoop s MapReduce
Big Data and Apache Hadoop s MapReduce Michael Hahsler Computer Science and Engineering Southern Methodist University January 23, 2012 Michael Hahsler (SMU/CSE) Hadoop/MapReduce January 23, 2012 1 / 23
Enabling Large-Scale Testing of IaaS Cloud Platforms on the Grid 5000 Testbed
Enabling Large-Scale Testing of IaaS Cloud Platforms on the Grid 5000 Testbed Sébastien Badia, Alexandra Carpen-Amarie, Adrien Lèbre, Lucas Nussbaum Grid 5000 S. Badia, A. Carpen-Amarie, A. Lèbre, L. Nussbaum
An Open MPI-based Cloud Computing Service Architecture
An Open MPI-based Cloud Computing Service Architecture WEI-MIN JENG and HSIEH-CHE TSAI Department of Computer Science Information Management Soochow University Taipei, Taiwan {wjeng, 00356001}@csim.scu.edu.tw
The Performance Characteristics of MapReduce Applications on Scalable Clusters
The Performance Characteristics of MapReduce Applications on Scalable Clusters Kenneth Wottrich Denison University Granville, OH 43023 [email protected] ABSTRACT Many cluster owners and operators have
Department of Computer Science University of Cyprus EPL646 Advanced Topics in Databases. Lecture 14
Department of Computer Science University of Cyprus EPL646 Advanced Topics in Databases Lecture 14 Big Data Management IV: Big-data Infrastructures (Background, IO, From NFS to HFDS) Chapter 14-15: Abideboul
Implementing MPI-IO Shared File Pointers without File System Support
Implementing MPI-IO Shared File Pointers without File System Support Robert Latham, Robert Ross, Rajeev Thakur, Brian Toonen Mathematics and Computer Science Division Argonne National Laboratory Argonne,
Big Data and Natural Language: Extracting Insight From Text
An Oracle White Paper October 2012 Big Data and Natural Language: Extracting Insight From Text Table of Contents Executive Overview... 3 Introduction... 3 Oracle Big Data Appliance... 4 Synthesys... 5
Processing Large Amounts of Images on Hadoop with OpenCV
Processing Large Amounts of Images on Hadoop with OpenCV Timofei Epanchintsev 1,2 and Andrey Sozykin 1,2 1 IMM UB RAS, Yekaterinburg, Russia, 2 Ural Federal University, Yekaterinburg, Russia {eti,avs}@imm.uran.ru
Parallel Programming Survey
Christian Terboven 02.09.2014 / Aachen, Germany Stand: 26.08.2014 Version 2.3 IT Center der RWTH Aachen University Agenda Overview: Processor Microarchitecture Shared-Memory
System Software for High Performance Computing. Joe Izraelevitz
System Software for High Performance Computing Joe Izraelevitz Agenda Overview of Supercomputers Blue Gene/Q System LoadLeveler Job Scheduler General Parallel File System HPC at UR What is a Supercomputer?
