Cluster Computing in a College of Criminal Justice



Cluster Computing in a College of Criminal Justice
Boris Bondarenko and Douglas E. Salane
Mathematics & Computer Science Dept., John Jay College of Criminal Justice, The City University of New York
2004 USENIX Annual Technical Conference, Boston, MA, July 2, 2004

Outline
- Importance of cluster computing (HPC) in a college whose focus is criminal justice and public administration
- Cluster computing projects in progress and planned (research and instruction)
- Issues that arise in building and managing clusters in organizations with limited resources and staff
- Cluster, Linux, and open source developments

Institutional Background: John Jay College/CUNY
- College: a specialized liberal arts college within CUNY (about 13,000 students, including 2,000 graduate students)
- Degrees: Law and Police Science, Public Management, Fire Science, Security, Forensic Science, Computer Information Systems, M.S. in Forensic Computing (2004), Ph.D. in Criminal Justice
- Mission: advance the practice of criminal justice and public administration through research and by providing a professional workforce

High Performance Computing at John Jay College I
- Fire standards and codes for buildings (computational fluid dynamics: NIST Fire Dynamics Simulator and Smokeview)
- Latent semantic indexing (principal component analysis, singular value decomposition)
- Toxicology (molecular modeling with Gaussian)
- FBI's National Incident-Based Reporting System (NIBRS database analysis and data mining)

High Performance Computing at John Jay College II
- Aircraft control systems (parallel computation of the Schur form for rapid solution of the Riccati equation)
- Research and instruction in mathematical software (ScaLAPACK, HPL benchmark)
- Instruction in the systems areas of computing, parallel algorithms, and distributed algorithms (NASA CIPA)
- Password cracking (Teracrack, SDSC)

Cluster Computing Facilities
- Computational cluster (Beowulf cluster): worldnode and 12 compute nodes (24 Pentium 4 Xeon processors at 1.8 and 2.4 GHz, 1 GB RAM, 512 KB L2 cache, 20 GB local disk per node), Gigabit Ethernet, MPICH over TCP/IP, NFS file server, Linux 2.4.20-8smp
- Database cluster: 4 nodes (remote access server, web server, Microsoft SQL Server, and Oracle 10g)
- Distributed Computing Laboratory: computing laboratory with 30 Linux workstations (partnership with the Science Dept.)

Cluster Design Considerations I
- Architecture: vendor-supported blade/rack system or "pile of PCs"
- Cluster software: cluster distribution software (OSCAR from ORNL, NPACI Rocks from SDSC, or Scyld Beowulf) vs. self-configuration (Kickstart plus shell scripts)
- File system: NFS; Andrew (AFS); GFS (Sistina Systems); Lustre (CFS, Inc.); PVFS (ANL); GPFS (IBM)

Cluster Design Considerations II
- Interconnect: Gigabit Ethernet, Myrinet, Quadrics, InfiniBand
- Message passing: MPICH over TCP/IP
- Monitoring: Ganglia (UC Berkeley), Supermon (LANL), direct console access
- Testing: NetPIPE (Ames Laboratory), BLACS testers, MPI testers (a minimal round-trip sketch follows below)
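Before running the full test suites, the message-passing layer can be sanity-checked with a simple two-rank round-trip timing. The C sketch below is illustrative only: the message size, repetition count, and output format are our choices, not part of NetPIPE or the MPI testers; any MPI implementation, such as the MPICH-over-TCP/IP stack on our cluster, should run it.

```c
/* Minimal MPI ping-pong sketch for sanity-checking the interconnect.
 * Parameters here (message size, repetitions) are illustrative. */
#include <mpi.h>
#include <stdio.h>
#include <string.h>

#define NREPS   1000
#define MSGSIZE 1024            /* bytes per message */

int main(int argc, char **argv)
{
    int rank, size;
    char buf[MSGSIZE];
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    memset(buf, 0, MSGSIZE);

    if (size < 2) {
        if (rank == 0) fprintf(stderr, "need at least 2 ranks\n");
        MPI_Finalize();
        return 1;
    }

    double t0 = MPI_Wtime();
    for (int i = 0; i < NREPS; i++) {
        if (rank == 0) {            /* send, then wait for the echo */
            MPI_Send(buf, MSGSIZE, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, MSGSIZE, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &status);
        } else if (rank == 1) {     /* echo everything back */
            MPI_Recv(buf, MSGSIZE, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &status);
            MPI_Send(buf, MSGSIZE, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    double t1 = MPI_Wtime();

    if (rank == 0)
        printf("avg round trip: %g us for %d-byte messages\n",
               (t1 - t0) / NREPS * 1e6, MSGSIZE);

    MPI_Finalize();
    return 0;
}
```

Compile with mpicc and run with mpirun -np 2; half the reported round-trip time approximates the one-way latency at this message size.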

ScaLAPACK
- Dense matrix computations in a distributed-memory environment (clusters and MPP machines)
- Linear systems, least squares, eigenvalue problems, matrix decompositions (e.g., LU, QR, SVD)
- Reliable software with good error-reporting facilities
- Not easy to use: the user must write code to distribute the matrix over the process grid and must set algorithmic parameters (e.g., block size, process grid dimensions); see the sketch below
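The distribution step is mechanical but easy to get wrong. Below is a minimal C sketch of setting up a 2-D process grid and a block-cyclic array descriptor, assuming the C interface to the BLACS and the usual trailing-underscore convention for the Fortran helpers numroc_ and descinit_; the 2x2 grid, matrix order 1000, and block size 64 are illustrative choices, not values from the talk.

```c
/* Sketch: 2-D process grid and block-cyclic descriptor setup for
 * ScaLAPACK. Grid shape and block size are the tuning parameters
 * the user must choose. */
#include <stdio.h>

/* C interface to the BLACS and the Fortran ScaLAPACK helpers */
extern void Cblacs_pinfo(int *mype, int *npe);
extern void Cblacs_get(int ctxt, int what, int *val);
extern void Cblacs_gridinit(int *ctxt, char *order, int nprow, int npcol);
extern void Cblacs_gridinfo(int ctxt, int *nprow, int *npcol,
                            int *myrow, int *mycol);
extern void Cblacs_gridexit(int ctxt);
extern void Cblacs_exit(int cont);
extern int  numroc_(int *n, int *nb, int *iproc, int *isrcproc, int *nprocs);
extern void descinit_(int *desc, int *m, int *n, int *mb, int *nb,
                      int *irsrc, int *icsrc, int *ictxt, int *lld,
                      int *info);

int main(void)
{
    int mype, npe, ctxt, nprow = 2, npcol = 2, myrow, mycol;
    int n = 1000, nb = 64, izero = 0, info, desc[9];

    Cblacs_pinfo(&mype, &npe);
    Cblacs_get(-1, 0, &ctxt);               /* default system context */
    Cblacs_gridinit(&ctxt, "Row", nprow, npcol);
    Cblacs_gridinfo(ctxt, &nprow, &npcol, &myrow, &mycol);

    /* rows/cols of the global n x n matrix held on this process */
    int mloc = numroc_(&n, &nb, &myrow, &izero, &nprow);
    int nloc = numroc_(&n, &nb, &mycol, &izero, &npcol);
    int lld  = mloc > 1 ? mloc : 1;

    descinit_(desc, &n, &n, &nb, &nb, &izero, &izero, &ctxt, &lld, &info);
    if (info != 0)
        fprintf(stderr, "descinit failed: info = %d\n", info);

    /* the local mloc x nloc block of A would be allocated and filled
       here, then passed with desc to a driver such as pdgesv_ */
    printf("process (%d,%d) holds a %d x %d local block\n",
           myrow, mycol, mloc, nloc);

    Cblacs_gridexit(ctxt);
    Cblacs_exit(0);
    return 0;
}
```

Choosing the block size nb is the main tuning knob: it trades load balance against the block sizes the underlying BLAS prefers.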

Basic Linear Algebra Communication Subprograms (BLACS)
- Set up and tear down process topologies (a 2-D array of processes is most common)
- Point-to-point and broadcast send/receive of rectangular and trapezoidal matrices
- Miscellaneous routines (e.g., barrier, element-wise matrix sum, max, and min)
- Test routines to ensure reliable communications

Using the BLACS to Detect Errors
- Broadcast testing routine: generate a matrix on a selected process and broadcast it; the receiving routines test for correct transmission.
- Example report from process (0,1), invalid element at A(12,16): expected -.2417943949438026, received -.2417638773656776.
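A minimal C sketch of this broadcast-test pattern, using the BLACS C-interface broadcast routines Cdgebs2d/Cdgebr2d on an already-initialized grid; the element generator, matrix size, and report format are our illustrative assumptions, not the actual BLACS tester code.

```c
/* Sketch: broadcast a matrix from process (0,0) and have every
 * receiver regenerate the same values locally and compare. A
 * broadcast should deliver bit-identical doubles, so any mismatch
 * indicates a transmission error. */
#include <math.h>
#include <stdio.h>

extern void Cblacs_gridinfo(int ctxt, int *nprow, int *npcol,
                            int *myrow, int *mycol);
extern void Cdgebs2d(int ctxt, char *scope, char *top,
                     int m, int n, double *A, int lda);
extern void Cdgebr2d(int ctxt, char *scope, char *top,
                     int m, int n, double *A, int lda,
                     int rsrc, int csrc);

#define M 16
#define N 16

/* reproducible element generator (illustrative) */
static double elem(int i, int j) { return sin((double)(i + M * j)); }

void bcast_test(int ctxt)
{
    int nprow, npcol, myrow, mycol;
    double a[M * N];                        /* column-major, lda = M */

    Cblacs_gridinfo(ctxt, &nprow, &npcol, &myrow, &mycol);

    if (myrow == 0 && mycol == 0) {
        for (int j = 0; j < N; j++)
            for (int i = 0; i < M; i++)
                a[i + M * j] = elem(i, j);
        Cdgebs2d(ctxt, "All", " ", M, N, a, M);        /* broadcast */
    } else {
        Cdgebr2d(ctxt, "All", " ", M, N, a, M, 0, 0);  /* receive */
        for (int j = 0; j < N; j++)
            for (int i = 0; i < M; i++)
                if (a[i + M * j] != elem(i, j))
                    printf("process (%d,%d): invalid element at "
                           "A(%d,%d): expected %.16f received %.16f\n",
                           myrow, mycol, i + 1, j + 1,
                           elem(i, j), a[i + M * j]);
    }
}
```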

Basic Linear Algebra Subprograms (BLAS)
- Perform scalar, matrix-vector, and matrix-matrix operations
- Block algorithms take advantage of memory hierarchies
- Must be optimized for a specific processor
- Three versions used here: Intel Math Kernel Library (MKL), ATLAS-generated, and KGoto, each in multithreaded and single-threaded variants

BLAS matrix multiply routine DGEMM
- C = alpha*A*B + beta*C, where alpha and beta are scalars and A, B, and C are matrices
- Critical for the performance of many ScaLAPACK routines and of HPL (e.g., a tuned DGEMM raised the HPL benchmark on the Livermore MCR cluster from 5.69 to 7.63 TFLOPS)
- Best results on the Pentium 4: KGoto BLAS (special coding to minimize cache and TLB misses); a timing sketch follows below
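As a rough illustration of how such library comparisons are made, the C sketch below times a single DGEMM call through the CBLAS interface (available with both ATLAS and MKL) and reports MFLOPS using the roughly 2n^3 flop count of matrix multiply. The matrix order and the use of clock() are our assumptions; a wall-clock timer should replace clock() when timing a multithreaded BLAS.

```c
/* Sketch: time one DGEMM call and report MFLOPS. Link against the
 * BLAS under test (e.g., ATLAS or MKL via their CBLAS interface). */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <cblas.h>

int main(void)
{
    const int n = 1000;                      /* illustrative size */
    double *a = malloc(n * n * sizeof *a);
    double *b = malloc(n * n * sizeof *b);
    double *c = malloc(n * n * sizeof *c);
    if (!a || !b || !c) return 1;

    for (int i = 0; i < n * n; i++) {
        a[i] = (double)rand() / RAND_MAX;
        b[i] = (double)rand() / RAND_MAX;
        c[i] = 0.0;
    }

    clock_t t0 = clock();
    /* C = 1.0 * A * B + 0.0 * C, column-major, no transposes */
    cblas_dgemm(CblasColMajor, CblasNoTrans, CblasNoTrans,
                n, n, n, 1.0, a, n, b, n, 0.0, c, n);
    double secs = (double)(clock() - t0) / CLOCKS_PER_SEC;

    /* matrix multiply performs roughly 2*n^3 flops */
    printf("%d x %d DGEMM: %.1f MFLOPS\n",
           n, n, 2.0 * n * n * n / secs / 1e6);

    free(a); free(b); free(c);
    return 0;
}
```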

Performance of DGEMM (SMP 1.8 GHz P4, SSE2, 512 KB L2 cache)
[Chart: MFLOPS (0 to 6000) vs. matrix size of A, B, C (0 to 2000) for the KGOTO_PT, KGOTO, ATLAS, and ATLAS_PT BLAS libraries]

HPL Benchmark Results (from the Top 500)

R    Site                System                          CPUs   Rmax    Rpeak
1    Japan               Earth Simulator                 5,120  35,860  40,960
12   LLNL                MCR Linux Networx 2.4 GHz Xeon  2,304  7,634   11,060
67   NASA GSFC           HP AlphaServer                  1,392  2,164   2,784
499  E*Trade Financial   E*Trade 2.4 GHz Xeon            290    634     1,392
--   John Jay            JJ Cluster 2.4 & 1.8 GHz Xeon   24     47      98

R = rank in the Top 500 Supercomputers list; Rmax = Linpack benchmark result (GFLOPS); Rpeak = theoretical peak performance (GFLOPS)

FBI National Incident-Based Reporting System (NIBRS)
- Develop an Oracle database version of NIBRS and make it available to the criminal justice research community
- Support online analysis and data mining through a web portal
- Provide a mechanism for automatic updates
- Employ cluster/grid computing to provide high throughput and availability

NIBRS
- Data warehouse: Oracle 10g database on a Red Hat Enterprise Linux AS 3 server
- 13 segments (flat files); 6 main segments (administrative/incident, offense, property, victim, offender, arrestee); the largest has 3.2 million records at 100 to 200 bytes per record; 39 reference tables
- 2000/2001 data occupy 1.29 GB; we expect about 10 GB for 1995 to the present

Cluster Developments
- Single system image (cluster monitoring, OS version skew, single process space)
- Commodity low-latency interconnect technology that provides unified I/O (Remote Direct Memory Access, InfiniBand?)
- Nodes that consume less power
- Cluster applications that provide error checking

Collaborators
- NIBRS: Peter Shenkin, Raul Cabrera, Atiqual Mondal, and Samra Vlasnovec, Math and Computer Science Dept.
- Parallel Schur decomposition: Mythilli Mantharam, Math and Computer Science Dept.
- Fire and smoke simulation: Glenn Corbet, Fire Science Dept.
- Molecular modeling: Ann Marie Sapse and Robert Rothchild, Science Dept.

Contact Information
Douglas E. Salane, NASA CIPA Cluster Computing Project
dsalane@jjay.cuny.edu
web.math.jjay.cuny.edu
Bibliography available

Credits
- NASA Curriculum Partnership Improvement Award
- Graduate Research and Technology Initiative of CUNY (2001, 2002, 2003)
- Open source and freely available software (Linux, GNU compilers and languages, Apache, PHP, Oracle Academic License)