Big Data and Little Clusters: Rejuvenating old hardware to process large data sets




Andrew S. Gardner, Lunar and Planetary Laboratory
asg@lpl.arizona.edu

Agenda
  BLOC configurations: 2005, 2013, 2014
  Workload characteristics
  Benchmark results
  General principles

Up-front summary
  For your cluster
    Remember that time and software are your friends.
    Wipe the slate clean.
    Resist the urge to upgrade.
  For your software
    Evaluate where your development effort will be best spent.
    Reuse software even if it isn't buzzword compliant.
  Results, 2013
    Reorganized hard drives into a 4-drive RAID 6 on node1 and updated to Debian 7.
    Doubled MCNPX performance: 13 hours down to 7 on a billion-particle simulation.
  Results, 2014
    Usable storage grew from 160 GB to 8 TB.
    Went from 5 days to 45 minutes to produce polar orbital averages for the entire LEND mission.

BLOC 2005 Configuration
  Boynton Large Opteron Cluster
    Built in 2005 for Dr. William Boynton's group at LPL.
    Modeling the neutron emissivity of Martian soil models studied by the 2001 Mars Odyssey Neutron Spectrometer (NS).
  MCNPX
    Simulation of nuclear processes and particles, developed at LANL.
    Fortran, with relatively small input and output data sets.

BLOC 2005 Configuration
  16 nodes, Fedora Core 2
  2 x single-core Opteron 244, 1.8 GHz, 64K/64K/1M caches
  Tyan Thunder K8S (S2882)
    2 x 1 Gb Ethernet
    4 x SATA1
    USB 1, 64-bit PCI-X
  2 GB RAM: 4 x 512 MB DDR-400
  Seagate 7200 RPM 80 GB HDD
    1 in each worker node
    2 in the root node and in each of two hot spares
    Total of 19 in cluster
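The drive count above can be checked with a quick tally. This is only a sketch: the split of the 16 nodes into 1 root, 2 hot spares, and 13 workers is inferred from the drive counts rather than stated on the slide, and reading "2, single-core Opteron 244" as two CPUs per node is likewise an assumption.

```python
# Tally of the 2005 BLOC hardware as listed on the slide.
# ASSUMPTION: 16 nodes = 1 root + 2 hot spares + 13 workers (inferred, not stated).
# ASSUMPTION: each node carries two single-core Opteron 244 CPUs.
NODES = 16
ROOT_NODES = 1
HOT_SPARES = 2
WORKERS = NODES - ROOT_NODES - HOT_SPARES

CPUS_PER_NODE = 2      # single-core, so also cores per node
RAM_GB_PER_NODE = 2    # 4 x 512 MB DDR-400
DRIVE_GB = 80          # Seagate 7200 RPM

# One drive per worker, two in the root node and in each hot spare.
total_drives = WORKERS * 1 + (ROOT_NODES + HOT_SPARES) * 2
total_cores = NODES * CPUS_PER_NODE
total_ram_gb = NODES * RAM_GB_PER_NODE
raw_disk_gb = total_drives * DRIVE_GB

print(f"drives={total_drives} cores={total_cores} "
      f"ram={total_ram_gb} GB raw_disk={raw_disk_gb} GB")
```

Under these assumptions the tally reproduces the "Total of 19 in cluster" figure.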

BLOC 2013 Configuration
  Goal: improve MCNPX performance and enable other processing.
  Stretch goal: process data from the Lunar Reconnaissance Orbiter (LRO) Lunar Exploration Neutron Detector (LEND).
  Refresh
    Debian GNU/Linux 7
    No hot spares; RAID-6 in node1.

BLOC 2013 Configuration
  No hot spares
    Four disks in node1, RAID-6.
    One disk in each worker node, ext4. Only the OS and swap are on the worker node drives.
  Debian GNU/Linux 7
    NFSv4, Kerberos, GCC 4.7, distcc
    apt-cacher, dnsmasq, scripted installer
  Doubled MCNPX performance.
  LEND data won't fit.
    11 detectors, 16 channels, 1 sample per second, every second, since 2009.
    Raw science data is approximately 150 GB; the complete NAIF SPICE archive for spatial data is now ~315 GB.
    Processing creates datasets of similar or larger size (spatial data) and reductions.
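To get a feel for why the LEND data outgrew the 2013 storage, the sampling rate on the slide can be turned into a rough count of values. This is a back-of-the-envelope sketch, not the team's accounting: it assumes every one of the 11 detectors reports all 16 channels once per second without gaps, and the start and end dates are hypothetical placeholders. It counts values only and makes no claim about bytes per value.

```python
from datetime import date

DETECTORS = 11
CHANNELS = 16
SAMPLES_PER_SECOND = 1
SECONDS_PER_DAY = 86_400

# Values recorded per day if every detector reports every channel each second.
per_day = DETECTORS * CHANNELS * SAMPLES_PER_SECOND * SECONDS_PER_DAY


def channel_values(start: date, end: date) -> int:
    """Count channel values between two dates, assuming gap-free operation."""
    return per_day * (end - start).days


# HYPOTHETICAL date range chosen for illustration only.
five_years = channel_values(date(2009, 7, 1), date(2014, 7, 1))

print(f"{per_day:,} values per day")
print(f"{five_years:,} values over the example range")
```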

BLOC 2014 Configuration
  Four 4-TB hard drives, RAID-6
    Seagate ST4000DM000: 4 TB, 64 MB cache, 5900 RPM
    RAID-6 in the root node
  MATLAB R2014a
  Data sizes
    ~350 GB of initial and postprocessed data, providing algorithm options to the science team.
    ~315 GB of SPICE kernels.
  LEND orbital averages
    24K orbits, ~2K samples per orbit.
    5 days on a shared SPARC T3 with data in an Oracle database.
    4 hours in the initial code, running on all nodes under TORQUE with arbitrary data formats on disk.
    45 minutes on node1 in MATLAB with data stored in HDF5 and .mat formats.
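The storage figures for the two RAID-6 arrays follow from the usual rule that RAID-6 spends two drives' worth of space on parity. A minimal sketch, using nominal drive sizes and ignoring filesystem and metadata overhead (the function name is illustrative):

```python
def raid6_usable(drive_count: int, drive_size: float) -> float:
    """Nominal usable capacity of a RAID-6 array, in the units of drive_size.

    RAID-6 keeps two parity blocks per stripe, so two drives' worth of
    capacity is unavailable for data. Filesystem overhead is ignored.
    """
    if drive_count < 4:
        raise ValueError("RAID-6 needs at least four drives")
    return (drive_count - 2) * drive_size


usable_2013_gb = raid6_usable(4, 80)  # four 80 GB Seagate drives in node1
usable_2014_tb = raid6_usable(4, 4)   # four 4 TB ST4000DM000 drives

print(f"2013: {usable_2013_gb} GB usable, 2014: {usable_2014_tb} TB usable")
```

With four drives, half of the raw capacity is usable, which matches the 160 GB and 8 TB figures quoted in the summary slides.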

BLOC 2015 Goals
  Extend to process data from the MESSENGER mission to Mercury; X- and gamma-ray data since 2011.
  Spatial recalculation after the final SPICE kernels are released following the end of the mission.
  X-ray spectrometer integration footprint recalculation using high resolution at lower altitudes to support mapping products.

Why this software and this hardware?
  Why not Hadoop?
    NAIF SPICE is written in Fortran.
      Single-threaded; has C, Java, MATLAB, and IDL interfaces.
      Builds a database of dynamical data in memory, and can tolerate only one in any process.
    Mature code that isn't in Java
      Spatial processing software for orbital dynamics was written in C, C++, and Oracle Pro*C.
      Analysis tools in MATLAB.
  Why not El Gato, AWS, Google Compute Cloud, etc.?
    Your old cluster is probably free to use today, minus the cost of your time.
    You don't have to share your old cluster, and you don't have to pay for data transiting in and out.
    But you wouldn't build this cluster today; you'd probably use a cloud provider or El Gato.

Summary
  For your cluster
    Remember that time and software are your friends.
    Wipe the slate clean.
    Resist the urge to upgrade.
  For your software
    Evaluate where your development effort will be best spent.
    Reuse software even if it isn't buzzword compliant.
  Results, 2013
    Reorganized hard drives into a 4-drive RAID 6 on node1 and updated to Debian 7.
    Doubled MCNPX performance: 13 hours down to 7 on a billion-particle simulation.
  Results, 2014
    Usable storage grew from 160 GB to 8 TB.
    Went from 5 days to 45 minutes to produce polar orbital averages for the entire LEND mission.
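The run times quoted in the slides can be turned into speedup factors with a few lines of arithmetic. This sketch uses only the before and after times given in the deck; the labels and helper name are illustrative.

```python
def speedup(before_minutes: float, after_minutes: float) -> float:
    """Ratio of the old run time to the new run time."""
    if before_minutes <= 0 or after_minutes <= 0:
        raise ValueError("run times must be positive")
    return before_minutes / after_minutes


HOUR = 60
DAY = 24 * HOUR

# (before, after) run times in minutes, as quoted in the slides.
results = {
    "MCNPX billion-particle sim, 2013 refresh": (13 * HOUR, 7 * HOUR),
    "LEND averages, shared SPARC T3 to MATLAB on node1": (5 * DAY, 45),
    "LEND averages, initial all-node code to MATLAB on node1": (4 * HOUR, 45),
}

for label, (before, after) in results.items():
    print(f"{label}: {speedup(before, after):.1f}x")
```

The 13-hour to 7-hour MCNPX result works out to a little under 2x, consistent with the "doubled" wording, while the LEND averaging improvement is about 160x against the original database-backed run.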