Big Data and Little Clusters
Rejuvenating old hardware to process large data sets
Andrew S. Gardner, Lunar and Planetary Laboratory
asg@lpl.arizona.edu
Agenda
- BLOC configurations: 2005, 2013, 2014
- Workload characteristics
- Benchmark results
- General principles
Up-front summary
For your cluster:
- Remember that time and software are your friends.
- Wipe the slate clean.
- Resist the urge to upgrade.
For your software:
- Evaluate where your development effort will be best spent.
- Reuse software even if it isn't buzzword compliant.
Results, 2013:
- Reorganized hard drives into a 4-drive RAID 6 on node1; updated to Debian 7.
- Doubled MCNPX performance: 13 hours down to 7 on a billion-particle simulation.
Results, 2014:
- 160 GB of usable storage grew to 8 TB.
- Went from 5 days to 45 minutes to produce polar orbital averages for the entire LEND mission.
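The storage jump quoted above is plain RAID arithmetic: RAID 6 dedicates two drives' worth of space to parity, so usable capacity is (n - 2) times the drive size. A minimal sketch:

```python
# RAID-6 usable capacity: two drives' worth of space goes to parity,
# regardless of how many drives are in the array.
def raid6_usable_tb(n_drives: int, drive_tb: float) -> float:
    assert n_drives >= 4, "RAID 6 needs at least four drives"
    return (n_drives - 2) * drive_tb

# 2014 node1: four 4-TB drives -> 8.0 TB usable
print(raid6_usable_tb(4, 4.0))
```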
BLOC 2005 Configuration
- Boynton Large Opteron Cluster, built in 2005 for Dr. William Boynton's group at LPL.
- Modeling the neutron emissivity of Martian soil models studied by the 2001 Mars Odyssey Neutron Spectrometer.
- MCNPX: simulation of nuclear processes and particles, developed at LANL.
- Fortran; relatively small input and output data sets.
BLOC 2005 Configuration
- 16 nodes, Fedora Core 2
- 2 single-core Opteron 244 CPUs, 1.8 GHz, 64K/64K/1M caches
- Tyan Thunder K8S (S2882): 2 x 1Gb Ethernet, 4 x SATA1, USB 1, 64b PCI-X
- 2 GB RAM: 4 x 512 MB DDR-400
- Seagate 7200 RPM 80 GB HDDs: 1 in each worker node, 2 in the root node and in each of two hot spares; total of 19 in the cluster
BLOC 2013 Configuration
- Goal: improve MCNPX performance, enable other processing.
- Stretch goal: process data from the Lunar Reconnaissance Orbiter (LRO) Lunar Exploration Neutron Detector (LEND).
- Refresh: Debian GNU/Linux 7.
- No hot spares; RAID-6 in node1.
BLOC 2013 Configuration
- No hot spares.
- Four disks in node1, RAID-6; one disk per worker node, ext4. Only the OS and swap live on the worker node drives.
- Debian GNU/Linux 7: NFS4, Kerberos, GCC 4.7, distcc, apt-cacher, dnsmasq, scripted installer.
- Doubled MCNPX performance.
- But the LEND data won't fit: 11 detectors, 16 channels, 1 sample per second, every second, since 2009.
- Raw science data is approximately 150 GB; the complete NAIF SPICE archive for spatial data is now ~315 GB.
- Processing creates datasets of similar or larger size (spatial data), plus reductions.
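The per-second sample rate above implies the quoted raw-data volume. A back-of-envelope check, assuming (hypothetically, not from the slides) 4 to 8 bytes stored per value including timestamps and overhead:

```python
# Back-of-envelope estimate of the LEND raw-data volume:
# 11 detectors x 16 channels x 1 sample/s, continuously since 2009.
SECONDS_PER_YEAR = 365 * 24 * 3600

detectors = 11
channels = 16
years = 4.5  # assumption: mid-2009 through the 2013 refresh, roughly

samples = detectors * channels * SECONDS_PER_YEAR * years
print(f"{samples:.2e} values")  # ~2.5e10

for bytes_per_value in (4, 8):
    gb = samples * bytes_per_value / 1e9
    print(f"{bytes_per_value} B/value -> ~{gb:.0f} GB")
```

At 4 to 8 bytes per value this lands at roughly 100 to 200 GB, consistent with the quoted ~150 GB of raw science data.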
BLOC 2014 Configuration
- Four 4-TB hard drives, RAID-6 in the root node: Seagate ST4000DM000, 4 TB, 64 MB cache, 5900 RPM.
- MATLAB R2014a.
- Data sizes: ~350 GB of initial and postprocessed data providing algorithm options to the science team; ~315 GB of SPICE kernels.
- LEND orbital averages: 24K orbits, ~2K samples per orbit.
  - 5 days on a shared SPARC-T3 with the data in an Oracle database.
  - 4 hours in initial code running on all nodes under Torque with arbitrary data formats on disk.
  - 45 minutes on node1 in MATLAB with the data stored in HDF5 and .mat formats.
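The 45-minute reduction comes down to vectorized per-orbit averaging over data already resident on local disk. The mission code used MATLAB over HDF5/.mat; below is a NumPy equivalent of the core step, assuming (hypothetically) each one-second sample has already been tagged with an orbit number:

```python
# Per-orbit averaging as a pair of vectorized passes, instead of one
# database query per orbit. Synthetic data stands in for real LEND counts.
import numpy as np

rng = np.random.default_rng(0)
n_samples = 10_000
orbit = np.sort(rng.integers(0, 50, n_samples))      # orbit index per sample
counts = rng.poisson(5.0, n_samples).astype(float)   # counts per sample

# Sum of counts and number of samples in each orbit, one pass each.
sums = np.bincount(orbit, weights=counts)
n = np.bincount(orbit)
orbit_mean = sums / n                                # per-orbit average rate
```

Replacing 24K per-orbit round trips to a shared Oracle server with array passes over local HDF5 data is the kind of change that turns days into minutes.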
BLOC 2015 Goals
- Extend to process data from the MESSENGER mission to Mercury: X- and gamma-ray data since 2011.
- Spatial recalculation after the final SPICE kernels are released at the end of the mission.
- X-ray spectrometer integration footprint recalculation using high resolution at lower altitudes to support mapping products.
Why this software and this hardware?
Why not Hadoop?
- NAIF SPICE is written in Fortran. It is single-threaded, with C, Java, MATLAB, and IDL interfaces.
- SPICE builds a database of dynamical data in memory, and can tolerate only one per process.
- Mature code that isn't in Java: the spatial processing software for orbital dynamics was written in C, C++, and Oracle Pro*C; the analysis tools are in MATLAB.
Why not El Gato, AWS, Google Compute Cloud, etc.?
- Your old cluster is probably free to use today, minus the cost of your time.
- You don't have to share your old cluster, and you don't have to pay for data transiting in and out.
- But you wouldn't build this cluster today; you'd probably use a cloud provider or El Gato.
Summary
For your cluster:
- Remember that time and software are your friends.
- Wipe the slate clean.
- Resist the urge to upgrade.
For your software:
- Evaluate where your development effort will be best spent.
- Reuse software even if it isn't buzzword compliant.
Results, 2013:
- Reorganized hard drives into a 4-drive RAID 6 on node1; updated to Debian 7.
- Doubled MCNPX performance: 13 hours down to 7 on a billion-particle simulation.
Results, 2014:
- 160 GB of usable storage grew to 8 TB.
- Went from 5 days to 45 minutes to produce polar orbital averages for the entire LEND mission.