NERSC Data Efforts Update
Prabhat, Data and Analytics Group Lead
February 23, 2015
A little bit about myself
- Computer Scientist: Brown, IIT Delhi; Real-time Graphics, Virtual Reality, HCI
- Computational Research Division: Scientific Visualization, HPC Data Management, Parallel I/O, Statistics, Machine Learning, Big Data
- Climate Scientist: pursuing PhD at UC Berkeley w/ Bill Collins
- Broadly interested in: Physics (Particle Physics, Astronomy), Chemistry (Materials Science), Biology (Neuroscience, BioImaging)
Data and Analytics Services (DAS) Team
Team members and technology areas:
- Shreyas Cholia: Gateways, Web, Grid
- Yushu Yao: Databases, Analytics
- Annette Greiner: UI, Web
- Joaquin Correa: Imaging, Machine Learning
- Burlen Loring: Vis
- Jeff Porter: Data Management
- Oliver Ruebel: Vis, Analytics
- Dani Ushizima: Imaging, R
- R. K. Owen: NIM
- Michael Urashka: Web
DAS Team Goal: Enable Data-Centric Science at Scale
- Big Data Software: broad ecosystem of capabilities and technologies; research, evaluate, deploy, and maintain software; customize and optimize for NERSC/HPC platforms
- Engaging NERSC Users: broad user base support; 1-1 in-depth engagement
How to get help?
- Documentation: http://www.nersc.gov/users/software/data-visualization-and-analytics/
- Routine startup/troubleshooting questions: trouble ticket system
- In-depth 1-1 collaborations: e-mail prabhat@lbl.gov
DOE Facilities are Facing a Data Deluge
Genomics, Climate, Astronomy, Physics, Light Sources
We currently deploy separate Compute Intensive and Data Intensive systems
- Data Intensive systems: Carver, Genepool, PDSF
Cori: Unified architecture for HPC and Big Data
- 64 cabinets of Cray XC system: 50 cabinets of Knights Landing manycore compute nodes; 10 cabinets of Haswell compute nodes for the data partition; ~4 cabinets of Burst Buffer
- 14 external login nodes
- Aries interconnect (same as on Edison)
- Lustre file system: 28 PB capacity, 432 GB/sec peak performance
- NVRAM Burst Buffer for I/O acceleration
- Significant Intel and Cray application transition support
- Delivery in mid-2016; installation in the new LBNL CRT
Popular features of a data intensive system can be supported on Cori
Data intensive workload need -> Cori solution:
- Local disk -> NVRAM burst buffer
- Large memory nodes -> 128 GB/node on Haswell; option to purchase a fat (1 TB) login node
- Massive serial jobs -> NERSC serial queue prototype on Edison; MAMU
- Complex workflows -> more (14) external login nodes; CCM mode for now
- Communicate with databases from compute nodes -> proposed Compute Gateway Node COE
- Stream data from observational facilities -> proposed Compute Gateway Node COE
- Easy to customize environment -> proposed User Defined Images COE
- Policy flexibility -> improvements coming with Cori: rolling upgrades, CCM, MAMU; the above COEs would also contribute
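As a toy sketch of the "communicate with databases from compute nodes" pattern above: workflow tasks insert results into a shared database as they finish, and a downstream step queries the accumulated table. This uses Python's stdlib sqlite3 purely as a self-contained stand-in; a production workflow would instead reach PostgreSQL, MongoDB, etc. through the proposed gateway nodes, and all names here are illustrative.

```python
import sqlite3

# Stand-in "results database" a workflow might populate. sqlite3 (stdlib)
# keeps the sketch self-contained; the schema and values are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (task_id INTEGER PRIMARY KEY, energy REAL)")

# Each "compute task" records its result as it completes.
for task_id, energy in [(1, -13.6), (2, -3.4), (3, -1.5)]:
    conn.execute("INSERT INTO results VALUES (?, ?)", (task_id, energy))
conn.commit()

# A downstream analysis step queries the accumulated results.
count, lowest = conn.execute(
    "SELECT COUNT(*), MIN(energy) FROM results"
).fetchone()
print(count, lowest)  # → 3 -13.6
conn.close()
```

The point of the pattern is decoupling: producers and consumers only share the database schema, not a runtime.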
Big Data Software Portfolio
Capabilities, technology areas, and tools/libraries:
- Data Transfer + Access: Globus, Grid stack, Authentication (Globus Online, GridFTP); Portals, Gateways, RESTful APIs (Drupal/Django, NEWT)
- Data Processing: Workflows (Swift, Fireworks)
- Data Management: Formats, Models (HDF5, NetCDF); Storage, I/O, Movement (SRM); Databases (SciDB, MySQL, PostgreSQL, MongoDB)
- Data Analytics: Statistics, Machine Learning (python, R, ROOT); Analytics Stack (BDAS)
- Data Visualization: Imaging (OMERO, Fiji); SciVis (VisIt, ParaView); InfoVis
- Backend Infrastructure: Virtualization (Docker)
Analytics Software Strategy (layered stack)
- Science Applications: Climate, Cosmology, KBase, Materials, BioImaging, ...
- Analytics Capabilities: Statistics/Machine Learning, Image Processing, Graph Analytics, Database Operations
- Tools + Libraries: R, python, MLbase, MATLAB, GraphX, SQL
- Runtime Framework: MPI, Spark, SciDB, PostgreSQL
- Resource Management: filesystems (Lustre), batch/queue systems
- Hardware: SandyBridge/KNL chipset, Burst Buffers, Aries interconnect
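To make the "Statistics, Machine Learning" capability layer concrete, here is a minimal ordinary-least-squares fit in pure Python. This is only an illustrative sketch of what lives at that layer, not NERSC's actual stack; real analyses would use R, python libraries, MLbase, etc. on the runtime frameworks below it.

```python
# Minimal illustration of the statistics/ML capability layer: fit a
# least-squares line y = slope*x + intercept to (x, y) samples.
def ols_fit(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    return slope, mean_y - slope * mean_x

# Noiseless samples from y = 2x + 1 recover the parameters exactly.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]
slope, intercept = ols_fit(xs, ys)
print(slope, intercept)  # → 2.0 1.0
```

In the layered strategy above, a routine like this would be delegated to a library (R, scikit-learn) and parallelized by the runtime framework (MPI, Spark) rather than hand-written.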
Top NERSC Production Workflows
- Advanced Light Source: SPOT Suite (real-time reconstruction, experimental steering)
- Materials Project
- Cosmology: supernova/transient classification pipeline
- JGI (genome sequencing, assembly)
Concluding Remarks
Vision: NERSC is the hub for Scientific Big Data
- Hardware
- Software, Services
- People (staff, vendors, researchers)
We welcome your feedback!
Questions?
Contact: prabhat@lbl.gov