SOSCIP Platforms

SOSCIP HPC Platforms
- Blue Gene/Q
- Cloud Analytics
- Agile
- Large Memory System

SOSCIP Platforms: Blue Gene/Q Platform
Top 10 supercomputers (source: top500.org)

Rank | Site | System | Cores | Rmax (TFlop/s) | Rpeak (TFlop/s) | Power (kW)
1 | National Super Computer Center in Guangzhou, China | Tianhe-2 (MilkyWay-2): TH-IVB-FEP Cluster, Intel Xeon E5-2692 12C 2.200GHz, TH Express-2, Intel Xeon Phi 31S1P, NUDT | 3120000 | 33862.7 | 54902.4 | 17808
2 | DOE/SC/Oak Ridge National Laboratory, United States | Titan: Cray XK7, Opteron 6274 16C 2.200GHz, Cray Gemini interconnect, NVIDIA K20x, Cray Inc. | 560640 | 17590 | 27112.5 | 8209
3 | DOE/NNSA/LLNL, United States | Sequoia: BlueGene/Q, Power BQC 16C 1.60 GHz, Custom, IBM | 1572864 | 17173.2 | 20132.7 | 7890
4 | RIKEN Advanced Institute for Computational Science (AICS), Japan | K computer: SPARC64 VIIIfx 2.0GHz, Tofu interconnect, Fujitsu | 705024 | 10510 | 11280.4 | 12660
5 | DOE/SC/Argonne National Laboratory, United States | Mira: BlueGene/Q, Power BQC 16C 1.60GHz, Custom, IBM | 786432 | 8586.6 | 10066.3 | 3945
6 | DOE/NNSA/LANL/SNL, United States | Trinity: Cray XC40, Xeon E5-2698v3 16C 2.3GHz, Aries interconnect, Cray Inc. | 301056 | 8100.9 | 11078.9 | -
7 | Swiss National Supercomputing Centre (CSCS), Switzerland | Piz Daint: Cray XC30, Xeon E5-2670 8C 2.600GHz, Aries interconnect, NVIDIA K20x, Cray Inc. | 115984 | 6271 | 7788.9 | 2325
8 | HLRS - Höchstleistungsrechenzentrum Stuttgart, Germany | Hazel Hen: Cray XC40, Xeon E5-2680v3 12C 2.5GHz, Aries interconnect, Cray Inc. | 185088 | 5640.2 | 7403.5 | -
9 | King Abdullah University of Science and Technology, Saudi Arabia | Shaheen II: Cray XC40, Xeon E5-2698v3 16C 2.3GHz, Aries interconnect, Cray Inc. | 196608 | 5537 | 7235.2 | 2834
10 | Texas Advanced Computing Center/Univ. of Texas, United States | Stampede: PowerEdge C8220, Xeon E5-2680 8C 2.700GHz, Infiniband FDR, Intel Xeon Phi SE10P, Dell | 462462 | 5168.1 | 8520.1 | 4510
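The Top500 figures can be turned into two derived metrics that make the systems easier to compare: Linpack efficiency (Rmax divided by Rpeak) and energy efficiency (Rmax divided by power). The sketch below is illustrative only; the rows are copied from the list above, and the function names are hypothetical.

```python
# Rows copied from the Top500 list above:
# (system, cores, Rmax in TFlop/s, Rpeak in TFlop/s, power in kW or None if not listed)
TOP500 = [
    ("Tianhe-2", 3120000, 33862.7, 54902.4, 17808),
    ("Titan", 560640, 17590.0, 27112.5, 8209),
    ("Sequoia", 1572864, 17173.2, 20132.7, 7890),
    ("K computer", 705024, 10510.0, 11280.4, 12660),
    ("Mira", 786432, 8586.6, 10066.3, 3945),
    ("Trinity", 301056, 8100.9, 11078.9, None),
    ("Piz Daint", 115984, 6271.0, 7788.9, 2325),
    ("Hazel Hen", 185088, 5640.2, 7403.5, None),
    ("Shaheen II", 196608, 5537.0, 7235.2, 2834),
    ("Stampede", 462462, 5168.1, 8520.1, 4510),
]


def linpack_efficiency(rmax_tflops, rpeak_tflops):
    """Fraction of theoretical peak achieved on the Linpack benchmark."""
    return rmax_tflops / rpeak_tflops


def gflops_per_watt(rmax_tflops, power_kw):
    """Energy efficiency; TFlop/s per kW is numerically equal to GFlop/s per W."""
    if power_kw is None:
        return None  # power was not listed for this system
    return rmax_tflops / power_kw


for name, cores, rmax, rpeak, power in TOP500:
    eff = linpack_efficiency(rmax, rpeak)
    gpw = gflops_per_watt(rmax, power)
    gpw_text = "n/a" if gpw is None else f"{gpw:.2f}"
    print(f"{name:12s} efficiency={eff:.1%}  GFlop/s per W={gpw_text}")
```

Both Blue Gene/Q entries (Sequoia and Mira) come out at roughly 2.2 GFlop/s per watt, among the highest in this list.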
SOSCIP's Blue Gene/Q
Specifications
- 64k core cluster
- 1.6 GHz cores
- 64 TB RAM
- 5-D torus interconnect
Performance
- Measured: 716 Tflops
- Efficiency: 2.1 Gflops/W
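As a rough sanity check, the specifications can be combined into a theoretical peak and an implied power draw. This is a back-of-the-envelope sketch under two assumptions that are not stated on the slide: "64k" is read as 65,536 cores, and each core is taken to perform 8 floating-point operations per cycle. The second assumption is consistent with the Sequoia entry in the Top500 list (1,572,864 cores at 1.60 GHz with Rpeak 20,132.7 TFlop/s).

```python
CORES = 64 * 1024          # assumption: "64k" means 65,536 cores
CLOCK_GHZ = 1.6            # from the slide
FLOPS_PER_CYCLE = 8        # assumption: not stated on the slide
RAM_TB = 64                # from the slide
MEASURED_TFLOPS = 716.0    # from the slide
GFLOPS_PER_WATT = 2.1      # from the slide

# cores * GHz * flops/cycle gives GFlop/s; divide by 1000 for TFlop/s.
peak_tflops = CORES * CLOCK_GHZ * FLOPS_PER_CYCLE / 1000.0

# Fraction of the theoretical peak that was actually measured.
efficiency = MEASURED_TFLOPS / peak_tflops

# Implied power: (TFlop/s * 1000 = GFlop/s) / (GFlop/s per W) = W; / 1000 = kW.
power_kw = MEASURED_TFLOPS * 1000.0 / GFLOPS_PER_WATT / 1000.0

# Memory per core, treating 1 TB as 1024 GB.
ram_gb_per_core = RAM_TB * 1024 / CORES

print(f"peak ~ {peak_tflops:.1f} TFlop/s, efficiency ~ {efficiency:.1%}")
print(f"implied power ~ {power_kw:.0f} kW, RAM per core = {ram_gb_per_core:.0f} GB")
```

Under these assumptions the measured 716 Tflops is about 85% of peak, the implied power draw is about 340 kW, and each core has 1 GB of RAM available.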
Typical Applications
- Disaster planning and mitigation
- Molecular modeling
- Protein folding
- Drug discovery
- Computational fluid dynamics
- Nuclear fusion
- Genomics
- Brain modeling
- Climate and weather
- Complex infrastructure simulation
Suitable Applications
- Large-scale, massively parallel and distributed
- 1,024 cores or more
- Need low-latency, high-bandwidth communication
- Written in C/C++, Fortran, or Python
- Custom and open-source software
- Use MPI and OpenMP
[Figure: source-code excerpt showing MPI_Init(...) and Sendrecv(...) calls]
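To give a feel for what "massively parallel and distributed" means in practice, the sketch below shows a common pattern in MPI-style programs: each rank works on its own block of the data and exchanges boundary values with its neighbours. It is plain Python with no MPI library, so it only computes who owns what; in a real program the rank and the number of ranks would come from MPI_Comm_rank and MPI_Comm_size. The function names are hypothetical.

```python
def block_range(n_items, n_ranks, rank):
    """Return the half-open range [start, stop) of items owned by `rank`.

    Items are split as evenly as possible; the first `n_items % n_ranks`
    ranks each receive one extra item.
    """
    base, extra = divmod(n_items, n_ranks)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def ring_neighbours(n_ranks, rank):
    """Return the (left, right) neighbour ranks in a periodic 1-D ring."""
    return (rank - 1) % n_ranks, (rank + 1) % n_ranks


# Example: 1,000,000 grid points spread over 1,024 ranks.
n_points, n_ranks = 1_000_000, 1024
for rank in (0, 1, n_ranks - 1):
    start, stop = block_range(n_points, n_ranks, rank)
    left, right = ring_neighbours(n_ranks, rank)
    print(f"rank {rank}: points [{start}, {stop}), neighbours {left} and {right}")
```

Each rank would then loop over its own block and use a send/receive pair to swap edge values with the two neighbours, which is where low-latency, high-bandwidth communication matters.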
Ocean Mixing Simulation
H. Salehipour & W.R. Peltier (University of Toronto)

SOSCIP Platforms: Cloud Analytics Platform

Big Data Challenges
- Three V's: Volume, Velocity, Variety
- Processing power
- Storage and data locality
- Analytics frameworks

Cloud Platform Hardware
- Powerful x86 and IBM Power Systems servers
- GPFS storage system (1 petabyte)
- InfiniBand and 10 GbE networks
[Figure labels: ps- & hs-series blades, POWER servers, x86 NeXtScale, x86 iDataPlex]

Software Available
- IBM InfoSphere Streams
- ILOG CPLEX
- IBM InfoSphere BigInsights
- Plus almost anything from the IBM Academic Initiative catalog!

Typical Applications
- Real-time Medical Data Collection and Analytics
- Text Analytics
- Cybersecurity
- Document Filtering
- Energy Systems Data Analytics
- Image Processing
- Machine Learning
- Social Media Analytics
- Medical Records Analytics

Suitable Applications
- Require IBM analytics software
- Require other commercial software
- Small number of cores (< 100)
- Small clusters
- Big data storage (1-100 TB)
Analyzing Geospatial Patterns
Neil Banerjee (Western University)
- Multidirectional edge detection algorithm for mineral exploration and mining applications
- Detected edges correlate highly with known mineral deposits
- IBM BigInsights for a high level of automation
[Figure labels: Known Mineral Locations, Hill shading, Feature detection]
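The slide does not describe the edge detection algorithm itself. Purely to illustrate what "multidirectional" can mean, the sketch below applies Sobel-style 3x3 difference kernels in four directions to a small grid of values and keeps the strongest response at each cell. This is a swapped-in textbook technique, not the authors' method, and the grid is made-up sample data.

```python
# Four 3x3 directional kernels (Sobel-style). Illustrative only;
# NOT the algorithm used in the project described on the slide.
KERNELS = {
    "0deg": [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]],
    "90deg": [[-1, -2, -1], [0, 0, 0], [1, 2, 1]],
    "45deg": [[0, 1, 2], [-1, 0, 1], [-2, -1, 0]],
    "135deg": [[-2, -1, 0], [-1, 0, 1], [0, 1, 2]],
}


def edge_strength(grid):
    """Return, for each interior cell, the largest absolute kernel response.

    Border cells are left at 0 because the 3x3 window does not fit there.
    """
    rows, cols = len(grid), len(grid[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            best = 0
            for kernel in KERNELS.values():
                response = sum(
                    kernel[i][j] * grid[r + i - 1][c + j - 1]
                    for i in range(3)
                    for j in range(3)
                )
                best = max(best, abs(response))
            out[r][c] = best
    return out


# Made-up sample data: a flat region (0) next to a raised region (10).
sample = [[0, 0, 0, 10, 10, 10] for _ in range(5)]
for row in edge_strength(sample):
    print(row)
```

Cells next to the step between the two regions get a strong response, while cells inside either flat region get none.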
SOSCIP Platforms: Large Memory System
Big Data and Memory

Disk Storage (bandwidth to CPU: ~5-8 GB/s; latency: > 1 ms)
Pros
- Inexpensive
- Large capacity
Cons
- Slow!

System Memory (DRAM) (bandwidth to CPU: 50-200 GB/s; latency: < 100 ns)
Pros
- Fast!
Cons
- Cost per GB
- Limited capacity per server
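The bandwidth and latency figures translate directly into waiting time. The sketch below works through two cases for a hypothetical 1 TB dataset (the dataset size and the one million random accesses are assumptions chosen for illustration): one sequential pass over the data, and a series of small random reads where latency dominates.

```python
GB = 1e9   # bytes in a decimal gigabyte
TB = 1e12  # bytes in a decimal terabyte


def scan_seconds(n_bytes, bandwidth_gb_per_s):
    """Time for one sequential pass over n_bytes at the given bandwidth."""
    return n_bytes / (bandwidth_gb_per_s * GB)


def random_access_seconds(n_accesses, latency_s):
    """Time for n dependent random accesses, each paying the full latency."""
    return n_accesses * latency_s


dataset = 1 * TB  # hypothetical dataset size

# Bandwidth ranges from the slide: disk ~5-8 GB/s, DRAM 50-200 GB/s.
disk_scan = (scan_seconds(dataset, 8), scan_seconds(dataset, 5))
dram_scan = (scan_seconds(dataset, 200), scan_seconds(dataset, 50))

# Latency bounds from the slide: disk > 1 ms, DRAM < 100 ns.
accesses = 1_000_000  # hypothetical number of random reads
disk_random = random_access_seconds(accesses, 1e-3)    # lower bound for disk
dram_random = random_access_seconds(accesses, 100e-9)  # upper bound for DRAM

print(f"scan 1 TB from disk: {disk_scan[0]:.0f}-{disk_scan[1]:.0f} s")
print(f"scan 1 TB from DRAM: {dram_scan[0]:.0f}-{dram_scan[1]:.0f} s")
print(f"1M random reads: disk >= {disk_random:.0f} s, DRAM <= {dram_random:.1f} s")
```

Sequential scans differ by roughly one to two orders of magnitude, but random access differs by a factor of at least 10,000, which is why workloads with irregular access patterns benefit most from keeping data in RAM.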
LMS Specification
- 3 nodes acting as a single system (vSMP)
- 64 x86 cores
- 4.5 TB RAM (1.5 TB per node)
- Shared memory programming
Suitable Applications
- Need to keep all data in RAM for speed
- Generate a large amount of intermediate data
- Need > 128 GB of RAM
- Require a medium number of cores (< 64)
- Need shared memory programming paradigm
- Use commercial software
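In the shared memory paradigm, all workers read and write one common address space instead of sending messages to each other. The sketch below illustrates the idea with Python threads that all read the same in-memory list. It is a conceptual example only: the function name is hypothetical, and CPython threads do not run CPU-bound code in parallel, so production codes on a system like this would more typically use OpenMP or a similar threading model.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_sum(data, n_workers=4):
    """Sum `data` with threads that all read the same in-memory list."""
    chunk = (len(data) + n_workers - 1) // n_workers  # ceiling division

    def partial(worker):
        # Each worker indexes directly into the shared list; nothing is
        # copied or sent, which is the key property of shared memory.
        start = worker * chunk
        stop = min(start + chunk, len(data))
        return sum(data[i] for i in range(start, stop))

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return sum(pool.map(partial, range(n_workers)))


values = list(range(1000))
print(parallel_sum(values, n_workers=4))
```

Contrast this with the message-passing style used on the Blue Gene/Q, where each process owns its own block of data and must explicitly exchange anything its neighbours need.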
SOSCIP Platforms: Agile Computing Platform

The Case for FPGAs
[Charts: CPU Scaling; ICT Power Consumption]
FPGA acceleration offers:
- Algorithms in re-configurable circuitry
- High performance parallelism
- High power efficiency
Agile Computing Platform

Development Environment
- Fast x86 servers
- Simulate, debug, build
- Tools
- Development Kit

Runtime Environment
- x86 and POWER8
- Stratix V FPGAs
- 10 GbE FPGA Network on POWER8
Suitable Applications
- Real-time or time-critical processing
- Compute intensive
- Exploit parallelism in depth and/or width
- Wide vectorization
- Big data
- Non-traditional data types

Typical Applications
- Health/Medical systems
- Image/Video Processing
- Machine Learning
- Signal Processing
- Data Security
- Big Data Analytics

Real-time fMRI Brain Analytics
Mark Daley (Western University)
- The problem: brain activity scans take days to analyze
- The solution: an FPGA-accelerated real-time analytics engine
- FPGA replaces 48 x86 cores and implements a superior motion correction algorithm
- IBM InfoSphere Streams on POWER constructs graphs of brain networks 40x faster than a single process on x86
- Graph updates every 0.6-0.8 s
- Results in seconds instead of days!

When am I ready for SOSCIP?
- Scaling computing power
- Scaling storage size
- Unique technology needs
- Software needs
Platform Summary

Platform | CPU | Operating Systems | Commercial Software | Languages | Support
Blue Gene/Q | PowerPC | Linux | No | C, C++, Python, Fortran | SciNet, IBM specialist
Large Memory System | x86 | Linux | Yes | All | HPCVL
Cloud Analytics | x86, POWER | Linux, Windows, AIX | Yes | All | SHARCNET, IBM specialist
Agile | x86, POWER8 | Linux | Yes | All + OpenCL, Verilog/VHDL | SHARCNET, IBM specialist
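The "Suitable Applications" slides give a few numeric rules of thumb for each platform. The sketch below encodes those thresholds as a hypothetical helper (the function and its parameters are not part of any SOSCIP tool) to show how they combine. It is a rough first filter only and ignores software, language, and support requirements.

```python
def suggest_platforms(cores, ram_gb=0, storage_tb=0, real_time=False):
    """Return platforms whose rule-of-thumb thresholds match the workload.

    Thresholds are taken from the 'Suitable Applications' slides:
      Blue Gene/Q:          1,024 cores or more
      Large Memory System:  more than 128 GB of RAM, fewer than 64 cores
      Cloud Analytics:      fewer than 100 cores, 1-100 TB of storage
      Agile:                real-time or time-critical processing
    """
    suggestions = []
    if cores >= 1024:
        suggestions.append("Blue Gene/Q")
    if ram_gb > 128 and cores < 64:
        suggestions.append("Large Memory System")
    if cores < 100 and 1 <= storage_tb <= 100:
        suggestions.append("Cloud Analytics")
    if real_time:
        suggestions.append("Agile")
    return suggestions


print(suggest_platforms(cores=4096))
print(suggest_platforms(cores=32, ram_gb=2000))
print(suggest_platforms(cores=16, storage_tb=20, real_time=True))
```

A workload can match more than one platform, or none, in which case the other criteria in the summary table decide.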
Notices:
- POWER8, Power Systems logo, InfoSphere, InfoSphere Streams logo, InfoSphere BigInsights, InfoSphere BigInsights logo, SPSS, Cognos, Rational, and the IBM logo are trademarks or registered trademarks of International Business Machines Corp.
- Altera, Quartus II and Stratix are trademarks of Altera Corp.
- ModelSim is a trademark of Mentor Graphics.
- OpenCL and the OpenCL logo are trademarks of Apple Inc.
- MATLAB and the MATLAB logo are trademarks of MathWorks Inc.
- Python and the Python logo are trademarks of the Python Software Foundation.
- Nallatech is a trademark of Interconnect Systems Inc.
Backup
What are FPGAs?
- FPGA = Field Programmable Gate Array
- FPGA chips have 100,000s of configurable logic circuits
- Configure groups of logic elements to construct function blocks
- Connect several blocks into a data pipeline
- Simultaneously work on different elements of a data stream at each stage on every clock cycle
[Figure labels: Logic Element, Interconnect, Onboard RAM; function blocks Load, Multiply, Add, Add, Store]
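The pipeline idea can be modelled in software. The sketch below simulates a linear pipeline clock tick by clock tick: on every tick each occupied stage processes a different element of the stream. The stage functions (load, multiply by 3, add 1, add 2, store) and their order are assumptions chosen to echo the block labels in the figure; they are not taken from a real design.

```python
def run_pipeline(stream, stages):
    """Simulate a linear pipeline; return (outputs, cycles, busy_log).

    slots[i] holds the value waiting at the input of stage i. Values must
    not be None, because None marks an empty slot.
    """
    slots = [None] * len(stages)
    pending = list(stream)
    outputs = []
    busy_log = []  # number of stages doing useful work in each cycle
    cycles = 0
    while pending or any(s is not None for s in slots):
        # Feed the next stream element into the first stage.
        if slots[0] is None and pending:
            slots[0] = pending.pop(0)
        busy_log.append(sum(s is not None for s in slots))
        # One clock tick: every occupied stage computes at the same time.
        results = [
            stage(value) if value is not None else None
            for stage, value in zip(stages, slots)
        ]
        if results[-1] is not None:
            outputs.append(results[-1])
        # Each result moves on to the input of the next stage.
        slots = [None] + results[:-1]
        cycles += 1
    return outputs, cycles, busy_log


# Hypothetical five-stage pipeline computing 3*x + 3 for each element.
STAGES = [
    lambda x: x,      # load
    lambda x: x * 3,  # multiply
    lambda x: x + 1,  # add
    lambda x: x + 2,  # add
    lambda x: x,      # store
]

outputs, cycles, busy_log = run_pipeline(range(10), STAGES)
print(outputs)
print(f"{cycles} cycles for 10 elements; busy stages per cycle: {busy_log}")
```

Once the pipeline is full, one result leaves on every cycle, so N elements through S stages take N + S - 1 cycles rather than N x S steps.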
OpenCL(TM)

Kernel code -> OpenCL Compiler -> Programming file

Kernel code:

    __attribute__((num_simd_work_items(4)))
    __attribute__((reqd_work_group_size(64,1,1)))
    __attribute__((num_compute_units(2)))
    __kernel void vectoradd(__global const int *x,
                            __global const int *y,
                            __global int *restrict z)
    {
        /* Each work-item adds one pair of elements. */
        int index = get_global_id(0);
        z[index] = x[index] + y[index];
    }

Host code:

    clEnqueueNDRangeKernel(queue, kernel, dim, offset, size, local_size,...);
CAPI: Coherent Accelerator Processor Interface
Features
- Shared Virtual Address Space
- HW Managed Cache Coherence
[Figure labels: SMP links, PSL, PCIe, CAPP, CAPI Protocol, RAM]
CAPP: Coherent Attached Processor Proxy, PSL: POWER Service Layer