Introducing Star-P: The Open Platform for Parallel Application Development. Yoel Jacobsen, E&M Computing LTD, yoel@emet.co.il
The case for VHLLs: Functional / applicative / very high-level languages allow the programmer to focus on what is needed vs. how to compute it. Productivity matters: shorter development cycles; shorter code; easier debugging; focus on the algorithms. Interactivity is a key point.
The Vision: Our goal is to bridge the gap between easy-to-use desktop modeling, simulation and development tools and the power, scalability and low cost of parallel computer systems, clusters and grids. MATLAB is a registered trademark of The MathWorks, Inc. ISC's products are not sponsored or endorsed by The MathWorks, Inc. or by any other trademark owner referred to in this document.
100X performance increase with 10% of the typical programming effort. Familiar desktop tools: MATLAB, Python, R, others. Hardware abstraction: knowledge of the parallel system is not necessary.
The Value Proposition of Star-P: Collapse development times (eliminate batch workflow). Improve productivity via a familiar interface: MATLAB, Python language, R language, other very high level languages (VHLLs). Broaden use and access of HPC: systems, clusters, grids; ClearSpeed, GPUs, FPGAs. Increase performance. Drive scalability of applications.
Is Star-P for You? Solving large and complex problems? Prefer MATLAB, Python, and R over C/Fortran/MPI? Working with large data sets? Running out of steam on the desktop? Parallel programming taking too long? Batch runs taking too long?
Plug in open source & commercial libraries: Easily integrate specialized packages via API. [Figure: 400 Linear System Solves (each system = 400x400 array), using an IMSL library function through the Task-Parallel SDK; run time (sec.) vs. number of CPUs, 1 to 16.]
Integration with HW Accelerators: Star-P & ClearSpeed accelerator boards; Star-P & GPUs; Star-P & SGI FPGA. [Figure: Task-Parallel Matrix Multiplication on CS X620; time (seconds) vs. number of devices/cores.]
Recent Product Enhancements: Platform & Infrastructure: SunGrid Workload Manager; Vista Operating System. Multi-language Support: Star-P for Python; early visibility: R language. Performance: performance profiling tools; code optimization tools. Ease of Use: status dashboard enhancements; configuration and event logging.
The Bottom Line: Go from algorithm development to production deployment in days, not months. Eliminate costly, time-consuming, and inflexible re-programming of desktop algorithms. More accurate simulations. More sophisticated models.
Computing with Star-P
First example: 2.3GB Matrix

    >> a = rand(17000,17000*p)
    a =
        ddense object: 17000-by-17000p
    >> ppwhos
    Your variables are:
    Name    Size            Bytes           Class
    a       17000x17000p    2.312000e+09    ddense array
    Grand total is 289000000 elements using 2.312000e+09 bytes
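The byte count reported by ppwhos above can be checked by hand. The short Python sketch below does that arithmetic, assuming each element is an 8-byte double-precision value (an assumption; the slide does not state the element type). It is plain Python, not Star-P code.

```python
# Check the memory footprint reported for the 17000-by-17000 matrix.
rows, cols = 17000, 17000
bytes_per_element = 8  # assumption: IEEE 754 double precision

n_elements = rows * cols
n_bytes = n_elements * bytes_per_element

print(n_elements)            # total number of matrix elements
print(f"{n_bytes:.6e}")      # total size in bytes, same format as ppwhos
print(f"{n_bytes / 1e9:.1f} GB")
```

Under that assumption the element count and byte total agree with the "Grand total" line, and the size rounds to the 2.3GB in the slide title.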
What Happens Under the Hood
Programming for Best Performance: For best performance, some program segments are best run in Serial mode, some in Task Parallel mode, and some in Data Parallel mode. The *p tag triggers automatic Data Parallel computation. ppeval triggers automatic Task Parallel computation. [Diagram: Serial, Task Parallel, Data Parallel.]
Serial Computing in Star-P: Use MATLAB: File Editor, Profiler, Debugger, Array Editor, Desktop, Visualization. Small calculations: computations taking less than .5 seconds.
Programming for Best Performance: Vectorization for faster computing
Example performance tips: Vectorize for-loops inside of the ppeval call. Avoid trailing singleton dimensions. Avoid assignments to structure-elements. Avoid ND indexing.
Advanced topics: See Star-P Tutorials & Application Notes.
Vectorization example:

    clear
    n = 1000;
    x = zeros(1,n);
    y = rand(3,n);
    z = rand(3,n);
    %
    % The unvectorized code
    for i = 1:n
        if z(1,i) >= 0
            x(i) = y(1,i)*z(1,i) + y(2,i)*z(2,i) + y(3,i)*z(3,i);
        else
            x(i) = y(1,i)/z(1,i) + y(2,i)/z(2,i) + y(3,i)/z(3,i);
        end
    end
    %
    % The vectorized code for the for-loop above:
    indx = z(1,:) >= 0;
    x(indx) = sum(y(:,indx).*z(:,indx),1);
    indx = indx == 0;
    x(indx) = sum(y(:,indx)./z(:,indx),1);

    Loop Index    Serial Execution Time    Vectorized Execution Time    Speed-up
    100           0.0038 sec               0.0002 sec                   19
    1,000         0.0331 sec               0.0003 sec                   101
    10,000        0.4845 sec               0.0018 sec                   273
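For readers following the Python side of the deck, the same loop-versus-mask idea can be sketched in NumPy. This is an illustrative translation, not Star-P code: the function names are hypothetical, and z is drawn from [-1, 1) rather than from rand so that both branches are actually exercised (an assumption that differs from the MATLAB listing).

```python
import numpy as np

def serial_loop(y, z):
    """Element-by-element version, mirroring the unvectorized for-loop."""
    n = y.shape[1]
    x = np.zeros(n)
    for i in range(n):
        if z[0, i] >= 0:
            x[i] = y[0, i] * z[0, i] + y[1, i] * z[1, i] + y[2, i] * z[2, i]
        else:
            x[i] = y[0, i] / z[0, i] + y[1, i] / z[1, i] + y[2, i] / z[2, i]
    return x

def vectorized(y, z):
    """Boolean-mask version, mirroring the vectorized code."""
    x = np.zeros(y.shape[1])
    mask = z[0, :] >= 0
    x[mask] = np.sum(y[:, mask] * z[:, mask], axis=0)
    x[~mask] = np.sum(y[:, ~mask] / z[:, ~mask], axis=0)
    return x

rng = np.random.default_rng(0)
n = 1000
y = rng.random((3, n))
z = rng.uniform(-1.0, 1.0, (3, n))  # assumption: negatives included on purpose

x_loop = serial_loop(y, z)
x_vec = vectorized(y, z)
print(np.allclose(x_loop, x_vec))
```

Both functions compute the same vector; only the vectorized one avoids the per-element branch, which is the property the timing table measures.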
Star-P for MATLAB
Star-P for Python
Star-P demonstrations: Buffon-Laplace needle problem
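The demonstration code itself is not reproduced in the deck. As a rough sketch of what such a Monte Carlo experiment looks like, the NumPy example below drops a needle of a given length on a grid of a-by-b rectangles and estimates the probability that it crosses at least one grid line. It compares the estimate with the standard closed form (2l(a+b) - l^2) / (pi a b), valid when the needle is no longer than the shorter cell side. All names are hypothetical and nothing here uses the Star-P API.

```python
import numpy as np

def exact_crossing_probability(length, a, b):
    """Closed-form Buffon-Laplace probability, for length <= min(a, b)."""
    return (2.0 * length * (a + b) - length**2) / (np.pi * a * b)

def simulate_crossing_probability(n_throws, length, a, b, seed=0):
    """Monte Carlo estimate: one needle end uniform in a cell, random direction."""
    rng = np.random.default_rng(seed)
    x0 = rng.uniform(0.0, a, n_throws)
    y0 = rng.uniform(0.0, b, n_throws)
    theta = rng.uniform(0.0, 2.0 * np.pi, n_throws)
    x1 = x0 + length * np.cos(theta)
    y1 = y0 + length * np.sin(theta)
    # The needle crosses a line when its other end leaves the starting cell.
    crosses_vertical = (x1 < 0.0) | (x1 >= a)
    crosses_horizontal = (y1 < 0.0) | (y1 >= b)
    return np.mean(crosses_vertical | crosses_horizontal)

exact = exact_crossing_probability(1.0, 2.0, 3.0)
estimate = simulate_crossing_probability(200_000, 1.0, 2.0, 3.0)
print(f"exact={exact:.4f} estimate={estimate:.4f}")
```

Every throw is independent of the others, which is why this kind of problem suits a parallel demonstration.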
Star-P demonstrations: SVD calculation
Star-P architecture
Star-P Functional Overview
Familiar Desktop Tools
Star-P Client: Connects to server. Redirects library calls. Optimizes serial code.
Star-P Interactive Engine: Server resource management. User & session management. Workload management.
Star-P Computation Engine: 1. Data-Parallel Computations 2. Task-Parallel Computations 3. OpenConnect Library API Link
Data-Parallel Computations: Global array syntax. Operations on large distributed data sets. World-class parallel libraries.
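To make the idea of operating on a distributed data set concrete, here is a small single-process NumPy sketch. It splits a matrix into column blocks, as if each block lived on a different processor, computes a partial matrix-vector product per block, and sums the partial results. The function name and the worker count are hypothetical, and this only imitates the concept; it says nothing about how Star-P distributes or schedules work internally.

```python
import numpy as np

def blocked_matvec(matrix, vector, n_workers):
    """Matrix-vector product computed block by block over column slices."""
    column_blocks = np.array_split(matrix, n_workers, axis=1)
    vector_blocks = np.array_split(vector, n_workers)
    # Each "worker" multiplies its own columns by the matching vector slice.
    partial_results = [a_k @ x_k for a_k, x_k in zip(column_blocks, vector_blocks)]
    # A reduction (sum) combines the per-worker partial results.
    return np.sum(partial_results, axis=0)

rng = np.random.default_rng(0)
a = rng.random((200, 300))
x = rng.random(300)

y_blocked = blocked_matvec(a, x, n_workers=4)
y_global = a @ x  # the "global array" expression a user would write
print(np.allclose(y_blocked, y_global))
```

The last two lines show the point of global array syntax: the user writes the single expression, and the partitioning and reduction stay out of sight.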
Task-Parallel Computations: Multiple independent calculations. Simple, intuitive with Star-P's abstraction. Plug in popular computation engines.
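A minimal way to picture "multiple independent calculations" is a batch of unrelated linear systems, each solved on its own. The sketch below uses Python's standard thread pool and NumPy as a stand-in; it is not the ppeval mechanism or any Star-P API, and the system sizes and worker count are arbitrary choices for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

def solve_one(system):
    """Solve a single linear system a @ x = b, independently of all others."""
    a, b = system
    return np.linalg.solve(a, b)

rng = np.random.default_rng(0)
n_systems, size = 16, 50
# Adding size * I makes each matrix diagonally dominant, hence well conditioned.
systems = [
    (rng.random((size, size)) + size * np.eye(size), rng.random(size))
    for _ in range(n_systems)
]

# Each task touches only its own data, so the tasks can run in any order.
with ThreadPoolExecutor(max_workers=4) as pool:
    solutions = list(pool.map(solve_one, systems))

residual_ok = all(np.allclose(a @ x, b) for (a, b), x in zip(systems, solutions))
print(len(solutions), residual_ok)
```

Because no task depends on another, adding workers changes only how long the batch takes, not the results.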
Star-P OpenConnect Library API Link: Leverage data- and task-parallel libraries and solvers, commercial and open source. Enable access through desktop VHLLs.
Hardware Accelerators: Embed compute-intensive algorithms. FPGAs, GPUs, etc. Library functions, called from desktop apps.
Development Utilities: Debugging, profiling, monitoring. Built in, and interfaces to popular tools. Interactively explore and optimize code.
High-speed I/O: Native parallel I/O. Direct transfer between disk and server CPUs. Eliminate client/server data transfer. No need to manually break up files.
Check - Install - Check again - Go: 1. Verify Star-P compatibility 2. Star-P distribution CD / download 3. Install Star-P Server software 4. Install Star-P Client software 5. Install Star-P Admin manager software 6. Start Star-P 7. Check Star-P installation success. GO!
Working with Workload Managers
What's Coming in Star-P 2.6
Star-P 2.6: Areas of Focus. Platform & Infrastructure: Sun Grid Engine Workload Manager; Vista Operating System (client). Performance: low-level profiling; start-time choice of BLAS libraries. Ease of Use: status dashboard enhancements; configuration and event log enhancements. Core Global Array Functionality: improved management of distributed matrix allocations (page thrashing prevention). New Python Capability with Star-P: direct parallel file I/O; parallel SciPy; installed Python, version check. Expanded SDK support for MATLAB and Python. STAR-P and the "star" logo are registered trademarks of Interactive Supercomputing, Inc. MATLAB is a registered trademark of The MathWorks, Inc. Other product or brand names are trademarks or registered trademarks of their respective holders. ISC's products are not sponsored or endorsed by The MathWorks, Inc. or by any other trademark owner referred to in this document.
Expanding Platform & Infrastructure
Workload Manager Integration: Adding support for Sun Grid Engine workload manager. Enabling deployment of Star-P at large supercomputing centers where jobs must be channeled through a workload manager (PBS Pro, LSF or SGE). Star-P jobs can be restricted to specific sets of computational resources.
Support for Microsoft Vista: Adds support for Star-P Client on desktops and laptops with Microsoft Vista operating systems (in addition to Linux and Microsoft XP).
Improving Performance
Low-Level Profiler: New PPBench tool. System-level profiler. Visibility into system performance (below Star-P): processor, memory, cache, network; type, size, speed. Quickly identify performance expectations and bounds for your system.
Improving Ease of Use
Star-P Status Dashboard: Star-P Dashboard for parallel computing. Server Heartbeat indicator. Four-color indicator: Server Starting (gray); Server Ready for parallel computing (green); Server Busy (blue); Server Disconnected (red).
Configuration and Event Log: Command line argument checking, warnings, error messages. Management tool for configuring Star-P Client. Management tool for configuring Star-P Server. Enabling/disabling user access to Star-P Server. Support for distribution / installation of Star-P Client software from Star-P Server.
Core Mathematical & Global Array Functionality
Adding New Global Array Functionality: Improved management of distributed matrix allocations. Page thrashing prevention. [Figure: Radar Algorithm Scaling on 16- and 128-CPU servers, 4 different sample sizes: 1.6 GB (100 Msamples), 4.8 GB (300 Msamples), 12.8 GB (800 Msamples), 32.0 GB (2 Gsamples); time (sec.) vs. number of CPUs.]
Support for Python
Expanding Support for Python: Parallel file I/O for Python. Task parallel computing with SciPy. Data parallel computing with 70+ Python functions. Star-P working with pre-installed Python, after a version compatibility check.
Python Early Adopter Program: Python Users: Raise Your Hand! ISC invites Python users to participate in our Star-P Python early adopter program. The range of ways to participate and contribute to the broad Python computing community includes: contributing interesting codes; helping steer the Star-P for Python product; contributing to publications; participating in beta testing programs. If you're interested in contributing to parallel technical computing in Python, please contact us at python-isc@interactivesupercomputing.com
Expanding Star-P Connect support for MATLAB and Python scripts
Star-P Connect Expansion: Adding support for calling Star-P Connect functions from Python programs.
Thank you