Parallel application development


1 Jan Ciesko, HPC Consultant, Microsoft
Agenda:
Microsoft HPC landscape: introduction, Microsoft HPC Server 2008, demo
Parallel application development: Microsoft Parallel Programming initiative, MPI / OMP / TPL, demos

2 Demo: rigid body / fluid interaction, FEM.

3 CFD and chemical transport, finite differences (FinDif). Electromagnetic dynamics, FEM.

4 Financial services and computing: parametric sweep. Observation: HPC is becoming a decisive factor for successful businesses.

5 Reducing application development complexity: providing development tools; supporting parallelization paradigms and application scenarios; offering libraries (abstraction).
Providing efficient HPC platform usage and administration: deployment; integration with infrastructure and policies; job submission and scheduling; diagnostics and failover redundancy.
Providing a high-performance computing platform: supporting shared and distributed memory systems; supporting fast interconnects; enhancing the operating system for high performance.
OS + HPC Tools + Front End + Back End + Dev

6 Microsoft HPC Server 2008

7 High-availability (failover) clusters: Server Cluster in Windows Server 2003 EE (up to 8 nodes); provides availability for mission-critical apps (DB, ERP, CRM).
Load-balancing clusters: Network Load Balancing in every version of Windows Server 2003 (up to 32 nodes); provides high availability for scale-out apps (web servers).
High-performance computing (HPC) clusters: improve performance by splitting a computational task across many different nodes; no imposed node limit.
Grid computing: closely related to cluster computing, but operates more like a computing utility than like a single computer; supports heterogeneous collections of computers that do not always trust each other.

8 New System Center UI; PowerShell for CLI management; high availability for head nodes; Windows Deployment Services; diagnostics/reporting; support for Operations Manager.
Support for SOA and WCF; granular resource scheduling; improved scalability for larger clusters; new job scheduling policies; interoperability via the HPC Profile.
NetworkDirect (RDMA) for MPI; improved Network Configuration Wizard; shared-memory MS-MPI for multi-core; MS-MPI integrated with Windows Event Tracing.
Improved iSCSI SAN support in Windows Server 2008; improved Server Message Block (SMB v2); new third-party parallel file system support for Windows; new memory cache vendors.

9 Support for larger clusters: new designs for clusters at scale, including heterogeneous clusters; scaled deployment and administration technologies; interfaces for those accustomed to *nix; improved interoperability with existing IT infrastructure and existing job schedulers; high-speed file I/O through native support for parallel and clustered file systems.
Broader application support: simplified integration of new applications with the job scheduler; addressing the needs of in-house and open-source developers.
Platform support: built for Windows Server 2008; cluster nodes with different hardware and software.
MPI priorities: performance comparable with hardware-optimized MPI stacks; focus on an MPI-only solution for version 2; verbs-based design for a close fit with native, high-performance networking interfaces; coordinated with the Windows Networking team's long-term plans.
Implementation: MS-MPI v2 is capable of four networking paths: shared memory between processors on a motherboard; the TCP/IP stack (normal Ethernet); Winsock Direct (and SDP) for sockets-based RDMA; and the new NetworkDirect RDMA networking interface. The HPC team partners with networking IHVs to develop and distribute drivers for this new interface.
(Diagram: a socket-based app runs over Windows Sockets (Winsock + WSD), TCP/IP, NDIS, and a networking miniport driver; an MPI app runs over MS-MPI with a NetworkDirect provider and a user-mode access layer, bypassing the kernel to reach RDMA networking hardware; components are labeled ISV, CCP, OS, and IHV.)
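
Which of the four paths carries a given run is decided at launch time, not in source code, so one MPI program serves them all. A minimal MPI.NET sketch that makes this visible by reporting where each rank landed; MPI.Environment.ProcessorName is assumed here to wrap MPI_Get_processor_name, as in the MPI.NET library:

using System;
using MPI;

class PathsDemo
{
    static void Main(string[] args)
    {
        using (new MPI.Environment(ref args))
        {
            Intracommunicator comm = Communicator.world;
            // Whether this traffic moves over shared memory, TCP/IP,
            // Winsock Direct, or NetworkDirect is a launch-time choice
            // of the MS-MPI runtime; nothing in this program changes.
            Console.WriteLine("Rank {0} of {1} on {2}",
                comm.Rank, comm.Size, MPI.Environment.ProcessorName);
        }
    }
}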

10 Highly available head node (with a failover head node) and WCF broker nodes, spanning a private network and a public network.
Control path and data path:
1. The user submits a job.
2. The Session Manager assigns a WCF broker node for the client job.
3. The head node provides the WCF broker node.
4. The client connects to the broker and submits requests.
5. Requests flow to the compute nodes.
6. Responses come back from the compute nodes.
7. Responses return to the client.
What the backend does: track service resource usage; run the service as the user; restart upon failure; write tracing; balance the requests; grow and shrink the service pool; provide WS interoperability.
What the admin sees: node heatmaps, Perfmon, event logs, job status, service usage reports, tracing logs.
What the user runs: workstation clients.
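
From the client's perspective, steps 1 through 7 reduce to a few calls. A hedged sketch against the SOA session API in the HPC Server 2008 SDK (Microsoft.Hpc.Scheduler.Session); the IEcho contract, the EchoService name, and the headnode address are hypothetical placeholders:

using System;
using System.ServiceModel;
using Microsoft.Hpc.Scheduler.Session;

// Hypothetical service contract; the matching service would be deployed
// to the compute nodes under the name used in SessionStartInfo below.
[ServiceContract]
public interface IEcho
{
    [OperationContract]
    string Echo(string input);
}

class SoaClient
{
    static void Main()
    {
        // Steps 1-3: ask the head node for a session; the Session Manager
        // assigns a WCF broker node and hands back its endpoint.
        SessionStartInfo info = new SessionStartInfo("headnode", "EchoService");
        using (Session session = Session.CreateSession(info))
        {
            // Step 4: connect to the broker the head node provided.
            ChannelFactory<IEcho> factory = new ChannelFactory<IEcho>(
                new NetTcpBinding(SecurityMode.Transport, false),
                session.EndpointReference);
            IEcho proxy = factory.CreateChannel();

            // Steps 5-7: the broker balances the requests across compute
            // nodes and routes the responses back to this client.
            Console.WriteLine(proxy.Echo("hello"));
            factory.Close();
        }
    }
}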

11 What is it? A draft OGSA (Open Grid Services Architecture) interoperability standard for batch job scheduler task submission and management, based on web services standards (HTTP, XML, SOAP).
What is its value? It enables integration of HPC applications executing on different platforms and schedulers via web services standards.
What's the status? It has passed the public comment period; work continues on new extensions.
It spans LSF / PBS / SGE / Condor on Linux, AIX, Solaris, HP-UX, and Windows, as well as Windows clusters. Demo.
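
To ground the web-services claim, a hedged sketch of what scheduler-neutral submission looks like: the JSDL namespace and the CreateActivity / GetActivityStatuses operations come from the JSDL and OGSA-BES specifications the profile builds on, while the BESFactoryPortTypeClient proxy is a hypothetical stand-in for one generated from a scheduler's WSDL:

using System;
using System.Xml;

class BesSubmit
{
    static void Main()
    {
        // A JSDL document describes the job in scheduler-neutral XML.
        XmlDocument jsdl = new XmlDocument();
        jsdl.LoadXml(
            "<jsdl:JobDefinition xmlns:jsdl=\"http://schemas.ggf.org/jsdl/2005/11/jsdl\">" +
            "<jsdl:JobDescription>...</jsdl:JobDescription>" +
            "</jsdl:JobDefinition>");

        // A proxy generated from the BES WSDL would expose the standard
        // operations; the names below follow the OGSA-BES port type:
        //   var bes = new BESFactoryPortTypeClient(binding, endpoint);
        //   var activityId = bes.CreateActivity(jsdl);
        //   var states = bes.GetActivityStatuses(new[] { activityId });
        Console.WriteLine(jsdl.OuterXml);
    }
}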

12 Particulars of the current Linpack runs:
Location: Tukwila, WA
Machines: 256 compute nodes, each with dual quad-core Intel 5320 Clovertown 1.86 GHz CPUs and 8 GB RAM (2,048 cores and 2 TB of RAM in total)
Networking: private and public networks on Broadcom GigE; MPI on Cisco InfiniBand SDR, including 34 IB switches in a leaf-and-node configuration
Best Linpack rating so far: 11.75 TeraFLOPS (77.1% of the roughly 15.2 TeraFLOPS peak of 2,048 cores at 1.86 GHz x 4 flops/cycle)
Best cluster efficiency so far: 77.1%
For comparison:
Linpack rating from the June 2007 Top500 run (#106) on the same hardware: 8.99 TeraFLOPS
Cluster efficiency from the June 2007 Top500 run (#106) on the same hardware: 59%
Typical Top500 efficiency for Clovertown motherboards with IB, regardless of operating system: 65-77% (2 instances of 79%)
That is a 30% improvement in efficiency on the same hardware (77.1 / 59 is about 1.3).
Release: Beta 1 Nov 07, RTM Summer 08. Less than 2 hours to deploy.

13 Development and parallel debugging in Visual Studio; 3rd-party compilers, debuggers, runtimes, etc. are available.
Emerging technologies: the Parallel Framework; LINQ/PLINQ, a natural OO language for SQL-style queries in .NET C#; C# Futures, a way to explicitly make loops parallel.
For the future, the Parallel Computing Initiative (PCI): a tripled investment with a new engineering team covering compilers, profilers and tracers, and debuggers, focused on common tools for developing multi-core code from desktops to clusters.
Compilers: Visual Studio, Intel C++, gcc, PGI Fortran, Intel Fortran, Absoft Fortran, Fujitsu.
Profilers and tracers: PerfMon, ETW (for MS-MPI), VSPerf/VSCover, CLRProfiler, Vampir (being ported to Windows), Intel Collector/Analyzer (runs on CCS with Intel MPI), VTune & CodeAnalyst.
Debuggers: Marmot (being ported to Visual Studio), WinDbg, DDT.
Runtimes and libraries: MPI, OpenMP, C# Futures, MPI.C++ and MPI.NET, PLINQ.

14 MPI: the MPI standard, MPICH2-based with Windows-related enhancements, plus RDMA in CCS v2.
OpenMP: available for C++ and C++/CLI (OpenMP for managed/.NET code!).
C# Futures: a library for writing concurrent C# programs: Parallel.For etc., using generics and anonymous delegates.
MPI.C++ and MPI.NET: based on the Boost.MPI library: optimized, natural integration of MPI with C++ and the STL.
PLINQ: (Parallel) Language Integrated Query: SQL-like syntax built into C#: joins, maps, reductions, sorts, ...; being extended to clusters (sketched below).
App integration: employs SOA for load balancing a web-service-like interactive app on a cluster; parametric sweeps for embarrassingly parallel jobs.
Job scheduler: updated interface for ease of use.
GPGPU: integrated support for data-parallel programming in C# with rich syntax (research compiler); offload to generic GPGPUs; various 3rd-party solutions that integrate into VS.
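
A small sketch of the PLINQ style described above, written against the namespaces where PLINQ eventually shipped in .NET 4 (System.Linq); the CTP builds contemporary with this deck packaged it differently:

using System;
using System.Linq;

class PlinqSketch
{
    static void Main()
    {
        int[] data = Enumerable.Range(1, 1000000).ToArray();

        // AsParallel() opts the query into parallel execution; the
        // operators (filter, map, reduction) read exactly like LINQ.
        double result = data.AsParallel()
                            .Where(n => n % 3 == 0)
                            .Select(n => Math.Sqrt(n))
                            .Sum();
        Console.WriteLine(result);
    }
}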

15
using System;
using MPI;

class MPIHello
{
    static void Main(string[] args)
    {
        using (new MPI.Environment(ref args))
        {
            // MPI program goes here!
        }
    }
}

using System;
using MPI;

class Ring
{
    static void Main(string[] args)
    {
        using (new MPI.Environment(ref args))
        {
            Communicator comm = Communicator.world;
            if (comm.Rank == 0)
            {
                // program for rank 0
            }
            else // not rank 0
            {
                // program for all other ranks
            }
        }
    }
}
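
To make the two branches concrete, a minimal completion of the Ring skeleton, assuming MPI.NET's generic Send / Receive<T> point-to-point methods: rank 0 starts a token around the ring and every other rank forwards it.

using System;
using MPI;

class RingToken
{
    static void Main(string[] args)
    {
        using (new MPI.Environment(ref args))
        {
            Intracommunicator comm = Communicator.world;
            int next = (comm.Rank + 1) % comm.Size;
            int prev = (comm.Rank + comm.Size - 1) % comm.Size;

            if (comm.Rank == 0)
            {
                comm.Send("token", next, 0);                  // start the token
                string back = comm.Receive<string>(prev, 0);  // wait for it to return
                Console.WriteLine("Token came back: " + back);
            }
            else // not rank 0
            {
                string token = comm.Receive<string>(prev, 0); // take the token
                comm.Send(token, next, 0);                    // pass it along
            }
        }
    }
}

Run under mpiexec with at least two ranks, e.g. mpiexec -n 4 RingToken.exe.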

16 System.Concurrency.dll
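
System.Concurrency.dll was the assembly carrying the Task Parallel Library preview; its loop construct survives as Parallel.For. A sketch in the shape the API eventually shipped in .NET 4 (System.Threading.Tasks), which differs in packaging from the CTP assembly named on the slide:

using System;
using System.Threading.Tasks;

class TplSketch
{
    static void Main()
    {
        double[] values = new double[1000];

        // The loop body is an anonymous delegate; the runtime partitions
        // the index range across the available cores.
        Parallel.For(0, values.Length, i =>
        {
            values[i] = Math.Sqrt(i);
        });

        Console.WriteLine(values[999]);
    }
}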

17 Demos

18 Roadmap: Tech Preview (Aug 2007), Beta 1 (Nov 2007), Beta 2 (early 2008), RTM (Summer 2008).
Technical Preview: private release.
Beta 1: publicly available now!
Beta 2: coincides with Windows Server 2008 RTM.
RTM: 90 to 120 days after Windows RTM.

19 Technical communities, webcasts, blogs, chats & user groups.
Product information: us/office/bb aspx
Microsoft Developer Network (MSDN) & TechNet: us/office/aa aspx
Trial software and virtual labs: us/sharepointserver/fx aspx