Running Native Lustre* Client inside Intel Xeon Phi coprocessor


Dmitry Eremin, Zhiqi Tao and Gabriele Paciucci, 08 April 2014. * Some names and brands may be claimed as the property of others.

What is the Intel Xeon Phi coprocessor?
Up to 61 cores and 244 threads per coprocessor
8 GB of memory
Gen2 x16 PCI Express: up to 8 GB/s
8 memory controllers supporting up to 16 GDDR5 channels at up to 5.5 GT/s (up to 352 GB/s bandwidth)
http://intel.com/xeonphi

Typical Cluster Configuration
[Diagram: cluster nodes connected through an Ethernet switch and an InfiniBand switch]

Typical Cluster Configuration with Intel Xeon Phi coprocessors
[Diagram: cluster nodes with Intel Xeon Phi coprocessors, connected through an Ethernet switch and an InfiniBand switch]

Intel Xeon Phi coprocessor Software Architecture
Intel Xeon Phi coprocessors are based on the Intel Many Integrated Core (Intel MIC) architecture
Intel Manycore Platform Software Stack (Intel MPSS)
The Symmetric Communication Interface (SCIF) API is the communication backbone between the host processors and the Intel Xeon Phi coprocessors in a heterogeneous computing environment


Intel Xeon Phi coprocessors: Static Pair Topology
Every Intel Xeon Phi coprocessor card is assigned to a separate subnet known only to the host
The host acts as a router for all network communications
[Diagram: eth0 12.12.12.1 on the host; first card pair 10.10.10.1 (host) / 10.10.10.2 (card); mic1 pair 11.11.11.1 (host) / 11.11.11.2 (card)]
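In the static pair topology the host must forward traffic between the per-card subnets and eth0. A minimal sketch of the host-side routing setup, assuming the addressing from the diagram (the MASQUERADE rules are one possible choice; static routes on the external network would also work):

```shell
# Enable packet forwarding so the coprocessors (10.10.10.2 and
# 11.11.11.2) can reach the external network through eth0.
sysctl -w net.ipv4.ip_forward=1

# NAT the traffic from both per-card subnets out of eth0.
iptables -t nat -A POSTROUTING -s 10.10.10.0/24 -o eth0 -j MASQUERADE
iptables -t nat -A POSTROUTING -s 11.11.11.0/24 -o eth0 -j MASQUERADE
```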

Intel Xeon Phi coprocessors: Internal Bridge Topology
Multiple Intel Xeon Phi coprocessor virtual network interfaces on the same host are bridged together
The host acts as a router for external network communications
[Diagram: eth0 12.12.12.1; br0 10.10.10.1 on the host; cards at 10.10.10.2 and 10.10.10.3 (mic1)]
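Intel MPSS normally sets this bridge up through its own configuration tooling; the equivalent manual steps can be sketched as follows, assuming the host ends of the coprocessor virtual Ethernet interfaces are named mic0 and mic1:

```shell
# Create the internal bridge and attach the host ends of the
# coprocessor virtual Ethernet interfaces to it.
brctl addbr br0
brctl addif br0 mic0
brctl addif br0 mic1

# Give the bridge the host address from the shared subnet.
ip addr add 10.10.10.1/24 dev br0
ip link set br0 up
```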

Intel Xeon Phi coprocessors: External Bridge Topology
The Intel Xeon Phi coprocessor virtual connections are bridged to a physical Ethernet device
Network communications are independent of the host
Maximum MTU is 9000
[Diagram: eth0 attached to br0 12.12.12.1 on the host; cards directly on the external subnet at 12.12.12.2 and 12.12.12.3 (mic1)]
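A manual sketch of the external bridge, assuming host-side interface names mic0 and mic1 (adjust to your MPSS configuration); the MTU is raised to the 9000-byte maximum noted above:

```shell
# Attach the physical NIC and the coprocessor interfaces to one
# bridge so the cards appear directly on the external subnet.
brctl addbr br0
brctl addif br0 eth0
brctl addif br0 mic0
brctl addif br0 mic1

# Jumbo frames: the virtual Ethernet path supports an MTU up to 9000.
ip link set eth0 mtu 9000
ip link set mic0 mtu 9000
ip link set mic1 mtu 9000
ip link set br0 mtu 9000 up

# The bridge carries the host address on the external subnet.
ip addr add 12.12.12.1/24 dev br0
```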

Data transfer over virtual Ethernet
All data passes through host memory
Intensive data transfer causes intensive host CPU and memory usage
[Diagram: data path from MIC0 and MIC1 memory through the host CPU and host memory to eth0]

The NFS throughput over virtual Ethernet (MTU 1500)
[Chart: throughput in MB/sec (0 to 25) versus thread count (1 to 160) for readers, re-readers, writers and rewriters]
Results have been estimated based on internal Intel analysis and are provided for informational purposes only. Any difference in system hardware or software design or configuration may affect actual performance.

The NFS throughput over virtual Ethernet (MTU 64512)
[Chart: throughput in MB/sec (0 to 200) versus thread count (1 to 160) for readers, re-readers, writers and rewriters]


The virtual IB interface in the Intel Xeon Phi coprocessor
Became available in Intel MPSS starting with v3.1
Requires the OFED 1.5.4.1 InfiniBand drivers to be installed
Supports Intel True Scale and Mellanox fabrics
[Diagram: host ib0 shared with ib0 on MIC0 and ib0 on MIC1 (mic1)]

Data transfer over virtual IB
RDMA transfers are passed directly to the RDMA device
The RDMA device hardware is shared between the Linux host and the Intel Xeon Phi coprocessor applications
[Diagram: MIC0 and MIC1 memory transferring directly to the RDMA device, bypassing host memory copies]

The Coprocessor Operating System (coprocessor OS)
Based on standard Linux kernel source code
The coprocessor OS is minimal:
BusyBox minimal shell environment
Linux Standard Base (LSB) core libraries
Can be extended with loadable kernel modules (LKMs)
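Because the coprocessor OS accepts LKMs, a module cross-built for the k1om architecture can be copied to a card and loaded over ssh. A minimal sketch, where the card hostname mic0 and the module path are assumptions:

```shell
# Copy a k1om-built kernel module to the first coprocessor card.
scp ./lnet.ko mic0:/tmp/

# Load it inside the coprocessor OS.
ssh mic0 insmod /tmp/lnet.ko

# Confirm the module is resident on the card.
ssh mic0 lsmod | grep lnet
```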

The Lustre* 2.4 Client for Intel Xeon Phi coprocessor
Download the SOURCE package from the "MPSS 2.1 release for Linux" section
Unpack it, then unpack the package-full_srck1om.tar.bz2 file in any location
Execute the following commands in that location:
# export PATH=/usr/linux-k1om-4.7/bin:$PATH
# make defconfig-miclinux
# make -C card/kernel ARCH=k1om modules_prepare
# sh autogen.sh
# ./configure --with-linux=<unpacked_path>/card/kernel \
    --without-o2ib \
    --host=x86_64-k1om-linux --build=x86_64-pc-linux
# make rpms

The Lustre* 2.5 Client for Intel Xeon Phi coprocessor
Download the SOURCE, mpss-3.x-k1om.tar and OS-specific files from the "MPSS 3.x release for Linux" section
Unpack from them the kernel-dev-*.rpm, ofed-driver-*-devel-*.rpm and linux-*.tar.bz2 files
Prepare the Intel MPSS sources for the Lustre* build:
# rpm2cpio kernel-dev-*.rpm | cpio -idm
# rpm2cpio ofed-driver-*-devel-*.rpm | cpio -idm
# tar xjvf linux-*.tar.bz2 && cd linux-*
# cp -f ../boot/config-* .config
# cp -f ../boot/module.symvers-* Module.symvers
# . /opt/mpss/3.x/environment-setup-k1om-mpss-linux
# make ARCH=k1om silentoldconfig modules_prepare

The Lustre* 2.5 Client for Intel Xeon Phi coprocessor (cont.)
Build the Lustre* sources for the Intel Xeon Phi coprocessor against the Intel MPSS sources, with virtual IB support:
# . /opt/mpss/3.x/environment-setup-k1om-mpss-linux
# sh autogen.sh
# ./configure --with-linux=<path>/linux-2.6.38+mpss3.x \
    --with-o2ib=<path>/usr/src/ofed-driver-*.el6.x86_64 \
    --host=k1om-mpss-linux --build=x86_64-pc-linux
# make rpms

Install & Configure the Lustre* Client
Only two Lustre* RPMs need to be installed on the host:
lustre-client-mic-<version>.x86_64.rpm
lustre-client-mic-modules-<version>.x86_64.rpm
Host configuration in /etc/modprobe.d/lustre.conf:
options lnet networks="o2ib0(ib0)"
Mounting on the host:
# mount -t lustre 8.8.8.8@o2ib:/lustrefs /mnt/lustrefs
Xeon Phi configuration, the same /etc/modprobe.d/lustre.conf entry pushed to the card over ssh:
# ssh mic0 "echo 'options lnet networks=\"o2ib0(ib0)\"' > /etc/modprobe.d/lustre.conf"
Mounting on the Xeon Phi:
# mount.lustre 8.8.8.8@o2ib:/lustrefs /mnt/lustrefs
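Once the modules are loaded and the file system is mounted on both sides, the LNet setup and the mount can be sanity-checked with the standard Lustre utilities. A minimal sketch; the 8.8.8.8@o2ib server NID and /mnt/lustrefs mount point follow the configuration above:

```shell
# List the local LNet network identifiers; an o2ib0 NID should appear.
lctl list_nids

# Ping the Lustre server over LNet to verify o2ib connectivity.
lctl ping 8.8.8.8@o2ib

# Report per-OST/MDT space usage on the mounted file system.
lfs df -h /mnt/lustrefs
```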

The Lustre* throughput over virtual IB
[Chart: throughput in MB/sec (0 to 1200) versus thread count (1 to 160) for readers, re-readers, writers and rewriters]

Server and storage infrastructure, networking and hardware support were supplied by Intel's High Performance Computing Labs in Swindon (UK). Special thanks to Jamie Wilcox and Adam Roe. Questions?