NEGST project
  Tools & software for flow processing in Grid networks
  Pascale.primet@inria.fr
Summary: our NEGST Y2007 activity
  Context
    GRIDNET-FJ: INRIA associated team
    NEGST: JST-CNRS (Naregi-Grid5000)
  Current activities
    TCP variants evaluation with GNET10 & Grid5000
    L2 switches behavior investigation
    Packet capture (GNET1) for traffic analysis
    Flow scheduling & network resource reservation
    MPI evaluations in Grid5000
    Grid5000-Naregi link establishment
  Some achievements
    2 PFLDnet 2007 papers (AIST & RESO), HSN07, GridNets07, Cluster07 submission
    Collaborative experiments/design, RESO & AIST, with GNET1 & GNET10
    Grid5000-Naregi link is quite ready to use
Tools and software for flow processing in Grid networks
Introduction
  Grid traffic is a mix of highly heterogeneous flows with specific characteristics and constraints.
  We propose to combine tools and software for processing flows at different levels and places of the grid:
    GNET1 & GNET10 (AIST): efficient flow/packet processing
    PSPacer (AIST): simple packet pacing
    GridMPI (AIST): efficient MPI WAN communications
    BDTS (RESO): a Grid service for controlled bulk data transfers
    FlowCTL (RESO): simple API & software for fine-grain flow time/rate control
    MetroGRID (RESO): a Grid service to measure and model grid traffic
    HIPCAL/eWAN (RESO): IP-level paradigms for virtual private clusterization
Packet processing: GNET1 & GNET10
  Latency emulation: explore wide-area behavior
  Precise measurement (delays, aggregate throughput)
  High-speed packet generation
    TCP variants evaluation (10G, 1 ms to 200 ms)
    L2 limits in congestion situations (starvation phenomenon)
    Evaluation of rate limitation mechanisms (PSPacer)
    Comparison of hardware and software emulation solutions (collaboration with MESCAL, Grenoble)
  High-speed packet capture
    Header extraction
    Header concatenation and sending
    Precise and fine-grain traffic & flow analysis
    => Concurrent alternative explored: network processor solution
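To give a sense of why the emulated latency range matters for TCP evaluation, the sketch below computes the bandwidth-delay product, i.e. the amount of data a single flow must keep in flight to fill the path. It assumes that the 1 ms to 200 ms range on the slide refers to emulated round-trip times on a 10 Gb/s path; the function name and the intermediate RTT values are illustrative only.

```python
def bdp_bytes(rate_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product in bytes: data in flight needed to fill the path."""
    return rate_bps * rtt_s / 8


RATE_BPS = 10e9  # 10 Gb/s path (assumption: the "10G" quoted on the slide)

# RTT values spanning the 1 ms to 200 ms range (intermediate points are arbitrary).
for rtt_ms in (1, 10, 100, 200):
    mib = bdp_bytes(RATE_BPS, rtt_ms / 1000) / 2**20
    print(f"RTT {rtt_ms:>3} ms -> window of about {mib:.1f} MiB")
```

The required window grows linearly with the round-trip time, which is why wide-area behavior has to be explored separately from local cluster behavior.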
TCP variants evaluation
  "A study of large flow interactions in high-speed shared networks with Grid5000 and GtrcNET-10 instruments", PFLDNET2007
  Romaric Guillier, Ludovic Hablot, Yuetsu Kodama, Tomohiro Kudoh, Fumihiro Okazaki, Pascale Vicat-Blanc Primet, Sébastien Soudan and Ryousei Takano
  [Figure: testbed. On each side, 1 to N (= 40) nodes with 1 or 10 GbE interfaces running iperf, attached over 10 GbE (GNET, G5K) and interconnected through the Grid5000 backbone or a 10 Gb/s WAN emulator (GtrcNET1 or GtrcNET10)]
  Next step: Naregi-Grid5000 link
Reverse traffic impact on efficiency and fairness
  [Guillier, Soudan, Primet: High Speed Networks workshop of Infocom2007]
  2 flows forward / 2 flows reverse
    Without reverse traffic: efficiency, fairness & equilibrium; around 470 Mb/s per flow, 940 Mb/s global
    Congesting reverse traffic: no efficiency, fairness & instability; around 160 Mb/s per flow
    Affects flow performance by 50% & global throughput; instability
  19 flows forward / 19 flows reverse
    Without reverse traffic: efficiency, fairness & equilibrium; around 470 Mb/s per flow, 9200 Mb/s global
    With reverse traffic: no efficiency, fairness & instability; around 400 Mb/s per flow, 8200 Mb/s global
    Affects flow performance by 20% & global throughput; instability
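The slide does not say how efficiency and fairness were quantified. As a hedged illustration, the sketch below uses two common definitions: efficiency as aggregate throughput divided by bottleneck capacity, and Jain's fairness index over per-flow rates. The bottleneck capacities (1 Gb/s and 10 Gb/s) and the unequal per-flow rates are assumptions made for the example, not values taken from the experiments.

```python
def efficiency(aggregate_mbps, capacity_mbps):
    """Fraction of the bottleneck capacity actually used by the flows."""
    return aggregate_mbps / capacity_mbps


def jain_index(rates):
    """Jain's fairness index: 1.0 for equal shares, 1/n if one flow takes everything."""
    total = sum(rates)
    squares = sum(r * r for r in rates)
    if squares == 0:
        return 0.0
    return total * total / (len(rates) * squares)


# Aggregate throughputs quoted on the slide; the capacities are ASSUMED.
print(efficiency(940, 1000))    # 2 flows, no reverse traffic
print(efficiency(9200, 10000))  # 19 flows, no reverse traffic
print(efficiency(8200, 10000))  # 19 flows, with reverse traffic

# HYPOTHETICAL per-flow rates in Mb/s, only to show how the index reacts.
print(jain_index([470, 470]))   # equal shares
print(jain_index([700, 240]))   # unequal shares
```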
Reverse traffic and congestion level [HSN 2007]
Packet capture with GNET1: plan
  Yichun Li, Yuetsu Kodama, Tomohiro Kudoh, Paulo Goncalvès, Pascale Primet
Activities around GridMPI evaluation in G5K
  Ludovic Hablot, Olivier Gluck, Pascale Primet, with the help of the GridMPI team at AIST
  [INRIA RR 6200, Cluster 2007 submission]
Bulk Data Transfer Scheduling (BDTS)
  Sebastien Soudan, Dinil M. Divakaran, Chen Cheng, Pascale Primet
  Goal of the service: improve bulk data transfers
    Delay determinism & reliability, by using
      => admission control & scheduling at flow level
      => dynamic provisioning at physical (optical) level
    Performance, i.e. resource utilization & transfer duration (utility functions)
      optimize resource allocation
      handle congestion control (TCP performs better)
      optimize on-demand resource provisioning
  BDTS can be seen as a general model for end-to-end (E2E) bandwidth reservation
BDTS usage example
BDTS: Reservation flexibility
  A bulk data transfer can start at any time after its arrival and run at any bandwidth, even a time-varying one, as long as it completes before its deadline.
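The flexibility rule above can be sketched as a feasibility check. This is a minimal illustration and not the BDTS implementation: it assumes a bandwidth profile given as non-overlapping constant-rate steps, and the names `Transfer` and `is_feasible`, as well as the units, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Transfer:
    volume: float    # amount of data to move (e.g. Gb)
    arrival: float   # earliest possible start time (s)
    deadline: float  # latest allowed completion time (s)


def is_feasible(transfer, profile):
    """Check a step-function bandwidth profile against a transfer request.

    profile is a list of (start, end, rate) steps with a constant rate
    inside each step; steps are ASSUMED not to overlap.
    """
    moved = 0.0
    for start, end, rate in profile:
        if rate < 0 or end < start:
            return False  # malformed step
        if rate > 0 and (start < transfer.arrival or end > transfer.deadline):
            return False  # data sent outside the [arrival, deadline] window
        moved += rate * (end - start)
    # Small tolerance to absorb floating-point rounding.
    return moved >= transfer.volume - 1e-9


request = Transfer(volume=100.0, arrival=0.0, deadline=60.0)
# Starts 10 s after arrival, then raises its rate: 2*20 + 3*20 = 100 units moved.
print(is_feasible(request, [(10, 30, 2.0), (30, 50, 3.0)]))
```

Any profile that stays inside the time window and moves the full volume is acceptable, which is what leaves the scheduler room to shape transfers around each other.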
BDTS problem: objective function
  Previous work minimized makespan under a network capacity constraint; our approach is to minimize congestion under time constraints:
    1. It can handle the case where different tasks have different (hard) time constraints
    2. It suits networks where capacity is provisioned based on the scheduling result (e.g. optical or overlay networks)
    3. Minimizing congestion improves the performance of coexisting best-effort interactive traffic
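One plausible way to write this objective down is given below. It is a sketch under assumed notation, not the formulation of the cited papers: transfer i has volume v_i, arrival time a_i, deadline d_i, rate r_i(t) and path P_i; link l has capacity C_l; lambda is the congestion factor to minimize.

```latex
\begin{aligned}
\min_{\lambda,\; r_i(\cdot)} \quad & \lambda \\
\text{s.t.} \quad
& \int_{a_i}^{d_i} r_i(t)\,dt = v_i && \forall i \\
& r_i(t) = 0 && \forall i,\; t \notin [a_i, d_i] \\
& \sum_{i \,:\, \ell \in P_i} r_i(t) \le \lambda\, C_\ell && \forall \ell,\; \forall t \\
& r_i(t) \ge 0 && \forall i,\; \forall t
\end{aligned}
```

With this reading, the deadlines are hard constraints and the capacity appears only through the factor lambda, which matches the case where capacity is provisioned after scheduling.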
Bulk Data Transfer Scheduling problem [Bin Bin Chen, Pascale Primet - CCGrid2007 - ICC2007]
Bulk Data Transfer Scheduling problem
BDTS: step function
Evaluation: simulation on random topology
BDTS current work & perspectives
  BDTS service specification and API definition
  BDTS scheduler implementation & deployment in Grid5000
  End-host based time/rate control mechanisms (FlowCTL) & interaction with high-speed TCP variants
  Interaction with candidate users (upper layers: workflow engines, SAGA, distributed file systems, RPC)
  Interaction with bandwidth/light path reservation services (lower layers)
Conclusion and perspectives
  Grid traffic is a mix of highly heterogeneous flows with specific characteristics and constraints:
    Large flows with QoS requirements (transfer delay & reliability)
    Small messages sensitive to the slow-start effect
    Grid management & control traffic
    High-performance encrypted channels
  We propose to combine tools and software for processing flows at different levels and places of the grid to:
    Measure & monitor the traffic (awareness, admission control)
    Differentiate the packets (IP QoS, traffic & flow management)
    Schedule the flows (congestion control)
    Manage, secure & virtualize the interconnection of virtual clusters
Future plan for our collaboration
  Context
    GRIDNET-FJ: INRIA associated team
    NEGST: JST-CNRS (Naregi-Grid5000)
    PhD student Sebastien Soudan at AIST (6 months, Lavoisier grant)
  Topics
    TCP variant benchmarking (eWAN) on the G5K-Naregi testbed (Romaric Guillier, PhD)
    Packet header capture on the Grid5000-Naregi link (Yichun Li, IE)
    Header capture & sampling (GNET10) for traffic analysis (Patrick Loiseau, PhD)
    FlowCTL, BDTS & G-Lambda GNS integration (Sebastien Soudan, PhD)
    Virtual clusters interconnection optimization (Dinil Mon Divakaran, PhD)
    MPI evaluations in Grid5000-Naregi (Ludovic Hablot, PhD)
  Other possible perspectives
    Collaborative experiments G5K & Naregi on Grid traffic measurement (Naregi), GridFTP (Osaka)
    Collaborative demo at SC07 (with AIST: GNET, eWAN, G5K, Naregi)
    Standardisation activities (GHPN, NML)
    RESO is hosting GridNets2007 (210-07) & CCGrid2008 & probably OGF23 in 05-2008 in Lyon