International Telecommunication Union ITU-T

Factoring the Network into The GRID

Monique J. Morrow, Distinguished Consulting Engineer, Cisco
Cisco Team: Masum Z. Hasan, Nino Vidovic, Wayne Clark, Horst Dumcke, Dragan Milosavljevic, Monique Morrow
Agenda
1. Factoring the Network into the Grid
2. Video Rendering Use Case
3. Multi-autonomous Domain Constructs, Challenges, and Summary
Network Factored into Grid
Network: (ISO/OSI) network Layers 1, 2, 3, and 4 (L1/2/3/4)
Issues we will encounter in factoring the network into the Grid:
1. High-speed data transfer
2. Network abstraction
   A. Network virtualization
3. Network resources and services to factor into the Grid
   A. Generally abstracted resources and services
4. Dynamic provisioning of network resources for the Grid
5. Network-based performance/QoS monitoring
Advantages of Factoring the Network into the Grid
Factoring the network into the Grid will facilitate:
- Better use of resource-intensive applications
- Better overall performance (performance/price ratio)
- Network-aware Grid services
  - Network-aware global Grid job scheduling
  - BW, QoS, and other network service provisioning (indirectly)
  - Fine-tuning of network parameters on-demand
Fine-tuning of network parameters and on-demand network resource provisioning will facilitate moving HPC and other resource-intensive applications to the Grid.
High-Speed Data Transfer
Many Grid applications require high-speed data transfer over high bandwidth-delay product (BDP) links.
TCP or TCP-based protocols are the most widely used. No matter what the speed/capacity of a link is, throughput may suffer:
TCP Throughput <= 1.2 * MSS / (RTT * sqrt(packet_loss))
- 180 ms RTT (Geneva-LA), packet loss 0.1% (0.001):
  1.2 * 1460 * 8 / (0.18 * sqrt(0.001)) ~= 2.5 Mbps throughput
- Frame size (jumbo Ethernet frame) of 9000 bytes: ~15 Mbps
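As a quick sanity check of the two figures above, here is a minimal Python sketch of the quoted throughput bound (the Mathis et al. formula). The MSS values of 1460 and 8960 bytes are assumptions for standard and 9000-byte jumbo Ethernet frames, respectively.

```python
# Worked check of the TCP throughput bound quoted above.
from math import sqrt

def tcp_throughput_bps(mss_bytes: float, rtt_s: float, loss: float) -> float:
    """Upper bound on single-stream TCP throughput, in bits per second."""
    return 1.2 * mss_bytes * 8 / (rtt_s * sqrt(loss))

rtt, loss = 0.180, 0.001                            # Geneva-LA: 180 ms, 0.1% loss
print(tcp_throughput_bps(1460, rtt, loss) / 1e6)    # ~2.5 Mbit/s, standard frame
print(tcp_throughput_bps(8960, rtt, loss) / 1e6)    # ~15 Mbit/s, jumbo frame
```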
TCP Throughput Tuning
- Set the TCP buffer size to the bandwidth-delay product
- High-Speed TCP (RFC 3649)
- Parallel TCP streams (GridFTP)
- Jumbo frames
- TCP Offload Engine (TOE)
- RDMA (Remote Direct Memory Access: bypass the kernel for data copies)
CERN-Geneva to CalTech transfer using Cisco devices:
- All intermediate routers/switches on the path supported a 9000-byte MTU
- The TCP buffer was configured properly
- Memory-to-memory data transfer at 6.5 Gbps with a single TCP stream between CERN Geneva and LA (for CERN LHC data transfer)
This is about factoring the network into the Grid.
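The first tuning item, sizing the TCP buffer to the bandwidth-delay product, can be illustrated with a short Python sketch. The 10 Gbit/s capacity and 180 ms RTT are assumed figures, and the OS may clamp the requested buffer sizes to its configured maxima.

```python
# Minimal sketch: sizing TCP socket buffers to the bandwidth-delay product.
import socket

capacity_bps = 10e9                          # assumed link capacity
rtt_s = 0.180                                # assumed round-trip time
bdp_bytes = int(capacity_bps * rtt_s / 8)    # ~225 MB for this path

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Request send/receive buffers of one BDP; the OS may clamp these to its
# configured limits (e.g. net.core.rmem_max / wmem_max on Linux).
sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, bdp_bytes)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, bdp_bytes)
```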
TCP Friendliness
- Aggressive TCP tuning or High-Speed TCP may potentially starve traffic belonging to standard TCP
- One should be careful when deploying these in commercial environments (mixing with the standard stack)
- Tools that factor the network into the Grid can control this
Network Abstraction: Domain and Layer Separation
Domain and layer separation is predominant in the network world:
- ISO/OSI or protocol layer separation
- UNI (User to Network Interface) between network domains:
  - Between client and server networks
  - Between a customer and a Network Service Provider (NSP)
  - Between an IP data network and a Sonet/SDH/optical transport network
- Separation of management and control between data and transport networks in an NSP environment
If Enterprise IT (EIT) or an NSP has to offer Grid services on existing infrastructure, similar domain and layer separation is needed in the Grid environment.
Network Abstraction: Domain and Layer Separation
What can or cannot be done? Consider the following questions. Should an L7 Grid system (such as a Grid application or a global Grid job scheduler):
- Be allowed to configure the full capabilities of a router, such as configuring routing protocols?
- Be allowed to get full access to the network topology to make scheduling decisions?
- Be aware of the wide variety of network technologies (Ethernet, optical, MPLS, etc.) on which a Grid can be built?
- Be aware of which QoS mechanism is being used: 802.1p, IntServ, DiffServ, MPLS?
- Be aware of various signaling protocols: classical RSVP, SIP, H.323?
The general answer is no:
- An NSP or Enterprise IT will not allow access to router configuration or full topology
- Separation can be ignored in the research part of National Research Networks (NRN) or in purely research networks, but it cannot be in commercial networks
Network Abstraction: Domain and Layer Separation
L7 Grid Systems (L7GS): Grid applications, Grid clients, Grid middleware components.
[Figure: two enterprises (ABC and XYZ), each with two sites; each site's L7 Grid domain sits behind a firewall, load balancer, IDS, SSL accelerator, and campus backbone LAN with a CE router (CE1-CE4), interconnected over an NSP MAN/WAN core (PE1-PE4, P1-P3).]
An L7GS residing in the yellow region (the L7 Grid domain):
- Can see the abstract topology, shown in red
- Cannot see the full physical (or logical) topology of the NSP/EIT
- Cannot directly configure and provision the EIT or NSP network elements (NEs)
With the knowledge of the abstract topology, an L7GS can perform network-aware Grid functions.
Network Abstraction: Domain and Layer Separation
L7 Grid Systems (L7GS): Grid applications, Grid clients, Grid middleware components.
[Figure: the same four sites (ABC Site1/Site2, XYZ Site1/Site2) reduced to their LANs and L7 Grid domains, connected by an abstract MAN/WAN cloud.]
- This abstract topology is what an L7GS may be allowed to see
- It requires modeling of abstract resources
A function serves such an abstracted topology to an L7GS, along with the relevant network service interfaces:
- Provisioning
- Monitoring
- Network parameter tuning
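To make the "modeling of abstract resources" point concrete, here is a hypothetical Python sketch of the kind of abstract topology object an L7GS might be handed: sites and abstract links only, with all NSP/EIT-internal nodes (PE/P routers) elided. The class and field names are illustrative, not a real API.

```python
# Hypothetical model of the abstract topology exposed to an L7GS.
from dataclasses import dataclass, field

@dataclass
class AbstractSite:
    name: str                      # e.g. "ABC Site1"

@dataclass
class AbstractLink:
    a: AbstractSite
    b: AbstractSite
    available_bw_mbps: float       # abstracted: realized by MPLS TE, Sonet/SDH, a lambda, ...

@dataclass
class AbstractTopology:
    sites: list[AbstractSite] = field(default_factory=list)
    links: list[AbstractLink] = field(default_factory=list)

# The L7GS sees two sites and one abstract link, nothing of the provider core:
abc1, xyz1 = AbstractSite("ABC Site1"), AbstractSite("XYZ Site1")
topo = AbstractTopology([abc1, xyz1], [AbstractLink(abc1, xyz1, 1000.0)])
```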
Network Resources and Services
Not all resources or interfaces may be factored in:
- Network elements themselves or their links
- Protocol resources and their configuration interfaces
Resources and interfaces that may be factored in are network service related:
- Bandwidth
- QoS
- VPN
- Firewall
Resources will be abstracted. For example (see the sketch below):
- The exact nature of the QoS mechanism, such as DiffServ, IntServ, 802.1p, or MPLS, will be hidden from the L7GS
- The exact nature of a bandwidth tunnel, such as an MPLS TE LSP tunnel, a Sonet/SDH circuit, or a DWDM lambda LightPath, will be hidden from the L7GS
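A hedged sketch of what such abstraction could look like in code: the L7GS programs against a generic bandwidth service interface, while the provider-side realization (here, hypothetically, an MPLS TE one) stays hidden. All names are invented for illustration.

```python
# Illustrative service abstraction: the mechanism is hidden from the L7GS.
from abc import ABC, abstractmethod

class BandwidthService(ABC):
    """What an L7GS sees: request bandwidth between two sites, nothing more."""
    @abstractmethod
    def reserve(self, src: str, dst: str, mbps: float) -> str: ...

class MplsTeBandwidthService(BandwidthService):
    """Provider-side realization; the L7GS never learns the tunnel is an MPLS TE LSP."""
    def reserve(self, src: str, dst: str, mbps: float) -> str:
        # ... set up the TE LSP via the provider's own management interfaces ...
        return f"reservation:{src}->{dst}:{mbps}Mbps"

svc: BandwidthService = MplsTeBandwidthService()
print(svc.reserve("ABC Site1", "XYZ Site1", 500.0))
```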
On-demand/Dynamic Network Provisioning
Network administrators or operators (Service Providers) generally limit or prioritize resource usage:
- Example: policing, queuing (based on packet marking)
- Demand for more bandwidth or priority may not be serviced immediately, even though resources are available, because of static configuration
- Demand for new features, such as a new QoS priority for a new application, cannot be serviced immediately
Applications, of course, can use dynamic signaling protocols like classical RSVP, but the signaling solution has limitations:
- A wide variety of Grid applications would need to be modified to support signaling
- Applications may need to support more than one signaling mechanism in a heterogeneous environment like the Grid
- Signaling such as classical RSVP may not be supported end-to-end (E2E)
But applications somehow should be able to signal (request) bandwidth, QoS, and other network services (such as VPN or firewall related) on-demand.
On-demand/Dynamic Network Provisioning (contd.)
On-demand or dynamic network provisioning does not necessarily mean (see the sketch below):
- Immediate provisioning (rather, future resource scheduling)
- Full provisioning:
  - Network resources are provisioned where and when necessary
  - It may make use of existing configurations
- One single signaling mechanism:
  - Resources may be provisioned E2E using a combination of, for example, classical RSVP, RSVP aggregation, CLI, XML, and other interfaces
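A minimal sketch, with invented names throughout, of a provisioning request carrying a future time window: the request describes the desired resource and schedule, while the provisioning system chooses where, when, and with which mechanism mix to realize it.

```python
# Hypothetical on-demand provisioning request with future resource scheduling.
from dataclasses import dataclass
from datetime import datetime

@dataclass
class ProvisioningRequest:
    src_site: str
    dst_site: str
    bandwidth_mbps: float
    start: datetime            # when the resource must become available
    end: datetime              # the reservation may be torn down afterwards

def provision(req: ProvisioningRequest) -> None:
    # The provisioning system may realize the request with any mix of
    # mechanisms (classical RSVP, RSVP aggregation, CLI, XML interfaces),
    # and only where existing configuration does not already satisfy it.
    print(f"scheduled {req.bandwidth_mbps} Mbps "
          f"{req.src_site}->{req.dst_site} for {req.start}..{req.end}")

provision(ProvisioningRequest("ABC Site1", "XYZ Site1", 500.0,
                              datetime(2006, 5, 1, 8), datetime(2006, 5, 1, 12)))
```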
Grid Across a Wider Network Domain
In a Grid environment there is a strong need to facilitate:
- Resource sharing across a wider network domain
  - Not just a single cluster or LAN, but also the enterprise campus network, multi-site data centers, MAN, and WAN
- Deployment of applications over a wider network domain, which Grid users generally shy away from for lack of control over the wider network
- Execution of High-Performance Computing (HPC) applications across a wider network domain (WND)
  - Latency in a WND may far exceed that of the fast interconnects (IPC fabric) used in (single-rack/room) dedicated clusters or supercomputers
  - Many HPC applications with high resource demand can still be deployed in a WND
- Cost-effective Grid resource sharing and deployment
  - With control over the network, the domain of resources to be shared is widened
  - For example, resources (servers, clusters) in an enterprise distributed globally across multiple locations (SJ, RTP, Europe, India) can be pooled into a Grid
  - There is not necessarily a need for a supercomputer or high-end clusters
Use Case: Video Rendering on the Grid
Video rendering and encoding after post-production video editing. This is a resource-intensive application from both a compute and a network perspective:
- Compute resources: 1 min of video may take 10 min to render
  - For example, Pixar's Monsters, Inc.: 150 days on a 2000-node cluster with one GFLOP each
- Network resources:
  - Initial loading of the application on a large number of processors
  - During execution, data may be carried frequently back and forth between processors, and between processors and storage
In the testbed (illustrated in the sketch below):
- Video data is rendered on 4 (grid) machines in parallel
- A function creates 3 CoSLinks from the source machine (WS1) to 3 other machines (WS2-WS4) to transfer data
- Video data is transferred over the provisioned CoSLinks, then rendered and merged back
- One CoSLink is then created from the machine containing the rendered video to a client machine, and the video is streamed
[Figure: four sites, each with a workstation (WS1-WS4) behind a switch and CE router (SW1-SW4, CE1-CE4), interconnected across a provider core (PE1-PE4, P1-P3).]
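The testbed workflow can be summarized in an illustrative Python sketch; every function here stands in for the testbed's actual tooling and is not a real API.

```python
# Illustrative reconstruction of the video-rendering testbed workflow.
SOURCE, WORKERS, CLIENT = "WS1", ["WS2", "WS3", "WS4"], "client"

def create_coslink(a: str, b: str) -> str:
    """Stand-in for provisioning a CoSLink between two machines."""
    return f"coslink:{a}-{b}"

# Step 1: provision 3 CoSLinks from the source machine to the workers.
links = [create_coslink(SOURCE, w) for w in WORKERS]
# Step 2: ship video segments over the provisioned links and render in parallel.
segments = {w: f"segment-for-{w}" for w in WORKERS}
rendered = [f"rendered({seg})" for seg in segments.values()]
# Step 3: merge the rendered pieces back on the source machine.
merged = "+".join(rendered)
# Step 4: provision one CoSLink to the client and stream the result.
stream_link = create_coslink(SOURCE, CLIENT)
print(f"streaming {merged} over {stream_link} (data links: {links})")
```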
Factoring the Network into the Grid: Multi-Autonomous System Network
- E2E provisioning and QoS are specifically required in Grid environments, as multiple organizations may participate in a Grid for resource sharing
- Should be able to provide E2E multi-AS provisioning support in [G]MPLS networks
  - Inter-AS/CsC TE LSP and MPLS VPN are supported in MPLS devices
- Multi-AS QoS is a challenge
  - Standards forum support, such as IPSphere and the IETF, is needed
[Figure: example of E2E provisioning in multi-region/provider, multi-technology, multi-layer (IP, transport/optical) networks: an inter-domain GMPLS-TE LSP from A to Z stitched from intra-domain GMPLS-TE LSP segments across O-UNI and E-NNI boundaries between metro, regional, and core domains; route controllers/speakers form the control plane (E-NNI routing: OSPF; metro routing adjacency), border NEs the data plane.]
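As a conceptual illustration of the stitching idea in the figure, the sketch below composes per-domain segments into an E2E path, checking that adjacent segments meet at a common border NE. All types and names are assumptions, not an actual GMPLS interface.

```python
# Conceptual sketch: stitching intra-domain segments into an E2E multi-AS path.
from dataclasses import dataclass

@dataclass
class Segment:
    domain: str        # autonomous system / provider domain
    ingress: str       # boundary NE where the segment starts (O-UNI / E-NNI)
    egress: str        # boundary NE where the segment ends

def stitch_e2e(segments: list[Segment]) -> list[str]:
    """Verify boundary continuity and return the stitched E2E path."""
    for prev, nxt in zip(segments, segments[1:]):
        assert prev.egress == nxt.ingress, "segments must meet at a border NE"
    return [f"{s.domain}:{s.ingress}->{s.egress}" for s in segments]

# A -> B in the first metro domain, B -> F across the core, F -> Z in the far metro:
print(stitch_e2e([Segment("metro1", "A", "B"),
                  Segment("core", "B", "F"),
                  Segment("metro2", "F", "Z")]))
```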