Understanding Latency in Software Defined Networks Junaid Khalid Keqiang He, Sourav Das, Aditya Akella Li Erran Li, Marina Thottan 1
Latency in SDN Mobility Centralized Controller Openflow msgs 1 flows Time taken to install 1 rules? Can Some be as XXX large msecs?? as 1 secs!!! 2
Do applications care about latency? Intra-DC Traffic Engineering Latency is critical to many applications MicroTE routes predictable traffic on time scales of 1-2s Route Limits updates the applicability can take as long of SDN as applications.5s Latency can undermine MicroTE s effectiveness Fast Failover Applications assume latency is low and constant Reroute the affected flows quickly in face of failures Longer update time increases congestion and drops Latency can inflate failover time by nearly 2s! 3
Factors contributing to Latency? Robust Control Software Design and Distributed Controllers Speed of Control Programs and Network Latency Not received much attention Latency in network switches 4
Our Work Two contributions: Systematic experiments to explore latencies in production switches Latency in network switches Design a framework to overcome the impact of latencies 5
Outline Motivation Elements of Latency Measurement Methodology Inbound Latency Outbound Latency Insertion Modification Deletion 6
Outline Motivation Elements of Latency Measurement Methodology Inbound Latency Outbound Latency Insertion Modification Deletion 7
PCI Elements of Latency 1. Inbound Latency Controller Switch Switch Fabric CPU board I1 Memory DMA Forwarding Engine Lookup I2 ASIC SDK CPU I3 No Match Hardware Tables OF Agent Inbound Latency I1: Send to ASIC SDK I2: Send to OF Agent I3: Send to Controller PHY PHY Packet 8
PCI Elements of Latency 1. Inbound Latency 2. Outbound Latency Controller O1 Switch Switch Fabric CPU board Memory Forwarding Engine DMA Lookup ASIC SDK CPU OF Agent O4 Hardware Tables O2 O3 Outbound Latency O1: Parse OF Msg O2: Software schedules the rule O3: Reordering of rules in table O4: Rule is updated in table PHY PHY 9
Outline Motivation Elements of Latency Measurement Methodology Inbound Latency Outbound Latency Insertion Modification Deletion 1
Latency Measurements - Setup POX Controller Pktgen Libpcap eth eth1 eth2 Control Channel Flows IN Flows OUT Openflow Switch Switch CPU RAM Flow table size Data Plane Vendor A 1 Ghz 1 GB 896 14*1Gbps + 4*4Gbps Vendor B 2 Ghz 2 GB 496 4*1Gbps +4*4Gbps 11
Outline Motivation Elements of Latency Measurement Methodology Inbound Latency Outbound Latency Insertion Modification Deletion 12
Inbound Latency Increases with flow arrival rate CPU Usage is higher for higher flow arrival rates Flow Arrival Rate (packets/sec) Low Power CPU Mean Delay per packet_in (msec) CPU Usage ( % ) - 7.1 1 3.32 15.7 2 8.33 26.5 vendor A switch 13
inbound delay (ms) inbound delay (ms) Inbound Latency Increases with interference from outbound msgs 3 25 2 15 1 5 2 4 6 8 1 flow # (a) with flow_mod/pkt_out 3 25 Low Power 2CPU 15 1 Software Inefficiency 5 2 4 6 8 1 flow # (b) w/o flow_mod/pkt_out vendor A switch 2 flows/sec 14
Outline Motivation Elements of Latency Measurement Methodology Inbound Latency Outbound Latency Insertion Modification Deletion 15
Outbound Latency Latency for three different flow_mod operations Insertion Modification Deletion Impact of key factors on these latencies Table occupancy Rule priority structure 16
Outline Motivation Elements of Latency Measurement Methodology Inbound Latency Outbound Latency Insertion Modification Deletion 17
insertion delay (ms) insertion delay (ms) Insertion Latency Priority Effects 3 25 2 15 1 5 2 4 6 8 1 2 4 6 8 1 (a) Burst size 1, same priority (b) Burst size 1, increasing priority Vendor B switch 3 25 2 15 1 5 18
insertion delay (ms) insertion delay (ms) Insertion Latency Priority Effects 3 25 2 15 1 5 2 4 6 8 1 2 4 6 8 1 (a) Burst size 1, same priority (b) Burst size 1, increasing priority Vendor B switch 3 25 2 15 1 5 1 1 x TCAM 19
insertion delay (ms) insertion delay (ms) Insertion Latency Priority Effects 3 25 2 15 1 5 2 4 6 8 1 2 4 6 8 1 (a) Burst size 1, same priority (b) Burst size 1, increasing priority Vendor B switch 3 25 2 15 1 5 1 1 x 99 TCAM 2
insertion delay (ms) insertion delay (ms) Insertion Latency Priority Effects 3 25 2 15 1 5 2 4 6 8 1 2 4 6 8 1 (a) Burst size 1, same priority (b) Burst size 1, increasing priority Vendor B switch 3 25 2 15 1 5 1 1 99 x 11 TCAM 21
insertion delay (ms) insertion delay (ms) Insertion Latency Priority Effects 3 25 2 15 1 5 2 4 6 8 1 2 4 6 8 1 (a) Burst size 1, same priority (b) Burst size 1, increasing priority Vendor B switch 3 25 2 15 1 5 11 1 1 99 x TCAM 22
insertion delay (ms) insertion delay (ms) Insertion Latency Priority Effects 1 8 1 8 6 4 2 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 (a) Burst size 8, same priority (b) Burst size 8, decreasing priority Vendor A switch 6 4 2 23
insertion delay (ms) insertion delay (ms) Insertion Latency Priority Effects 1 8 1 8 6 4 2 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 (a) Burst size 8, same priority (b) Burst size 8, decreasing priority Vendor A switch 6 4 2 x TCAM 24
insertion delay (ms) insertion delay (ms) Insertion Latency Priority Effects 1 8 1 8 6 4 2 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 (a) Burst size 8, same priority (b) Burst size 8, decreasing priority Vendor A switch 6 4 2 x 1 11 12 13 14 TCAM 25
insertion delay (ms) insertion delay (ms) Insertion Latency Priority Effects 1 8 1 8 6 4 2 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 (a) Burst size 8, same priority (b) Burst size 8, decreasing priority Vendor A switch 6 4 2 99 x 1 11 12 13 14 TCAM 26
insertion delay (ms) insertion delay (ms) Insertion Latency Priority Effects 1 8 1 8 6 4 2 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 (a) Burst size 8, same priority (b) Burst size 8, decreasing priority Vendor A switch 6 4 2 99 x 1 11 12 13 14 TCAM 27
insertion delay (ms) insertion delay (ms) Insertion Latency Priority Effects 1 8 1 8 6 4 2 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 (a) Burst size 8, same priority (b) Burst size 8, decreasing priority Vendor A switch 6 4 2 99 x 1 11 12 13 14 TCAM 28
avg completion time (ms) avg completion time (ms) Insertion Latency Table occupancy Effects Rule Priority Structure and Table Occupancy Affected by the types of rules in the table 2 18 16 14 12 1 8 6 4 2 table 1 table 4 2 TCAM Organization and 18 # Of Slices 16 14 12 1 Switch Software Overhead 8 1 2 3 4 5 6 burst size (a) low priority rules into a table with high priority existing rules 6 4 2 table 1 table 4 1 2 3 4 5 6 burst size (b) high priority rules into a table with low priority existing rules Vendor B switch 29
Outline Motivation Elements of Latency Measurement Methodology Inbound Latency Outbound Latency Insertion Modification Deletion 3
modification dely (ms) modification delay (ms) Modification Latency Higher than Insertion latency for vendor B Not affected by rule priority but affected by table occupancy 16 14 12 1 8 6 4 2 16 14 12 1 Poorly optimized software in vendor B 8 6 4 2 2 4 6 8 1 5 1 15 2 (a) 1 rules in the table (b) 2 rules in table Vendor B switch 31
Outline Motivation Elements of Latency Measurement Methodology Inbound Latency Outbound Latency Insertion Modification Deletion 32
deletion delay (ms) deletion delay (ms) Deletion Latency Higher than Insertion latency for both vendor A and B Not affected by rule priority but affected by table occupancy 12 1 12 1 8 6 4 2 2 4 6 8 1 (a) 1 rules in table 8 6 4 2 5 1 15 2 (b) 2 rules in table Vendor B switch 33
deletion delay (ms) deletion delay (ms) Deletion Latency Higher than Insertion latency for both vendor A and B Not affected by rule priority but affected by table occupancy 14 12 1 8 6 4 2 8 Deletion is causing TCAM 6 Reorganization 4 2 4 6 8 1 (a) 1 rules in table 14 12 1 2 5 1 15 2 (b) 2 rules in table Vendor A switch 34
Summary Latency in SDN is critical to many applications Assumption: Latency is small or constant Latency is high and variable Varies with Platforms, Type of operations, Rule priorities, Table occupancy, Concurrent operations Key Factors: TCAM Organization, Switch CPU and inefficient Software Implementation 35
Summary Latency in SDN is critical to many applications Assumption: Latency is small or constant Latency is high and variable Varies with Platforms, Type of operations, Rule priorities, Table occupancy, Concurrent operations Key Factors: TCAM Organization, Switch CPU and inefficient Software Implementation 36
Backup Slides 37
Impact of Concurrent CPU jobs Impacts insertion delay. E.g. Polling Statistics Polling stats impacts more when table occupancy is higher table occupancy table occupancy 5 12 1 8 6 4 2 Low Power CPU delays concurrent jobs 1 2 3 4 5 flow stats polling count Vendor B 38