Seminar Report

Elasticity in Virtual Middleboxes using NFV/SDN

Author: Mihir Vegad J.
Guide: Prof. Purushottam Kulkarni

A report submitted in partial fulfilment of the requirements for the degree of Master of Technology in Computer Science and Engineering

Department of Computer Science and Engineering
Indian Institute of Technology, Bombay
Acknowledgement

I would like to thank my guide, Prof. Purushottam Kulkarni, for giving me the opportunity to work in this field. I really appreciate the effort he put into every seminar meeting, first to understand the work we had done and then to guide us to the next step. During this process I learned a lot, and overall it has created a strong base for me in the field of NFV/SDN.
Abstract

Today, in the era of the internet, most of the network is still built on the traditional network architecture. As the requirements of network applications change and the number of end users grows over time, legacy networks may not meet expectations. We will discuss some of the shortcomings of legacy networks. Software-Defined Networking (SDN) is a new way of networking that can address the issues of legacy networks. When it comes to virtualizing a network, Network Function Virtualization (NFV) is another emerging technology alongside SDN, but it has its own challenges to address; SDN can solve some of them when NFV and SDN are used together. SDN, too, has its own set of challenges, some of which we will discuss in detail. We will then focus on the elasticity, or dynamic scalability, problem of NFV with SDN and see how each solution involves a different degree of participation from the NFV side and the SDN side. At the end we will compare these solutions and discuss their pros and cons.
Contents

1 Introduction
  1.1 Software-Defined Networking
    1.1.1 Rise of SDN
    1.1.2 SDN Architecture
  1.2 Network Function Virtualization
    1.2.1 Rise of NFV
  1.3 Emergence of SDN + NFV
  1.4 Scope of the seminar
2 Challenges in SDN
  2.1 Controller issues
    2.1.1 Reliability
    2.1.2 Placement problem
  2.2 Data plane issues
    2.2.1 Flow setup latency
    2.2.2 Datapath between ASIC and CPU in switches
  2.3 General issues
    2.3.1 Scalability
    2.3.2 Load balancing
    2.3.3 Security
3 Elasticity in Virtual Middleboxes
  3.1 Elasticity in MB application using VNF in SDN environment
  3.2 Elasticity of MB application using SDN elements
  3.3 Efficient Control Plane Architecture using NFV and SDN
4 Comparing the solutions
References
List of Figures

1.1 SDN architecture
1.2 NFV framework
1.3 SDN + NFV
1.4 Routing using NFV + SDN
2.1 Classification of challenges in SDN
3.1 Classification of the state of the middlebox
3.2 Split/Merge architecture
3.3 OpenFlow-based load balancer
3.4 OpenNF architecture
4.1 Comparison
Chapter 1
Introduction

One of the key factors in the rise of cloud computing in recent years is that the core system infrastructure, including compute resources, storage, and especially networking, is becoming software-defined. Modern applications and platforms can specify their fine-grained needs, thus precisely defining the virtual environment in which they want to run instead of being limited by physical infrastructure. NFV and SDN are an answer to most of the inefficiencies and barriers to innovation that exist in traditional network architecture.

1.1 Software-Defined Networking

In the era of the internet, modern applications require the network to be fast, scalable, dynamic/flexible, highly available, and to have large bandwidth. Traditional networks have become relatively static, hardware dependent, and complex to manage. In this section, we will see why SDN is an emerging network architecture and look at the fundamental components of SDN.

1.1.1 Rise of SDN

The basic building blocks of the traditional network architecture are routers and switches. Let us look at some features of legacy networks and their consequences.

Legacy networks consist of switching devices, each of which does its own decision making, that is, it computes the forwarding path for incoming packets. Network intelligence is therefore distributed among various hardware components, which means no single node in the network has a complete view of it. This design was adopted to achieve scalability, but it makes the network very inflexible: deploying a new device or service requires configuring many network nodes.

As video traffic, big data, mobile usage, the number of servers and virtual machines in data centers, and server-to-server traffic all increase, the demands on the network increase as well. Operators want the network to be flexible, agile, and scalable.
But this node-based control plane architecture is very rigid, giving network operators little opportunity to program the network to meet customer requirements.

Most network functionalities, such as firewalls, DNS, caching, IDS, and routing, are implemented in hardware. If we want to deploy any new service in the network, we first need compatible hardware for that service, which makes us dependent on hardware manufacturers and, in practice, delays deployment of new services to the market. Hardware components are also prone to wear and tear, so maintenance is not easy either.

Software-Defined Networking offers an architecture in which network functionalities are implemented in software. It adopts a flexible control plane design that makes the network programmable to meet the dynamic requirements of clients. These are some of the key features of SDN:

All the network intelligence resides in an aggregated, centralized control plane node named the controller. The controller is implemented in software, so it can accommodate changes quickly, is easier to maintain, and is not dependent on hardware.

The control plane and the data plane are separated from each other. Control plane logic resides in the controller, and networking elements such as switches just do the forwarding. Network operators see the controller and the underlying hardware as an abstraction, and they can write various control applications on top of it.

1.1.2 SDN Architecture

Figure 1.1: SDN architecture (an application layer of SDN applications, a northbound interface, a control plane of controllers linked by east-west protocols, a southbound interface, and a data plane of switches connecting physical/virtual hosts)

Compared to traditional networks, there are four more components in SDN.

Control plane: the controller has the complete view of the network infrastructure, which allows the network operator to deploy any service throughout the network. Nowadays, one of the most widely used controllers is NOX.
Northbound interface: it is the interface between the controller software and the applications running on top of the network architecture. Northbound APIs can be used to implement basic network functions like path computation, loop avoidance, routing, and security, and they are open source as well.

East-west protocols: in a distributed controller architecture, east-west protocols manage communication among the controllers. All the controllers share control plane parameters like QoS and policy information using these protocols.

Data plane & southbound protocols: the data plane consists of switches which work only as forwarding elements. Southbound APIs handle communication between the controller and the underlying network hardware. Various southbound APIs are available, e.g. OpenFlow, OVSDB, and ForCES; the most popular is OpenFlow.

SDN is going from strength to strength, but there are challenges such as scalability, latency, controller placement, and the need for a more flexible data plane. We will look at these challenges in some detail in upcoming chapters.

1.2 Network Function Virtualization

NFV is an attempt to virtualize network applications that are currently implemented in proprietary hardware. NFV attracted service providers who wanted to accelerate the service deployment life cycle in order to increase their growth and revenue. These service providers came together in a group under the European Telecommunications Standards Institute (ETSI) and came up with the definitive reasons to support NFV.

1.2.1 Rise of NFV

Traditional networks have the following issues, which are drawing more and more attention towards NFV [5]:

- An increasing number of hardware appliances such as routers, firewalls, and switches, so the space and power needed to accommodate this hardware also increase.
- Appliances have a short life span and are difficult to maintain.
- A long design-integrate-deploy cycle is needed to deploy new network functions.
- It is difficult to scale dedicated hardware up or down to meet dynamic requirements.

As a solution to these problems, service providers replaced these appliances with software running on commercial off-the-shelf (COTS) hardware. This approach provides several benefits. The main features of NFV are:

- Decouple network functions such as routing, firewalls, load balancers, NAT, and caching from their dedicated hardware and implement them in software.
- Host those network functions on virtual machines. Since these functions are under the control of a hypervisor, they can be executed on standard machines instead of dedicated hardware.

NFV helps to reduce hardware cost by replacing dedicated hardware, provides dynamic scaling of the system (since it runs on VMs, wasteful over-provisioning of hardware is reduced), and shortens the time needed to bring new networking services to market. Next, we will look at the high-level architecture of NFV.

Figure 1.2: NFV framework (network functions such as load balancer, router, firewall, and NAT running on an NFV infrastructure of virtual compute, storage, and network over a virtualisation layer and hardware resources, coordinated by NFV management and orchestration)

NFV management and orchestration: it handles the provisioning and connection of virtual network functions with other network resources. In a virtualized network using virtual machines and virtual switches, these resources include compute, connectivity, and storage. We are already familiar with the other parts of the framework: hardware resources, the virtualisation layer (hypervisor), VMs, and network functions implemented in software.

Going ahead, we will see how NFV and SDN each serve the purpose of virtualizing the network, whether they can co-exist, and if so, what the advantages are.

1.3 Emergence of SDN + NFV

SDN and NFV complement each other in terms of the functionalities they provide. They are independent of each other and can be implemented separately. Both technologies evolved to address shortcomings of traditional networks mentioned in the sections above.
Figure 1.3: SDN + NFV (SDN provides the network framework and automation; NFV provides resource provisioning)

SDN was invented to make the whole network programmable, to give centralized control to network providers, and to provide a simple data plane architecture. NFV was created to get rid of dedicated network appliances, to reduce an application's time to market, and to reduce capital and operational expenditure. So, if both are used together, SDN provides the network framework abstraction and NFV provides the virtual network functions inside that framework. The following are some benefits of using NFV and SDN together instead of NFV over a traditional network [6]:

Rapid growth of IP end points: because of the virtualization of network functions, the number of network end points will grow at a higher rate with NFV than in existing networks. This results in a huge load on the network, and as discussed at the beginning of the chapter, current networks may not be able to handle it. SDN can solve this issue.

Network end point mobility: virtual network functions can be migrated to other servers, or to different locations in different networks, far more easily than hardware network appliances. The underlying network should therefore be flexible enough to accommodate these changes quickly. In a traditional network this is complex and requires configuring many nodes, but SDN solves it by providing a programmable network.

Elasticity: the requirement is to create, replicate, and destroy VNFs on demand in real time. To meet this requirement the network again needs to be easy to configure and flexible. Elasticity leads to optimal use of resources, and SDN can provide such support to VNFs.

Multi-tenancy: the multi-tenancy needed by the cloud forces NFV to allow the use of software overlay networks. SDN can provide this kind of network in a simpler and more efficient way than a traditional network.

Virtualized routing functionality using NFV and SDN, in comparison with the alternatives, is shown below.
Figure 1.4: Routing using NFV + SDN (three deployments side by side: a traditional network with a hardware router, a virtualized router using NFV where the router application runs on VMs behind a switch, and a virtualized router using NFV + SDN, each connecting users to private and public IP services)

1.4 Scope of the seminar

I started by reading survey papers related to SDN. I learned what SDN is, how it functions, and the major research challenges in SDN. Initially I thought of working on some issue in the control plane, but while studying the challenges I came across NFV integration in SDN. This motivated me to read more about NFV and why we should use NFV and SDN combined. I then narrowed down my scope to one of the research
challenges in NFV + SDN: elasticity. I then studied different solutions to mitigate this problem. In the upcoming chapters we will also see a few more examples of network applications implemented with NFV and SDN in different ways, to see how they mitigate the issues of the respective technologies.
Chapter 2
Challenges in SDN

SDN promises to simplify network operations and to lower the total cost of network applications by providing programmable network services. SDN has numerous advantages over traditional networks, as we discussed in the first chapter. But it has its own set of challenges, which can cut into those advantages and affect the performance of the network, especially in cloud environments. We discuss some of these challenges in this chapter.

Figure 2.1: Classification of challenges in SDN (controller issues: reliability, placement problem; data plane issues: flow setup latencies, ASIC and CPU limitations; general issues: scalability, load balancing, security)

2.1 Controller issues

2.1.1 Reliability

In legacy networks, when any network device fails, all packets passing through that node are re-routed along an alternative path that avoids the failed node; the network is robust in this sense. In SDN, the controller handles the whole network, so if no stand-by controller is present, the centralized controller becomes a single point of failure [2].

One solution to this problem is to split the controller functionality between control plane and data plane nodes, that is, to put some intelligence in the data plane. This may solve the
controller failure issue, but it contradicts the central idea of SDN, which is to have a node with a broad view of the network and flexible service deployment. Another approach is to make the controller functionality distributed to increase reliability.

2.1.2 Placement problem

The controller placement problem covers where to place the controllers given the network topology and how many controllers a given network needs. It affects most performance metrics, such as flow-setup latency and network availability. Finding the optimal placement for the controller is currently a hot area of research in SDN. Optimality can be defined with respect to different placement metrics [4].

Reliability-aware placement: the metric is reliability, measured as the percentage of valid control paths, and the optimization maximizes this percentage. The metric is affected by the location of the controllers, controller-to-controller adjacencies, and the number of available controllers.

Resiliency (path protection)-aware placement: any failure that halts communication between the data plane and the control plane can lead to serious performance issues. This metric considers connection resiliency between controller and switch, that is, how well switches can protect their paths to the controller. The optimization maximizes the probability of fast recovery from failure for a given controller placement.

Latency-aware placement: the metric is average propagation latency, and the optimization tries to minimize it through appropriate controller placement.

2.2 Data plane issues

2.2.1 Flow setup latency

The smallest granularity at which SDN works is a single flow. Two metrics for measuring SDN's performance are the flow setup time and the number of flows the controller can manage per second. Flow setup is a four-step process. When a packet belonging to a new flow arrives at a switch, there will be no matching flow table entry.
The switch forwards the packet to the controller and, in response, receives a forwarding rule for that flow. The switch then updates its flow table entries. The performance of this setup process is limited by switch resources (CPU, memory) and the software performance of the controller. A controller can respond to such flow setup requests within 1 millisecond, but hardware switches take 10 ms or more per flow setup, which hinders SDN performance.

2.2.2 Datapath between ASIC and CPU in switches

Switches have a CPU to manage the ASIC, but the bandwidth between them is limited. The datapath between ASIC and CPU is not used frequently as part of normal switch operation because it is slow. For example, the HP ProCurve 5406zl Ethernet switch has a bandwidth of 300 GB/sec, but the measured loopback bandwidth between ASIC and CPU is only 35 MB/sec. This restricts the bandwidth between the switch and the controller. To control
the datapath between ASIC and CPU, some counters are kept per flow entry: the number of matches, the number of bytes matched, and the flow duration. If these counters are implemented in ASIC hardware, then changing them means re-designing the ASIC or deploying new switch hardware. The cost of an ASIC depends on the chip area, and there is an upper bound on the area if the ASIC is to stay cost effective.

2.3 General issues

2.3.1 Scalability

Scalability is an important aspect of any architecture, and if SDN is to be the future of networking, it must be scalable. Scalability in SDN can be classified as shown below.

Scalability type | Description
Type 1 | Number of switches that an SDN controller can support
Type 2 | Number of flow table entries supported by a switch

Table 2.1: Types of scalability

Scalability type 1: as the number of switches and end hosts in the network increases, the SDN controller gradually becomes a bottleneck. More switches and more flows mean more requests to the controller, and the controller's computational capacity is limited. A study of NOX (an SDN controller) shows that it can manage about 30K requests per second. That may be good enough for small organizations or campus networks, but not for a data-center network with higher flow rates: one study shows that a data center with 2 million virtual machines can generate 20 million flows per second, while in the best case current controllers support roughly 100,000 flows per second. Several solutions have been proposed for this problem, including Coronet, DevoFlow, and McNettle. McNettle is a scalable control framework that executes on shared-memory multi-core servers; experiments show that a single McNettle controller with 46 cores can handle 5000 switches and 14 million flows per second, with latency under 200 µs for light loads and up to 10 ms for heavy loads [4].
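To make concrete why each new flow taxes the controller (the four-step setup process of Section 2.2.1 and the type-1 bottleneck), here is a toy simulation. The class names and rule format are my own illustration, not any real controller's API: a table miss triggers one controller round trip, after which packets of the same flow hit the cached rule.

```python
# Toy model of reactive flow setup (hypothetical names, not a real SDN API).
# A switch forwards table misses to the controller, which installs a rule;
# subsequent packets of the same flow match the installed rule locally.

class Controller:
    def __init__(self):
        self.requests_handled = 0  # proxy for controller load

    def handle_miss(self, flow_id):
        self.requests_handled += 1
        # Compute a forwarding decision (here: a dummy output port).
        return {"match": flow_id, "action": f"output:{hash(flow_id) % 4}"}

class Switch:
    def __init__(self, controller):
        self.controller = controller
        self.flow_table = {}  # match -> installed rule

    def process_packet(self, flow_id):
        rule = self.flow_table.get(flow_id)
        if rule is None:                       # table miss: ask the controller
            rule = self.controller.handle_miss(flow_id)
            self.flow_table[flow_id] = rule    # install the returned rule
        return rule["action"]

ctrl = Controller()
sw = Switch(ctrl)
# 3 flows x 100 packets each: only the first packet of each flow
# reaches the controller, so 3 requests out of 300 packets.
for flow in ("flowA", "flowB", "flowC"):
    for _ in range(100):
        sw.process_packet(flow)
print(ctrl.requests_handled)  # 3
```

The controller's request rate, not the switch's forwarding rate, is what saturates first when many short-lived flows arrive, which is exactly the type-1 scalability concern above.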
Scalability type 2: for each packet from a new flow, the controller pushes a forwarding rule for that flow into the switch. The switch maintains a forwarding table in which each entry consists of three fields: flow identification information; the action to perform on packets of that flow (forward to next hop or controller, drop); and statistics such as the number of packets matched and the time since the last match. Whenever a packet from that flow arrives, it is matched against the flow table and the corresponding action is taken. For high performance in a network with many flows, this matching must be very fast. Ternary Content Addressable Memory (TCAM) is the preferred choice for the flow table, with O(1) lookup performance, but the size of TCAM we can use is limited by its high power consumption, and smaller TCAMs can hold only smaller flow tables. As the number of flows grows, we need to support bigger flow tables. The solution can be vertical scaling of the switch or horizontal scaling across switches. One vertical scaling solution is Tag-In-Tag, which supports 15 times more flow entries in a fixed-size TCAM and reduces power consumption per flow by 80 percent compared to an unoptimized SDN switch. A horizontal scaling solution is to arrange switches hierarchically, for example into authority switches and local switches; DIFANE is an example of horizontal scaling.

2.3.2 Load balancing

Load balancing [3] is a technique that can reduce power consumption, make resource utilization efficient, and, more importantly, help scale the network. In achieving these goals it also ensures minimal and uniform response times for all end-user applications. In data centers, legacy networks use load balancing techniques such as Equal Cost Multi-Path (ECMP) and Valiant Load Balancing (VLB). ECMP computes a cost for each path and forwards traffic based on that cost; VLB forwards packets to a randomly chosen switch. These techniques also work with SDN, but implementing them traditionally requires dedicated hardware load balancers. In SDN we can instead leverage the controller's functionality: the controller decides where each incoming flow goes, so by adding some intelligence we can make the controller work as a load balancer. Examples include OpenFlow-based load balancing and Split/Merge.

2.3.3 Security

In one survey, 12 percent of respondents in IT business technologies said that SDN has security challenges, and 31 percent were unsure whether SDN is as secure as traditional networks. Clearly, one of the major threats to the future of SDN is its ability to provide security. According to these studies, SDN cannot simply integrate current security technologies, and it is difficult to inspect each packet in SDN. In addition, the controller holds all the intelligence of the network, which makes it a prime target for hackers: if a hacker compromises the controller, the whole network is in their hands.
Some solutions have been suggested to increase the security of SDN. The controller should support authentication and authorization classes for network administrators. The controller can maintain an intelligent access control list (ACL) to filter packets entering the network. It should also be able to alert administrators in case of an attack, and some technique should limit the controller's functionality when an attack occurs. SDN should employ standard policies to ensure the safety of the network.

We can see that SDN and NFV are two different technologies with almost the same goal of making the physical network virtual. If merged, they can mitigate some of the challenges of legacy networks and also some of their own. In the next chapter we will address some of these challenges, especially dynamic scalability and load balancing, using these technologies.
Chapter 3
Elasticity in Virtual Middleboxes

A middlebox is any networking function other than routing that sits between sender and receiver. Intrusion Detection Systems (IDS), Network Address Translation (NAT), the Squid caching proxy, firewalls, load balancers, and protocol accelerators are some examples of middleboxes. Middleboxes are often implemented around the idea that each individual flow is an isolated context of execution [8]: each flow has a separate execution path in the middlebox, so we can change the execution path of any flow without disturbing the execution of the other flows arriving at the middlebox. This characteristic is what lets us provide elasticity in middlebox execution.

Elasticity can be defined as a middlebox's ability to scale in or scale out depending on network requirements. The basic idea is to create or destroy virtual machine replicas and divide the load among them in real time. The feasibility of elastic execution also depends on the type of the middlebox. Middleboxes can be classified into two categories based on their state:

- Stateful middleboxes
- Stateless middleboxes

To understand this, we can divide a middlebox's state into two parts: internal and external.

Figure 3.1: Classification of the state of the middlebox (internal state: business logic, cache data, background processes; external state, split into coherent state: configuration policies, statistics, counters, etc., and partitioned state: flow tables, timers, etc., depending on the middlebox application)

Internal state includes the business logic of the middlebox. Whenever we replicate a middlebox, the new replica must contain the internal state in order to run. This state changes infrequently, but all replicas need to be updated when it does. A change to the internal state on one replica has no side effects on the other replicas.
External state is the state that is actually manipulated to provide elasticity in the middlebox; it cannot be changed at any single replica without affecting the output of the middlebox. It is further divided into two parts, coherent and partitioned. Partitioned state contains information specific to a flow; when dividing work among replicas, it is really this state that we divide among them. Coherent state contains global information relevant to all flows, which any flow may access or update at any time, so it must be kept consistent (strongly or eventually) among the replicas.

A middlebox whose state contains only internal state is called a stateless middlebox, for example a load balancer. A middlebox whose state includes internal as well as external state is called a stateful middlebox, for example a NAT. We will now look at techniques for providing elasticity in both kinds of middleboxes.

3.1 Elasticity in MB application using VNF in SDN environment

In this section, we discuss a layer of abstraction on top of virtual middleboxes named Split/Merge; here we consider stateful middleboxes. Split/Merge classifies the state of the middlebox, identifies the state related to an incoming flow, and creates and destroys replicas. In the background it uses an SDN framework: SDN provides a network abstraction which ensures that input packets belonging to a particular flow arrive at the appropriate replica. Split/Merge also provides a library which manages the migration of flow state from one replica to another.

Managing the state of the middlebox at the time of scale-in or scale-out is very important. While scaling out, we create a new replica of the middlebox. The internal state and coherent state of the middlebox are replicated to the new replica; the coherent state must then remain consistent across all replicas. The partitioned state is divided among the replicas so that they can work in parallel.
At the same time, the network input to the middlebox is also split based on the flow states at each replica. While scaling in, we need to destroy a replica. Its internal state can simply be discarded. We then check whether there are any outstanding updates to the coherent state; if so, they are pushed to the other replicas to keep the coherent state consistent, after which the replica can be destroyed. The partitioned state of the replica is merged into the partitioned state of a destination replica, and the network input is redirected to that replica. So the two major functions of Split/Merge are splitting/merging middlebox state and splitting/merging network input.

The Split/Merge layer is made up of four components.

FreeFlow library: implemented in C, it initially allocates a large virtual address space to hold all flow states throughout operation, and then tells the VMM agent to take an initial snapshot. The library provides functions such as create flow, delete flow, get flow, put flow, and flow timer for handling partitioned state, and create shared, delete shared, get shared, and put shared for handling coherent state. For each new flow, the library allocates memory for the flow state. It also maintains transaction boundaries before moving a state by keeping a reference counter. The library copies partitioned state from one replica to another when it receives a notification from the orchestrator; at any time only one library instance can hold an active flow state. The library also maintains (eventual or strong) consistency of coherent state, using a distributed locking service in the strong-consistency case.
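The separation between partitioned (per-flow, owned by one replica) and coherent (shared, kept consistent) state can be sketched as follows. This is only an illustration of the idea: the real FreeFlow library is in C and relies on a VMM agent and a distributed locking service, and the class and function names below are my own.

```python
# Illustrative sketch of FreeFlow-style state handling (names are my own,
# not the actual C API). Partitioned state lives at exactly one replica;
# coherent state is pushed to all replicas on every update.

class FreeFlowReplica:
    def __init__(self):
        self.partitioned = {}   # per-flow state, owned by exactly one replica
        self.coherent = {}      # shared state, must stay consistent

    # --- partitioned-state handling (per flow) ---
    def create_flow(self, flow_id):
        self.partitioned[flow_id] = {"pkts": 0}

    def put_flow(self, flow_id, state):
        self.partitioned[flow_id] = state

    def delete_flow(self, flow_id):
        # Returns the state so the orchestrator can move it elsewhere.
        return self.partitioned.pop(flow_id)

    # --- coherent-state handling (global) ---
    def put_shared(self, key, value, peers):
        self.coherent[key] = value
        for p in peers:          # strong consistency: push to all replicas
            p.coherent[key] = value

def migrate_flow(flow_id, src, dst):
    """Move one flow's partitioned state; only one replica ever owns it."""
    dst.put_flow(flow_id, src.delete_flow(flow_id))

r1, r2 = FreeFlowReplica(), FreeFlowReplica()
r1.create_flow("f1")
r1.put_shared("policy_version", 7, peers=[r2])  # coherent: visible everywhere
migrate_flow("f1", r1, r2)                      # partitioned: moves wholesale
```

After the migration, "f1" exists only at r2, while "policy_version" is identical at both replicas, mirroring the single-owner vs. replicated semantics described above.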
Figure 3.2: Split/Merge architecture (two VM replicas, each with the FreeFlow library and a VMM agent, vNICs with unique and non-unique addresses, an orchestrator on the control network, and an OpenFlow controller in the SDN steering flows to the replicas)

Underlying SDN framework: Split/Merge leverages the functionality of SDN elements. The SDN control plane splits the network input among all replicas and makes sure each replica receives the appropriate packets even after state migration. When the SDN controller receives a notification from the orchestrator about a flow state migration, the controller removes the current rules for that flow from all network elements and the flow is considered suspended. Until the migration completes, any incoming packet of that flow is buffered at the controller. Once the migration is over, the controller pushes new forwarding rules for the flow to the switches and sends out the buffered packets.

Orchestrator: it directs the most important task of this technique, flow migration. It decides the migration policy, that is, when to scale in or scale out, and it interacts with the other components to complete a flow migration. For a migration, it first notifies the SDN controller to suspend the flow, then notifies the FreeFlow library to copy state from the source replica to the destination replica, and finally notifies the SDN controller to resume the flow. Based on policy or an explicit user request, the orchestrator orders the VMM agent to create or destroy replicas.

VMM agent: it actually creates and destroys replicas on notification from the orchestrator, creating a replica from the initial snapshot of the system taken when the middlebox was instantiated.

This technique provides a 25% reduction in maximum client response time compared to a middlebox system without elasticity, and it prevents the load from becoming skewed among replicas.
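The suspend, copy, resume sequence coordinated by the orchestrator and the controller can be sketched as a toy simulation. All names below are my own invention, not the system's actual interfaces; the point is that packets arriving mid-migration are buffered at the controller and replayed, in order, once the new rules are installed.

```python
# Illustrative suspend/buffer/copy/resume flow migration, modeled on the
# Split/Merge description above (hypothetical names, not the real API).

class MigratingController:
    def __init__(self):
        self.rules = {}        # flow_id -> replica currently receiving it
        self.suspended = {}    # flow_id -> packets buffered during migration
        self.log = []

    def suspend(self, flow_id):
        self.rules.pop(flow_id, None)       # 1. remove forwarding rules
        self.suspended[flow_id] = []
        self.log.append("suspend")

    def packet_in(self, flow_id, pkt):
        if flow_id in self.suspended:       # buffer while state is in flight
            self.suspended[flow_id].append(pkt)

    def resume(self, flow_id, new_replica):
        self.rules[flow_id] = new_replica   # 3. install new forwarding rules
        buffered = self.suspended.pop(flow_id)
        self.log.append("resume")
        return buffered                     # replayed to the new replica

def orchestrate_migration(ctrl, flow_id, src_state, dst_state, in_flight):
    ctrl.suspend(flow_id)                              # orchestrator step 1
    for pkt in in_flight:                              # packets keep arriving
        ctrl.packet_in(flow_id, pkt)
    dst_state[flow_id] = src_state.pop(flow_id)        # 2. copy flow state
    return ctrl.resume(flow_id, "replica-2")           # 3. resume + replay

ctrl = MigratingController()
src, dst = {"f1": {"seq": 42}}, {}
replayed = orchestrate_migration(ctrl, "f1", src, dst, in_flight=["p1", "p2"])
print(replayed)  # ['p1', 'p2'] delivered to the new replica, in order
```

No packet is lost or reordered: everything that arrives between suspend and resume waits at the controller, which is what preserves the middlebox's per-flow semantics across the move.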
Split/Merge also provides 50% quicker scale-in than other standard approaches [8].

3.2 Elasticity of MB application using SDN elements

We will use a stateless middlebox application, a load balancer, to understand this technique. It provides dynamic and flexible execution of load balancing and gets rid of the
dedicated hardware-based load balancer, which is expensive and far less customizable. It uses a controller and commodity switches to do the load balancing. The initial idea was very simple: for each incoming flow, the controller installs a forwarding rule in the switch, so that the switch acts as a load balancer and divides the load among replicas. This provides great flexibility but has scalability issues. It incurs the latency and overhead of consulting the controller for every new client flow, and the number of rules a switch can accommodate is limited. So the technique is modified to use wildcard rules: the controller pushes wildcard rules to the switches to handle packets from a large set of clients. The technique performs two major tasks: partitioning, i.e., generating the wildcard rules, and transitioning, i.e., moving from one set of rules to another [7].

Figure 3.3: OpenFlow-based load balancer (client traffic from the Internet enters a data center through a gateway switch; a load-balancer switch, driven by a controller application, splits it among replicas R1, R2, R3 with weights a1=3, a2=4, a3=1)

The partitioning algorithm divides the traffic among the replicas according to the weight associated with each replica. For simplicity, let us assume that traffic is uniform across all client IP addresses. The main challenge for the algorithm is to handle the current load with the minimum number of wildcard rules. Each client is associated with a leaf node of a binary tree over the client address bits. The number of ones in the binary representation of a replica's weight gives the minimum number of wildcard rules needed for that replica: a 1 at bit position i corresponds to merging 2^i leaf (client) nodes into a single rule. The algorithm assigns leaf nodes to replicas in decreasing order of the highest 1-bit position in the binary representations of their weights. Once all leaf nodes are assigned, we obtain a complete and minimal set of wildcard rules.
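The partitioning step can be sketched in a few lines of Python. This is a simplified model of the idea, not the paper's implementation; `wildcard_rules` is a hypothetical name, and the client address space is abstracted to `2**bits` leaves.

```python
def wildcard_rules(weights, bits):
    """Assign 2**bits client-address leaves to replicas as aligned
    power-of-two blocks, one wildcard rule per set bit of each weight.
    Weights must sum to 2**bits (uniform traffic assumed).

    Returns a list of (replica_index, prefix_value, prefix_len) rules.
    """
    assert sum(weights) == 2 ** bits
    # Break every weight into its power-of-two components (its set bits).
    blocks = []                          # (block_size, replica_index)
    for r, w in enumerate(weights):
        i = 0
        while w:
            if w & 1:
                blocks.append((1 << i, r))
            w >>= 1
            i += 1
    # Place the largest blocks first: every earlier block size is a
    # multiple of the current one, so each block lands on an aligned
    # boundary and matches exactly one wildcard rule.
    blocks.sort(reverse=True)
    rules, cursor = [], 0
    for size, r in blocks:
        assert cursor % size == 0, "block must be aligned"
        prefix_len = bits - size.bit_length() + 1
        rules.append((r, cursor >> (bits - prefix_len), prefix_len))
        cursor += size
    return rules
```

For the weights in Figure 3.3 (a1=3, a2=4, a3=1, eight leaves), this yields 2 + 1 + 1 = 4 rules, matching the popcount bound: replica 2 gets one rule covering half the address space, replica 1 gets a two-leaf rule plus a one-leaf rule, and replica 3 gets a one-leaf rule.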
The transitioning algorithm takes over once we have a minimal set of wildcard rules: if the weight associated with any replica changes, we need to re-partition. During re-partitioning we try to reuse the old wildcard rules as far as possible. Reusable wildcard rules are those for which the highest 1 bit of the old weight and of the new weight is the same. There are two transitioning algorithms, both designed to respect transaction boundaries. In transitioning with microflow rules, each packet matching the migrating rule is directed to the controller; if the packet comes from a new client, the controller installs a microflow rule directing it to the new replica, otherwise it forwards the packet to the old replica. In transitioning without controller intervention, the controller splits the rule being migrated into finer rules. It then waits for some time; if no packet arrives for a new rule, it migrates that part of the address space. This continues until the whole address space is migrated.

In the next section, we will see a solution to elasticity in which SDN and NFV play almost equal roles.
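Before moving on, the controller-free transitioning scheme above can be sketched as a recursive rule split. This is an idealized model with hypothetical names (`split_rule`, `transition`); the idle-timeout check is abstracted into a `saw_traffic` predicate, and busy regions are simply split down to single addresses.

```python
def split_rule(prefix, prefix_len):
    """Split one wildcard rule into its two child rules (one bit longer)."""
    left = prefix << 1
    return (left, prefix_len + 1), (left | 1, prefix_len + 1)


def transition(prefix, prefix_len, bits, saw_traffic):
    """Sketch of transitioning without controller intervention.

    The migrating rule is repeatedly split; a child that stays idle for
    the timeout (modelled by saw_traffic(prefix, prefix_len)) is moved to
    the new replica immediately, while a busy child is split further.
    Returns the migrated (prefix, prefix_len) rules in migration order.
    """
    migrated = []
    pending = [(prefix, prefix_len)]
    while pending:
        p, l = pending.pop(0)
        if not saw_traffic(p, l) or l == bits:
            migrated.append((p, l))          # idle (or single address): move
        else:
            pending.extend(split_rule(p, l))  # busy: split and retry
    return migrated
```

The idle halves of the address space migrate quickly, and only the regions with live connections get refined, which keeps the number of temporary rules small.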
3.3 Efficient Control Plane Architecture using NFV and SDN

Together, NFV and SDN can solve issues such as scalability, elasticity and high availability. We discuss OpenNF in this section. OpenNF is a control-plane design with the ability to dynamically redistribute packet processing across multiple replicas of a middlebox or NF, e.g., for elastic NF scaling or load balancing [1]. It satisfies tight service-level agreements on NF performance and availability, accurately monitors and manipulates network traffic, and also reduces the operating cost of NFs. To scale out and meet performance requirements, merely creating a new VM instance and updating the forwarding state is not enough; we need to move NF state as well.

Figure 3.4: OpenNF architecture (control applications use the northbound API of the OpenNF controller, which contains an NF state manager and a flow manager; the southbound API connects the controller to the NF instances and the SDN switches)

The southbound API is used to manipulate diverse NF state. It classifies NF state into three categories: per-flow, multi-flow and all-flow state. It specifies functions such as get, put and delete to export or import NF state. It also provides an event-handling mechanism: events can be enabled for any particular flow, with an action such as processing the packet, dropping it, or forwarding it to the controller.

The northbound API provides support to move, copy or share parts of the state among NF replicas. The move operation shifts NF state and network input for a set of flows to a destination NF instance; it can be a plain move, a loss-free move or an order-preserving move. To provide tight service-level agreements and correctness for some middlebox applications, loss-free and order-preserving moves are very important. An order-preserving move is achieved by a combination of events and a two-phase forwarding update. To reduce the latency of the move operation, suggested optimizations include parallelizing the move and the early-release optimized move.
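The interplay of the southbound state API (get/put/delete) with a loss-free move can be sketched as follows. This is an illustrative Python model, not OpenNF's actual API; `NFInstance` and `loss_free_move` are hypothetical names, and packets that raised events at the controller during the move are represented by the `buffered` list.

```python
class NFInstance:
    """Toy NF exposing an OpenNF-style southbound API (per-flow state only)."""
    def __init__(self):
        self.state = {}                     # flow_id -> per-flow state

    def get_perflow(self, flow_id):        # export state to the controller
        return self.state[flow_id]

    def put_perflow(self, flow_id, st):    # import state from the controller
        self.state[flow_id] = st

    def delete_perflow(self, flow_id):
        del self.state[flow_id]

    def process(self, flow_id, pkt):
        st = self.state.setdefault(flow_id, {"count": 0})
        st["count"] += 1


def loss_free_move(flow_id, src, dst, buffered):
    """Sketch of a loss-free move: packets captured as events at the
    controller while the state was in flight are replayed at `dst`
    before normal forwarding resumes, so no packet is dropped."""
    st = src.get_perflow(flow_id)
    src.delete_perflow(flow_id)
    dst.put_perflow(flow_id, st)
    for pkt in buffered:                    # replay event-buffered packets
        dst.process(flow_id, pkt)
```

A plain move would simply skip the replay loop, losing `buffered`; order preservation would additionally require the two-phase forwarding update mentioned above, which this sketch omits.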
The copy and share operations are used when a single piece of state is needed at multiple NF instances. Copy is useful when state consistency is not required or eventual consistency suffices; share is useful when strong (strict) consistency is required. Various control applications can be written on top of the controller and communicate via the northbound API; a few examples are the Bro IDS, the Squid caching proxy and the PRADS asset monitor.

So, by combining SDN and NFV, we addressed several issues and arrived at an advanced control-plane architecture, OpenNF. In the next chapter, we will compare all the techniques discussed so far.
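To close this chapter, the copy/share distinction can be made concrete with a small sketch. This is an invented illustration (the names `copy_state` and `SharedState` are not OpenNF's): a copy is a one-time clone after which replicas may diverge, while shared state funnels every update through one lock-protected authoritative store.

```python
import threading


def copy_state(flow_ids, src, dst):
    """One-time clone of per-flow state dicts; no ongoing consistency.
    Eventual consistency can be re-established by re-copying later."""
    for fid in flow_ids:
        dst[fid] = dict(src[fid])


class SharedState:
    """Strong consistency: all instances read and write a single
    lock-protected store, so updates are serialized."""
    def __init__(self):
        self._lock = threading.Lock()
        self._store = {}

    def update(self, key, fn):
        with self._lock:                     # serialize concurrent updates
            self._store[key] = fn(self._store.get(key, 0))

    def read(self, key):
        with self._lock:
            return self._store.get(key)
```

The locking mirrors the distributed locking service mentioned for strongly consistent coherent state in Section 3.1, collapsed here to a single in-process lock for illustration.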
Chapter 4

Comparing the solutions

The Split/Merge technique, which provides transparent and elastic execution of middleboxes, can be used only if the state of the middlebox matches the structure we discussed, that is, if the state can be classified into internal, coherent and partitioned state. Most middleboxes do have a similar state structure, though. The OpenNF control-plane architecture solves this issue: it supports middleboxes with diverse state structures and handles the state of each middlebox in its own way. Accordingly, we have to add the southbound API to each middlebox.

Both Split/Merge and OpenNF can provide elasticity to stateful as well as stateless middlebox applications. OpenFlow-based load balancing, by contrast, cannot be used for a stateful middlebox application, as it has no logic for migrating flow state among replicas; it can, however, be helpful for other stateless middlebox applications.

In Split/Merge, during a flow migration, any packet arriving for that flow is buffered at the controller. Once the migration completes, the controller releases all the buffered packets, but before they arrive at the replica, a new packet may arrive. So Split/Merge does not preserve the order of execution of packets. Moreover, if a packet arrives at the old replica after migration, it is dropped, so loss-free execution is not guaranteed either. For some middleboxes, such as an IDS, these properties are necessary. OpenNF provides a solution by supporting loss-free and order-preserving move operations. Neither property matters for stateless middlebox applications.

Split/Merge can handle a middlebox application well if all modules of the MB refer to flows at the same granularity. Problems arise if one module refers to flows using fine-grained state while another uses coarse-grained state.
When implementing the Split/Merge architecture on such an MB, we need all modules to refer to flows using either fine-grained state or coarse-grained state. In some cases the coarse-grained state becomes coherent state, which must be kept consistent and synchronized at all times and may therefore suffer high synchronization overhead. OpenNF has the different kinds of flow state we discussed, so it does not face such challenges.

The OpenFlow-based load balancing technique assumed that all clients generate uniform traffic, so load balancing may not work properly with non-uniform traffic from different clients. Some modifications to the technique were implemented to serve non-uniform traffic across clients. Split/Merge works well with uniform traffic among clients; an experiment should be performed to check whether Split/Merge can support non-uniform traffic across clients. OpenNF likewise works well with uniform client traffic, and the same experiment needs to be done for OpenNF.

Split/Merge is a technique in which the NF is implemented as a separate layer, and
the SDN merely directs packets to the abstraction layer in the appropriate way. In OpenFlow-based load balancing, we used existing SDN elements with some modifications to their functionality. In OpenNF, we redesigned the control-plane architecture; whole NFs are implemented inside the control plane, and the task of the controller becomes complex. In these examples, SDN does more than just direct packets.

Figure 4.1: Comparison
References

[1] Aaron Gember-Jacobson, Raajay Viswanathan, Chaithan Prakash, Robert Grandl, Junaid Khalid, Sourav Das, and Aditya Akella. OpenNF: Enabling innovation in network function control. In SIGCOMM (2014).

[2] Amin Tootoonchian, Sergey Gorbunov, Yashar Ganjali, Martin Casado, and Rob Sherwood. On controller performance in software-defined networks. In USENIX Hot-ICE (2012).

[3] Kannan Govindarajan, Kong Chee Meng, and H. Ong. A literature review on software-defined networking: Research topics, challenges and solutions. In IEEE (2013).

[4] Manar Jammal, Taranpreet Singh, Abdallah Shami, Rasool Asal, and Yiming Li. Software-defined networking: State of the art and research challenges. In Elsevier's journal Computer Networks (2014).

[5] NFV basics. https://www.sdxcentral.com/resources/nfv/whats-networkfunctions-virtualization-nfv/.

[6] OpenFlow-enabled SDN and Network Functions Virtualization. https://www.opennetworking.org/images/stories/downloads/sdnresources/solution-briefs/sb-sdn-nvf-solution.pdf.

[7] Richard Wang, Dana Butnariu, and Jennifer Rexford. OpenFlow-based server load balancing gone wild. In USENIX Hot-ICE (2011).

[8] Shriram Rajagopalan, Dan Williams, Hani Jamjoom, and Andrew Warfield. Split/Merge: System support for elastic execution in virtual middleboxes. In USENIX NSDI (2013).