TCP ISSUES IN DATA CENTER SYSTEM- SURVEY




Raj Kumar Yadav
Assistant Professor, DES's College of Engineering, Dhamangaon Rly (MS), India
rajyadav.engg@gmail.com

ABSTRACT

Data center systems are at the heart of massive information systems such as the Internet, ensuring availability and scalability of services that show great variance in their performance metrics. It is therefore necessary and interesting to study these systems, in order to ensure that our data-intensive systems can fulfill the goals of efficiency, fault tolerance, and scalability. In this paper we examine transport layer issues in data centers. TCP Incast is a particular problem that arises when vanilla TCP is used inside data centers. We begin by examining the causes and interactions involved in the occurrence of TCP Incast. Unfortunately, the problem has only been studied recently, and multiple perspectives on this important issue are still lacking. We describe proposed solutions to the TCP Incast problem, and conclude our study of transport layer issues in data centers by examining DCTCP, a variant of TCP proposed for data centers.

Index Terms: Incast problem, TCP protocol, bandwidth.

I. INTRODUCTION

Data centers lie at the heart of the massive ICT systems deployed by organizations such as Google, Yahoo, and Amazon to manage data and provide online services. The traditional data center could be viewed as a privately owned, centralized infrastructure. However, concerns such as better performance and disaster management have led to the emergence of distributed, virtualized data centers (DVDC). A further refinement is the cloud computing model, which uses DVDCs to provide pay-as-you-go services to customers. The modern data center is a confluence of various technologies in action: computation, networking, storage, virtualization, configuration control, and power efficiency [1].
a) Physical Organization

The main components in a data center are switches, routers, servers, and storage bricks. The physical architecture of a data center is determined by the composition of these components. For example, a data center design may use low-cost L2 switches connected in a mesh network to reduce costs and ensure high bisection bandwidth [2].

b) Network Infrastructure

Four networking fabrics are prevalent in data centers: Ethernet, InfiniBand, Fibre Channel, and Myrinet. Connection of external clients to the data center uses Ethernet, while server-to-server communication may use Ethernet or InfiniBand. Access between servers and storage has typically used fiber optics, though InfiniBand and Ethernet have also been employed.

2014, IJAFRC. All Rights Reserved. www.ijafrc.org

c) Storage Infrastructure

Three types of storage can be found in data centers. First, Direct Attached Storage (DAS) uses SCSI and SAS, and can provide throughputs of up to 2.4 Gbps on a 6 Gbps link. Storage Area Network (SAN) and Network Attached Storage (NAS) segregate the storage aspect of the data center from the servers. SAN uses protocols such as iSCSI, iFCP, FCIP, and FCoE for communicating with the servers over Fibre Channel. NAS, on the other hand, often uses Ethernet and provides higher-level access to storage across the network than SAN, e.g. file- or object-level access. As a result, it is slower than SAN, due to its self-management of storage rather than allowing applications low-level access to data.

d) Power Infrastructure

Typically, the external power supply to a data center is 1.33 kV, 3-phase, which is stepped down to 280-480 V, 3-phase on premises. This is converted by the UPS from AC to DC for battery storage, and back again from DC to AC for output to the data center. The UPS output of 240 V/120 V, single phase, is distributed by Power Distribution Units (PDUs) to individual servers or chassis, where it is stepped down, converted from AC to DC, and partially regulated. The power is finally delivered to motherboards and converted by voltage regulators (VRs) into many voltage rails. Due to various losses, the overall power efficiency of a data center stands at less than 50%.

e) Control Infrastructure

Control and configuration management involves Out-Of-Band (OOB) management for the Baseboard Management Controller (BMC), switches, routers, and storage bricks; and In-Band (IB) management for server CPUs and OSes.

II. TCP ISSUES IN DATA CENTERS

The TCP Incast problem is defined as the degradation of goodput (application throughput) when multiple senders communicate with a single receiver in a high-throughput, low-latency network with the shallow switch buffers often found in commodity switches.
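As a back-of-the-envelope illustration of this definition, consider barrier-synchronized senders each bursting one window of packets into the same switch output port. The following toy model (all numbers illustrative, not taken from any of the surveyed papers) shows how quickly a shallow buffer overflows as senders are added:

```python
def dropped(n_senders, window_pkts, buffer_pkts):
    """Toy model: n barrier-synchronized senders each burst one window
    into the same shallow output port; packets beyond the buffer drop."""
    return max(0, n_senders * window_pkts - buffer_pkts)

# With a 64-packet commodity buffer and 8-packet windows, 8 senders just
# fit; doubling to 16 senders drops half of every burst, triggering the
# synchronized timeouts behind the goodput collapse.
dropped(8, 8, 64)    # -> 0
dropped(16, 8, 64)   # -> 64
```

The point of the sketch is that per-sender windows do not shrink as fast as senders are added, so aggregate demand on one shallow port grows linearly with the fan-in.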
a) Fine-grained TCP for Data Centers

In barrier-synchronized requests, a client requests data portions from multiple servers and cannot proceed with another request until all previous parallel requests have been answered. In vanilla TCP, designed for WAN environments, the RTO timer is calculated as a function of the RTT samples and is lower-bounded by RTOmin, which is usually 200 ms. This minimum value helps avoid spurious timeouts when the RTT suddenly increases. In a DCN, however, latencies are on the order of microseconds. As a result, in the TCP Incast scenario, flows whose packets are lost must wait an inordinately long time, compared to the network latency, before retransmitting. In a barrier-synchronized request scenario, this can lead to synchronized retransmissions and losses, reducing goodput by an order of magnitude.

To solve this problem, the authors of [2] suggest reducing or eliminating the RTOmin parameter, using high-resolution timers to maintain RTT and RTO values at microsecond resolution, randomizing the timeout to desynchronize the flows, and disabling the delayed-ACK feature. Setting RTOmin to 200 µs ensures that the effective RTO is not so high as to cause long link idle times. However, in future DCNs, where latencies will be further reduced by 10 Gbps links and the RTT will be around 20 µs, a 200 µs RTOmin will again lead to TCP Incast. Hence the conclusion that RTOmin should be of the order of the network latency. This is required to ensure that RTOmin does not delay the detection of TCP Incast losses

and subsequent retransmission [4]. In order to maintain timers at microsecond granularity, the authors describe kernel changes that let them use the GTOD (get-time-of-day) framework, which uses the CPU cycle counter and provides nanosecond resolution; this is used for timestamping and calculating RTTs. The hrtimer interface, backed by High Precision Event Timer (HPET) hardware, is used for maintaining TCP timers such as the RTO timer. Reducing RTOmin alone does not prevent retransmissions from coinciding and causing a throughput drop, since the RTO values are chosen deterministically.

b) Understanding TCP Incast

The work in [3] provides a partial quantitative model explaining some parts of the empirical goodput curve, and qualitative explanations for the remaining curve characteristics not covered by the model. It also suggests minor, intuitive modifications, such as the reduction of RTOmin already suggested by [5]. The authors tried setting a small multiplier on the RTO exponential backoff, setting a randomized multiplier for the RTO exponential backoff, randomizing the minimum TCP RTO timer value, and decreasing the minimum TCP RTO timer value. Smaller minimum RTO values lead to a larger initial goodput minimum. The initial goodput minimum occurs at the same number of senders for all values of minimum RTO. Larger minimum RTO values cause the local goodput maximum to occur at a higher number of senders. After the local goodput maximum, the rate of goodput decrease with the number of senders is independent of the minimum RTO value. All observations were made for a fixed-fragment-size, variable-block-size workload.

i. Disabling Delayed ACKs: This leads to a higher and less stable congestion window, because immediate ACKs overdrive the congestion window. This causes an increased number of smoothed-RTT spikes, even though the mean smoothed RTT remains the same, resulting in frequent, unnecessary congestion timeout events. Hence, the authors of [3] conclude that delayed ACKs should not be disabled.
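The RTO mechanics running through this section can be made concrete with a small sketch. The rto() clamp below follows the standard TCP estimate (RTO = SRTT + 4·RTTVAR, floored at RTOmin); the randomized backoff multiplier illustrates the idea evaluated in [3], not their exact scheme, and all numeric values are illustrative:

```python
import random

def rto(srtt_us, rttvar_us, rto_min_us):
    # Standard TCP RTO estimate, lower-bounded by RTOmin (microseconds).
    return max(srtt_us + 4 * rttvar_us, rto_min_us)

# Datacenter round trip: SRTT ~ 100 us, RTTVAR ~ 20 us.
rto(100, 20, 200_000)   # WAN default: the 200 ms floor dominates (200000 us)
rto(100, 20, 200)       # 200 us floor: RTO tracks the actual network latency

def backoff(rto_us, attempt, randomize=False, rng=random):
    # Deterministic doubling keeps flows that timed out together in
    # lockstep; a randomized multiplier in [1, 2) spreads their
    # retransmissions apart.
    mult = (1 + rng.random()) if randomize else 2
    return rto_us * mult ** attempt
```

With the WAN default, a lost packet idles the link roughly a thousand times longer than the network latency; the randomized multiplier addresses the separate problem that deterministically chosen RTOs let desynchronization never occur.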
This observation differs from the one in [6], which suggests that delayed ACKs can cause a throughput drop in the data center environment and should be disabled.

ii. Qualitative Refinements

Initial Goodput Decrease: As the number of senders S increases, the number of RTO events R increases. This causes more RTO events and spurious retransmissions, leading to goodput loss.

Goodput Recovery: As the number of senders increases, the RTT and RTO estimate of each sender also increase, since all senders share the same queue. This leads to fewer RTO events, and hence goodput increases.

Further Goodput Decrease: As senders increase further, the RTO keeps increasing. Beyond a certain RTO increase, senders retransmitting after an RTO timeout begin to interfere with senders that are transmitting because they are not in an RTO state.

c) Data Center TCP (DCTCP)

Transport protocol design can make certain assumptions in the data center environment that cannot be made for the general Internet under the end-to-end principle. These assumptions, such as low latencies, high throughput, bursty traffic, and low packet losses, allow TCP to be customized for much better performance in the homogeneous environment of data centers. DCTCP [4] is a TCP variant for the low-latency, high-throughput, bursty data center environment that notifies the sender of congestion using ECN and reduces the congestion window in proportion to the extent of congestion, as indicated by the receipt of ECN-marked packets. According to the authors, the following typical traffic classes are reported in data centers: query traffic (very short, latency-critical flows; 1.6 KB to 2 KB), short message flows

(time-sensitive; 50 KB to 1 MB), and large data flows (throughput-sensitive; 1 MB to 50 MB). Thus, application-level query and message traffic is latency-sensitive, while data traffic is utilization-sensitive.

III. PROBLEMS WITH TCP

The following problems are reported with TCP in the data center environment [8]:

Incast: In the Partition/Aggregate pattern, a receiver requests data simultaneously from multiple senders. As a result, the buffer queue length at the switch port of the receiver increases, thereby causing timeouts. TCP wrongly interprets this as network congestion and reduces cwnd, reducing network utilization drastically.

Queue Buildup: When a queue at a port is shared by short as well as long flows, the short flows experience latency. Since the performance of short flows (application-level query traffic and messages) depends on latency, their performance degrades. This is caused by delays in the buffers shared by the short and long flows.

Buffer Pressure: Switches are often shared-memory switches, i.e. all ports share the same memory, which reduces costs under the assumption of statistical multiplexing. However, long flows in a data center are often stable and do not benefit from statistical multiplexing. As a result, a short flow at one port may be coupled with a long flow at another port of the same switch through the shared memory queue buffers.

IV. DCTCP ALGORITHM

Simple marking of the CE codepoint at the switch: Based on a threshold K on the instantaneous queue length, each switch can mark a packet with the CE codepoint. This indicates to the receiver that congestion has occurred.

ECN-Echo at the receiver: The receiver uses a combination of immediate and delayed ACKs to notify the sender about congestion. Delayed ACKs convey the ECN status of a run of packets at a time, reducing pressure on the sender. An immediate ACK is used when the ECN status of the most recently received packet changes compared to its predecessor.
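The receiver-side echo just described, together with DCTCP's sender-side window update, can be sketched end to end. This is a minimal sketch: the delayed-ACK run length m = 2 is a plausible default, the gain g = 1/16 is the value reported in [4], and all class and method names are illustrative:

```python
class DctcpReceiver:
    """ECN-echo logic (sketch): delayed ACKs report the CE state of a run
    of up to m packets; an immediate ACK fires when the CE mark flips
    relative to the previous packet."""

    def __init__(self, m=2):
        self.m = m
        self.last_ce = False
        self.pending = 0  # packets received but not yet acknowledged

    def on_packet(self, ce):
        if ce != self.last_ce and self.pending:
            # CE state changed: immediately ACK the pending run, echoing
            # the OLD state, then start a new run under the new state.
            ack = (self.pending, self.last_ce)
            self.last_ce, self.pending = ce, 1
            return ack
        self.last_ce = ce
        self.pending += 1
        if self.pending == self.m:        # delayed-ACK threshold reached
            ack, self.pending = (self.m, ce), 0
            return ack
        return None


class DctcpSender:
    """Congestion controller (sketch): once per window of data,
    alpha <- (1-g)*alpha + g*F, then cwnd <- cwnd*(1 - alpha/2)
    on windows that contained any marks."""

    def __init__(self, cwnd, g=1 / 16):
        self.cwnd, self.g, self.alpha = cwnd, g, 0.0

    def on_window(self, marked, total):
        F = marked / total                 # fraction of CE-marked packets
        self.alpha = (1 - self.g) * self.alpha + self.g * F
        if marked:                         # back off in proportion to congestion
            self.cwnd *= 1 - self.alpha / 2
        return self.cwnd

s = DctcpSender(cwnd=100.0)
s.on_window(marked=0, total=10)    # no marks: cwnd stays at 100
s.on_window(marked=10, total=10)   # fully marked window: gentle, proportional cut
```

Note the contrast with standard TCP: a single marked window cuts cwnd only slightly while alpha is still small; sustained marking drives alpha toward 1, at which point the cut approaches TCP's halving.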
Controller at the sender: The sender maintains a parameter α indicating the level of congestion, which is updated once every RTT. The sender receives the ACKs from the receiver and adjusts cwnd according to the congestion level. Given that F is the fraction of packets marked for congestion during the last window of data, the controller computes α and cwnd as follows:

α := (1 − g) · α + g · F
cwnd := cwnd · (1 − α/2)

The notification mechanism of DCTCP is similar to AQM/RED, except that DCTCP detects congestion using the instantaneous queue length and reports it using delayed ACKs. Also, the backoff is based on the extent of congestion conveyed by ECN-Echo.

V. CONCLUSION

This paper has concentrated mainly on network layer and transport layer issues in the data center environment. Though some of the papers surveyed concern automatic control in data centers, their treatment has generally been directed at solving automatic control problems in complex systems, of which data center systems are an apt example. Some notable works in the area of data center research could not be included due to lack of time and space. In particular, the MapReduce framework has had a defining impact on the programming model used in

data centers, and can arguably be regarded as the C/C++ of the data center world. In the solution space of TCP Incast, a preventive approach provides some fresh insights: controlling the receiver window to prevent the occurrence of TCP Incast, instead of using fine-resolution RTO timers as propounded in the earlier works described in this report.

VI. REFERENCES

[1] K. Kant, "Data center evolution: A tutorial on state of the art, issues, and challenges," Computer Networks (Elsevier), vol. 53, pp. 2939-2965, 2009.
[2] V. Vasudevan, A. Phanishayee, H. Shah, E. Krevat, D. G. Andersen, G. R. Ganger, G. A. Gibson, and B. Mueller, "Safe and effective fine-grained TCP retransmissions for datacenter communication," SIGCOMM, August 17-21, 2009.
[3] Y. Chen, R. Griffith, J. Liu, R. Katz, and A. D. Joseph, "Understanding TCP Incast throughput collapse in datacenter networks," WREN, August 21, 2009.
[4] M. Alizadeh, A. Greenberg, D. A. Maltz, J. Padhye, P. Patel, B. Prabhakar, S. Sengupta, and M. Sridharan, "Data center TCP (DCTCP)," SIGCOMM, August 30 - September 3, 2010.
[5] Y. Chen, R. Griffith, J. Liu, R. Katz, and A. D. Joseph, "Understanding TCP Incast throughput collapse in datacenter networks," WREN, August 21, 2009.
[7] V. Vasudevan, A. Phanishayee, H. Shah, E. Krevat, D. G. Andersen, G. R. Ganger, G. A. Gibson, and B. Mueller, "Safe and effective fine-grained TCP retransmissions for datacenter communication," SIGCOMM, August 17-21, 2009.
[8] M. Alizadeh, A. Greenberg, D. A. Maltz, J. Padhye, P. Patel, B. Prabhakar, S. Sengupta, and M. Sridharan, "Data center TCP (DCTCP)," SIGCOMM, August 30 - September 3, 2010.