Hashing in Networked Systems



Similar documents
ICP. Cache Hierarchies. Squid. Squid Cache ICP Use. Squid. Squid

CSE 473 Introduction to Computer Networks. Exam 2 Solutions. Your name: 10/31/2013

1. Comments on reviews a. Need to avoid just summarizing web page asks you for:

CS514: Intermediate Course in Computer Systems

Server Traffic Management. Jeff Chase Duke University, Department of Computer Science CPS 212: Distributed Information Systems

Lecture 3: Scaling by Load Balancing 1. Comments on reviews i. 2. Topic 1: Scalability a. QUESTION: What are problems? i. These papers look at

Lecture 8b: Proxy Server Load Balancing

Single Pass Load Balancing with Session Persistence in IPv6 Network. C. J. (Charlie) Liu Network Operations Charter Communications

Measuring the Web: Part I - - Content Delivery Networks. Prof. Anja Feldmann, Ph.D. Dr. Ramin Khalili Georgios Smaragdakis, PhD

Advanced Computer Networks. Layer-7-Switching and Loadbalancing

DATA COMMUNICATOIN NETWORKING

How To Understand The Power Of A Content Delivery Network (Cdn)

architecture: what the pieces are and how they fit together names and addresses: what's your name and number?

20. Switched Local Area Networks

Content Delivery Networks

Overlay Networks. Slides adopted from Prof. Böszörményi, Distributed Systems, Summer 2004.

CSC2231: Akamai. Stefan Saroiu Department of Computer Science University of Toronto

Communications and Networking

Outline. VL2: A Scalable and Flexible Data Center Network. Problem. Introduction 11/26/2012

Internet Content Distribution

Outline. CSc 466/566. Computer Security. 18 : Network Security Introduction. Network Topology. Network Topology. Christian Collberg

Load Balancing and Sessions. C. Kopparapu, Load Balancing Servers, Firewalls and Caches. Wiley, 2002.

Multi-layer switch hardware commutation across various layers. Mario Baldi. Politecnico di Torino.

Final exam review, Fall 2005 FSU (CIS-5357) Network Security

Application Layer. CMPT Application Layer 1. Required Reading: Chapter 2 of the text book. Outline of Chapter 2

OVERLAYING VIRTUALIZED LAYER 2 NETWORKS OVER LAYER 3 NETWORKS

Note! The problem set consists of two parts: Part I: The problem specifications pages Part II: The answer pages

Content Delivery Networks. Shaxun Chen April 21, 2009

CS 457 Lecture 19 Global Internet - BGP. Fall 2011

Scalable Internet Services and Load Balancing

VLAN und MPLS, Firewall und NAT,

VXLAN: Scaling Data Center Capacity. White Paper

Final for ECE374 05/06/13 Solution!!

Exam 1 Review Questions

First Midterm for ECE374 03/24/11 Solution!!

Overview. Tor Circuit Setup (1) Tor Anonymity Network

Advanced Computer Networks. Datacenter Network Fabric

Understanding Slow Start

IP Multicasting. Applications with multiple receivers

FortiOS Handbook - Load Balancing VERSION 5.2.2

Scalability of web applications. CSCI 470: Web Science Keith Vertanen

Web Caching and CDNs. Aditya Akella

Introduction to Computer Networks

TRILL for Service Provider Data Center and IXP. Francois Tallet, Cisco Systems

CSE 123b Communications Software

Distributed Systems. 23. Content Delivery Networks (CDN) Paul Krzyzanowski. Rutgers University. Fall 2015

COMP 361 Computer Communications Networks. Fall Semester Midterm Examination

Computer Networks - CS132/EECS148 - Spring

First Midterm for ECE374 03/09/12 Solution!!

Lab 2. CS-335a. Fall 2012 Computer Science Department. Manolis Surligas

CS 188/219. Scalable Internet Services Andrew Mutz October 8, 2015

EINDHOVEN UNIVERSITY OF TECHNOLOGY Department of Mathematics and Computer Science

Load Balancing. Final Network Exam LSNAT. Sommaire. How works a "traditional" NAT? Un article de Le wiki des TPs RSM.

Distributed Systems. 25. Content Delivery Networks (CDN) 2014 Paul Krzyzanowski. Rutgers University. Fall 2014

2. What is the maximum value of each octet in an IP address? A. 28 B. 255 C. 256 D. None of the above

Decentralized Peer-to-Peer Network Architecture: Gnutella and Freenet

Midterm. Name: Andrew user id:

PowerLink Bandwidth Aggregation Redundant WAN Link and VPN Fail-Over Solutions

P2P: centralized directory (Napster s Approach)

Multi-Homing Security Gateway

Network: several computers who can communicate. bus. Main example: Ethernet (1980 today: coaxial cable, twisted pair, 10Mb 1000Gb).

Agenda. Distributed System Structures. Why Distributed Systems? Motivation

FortiOS Handbook Load Balancing for FortiOS 5.0

Distributed Systems 19. Content Delivery Networks (CDN) Paul Krzyzanowski

Department of Computer Science Institute for System Architecture, Chair for Computer Networks. File Sharing

Zarząd (7 osób) F inanse (13 osób) M arketing (7 osób) S przedaż (16 osób) K adry (15 osób)

DEPLOYMENT GUIDE Version 1.1. DNS Traffic Management using the BIG-IP Local Traffic Manager

Introduction to IP v6

OpenFlow Based Load Balancing

Life of a Packet CS 640,

What is VLAN Routing?

How do I get to

Scalable Internet Services and Load Balancing

Internet Firewall CSIS Packet Filtering. Internet Firewall. Examples. Spring 2011 CSIS net15 1. Routers can implement packet filtering

Content Distribu-on Networks (CDNs)

GATE CS Topic wise Questions Computer Network

Flow Analysis Versus Packet Analysis. What Should You Choose?

APV9650. Application Delivery Controller

CLE202 Introduction to ServerIron ADX Application Switching and Load Balancing

Request Routing, Load-Balancing and Fault- Tolerance Solution - MediaDNS

COS 461: Computer Networks

Ref: A. Leon Garcia and I. Widjaja, Communication Networks, 2 nd Ed. McGraw Hill, 2006 Latest update of this lecture was on

Load Balance Router R258V

EE 7376: Introduction to Computer Networks. Homework #3: Network Security, , Web, DNS, and Network Management. Maximum Points: 60

CS101 Lecture 19: Internetworking. What You ll Learn Today

B+ Tree Properties B+ Tree Searching B+ Tree Insertion B+ Tree Deletion Static Hashing Extendable Hashing Questions in pass papers

High Availability HTTP/S. R.P. (Adi) Aditya Senior Network Architect

Chapter 8 Security Pt 2

Chapter 11 Cloud Application Development

Wharf T&T Limited DDoS Mitigation Service Customer Portal User Guide

CSCI-1680 CDN & P2P Chen Avin

Transcription:

LB Server Cluster Switches Hashing in Networked Systems COS 461: Computer Networks Spring 2011 Mike Freedman h@p://www.cs.princeton.edu/courses/archive/spring11/cos461/

2 Hash funcion Hashing FuncIon that maps a large, possibly variable sized datum into a small datum, onen a single integer that serves to index an associaive array In short: maps n bit datum into k buckets (k << 2 n ) Provides Ime & space saving data structure for lookup Main goals: Low cost DeterminisIc Uniformity (load balanced)

3 Uses of hashing Today s outline Equal cost mulipath rouing in switches Network load balancing in server clusters Per flow staisics in switches (QoS, IDS) Caching in cooperaive CDNs and P2P file sharing Data pariioning in distributed storage services Various hashing strategies Modulo hashing Consistent hashing Bloom Filters

Uses of Hashing 4

5 Equal cost mulipath rouing (ECMP) ECMP MulIpath rouing strategy that splits traffic over muliple paths for load balancing Why not just round robin packets? Reordering (lead to triple duplicate ACK in TCP?) Different RTT per path (for TCP RTO) Different MTUs per path

6 Equal cost mulipath rouing (ECMP) Path selecion via hashing # buckets = # outgoing links Hash network informaion (source/dest IP addrs) to select outgoing link: preserves flow affinity

7 Now: ECMP in datacenters Datacenter networks are muli rooted tree Goal: Support for 100,000s of servers Recall Ethernet spanning tree problems: No loops L3 rouing and ECMP: Take advantage of muliple paths

8 Network load balancing Goal: Split requests evenly over k servers Map new flows to any server Packets of exising flows coninue to use same server 3 approaches Load balancer terminates TCP, opens own connecion to server Virtual IP / Dedicated IP (VIP/DIP) approaches One global facing virtual IP represents all servers in cluster Hash client s network informaion (source IP:port) NAT approach: Replace virtual IP with server s actual IP Direct Server Return (DSR)

9 Load balancing with DSR LB Server Cluster Switches Servers bind to both virtual and dedicated IP Load balancer just replaces dest MAC addr Server sees client IP, responds directly Packet in reverse direcion do not pass through load balancer Greater scalability, paricularly for traffic with assymmetric bandwidth (e.g., HTTP GETs)

10 Per flow state in switches Switches onen need to maintain connecion records or per flow state Quality of service for flows Flow based measurement and monitoring Payload analysis in Intrusion DetecIon Systems (IDSs) On packet receipt: Hash flow informaion (packet 5 tuple) Perform lookup if packet belongs to known flow Otherwise, possibly create new flow entry ProbabilisIc match (false posiives) may be okay

11 CooperaIve Web CDNs Tree like topology of cooperaive web caches Check local If miss, check siblings / parent public Internet One approach Internet Cache Protocol (ICP) UDP based lookup, short Imeout Parent webcache AlternaIve approach A priori guess is siblings/children have content Nodes share hash table of cached content with parent / siblings ProbabilisIc check (false posiives) okay, as actual ICP lookup to neighbor could just return false

12 Hash tables in P2P file sharing Two layer network (e.g., Gnutella, Kazaa) Ultrapeers are more stable, not NATted, higher bandwidth Leaf nodes connect with 1 or more ultrapeers Ultrapeers handle content searchers Leaf nodes send hash table of content to ultrapeers Search requests flooded through ultrapeer network When ultrapeer gets request, checks hash tables of its children for match

13 Data pariioning Network load balancing: All machines are equal Data pariioning: Machines store different content Non hash based soluion Directory server maintains mapping from O(entries) to machines (e.g., Network file system, Google File System) Named data can be placed on any machine Hash based soluion Nodes maintain mappings from O(buckets) to machines Data placed on the machine that owns the name s bucket

14 Examples of data pariioning Akamai 1000 clusters around Internet, each >= 1 servers Hash (URL s domain) to map to one server Akamai DNS aware of hash funcion, returns machine that 1. is in geographically nearby cluster 2. manages paricular customer domain Memcached (Facebook, Twi@er, ) Employ k machines for in memory key value caching On read: Check memcache If miss, read data from DB, write to memcache On write: invalidate cache, write data to DB

How Akamai Works Already Cached 15 cnn.com (content provider) DNS root server Akamai server GET index. html 1 2 End user 7 8 9 10 GET /cnn.com/foo.jpg Akamai high-level DNS server Akamai low-level DNS server Cluster Nearby hash-chosen Akamai server

Hashing Techniques 16

17 Basic Hash Techniques Simple approach for uniform data If data distributed uniformly over N, for N >> n Hash fn = <data> mod n Fails goal of uniformity if data not uniform Non uniform data, variable length strings Typically split strings into blocks Perform rolling computaion over blocks CRC32 checksum Cryptographic hash funcions (SHA 1 has 64 byte blocks)

18 Applying Basic Hashing Consider problem of data pariion: Given document X, choose one of k servers to use Suppose we use modulo hashing Number servers 1..k Place X on server i = (X mod k) Problem? Data may not be uniformly distributed Place X on server i = hash (X) mod k Problem? What happens if a server fails or joins (k k±1)? What is different clients has different esimate of k? Answer: All entries get remapped to new nodes!

19 Consistent Hashing insert(key lookup(key 1,value) 1 key 1 =value key 1 key 2 key 3 Consistent hashing pariions key space among nodes Contact appropriate node to lookup/store key Blue node determines red node is responsible for key 1 Blue node sends lookup or insert to red node

20 Consistent Hashing 0000 0010 0110 1010 1100 1110 1111 URL 0001 1 URL 0100 2 URL 1011 3 ParIIoning key space among nodes Nodes choose random idenifiers: e.g., hash(ip) Keys randomly distributed in ID space: e.g., hash(url) Keys assigned to node nearest in ID space Spreads ownership of keys evenly across nodes

21 Consistent Hashing ConstrucIon Assign C hash buckets to random points on mod 2 n circle; hash key size = n Map object to random posiion on circle Hash of object = closest clockwise bucket 0 14 12 Bucket 8 4 Desired features Balanced: No bucket has disproporionate number of objects Smoothness: AddiIon/removal of bucket does not cause movement among exising buckets (only immediate buckets) Spread and load: Small set of buckets that lie near object

22 Bloom Filters Data structure for probabilisic membership tesing Small amount of space, constant Ime operaions False posiives possible, no false negaives Useful in per flow network staisics, sharing informaion between cooperaive caches, etc. Basic idea using hash fn s and bit array Use k independent hash funcions to map item to array If all array elements are 1, it s present. Otherwise, not

23 Bloom Filters Start with an m bit array, filled with 0s. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 To insert, hash each item k Imes. If H i (x) = a, set Array[a] = 1. 0 1 0 0 1 0 1 0 0 1 1 1 0 1 1 0 To check if y is in set, check array at H i (y). All k values must be 1. 0 1 0 0 1 0 1 0 0 1 1 1 0 1 1 0 Possible to have a false posiive: all k values are 1, but y is not in set. 0 1 0 0 1 0 1 0 0 1 1 1 0 1 1 0

24 Uses of hashing Today s outline Equal cost mulipath rouing in switches Network load balancing in server clusters Per flow staisics in switches (QoS, IDS) Caching in cooperaive CDNs and P2P file sharing Data pariioning in distributed storage services Various hashing strategies Modulo hashing Consistent hashing Bloom Filters