ICP. Cache Hierarchies. Squid. Squid Cache ICP Use. Squid. Squid



Similar documents
CSC2231: Akamai. Stefan Saroiu Department of Computer Science University of Toronto

How To Understand The Power Of A Content Delivery Network (Cdn)

DATA COMMUNICATOIN NETWORKING

Web Caching and CDNs. Aditya Akella

Overview. Tor Circuit Setup (1) Tor Anonymity Network

Content Distribu-on Networks (CDNs)

Lecture 3: Scaling by Load Balancing 1. Comments on reviews i. 2. Topic 1: Scalability a. QUESTION: What are problems? i. These papers look at

1. Comments on reviews a. Need to avoid just summarizing web page asks you for:

Distributed Systems 19. Content Delivery Networks (CDN) Paul Krzyzanowski

Hashing in Networked Systems

Decentralized Peer-to-Peer Network Architecture: Gnutella and Freenet

Napster and Gnutella: a Comparison of two Popular Peer-to-Peer Protocols. Anthony J. Howe Supervisor: Dr. Mantis Cheng University of Victoria

Overlay Networks. Slides adopted from Prof. Böszörményi, Distributed Systems, Summer 2004.

Measuring the Web: Part I - - Content Delivery Networks. Prof. Anja Feldmann, Ph.D. Dr. Ramin Khalili Georgios Smaragdakis, PhD

Internet Content Distribution

Department of Computer Science Institute for System Architecture, Chair for Computer Networks. File Sharing

Content Delivery Networks (CDN) Dr. Yingwu Zhu

Using IPM to Measure Network Performance

Distributed Systems. 24. Content Delivery Networks (CDN) 2013 Paul Krzyzanowski. Rutgers University. Fall 2013

Distributed Systems. 23. Content Delivery Networks (CDN) Paul Krzyzanowski. Rutgers University. Fall 2015

CHAPTER 4 PERFORMANCE ANALYSIS OF CDN IN ACADEMICS

Department of Computer Science Institute for System Architecture, Chair for Computer Networks. Caching, Content Distribution and Load Balancing

Lecture 8a: WWW Proxy Servers and Cookies

Distributed Systems. 25. Content Delivery Networks (CDN) 2014 Paul Krzyzanowski. Rutgers University. Fall 2014

HOST AUTO CONFIGURATION (BOOTP, DHCP)

Job Reference Guide. SLAMD Distributed Load Generation Engine. Version 1.8.2

Interoperability of Peer-To-Peer File Sharing Protocols

Indirection. science can be solved by adding another level of indirection" -- Butler Lampson. "Every problem in computer

EECS 489 Winter 2010 Midterm Exam

Rapid IP redirection with SDN and NFV. Jeffrey Lai, Qiang Fu, Tim Moors December 9, 2015

Web DNS Peer-to-peer systems (file sharing, CDNs, cycle sharing)

Load Balancing and Sessions. C. Kopparapu, Load Balancing Servers, Firewalls and Caches. Wiley, 2002.

Acknowledgements. Peer to Peer File Storage Systems. Target Uses. P2P File Systems CS 699. Serving data with inexpensive hosts:

CS514: Intermediate Course in Computer Systems

Concept of Cache in web proxies

Real-Time Analysis of CDN in an Academic Institute: A Simulation Study

Lecture 8a: WWW Proxy Servers and Cookies

Claudio Jeker. RIPE 41 Meeting Amsterdam, 15. January Using BGP topology information for DNS RR sorting

The Hadoop Distributed File System

Improving Gnutella Protocol: Protocol Analysis And Research Proposals

CSCI-1680 CDN & P2P Chen Avin

GLOBAL SERVER LOAD BALANCING WITH SERVERIRON

Content Delivery Networks

Advanced Networking Technologies

YouServ: A Web Hosting and Content Sharing Tool for the Masses

Request Routing, Load-Balancing and Fault- Tolerance Solution - MediaDNS

Implementing Reverse Proxy Using Squid. Prepared By Visolve Squid Team

MEASURING WORKLOAD PERFORMANCE IS THE INFRASTRUCTURE A PROBLEM?

Internet Firewall CSIS Packet Filtering. Internet Firewall. Examples. Spring 2011 CSIS net15 1. Routers can implement packet filtering

THE MASTER LIST OF DNS TERMINOLOGY. v 2.0

Understanding Slow Start

Evaluating Cooperative Web Caching Protocols for Emerging Network Technologies 1

Working With Virtual Hosts on Pramati Server

COMP 361 Computer Communications Networks. Fall Semester Midterm Examination

Single Pass Load Balancing with Session Persistence in IPv6 Network. C. J. (Charlie) Liu Network Operations Charter Communications

Content Delivery Networks. Shaxun Chen April 21, 2009

Carnegie Mellon Computer Science Department Spring 2007 Theory Problem Set 2

First Midterm for ECE374 02/25/15 Solution!!

An Introduction to Peer-to-Peer Networks

FortiOS Handbook - Load Balancing VERSION 5.2.2

A Measurement Study of Peer-to-Peer File Sharing Systems

About Firewall Protection

HollyShare: Peer-to-Peer File Sharing Application

Application Layer -1- Network Tools

HW2 Grade. CS585: Applications. Traditional Applications SMTP SMTP HTTP 11/10/2009

THE MASTER LIST OF DNS TERMINOLOGY. First Edition

P2P Storage Systems. Prof. Chun-Hsin Wu Dept. Computer Science & Info. Eng. National University of Kaohsiung

How To Model The Content Delivery Network (Cdn) From A Content Bridge To A Cloud (Cloud)

Citrix NetScaler Global Server Load Balancing Primer:

Protocolo HTTP. Web and HTTP. HTTP overview. HTTP overview

CDN and Traffic-structure

Scalable Source Routing

Web. Services. Web Technologies. Today. Web. Technologies. Internet WWW. Protocols TCP/IP HTTP. Apache. Next Time. Lecture # Apache.

Sitecore Health. Christopher Wojciech. netzkern AG. Sitecore User Group Conference 2015

SANE: A Protection Architecture For Enterprise Networks

The Gnutella Protocol Specification v0.4

Computer Networks - CS132/EECS148 - Spring

1. The Web: HTTP; file transfer: FTP; remote login: Telnet; Network News: NNTP; SMTP.

Content Distribution Networks (CDN)

Denial of Service. Tom Chen SMU

Scalability of web applications. CSCI 470: Web Science Keith Vertanen

Global Server Load Balancing

The Role and uses of Peer-to-Peer in file-sharing. Computer Communication & Distributed Systems EDA 390

Applications & Application-Layer Protocols: The Domain Name System and Peerto-Peer

Glossary of Technical Terms Related to IPv6

Simple Solution for a Location Service. Naming vs. Locating Entities. Forwarding Pointers (2) Forwarding Pointers (1)

CS5412: TIER 2 OVERLAYS

IP SLAs Overview. Finding Feature Information. Information About IP SLAs. IP SLAs Technology Overview

DNS, CDNs Weds March Lecture 13. What is the relationship between a domain name (e.g., youtube.com) and an IP address?

FAQs for Oracle iplanet Proxy Server 4.0

Server Installation Manual 4.4.1

CS 5480/6480: Computer Networks Spring 2012 Homework 1 Solutions Due by 9:00 AM MT on January 31 st 2012

Scalable Internet Services and Load Balancing

Transparent Identification of Users

Transcription:

Caching & CDN s 15-44: Computer Networking L-21: Caching and CDNs HTTP APIs Assigned reading [FCAB9] Summary Cache: A Scalable Wide- Area Cache Sharing Protocol [Cla00] Freenet: A Distributed Anonymous Information Storage and Retrieval System 2 Overview Web caches Content distribution networks Peer-to-peer networks Web Caching Why cache HTTP objects? Reduce client response time Reduce network bandwidth usage Wide area vs. local area use These two objectives are often in conflict May do exhaustive local search to avoid using wide area bandwidth Prefetching uses extra bandwidth to reduce client response time 3 4 Web Proxies Also used for security Proxy is only host that can access Internet Administrators makes sure that it is secure Performance How many clients can a single proxy handle? Caching Provides a centralized coordination point to share information across clients How to index Early caches used file system to find file Metadata now kept in memory on most caches Caching Proxies - Sources for misses Capacity How large a cache is necessary or equivalent to infinite On disk vs. in memory typically on disk Compulsory First time access to document Non-cacheable documents CGI-scripts Personalized documents (cookies, etc) Encrypted data (SSL) Consistency Document has been updated/expired before reuse Conflict no such issue 5 6 1

Cache Hierarchies Use hierarchy to scale a proxy to more than limited population Why? Larger population = higher hit rate Larger effective cache size Why is population for single proxy limited? Performance, administration, policy, etc. NLANR cache hierarchy Most popular 9 top level caches Internet Cache Protocol based () /Harvest proxy Simple protocol to query another cache for content Uses UDP why? message contents Type query, hit, hit_obj, miss Other identifier, URL, version, sender address (is this needed?) Special message types used with UDP echo port Used to probe server or dumb cache Transfers between caches still done using HTTP Cache Use Upon query that is not in cache Sends _ to each peer (or _Decho to echo port of peer caches that do not speak ) May also send _Secho to origin server s echo port Sets time to short period (default 2 sec) Peer caches process queries and return either _Hit or _Miss Proxy begins transfer upon reception of _Hit, _Decho or _Secho Upon timer expiration, proxy object from closest (RTT) parent proxy Would be better to direct to parent that is towards origin server 9 10 MISS Child MISS Child Child 11 12 2

HIT HIT MISS 13 14 vs HTTP Why not just use HTTP to query other caches? is lightweight positive and negative Makes it easy to process quickly Caches may process many more s than HTTP s HTTP has many functions that are not supported by does not evolve with HTTP changes Adds extra RTT to any proxy-proxy transfer 15 16 Optimal Cache Mesh Behavior Minimize number of hops through mesh Each hop add significant latency hops can cost a 2 sec timeout each! Strict hierarchies cost disk lookup, etc. Especially painful for misses Share across many users and scale to many caches does not scale to a large number of peers Cache and fetch data close to clients Hinting Have proxies store content as well as metadata about contents of other proxies (hints) Minimizes number of hops through mesh Size of hint cache is a concern size of key vs. size of document Having hints can help consistency Makes it possible to push updated documents or invalidations to other caches How to keep hints up-to-date? Not critical incorrect hint results in extra lookups not incorrect behavior Can batch updates to peers 1 1 3

Summary Cache Primary innovation use of compact representation of cache contents Typical cache has GB of space and KB objects 1M objects Using 16byte MD5 16MB per peer Solution: Bloom filters Delayed propagation of hints Waits until threshold %age of cached documents are not in summary Perhaps should have looked at %age of false hits? Bloom Filters Proxy contents summarize as a M bit value Each page stored contributes k hash values in range [1..M] Bits for k hashes set in summary Check for page = if all pages k hash bits are set in summary it is likely that proxy has summary Tradeoff false positives Larger M reduces false positives What should M be? -16 * number of pages seems to work well What about k? Is related to (M/number of pages) 4 works for above M 19 20 Leases Only consistency mechanism in HTTP is for clients to poll server for updates Should HTTP also support invalidations? Problem: server would have to keep track of many, many clients who may have document Possible solution: leases Leases server promises to provide invalidates for a particular lease duration Server can adapt time/duration of lease as needed To number of clients, frequency of page change, etc. 21 Problems Over 50% of all HTTP objects are uncacheable why? Not easily solvable Dynamic data stock prices, scores, web cams CGI scripts results based on passed parameters Obvious fixes SSL encrypted data is not cacheable Most web clients don t handle mixed pages well many generic objects transferred with SSL Cookies results may be based on passed data Hit metering owner wants to measure # of hits for revenue, etc. What will be the end result? 22 Proxy implementation problems Aborted transfers Many proxies transfer entire document even though client has stopped eliminates saving of bandwidth Making objects cacheable Proxy s apply heuristics cookies don t apply to some objects, guesswork on expiration May not match client behavior/desires misconfiguration Many clients have either absurdly small caches or no cache How much would hit rate drop if clients did the same things as proxies Problems Population Size How does population size affect hit rate? Critical to understand usefulness of hierarchy or placement of caches Issues: frequency of access vs. frequency of change (ignore working set size infinite cache) UW/Msoft measurement hit rate rises quickly to about 5000 people and very slowly beyond that Proxies/Hierarchies don t make much sense for populations > 5000 Single proxies can easily handle such populations Hierarchies only make sense for policy/administrative reasons 23 24 4

Problems Common Interests Do different communities have different interests? I.e. do CS and English majors access same pages? IBM and Pepsi workers? Has some impact UW departments have about 5% higher hit rate than randomly chosen UW groups Many common interests remain Is this true in general? UW students have more in common than IBM & Pepsi workers Some related observations Geographic caching server traces have shown that there is geographic locality to interest UW & MS hierarchy performance is bad could be due to size or interests? Overview Web caches Content distribution networks Peer-to-peer networks 25 26 CDN Replicate content on many servers Challenges How to replicate content Where to replicate content How to find replicated content How to choose among know replicas How to direct clients towards replica Discussed in DNS/server selection lecture DNS, HTTP 304 response, anycast, etc. Akamai How Akamai Works s fetch html document from primary server E.g. fetch index.html from cnn.com URLs for replicated content are replaced in html E.g. <img src= http://cnn.com/af/x.gif > replaced with <img src= http://a3.g.akamaitech.net//23/cnn.com/af/x.gif > is forced to resolve axyz.g.akamaitech.net hostname 2 2 How Akamai Works How is content replicated? Akamai only replicates static content Modified name contains original file Akamai server is asked for content First checks local cache If not in cache, s file from primary server and caches file How Akamai Works Root server gives NS record for akamai.net Akamai.net name server returns NS record for g.akamaitech.net Name server chosen to be in region of client s name server TTL is large G.akamaitech.net nameserver choses server in region Should try to chose server that has file in cache - How to choose? Uses axyz name and consistent hash TTL is small 29 30 5

Consistent Hash view = subset of all hash buckets that are visible Desired features Smoothness little impact on hash bucket contents when buckets are added/removed Spread small set of hash buckets that may hold an object regardless of views Load across all views # of objects assigned to hash bucket is small Consistent Hash Example Construction Assign each of C hash buckets to Klog(C) random points on unit interval Map object to random position on unit interval Hash of object = closest bucket Monotone addition of bucket does not cause movement between existing buckets Spread & Load small set of buckets that lie near object Balance no bucket is responsible for large portion of unit interval 31 32 How Akamai Works cnn.com (content provider) DNS root server Akamai server Akamai Subsequent Requests cnn.com (content provider) DNS root server Akamai server Get 11 index. html 1 2 3 End-user Get foo.jpg 4 12 9 5 6 10 Get /cnn.com/foo.jpg Akamai high- level DNS server Akamai low-level DNS server Closest Akamai server Get index. html 1 2 Akamai high- level DNS server End-user 9 10 Get /cnn.com/foo.jpg Akamai low-level DNS server Closest Akamai server 33 34 Overview Web caches Content distribution networks Peer-to-peer networks Peer-to-Peer Networks Typically each member stores content that it desires Basically a replication system for files Always a tradeoff between possible location of files and searching difficulty Peer-to-peer allow files to be anywhere searching is the challenge Dynamic member list makes it more difficult 35 36 6

Napster Simple centralized scheme motivated by ability to sell/control On startup client contacts central server and reports list of files Upon query, central server returns list of possible clients that store data Transfer is done peer-to-peer Gnutella On startup client contacts any servent (server + client) in network Basic message header Unique ID, TTL, Hops Transfers are done with HTTP between peers Servent interconnection used to forward control (queries, hits, etc) 3 3 Gnutella Details Message types Ping probes network for other servents Pong response to ping, contains IP addr, # of files, # of Kbytes shared search criteria + speed requirement of servent Hit successful response to, contains addr + port to transfer from, speed of servent, number of hits, hit results, servent ID Push to servent ID to initiate connection, used to traverse firewalls Ping, Queries are flooded Hit, Pong, Push reverse path of previous message 39 Freenet Anonymity a primary goal Files are stored according to associated key Core idea: try to cluster information about similar keys Messages Random 64bit ID used for loop detection TTL TTL 1 are forwarded with finite probablity Helps anonymity depth counter Opposite of TTL incremented with each hop Depth counter initialized to small random value 40 Freenet Requests Freenet Request User s key XYZ not in local cache Looks up nearest key in routing table and forwards to corresponding node If reaches node with data, it forwards data back to upstream or Requestor adds file to cache, adds entry in routing table Any node forwarding reply may change the source of the reply (to itself or any other node) Helps anonymity If data is not found, failure is reported back Requestor then tries next closest match in routing table Data Request Data Reply Request Failed A 1 12 6 F 5 B 4 2 3 11 10 9 E C D 41 42

Freenet Search Features Nodes tend to specialize in searching for similar keys over time Gets queries from other nodes for similar keys Nodes store similar keys over time Caching of files as a result of successful queries Similarity of keys does not reflect similarity of files Routing does not reflect network topology Freenet File Creation Key for file generated and searched helps identify collision Not found ( All clear ) result indicates success Source of insert message can be change by any forwarding node Creation mechanism adds files/info to locations with similar keys New nodes are discovered through file creation Erroneous/malicious inserts propagate original file further 43 44 Cache Management LRU Cache of files Files are not guaranteed to live forever Files fade away as fewer s are made for them File contents can be encrypted with original text names as key Cache owners do not know either original name or contents cannot be held responsible Freenet Naming Freenet deals with keys But humans need names Keys are flat would like structure as well Could have files that store keys for other files File /text/philiosophy could store keys for files in that directory how to update this file though? Search engine undesirable centralized solution 45 46 Freenet Naming - Indirect files Normal files stored using content-hash key Prevents tampering, enables versioning, etc. Indirect files stored using name-based key Indirect files store keys for normal files Inserted at same time as normal file Has same update problems as directory files Updates handled by signing indirect file with public/private key Collisions for insert of new indirect file handled specially check to ensure same key used for signing Allows for files to be split into multiple smaller parts Next Lecture: QOS & IntServ QOS IntServ Architecture Assigned reading [She95] Fundamental Design Issues for the Future Internet [CSZ92] Supporting Real-Time Applications in an Integrated Services Packet Network: Architecture and Mechanisms 4 4