Building Internet-Scale Distributed Systems for Fun and Profit
Peter R. Pietzuch <prp@doc.ic.ac.uk>
Large-Scale Distributed Systems Group, http://platypus.doc.ic.ac.uk
Distributed Software Engineering (DSE) Section, Department of Computing, Imperial College London
Oxford University Computer Laboratory, Oxford, June 2009
Internet-Scale Distributed Systems
- Search engines (e.g. Google, Yahoo, ...)
  - Global crawling, indexing and search
  - Google: over 450,000 servers in at least 30 data centres world-wide (?)
- Content delivery networks (CDNs) (e.g. Akamai, Limelight, ...)
  - Scalable web hosting, file distribution, media streaming, ...
  - Akamai: hosting for Microsoft.com, CNN.com, BBC iPlayer, ...
- Social networking sites (e.g. Facebook, Twitter, LinkedIn, ...)
  - Facebook: serves 200 million users and stores 40 billion photos
- Cloud computing applications (e.g. Amazon, Microsoft, Google, ...)
  - Pay-as-you-use storage and computation for applications
  - Amazon: bought servers worth $86 million in 2008 alone
Internet-Scale Distributed Systems (cont.)
- Peer-to-peer computing (e.g. BitTorrent, BOINC, ...)
  - Users contribute resources for file sharing and scientific computing
  - BitTorrent: 1/3 of all Internet traffic (?) [CacheLogic 04]
  - @home computing: Quake-Catcher@home, SETI@home
- Large-scale test-beds (e.g. PlanetLab, Emulab, ...)
  - Possible to deploy research systems in the real world
  - PlanetLab: 1041 nodes at 500 sites (May 09)
  - Great for student projects!
Properties of Internet-Scale Systems
- Large number of users, requests, resources, ...
  - Single/multiple data centres, hosts and/or mobile clients
  - Requirement: Scalability
- Wide-area Internet communication
  - Cannot ignore network effects
  - Requirement: Network-awareness
- Long-running, 24/7 service
  - Must adapt to changing conditions and failure
  - Requirement: Fault-tolerance
Why Is Building Internet-Scale Systems Hard?
- Scalability is hard to achieve
  - How to organise 1000s of processing hosts?
  - What is the programming model?
- Applications must be intelligent about network use
  - How can we achieve application requirements?
- Continuous network and node failures
  - Lead to data loss, resource shortages and inconsistency
  - PlanetLab: 630 healthy machines out of 1041 total (May 09)
  - Google: 1 failure per hour in 10,000-node clusters (source: Google)
High-Level Abstractions Help
- Google uses several layers of abstraction
  - Runs applications (search, mail, ...) on top of the highest layer
  - Each layer is scalable, network-aware and fault-tolerant
[Figure: layered stack with Google Apps on top of MapReduce computation, BigTable storage system, Chubby lock service and Google File System]
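To make the idea of a high-level layer concrete, here is a minimal single-process sketch of the MapReduce programming model, using word count as the example. It is an illustration under assumptions only: the function names are hypothetical, and the map, shuffle and reduce phases run sequentially in one process, whereas the real layer distributes them over many hosts and tolerates failures underneath.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_words(doc: str) -> Iterator[tuple[str, int]]:
    # Map phase: emit an intermediate (word, 1) pair for every word.
    for word in doc.lower().split():
        yield word, 1


def reduce_counts(word: str, counts: Iterable[int]) -> tuple[str, int]:
    # Reduce phase: combine all intermediate values for one key.
    return word, sum(counts)


def map_reduce(docs: list[str]) -> dict[str, int]:
    # Shuffle phase: group intermediate values by key before reducing.
    groups: dict[str, list[int]] = defaultdict(list)
    for doc in docs:
        for key, value in map_words(doc):
            groups[key].append(value)
    return dict(reduce_counts(key, values) for key, values in groups.items())


print(map_reduce(["the cat sat", "the dog sat"]))
# {'the': 2, 'cat': 1, 'sat': 2, 'dog': 1}
```

The application programmer supplies only the map and reduce functions; partitioning, scheduling and recovery from failed hosts would be the job of the layers underneath.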
Large-Scale Distributed Systems Group
- Research goal: Support the design and engineering of scalable and robust Internet-wide applications
  - Need to provide higher-level abstractions at different layers
- Many success stories from research exist
  - e.g. overlay networks, distributed hash tables, network coordinates, storage and replication mechanisms, ...
- Combination of networks, distributed systems and database research
[Figure: three layers, from top to bottom: data management layer, application layer, network layer]
Talk Structure
I. Network layer: Improving Internet routing
  - Ukairo Project: Detour routing for applications
II. Application layer: Building adaptive overlay networks
  - LANC CDN Project: Network/load-aware content delivery
III. Data management layer: Supporting imperfect data processing
  - DISSP Project: Dependable Internet-scale stream processing
I. Improving Internet Routing
- Internet-scale applications want custom communication paths
  - Skype wants a path with low packet loss
  - iTunes wants a path with a high download rate
- The Internet uses a two-level hierarchical routing scheme
  - Internet hosts are part of autonomous systems (ASs)
  - Inter-AS routing (BGP) and intra-AS routing (OSPF)
- Internet routing optimises for ISPs' concerns!
  - One path for all applications and no control over the returned path
[Figure: hosts a and b connected across autonomous systems AS 1 to AS 6]
Taking Detours on the Internet
- Idea: Take multiple Internet paths and stitch them together
  - The resulting detour path may have better properties than the direct path
- What causes Internet detour paths?
  - Inter-AS routing is not optimal and has limited expressiveness
[Figure: direct path between hosts a and b across several ASs, versus a detour path relayed via host d]
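The stitching idea can be sketched as a search for the best one-hop detour: relay through an intermediate host when the sum of the two segment latencies beats the direct path. This is a minimal sketch assuming latency is the property of interest and that a full matrix of measured latencies is available; the function name and the numbers are hypothetical.

```python
from typing import Optional, Sequence


def best_one_hop_detour(
    lat: Sequence[Sequence[float]], src: int, dst: int
) -> tuple[Optional[int], float]:
    """Return (relay, latency) of the best path src -> relay -> dst.

    Returns (None, direct latency) if no relay beats the direct path.
    """
    best_relay: Optional[int] = None
    best_latency = lat[src][dst]
    for relay in range(len(lat)):
        if relay in (src, dst):
            continue
        # Stitch two Internet paths together at the relay host.
        via = lat[src][relay] + lat[relay][dst]
        if via < best_latency:
            best_relay, best_latency = relay, via
    return best_relay, best_latency


# Hypothetical measured latencies (ms) between hosts a=0, b=1, d=2.
LAT = [
    [0, 100, 30],
    [100, 0, 40],
    [30, 40, 0],
]
print(best_one_hop_detour(LAT, 0, 1))  # (2, 70)
```

Adding latencies is only one way to combine segments; for packet loss, assuming independent losses, the stitched path's delivery probability would be the product of the segments' delivery probabilities instead.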
Finding Detours in the AS Graph [IPTPS 09]
- Idea: Analyse detours in the Internet AS graph
  - Assume that similar AS-level paths benefit from similar detours
  - Perform clustering on a similarity metric: shared AS link count
[Figure: AS-level paths between hosts a to e across AS 1 to AS 7; a path that shares an AS link with a path that has a known good detour is a candidate for a potential good detour]
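The similarity metric on this slide, the number of shared AS links, is easy to compute. The slide does not say which clustering algorithm is used, so the sketch below substitutes a simple nearest-neighbour rule: reuse the detour of the most similar path that has a known good detour. All names, paths and detour labels are hypothetical.

```python
from typing import Optional, Sequence

Link = tuple[int, int]


def as_links(path: Sequence[int]) -> set[Link]:
    # Undirected AS-level links on a path, e.g. [1, 2, 3] -> {(1, 2), (2, 3)}.
    return {(min(a, b), max(a, b)) for a, b in zip(path, path[1:])}


def shared_link_count(p: Sequence[int], q: Sequence[int]) -> int:
    # Similarity metric: number of AS links both paths traverse.
    return len(as_links(p) & as_links(q))


def suggest_detour(
    path: Sequence[int], known: list[tuple[Sequence[int], str]]
) -> Optional[str]:
    """Reuse the detour of the most similar path with a known good detour.

    Returns None if the path shares no AS link with any known path.
    """
    best, best_score = None, 0
    for other, detour in known:
        score = shared_link_count(path, other)
        if score > best_score:
            best, best_score = detour, score
    return best


# Hypothetical AS paths with known good detours (relay hosts d and e).
KNOWN = [([1, 2, 3, 4], "via d"), ([5, 6, 7], "via e")]
print(suggest_detour([1, 2, 3, 5], KNOWN))  # via d
```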
Ukairo Project: Detour Routing for Applications
- Deploying a general-purpose detour routing plane on PlanetLab
  - Continuously searches for Internet detour paths
  - Nodes exchange found detours using gossiping
  - Applications can use it transparently, e.g. for web browser downloads
- Open research questions
  - What is the overhead of finding detour paths?
  - What happens if everybody uses detour routing?
  - What do ISPs think about this?
  - What are the lessons for future Internet designs?
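Gossip-based exchange of found detours can be sketched as follows. This is a hypothetical model, not Ukairo's protocol: each node keeps a table mapping a host pair to the best detour it knows (relay host, latency), and in each round every node swaps tables with one random peer (push-pull), keeping the lower-latency entry for each pair.

```python
import random

Pair = tuple[str, str]
Detour = tuple[str, float]  # (relay host, latency in ms)


class Node:
    def __init__(self, name: str) -> None:
        self.name = name
        # Hypothetical detour table: (src, dst) -> best known (relay, latency).
        self.detours: dict[Pair, Detour] = {}

    def merge(self, table: dict[Pair, Detour]) -> None:
        # Keep the lower-latency detour for each host pair.
        for pair, (relay, latency) in table.items():
            if pair not in self.detours or latency < self.detours[pair][1]:
                self.detours[pair] = (relay, latency)


def gossip_round(nodes: list[Node], rng: random.Random) -> None:
    # Push-pull gossip: every node swaps tables with one random peer.
    for node in nodes:
        peer = rng.choice([n for n in nodes if n is not node])
        mine, theirs = dict(node.detours), dict(peer.detours)
        node.merge(theirs)
        peer.merge(mine)


nodes = [Node(f"n{i}") for i in range(4)]
nodes[0].detours[("a", "b")] = ("d", 70.0)
nodes[1].detours[("a", "b")] = ("c", 90.0)
nodes[2].detours[("x", "y")] = ("z", 50.0)
rng = random.Random(42)
for _ in range(10):
    gossip_round(nodes, rng)
print(nodes[3].detours)
```

Because each exchange only ever replaces an entry with a better one, all tables converge to the best detour known anywhere, without any central coordinator.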
II. Building Adaptive Overlay Networks
- Imagine your start-up idea, mugbook, becomes an overnight success...
- How do you support such a website?
  - A single web server?
  - Multiple web servers in a single data centre?
Content Delivery Networks
- Content delivery networks (CDNs) serve content to many clients world-wide
- The overlay network consists of:
  - A distributed set of servers that maintain content replicas
  - Clients (web browsers) that request content
Mapping Clients to Content Servers
- How do we assign clients to content servers?
  - Load awareness: Don't direct clients to overloaded content servers
  - Network awareness: Don't send traffic on congested network paths
- Many heuristics have been proposed in the past
  - Geographic location
  - Clustering of address prefixes
  - Proprietary solutions
Cost Graph
- Associate each client/server pair with a cost
  - Use download times from servers as the cost metric
  - Incorporates both load and network congestion
- But: measurement overhead remains high
  - Can't measure all costs, so we need to estimate the missing ones
Network Coordinates
- Idea: Assume the cost graph is embeddable in a metric space
  - Approximate missing measurements using Euclidean distances
- Assign each client/server a network coordinate C
  - Distances between coordinates estimate download costs:
    $\lVert C(\text{Client}_1) - C(\text{Server}_1) \rVert \approx \text{download\_time}$
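Combining the cost graph with network coordinates, a client/server cost is taken from a measurement when one exists and is otherwise estimated as the Euclidean distance between the two coordinates. The sketch below assumes coordinates are already known; the names, coordinate values and cost units are hypothetical.

```python
import math

Coord = tuple[float, ...]
Edge = tuple[str, str]


def estimate_cost(coords: dict[str, Coord], client: str, server: str) -> float:
    # Estimated download cost = Euclidean distance between network coordinates.
    return math.dist(coords[client], coords[server])


def fill_cost_graph(
    measured: dict[Edge, float],
    coords: dict[str, Coord],
    clients: list[str],
    servers: list[str],
) -> dict[Edge, float]:
    """Complete the cost graph: keep measured costs, estimate the missing ones."""
    return {
        (c, s): measured[(c, s)] if (c, s) in measured else estimate_cost(coords, c, s)
        for c in clients
        for s in servers
    }


coords = {"client1": (0.0, 0.0), "server1": (3.0, 4.0), "server2": (6.0, 8.0)}
measured = {("client1", "server1"): 5.5}
print(fill_cost_graph(measured, coords, ["client1"], ["server1", "server2"]))
# {('client1', 'server1'): 5.5, ('client1', 'server2'): 10.0}
```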
Computing Network Coordinates
- Scalable, decentralised computation (e.g. using the Vivaldi algorithm) [Dabek 04]
  - 2-5 dimensions sufficient in practice
  - Low measurement overhead
  - Continuous process
[Figure: network coordinates of ~1500 web servers, with network delay as cost]
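The core of Vivaldi is a spring-relaxation step: after measuring the round-trip time to node j, node i moves its coordinate along the line between the two coordinates, in proportion to the prediction error. The sketch below is a simplified version that uses a constant timestep `delta` (an assumption; the published algorithm adapts the step size using per-node error estimates).

```python
import math
import random
from typing import Optional


def vivaldi_update(
    xi: list[float],
    xj: list[float],
    rtt: float,
    delta: float = 0.25,
    rng: Optional[random.Random] = None,
) -> list[float]:
    """Return node i's new coordinate after one measurement of rtt to node j.

    Simplified Vivaldi step with a constant timestep `delta`.
    """
    error = rtt - math.dist(xi, xj)  # > 0: too close, < 0: too far apart
    direction = [a - b for a, b in zip(xi, xj)]
    length = math.hypot(*direction)
    if length == 0:
        # Coincident coordinates: push apart in a random direction.
        rng = rng or random.Random()
        direction = [rng.uniform(-1.0, 1.0) for _ in xi]
        length = math.hypot(*direction)
    unit = [d / length for d in direction]
    return [a + delta * error * u for a, u in zip(xi, unit)]


# Repeated measurements against a fixed node pull the distance towards the rtt.
x = [0.0, 0.0]
for _ in range(100):
    x = vivaldi_update(x, [1.0, 0.0], rtt=3.0)
print(round(math.dist(x, [1.0, 0.0]), 6))  # 3.0
```

Each node runs this update locally against whichever peers it happens to measure, which is why the computation is decentralised and can run continuously.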
LANC Content-Delivery Network [ROADS 08]
- Use network coordinates to organise content servers and clients
  - Clients keep track of content servers in their neighbourhood
  - Map clients to the nearest content servers in the space
  - Overloaded content servers move away
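The client-side mapping rule can be sketched in a few lines: pick the known server whose coordinate is closest to the client's. How an overloaded server moves away is not specified here, so the second half of the example simply relocates a server's coordinate by hand (hypothetical values) to show the effect on the mapping.

```python
import math

Coord = tuple[float, float]


def nearest_server(client: Coord, servers: dict[str, Coord]) -> str:
    """Map a client to the content server closest to it in coordinate space.

    `servers` holds only the servers in the client's neighbourhood.
    """
    return min(servers, key=lambda name: math.dist(client, servers[name]))


servers = {"s1": (10.0, 0.0), "s2": (2.0, 1.0)}
print(nearest_server((0.0, 0.0), servers))  # s2

# Suppose s2 becomes overloaded and its coordinate drifts away (hypothetical).
servers["s2"] = (20.0, 0.0)
print(nearest_server((0.0, 0.0), servers))  # s1
```

Because load is expressed as movement in the same space, the client needs only one rule, "nearest wins", to become both network-aware and load-aware.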
Does it really work? (Yes!)
- Deployed LANC CDN on PlanetLab
  - 119 content servers and 16 clients
  - Downloaded a Linux distribution from 100 web servers world-wide
  - Tried several different assignment strategies
[Figure: CDF of transfer data rate per request (KB/s, log scale from 10 to 10000) for the LANC CDN, Nearest, Random and Direct strategies]
III. Supporting Imperfect Data Processing
- Global sensing infrastructures
  - Run continuous queries over sensor streams
  - Failures take out resources
[Figure: users and applications on top of a data collection, fusion, aggregation & dissemination layer, fed by mobile sensing devices, traffic monitors, scientific instruments, RFID tags, cameras, body sensor networks, web feeds, embedded sensors, wireless sensor networks and web content]
Stream Data Model
- Data sources emit streams of data tuples
  - Tuples follow a schema with fields, e.g. (ts, coord, image)
- Users submit declarative queries
  - A range of operators (filter, join, transform, ...) process the data tuples
[Figure: query plan in which two coordinate transform operators feed an image merging operator]
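A minimal sketch of the data model: tuples with the (ts, coord, image) schema, and filter, transform and merge operators composed into a plan resembling the one in the figure. The operators are hand-composed Python generators rather than compiled from a declarative query, the image payload is a string stand-in, and the coordinate transform is a hypothetical translation.

```python
from dataclasses import dataclass, replace
from typing import Callable, Iterable, Iterator


@dataclass(frozen=True)
class StreamTuple:
    ts: int                     # timestamp
    coord: tuple[float, float]  # coordinates of the observation
    image: str                  # stand-in for the image payload


def filter_op(stream: Iterable[StreamTuple],
              pred: Callable[[StreamTuple], bool]) -> Iterator[StreamTuple]:
    # Filter: pass through only the tuples that satisfy the predicate.
    return (t for t in stream if pred(t))


def transform_op(stream: Iterable[StreamTuple],
                 fn: Callable[[StreamTuple], StreamTuple]) -> Iterator[StreamTuple]:
    # Transform: apply a per-tuple function, e.g. a coordinate transform.
    return (fn(t) for t in stream)


def merge_op(left: Iterable[StreamTuple],
             right: Iterable[StreamTuple]) -> Iterator[StreamTuple]:
    # Join on timestamp and merge the two images into one tuple.
    by_ts = {t.ts: t for t in right}  # materialises the whole right stream
    for t in left:
        other = by_ts.get(t.ts)
        if other is not None:
            yield replace(t, image=f"{t.image}+{other.image}")


def shift(t: StreamTuple) -> StreamTuple:
    # Hypothetical coordinate transform: translate by (1, 1).
    return replace(t, coord=(t.coord[0] + 1.0, t.coord[1] + 1.0))


cam1 = [StreamTuple(1, (0.0, 0.0), "a1"), StreamTuple(2, (0.0, 0.0), "a2")]
cam2 = [StreamTuple(1, (5.0, 5.0), "b1"), StreamTuple(3, (5.0, 5.0), "b3")]
query = merge_op(transform_op(cam1, shift),
                 transform_op(filter_op(cam2, lambda t: t.ts <= 2), shift))
print(list(query))
# [StreamTuple(ts=1, coord=(1.0, 1.0), image='a1+b1')]
```

The merge operator reads its whole right input before emitting anything, which only works for finite inputs; a real stream processor would join over a bounded time window instead.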
Failure Recovery in Stream Processing
- Use redundant resources to achieve dependability
  - Run multiple copies of the same query operator
- But: an Internet-scale system may not have enough spare resources
  - Instead, accept degradation in processing quality
- Idea: Enhance the stream data model to include quality information
[Figure: query plan of coordinate transform and image merging operators, duplicated on redundant resources]
Quality-Centric Stream Data Model
- Enhance data tuples with:
  - Weight: number of data sources in the tuple
  - Recall: fraction of received tuples
[Figure: source tuples D1 to D6, each with weight 1 and recall 1, are combined into derived tuples D7 and D8, whose recall values (0.75 and 0.83) fall below 1]
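One way to propagate weight and recall through an operator is sketched below. The propagation rule is an assumption, not taken from the slide: each tuple also records how many source tuples were expected, weight counts the sources that actually contributed, and recall is weight divided by expected. The tuple names and values are illustrative and not claimed to match the figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualityTuple:
    data: str
    weight: int    # number of data sources that contributed to this tuple
    expected: int  # number of source tuples that should have contributed

    @property
    def recall(self) -> float:
        # Fraction of the expected source tuples that were actually received.
        return self.weight / self.expected


def source(data: str) -> QualityTuple:
    # A tuple emitted directly by a data source: weight 1, recall 1.
    return QualityTuple(data, weight=1, expected=1)


def combine(data: str, received: list[QualityTuple],
            lost_sources: int = 0) -> QualityTuple:
    """Combine input tuples; `lost_sources` counts source tuples that were
    expected but never arrived (e.g. because of a failure)."""
    weight = sum(t.weight for t in received)
    expected = sum(t.expected for t in received) + lost_sources
    return QualityTuple(data, weight, expected)


# One of four expected sources is lost before the first operator.
partial = combine("partial", [source("s1"), source("s2"), source("s3")],
                  lost_sources=1)
result = combine("result", [partial, source("s5"), source("s6")])
print(partial.weight, partial.recall)          # 3 0.75
print(result.weight, round(result.recall, 2))  # 5 0.83
```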
What is it Good for?
- Provide feedback about result quality to users
  - Measure of how much data made it into the result tuple
- Allow the system to handle node and network failures
  1. Proactive operator replication: Invest resources where the failure impact is highest
  2. Reactive failure recovery: Decide based on lost recall whether recovery is worthwhile
- Support for smart load-shedding under resource shortage
  - Discard tuples with the lowest impact on overloaded processing nodes
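Two of these uses can be sketched directly on top of the quality fields. The sketch assumes that a tuple's weight is a reasonable proxy for its impact, so an overloaded node keeps only its highest-weight tuples, and that recovery is triggered when lost recall exceeds a threshold. Both the impact proxy and the 0.1 threshold are hypothetical choices.

```python
import heapq
from typing import NamedTuple


class Item(NamedTuple):
    data: str
    weight: int    # impact proxy: number of sources in the tuple
    recall: float


def shed_load(batch: list[Item], capacity: int) -> list[Item]:
    """Under overload, keep the `capacity` tuples with the highest weight
    and discard the rest (lowest impact first)."""
    if len(batch) <= capacity:
        return list(batch)
    return heapq.nlargest(capacity, batch, key=lambda t: t.weight)


def recovery_worthwhile(lost_recall: float, threshold: float = 0.1) -> bool:
    # Hypothetical policy: recover only if the failure cost enough recall.
    return lost_recall >= threshold


batch = [Item("a", 1, 1.0), Item("b", 5, 0.8), Item("c", 3, 1.0)]
print([t.data for t in shed_load(batch, capacity=2)])       # ['b', 'c']
print(recovery_worthwhile(0.02), recovery_worthwhile(0.25))  # False True
```

Note that `shed_load` returns the kept tuples ordered by weight rather than by arrival; a node that must preserve stream order would re-sort by timestamp afterwards.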
DISSP Project: Dependable Internet-Scale Stream Processing
- Currently building a prototype system
  - Anybody will be able to connect sensor sources and run queries
  - The system provides a best-effort service given the available resources
- Open questions
  - What's the right data model for processing sensor data?
  - How to discover data sources in a scalable fashion?
  - How to perform query optimisation at a global scale?
Research Outlook
- Programming model
  - What are the right abstractions for building Internet-scale systems?
  - Need a richer Internet interface, not just send(packet, dest_ip)
  - How do we build robust cloud applications? Currently too much focus on low-level services
- System management
  - How do we provision Internet-scale systems?
  - Scale up/down for sudden rises in popularity (flash crowds)
- Testing and evaluation
  - How do we test, debug and evaluate Internet-scale systems?
  - Hard to obtain reproducible results from PlanetLab experiments
Conclusions
- Internet-scale apps have new network requirements
  - One size doesn't fit all, but it's hard to change the Internet
  - Ukairo: Overlay networks can provide custom routing
- Internet-scale systems need new overlay abstractions
  - Apply geometric algorithms to solve distributed systems problems
  - LANC CDN: Metric space for node organisation in a CDN
- Internet-scale systems require new data models
  - Unrealistic to expect perfect processing
  - Instead, accept failure and overload as a fact of life
  - DISSP: Make the impact of failure on processing explicit
Thank You! Any Questions?
Peter Pietzuch <prp@doc.ic.ac.uk>, http://platypus.doc.ic.ac.uk