



Building Internet-Scale Distributed Systems for Fun and Profit
Peter R. Pietzuch <prp@doc.ic.ac.uk>
Large-Scale Distributed Systems Group (http://platypus.doc.ic.ac.uk)
Distributed Software Engineering (DSE) Section, Department of Computing, Imperial College London
Oxford University Computer Laboratory, Oxford, June 2009

Internet-Scale Distributed Systems
- Search engines (e.g. Google, Yahoo, ...): global crawling, indexing and search. Google: over 450,000 servers in at least 30 data centres world-wide (?)
- Content delivery networks (CDNs) (e.g. Akamai, Limelight, ...): scalable web hosting, file distribution, media streaming, ... Akamai: hosting for Microsoft.com, CNN.com, BBC iPlayer, ...
- Social networking sites (e.g. Facebook, Twitter, LinkedIn, ...): Facebook serves 200 million users and stores 40 billion photos
- Cloud computing applications (e.g. Amazon, Microsoft, Google, ...): pay-as-you-use storage and computation for applications. Amazon: bought servers worth $86 million in 2008 alone

Internet-Scale Distributed Systems
- Peer-to-peer computing (e.g. BitTorrent, BOINC, ...): users contribute their resources for file sharing and scientific computing. BitTorrent: 1/3 of all Internet traffic (?) [CacheLogic 04]. @home computing: Quake-Catcher@home, SETI@home
- Large-scale test-beds (e.g. PlanetLab, Emulab, ...): possible to deploy research systems in the real world. PlanetLab: 1041 nodes at 500 sites (May 09). Great for student projects!

Properties of Internet-Scale Systems
- Large number of users, requests, resources, ...; single/multiple data centres, hosts and/or mobile clients. Requirement: scalability
- Wide-area Internet communication; cannot ignore network effects. Requirement: network-awareness
- Long-running, 24/7 service; must adapt to changing conditions and failure. Requirement: fault-tolerance

Why is Building Internet-Scale Systems Hard?
- Scalability is hard to achieve: how do we organise 1000s of processing hosts? What is the programming model?
- Applications must be intelligent about network use: how can we achieve application requirements?
- Continuous network and node failures lead to data loss, resource shortages and inconsistency. PlanetLab: 630 healthy machines out of 1041 total (May 09). Google: 1 failure per hour in 10,000-node clusters (source: Google)
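As a back-of-the-envelope reading of these figures, assuming (an assumption, not a statement from the talk) that failures are spread evenly and independently over all nodes, one failure per hour in a 10,000-node cluster implies a per-node mean time between failures of roughly 10,000 hours, and the PlanetLab numbers mean only about 60% of nodes were healthy. Each machine is individually reliable, yet at this scale failures are a constant background event:

```latex
\text{MTBF}_{\text{node}} \approx \frac{10\,000\ \text{nodes} \times 1\ \text{h}}{1\ \text{failure}} = 10\,000\ \text{h} \approx 417\ \text{days},
\qquad
\frac{630}{1041} \approx 0.605
```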

High-level Abstractions Help
- Google uses several layers of abstraction; applications (search, mail, ...) run on top of the highest layer
- Each layer is scalable, network-aware and fault-tolerant
[Figure: Google Apps running on top of the MapReduce computation framework, the BigTable storage system, the Chubby lock service and the Google File System]

Large-Scale Distributed Systems Group
- Research goal: support the design and engineering of scalable and robust Internet-wide applications
- Need to provide higher-level abstractions at different layers: data management layer, application layer, network layer
- Many success stories from research exist, e.g. overlay networks, distributed hash tables, network coordinates, storage and replication mechanisms, ...
- Combination of networks, distributed systems and database research

Talk Structure
I. Network layer: improving Internet routing. Ukairo Project: detour routing for applications
II. Application layer: building adaptive overlay networks. LANC CDN Project: network/load-aware content delivery
III. Data management layer: supporting imperfect data processing. DISSP Project: Dependable Internet-Scale Stream Processing

I. Improving Internet Routing
- Internet-scale applications want custom communication paths: Skype wants a path with low packet loss; iTunes wants a path with a high download rate
- The Internet uses a two-level hierarchical routing scheme: Internet hosts are part of autonomous systems (ASs), with inter-AS routing (BGP) and intra-AS routing (OSPF)
[Figure: hosts a and b connected through an AS graph of AS 1 to AS 6]
- Internet routing optimises for ISPs' concerns! One path for all applications and no control over the return path

Taking Detours on the Internet
- Idea: take multiple Internet paths and stitch them together; the resulting detour path may have better properties
[Figure: the direct path from a to b across AS 1 to AS 5, compared with a detour path via an intermediate node d in AS 6]
- What causes Internet detour paths? Inter-AS routing is not optimal and has limited expressiveness
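The detour idea can be sketched as a one-hop search: given measured costs between overlay nodes, compare the direct path with every path stitched together through a single relay. This is a minimal sketch assuming additive path costs such as latency; the latency table, node names and `best_one_hop_detour` are hypothetical and not part of Ukairo.

```python
from typing import Dict, Optional, Sequence, Tuple

# Hypothetical table of measured latencies (ms) between overlay nodes.
Latency = Dict[Tuple[str, str], float]


def best_one_hop_detour(
    lat: Latency, src: str, dst: str, relays: Sequence[str]
) -> Tuple[Optional[str], float]:
    """Return (relay, cost) for the cheapest route from src to dst.

    relay is None when the direct path is best; unmeasured links count as infinite.
    """
    best_relay: Optional[str] = None
    best_cost = lat.get((src, dst), float("inf"))
    for r in relays:
        if r in (src, dst):
            continue
        # Stitch two Internet paths together: src -> r, then r -> dst.
        cost = lat.get((src, r), float("inf")) + lat.get((r, dst), float("inf"))
        if cost < best_cost:
            best_relay, best_cost = r, cost
    return best_relay, best_cost


lat = {("a", "b"): 120.0, ("a", "c"): 50.0, ("c", "b"): 90.0,
       ("a", "d"): 30.0, ("d", "b"): 40.0}
print(best_one_hop_detour(lat, "a", "b", ["c", "d"]))  # ('d', 70.0)
```

Here the detour via d (30 + 40 ms) beats the 120 ms direct path, while the detour via c does not.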

Finding Detours in the AS Graph [IPTPS 09]
- Idea: analyse detours in the Internet AS graph, assuming that similar AS-level paths benefit from similar detours
[Figure: AS-level paths from a, b and c through AS 1 to AS 7 that share an AS link; a known good detour for one path suggests a potential good detour for the others]
- Perform clustering on a similarity metric: shared link count
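A minimal sketch of the similarity metric: represent an AS-level path as its set of undirected AS links and count the links two paths share. The greedy clustering step, the `min_shared` threshold and all function names are assumptions for illustration; the slide does not say which clustering algorithm is used.

```python
from typing import FrozenSet, List, Sequence, Set

ASPath = Sequence[int]  # a path as a list of AS numbers


def as_links(path: ASPath) -> Set[FrozenSet[int]]:
    """Undirected AS-level links traversed by a path."""
    return {frozenset(pair) for pair in zip(path, path[1:])}


def shared_link_count(p: ASPath, q: ASPath) -> int:
    """Similarity metric: number of AS links the two paths have in common."""
    return len(as_links(p) & as_links(q))


def cluster_paths(paths: List[ASPath], min_shared: int = 1) -> List[List[ASPath]]:
    """Greedy single-pass clustering (hypothetical choice): each path joins the
    first cluster whose first member shares at least min_shared links with it."""
    clusters: List[List[ASPath]] = []
    for p in paths:
        for cluster in clusters:
            if shared_link_count(cluster[0], p) >= min_shared:
                cluster.append(p)
                break
        else:
            clusters.append([p])
    return clusters


p1, p2, p3 = [1, 2, 3, 4], [7, 2, 3, 5], [6, 5]
print(shared_link_count(p1, p2))    # 1 (the link AS 2 - AS 3)
print(cluster_paths([p1, p2, p3]))  # [[[1, 2, 3, 4], [7, 2, 3, 5]], [[6, 5]]]
```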

Ukairo Project: Detour Routing for Applications
- Deploying a general-purpose detour routing plane on PlanetLab
- Continuously searches for Internet detour paths; nodes exchange discovered detours using gossiping
- Applications can use it transparently, e.g. for a web browser download
- Open research questions: What is the overhead of finding detour paths? What happens if everybody uses detour routing? What do ISPs think about this? What are the lessons for future Internet designs?
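To illustrate how nodes might exchange discovered detours by gossiping, here is a sketch of a push-pull exchange in which each node keeps the cheapest known detour per (source, destination) pair. The table layout, the `DetourNode` class and the one-random-peer-per-round schedule are hypothetical; Ukairo's actual gossip protocol is not described in the talk.

```python
import random
from typing import Dict, List, Tuple

Key = Tuple[str, str]      # (source, destination)
Entry = Tuple[str, float]  # (relay node, measured cost)


class DetourNode:
    """Overlay node holding the best known detour for each (src, dst) pair."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.detours: Dict[Key, Entry] = {}

    def learn(self, key: Key, relay: str, cost: float) -> None:
        # Keep only the cheapest detour seen so far for this pair.
        current = self.detours.get(key)
        if current is None or cost < current[1]:
            self.detours[key] = (relay, cost)

    def gossip_with(self, peer: "DetourNode") -> None:
        # Push-pull exchange: afterwards both nodes hold the better entry for every key.
        for key, (relay, cost) in list(peer.detours.items()):
            self.learn(key, relay, cost)
        for key, (relay, cost) in list(self.detours.items()):
            peer.learn(key, relay, cost)


def gossip_round(nodes: List[DetourNode], rng: random.Random) -> None:
    """Each node exchanges its table with one randomly chosen peer."""
    for node in nodes:
        peer = rng.choice([n for n in nodes if n is not node])
        node.gossip_with(peer)
```

Because entries only ever get cheaper, repeated rounds spread the best detour for each pair to every node without any central coordinator.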

II. Building Adaptive Overlay Networks
Imagine your start-up idea "mugbook" becomes an overnight success... How do you support such a website? A single web server? Multiple web servers in a single data centre?

Content Delivery Networks
Content delivery networks (CDNs) serve content to many clients world-wide. The overlay network consists of:
- a distributed set of servers that maintain content replicas
- clients (web browsers) that request content

Mapping Clients to Content Servers
How do we assign clients to content servers?
- Load awareness: don't direct clients to overloaded content servers
- Network awareness: don't send traffic on congested network paths
- Many heuristics have been proposed in the past: geographic location, clustering of address prefixes, proprietary solutions

Cost Graph
- Associate each client/server pair with a cost
- Use download times from servers as the cost metric; this incorporates both load and network congestion
- But: measurement overhead remains high. We can't measure all costs, so we need to estimate the missing ones

Network Coordinates
- Idea: assume the cost graph is embeddable in a metric space; approximate missing measurements using Euclidean distances
- Assign each client/server a network coordinate C; distances between coordinates estimate download costs: $\lVert C(\text{Client1}) - C(\text{Server1}) \rVert \approx \text{download\_time}$
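The estimation step can be sketched in a few lines: coordinates stand in for the missing cells of the cost graph, while real measurements are used where they exist. The function names, the two-dimensional coordinates and the values are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple

Coord = Sequence[float]


def estimate_cost(a: Coord, b: Coord) -> float:
    """Estimated cost (e.g. download time) = Euclidean distance between coordinates."""
    return math.dist(a, b)


def complete_cost_graph(
    measured: Dict[Tuple[str, str], float], coords: Dict[str, Coord]
) -> Dict[Tuple[str, str], float]:
    """Use real measurements where available and coordinate estimates elsewhere."""
    costs: Dict[Tuple[str, str], float] = {}
    for a in coords:
        for b in coords:
            if a != b:
                costs[(a, b)] = measured.get((a, b), estimate_cost(coords[a], coords[b]))
    return costs


coords = {"client1": (0.0, 0.0), "server1": (30.0, 40.0), "server2": (60.0, 80.0)}
costs = complete_cost_graph({("client1", "server1"): 48.0}, coords)
print(costs[("client1", "server1")], costs[("client1", "server2")])  # 48.0 100.0
```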

Computing Network Coordinates
- Scalable, decentralised computation (e.g. using the Vivaldi algorithm [Dabek 04])
- 2 to 5 dimensions are sufficient in practice
- Low measurement overhead; continuous process
[Figure: network coordinates of ~1500 web servers, using network delay as the cost]
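For reference, the simplest form of Vivaldi adjusts a node's coordinate after each measurement: it moves along the unit vector between the two coordinates by a constant fraction δ (`delta`) of the error between the measured RTT and the predicted distance. This sketch omits the adaptive timestep, error weighting and height vectors of the full algorithm; the parameter values are illustrative.

```python
import math
import random
from typing import List, Optional, Sequence


def vivaldi_update(
    xi: Sequence[float], xj: Sequence[float], rtt: float,
    delta: float = 0.25, rng: Optional[random.Random] = None,
) -> List[float]:
    """One constant-timestep Vivaldi step for node i after measuring rtt to node j."""
    diff = [a - b for a, b in zip(xi, xj)]
    norm = math.hypot(*diff)
    if norm == 0.0:
        # Coincident coordinates: choose a random direction to separate them.
        rng = rng or random.Random()
        diff = [rng.uniform(-1.0, 1.0) for _ in xi]
        norm = math.hypot(*diff)
    unit = [d / norm for d in diff]  # points from xj towards xi
    # error > 0: coordinates too close, move away; error < 0: too far, move closer.
    error = rtt - math.dist(xi, xj)
    return [a + delta * error * u for a, u in zip(xi, unit)]


x = [0.0, 0.0]
for _ in range(50):
    x = vivaldi_update(x, [3.0, 4.0], rtt=10.0, delta=0.5)
print(round(math.dist(x, [3.0, 4.0]), 6))  # 10.0
```

Each step shrinks the prediction error by a factor of (1 - δ), so against a single fixed neighbour the distance converges to the measured RTT; with many neighbours the coordinate settles where the errors balance out.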

LANC Content-Delivery Network [ROADS 08]
- Use network coordinates to organise content servers and clients
- Clients keep track of content servers in their neighbourhood
- Map clients to the nearest content servers in the coordinate space
- Overloaded content servers move away
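A minimal sketch of the client-side mapping, assuming each client stores the coordinates of the content servers in its neighbourhood as a dictionary (a hypothetical representation; `nearest_server` is not LANC's API). Load-awareness is not modelled explicitly here: because coordinates are derived from download times, a loaded server's coordinate drifts away from its clients, and the nearest-server choice then shifts to another replica.

```python
import math
from typing import Dict, Sequence


def nearest_server(client: Sequence[float], neighbourhood: Dict[str, Sequence[float]]) -> str:
    """Map a client to the content server closest to it in coordinate space."""
    return min(neighbourhood, key=lambda s: math.dist(client, neighbourhood[s]))


servers = {"s1": (10.0, 0.0), "s2": (3.0, 4.0), "s3": (-20.0, 1.0)}
print(nearest_server((0.0, 0.0), servers))  # s2
servers["s2"] = (30.0, 40.0)  # s2 overloaded: its coordinate has drifted away
print(nearest_server((0.0, 0.0), servers))  # s1
```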

Does it really work? (Yes!)
- Deployed LANC CDN on PlanetLab: 119 content servers and 16 clients
- Downloaded a Linux distribution from 100 web servers world-wide
- Tried several different assignment strategies
[Figure: CDF of the transfer data rate per request (KB/s, log scale from 10 to 10000) for the LANC CDN, Nearest, Random and Direct assignment strategies]

III. Supporting Imperfect Data Processing
Global sensing infrastructures:
[Figure: sources such as mobile sensing devices, traffic monitors, scientific instruments, RFID tags, cameras, body sensor networks, webfeeds, embedded sensors, wireless sensor networks and web content feed data collection, fusion, aggregation and dissemination for users and applications]
- Runs continuous queries over sensor streams
- Failure takes out resources

Stream Data Model
- Data sources emit streams of data tuples; tuples follow a schema with fields (e.g. ts, coord, image)
- Users submit declarative queries
- A range of operators (filter, join, transform, ...) process data tuples
[Figure: streams of (ts, coord, image) tuples processed by two coordinate transform operators that feed an image merging operator]
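The stream model can be illustrated with Python generators: each operator consumes a stream of tuples and yields a new one. This is a sketch under assumptions; the `SensorTuple` schema mirrors the (ts, coord, image) fields on the slide, and a timestamp join stands in for the image merging operator, whose real semantics the talk does not specify.

```python
from dataclasses import dataclass, replace
from typing import Callable, Iterable, Iterator, Tuple


@dataclass(frozen=True)
class SensorTuple:
    ts: int                     # timestamp
    coord: Tuple[float, float]  # position associated with the reading
    image: str                  # stand-in for the image payload


def filter_op(stream: Iterable[SensorTuple],
              pred: Callable[[SensorTuple], bool]) -> Iterator[SensorTuple]:
    """Pass through only the tuples that satisfy pred."""
    return (t for t in stream if pred(t))


def transform_coords(stream: Iterable[SensorTuple],
                     f: Callable[[Tuple[float, float]], Tuple[float, float]]) -> Iterator[SensorTuple]:
    """Coordinate transform operator: rewrite the coord field of every tuple."""
    return (replace(t, coord=f(t.coord)) for t in stream)


def join_on_ts(left: Iterable[SensorTuple],
               right: Iterable[SensorTuple]) -> Iterator[Tuple[SensorTuple, SensorTuple]]:
    """Merge-join two streams sorted by ts (unique per stream), pairing equal timestamps."""
    it_l, it_r = iter(left), iter(right)
    l, r = next(it_l, None), next(it_r, None)
    while l is not None and r is not None:
        if l.ts == r.ts:
            yield (l, r)
            l, r = next(it_l, None), next(it_r, None)
        elif l.ts < r.ts:
            l = next(it_l, None)
        else:
            r = next(it_r, None)


cam_a = [SensorTuple(1, (0.0, 0.0), "a1"), SensorTuple(2, (1.0, 0.0), "a2"),
         SensorTuple(3, (2.0, 0.0), "a3")]
cam_b = [SensorTuple(2, (5.0, 5.0), "b2"), SensorTuple(3, (6.0, 5.0), "b3")]
shifted = transform_coords(cam_a, lambda c: (c[0] + 10.0, c[1]))
recent_b = filter_op(cam_b, lambda t: t.ts >= 3)
for a, b in join_on_ts(shifted, recent_b):
    print(a.ts, a.coord, a.image, b.image)  # 3 (12.0, 0.0) a3 b3
```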

Failure Recovery in Stream Processing
- Use redundant resources to achieve dependability: run multiple copies of the same query operator
[Figure: a query of coordinate transform operators feeding an image merging operator, replicated on two sets of nodes]
- But: an Internet-scale system may not have enough spare resources
- Instead, accept degradation in processing quality
- Idea: enhance the stream data model to include quality information

Quality-Centric Stream Data Model
Enhance data tuples with:
- Weight: number of data sources in a tuple
- Recall: fraction of received tuples
[Figure: source tuples D1 to D6, each with weight 1 and recall 1, combined by operators into derived tuples D7 and D8 that carry their own weight and recall values]
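One way to make the weight and recall fields concrete is sketched below. It assumes that a tuple also records how many sources should have contributed (the `expected` field is hypothetical), so recall becomes weight divided by expected; the combination rule in `combine` is an illustrative choice, not DISSP's definition.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class QTuple:
    data: str
    weight: int    # number of data sources whose data made it into this tuple
    expected: int  # number of data sources that should have contributed (hypothetical)

    @property
    def recall(self) -> float:
        # Fraction of the expected source data that was actually received.
        return self.weight / self.expected


def source_tuple(data: str) -> QTuple:
    """A tuple emitted directly by one data source: weight 1, nothing lost."""
    return QTuple(data, weight=1, expected=1)


def combine(data: str, received: Sequence[QTuple], missing_sources: int = 0) -> QTuple:
    """Operator output: weights add up; missing inputs still count towards expected."""
    weight = sum(t.weight for t in received)
    expected = sum(t.expected for t in received) + missing_sources
    return QTuple(data, weight, expected)


d7 = combine("D7", [source_tuple("D1"), source_tuple("D2")], missing_sources=1)
d8 = combine("D8", [d7, source_tuple("D3")])
print(d7.weight, round(d7.recall, 2))  # 2 0.67
print(d8.weight, d8.recall)            # 3 0.75
```

With this rule, losing one of three inputs to D7 lowers D7's recall and that loss propagates into D8, which is what lets users see how much data made it into a result.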

What is it Good for?
- Provide feedback about result quality to users: a measure of how much data made it into the result tuple
- Allow the system to handle node and network failures:
  1. Proactive operator replication: invest resources where failure impact is highest
  2. Reactive failure recovery: decide based on lost recall whether recovery is worthwhile
- Support for smart load-shedding under resource shortage: discard tuples with the lowest impact on overloaded processing nodes
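Load-shedding by impact could look like the sketch below. It assumes, hypothetically, that a tuple's weight is a reasonable proxy for its impact, since dropping a high-weight tuple discards data from many sources at once; `shed_load` and the queue contents are illustrative.

```python
import heapq
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def shed_load(queue: Sequence[T], capacity: int, impact: Callable[[T], float]) -> List[T]:
    """Keep at most `capacity` tuples, dropping those with the lowest impact;
    survivors keep their original arrival order."""
    if capacity >= len(queue):
        return list(queue)
    keep = set(heapq.nlargest(max(capacity, 0), range(len(queue)),
                              key=lambda i: impact(queue[i])))
    return [t for i, t in enumerate(queue) if i in keep]


queue = [("D1", 1), ("D7", 3), ("D3", 1), ("D8", 2)]  # (tuple id, weight)
print(shed_load(queue, 2, impact=lambda t: t[1]))     # [('D7', 3), ('D8', 2)]
```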

DISSP Project: Dependable Internet-Scale Stream Processing
- Currently building a prototype system
- Anybody will be able to connect sensor sources and run queries
- The system provides a best-effort service given the available resources
[Figure: sensor sources (mobile sensing devices, scientific instruments, traffic monitors, RFID tags, cameras, body sensor networks, webfeeds, embedded sensors, wireless sensor networks, web content) feeding data collection, fusion, aggregation and dissemination for users and applications]
- Open questions: What's the right data model for processing sensor data? How do we discover data sources in a scalable fashion? How do we perform query optimisation at a global scale?

Research Outlook
- Programming model: What are the right abstractions for building Internet-scale systems? We need a richer Internet interface than just send(packet,dest_ip). How do we build robust cloud applications? Currently there is too much focus on low-level services
- System management: How do we provision Internet-scale systems? Scale up/down for a sudden rise in popularity, e.g. flash crowds
- Testing and evaluation: How do we test, debug and evaluate Internet-scale systems? It is hard to obtain reproducible results from PlanetLab experiments

Conclusions
- Internet-scale apps have new network requirements. One size doesn't fit all, but it's hard to change the Internet. Ukairo: overlay networks can provide custom routing
- Internet-scale systems need new overlay abstractions. Apply geometric algorithms to solve distributed systems problems. LANC CDN: metric space for node organisation in a CDN
- Internet-scale systems require new data models. It is unrealistic to expect perfect processing; instead, accept failure and overload as a fact of life. DISSP: make the impact of failure on processing explicit
Thank You! Any Questions?
Peter Pietzuch <prp@doc.ic.ac.uk>, http://platypus.doc.ic.ac.uk