Building Internet-Scale Distributed Systems for Fun and Profit

1 Building Internet-Scale Distributed Systems for Fun and Profit
Peter R. Pietzuch
Large-Scale Distributed Systems Group, Distributed Software Engineering (DSE) Section
Department of Computing, Imperial College London
Oxford University Computer Laboratory, Oxford, June 2009

2 Internet-Scale Distributed Systems
Search engines (e.g. Google, Yahoo, ...)
  Global crawling, indexing and search
  Google: over 450,000 servers in at least 30 data centres world-wide (?)
Content delivery networks (CDNs) (e.g. Akamai, Limelight, ...)
  Scalable web hosting, file distribution, media streaming, ...
  Akamai: hosting for Microsoft.com, CNN.com, BBC iPlayer, ...
Social networking sites (e.g. Facebook, Twitter, LinkedIn, ...)
  Facebook: serves 200 million users and stores 40 billion photos
Cloud computing applications (e.g. Amazon, Microsoft, Google, ...)
  Pay-as-you-use storage and computation for applications
  Amazon: bought servers worth $86 million in 2008 alone

3 Internet-Scale Distributed Systems
Peer-to-peer computing (e.g. BitTorrent, BOINC, ...)
  Users contribute resources for file sharing, scientific computing
  BitTorrent: 1/3 of all Internet traffic (?) [CacheLogic]
Large-scale test-beds (e.g. PlanetLab, Emulab, ...)
  Possible to deploy research systems in the real world
  PlanetLab: 1041 nodes at 500 sites (May 09)
  Great for student projects!

4 Properties of Internet-Scale Systems
Large number of users, requests, resources, ...
  Single/multiple data centres, hosts and/or mobile clients
  Requirement: Scalability
Wide-area Internet communication
  Cannot ignore network effects
  Requirement: Network-awareness
Long-running, 24/7 service
  Must adapt to changing conditions and failure
  Requirement: Fault-tolerance

5 Why is Building Internet-Scale Systems Hard?
Scalability is hard to achieve
  How to organise 1000s of processing hosts?
  What is the programming model?
Applications must be intelligent about network use
  How can we achieve application requirements?
Continuous network and node failures
  Lead to data loss, resource shortages, inconsistency
  PlanetLab: 630 healthy machines out of 1041 total (May 09)
  Google: 1 failure per hour in 10,000-node clusters (source: Google)

6 High-level Abstractions Help
Google uses several layers of abstraction
  Runs applications (search, mail, ...) on top of highest layer
  Each layer is scalable, network-aware and fault-tolerant
[Figure: layer stack with Google Apps, MapReduce computation, BigTable storage system, Chubby lock service, Google File System]

7 Large-Scale Distributed Systems Group
Research goal: Support the design and engineering of scalable and robust Internet-wide applications
Need to provide higher-level abstractions at different layers
Many success stories from research exist
  e.g. overlay networks, distributed hash tables, network coordinates, storage and replication mechanisms, ...
Combination of networks, distributed systems & database research
[Figure: data management layer, application layer, network layer]

8 Talk Structure
III. Data management layer: Supporting imperfect data processing
  DISSP Project: Dependable Internet-scale stream processing
II. Application layer: Building adaptive overlay networks
  LANC CDN Project: Network/load-aware content delivery
I. Network layer: Improving Internet routing
  Ukairo Project: Detour routing for applications

9 I. Improving Internet Routing
Internet-scale applications want custom communication paths
  Skype wants path with low packet loss
  iTunes wants path with high download rate
Internet uses two-level hierarchical routing scheme
  Internet hosts part of autonomous systems (ASs)
  Inter-AS routing (BGP) and intra-AS routing (OSPF)
[Figure: hosts a and b connected through AS 1 to AS 6]
Internet routing optimises for ISPs' concerns!
  One path for all applications and no control over returned path

10 Taking Detours on the Internet
Idea: Take multiple Internet paths and stitch them together
[Figure: direct path and detour path among hosts a, b and d across AS 1 to AS 6]
Resulting detour path may have better properties
What causes Internet detour paths?
  Inter-AS routing not optimal + limited expressiveness
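
As a rough illustration of stitching two paths together, the sketch below picks the best one-hop detour from a table of measured path costs. It assumes an additive cost such as latency in milliseconds; the function name `best_one_hop_detour`, the host names and the numbers are hypothetical and are not taken from the talk.

```python
def best_one_hop_detour(cost, src, dst):
    """Return (relay, total_cost); relay is None when the direct path is best.

    cost[x][y] is a measured cost of the Internet path from x to y.
    Assumption: the cost is additive (e.g. latency), so a detour via a
    relay costs cost[src][relay] + cost[relay][dst].
    """
    best_relay, best_cost = None, cost[src][dst]
    for relay in cost[src]:
        if relay in (src, dst) or dst not in cost.get(relay, {}):
            continue  # not a usable relay for this pair
        stitched = cost[src][relay] + cost[relay][dst]
        if stitched < best_cost:
            best_relay, best_cost = relay, stitched
    return best_relay, best_cost


# Hypothetical latencies in ms between hosts a, b and d.
latency = {
    "a": {"b": 100, "d": 30},
    "d": {"a": 30, "b": 40},
    "b": {"a": 100, "d": 40},
}
relay, total = best_one_hop_detour(latency, "a", "b")
```

With these made-up numbers the detour via d beats the direct path. Non-additive properties such as loss rate or bandwidth would need a different way of combining the two legs.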

11 Finding Detours in the AS Graph [IPTPS 09]
Idea: Analyse detours in the Internet AS graph
  Assume that similar AS-level paths benefit from similar detours
[Figure: AS graph (AS 1 to AS 7, hosts a to e) showing a shared AS link, a known good detour and a potential good detour]
Perform clustering on similarity metric: shared link count
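
The shared link count can be sketched as a set intersection over consecutive AS pairs. The code below is only an illustration: it treats links as directed, and it ranks known detours by similarity instead of running a full clustering step. The helper names and the example paths are hypothetical.

```python
def as_links(path):
    """Set of directed AS-level links (consecutive AS pairs) on a path."""
    return set(zip(path, path[1:]))


def shared_link_count(path_a, path_b):
    """Similarity metric: number of AS links the two paths have in common."""
    return len(as_links(path_a) & as_links(path_b))


def rank_candidate_detours(new_path, known_detours):
    """Rank detours that are known to be good for other paths.

    known_detours is a list of (as_path, detour_node) pairs. A detour is a
    candidate when its path shares at least one AS link with new_path.
    """
    scored = [(shared_link_count(new_path, p), d) for p, d in known_detours]
    return sorted((s for s in scored if s[0] > 0), reverse=True)


# Hypothetical AS-level paths.
path_ab = ["AS1", "AS2", "AS3", "AS4"]
known = [(["AS5", "AS2", "AS3", "AS4"], "d"), (["AS6", "AS7"], "e")]
candidates = rank_candidate_detours(path_ab, known)
```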

12 Ukairo Project: Detour Routing for Applications
Deploying general-purpose detour routing plane on PlanetLab
  Continuously searches for Internet detour paths
  Nodes exchange found detours using gossiping
  Applications can use it transparently, e.g. web browser download
Open research questions
  What is the overhead of finding detour paths?
  What happens if everybody uses detour routing?
  What do ISPs think about this?
  What are the lessons for future Internet designs?

13 Talk Structure
III. Data management layer: Supporting imperfect data processing
  DISSP Project: Dependable Internet-scale stream processing
II. Application layer: Building adaptive overlay networks
  LANC CDN Project: Network/load-aware delivery of content
I. Network layer: Improving Internet routing
  Ukairo Project: Detour routing for applications

14 II. Building Adaptive Overlay Networks
Imagine your start-up idea of mugbook becomes an overnight success...
How do you support such a website?
  Single web server?
  Multiple web servers in single data centre?

15 Content Delivery Networks
Content delivery networks (CDNs) serve content to many clients world-wide
Overlay network consists of:
  Distributed set of servers that maintain content replicas
  Clients (web browsers) that request content

16 Mapping Clients to Content Servers
How do we assign clients to content servers?
Load awareness
  Don't direct clients to overloaded content servers
Network awareness
  Don't send traffic on congested network paths
Many heuristics proposed in the past
  Geographic location
  Clustering of address prefixes
  Proprietary solutions

17 Cost Graph
Associate each client/server pair with cost
  Use download times from servers as cost metric
  Incorporates load and network congestion
But: measurement overhead remains high
  Can't measure all costs: need to estimate missing ones

18 Network Coordinates
Idea: Assume cost graph embeddable in metric space
  Approximate missing measurements using Euclidean distances
Assign each client/server a network coordinate C
  Distances between coordinates estimate download costs
  ||C(Client1) - C(Server1)|| = download_time
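
A minimal sketch of the estimate, assuming each node already has a two-dimensional coordinate. The node names and coordinate values are invented for illustration.

```python
import math

# Hypothetical network coordinates; the unit is whatever the cost metric uses.
coords = {
    "client1": (0.0, 0.0),
    "server1": (3.0, 4.0),
    "server2": (6.0, 8.0),
}


def estimated_cost(a, b):
    """Estimate the cost between two nodes as their Euclidean distance."""
    return math.dist(coords[a], coords[b])


cost_to_server1 = estimated_cost("client1", "server1")
cost_to_server2 = estimated_cost("client1", "server2")
```

No direct measurement between the client and either server is needed once the coordinates are known, which is the point of the embedding.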

19 Computing Network Coordinates
Scalable, decentralised computation (e.g. using Vivaldi algorithm) [Dabek 04]
  2-5 dimensions sufficient in practice
  Low measurement overhead
  Continuous process
[Figure: ~1500 web servers with network delay as cost]
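
The core of a Vivaldi-style update can be sketched as a spring relaxation step: a node moves along the line to a peer by a fraction of the difference between the measured and the estimated cost. This is a simplified version with a constant timestep `delta`; the published algorithm additionally adapts the timestep using error estimates, which is omitted here. The function name and parameter values are assumptions for illustration.

```python
import math
import random


def vivaldi_step(own, other, measured, delta=0.25):
    """Return the node's new coordinate after one measurement to a peer."""
    estimated = math.dist(own, other)
    error = measured - estimated  # positive: nodes are too close in the space
    if estimated == 0.0:
        # Coincident coordinates: pick a random direction to separate them.
        direction = [random.gauss(0.0, 1.0) for _ in own]
        norm = math.hypot(*direction)
        direction = [d / norm for d in direction]
    else:
        # Unit vector pointing from the peer towards this node.
        direction = [(a - b) / estimated for a, b in zip(own, other)]
    return [a + delta * error * d for a, d in zip(own, direction)]


# The estimate (5.0) is below the measured cost (10.0), so the node moves away.
new_coord = vivaldi_step([0.0, 0.0], [3.0, 4.0], measured=10.0, delta=0.5)
```

Each node repeats this step for every new measurement, which is why the slide describes the computation as a continuous process.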

20 LANC Content-Delivery Network [ROADS 08]
Use network coordinates to organise content servers and clients
  Clients keep track of content servers in neighbourhood
  Map clients to nearest content servers in space
  Overloaded content servers move away
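
The two rules on this slide can be sketched as follows. The slide does not say how an overloaded server moves, so `move_away` below is a hypothetical rule that shifts the server's coordinate away from the centroid of its clients by a fixed step. All names and values are illustrative.

```python
import math


def nearest_server(client, servers):
    """Map a client to the content server closest to it in coordinate space."""
    return min(servers, key=lambda name: math.dist(client, servers[name]))


def move_away(server, clients, step):
    """Hypothetical overload rule: shift away from the clients' centroid."""
    centroid = [sum(axis) / len(clients) for axis in zip(*clients)]
    gap = math.dist(server, centroid)
    if gap == 0.0:
        # No defined direction: move along the first axis.
        return [server[0] + step] + list(server[1:])
    return [s + step * (s - c) / gap for s, c in zip(server, centroid)]


servers = {"s1": [1.0, 0.0], "s2": [4.0, 0.0]}
client = [2.0, 0.0]

before = nearest_server(client, servers)
# Suppose s1 becomes overloaded and moves away from its client.
servers["s1"] = move_away(servers["s1"], [client], step=2.0)
after = nearest_server(client, servers)
```

After the overloaded server moves, the same nearest-server rule sends the client elsewhere, so load awareness needs no separate mechanism.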

21 Does it really work? (Yes!)
Deployed LANC CDN on PlanetLab
  119 content servers and 16 clients
  Downloaded Linux distribution from 100 web servers world-wide
  Tried several different assignment strategies
[Figure: CDF of transfer data rate per request (KB/s) for the LANC CDN, Nearest, Random and Direct strategies]

22 Talk Structure
III. Data management layer: Supporting imperfect data processing
  DISSP Project: Dependable Internet-scale stream processing
II. Application layer: Building adaptive overlay networks
  LANC CDN Project: Network/load-aware delivery of content
I. Network layer: Improving Internet routing
  Ukairo Project: Detour routing for applications

23 III. Supporting Imperfect Data Processing
Global sensing infrastructures
  Runs continuous queries over sensor streams
  Failure takes out resources
[Figure: users and applications; data collection, fusion, aggregation & dissemination; sources include mobile sensing devices, traffic monitors, scientific instruments, RFID tags, cameras, body sensor networks, webfeeds, embedded sensors, wireless sensor networks and web content]

24 Stream Data Model
Data sources emit streams of data tuples
  Tuples contain schema with fields
  [Figure: stream of tuples, each with fields ts, coord, image]
Users submit declarative queries
  Range of operators (filter, join, transform, ...) process data tuples
  [Figure: image merging operator and two coordinate transform operators]
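
A minimal sketch of this model, using the `ts`, `coord` and `image` fields from the slide. The operator functions, the tuple class name and the sample values are assumptions for illustration; a real system would compile a declarative query into such an operator chain.

```python
from typing import Callable, Iterable, Iterator, NamedTuple, Tuple


class SensorTuple(NamedTuple):
    ts: float
    coord: Tuple[float, float]
    image: str


def filter_op(stream: Iterable[SensorTuple],
              keep: Callable[[SensorTuple], bool]) -> Iterator[SensorTuple]:
    """Pass through only the tuples that satisfy the predicate."""
    return (t for t in stream if keep(t))


def transform_op(stream: Iterable[SensorTuple],
                 fn: Callable[[SensorTuple], SensorTuple]) -> Iterator[SensorTuple]:
    """Apply a function to every tuple, e.g. a coordinate transform."""
    return (fn(t) for t in stream)


def shift(t: SensorTuple) -> SensorTuple:
    # Hypothetical coordinate transform: translate into a local frame.
    return t._replace(coord=(t.coord[0] - 10.0, t.coord[1] - 20.0))


source = [
    SensorTuple(1.0, (10.0, 20.0), "img-a"),
    SensorTuple(2.0, (11.0, 21.0), "img-b"),
    SensorTuple(3.0, (12.0, 22.0), "img-c"),
]
query = transform_op(filter_op(source, lambda t: t.ts >= 2.0), shift)
results = list(query)
```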

25 Failure Recovery in Stream Processing
Use redundant resources to achieve dependability
  Run multiple copies of same query operator
[Figure: replicated image merging and coordinate transform operators]
But: Internet-scale system may not have enough spare resources
  Instead accept degradation in processing quality
Idea: Enhance stream data model to include quality information

26 Quality-Centric Stream Data Model
Enhance data tuples with:
  Weight: Number of data sources in tuples
  Recall: Fraction of received tuples
[Figure: example tuples D1 to D8, each annotated with data, weight and recall]
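
One way to read these two definitions is sketched below. The slide does not specify how operators propagate the values, so the merge rule here is an assumption: the output weight is the sum of the input weights, and recall is that weight divided by the number of sources the operator expected. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class QTuple:
    data: Any
    weight: int = 1      # number of data sources contributing to this tuple
    recall: float = 1.0  # fraction of expected tuples that were received


def merge(inputs: List[QTuple], expected_sources: int) -> QTuple:
    """Merge operator that tracks quality alongside the data (assumed rule)."""
    weight = sum(t.weight for t in inputs)
    return QTuple(
        data=[t.data for t in inputs],
        weight=weight,
        recall=weight / expected_sources,
    )


# Three sources were expected, but one tuple was lost to a failure.
arrived = [QTuple("D1"), QTuple("D3")]
result = merge(arrived, expected_sources=3)
```

The result still carries usable data, and the recall below 1.0 tells the user how much of the input is missing.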

27 What is it Good for?
Provide feedback about result quality to users
  Measure of how much data made it into the result tuple
Allow system to handle node and network failures
  1. Proactive operator replication
     Invest resources where failure impact highest
  2. Reactive failure recovery
     Decide based on lost recall if recovery worthwhile
Support for smart load-shedding under resource shortage
  Discard tuples with lowest impact on overloaded processing nodes
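
Load-shedding by impact can be sketched as keeping only the highest-weight tuples when a node is over capacity. Using the weight as the measure of impact is an assumption here, and the function name, the `(data, weight)` pair layout and the sample queue are hypothetical.

```python
def shed_load(tuples, capacity):
    """Keep the `capacity` tuples with the highest weight, in arrival order.

    Each element of `tuples` is a (data, weight) pair. Ties are broken in
    favour of earlier arrivals because Python's sort is stable.
    """
    if capacity >= len(tuples):
        return list(tuples)
    ranked = sorted(range(len(tuples)),
                    key=lambda i: tuples[i][1], reverse=True)
    keep = set(ranked[:capacity])
    return [t for i, t in enumerate(tuples) if i in keep]


# An overloaded node can process only two of the four queued tuples.
queue = [("D1", 1), ("D8", 3), ("D2", 1), ("D7", 2)]
kept = shed_load(queue, capacity=2)
```

Dropping the weight-1 tuples loses the least input data per discarded tuple, whereas random shedding could discard the aggregate that already summarises three sources.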

28 DISSP Project: Dependable Internet-Scale Stream Processing
Currently building prototype system
  Anybody will be able to connect sensor sources + run queries
  System provides best-effort service given available resources
[Figure: users and applications; data collection, fusion, aggregation & dissemination; sources include mobile sensing devices, scientific instruments, traffic monitors, RFID tags, cameras, body sensor networks, webfeeds, embedded sensors, wireless sensor networks and web content]
Open questions
  What's the right data model for processing sensor data?
  How to discover data sources in a scalable fashion?
  How to perform query optimisation at a global scale?

29 Research Outlook
Programming model
  What are the right abstractions for building Internet-scale systems?
  Need richer Internet interface, not just send(packet,dest_ip)
  How do we build robust cloud applications?
  Currently too much focus on low-level services
System management
  How do we provision Internet-scale systems?
  Scale up/down for sudden rise in popularity (flash crowds)
Testing and evaluation
  How do we test, debug and evaluate Internet-scale systems?
  Hard to obtain reproducible results from PlanetLab experiments

30 Conclusions
Internet-scale apps have new network requirements
  One size doesn't fit all, but it's hard to change the Internet
  Ukairo: Overlay networks can provide custom routing
Internet-scale systems need new overlay abstractions
  Apply geometric algorithm to solve distributed systems problems
  LANC CDN: Metric space for node organisation in CDN
Internet-scale systems require new data models
  Unrealistic to expect perfect processing
  Instead accept failure and overload as a fact of life
  DISSP: Make impact of failure on processing explicit
Thank You! Any Questions?
Peter Pietzuch <prp@doc.ic.ac.uk>


More information

Load Distribution in Large Scale Network Monitoring Infrastructures

Load Distribution in Large Scale Network Monitoring Infrastructures Load Distribution in Large Scale Network Monitoring Infrastructures Josep Sanjuàs-Cuxart, Pere Barlet-Ros, Gianluca Iannaccone, and Josep Solé-Pareta Universitat Politècnica de Catalunya (UPC) {jsanjuas,pbarlet,pareta}@ac.upc.edu

More information

Based on Computer Networking, 4 th Edition by Kurose and Ross

Based on Computer Networking, 4 th Edition by Kurose and Ross Computer Networks Internet Routing Based on Computer Networking, 4 th Edition by Kurose and Ross Intra-AS Routing Also known as Interior Gateway Protocols (IGP) Most common Intra-AS routing protocols:

More information

High Throughput Computing on P2P Networks. Carlos Pérez Miguel carlos.perezm@ehu.es

High Throughput Computing on P2P Networks. Carlos Pérez Miguel carlos.perezm@ehu.es High Throughput Computing on P2P Networks Carlos Pérez Miguel carlos.perezm@ehu.es Overview High Throughput Computing Motivation All things distributed: Peer-to-peer Non structured overlays Structured

More information

Apache HBase. Crazy dances on the elephant back

Apache HBase. Crazy dances on the elephant back Apache HBase Crazy dances on the elephant back Roman Nikitchenko, 16.10.2014 YARN 2 FIRST EVER DATA OS 10.000 nodes computer Recent technology changes are focused on higher scale. Better resource usage

More information

Skype network has three types of machines, all running the same software and treated equally:

Skype network has three types of machines, all running the same software and treated equally: What is Skype? Why is Skype so successful? Everybody knows! Skype is a P2P (peer-to-peer) Voice-Over-IP (VoIP) client founded by Niklas Zennström and Janus Friis also founders of the file sharing application

More information

Essential Ingredients for Optimizing End User Experience Monitoring

Essential Ingredients for Optimizing End User Experience Monitoring Essential Ingredients for Optimizing End User Experience Monitoring An ENTERPRISE MANAGEMENT ASSOCIATES (EMA ) White Paper Prepared for Neustar IT MANAGEMENT RESEARCH, Table of Contents Executive Summary...1

More information

Internet Firewall CSIS 4222. Packet Filtering. Internet Firewall. Examples. Spring 2011 CSIS 4222. net15 1. Routers can implement packet filtering

Internet Firewall CSIS 4222. Packet Filtering. Internet Firewall. Examples. Spring 2011 CSIS 4222. net15 1. Routers can implement packet filtering Internet Firewall CSIS 4222 A combination of hardware and software that isolates an organization s internal network from the Internet at large Ch 27: Internet Routing Ch 30: Packet filtering & firewalls

More information

Should and Can a Communication System. Adapt Pervasively An Unofficial View http://san.ee.ic.ac.uk

Should and Can a Communication System. Adapt Pervasively An Unofficial View http://san.ee.ic.ac.uk Should and Can a Communication System MSOffice1 Adapt Pervasively An Unofficial View http://san.ee.ic.ac.uk Erol Gelenbe www.ee.ic.ac.uk/gelenbe Imperial College London SW7 2BT e.gelenbe@imperial.ac.uk

More information

Validating the System Behavior of Large-Scale Networked Computers

Validating the System Behavior of Large-Scale Networked Computers Validating the System Behavior of Large-Scale Networked Computers Chen-Nee Chuah Robust & Ubiquitous Networking (RUBINET) Lab http://www.ece.ucdavis.edu/rubinet Electrical & Computer Engineering University

More information

Application and practice of parallel cloud computing in ISP. Guangzhou Institute of China Telecom Zhilan Huang 2011-10

Application and practice of parallel cloud computing in ISP. Guangzhou Institute of China Telecom Zhilan Huang 2011-10 Application and practice of parallel cloud computing in ISP Guangzhou Institute of China Telecom Zhilan Huang 2011-10 Outline Mass data management problem Applications of parallel cloud computing in ISPs

More information

International Journal of Scientific & Engineering Research, Volume 6, Issue 4, April-2015 36 ISSN 2229-5518

International Journal of Scientific & Engineering Research, Volume 6, Issue 4, April-2015 36 ISSN 2229-5518 International Journal of Scientific & Engineering Research, Volume 6, Issue 4, April-2015 36 An Efficient Approach for Load Balancing in Cloud Environment Balasundaram Ananthakrishnan Abstract Cloud computing

More information

Parallel Programming Map-Reduce. Needless to Say, We Need Machine Learning for Big Data

Parallel Programming Map-Reduce. Needless to Say, We Need Machine Learning for Big Data Case Study 2: Document Retrieval Parallel Programming Map-Reduce Machine Learning/Statistics for Big Data CSE599C1/STAT592, University of Washington Carlos Guestrin January 31 st, 2013 Carlos Guestrin

More information

Cloud Computing Trends

Cloud Computing Trends UT DALLAS Erik Jonsson School of Engineering & Computer Science Cloud Computing Trends What is cloud computing? Cloud computing refers to the apps and services delivered over the internet. Software delivered

More information

Open source software framework designed for storage and processing of large scale data on clusters of commodity hardware

Open source software framework designed for storage and processing of large scale data on clusters of commodity hardware Open source software framework designed for storage and processing of large scale data on clusters of commodity hardware Created by Doug Cutting and Mike Carafella in 2005. Cutting named the program after

More information

Distributed Systems. REK s adaptation of Prof. Claypool s adaptation of Tanenbaum s Distributed Systems Chapter 1

Distributed Systems. REK s adaptation of Prof. Claypool s adaptation of Tanenbaum s Distributed Systems Chapter 1 Distributed Systems REK s adaptation of Prof. Claypool s adaptation of Tanenbaum s Distributed Systems Chapter 1 1 The Rise of Distributed Systems! Computer hardware prices are falling and power increasing.!

More information

Optimizing Data Center Networks for Cloud Computing

Optimizing Data Center Networks for Cloud Computing PRAMAK 1 Optimizing Data Center Networks for Cloud Computing Data Center networks have evolved over time as the nature of computing changed. They evolved to handle the computing models based on main-frames,

More information

Choosing a Content Delivery Method

Choosing a Content Delivery Method Choosing a Content Delivery Method Executive Summary Cache-based content distribution networks (CDNs) reach very large volumes of highly dispersed end users by duplicating centrally hosted video, audio

More information

Imperial College London

Imperial College London Imperial College London Department of Computing Challenges in Cooperation between Internet Service Providers and Peer-to-Peer Applications by Konstantinos G. Gkerpinis Submitted in partial fulfilment of

More information

The Internet: A Remarkable Story. Inside the Net: A Different Story. Networks are Hard to Manage. Software Defined Networking Concepts

The Internet: A Remarkable Story. Inside the Net: A Different Story. Networks are Hard to Manage. Software Defined Networking Concepts The Internet: A Remarkable Story Software Defined Networking Concepts Based on the materials from Jennifer Rexford (Princeton) and Nick McKeown(Stanford) Tremendous success From research experiment to

More information

CDN and Traffic-structure

CDN and Traffic-structure CDN and Traffic-structure Outline Basics CDN Traffic Analysis 2 Outline Basics CDN Building Blocks Services Evolution Traffic Analysis 3 A Centralized Web! Slow content must traverse multiple backbones

More information

How To Handle Big Data With A Data Scientist

How To Handle Big Data With A Data Scientist III Big Data Technologies Today, new technologies make it possible to realize value from Big Data. Big data technologies can replace highly customized, expensive legacy systems with a standard solution

More information

IPTV AND VOD NETWORK ARCHITECTURES. Diogo Miguel Mateus Farinha

IPTV AND VOD NETWORK ARCHITECTURES. Diogo Miguel Mateus Farinha IPTV AND VOD NETWORK ARCHITECTURES Diogo Miguel Mateus Farinha Instituto Superior Técnico Av. Rovisco Pais, 1049-001 Lisboa, Portugal E-mail: diogo.farinha@ist.utl.pt ABSTRACT IPTV and Video on Demand

More information

Software Defined Networking & Openflow

Software Defined Networking & Openflow Software Defined Networking & Openflow Autonomic Computer Systems, HS 2015 Christopher Scherb, 01.10.2015 Overview What is Software Defined Networks? Brief summary on routing and forwarding Introduction

More information

Availability of Services in the Era of Cloud Computing

Availability of Services in the Era of Cloud Computing Availability of Services in the Era of Cloud Computing Sanjay P. Ahuja 1 & Sindhu Mani 1 1 School of Computing, University of North Florida, Jacksonville, America Correspondence: Sanjay P. Ahuja, School

More information

Portable Wireless Mesh Networks: Competitive Differentiation

Portable Wireless Mesh Networks: Competitive Differentiation Portable Wireless Mesh Networks: Competitive Differentiation Rajant Corporation s kinetic mesh networking solutions combine specialized command and control software with ruggedized, high-performance hardware.

More information

Lecture 3: Scaling by Load Balancing 1. Comments on reviews i. 2. Topic 1: Scalability a. QUESTION: What are problems? i. These papers look at

Lecture 3: Scaling by Load Balancing 1. Comments on reviews i. 2. Topic 1: Scalability a. QUESTION: What are problems? i. These papers look at Lecture 3: Scaling by Load Balancing 1. Comments on reviews i. 2. Topic 1: Scalability a. QUESTION: What are problems? i. These papers look at distributing load b. QUESTION: What is the context? i. How

More information

The old Internet. Software in the Network: Outline. Traditional Design. 1) Basic Caching. The Arrival of Software (in the network)

The old Internet. Software in the Network: Outline. Traditional Design. 1) Basic Caching. The Arrival of Software (in the network) The old Software in the Network: What Happened and Where to Go Prof. Eric A. Brewer UC Berkeley Inktomi Corporation Local networks with local names and switches IP creates global namespace and links the

More information

BigData. An Overview of Several Approaches. David Mera 16/12/2013. Masaryk University Brno, Czech Republic

BigData. An Overview of Several Approaches. David Mera 16/12/2013. Masaryk University Brno, Czech Republic BigData An Overview of Several Approaches David Mera Masaryk University Brno, Czech Republic 16/12/2013 Table of Contents 1 Introduction 2 Terminology 3 Approaches focused on batch data processing MapReduce-Hadoop

More information

Distributed Systems Lecture 1 1

Distributed Systems Lecture 1 1 Distributed Systems Lecture 1 1 Distributed Systems Lecturer: Therese Berg therese.berg@it.uu.se. Recommended text book: Distributed Systems Concepts and Design, Coulouris, Dollimore and Kindberg. Addison

More information

1. Comments on reviews a. Need to avoid just summarizing web page asks you for:

1. Comments on reviews a. Need to avoid just summarizing web page asks you for: 1. Comments on reviews a. Need to avoid just summarizing web page asks you for: i. A one or two sentence summary of the paper ii. A description of the problem they were trying to solve iii. A summary of

More information

The Power of Social Data: Transforming Big Data into Decisions. Andreas Weigend

The Power of Social Data: Transforming Big Data into Decisions. Andreas Weigend Milano, 04 Dec 2013 1 The Power of Social Data: Transforming Big Data into Decisions Andreas Weigend bit.ly/weigenditalia 1. Data and Decisions Value of Data? Agenda 2. Amazon as Data Refinery Equation

More information

HPAM: Hybrid Protocol for Application Level Multicast. Yeo Chai Kiat

HPAM: Hybrid Protocol for Application Level Multicast. Yeo Chai Kiat HPAM: Hybrid Protocol for Application Level Multicast Yeo Chai Kiat Scope 1. Introduction 2. Hybrid Protocol for Application Level Multicast (HPAM) 3. Features of HPAM 4. Conclusion 1. Introduction Video

More information

SiteCelerate white paper

SiteCelerate white paper SiteCelerate white paper Arahe Solutions SITECELERATE OVERVIEW As enterprises increases their investment in Web applications, Portal and websites and as usage of these applications increase, performance

More information

Our Data & Methodology. Understanding the Digital World by Turning Data into Insights

Our Data & Methodology. Understanding the Digital World by Turning Data into Insights Our Data & Methodology Understanding the Digital World by Turning Data into Insights Understanding Today s Digital World SimilarWeb provides data and insights to help businesses make better decisions,

More information