From Internet Data Centers to Data Centers in the Cloud



This case study is a short extract from a keynote address given to the Doctoral Symposium at Middleware 2009 by Lucy Cherkasova of HP Research Labs Palo Alto. The full keynote is on the course materials page. The focus of the keynote is performance modelling.

Data Center Evolution
- Internet Data Centers
- Enterprise Data Centers
- Web 2.0 Mega Data Centers

Data Center Evolution: Internet Data Centers (IDCs, first generation, per company)
- The data center boom started during the dot-com bubble.
- Companies needed fast Internet connectivity and an established Internet presence.
- Web hosting and co-location facilities for a company's services.
- Challenges in service scalability, dealing with flash crowds, and dynamic resource provisioning.
- New paradigm: everyone on the Internet can come to your web site!
- Mostly static web content.
- Many results on improving web server performance, web caching, and request distribution.
- Web interface for configuring and managing devices (products sold by the company).
- New pioneering architectures such as Content Distribution Networks (CDNs) and overlay networks for delivering media content.

Content Delivery Network (CDN)
- High availability and responsiveness are key factors for business web sites.
- Flash crowd problem.
- The main goal of a CDN is to overcome the server overload problem for popular sites and to minimize the network impact in the content delivery path.
- CDN: a large-scale distributed network of servers. Surrogate servers (proxy caches), a.k.a. edge servers, are located closer to the edges of the Internet.
- Akamai is one of the largest CDNs (figures as reported in the 2009 keynote): 56,000 servers in 950 networks in 70 countries, delivering 20% of all Web traffic.

Retrieving a Web Page
A web page is a composite object:
- The HTML file is delivered first.
- The client browser parses it for embedded objects.
- The browser sends a set of requests for these embedded objects.
Typically, 80% or more of the bytes of a web page are images, so about 80% of the page can be served by a CDN.
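The retrieval sequence above can be sketched in a few lines of Python. This is only an illustration, not part of the keynote: the class and function names are hypothetical, and the set of tags treated as "embedded objects" (img, script, stylesheet link) is an assumption.

```python
from html.parser import HTMLParser


class EmbeddedObjectParser(HTMLParser):
    """Collects URLs of embedded objects, as a browser would after fetching the HTML."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Assumption: images, scripts and stylesheets count as embedded objects.
        if tag in ("img", "script") and attrs.get("src"):
            self.urls.append(attrs["src"])
        elif tag == "link" and attrs.get("rel") == "stylesheet" and attrs.get("href"):
            self.urls.append(attrs["href"])


def embedded_objects(html):
    """Return the embedded object URLs in document order."""
    parser = EmbeddedObjectParser()
    parser.feed(html)
    return parser.urls


def embedded_byte_share(html_bytes, object_sizes):
    """Fraction of the page's total bytes carried by embedded objects."""
    total = html_bytes + sum(object_sizes)
    return sum(object_sizes) / total
```

For a page whose HTML is 20 KB and whose embedded objects total 80 KB, `embedded_byte_share(20, [50, 30])` gives the 80% share quoted on the slide.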

CDN Design
Two main mechanisms:
- URL rewriting: URLs of embedded objects are rewritten to point at the CDN. For example,
    <img src="http://www.xyz.com/images/foo.jpg">
  becomes
    <img src="http://akamai.xyz.com/images/foo.jpg">
- DNS redirection: transparent, does not require content modification. It typically employs a two-level DNS lookup to choose the most appropriate edge server (name -> list of edge servers; selected list item -> IP address).
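Both mechanisms can be illustrated with a small Python sketch. Everything beyond the two URLs from the slide is an assumption made for illustration: the lookup tables, the edge server names, the latency-based selection rule, and the IP addresses (taken from the 192.0.2.0/24 documentation range) are hypothetical and do not describe how any real CDN chooses servers.

```python
from urllib.parse import urlsplit, urlunsplit


def rewrite_url(url, origin_host, cdn_host):
    """URL rewriting: point an embedded object's URL at the CDN host."""
    parts = urlsplit(url)
    if parts.netloc == origin_host:
        parts = parts._replace(netloc=cdn_host)
    return urlunsplit(parts)


# Hypothetical first-level table: CDN name -> list of candidate edge servers.
EDGE_CANDIDATES = {
    "akamai.xyz.com": ["edge-eu.example", "edge-us.example"],
}

# Hypothetical second-level table: edge server name -> IP address.
EDGE_ADDRESSES = {
    "edge-eu.example": "192.0.2.10",
    "edge-us.example": "192.0.2.20",
}


def resolve(name, latency_ms):
    """Two-level lookup: name -> candidate list, then chosen candidate -> IP.

    Assumption: the "most appropriate" edge server is the one with the
    lowest measured latency to the client.
    """
    candidates = EDGE_CANDIDATES[name]
    best = min(candidates, key=lambda edge: latency_ms[edge])
    return EDGE_ADDRESSES[best]
```

With DNS redirection the page content stays untouched; only the answer to the name lookup changes per client, which is why the slide calls it transparent.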

CDN Architecture
[Figure not reproduced in this transcript.]

CDN Research Problems
Efficient large-scale content distribution:
- large files, video on demand, streaming media
- low latency, real-time requirements
Systems addressing this problem:
- FastReplica for CDNs
- BitTorrent (general purpose)
- SplitStream (multicast, video streaming)

FastReplica: Distribution Step
[Figure: the origin server N0 holds file F, split into subfiles F1, F2, F3, ..., Fn-1, Fn. N0 sends subfile Fi to node Ni, for nodes N1, N2, N3, ..., Nn-1, Nn.]
Source: L. Cherkasova, J. Lee. FastReplica: Efficient Large File Distribution within Content Delivery Networks. Proc. of the 4th USENIX Symp. on Internet Technologies and Systems (USITS 2003).

FastReplica: Collection Step
[Figure: each node Ni forwards its subfile Fi to the other nodes, so that a node such as N1 collects F2, F3, ..., Fn-1, Fn and can reassemble the complete file F.]
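The two FastReplica steps can be simulated in memory with a short Python sketch. This is an illustration under simplifying assumptions, not the authors' implementation: nodes are plain dictionaries, there is no networking or failure handling, and the function names are hypothetical.

```python
def split_file(data, n):
    """Split data into n subfiles of (nearly) equal size."""
    size = -(-len(data) // n)  # ceiling division
    return [data[i * size:(i + 1) * size] for i in range(n)]


def fast_replica(data, n):
    """Replicate data to n nodes; return the file as reassembled at each node."""
    subfiles = split_file(data, n)

    # Distribution step: the origin sends subfile i to node i only.
    nodes = [{i: subfiles[i]} for i in range(n)]

    # Collection step: each node forwards its own subfile to every other node.
    for sender in range(n):
        piece = nodes[sender][sender]
        for receiver in range(n):
            if receiver != sender:
                nodes[receiver][sender] = piece

    # Every node now holds all n subfiles and can rebuild the file in order.
    return [b"".join(node[i] for i in range(n)) for node in nodes]
```

In this scheme the origin uploads each subfile once, so it sends `len(data)` bytes in total rather than `n * len(data)`; the remaining transfers are spread across the receiving nodes.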

Remaining Research Problems
Some open questions (as of 2009):
- Optimal number of edge servers and their placement. Two different approaches:
    Co-location: placing servers closer to the edge (Akamai)
    Network core: server clusters in large data centers near the main network backbones (Limelight and AT&T)
- Content placement
- Large-scale system monitoring and management, to gather evidence as a basis for design decisions

Data Center Evolution: Enterprise Data Centers
- New application design: multi-tier applications with database integration.
- Many traditional applications (e.g. HR, payroll, financial, supply chain, call desk) are rewritten using this paradigm.
- Many different and complex applications.
- Trend: Everything as a Service, Service-Oriented Architecture (SOA).
- Dynamic resource provisioning within a large cluster.
- Virtualization (datacenter middleware).
- The dream of Utility Computing: Computing-on-demand (IBM), Adaptive Enterprise (HP).

Multi-tier Applications
Enterprise applications: the multi-tier architecture is a standard building block.
[Figure: users exchange HTTP requests and replies with the front server (web server + application server), which in turn exchanges MySQL queries and replies with the database tier.]

Example: Units of Client/Server Activity
- Example transactions: Add to cart, Check out, Shipping, Payment, Confirmation
- Session: a sequence of individual transactions issued by the same client
- Concurrent sessions = concurrent clients
- Think time: the interval from a client receiving a response to the client sending the next transaction
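The session and think-time definitions translate directly into a small computation over a request log. The sketch below is illustrative only: the log format (client id, time the request was sent, time the response was received) and the function name are assumptions, not something specified in the keynote.

```python
from collections import defaultdict


def think_times(events):
    """Compute per-client think times.

    events: iterable of (client_id, request_sent, response_received) tuples,
    one per transaction. All transactions of one client form its session.
    """
    sessions = defaultdict(list)
    for client, sent, received in events:
        sessions[client].append((sent, received))

    result = {}
    for client, transactions in sessions.items():
        transactions.sort()  # order each session by request time
        # Think time: response received -> next request sent by the same client.
        result[client] = [
            next_sent - prev_received
            for (_, prev_received), (next_sent, _) in zip(transactions, transactions[1:])
        ]
    return result
```

The number of keys in the returned dictionary is the number of sessions seen in the log, which by the definition above equals the number of clients.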

Data Growth
Unprecedented data growth: the amount of data managed by today's data centers quadruples every 18 months.
- The New York Stock Exchange generates about 1 TB of new trade data each day.
- Facebook hosts ~10 billion photos (1 PB of storage).
- The Internet Archive stores around 2 PB, and it is growing at 20 TB per month.
- The Large Hadron Collider (CERN) will produce ~15 PB of data per year.

Big Data
- IDC (International Data Corporation) estimates the size of the "digital universe" at 0.18 zettabytes in 2006 and 1.8 zettabytes in 2011 (10 times growth).
- A zettabyte is 10^21 bytes, i.e., 1,000 exabytes or 1,000,000 petabytes.
- Big Data is here: machine logs, RFID readers, sensor networks, retail and enterprise transactions, rich media, and publicly available data from different sources.
- New challenges for storing, managing, and processing large-scale data in the enterprise (information and content management).
- Performance modeling of new applications.
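The unit conversions and growth figures quoted in these slides can be checked with a few lines of Python. This is a back-of-the-envelope sketch: the helper name is hypothetical, and it assumes steady compound growth, which the slides do not claim.

```python
# Decimal (SI) storage units, in bytes.
PB = 10**15  # petabyte
EB = 10**18  # exabyte
ZB = 10**21  # zettabyte


def annual_growth_factor(total_factor, years):
    """Yearly multiplier equivalent to growing by total_factor over `years` years.

    Assumption: growth compounds at a constant rate.
    """
    return total_factor ** (1 / years)


# 0.18 ZB (2006) -> 1.8 ZB (2011): tenfold growth over 5 years.
digital_universe_rate = annual_growth_factor(10, 5)

# "Quadruples every 18 months": fourfold growth over 1.5 years.
data_center_rate = annual_growth_factor(4, 1.5)
```

Under that assumption the digital universe grows by roughly 1.58x per year, while data managed by data centers grows by roughly 2.52x per year.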

Source: IDC, 2008

Data Center Evolution: Data Center in the Cloud
- Web 2.0 mega-datacenters: Google, Amazon, Yahoo.
- Amazon Elastic Compute Cloud (EC2), Amazon Web Services (AWS), and Google AppEngine.
- New class of applications related to parallel processing of large data.
- Google's Map-Reduce framework (with the open source implementation Apache Hadoop):
    Mappers do the work on data slices, Reducers process the results.
    The framework handles node failures and restarts failed work.
- One can rent one's own "Data Center in the Cloud" on a pay-per-use basis.
- Cloud Computing: Software as a Service (SaaS) + Utility Computing.
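The mapper/reducer split can be shown with a single-process word count in Python. This is a minimal sketch of the programming model only: it does not use Hadoop or any real Map-Reduce API, it runs everything sequentially, and it leaves out the failure handling mentioned above. All function names are hypothetical.

```python
from collections import defaultdict


def mapper(data_slice):
    """Map step: emit a (word, 1) pair for every word in one data slice."""
    for word in data_slice.split():
        yield word.lower(), 1


def reducer(key, values):
    """Reduce step: combine all values emitted for one key."""
    return key, sum(values)


def map_reduce(slices):
    """Run mappers over the slices, group the output by key, then reduce."""
    groups = defaultdict(list)
    for data_slice in slices:  # in a real framework each slice goes to a separate worker
        for key, value in mapper(data_slice):
            groups[key].append(value)
    return dict(reducer(key, values) for key, values in groups.items())
```

Because each mapper only sees its own slice and each reducer only sees one key's values, a framework is free to run them on different machines and to rerun a single failed piece of work without restarting the whole job.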

Existing and New Technologies
- Existing technology, new applications: Google Scholar, Facebook
- Existing technology, existing applications: html, http; web servers/web browsers; Google, e-commerce
- New technology, new applications: middleware for multi-tier apps; virtualization, Map-Reduce
- New technology, existing applications: html, http; web servers/web browsers; Google, e-commerce