Pavlo Baron. Big Data and CDN




Transcription:

Pavlo Baron Big Data and CDN

Pavlo Baron

www.pbit.org pb@pbit.org @pavlobaron

What is Big Data

Big Data describes datasets that grow so large that they become awkward to work with using on-hand database management tools (Wikipedia)

Huh?

Somewhere a mosquito coughs

and somewhere else a data center gets flooded with data

Huh???

More than 30 billion pieces of content (web links, news stories, blog posts, notes, photo albums, etc.) get shared each month on Facebook

Aha

Twitter users are tweeting, in total, an average of 55 million tweets a day, links included

OMG!

But there is much more: cameras, sensors, RFID, logs, geolocation, GIS and so on

kk

There are several perspectives on Big Data

Data storage and archiving

Data preparation

Live data provisioning

Data analysis / analytics

Real-time event and stream processing

Data visualization

Where does Big Data come from

Uncontrolled human activity on the World Wide Web, or Web 2.0 if you like

Huh?

Every human leaves a vast number of data marks on the web every day: intentionally, accidentally and unknowingly

Huh???

Intentionally: we blog, tweet, upload, flattr, link, etc.

And: the web has become an industry of its own. With us in the thick of it

Accidentally: we are humans and we make mistakes

Unknowingly: we get tricked, misled, controlled, logged, etc.

The vast number of data marks we leave on the web every day gets copied and duplicated. Data explodes.

Panic!

Wait! There's even more!

Huh?

Data flowing on streams at a very high rate from many actors

Huh??

The amount of data flying over the air has become enormous, and it's growing unpredictably

Aha

It's no longer only nuclear reactors that have hi-tech sensors and generate tons of data

Aha

And our physically huge globe

has become a tiny electronic ball. It's completely wired. Data needs just seconds to circumnavigate the world

OMG!

But there's even more!

Huh?

Laws and regulations force us to store and archive all sorts of data, and there is more and more of it

Human knowledge grows extremely fast. It's far too gigantic for a single brain

Oh no

And there's still more!

Huh?

Big Brother Big Data. We get observed, filmed, recorded, logged, geolocated, etc.

Panic!

Don't panic. Get over it. Brace yourself for the battle.

First of all, some major changes have happened

Instead of huge expensive cabinets

we can use lots of cheap commodity hardware

Physics hit the wall

and we need to think parallel

Our physically huge globe

has become a tiny electronic ball. It's completely wired

Spontaneous requirements

can be covered by the fog (aka cloud)

And what are my weapons

Cut your data in smaller pieces

Make those pieces bitesize (manageable)

Bring the data closer to those who need it

Bring the data closer to where it s physically accessed

Give up relations where you don't need them

Give up freshness where you don't need it

Find optimal and effective replication mechanisms

Consider latency an adjustment screw if you can

Consider availability an adjustment screw if you can

Be prepared to deal with unlimited amounts of data, depending on the perspective

Know your data

Know your volumes

Know your scenarios

Treat it as what it is: a science

Right tool for the job
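"Cut your data in smaller pieces" is typically implemented with a partitioning scheme such as consistent hashing, which keeps most keys in place when nodes come and go. A minimal sketch (the node names and key format are invented for illustration):

```python
import hashlib
from bisect import bisect

def _hash(key: str) -> int:
    # Stable hash so a key maps to the same node in every process
    return int(hashlib.md5(key.encode()).hexdigest()[:16], 16)

class ConsistentHashRing:
    """Map keys to nodes; adding/removing a node moves only ~1/n of the keys."""
    def __init__(self, nodes, vnodes=64):
        # Each node gets several virtual points on the ring for better balance
        self._ring = sorted((_hash(f"{n}#{i}"), n)
                            for n in nodes for i in range(vnodes))
        self._points = [p for p, _ in self._ring]

    def node_for(self, key: str) -> str:
        # The first ring point clockwise from the key's hash owns the key
        idx = bisect(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]

ring = ConsistentHashRing(["shard-a", "shard-b", "shard-c"])
owner = ring.node_for("user:4711")  # always the same shard for this key
```

The same idea underlies "bring the data closer to where it's accessed": the ring can just as well map keys to geographic replicas.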

kk

And how does this technically work

Live data provisioning

What s the problem

Your users are widely spread, maybe all over the world

And you own Big Data, which has many facets: geographic, financial etc.

And your classic silo architecture could break under the weight of such data

And why would I need that

You are just starting out and want to become one of those. Aha, ok

Or you have simply grown to a certain level

Now you need to segment your users and thus be faster and more reliable in each location,

to keep the load off your servers and thus avoid bottlenecks,

to cut your big data into smaller, more manageable chunks

What are my weapons

If your content is static in web terms, you are already well prepared

In many cases, you can make your dynamic data static (precompute content)

Huh?

Let's take a look at an online bookstore

Hey, the online bookstore is completely dynamic (except for images), it's a shop system!

Really?

Book description page: even when prices change and you offer Web 2.0 features such as ratings, you can still pre-compute the page ahead of time; you don't need to compute the content while the page is being accessed

Browse mode: this is a classic use case for static content precomputation. There is often simply no need to navigate through dynamically built paths

Book search: even this ultimately dynamic-sounding feature can be (partially) de-dynamized. Consider the index static content, not necessarily the data itself

You see: many parts of an online bookstore seem dynamic, but can actually be pre-computed and delivered as static content in web terms. It's all about the frequency of change and the big data pain
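The precomputation idea can be sketched as a change-detecting publisher: re-render a page only when its underlying data has changed, and push the result out as a static file. All names here (`render_book_page`, `publish_if_changed`, the in-memory `_published` map standing in for a CDN upload) are hypothetical:

```python
import hashlib

# Toy product catalog; in reality this lives in your database
BOOKS = {42: {"title": "Big Data and CDN", "price": "29.90", "rating": 4.5}}

def render_book_page(book_id: int) -> str:
    # Hypothetical template rendering for one book description page
    b = BOOKS[book_id]
    return (f"<h1>{b['title']}</h1>"
            f"<p>Price: {b['price']} EUR, rating: {b['rating']}</p>")

_published = {}  # book_id -> (content_hash, html); stands in for the CDN copy

def publish_if_changed(book_id: int) -> bool:
    """Return True only when a new static snapshot had to be pushed."""
    html = render_book_page(book_id)
    digest = hashlib.sha1(html.encode()).hexdigest()
    if _published.get(book_id, (None,))[0] == digest:
        return False                      # unchanged: the CDN copy is still valid
    _published[book_id] = (digest, html)  # here you would upload to the CDN
    return True
```

Run `publish_if_changed` on every catalog change (or on a schedule): the first call for a book pushes a snapshot, repeated calls are no-ops until the price or rating changes.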

Owning big data doesn't necessarily mean owning 100% dynamic data in web terms

Aha

And now distribute it with a CDN, a content delivery network

Huh?

Akamai web traffic dominance

Akamai web traffic monitoring

Akamai EdgePlatform

73,000 servers, 70 countries, 1,000 networks, 30% of the world's web traffic (OMG, is the rest Google?)

There are several CDN providers offering such (worldwide) infrastructures

And now let's get a little insane

Huh???

Yeah, something's going on behind the scenes

How does this technically work

A CDN is like a deputy. You sign a contract, and it takes over parts of your platform. From there, it delivers to your users the content you tell it to deliver, but it sits much closer to them and is much smarter than you when it comes to managing load

Huh?

A CDN has its own infrastructure, including nodes directly at the backbones, offering web caching, server load balancing, request routing and, on top of these techniques, content delivery services

Aha

What you saw earlier: based on the IP address of the machine (origin) that made the DNS A query, the CDN's DNS server decided each time to return a different IP address, e.g. one from the same geographic region

Aha
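A toy version of that DNS decision might look as follows; the prefix-to-region table and edge IPs (borrowed from the diagram later in the talk) are invented for illustration, and real CDNs use far richer data (BGP feeds, latency measurements):

```python
import ipaddress

# Hypothetical mapping from client network prefixes to regions
REGION_OF = {
    ipaddress.ip_network("10.0.0.0/8"): "eu",
    ipaddress.ip_network("192.168.0.0/16"): "us",
}
EDGE_IP = {"eu": "5.6.7.8", "us": "50.6.7.80"}  # one edge per region
DEFAULT_EDGE = "1.2.3.4"                         # fallback for unknown clients

def answer_a_query(client_ip: str) -> str:
    """Return the A record for this client: the nearest known edge IP."""
    addr = ipaddress.ip_address(client_ip)
    for net, region in REGION_OF.items():
        if addr in net:
            return EDGE_IP[region]
    return DEFAULT_EDGE
```

Because the answer depends on who asks, two users resolving the same hostname can land on entirely different machines.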

What you can now expect is that the returned IP address leads you to a load balancer: your gate to a whole sub-infrastructure of the CDN, which balances between web caches, web servers or similar

Aha

A CDN uses different algorithms to decide where to route user requests: based on current load, cost, location etc.

Aha

But in the end, your content gets delivered to the user. If it expires, the CDN refreshes it from your servers in the background
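That expiry is ordinary HTTP caching metadata that your origin attaches to its responses. A minimal sketch of choosing `Cache-Control` headers per content type (the paths and TTL values are arbitrary examples, not recommendations):

```python
import email.utils
import time

def cache_headers(path: str) -> dict:
    """Pick HTTP caching headers for a response; TTLs are illustrative only."""
    if path.endswith((".png", ".jpg", ".css", ".js")):
        ttl = 86400        # static assets: cache for a day at the edge
    elif path.startswith("/snapshots/"):
        ttl = 300          # pre-computed pages: let the CDN refresh every 5 min
    else:
        ttl = 0            # everything else: always revalidate at the origin
    return {
        "Cache-Control": f"public, max-age={ttl}" if ttl else "no-cache",
        "Date": email.utils.formatdate(time.time(), usegmt=True),
    }
```

The CDN serves the object from its caches until `max-age` runs out, then pulls a fresh copy from your servers, exactly the background refresh described above.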

Often, you still have to serve the last mile yourself: the very last database access, e.g. the final item view or similar. Here, the user hits your server

Huh?

[Diagram: a client (1.2.3.4) sends a DNS A query; cache accesses go to edge caches (10.2.3.40, 5.6.7.8) with inter-cache updates between them; the caches refresh from your servers (50.6.7.80)]

kk

How can I benefit from this having big data

When you have e.g. images as your big data, you can consider this data static and thus push down- and uploads to the CDN. So, you segment your users and keep your own servers free of load. What you might lose is consistency between segments

Or you pre-compute static content out of your dynamic big data, a sort of snapshot, and push it to the CDN. So, you keep your database servers free of load and scale only through the web servers. Complexity comes with the snapshot management

Or you can even push some functional parts of your platform, such as searches, to the CDN. You win a lot dealing with big data, but you are more dependent on the CDN provider, and your overall architecture is weaker

Or if you really want to experiment, you can even try to push whole executed database queries to the CDN, as you would with memory caches. That's really cool, but much more complex and unreliable than a cluster-distributed memory cache
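A hedged sketch of that experiment: cache whole query results under a key derived from the query text, exactly as one would with memcached. `cdn_put`/`cdn_get` are stand-ins for whatever object API a CDN provider actually offers, simulated here with a local dict:

```python
import hashlib
import time

_store = {}  # stand-in for the CDN edge cache: key -> (expires_at, value)

def cdn_put(key, value, ttl):
    _store[key] = (time.time() + ttl, value)

def cdn_get(key):
    hit = _store.get(key)
    return hit[1] if hit and hit[0] > time.time() else None

def cached_query(sql, run_query, ttl=60):
    """Serve a query result from the (simulated) CDN, falling back to the DB."""
    key = "q:" + hashlib.sha1(sql.encode()).hexdigest()
    result = cdn_get(key)
    if result is None:             # miss or expired: hit the database once
        result = run_query(sql)
        cdn_put(key, result, ttl)  # push the materialized result to the edge
    return result
```

The unreliability mentioned above is visible even in the sketch: every edge node has its own `_store`, so the same query can return results of different ages in different regions.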

If you use CDN to collect your new data, you might need some complex replication mechanisms

Anyway, with all that in mind: you can have a lot of your big data out there on the CDN

Thank you

Most images were licensed from istockphoto.com. Several images were taken from corresponding Wikipedia articles, product pages and open sources.