Infrastructure overview
Marlon Dutra, Production Engineer, Traffic
October 2013
Physical infra

Data centers

Edge locations

Prineville, OR
Organization
- Suites
- Clusters
- Services: front end, back end, etc.
Triplet racks

Thousands of them...
Clusters
- Just a big group of servers in a network topology
- No special software coordination
- We call logical clusters "tiers" (to avoid miscommunication)
Servers
- Very efficient servers, designed in house (opencompute.org)
- Vanity free: open cabinets, no paint, no fancy boxes, manuals, CDs, etc.
- 10G network card
- Few hardware variants: CPU, memory, storage, IOPS...
opencompute.org

Logical infra
Cloud management
- We don't use virtual machines
- We don't care about servers or OSes; we do care about services
- VMs are meant to share resources; we want the opposite of that
- Every 1-2% matters, a lot
Cloud management [2]
- Remote hardware control: console, restart, power on/off, etc.
- Same base OS everywhere
- Chef for host setup
- Automatic provisioning, via PXE
- We provision thousands of servers in a few hours. All plug and play.
Cloud management [3]
- We buy fully assembled triplet racks
- Connect the rack switch to cluster switches
- Connect main and backup power
- Walk away
- In 1-2 hours, we can SSH into the hosts
Service management
- Services packaged with all dependencies, so they can run anywhere
- Everything built to scale: services must run on multiple machines, data centers, etc.
- Binaries deployed with BitTorrent, so there are no bottlenecks in the distribution (see the sketch below)
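As a rough illustration of the BitTorrent-style pull (not our actual deployment tool), here is a minimal sketch using the python-libtorrent bindings; the torrent file name and save path are placeholders. The point is that every host downloading a binary also seeds it, so distribution capacity grows with fleet size instead of bottlenecking on one origin:

```python
import time
import libtorrent as lt

# "service-a.torrent" and the save path are hypothetical placeholders.
ses = lt.session()
handle = ses.add_torrent({
    "ti": lt.torrent_info("service-a.torrent"),
    "save_path": "/packages/service-a",
})

# Download from whichever peers already have pieces; once complete,
# this host keeps seeding the binary to the rest of the fleet.
while handle.status().progress < 1.0:
    time.sleep(1)
print("binary fetched, now seeding to peers")
```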
Service management [2]
- Services run with LXC (Linux containers)
- chroot for filesystem isolation
- Process namespace isolation
- Routing isolation
- Similar to FreeBSD jails
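A minimal sketch of launching one service instance this way, assuming the stock LXC tooling; the container name and binary path are made up:

```python
import subprocess

# lxc-execute starts a single command inside a fresh container,
# giving it the chroot'd filesystem, process-namespace, and network
# isolation described above. "service-a" and its path are hypothetical.
subprocess.run(
    ["lxc-execute", "-n", "service-a",
     "--", "/packages/service-a/bin/service-a"],
    check=True,
)
```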
Shared pool of servers
Utilization example:
- 250 instances of service A (not shared; multiple racks and clusters)
- 100 instances of service B (can be shared; needs 1 CPU, 4 GB memory)
- 700 instances of service C (can be shared; needs 2 CPU, 16 GB memory)
The automatic scheduler takes care of the allocation. Not everything can use a shared pool, of course (e.g. databases).
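To make the allocation problem concrete, here is a toy first-fit packer for the shareable services B and C above. It is only a sketch: the 16-core / 144 GB host shape and the pool size are made up, and the real scheduler also handles rack/cluster spread, non-shareable services, and failures:

```python
def schedule(instances, hosts):
    """instances: list of (name, cpu, mem_gb); hosts: list of [cpu, mem_gb] free capacity."""
    placement = {}
    for name, cpu, mem in instances:
        for i, free in enumerate(hosts):
            if free[0] >= cpu and free[1] >= mem:
                free[0] -= cpu          # reserve the resources on that host
                free[1] -= mem
                placement[name] = i
                break
        else:
            raise RuntimeError(f"no host fits {name}")
    return placement

# 100 instances of B (1 CPU, 4 GB) and 700 of C (2 CPU, 16 GB), as above.
demand = [(f"B{i}", 1, 4) for i in range(100)] + \
         [(f"C{i}", 2, 16) for i in range(700)]
pool = [[16, 144] for _ in range(150)]   # hypothetical host shape and count
print(len(schedule(demand, pool)))       # 800 instances placed
```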
Service management [3]
- A broken server is not a big deal: the scheduler moves the services somewhere else
- Auto remediation system for common issues
- Canary ability for services and configs
Inter-service communication
- Apache Thrift: http://thrift.apache.org/
- Tip: always avoid XML
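A minimal Thrift client sketch in Python. The transport and protocol classes are the standard Thrift library; the `service_a` module, its `ServiceA` service, and the `ping()` method are hypothetical stand-ins for code generated from a .thrift IDL file (`thrift --gen py service_a.thrift`):

```python
from thrift.transport import TSocket, TTransport
from thrift.protocol import TBinaryProtocol
from service_a import ServiceA  # hypothetical generated module

# Binary protocol over a buffered socket: compact, typed RPC
# instead of XML over the wire.
transport = TTransport.TBufferedTransport(
    TSocket.TSocket("service-a.example", 9090))
protocol = TBinaryProtocol.TBinaryProtocol(transport)
client = ServiceA.Client(protocol)

transport.open()
print(client.ping())  # hypothetical method defined in the IDL
transport.close()
```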
Storage management
- Large objects (photos, videos...): BLOB store, computing nodes with lots of disks
- Small objects (text, numbers...): databases (MySQL, HBase, Hive, etc.)
- Huge cache infra between apps and dbs (sketched below)
- All highly distributed and replicated
- Tip: never use disk arrays for big loads
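The cache tier works read-through: the databases only ever see misses. A minimal sketch of the idea, where a dict stands in for the cache cluster and `fetch_from_db` is a placeholder for a real MySQL/HBase query:

```python
cache = {}

def fetch_from_db(key):
    # Placeholder for a real database query.
    return f"value-for-{key}"

def get(key):
    if key not in cache:              # miss: go to the database once...
        cache[key] = fetch_from_db(key)
    return cache[key]                 # ...then serve every repeat from cache

get("user:42")   # miss, hits the database
get("user:42")   # hit, served entirely from cache
```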
Network management
- L3 everywhere; each rack has a /24 (IPv4) and a /64 (IPv6)
- Rack switches talk BGP-ECMP to CSWs; CSWs talk BGP-ECMP to big routers...
- All the routing is BGP based
- 10G fiber links to each server
- Most services are behind load balancers
- Tip: say goodbye to L2/VLANs
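The /24-per-rack scheme means rack subnets can be carved mechanically out of a larger block. A quick illustration with Python's standard ipaddress module; the 10.0.0.0/16 parent block is a made-up example, the /24-per-rack sizing is from the slide:

```python
import ipaddress

block = ipaddress.ip_network("10.0.0.0/16")
racks = list(block.subnets(new_prefix=24))  # 256 rack-sized /24s
print(racks[0], racks[1])                   # 10.0.0.0/24 10.0.1.0/24
```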
Traffic
Weekly cycle
[chart: egress and ingress traffic over 7 days, Monday through Sunday]
Daily cycle (global)
[chart: 24 hours of traffic, peaking between 11 AM and 3 PM Pacific time (UTC-8)]
Daily cycle (global), mapped
Daily cycle (Brazil)
[chart: 24 hours of traffic, peaking between 1 PM and 10 PM Brasilia time (UTC-3)]
Some numbers
- Peak HTTP/SPDY rps: ~12.5M
- Peak TCP conns: ~260M
- MAU global: 1.15 billion
- MAU Brazil: 73 million (March 2013)
Cluster network/LB topology
[diagram: Internet -> datacenter routers (DR) -> cluster switches (CSW) -> rack switches (RSW) -> L4LBs -> L7LBs -> web servers; BGP/ECMP announces IPv4 /32s and IPv6 /64s; L4LBs spread requests across the web tier with DSR/WRR]
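The WRR half of that DSR/WRR arrow is just weighted round robin. A toy sketch in Python; the backend names and weights are hypothetical, and real L4LBs additionally do direct server return (responses bypass the balancer on the way out):

```python
import itertools
import random

backends = {"l7lb-1": 3, "l7lb-2": 2, "l7lb-3": 1}  # weight = capacity share

def wrr_cycle(weighted):
    # Expand each backend by its weight, shuffle to avoid bursts,
    # then cycle forever.
    expanded = [b for b, w in weighted.items() for _ in range(w)]
    random.shuffle(expanded)
    return itertools.cycle(expanded)

picker = wrr_cycle(backends)
print([next(picker) for _ in range(6)])  # each backend appears per its weight
```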
Proportion
- Singles to tens (L4LBs) -> tens to hundreds (L7LBs) -> thousands (web servers)
- Each step up is x10 or more
Porto Alegre <-> Forest City, NC: 75 ms one-way
- -> SYN, <- SYN+ACK: TCP conn established at 150 ms; -> ACK
- -> ClientHello, <- ServerHello, -> ChangeCipherSpec, <- ChangeCipherSpec: SSL session established at 450 ms
- -> GET, <- HTTP 1.1 200: response received at 600 ms
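Those milestones fall straight out of the 75 ms one-way delay: TCP setup costs 1 round trip, the full SSL handshake shown here costs 2 more, and the HTTP request/response costs 1 more. A worked version of the arithmetic:

```python
one_way = 75             # ms, Porto Alegre <-> Forest City
rtt = 2 * one_way        # 150 ms per round trip

tcp_done = 1 * rtt               # SYN / SYN+ACK            -> 150 ms
ssl_done = tcp_done + 2 * rtt    # hellos + cipher specs    -> 450 ms
http_done = ssl_done + 1 * rtt   # GET / HTTP 1.1 200       -> 600 ms
print(tcp_done, ssl_done, http_done)   # 150 450 600
```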
Edge rack
- 1x L4LB
- 2x L7LB
- 20x PHP
POA -> GRU -> Forest City, NC: 15 ms to the edge, 60 ms edge to origin
- Sessions established at 90 ms (vs 450 ms)
- -> GET (forwarded edge to origin, request received), <- HTTP 1.1 200: response received at 240 ms
POA -> GRU -> Forest City, NC: before and after
- TCP connect: 150 ms -> 30 ms
- SSL session: 450 ms -> 90 ms
- HTTP response: 600 ms -> 240 ms
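Same arithmetic as before, but with the GRU edge POP terminating TCP and SSL: the handshakes ride the short 15 ms leg, and only the request itself crosses the 60 ms leg (assuming a warm, persistent edge-to-origin connection):

```python
edge_rtt = 2 * 15        # client <-> GRU edge, ms
origin_rtt = 2 * 60      # edge <-> Forest City origin, ms

sessions = 3 * edge_rtt                       # TCP + SSL ->  90 ms (vs 450)
response = sessions + edge_rtt + origin_rtt   # GET / 200 -> 240 ms (vs 600)
print(sessions, response)                     # 90 240
```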
Intl RTT, before and after
[charts: international RTT before and after edge termination]
Conclusion
Tips
- Never have single points of failure
- Don't protect only against equipment failure; human failures are the worst ones
- Make data-driven decisions: invest in analytics and instrumentation
- More data, better decisions. Don't fly blind.
Tips [2]
- There's no right or wrong here
- This is just the way we solve our problem today
- This will probably be different next year or so, maybe tomorrow
- Your problem might need a different solution
You can push the buttons too
http://www.facebook.com/careers
(c) 2009 Facebook, Inc. or its licensors. "Facebook" is a registered trademark of Facebook, Inc. All rights reserved.