datacenter networking

Similar documents
Data Center Use Cases and Trends

Building a small Data Centre

Core and Pod Data Center Design

Netflix Open Connect. 2 Terabits from 6 racks

Cisco Change Management: Best Practices White Paper

Towards Smart and Intelligent SDN Controller

Cisco s Massively Scalable Data Center

Release Notes for PicOS 2.4

MEDIAROOM. Products Hosting Infrastructure Documentation. Introduction. Hosting Facility Overview

Infrastructure overview. Marlon Dutra Production Engineer, Traffic

Introduction to Network Virtualization in IaaS Cloud. Akane Matsuo, Midokura Japan K.K. LinuxCon Japan 2013 May 31 st, 2013

SDN and Open Ethernet Switches Empower Modern Data Center Networks

CENIC s Network. The CSU s Private Cloud Fabric for Disaster Recovery and Shared Services

Intensive Hosting. Intensive Hosting Overview. Why Intensive Hosting?

Setting the Management IP Address

CHOOSING A RACKSPACE HOSTING PLATFORM

Cloud Networking: A Network Approach that Meets the Requirements of Cloud Computing CQ2 2011

Centralized Orchestration and Performance Monitoring

Network and Addressing Design for IPv6 Transition

Cloud Networking: A Novel Network Approach for Cloud Computing Models CQ1 2009

Architecting Data Center Networks in the era of Big Data and Cloud

Multi-Gigabit Intrusion Detection with OpenFlow and Commodity Clusters

Simplifying Data Center Network Architecture: Collapsing the Tiers

Implementing IPv6 at ARIN Matt Ryanczak

Juniper Networks Management Pack Documentation

Addressing Scaling Challenges in the Data Center

Introduction to Junos Space Network Director

Datacenter Rack Switch Redundancy Models Server Access Ethernet Switch Connectivity Options

Der Weg, wie die Verantwortung getragen werden kann!

Transform Your Business and Protect Your Cisco Nexus Investment While Adopting Cisco Application Centric Infrastructure

IPV6 SERVICES DEPLOYMENT

How Solace Message Routers Reduce the Cost of IT Infrastructure

Cisco Unified Computing Remote Management Services

Monitoring Remedy with BMC Solutions

America s Most Wanted a metric to detect persistently faulty machines in Hadoop

vsphere Private Cloud RAZR s Edge Virtualization and Private Cloud Administration

The Software Defined Hybrid Packet Optical Datacenter Network SDN AT LIGHT SPEED TM CALIENT Technologies

Global Headquarters: 5 Speen Street Framingham, MA USA P F

IBM Security QRadar SIEM & Fortinet FortiGate / FortiAnalyzer

Network Documentation & Netdot

Contents UNIFIED COMPUTING DATA SHEET. Virtual Data Centre Support.

The On-Demand Application Delivery Controller

Building A Cheaper Peering Router. (Actually it s more about buying a cheaper router and applying some routing tricks)

Copyright 2008 Link Technologies,Inc. A Proud Vendor Member of the

Building Nameserver Clusters with Free Software

Introduction to VMware EVO: RAIL. White Paper

Four Star Network Management. Jeff Allen WebTV Networks David Williamson Global Networking and Computing

Creating Web Farms with Linux (Linux High Availability and Scalability)

Syslog Analyzer ABOUT US. Member of the TeleManagement Forum

ScienceLogic vs. Open Source IT Monitoring

Vmware VSphere 6.0 Private Cloud Administration

Constructing your Private Cloud Strategies for IT Administrators & Decision Makers

Business white paper. Top ten reasons to automate your IT processes

Voxee Hosted VoIP Billing

Lab Configure IOS Firewall IDS

czwartek, 21 lutego 13

Network Management and Monitoring Software

Network Load Balancing

WHITE PAPER. Avoiding the Pitfalls when Transitioning into Managed Services. By Nick Cavalancia

mbits Network Operations Centrec

Description: Objective: Upon completing this course, the learner will be able to meet these overall objectives:

White Paper. Network Simplification with Juniper Networks Virtual Chassis Technology

Ethernet Wide Area Networking, Routers or Switches and Making the Right Choice

NetFlow/IPFIX Various Thoughts

Elasticsearch on Cisco Unified Computing System: Optimizing your UCS infrastructure for Elasticsearch s analytics software stack

Enterprise IT is complex. Today, IT infrastructure spans the physical, the virtual and applications, and crosses public, private and hybrid clouds.

Arista and Leviton Technology in the Data Center

Diagnostics and Troubleshooting Using Event Policies and Actions

VXLAN: Scaling Data Center Capacity. White Paper

Help! My Big Expensive Router Is Really Expensive!

Vistara Lifecycle Management

VMware vcenter Log Insight Getting Started Guide

ICND2 NetFlow. Question 1. What are the benefit of using Netflow? (Choose three) A. Network, Application & User Monitoring. B.

How Microsoft IT Developed a Private Cloud Infrastructure

Cisco s Massively Scalable Data Center. Network Fabric for Warehouse Scale Computer

Cisco Network Switches Juniper Firewall Clusters

Carrier-grade VoIP platform with Kamailio at 1&1

Traditional v/s CONVRGD

NetScaler Logging Facilities

Simplifying the Data Center Network to Reduce Complexity and Improve Performance

Insurance Company Deploys UCS and Gains Speed, Flexibility, and Savings

Bare Metal Cloud. 1.0 Terminology. 3.0 Service Options. 2.0 Service Description

BROCADE FABRIC VISION TECHNOLOGY FREQUENTLY ASKED QUESTIONS

Program Summary. Criterion 1: Importance to University Mission / Operations. Importance to Mission

Enhancing Cisco Networks with Gigamon // White Paper

EMC SCALEIO OPERATION OVERVIEW

TECHNICAL WHITEPAPER. Author: Tom Kistner, Chief Software Architect. Table of Contents

Case Study of A Telecom Infrastructure Management Company

NTT - A global IPv6 deployment case study

Cisco SFS 7000D Series InfiniBand Server Switches

NETFORT LANGUARDIAN MONITORING WAN CONNECTIONS. How to monitor WAN connections with NetFort LANGuardian Aisling Brennan

MPLS WAN Explorer. Enterprise Network Management Visibility through the MPLS VPN Cloud

Diagnosing the cause of poor application performance

Pass Through Proxy. How-to. Overview:..1 Why PTP?...1

Meraki MX Family Cloud Managed Security Appliances

The Remote Infrastructure Management Platform

Virtual Machine in Data Center Switches Huawei Virtual System

MPLS-based Layer 3 VPNs

VMware vcenter Log Insight Getting Started Guide

NETWORKING FOR DATA CENTER CONVERGENCE, VIRTUALIZATION & CLOUD. Debbie Montano, Chief Architect dmontano@juniper.net

Transcription:

datacenter networking david swafford network engineer 8-OCT-2013 NANOG 59 dswafford@fb.com

+ 7 PB each month for photos alone (as of Oct. 2012) 1.15B people (MAUs) Source: Facebook internal data, June 2013 350M+ photos uploaded per day (on average in Q4 2012)

traffic growth

clusters a unit of compute

our clusters news feed advertising web messaging photos cache database

1 st generation clusters MPLS domain backbone Layer 3 Layer 2 cluster cluster n x 1Gb / first-gen 10Gb rack 1Gb ed uplinks

early challenges usable capacity Layer 2 scaling

2 nd generation MPLS domain backbone Layer 3 cluster cluster cluster cluster Layer 2 10Gb fiber rack 10Gb copper routed uplinks

BGP to the rack control scale no IGP

lots of intra-cluster traffic! Internet backbone web photos cluster cluster cluster cluster cluster cluster cluster cluster rack rack

the primary role of our backbone connects datacenters private / transit peering

growing pains backbone devices are too powerful for intra-dc needs

looking for a better way to scale

evolving backbone web news feed cluster cluster cluster cluster cluster cluster cluster cluster rack backend datacenter network rack

it works, why change? chassis break in obscure ways efficiency

3 rd generation Folded Clos datacenter-wide fabric fabric fabric fabric fabric fabric fabric fabric fabric rack

3 rd generation spine spine spine spine spine spine spine spine spine server pod fabric fabric fabric fabric rack fabric fabric fabric fabric

3 rd generation spine spine spine spine spine spine spine spine spine server pod rack server pod rack

scaling to a full datacenter pod spine pod plane pod pod spine

managing everything

approaching networking from a software mindset configuration C Programing auditing alerting remediating

frees up engineers to make greater impact learning automating sharing

deploying a cluster step engineer computer planning the port map physical installation generating configuration applying validating enabling for live traffic

FBAR engineers create audits and remediation scripts audits trigger alarms FBAR reacts to alarms

IX peering dropped BGP session syslog alarm attempt remediation, still down? No discard Yes escalate For more information, see Making Facebook Self-Healing https://www.facebook.com/note.php?note_id=10150275248698920

understanding the black box FABRIC INTERCONNECT SWITCH ASIC CP CPU L3 Tables L2 Tables PHY CTLR PHY CTLR PHY CTLR SFP SFP SFP SFP SFP SFP SFP SFP SFP SFP SFP SFP

troubleshooting the black box when a line-card goes crazy how do you even know about it? never trust the box!

monitoring everything all links / BGP sessions FIBs TCP retransmit stats

a culture of automation filter noise in software automate the repetitive engineers focus on real problems

our rack es

the problem install and forget configuration drift inconsistent IPv6 support

IPv6 everywhere! for real! every rack in 2013 all services in early 2014 why? IPv4 won t last forever Band-Aids are not fun to troubleshoot and it s really cool! 2a03:2880:2110:df07:face:b00c:0:1

rolling out IPv6 dual-stacked backbone and cluster es rack upgrades service migration

the old way to upgrade a cluster coordinate an outage window with affected service owners drain traffic upgrade

the difficulty lots of service owners news feed advertising web messaging photos cache database

we needed a change. why drain? why a single window? why is NetEng so heavily involved?

looking at an early attempt blacklist racks by hostnames, upgrade the rest

how we solved it dedicated racks? shift the responsibility to the service owner shared racks? schedule every rack under full automation

dedicated racks shift the responsibility? real world less time on all sides! empowered service owners moved from user to customer we re friends now!

focused on a smooth user experience single button upgrades detailed reporting from the start easy job management

shared racks schedule every rack? accurate timing and notification some services need to be drained why not? software will do the work anyway

where are we now? service owners handling rack upgrades! a network that constantly upgrades itself!

how? Aggiornamento! client / server model based on Thrift runs on Linux, written in Python, backed by MySQL integrates across all internal systems: impact analysis and notification scheduling job management

automate your day job! focus on the impossible!