Microsoft's Demon: Datacenter Scale Distributed Ethernet Monitoring Appliance


Microsoft's Demon: Datacenter Scale Distributed Ethernet Monitoring Appliance. Rich Groves, Principal Architect, Microsoft GNS. Bill Benetti, Senior Service Engineer, Microsoft MSIT. 1

Before We Begin. We are network engineers. This isn't a Microsoft product. We are here to share methods and knowledge. Hopefully we can all foster evolution in the industry. 2

Microsoft is a great place to work! We need experts like you. We have larger than life problems to solve. Networking is important and well funded. Washington is beautiful. 3

The Microsoft Demon Technical Team Rich Groves Bill Benetti Dylan Greene Justin Scott Ken Hollis Tanya Ollick Eric Chou 4

About Rich Groves
- Microsoft Global Network Services: NOUS (Network of Unusual Scale)
- Microsoft IT: EOUS (Enterprise of Unusual Scale)
- Time Warner Cable
- Endace: made cards, systems, and software for sniffer guys
- AOL: sniffer guy
- MCI
[Photo caption: artist's approximation] 5

The Traditional Network
- hierarchical tree optimized for north/south traffic
- firewalls, load balancers, and WAN optimizers
- not much cross-datacenter traffic
- lots of traffic localized in the top of rack
[Diagram: hierarchical tree structure optimized for N-S traffic] 6

Analyzing the Traditional Network
- insert taps within the aggregation
- port mirror at the top of rack
- capture packets at the load balancer
- well understood, but costly at scale 7

The Cloud Datacenter
- tuned for massive cross-datacenter traffic
- appliances replaced by software equivalents 8

Can you tap this cost effectively?
- 8, 16, and 32x10G uplinks
- tapping 32x10G ports requires 64 ports to aggregate (who can afford buying current systems for that?)
- ERSPAN could be used, but it impacts production traffic
- even port mirrors are a difficult task at this scale 9

Many attempts at making this work
Capturenet
- complex to manage
- purpose-built aggregation devices were far too expensive at scale
- resulted in lots of gear gathering dust
PMA (Passive Measurement Architecture)
- failed due to boring name
- rebranded as PUMA by outside marketing consultant (Rich's eldest daughter)
PUMA
- lower cost than Capturenet
- extremely feature rich
- too costly at scale
Pretty Pink PUMA
- attempt at rebranding by Rich's youngest daughter
- rejected by the team 10

Solution 1: Off the Shelf
- used 100% purpose-built aggregation gear
- supported many higher-end features (timestamping, slicing, etc.)
- price per port is far too high
- not dense enough (doesn't even terminate one tap strip)
- high cost made tool purchases impossible; no point without tools 11

Solution 2: Cascading Port Mirrors
How
- mirror all attached monitor ports to the next layer
- pre-filter by only mirroring interfaces you wish to see
The Upside
- cost effective
- uses familiar equipment
- can be done using standard CLI commands in a config
The Downside
- control traffic is removed by some switches
- assumes you know where to find the data
- lack of granular control
- uses different pathways in the switch
- quantity of port mirror targets is limited
[Diagram: host → switch → cascaded switches → monitor ports; "I heard packets 1, 2, 3, 4" / "I'm not allowed to tell anyone about packet 2" / "I heard packets 1, 3, 4"] 12

Solution 3: Making a Big Hub
How
- turn off learning
- flood on all ports
- unique outer VLAN tag per port using QinQ
- pre-filter based on ingress port through VLAN pruning
Upside
- cost effective
Downside
- control traffic is still intercepted by the switch
- performance is non-deterministic
- some switches need SDK scripts to make this work
- data quality suffers
[Diagram: switches flooding toward monitor ports] 13

The End. Well, not really, but it felt like it. 14

Core Aggregator Functions. Let's solve 80 percent of the problem:
- terminates links
- 5-tuple pre-filters
- duplication
- forwarding without modification
- low latency
- zero loss
(doable in merchant silicon switch chips)
- time stamps
- frame slicing
(costly due to lack of demand outside of the aggregator space) 15

Reversing the Aggregator: The Basic Logical Components
- terminate links of all types, and a lot of them
- low latency and lossless
- N:1, 1:N duplication
- some level of filtering
- control plane for driving the device 16

What do these platforms have in common? Can you spot the commercial aggregator? 17

Introducing Merchant Silicon Switches. Advantages of merchant silicon chips:
- more ports per chip (64x10G currently)
- much lower latency (due to fewer chip crossings)
- consume less power
- more reliable than traditional ASIC-based multi-chip designs 18

Merchant Silicon Evolution
Year                     | 2007  | 2011 | 2013 | 2015
10G ports on single chip | 24    | 64   | 128  | 256
Silicon technology       | 130nm | 65nm | 40nm | 28nm
Interface speed evolution: 40G, 100G, 400G(?), 1Tbps.
This is a single chip. Amazingly dense switches are created using multiple chips. 19

Reversing the Aggregator: The Basic Logical Components
- terminate links of all types
- low latency and lossless
- N:1, 1:N duplication
- some level of filtering
- control plane for driving the device 20

Port-to-Port Characteristics of Merchant Silicon
[Chart: latency, port to port (within the chip)]
Loss within the aggregator isn't acceptable. Such deterministic behavior makes a single-chip system ideal as an aggregator. 21

Reversing the Aggregator: The Basic Logical Components
- terminate links of all types
- low latency and lossless
- N:1, 1:N duplication
- some level of filtering
- control plane for driving the device 22

Duplication and Filtering
Duplication
- line-rate duplication in hardware to all ports
- facilitates 1:N, N:1, N:N duplication and aggregation
Filtering
- line-rate L2/L3/L4 filtering on all ports
- thousands of filters, depending on the chip type 23

Reversing the Aggregator: The Basic Logical Components
- terminate links of all types
- low latency and lossless
- N:1, 1:N duplication
- some level of filtering
- control plane for driving the device 24

Openflow as a Control Plane. What is Openflow?
- remote API for control
- allows an external controller to manage L2/L3 forwarding and some header manipulation
- runs as an agent on the switch
- developed at Stanford 2007-2010; now managed by the Open Networking Foundation

[Diagram: a common network device. The supervisor (control plane) drives the data plane over a proprietary control bus.]

[Diagram: an OpenFlow controller programs the flow tables of switches. Each switch runs a supervisor with an OpenFlow agent, reached over the control bus.]
Example flow tables:
Priority | Match           | Action List
300      | TCP.dst=80      | Fwd: port 5
100      | IP.dst=192.8/16 | Queue: 2
400      | *               | DROP

Priority | Match           | Action List
500      | TCP.dst=22      | TTL--, Fwd: port 3
200      | IP.dst=128.8/16 | Queue: 4
100      | *               | DROP

Proactive Flow Entry Creation
[Diagram: the controller proactively pushes "match xyz, rewrite VLAN, forward to port 15" to one switch and "match xyz, rewrite VLAN, forward to port 42" to another, steering traffic for 10.0.1.2.]

Openflow 1.0 Match Primitives (Demon Related)
Match Types: ingress port; src/dst MAC; src/dst IP; ethertype; protocol; src/dst port; TOS; VLAN ID; VLAN priority
Action Types: mod VLAN ID; drop; output; controller

Flow Table Entries == if, then, else
- if ingress port=24 and ethertype=2048 (IP) and dest IP=10.1.1.1, then dest MAC=00:11:22:33:44:55 and output=port1
- if ethertype=2054 (ARP) and src IP=10.1.1.1, then output=port2,port3,port4,port5,port6,port7,port8,port9,port10
- if ethertype=2048 (IP) and protocol=1 (ICMP), then controller
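To make the if/then/else analogy concrete, here is a minimal sketch in plain Python (no controller framework; the priorities, field names, and FlowEntry type are assumptions for illustration) of how a switch evaluates a priority-ordered flow table: entries are checked highest priority first, the first full match wins, and unspecified fields act as wildcards.

```python
# Minimal sketch of OpenFlow-style flow-table matching (illustrative only;
# the field names and priorities below are assumptions, not a real agent API).
from dataclasses import dataclass

@dataclass
class FlowEntry:
    priority: int
    match: dict    # e.g. {"ethertype": 2048, "ip_dst": "10.1.1.1"}
    actions: list  # e.g. ["output:port1"] or ["controller"]

def lookup(flow_table, pkt):
    """Return the action list of the highest-priority matching entry."""
    for entry in sorted(flow_table, key=lambda e: e.priority, reverse=True):
        # An entry matches when every specified field equals the packet's
        # value; fields an entry does not mention are wildcards.
        if all(pkt.get(k) == v for k, v in entry.match.items()):
            return entry.actions
    return ["drop"]  # table miss (Demon's filter layer drops by default)

table = [
    # if in_port=24, ethertype=IP, dst IP=10.1.1.1: rewrite MAC, output port1
    FlowEntry(300, {"in_port": 24, "ethertype": 2048, "ip_dst": "10.1.1.1"},
              ["set_dl_dst:00:11:22:33:44:55", "output:port1"]),
    # if ethertype=ARP and src IP=10.1.1.1: duplicate to ports 2-10 (1:N)
    FlowEntry(200, {"ethertype": 2054, "ip_src": "10.1.1.1"},
              [f"output:port{p}" for p in range(2, 11)]),
    # if ethertype=IP and protocol=ICMP: punt to the controller
    FlowEntry(100, {"ethertype": 2048, "ip_proto": 1}, ["controller"]),
]

print(lookup(table, {"in_port": 24, "ethertype": 2048, "ip_dst": "10.1.1.1"}))
# -> ['set_dl_dst:00:11:22:33:44:55', 'output:port1']
```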

Openflow 1.0 Limitations
- lack of QinQ support
- lack of basic IPv6 support: no deep IPv6 match support; can only redirect based on protocol number (ethertype)
- no layer 4 support beyond port number: cannot match on TCP flags or payloads 31

Demon: Multi-Tenant Distributed Ethernet Monitoring Appliance. Enabling packet capture and analysis at datacenter scale:
- 4.8 Tbps of filtering capacity: find the needle in the haystack
- more than 20X cheaper than off-the-shelf solutions
- industry-standard CLI; self-serve using a RESTful API
- save valuable router resources using the Demon packet-sampling offload
- filter and deliver to any Demonized datacenter, even to hopboxes and Azure
- based on low-cost merchant silicon, leveraging Openflow for modular scale and granular control
[Diagram: Demon appliance layers: monitor ports → filter → mux → service / delivery → tooling]

Filter Layer
- terminates inputs from 1, 10, 40G ports
- initially drops all traffic inbound (a sketch of this default-drop posture follows this list)
- approximately 1000 L3/L4 flows per switch
- filter switches have 60 filter interfaces facing monitor ports
- filter interfaces allow only inbound traffic through the use of high-priority flow entries
- 4x10G infrastructure interfaces are used as egress toward the mux
- performs longest-match filters
- high-rate sFlow sampling with no production impact
[Diagram: monitor ports → filter → mux → service / delivery] 33
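A minimal sketch of that default-drop posture, assuming invented field names and priorities rather than actual Demon flow entries:

```python
# Sketch of a filter switch's default posture (priorities and fields are
# assumptions for illustration): traffic arriving on monitor-facing ports
# is dropped unless a policy flow matches at higher priority.
MONITOR_PORTS = range(1, 61)  # 60 filter interfaces facing monitor ports

# Low-priority default-drop entry for each monitor-facing port.
flow_table = [
    {"priority": 10, "match": {"in_port": p}, "actions": ["drop"]}
    for p in MONITOR_PORTS
]

# A policy flow installed later (e.g. through the API) lands at higher
# priority and steers matching packets toward the mux over the 4x10G
# infrastructure links.
flow_table.append({
    "priority": 1000,
    "match": {"in_port": 1, "ip_proto": 6, "tcp_dst": 80},  # TCP port 80
    "actions": ["output:infra1"],
})
```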

Mux Layer
- terminates 4x10G infrastructure ports from each filter switch; used to aggregate all filter switches
- performs shortest-match filters
- provides both service node and delivery connectivity
- introduces pre-service and post-service ports
- duplicates flows downstream if needed
- directs traffic to either service node or delivery interfaces
[Diagram: monitor ports → filter → mux → service / delivery → tooling] 34

Service Nodes
- leverage higher-end features on a smaller set of ports
- connected to the mux switch through pre-service and post-service ports
- perform optional functions that Openflow and merchant silicon cannot currently provide
- possible uses: deeper filtering, time stamping, frame slicing, encapsulation removal for tunnel inspection, configurable logging, higher-resolution sampling, encryption removal, payload removal for compliance, encapsulation of output for location independence
[Diagram: monitor ports → filter → mux → service / delivery] 35

Delivery Layer
- 1:N and N:1 duplication
- data delivery to tools
- further filtering if needed
- introduces delivery interfaces, which connect tools to Demon
- can optionally fold into the mux switch depending on tool quantity and location
[Diagram: monitor ports → filter → mux → service / delivery → tooling] 36

Advanced Controller Actions
- the API receives packet and octet counts for all flows created above, used as a rough trigger for automated packet captures
- duplicate LLDP, CDP, and ARP traffic to the controller at low priority to collect topology information
- source tracer documentation packets to describe the trace
[Diagram: CLI and API drive the Demon application on the controller] 37

Location-Aware Demon Policy
- drops by default
- policy created using CLI or API: forward all traffic matching TCP dest 80 on port1 of filter1 to port 1 of delivery1
- the Demon app creates flows through the controller API (a hypothetical API sketch follows this list)
- the controller pushes a flow entry to filter1, mux, and delivery to output using available downstream links
- the high-priority flow takes precedence
- traffic gets to the Wireshark system
[Diagram: user → CLI/API → Demon application on the controller → filter1 → mux → delivery] 38
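As a rough illustration of the self-serve workflow, the sketch below posts the policy above to a hypothetical REST endpoint. The URL, path, and JSON schema are invented for illustration; the deck does not document the actual Demon API.

```python
# Hypothetical sketch of creating a Demon policy over the RESTful API.
# Endpoint, payload schema, and field names are assumptions, not the
# documented Demon API.
import json
import urllib.request

policy = {
    "name": "web-traffic-to-wireshark",
    "match": {"ip_proto": "tcp", "tcp_dst": 80},
    "ingress": {"switch": "filter1", "port": 1},     # location-aware: one port
    "delivery": {"switch": "delivery1", "port": 1},  # where the tool listens
}

req = urllib.request.Request(
    "http://demon-controller.example.com/api/policies",  # hypothetical URL
    data=json.dumps(policy).encode(),
    headers={"Content-Type": "application/json"},
    method="POST",
)
with urllib.request.urlopen(req) as resp:
    print(resp.status, resp.read().decode())
```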

Location-Independent Demon Policy
- drops by default on all ingress interfaces
- policy created using CLI or API: if TCP dst port 80 on any ingress port on any filter switch, then add location metadata and deliver to delivery1
- a high-priority flow is created on all switches
- the ingress VLAN tag is rewritten to add substrate locale info and uniqueness to duplicate packets (one possible encoding is sketched below)
- traffic gets to Wireshark
[Diagram: user → CLI/API → Demon application → all filter switches → mux → delivery] 39
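One way to picture the VLAN-based locale metadata: map each (filter switch, ingress port) pair to a unique outer VLAN ID so the capture side can recover where a duplicated packet entered. The encoding below is an assumption for illustration; the deck does not specify the actual scheme.

```python
# Sketch of VLAN-encoded location metadata (assumed encoding, not Demon's).
FILTER_SWITCHES = ["filter1", "filter2", "filter3"]
PORTS_PER_SWITCH = 60  # filter interfaces per switch (from the filter layer slide)
VLAN_BASE = 100        # stay clear of reserved/common VLAN IDs

def locale_vlan(switch: str, port: int) -> int:
    """VLAN ID encoding the ingress location of a captured packet."""
    return VLAN_BASE + FILTER_SWITCHES.index(switch) * PORTS_PER_SWITCH + (port - 1)

def locale_from_vlan(vlan_id: int) -> tuple:
    """Invert the encoding on the capture side."""
    offset = vlan_id - VLAN_BASE
    return FILTER_SWITCHES[offset // PORTS_PER_SWITCH], offset % PORTS_PER_SWITCH + 1

assert locale_from_vlan(locale_vlan("filter2", 7)) == ("filter2", 7)
```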

Inserting a Service Node
- policy created using CLI or API: forward all traffic matching TCP dest 80 on port1 of filter1 to port 1 of delivery1, and use service node timestamping
- flows are created per policy on the filter and mux to use the service node as egress
- a timestamp is added to the frame, which is sent back toward the mux
- the mux sends service-node-sourced traffic to the delivery switch
- traffic gets to Wireshark
[Diagram: user → CLI/API → Demon application → filter1 → mux → service node → delivery] 40

Advanced Use Case 1: Closed-Loop Data Collection
- sFlow samples are sourced from all interfaces and exported to a collector
- problem subnets are observed through behavioral analysis
- the sFlow collector executes a Demon policy via the API to send all traffic from these subnets to a capture device (see the sketch after this list)
- tracer packets are fired toward the capture device describing the reason and ticket number of the event
- only meaningful captures are taken
[Diagram: filter switches export sFlow to the collector; the collector drives the Demon application via the API; captures arrive via delivery] 41
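A minimal sketch of the closed loop, reusing the invented policy endpoint from the earlier sketch: when sampled traffic from a subnet crosses a threshold, the collector programs a capture policy. The threshold, the collector hook, and the schema are all assumptions for illustration.

```python
# Sketch of a closed-loop trigger: threshold sFlow sample counts per subnet,
# then program a capture policy through the hypothetical Demon API.
import json
import urllib.request
from collections import Counter

SAMPLE_THRESHOLD = 10_000       # samples per interval before reacting
samples_per_subnet = Counter()  # fed by the sFlow collector, keyed by subnet

def on_sflow_sample(subnet: str, ticket: str = "INC-0000") -> None:
    """Called by the collector for each sample attributed to a subnet."""
    samples_per_subnet[subnet] += 1
    if samples_per_subnet[subnet] == SAMPLE_THRESHOLD:
        capture_subnet(subnet, ticket)

def capture_subnet(subnet: str, ticket: str) -> None:
    """Steer all traffic from a misbehaving subnet to the capture device."""
    policy = {
        "name": "capture-" + subnet.replace("/", "_"),
        "match": {"ip_src": subnet},
        "delivery": {"switch": "delivery1", "port": 2},  # capture device
        "note": "auto-capture, ticket " + ticket,        # tracer description
    }
    req = urllib.request.Request(
        "http://demon-controller.example.com/api/policies",  # hypothetical
        data=json.dumps(policy).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    urllib.request.urlopen(req)
```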

Advanced Use Case 2: Infrastructure Cries for Help
- a script is written for the load balancer describing a fail state, DDoS signature, or other performance degradation
- the load balancer executes an HTTP sideband connection, creating a Demon policy based on the scripted condition
- tracer packets are fired at the capture server detailing the reason for this event
[Diagram: load balancer in the production network drives the Demon application via CLI/API; traffic is delivered to the capture server] 42

Summary. The use of single-chip merchant silicon switches and Openflow can be an adequate replacement for basic tap/mirror aggregation at a fraction of the cost. An open API allows for the use of different tools for different tasks. Use of an Openflow controller enables new functionality that the industry has never had in a commercial solution. 43

Thanks Q&A Thanks for attending! 44