Scalable Network Monitoring with SDN-Based Ethernet Fabrics Prashant Gandhi VP, Product Management & Strategy Big Switch Networks 1
Agenda Trends in Network Monitoring SDN's Role in Network Monitoring Monitoring Fabric based on SDN & Bare-metal Switching Customer Use Cases 2
Why Network Monitoring? [Diagram: physical and virtual workloads feed the production network, which is tapped by monitoring tools: network monitoring, SLA monitoring, security monitoring, application monitoring, data recorder, VoIP monitoring] Every organization needs to monitor: Enterprises, Service Providers, Public Sector, Cloud 3
Customer Requirements [Diagram: physical and virtual workloads, production network, and monitoring tools (network, application, security, VoIP, SLA monitoring, data recorder)] Bandwidth: 10G, 40G Scale: 100s of Ports Flexibility: Any Tool to Any Tap Multi-tenancy: Multiple IT Teams Cost Optimized: Lower CapEx and OpEx 4
Gen-1: Tap & Tool Silo [Diagram: physical and virtual workloads connected directly to 1/10GE tools: network probe / recorder, performance monitoring appliance, security appliance] Manual connections Complex, siloed operation 5
Gen-2: Limited Tap Aggregation [Diagram: physical & virtual workloads, tap aggregation, tools] Complex, limited-scope operation Higher cost 6
Gen-3: SDN-based Monitoring Fabrics [Diagram: SDN Controller; physical and virtual workloads tapped at 1G/10G/40G into an SDN-based Ethernet monitoring fabric built on bare-metal switches; Service Nodes; Tool Farm] 7
SDN's Role in Network Monitoring 8
Learnings from Hyperscale DCs Bare Metal: HW/SW disaggregation, no vendor lock-in, much lower CapEx. SDN: no complex protocols on HW, massive simplification with an SDN controller, fast speed of change, much lower OpEx. Together: a modern network architecture - agility, choice, lower TCO. 9
SDN 2.0 Architectural Evolution: accelerate production-grade SDN and bare-metal deployments. SDN 1.0 (fragmented SW stack): automation tool and SDN app over north-bound APIs, SDN controller, OpenFlow APIs, OF agent in a (thick) NetOS on traditional switch HW; too many moving parts for SW (many SW vendors), OF agent from the HW vendor with varied implementations, limited access to switch ASIC & switch HW. SDN 2.0 (converged SW stack): automation tool and SDN app over north-bound APIs, SDN controller, OpenFlow & extensions, (thin) Switch Light OS on bare-metal switch HW; SW solution from a single vendor (exactly like the hypervisor/server model), full access to switch ASIC and switch HW, logically centralized / hierarchically implemented control plane. 10
Gen-3: SDN-based Monitoring Fabrics [Diagram: SDN Controller; physical and virtual workloads tapped at 1G/10G/40G into a network monitoring fabric built on SDN and bare-metal switches; Service Nodes; Tool Farm] 11
Monitoring Fabric based on SDN and Bare-metal Switches 12
Gen-3: Monitoring Fabrics [Diagram: Controller; physical and virtual workloads tapped at 1G/10G/40G into the monitoring fabric; Service Nodes; Tool Farm] 13
Monitoring Fabric: Components [Diagram: filter ports (tap/SPAN-facing) into the monitoring fabric, delivery ports (tool-facing) out, all managed by the Controller] Controller (SW): single pane of glass, VM or appliance, built-in GUI, CLI and REST, policy management, fabric (forwarding) management, switch control & management, role-based access control, trouble-shooting and fault detection, clustering for high availability. Switches: bare-metal switch hardware, Switch Light OS, no complex protocols, auto-installation via ONIE. Ports: Filter, Service, Delivery. 14
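The controller's built-in REST API is what the later policy examples exercise. Below is a minimal sketch of reaching such an API from Python; the controller address, credentials, and endpoint path are hypothetical placeholders for illustration, not the product's documented interface.

    # Minimal sketch of driving the monitoring fabric controller over REST.
    # Controller address, credentials, and endpoint paths are hypothetical.
    import requests

    CONTROLLER = "https://controller.example.net:8443"   # hypothetical address
    AUTH = ("admin", "admin-password")                    # hypothetical credentials

    def list_switches():
        """Fetch the bare-metal switches currently managed by the controller."""
        resp = requests.get(f"{CONTROLLER}/api/v1/switches",  # hypothetical path
                            auth=AUTH, verify=False, timeout=10)
        resp.raise_for_status()
        return resp.json()

    if __name__ == "__main__":
        for switch in list_switches():
            print(switch)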
Policy Example 1 [Diagram: production network tapped at filter port F1; monitoring fabric with Service Nodes; delivery port D1 to the tool farm; Controller] Policy P1: Filter port: F1 Delivery port: D1 Match packets with source ip = 10.1.1.x/24 All packets that do NOT match the rule are DROPPED (filtering operation) 15
Policy Example 2 [Diagram: production network tapped at filter port F2; monitoring fabric with Service Nodes; delivery ports D1, D2, D3 to the tool farm; Controller] Policy P2: Filter port: F2 Delivery ports: D1, D2, D3 Match packets with source ip = 10.1.1.x/24 All packets matching the rule are replicated and sent to the designated tools (as per policy) 16
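Both example policies boil down to the same three ingredients: one or more filter ports, a match rule, and one or more delivery ports. A hedged sketch of pushing them to the controller over REST is below; the endpoint path and JSON field names are assumptions for illustration, not the controller's actual schema.

    # Hypothetical REST sketch of policies P1 and P2. Field names and the
    # /api/v1/policies endpoint are illustrative assumptions.
    import requests

    CONTROLLER = "https://controller.example.net:8443"
    AUTH = ("admin", "admin-password")

    def create_policy(name, filter_ports, delivery_ports, match):
        policy = {
            "name": name,
            "filter-ports": filter_ports,      # tap/SPAN facing ports
            "delivery-ports": delivery_ports,  # tool facing ports
            "match": match,                    # non-matching packets are dropped
        }
        resp = requests.post(f"{CONTROLLER}/api/v1/policies",
                             json=policy, auth=AUTH, verify=False, timeout=10)
        resp.raise_for_status()
        return resp.json()

    # P1: one filter port, one delivery port.
    create_policy("P1", ["F1"], ["D1"], {"src-ip": "10.1.1.0/24"})

    # P2: same match, but matching packets are replicated to three tools.
    create_policy("P2", ["F2"], ["D1", "D2", "D3"], {"src-ip": "10.1.1.0/24"})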
Service Chaining of Service Nodes [Diagram: production network, monitoring fabric, Controller, Service Nodes, tool farm] Service Nodes for advanced packet processing: time-stamping, de-duplication, packet slicing Service chaining: multiple Service Nodes can be logically chained on a per-policy basis for sophisticated flow processing 17
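In policy terms, chaining means the policy names an ordered list of Service Nodes to traverse before delivery. A sketch under the same hypothetical schema as the previous example; the "service-chain" field and the service names are assumptions.

    # Hypothetical extension of the policy schema with an ordered service chain.
    chained_policy = {
        "name": "P3",
        "filter-ports": ["F1"],
        "match": {"src-ip": "10.1.1.0/24"},
        # Traverse these Service Nodes in order before delivery:
        "service-chain": ["dedup", "timestamp", "slice-128B"],
        "delivery-ports": ["D1"],
    }
    # A POST to the controller, as in the previous sketch, would push this policy.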
Tool Scaling [Diagram: production network, monitoring fabric with Service Nodes, Controller, tool farm] Tool load-balancing: scale tool bandwidth by spreading traffic across multiple tool ports 18
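One common way to spread load while keeping both directions of a flow on the same tool is a symmetric hash over the 5-tuple. The snippet below only illustrates the idea in plain Python; it is not how the fabric hardware implements load balancing.

    # Conceptual illustration of symmetric 5-tuple load balancing across a
    # group of tool-facing delivery ports. Port names are hypothetical.
    import hashlib

    TOOL_PORTS = ["D1", "D2", "D3", "D4"]  # delivery-port group

    def pick_tool_port(src_ip, dst_ip, proto, src_port, dst_port):
        # Sort the endpoints so both directions of a flow hash identically.
        a, b = sorted([(src_ip, src_port), (dst_ip, dst_port)])
        key = f"{a}{b}{proto}".encode()
        digest = hashlib.md5(key).digest()
        return TOOL_PORTS[digest[0] % len(TOOL_PORTS)]

    # Both directions of the same flow land on the same tool port:
    assert pick_tool_port("10.1.1.5", "192.0.2.9", 6, 443, 51512) == \
           pick_tool_port("192.0.2.9", "10.1.1.5", 6, 51512, 443)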
Monitoring VM-to-VM Traffic [Diagram: vswitches with RSPAN enabled, physical network carrying production traffic plus RSPAN traffic, monitoring fabric, tools] The same monitoring fabric is leveraged for monitoring VM-to-VM traffic: each vswitch mirrors VM-to-VM traffic onto an RSPAN VLAN, which rides the physical network into the monitoring fabric and on to the tools 19
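Because the mirrored VM-to-VM traffic arrives at a filter port tagged with the RSPAN VLAN, it can be steered with an ordinary policy that matches on that VLAN. A sketch under the same hypothetical schema as the earlier examples; the VLAN ID, port names, and field names are arbitrary placeholders.

    # Hypothetical policy steering RSPAN-mirrored VM-to-VM traffic to a tool.
    rspan_policy = {
        "name": "vm-to-vm-capture",
        "filter-ports": ["F3"],        # port where the RSPAN traffic enters
        "match": {"vlan-id": 999},     # the RSPAN VLAN used by the vswitches
        "delivery-ports": ["D1"],      # analysis tool
    }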
Multi-tenant Operation Monitoring as a Service Self-service monitoring for each group Role-based authorization and privileges Local and/or remote authentication (TACACS+) Tenant-aware GUI, CLI and REST API 20
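Conceptually, tenant awareness means every policy, filter port, and delivery port is scoped to a group, and a user's role decides what they may change. The toy check below only illustrates that idea; the tenants, roles, and port assignments are made up and this is not the controller's implementation.

    # Toy illustration of tenant-scoped, role-based authorization.
    from dataclasses import dataclass, field

    @dataclass
    class Tenant:
        name: str
        filter_ports: set = field(default_factory=set)
        delivery_ports: set = field(default_factory=set)

    @dataclass
    class User:
        name: str
        tenant: Tenant
        role: str  # "admin" may change policies, "read-only" may not

    def may_create_policy(user, filter_ports, delivery_ports):
        """A user may only create policies over their own tenant's ports."""
        if user.role != "admin":
            return False
        return (set(filter_ports) <= user.tenant.filter_ports and
                set(delivery_ports) <= user.tenant.delivery_ports)

    netops = Tenant("netops", {"F1", "F2"}, {"D1"})
    secops = Tenant("secops", {"F1"}, {"D2", "D3"})
    alice = User("alice", netops, "admin")

    print(may_create_policy(alice, ["F1"], ["D1"]))  # True: within netops' ports
    print(may_create_policy(alice, ["F1"], ["D2"]))  # False: D2 belongs to secops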
Event-Triggered Monitoring Programmatic creation of policies based on an event using REST APIs [Diagram: normal packets and a packet of interest flow through the monitoring fabric to Snort (IDS); on an alert, the REST API of the monitoring fabric Controller is invoked to dynamically provision / activate / update a policy; the traffic of interest is now replicated to the capture tool, Wireshark, as well] 21
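A hedged sketch of the workflow on this slide: when the IDS flags a host, a small script calls the controller's REST API to provision a policy that replicates that host's traffic to the capture tool as well. The endpoint path, field names, port names, and alert handling are assumptions for illustration, not a documented API.

    # Hypothetical event-triggered policy creation: on an IDS alert, replicate
    # the suspect host's traffic to a capture tool.
    import requests

    CONTROLLER = "https://controller.example.net:8443"
    AUTH = ("admin", "admin-password")

    def capture_host(suspect_ip):
        """Provision a policy sending the suspect host's packets to the capture tool."""
        policy = {
            "name": f"capture-{suspect_ip}",
            "filter-ports": ["F1"],
            "match": {"src-ip": f"{suspect_ip}/32"},
            "delivery-ports": ["D-IDS", "D-CAPTURE"],  # keep the IDS fed, add the capture tool
        }
        resp = requests.post(f"{CONTROLLER}/api/v1/policies",
                             json=policy, auth=AUTH, verify=False, timeout=10)
        resp.raise_for_status()

    # Called by whatever consumes the IDS alerts, e.g.:
    capture_host("10.1.1.23")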
Monitoring Fabric: Functionality [Diagram: filter ports (tap/SPAN-facing), monitoring fabric, Controller, delivery ports (tool-facing)] Rich feature set: 7-tuple policies (L2-L4), IPv6 support, fine-grained role-based access control, intelligent policy resolution, VM-to-VM monitoring, programmatic control, service chaining of Service Nodes. Operational simplicity: auto-installation, fabric management & programmability, enhanced GUI workflows. Scalable architecture: tool scaling (via load balancing), fabric scaling (scale-out), policy scaling (via optimization). 22
Customer Use Cases 23
Customer Use Cases Large Web 2.0 Datacenter: Network ops, security and compliance teams all share the same taps LTE Operator: 4G LTE network monitoring for trouble-shooting and compliance Large Hi-Tech Company: Self-service production tapping for software developers 24
Customer Testimonial FYI, we just had a [...] the other day. We had a customer-facing issue that's been going on for a month. We thought it was an issue with the ISP. Being able to take a capture off the core device, we were able to prove it was an issue in our own infra. [...] to identify once we had access to the data. - Network Administrator in a Fortune 50 Company 25
Thank You! 26