LHCONE Site Connections
Michael O'Connor, moc@es.net, ESnet Network Engineering
Asia Tier Center Forum on Networking, Daejeon, South Korea, September 23, 2015
Outline
- Introduction
- ESnet LHCONE Traffic Volumes
- LHCONE BGP Reachability
- Customer Edge Routing Symmetry
- Site Connection Guidelines
- Lessons Learned From Early Adopters
- BNL VRF & Policy Routing Example
- Alternative Architectures for Destination Routing
- DTNs in the Science Enclave
- SDN Based Policy Routing Engine
- LHCONE BGP Prefix Filtering Conventions
- Cross Domain Troubleshooting
- perfSONAR Measurement in LHCONE
- New Site Connection Data
Introduction
The LHCONE VRF overlay network is intended to be built on shared network infrastructure. 100 Gbps technologies have reduced bandwidth costs, making virtual network overlays cheap, quick to deploy, and scalable; expect this paradigm to grow. The reduced cost, increased speed, and community-centric governance models of overlay networks greatly enhance the globally distributed compute model that international scientific collaborations rely on. LHCONE will continue to evolve, and connecting to it means joining in shaping that evolution. LHCONE is not like other Internet uplinks: careful planning is essential for a smooth integration.
ESnet Traffic Volumes
LHCONE represents more than 20% of the traffic accepted by ESnet.
ESnet LHCONE Reachability Graph
The graph shows ESnet core hub locations, NSP and ESnet customer peerings, regional NSPs, and LHCONE collaborating sites: https://twiki.cern.ch/twiki/pub/lhcone/lhconevrf/lhcone-esnet-paths.pdf
Customer Edge Routing Symmetry
A site becomes multi-homed when it connects to LHCONE, if it is not already. Firewalls typically drop packets for connections they hold no state for, so multi-homed enterprise networks that employ firewalls may encounter problems when a transaction egresses on one uplink and its return traffic ingresses on another. Routing asymmetry results if hosts that do not belong to an LHCONE net-block are permitted to send packets out onto LHCONE. Ideally, a site's LHCONE contact surface is a cluster of Data Transfer Nodes (DTNs). It is not appropriate to advertise a site's full CIDR allocation into LHCONE.
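To make the failure mode concrete, here is a minimal Python sketch of a toy stateful firewall that admits inbound packets only when they reverse a flow it previously saw leaving the site. The class, its methods, and the documentation-range addresses are hypothetical; real firewalls also track ports, protocol state, and timeouts.

```python
from dataclasses import dataclass, field


@dataclass
class StatefulFirewall:
    """Toy stateful firewall (hypothetical): inbound packets are admitted
    only if they reverse a flow previously seen leaving through it."""
    sessions: set = field(default_factory=set)
    dropped: int = 0

    def outbound(self, src: str, dst: str) -> None:
        # State is created only for traffic that actually egresses here.
        self.sessions.add((src, dst))

    def inbound(self, src: str, dst: str) -> bool:
        # A reply is admitted only if it matches recorded state.
        if (dst, src) in self.sessions:
            return True
        self.dropped += 1
        return False


fw = StatefulFirewall()

# Asymmetric case: a campus host's request left via the LHCONE uplink,
# which bypasses the firewall, so no state was recorded; the reply then
# arrives via the general R&E uplink and is checked by the firewall.
print(fw.inbound(src="203.0.113.5", dst="192.0.2.10"))  # False

# Symmetric case: request and reply both cross the firewall.
fw.outbound(src="192.0.2.10", dst="203.0.113.5")
print(fw.inbound(src="203.0.113.5", dst="192.0.2.10"))  # True
```

The only difference between the two cases is which uplink the outbound packet used, which is why the egress path must be controlled per source address.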
Site Connection Guidelines
Guidelines to ensure route symmetry at the customer edge:
- Determine the appropriate scientific LAN address ranges that will participate in LHCONE, ideally a set of DTNs on a /24 CIDR block.
- Working with your NSP, advertise these ranges into LHCONE.
- Ensure that only hosts in the appropriate scientific LAN address ranges can forward packets into the LHCONE network; other packets will be dropped.
- Accept all BGP route prefixes advertised by the LHCONE community, or apply the LHCONE-defined BGP filtering techniques.
- Routing preference: prefer LHCONE paths over general R&E IP paths or any other path that contains a stateful firewall.
- Filter your own egress packets, counting drops caused by routing asymmetry, before they are silently dropped by your LHCONE NSP.
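The last guideline, filtering and counting your own egress, can be sketched in Python as a source-address check applied before packets are forwarded into LHCONE. The prefix list and counter are hypothetical stand-ins for what a router would do with an interface filter and its counters.

```python
import ipaddress
from collections import Counter

# Hypothetical LHCONE net-block for a site's DTN cluster (documentation range).
LHCONE_PREFIXES = [ipaddress.ip_network("198.51.100.0/24")]

# Per-source count of packets refused at the site edge.
egress_drops: Counter = Counter()


def permit_lhcone_egress(src: str) -> bool:
    """Return True if a packet with this source may be forwarded into LHCONE.

    Anything else is counted and refused locally, so the site can see the
    asymmetry instead of having the NSP drop it silently."""
    addr = ipaddress.ip_address(src)
    if any(addr in net for net in LHCONE_PREFIXES):
        return True
    egress_drops[src] += 1
    return False
```

For example, `permit_lhcone_egress("198.51.100.20")` is allowed, while `permit_lhcone_egress("192.0.2.7")` is refused and increments `egress_drops["192.0.2.7"]`, giving the site a per-host view of which campus hosts are trying to leave via LHCONE.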
Lessons Learned from the Early Adopters
Common mistakes made by sites early in the evolution of LHCONE:
- Treating LHCONE like any other Internet connection
- Allowing non-LHCONE traffic into LHCONE
- Not preferring LHCONE on egress to LHCONE sites
Common course of (unfortunate) events:
1. Define LHCONE LAN segments and communicate them to the NSP.
2. Establish LHCONE BGP peering.
3. Experience loss of connectivity to LHC sites, CERN, Tier 1s, etc.
4. Disconnect from LHCONE for a couple of weeks.
5. Attempt to define the entire enterprise as an LHCONE LAN segment.
6. Get denied and go back to the drawing board.
7. Implement policy/source routing for LHCONE LANs.
8. Establish LHCONE BGP peering.
Brookhaven Lab Policy Routed Perimeter With 100G Diversity
Complex policy routing results when LHCONE LANs are mixed into the campus VRF. (Figure: 100G port with VLANs.) BNL checks all egress traffic against the LHCONE source-route policy: if the source is a campus LAN (not LHCONE), the traffic takes a conditional default route to ESnet general IP service. LHCONE sources egress using destination routing toward the best path.
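The egress decision described for BNL can be summarized as a source check followed by destination routing. This Python sketch assumes a hypothetical address plan and a simplified notion of "best path" (LHCONE when the destination falls in an LHCONE-learned prefix, general IP otherwise); it models the logic only, not BNL's router configuration.

```python
import ipaddress

# Hypothetical address plan, for illustration only.
LHCONE_SOURCES = [ipaddress.ip_network("198.51.100.0/24")]  # LHCONE LANs
LHCONE_ROUTES = [ipaddress.ip_network("203.0.113.0/24")]    # learned via LHCONE BGP


def _within(addr, networks) -> bool:
    return any(addr in net for net in networks)


def egress_uplink(src: str, dst: str) -> str:
    """Choose an uplink: source check first, then destination routing."""
    s, d = ipaddress.ip_address(src), ipaddress.ip_address(dst)
    if not _within(s, LHCONE_SOURCES):
        # Campus (non-LHCONE) source: conditional default to general IP.
        return "esnet-general-ip"
    # LHCONE source: destination routing; LHCONE routes win when present.
    if _within(d, LHCONE_ROUTES):
        return "lhcone"
    return "esnet-general-ip"
```

A campus host sending to an LHCONE destination still leaves via general IP, which is exactly what keeps the path symmetric.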
Policy Routed Egress
If possible, standard destination-based routing is recommended. Here, Juniper Filter Based Forwarding (FBF) is used to source-route the base routing table (inet.0), and the campus default egress routing is implemented in a VRF. Policy-based source routing is a complex form of static routing that is configured on a per-router basis.
(Figure: dual-homed LHCONE DTNs, inet.0 with FBF, LHCONE, management, campus LAN mixing LHC and non-LHC hosts, firewall, campus VRF, general Internet.)
Policy Routed General IP Egress
Juniper Filter Based Forwarding is used to source-route the base routing table (inet.0). Traffic whose source or destination does not match the LHCONE AUP is handed to the campus VRF, which forwards it through the firewall to the general Internet. Mixing the campus and LHCONE LANs required multi-stage routing for generic IP traffic.
(Figure: dual-homed LHCONE DTNs, inet.0 with FBF, LHCONE, management, campus LAN mixing LHC and non-LHC hosts, firewall, campus VRF, general Internet.)
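The multi-stage behaviour can be modelled as an ordered list of filter terms (first match wins) evaluated in inet.0, followed by a second routing stage in the campus VRF. This Python sketch uses hypothetical prefixes for the AUP and hypothetical stage names; it illustrates term ordering and the extra stage, and is not Junos syntax.

```python
import ipaddress
from typing import Callable, List, Tuple

# Hypothetical prefixes standing in for the LHCONE AUP.
LHCONE_LANS = ipaddress.ip_network("198.51.100.0/24")
LHCONE_REMOTE = ipaddress.ip_network("203.0.113.0/24")

Packet = Tuple[ipaddress.IPv4Address, ipaddress.IPv4Address]  # (src, dst)

# Stage 1: ordered filter terms in inet.0; the first matching term wins.
FBF_TERMS: List[Tuple[str, Callable[[Packet], bool], str]] = [
    ("lhcone-aup", lambda p: p[0] in LHCONE_LANS and p[1] in LHCONE_REMOTE, "LHCONE"),
    ("default", lambda p: True, "campus VRF"),
]


def route_stages(src: str, dst: str) -> List[str]:
    """Return the routing stages a packet traverses in this sketch."""
    pkt = (ipaddress.ip_address(src), ipaddress.ip_address(dst))
    stages = ["inet.0 (FBF)"]
    for _name, match, target in FBF_TERMS:
        if match(pkt):
            stages.append(target)
            break
    # Stage 2: generic IP traffic is routed again inside the campus VRF.
    if stages[-1] == "campus VRF":
        stages += ["firewall", "general Internet"]
    return stages
```

Only packets whose source and destination both match the AUP take the short path; everything else pays for the extra lookup, which is the cost the slide refers to.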
LHCONE Site Example: Destination Routing
If LHCONE can be separated into its own virtual router (VR) or VRF, then standard destination-based routing can be used.
(Figure: dual-homed LHCONE DTNs in VRF1 with LHCONE and management connectivity over IBGP; campus LANs in VRF2 behind the firewall toward the general Internet; no BGP export of LHCONE routes into VRF2.)
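With LHCONE in its own VRF, each packet is looked up only in the table of the VRF its ingress interface is bound to, so ordinary longest-prefix matching is enough. A minimal Python model, with hypothetical VRF, interface and next-hop names; it assumes VRF1 carries no default route, so a dual-homed DTN reaches non-LHCONE destinations through its other interface.

```python
import ipaddress
from typing import Dict, Optional


class Vrf:
    """A minimal per-VRF routing table with longest-prefix-match lookup."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.routes: Dict[ipaddress.IPv4Network, str] = {}

    def add_route(self, prefix: str, next_hop: str) -> None:
        self.routes[ipaddress.ip_network(prefix)] = next_hop

    def lookup(self, dst: str) -> Optional[str]:
        # Only this VRF's table is consulted; nothing leaks between VRFs.
        addr = ipaddress.ip_address(dst)
        matches = [net for net in self.routes if addr in net]
        if not matches:
            return None
        return self.routes[max(matches, key=lambda net: net.prefixlen)]


# VRF1 holds LHCONE routes; they are never exported into VRF2.
vrf1 = Vrf("VRF1-LHCONE")
vrf1.add_route("203.0.113.0/24", "lhcone-nsp")

# VRF2 (campus) has only a default toward the firewall.
vrf2 = Vrf("VRF2-CAMPUS")
vrf2.add_route("0.0.0.0/0", "firewall")

# Each ingress interface is bound to exactly one VRF.
INTERFACE_VRF = {"dtn-lhcone": vrf1, "campus-lan": vrf2}


def forward(ingress: str, dst: str) -> Optional[str]:
    return INTERFACE_VRF[ingress].lookup(dst)
```

The same LHCONE destination resolves to the NSP from the DTN interface and to the firewall from the campus LAN, without any source-based rules.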
Multiple Overlay Networks
Scaling for additional overlay networks: the additional overhead of protocol configuration pays back in scalability and powerful routing policy control.
(Figure: dual-homed LHCONE DTNs in VRF1 with LHCONE, management and an IBGP mesh; campus LANs in VRF2 behind the firewall toward the general Internet; plus a 2nd overlay LAN and 2nd overlay WAN with their own management.)
Science Enclave DTNs
(Figure, contributed by Eli Dart, ESnet Science Engagement. Components: WAN, border router, enterprise firewall, Science DMZ switch/router, 10GE DTNs including sealed DTNs (Globus only, no shell access), cluster head/login nodes, cluster compute nodes, filesystem, perfSONAR nodes, and security controls at the DTNs and filesystem. Paths shown: data transfer path, user login/shell access path, and compute data access path.)
SDN Policy Routing Appliance (Proposed)
An AUP source-routing SDN appliance switches egress packets from a mixed LAN based on the LHCONE AUP, so that LHCONE traffic bypasses the firewall through a customer edge VRF.
(Figure: mixed LAN, AUP SDN appliance, firewall VRF, customer edge VRF, LHCONE NSP.)
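The proposed appliance is essentially a prioritized match table over source and destination addresses. This Python sketch is modelled loosely on OpenFlow-style priority matching; the rule structure, port names and prefixes are hypothetical and do not represent any specific controller API.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional

Net = Optional[ipaddress.IPv4Network]  # None acts as a wildcard


@dataclass(frozen=True)
class FlowRule:
    priority: int
    src: Net
    dst: Net
    out_port: str

    def matches(self, src: ipaddress.IPv4Address, dst: ipaddress.IPv4Address) -> bool:
        return (self.src is None or src in self.src) and (
            self.dst is None or dst in self.dst
        )


def select_port(rules: List[FlowRule], src: str, dst: str) -> Optional[str]:
    """The highest-priority matching rule decides the output port."""
    s, d = ipaddress.ip_address(src), ipaddress.ip_address(dst)
    for rule in sorted(rules, key=lambda r: r.priority, reverse=True):
        if rule.matches(s, d):
            return rule.out_port
    return None


# Hypothetical AUP: LHCONE LAN to LHCONE remote prefixes bypasses the firewall.
RULES = [
    FlowRule(200, ipaddress.ip_network("198.51.100.0/24"),
             ipaddress.ip_network("203.0.113.0/24"), "customer-edge-vrf"),
    FlowRule(0, None, None, "firewall-vrf"),
]
```

Because hosts stay on one mixed LAN, the AUP is enforced by the table rather than by re-addressing hosts into a separate segment.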
BGP Filtering: End Site Configuration Steps
To take advantage of LHCONE BGP filtering, a site needs to:
1. Tag ALL BGP routes exported to LHCONE with a consistent set of supported BGP filtering community tags.
2. For every use of the Do Not Advertise to ASN community, add a corresponding Reject From ASN import policy, for symmetry.
The LHCONE BGP filtering feature enables a compute center to control only the export of its routes. Although import policy is essential to using this feature, it remains a local responsibility.
(Figure: import and export policy.)
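Both steps can be sketched in Python: one uniform tag set applied to every exported prefix, and a matching import check. Assumptions: the "do not export" community value is 65010 (from the NSP community table), ASNs are 2-byte so they fit a standard 16:16 community, and "Reject From ASN" is modelled as matching the origin AS of the received path. Prefixes and ASNs are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, List

DO_NOT_EXPORT = 65010  # "Do not export to peer ASN" community value


def export_tags(filtered_asns: Iterable[int]) -> FrozenSet[str]:
    """One 65010:ASN tag per filtered remote site (standard 16:16 communities)."""
    tags = set()
    for asn in filtered_asns:
        if not 0 < asn <= 0xFFFF:
            raise ValueError(f"ASN {asn} does not fit a standard community")
        tags.add(f"{DO_NOT_EXPORT}:{asn}")
    return frozenset(tags)


def tag_all_exports(prefixes: List[str],
                    filtered_asns: Iterable[int]) -> Dict[str, FrozenSet[str]]:
    # Step 1: every exported prefix gets the same tag set.
    tags = export_tags(filtered_asns)
    return {prefix: tags for prefix in prefixes}


def import_permitted(as_path: List[int], filtered_asns: Iterable[int]) -> bool:
    # Step 2: symmetric import policy, assumed here to match on the origin AS.
    return as_path[-1] not in set(filtered_asns)
```

Keeping the export list and the import list derived from the same set of ASNs is the simplest way to guarantee the symmetry the step calls for.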
LHCONE NSP Operational BGP Communities
Community    Meaning
65001:ASN    Prepend local ASN 1x on export to peer ASN
65002:ASN    Prepend local ASN 2x on export to peer ASN
65003:ASN    Prepend local ASN 3x on export to peer ASN
65010:ASN    Do not export to peer ASN
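How an NSP might act on these communities when exporting a route to a peer can be sketched in Python. Two interpretations here are assumptions, not taken from the table: "1x/2x/3x" is read as extra copies on top of the normal eBGP prepend of the local ASN, and if several prepend communities target the same peer the largest one wins. The ASNs are from the documentation range and purely hypothetical.

```python
from typing import List, Optional, Set

PREPEND_COMMUNITIES = {65001: 1, 65002: 2, 65003: 3}
DO_NOT_EXPORT = 65010


def export_to_peer(as_path: List[int], communities: Set[str],
                   local_asn: int, peer_asn: int) -> Optional[List[int]]:
    """AS path announced to peer_asn, or None if export is suppressed."""
    if f"{DO_NOT_EXPORT}:{peer_asn}" in communities:
        return None
    extra = max(
        (n for value, n in PREPEND_COMMUNITIES.items()
         if f"{value}:{peer_asn}" in communities),
        default=0,
    )
    # The usual eBGP prepend of local_asn, plus any requested extra copies.
    return [local_asn] * (1 + extra) + as_path
```

Note that each community carries the peer ASN it applies to, so a tag aimed at one peer has no effect on exports to any other peer.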
LHCONE BGP Filtering Guidelines (Service Definition document V1.0)
The BGP filtering service is intended to be used by an LHCONE end site to prevent the distribution of its BGP route prefixes to another LHCONE end site.
1. A separate BGP community tag will be used for each remote end site that is filtered.
2. A site will tag ALL of the route prefixes it exports into LHCONE uniformly.
3. NSP ASNs are NOT valid for use in LHCONE BGP filtering communities.
4. NSPs will only provision this service at their customer edge and will NOT provision it on internal LHCONE NSP/NSP BGP peerings.
5. NSPs only filter prefixes for their directly attached customers, on export to those customers. Otherwise they pass LHCONE BGP filtering communities along without modification.
https://twiki.cern.ch/twiki/pub/lhcone/lhconevrf/lhcoNEBGPFilteringServiceDefinition.pdf
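Guidelines 2 and 3 are mechanical enough to check before announcing routes. This Python sketch assumes the filtering communities are those in the NSP community table (65001, 65002, 65003 and 65010 with a target ASN) and that the site knows its NSPs' ASNs; the function names and example data are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, List, Set

# Assumed set of LHCONE filtering community values (see the community table).
FILTER_VALUES = ("65001:", "65002:", "65003:", "65010:")


def filtering_tags(communities: Iterable[str]) -> FrozenSet[str]:
    return frozenset(c for c in communities if c.startswith(FILTER_VALUES))


def check_site_exports(exports: Dict[str, Set[str]], nsp_asns: Set[int]) -> List[str]:
    """Check guideline 2 (uniform tagging) and guideline 3 (no NSP ASNs)."""
    problems = []
    tag_sets = {filtering_tags(c) for c in exports.values()}
    if len(tag_sets) > 1:
        problems.append("filtering tags are not uniform across exported prefixes")
    for tag in sorted(frozenset().union(*tag_sets)):
        if int(tag.split(":")[1]) in nsp_asns:
            problems.append(f"{tag} targets an NSP ASN")
    return problems
```

Unrelated communities on a prefix are ignored, so a site can still carry its own informational tags alongside the uniform filtering set.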
Troubleshooting
LHCONE does not have a centralized NOC. Compute facilities should work through their LHCONE NSP and, where necessary, regional network providers. The LHCONE operations model benefits from its close similarity to common routed R&E networking standards and operating procedures.
Key information sources:
- Ticket chain
- Detailed topology
- Measurement infrastructure
Cross Domain Ticket Chain
Cross-domain troubleshooting generates a trouble ticket in each domain, generally one per NSP in the troubleshooting path. These ticket numbers are shared in upstream and downstream communications while the issue is being worked.
Detailed Topology Data
Within each of our respective networks, we rely on topology and measurement data of various kinds to diagnose, locate and address issues using efficient internal communication and managed processes. A detailed topology contribution from each participating NSP:
- Contains VRF edge router and interface information
- Facilitates focused looking-glass queries
- Enables more specific requests that better fit general NOC processes
- Reduces the need for LHCONE-specific operating procedures
Measurement
Measurement information is essential for identifying and locating performance issues.
- perfSONAR: performance baseline (bandwidth and latency), changes in performance over time, NSP-provisioned LHCONE-only measurement nodes
- Looking glasses
- SNMP interface metrics: load, errors, discards
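SNMP interface metrics are cumulative counters, so load, errors and discards are derived from the difference between two polls. In the standard IF-MIB, ifHCInOctets is a 64-bit counter while ifInErrors and ifInDiscards are 32-bit counters, and any of them can wrap between polls. A minimal Python sketch; the function names are hypothetical, and a counter that wraps more than once between polls cannot be detected this way.

```python
def counter_delta(prev: int, curr: int, bits: int = 64) -> int:
    """Increase of an SNMP counter between two polls, tolerating one wrap."""
    return (curr - prev) % (1 << bits)


def per_second(prev: int, curr: int, interval_s: float, bits: int = 64) -> float:
    return counter_delta(prev, curr, bits) / interval_s


def utilization(prev_octets: int, curr_octets: int, interval_s: float,
                if_speed_bps: float) -> float:
    """Load as a fraction of link speed, from 64-bit octet counters."""
    return per_second(prev_octets, curr_octets, interval_s) * 8 / if_speed_bps
```

For a 32-bit error or discard counter, pass `bits=32`; for example, a poll that goes from `2**32 - 10` to `5` represents 15 new events, not a negative change.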
WLCG perfSONAR Dashboard
perfSONAR will provide essential monitoring and metrics as Asian networks join LHCONE.
Latency: 3x10GE vs 100GE
OWAMP graph of ESnet/GEANT perfSONAR monitoring points, an example of using perfSONAR to troubleshoot trans-Atlantic throughput. Pre-insertion: bi-modal latency distribution across the 3x10GE LAG. Post-insertion: converged latency across the single ANA-100GE circuit (4/16).
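The slide does not state why the LAG produced a bi-modal distribution. One plausible mechanism, offered here as an assumption, is that a LAG hashes each flow onto one member link, and member links with different path latencies then show up as separate modes, while a single circuit has only one. This Python sketch uses an illustrative CRC32 hash and hypothetical latencies; real routers use vendor-specific hash functions.

```python
import zlib
from collections import Counter
from typing import List, Sequence, Tuple

FiveTuple = Tuple[str, str, int, int, int]  # src, dst, proto, sport, dport


def lag_member(flow: FiveTuple, n_members: int) -> int:
    """Map a flow to a LAG member with a hash of its 5-tuple (illustrative)."""
    key = "|".join(map(str, flow)).encode()
    return zlib.crc32(key) % n_members


def latency_histogram(flows: List[FiveTuple],
                      member_latency_ms: Sequence[float]) -> Counter:
    """Count flows per observed latency; each distinct member latency is a mode."""
    return Counter(member_latency_ms[lag_member(f, len(member_latency_ms))]
                   for f in flows)


# Hypothetical measurement flows differing only in source port.
FLOWS = [("198.51.100.5", "203.0.113.9", 17, 50000 + i, 5001) for i in range(300)]

lag = latency_histogram(FLOWS, [70.0, 70.0, 74.0])  # 3x10GE, two path lengths
single = latency_histogram(FLOWS, [70.0])            # one 100GE circuit
```

Under this model, the number of modes equals the number of distinct member latencies, and collapsing to a single circuit necessarily produces one mode.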
Throughput: 3x10GE vs 100GE
BWCTL graph of ESnet/GEANT perfSONAR monitoring points. Pre-insertion: tri-modal throughput distribution across the 3x10GE LAG. Post-insertion: converged throughput across the single ANA-100GE circuit (4/16).
New Connector Information
- Customer contact information: names, NOC, email, phone
- Site information: name of institution, experiment, ASN
- Interconnection details: location/demarc, bandwidth, VLAN, MTU, shared/private
- List of route prefixes that will be advertised into LHCONE
- How will your site ensure symmetric routing with LHCONE?
- Required testing process
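The checklist maps naturally onto a small record that can be sanity-checked before it is sent to the NSP. This Python sketch is hypothetical throughout: the field names are illustrative, and flagging IPv4 prefixes broader than a /24 is only a heuristic drawn from the guidance that a DTN cluster ideally sits on a /24 and that a site's full allocation should not be advertised.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import List


@dataclass
class NewConnector:
    """Hypothetical record of the new-connector checklist."""
    institution: str
    experiment: str
    asn: int
    contacts: str          # names, NOC, email, phone
    demarc: str
    bandwidth_gbps: int
    vlan: int
    mtu: int
    shared_port: bool
    prefixes: List[str] = field(default_factory=list)
    symmetry_plan: str = ""

    def problems(self) -> List[str]:
        issues = []
        if not 1 <= self.vlan <= 4094:
            issues.append("VLAN ID must be in 1-4094")
        if not self.prefixes:
            issues.append("list the prefixes to be advertised into LHCONE")
        for p in self.prefixes:
            net = ipaddress.ip_network(p)  # raises ValueError if malformed
            if net.version == 4 and net.prefixlen < 24:
                issues.append(f"{p} is broader than a /24; review before advertising")
        if not self.symmetry_plan.strip():
            issues.append("describe how symmetric routing with LHCONE is ensured")
        return issues
```

An empty `problems()` list does not replace the required testing process; it only catches omissions before the conversation with the NSP starts.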
Questions?
Michael O'Connor, ESnet Network Engineer, moc@es.net, 631 344-7410