White Paper
February 2010

Implementing ATCA Serving Gateways for LTE Networks

By Karl Wale, Director of PLM, and Sridharan Natarajan, Lead Engineer

The ever-increasing demand for wireless broadband by mobile data users is constantly forcing telecom standards bodies to look for newer specifications that can deliver multi-megabit throughput and lower latency. In response, 3GPP developed Long Term Evolution (LTE), which brings with it not only higher throughput but also a flatter architecture with lower latencies and an all-IP infrastructure promising reduced operating expenses. LTE's packet-optimized radio access technology, along with an Evolved Packet Core (EPC) network suitable for always-on operation, lowers the cost per bit and coexists nicely with other existing access technologies.

One key challenge of LTE, however, is how core network components will keep up with massive increases in access link throughput (e.g., seven times HSPA data rates). How evolved core network equipment manufacturers (NEPs) rise to this challenge will be critical in determining subscriber satisfaction with their LTE service experiences. To focus on the challenge, this article discusses a best-practices approach for building an ATCA Serving Gateway node, the very backbone of the LTE network user plane.

CONTENTS
LTE Network Architecture & the Serving Gateway Application
Serving Gateway Platform Requirements
Building an ATCA Serving Gateway
Fault-Tolerant ATCA Platform Architecture
Application Failover and Trillium Protocol Integration
Load Balancing & Deep Packet Inspection
Conclusion

LTE Network Architecture & the Serving Gateway Application

An LTE network consists of two kinds of network element, the eNodeB (or base station, part of E-UTRAN) and the Access Gateway, which together support control and user plane access for LTE User Equipment (i.e., wireless devices).
Access Gateway functionality is supported by the Mobility Management Entity (MME) to manage the control plane, the Serving Gateway (SGW) for the user plane, and the PDN Gateway for access to the Internet. EPC nodes are also connected to legacy systems (GERAN and UTRAN) so that LTE systems can co-exist with existing access technologies and facilitate seamless handovers. See Figure 1. The SGW terminates user plane access for the eNodeB, routes user plane traffic, performs accounting and monitoring of user data, and acts as a local mobility anchor point for handovers.

Here is a summary of SGW functionality:

User Plane
- Packet routing & transfer functions
- Uplink & downlink charging per UE, PDN & QCI
- Accounting on user & QCI granularity for inter-operator charging
- Setting the end marker in the transmission to assist the eNodeB reordering function
- Transport-level packet marking in the uplink & downlink based on the QCI of the associated EPS bearer
- Lawful intercept

Control Plane
- User session establishment & management
- ECM-IDLE mode DL packet buffering & triggering of network-triggered service requests
- Mobility management (mobility anchor for inter-eNodeB handovers & inter-3GPP handovers)

Resource Management
- Hardware system management
- Configuration management
- Call data logging
- Performance statistics
- Fault management

Figure 1. 3G and LTE Networks

Serving Gateway Platform Requirements

An SGW platform requires many capabilities:

Optimization for Packet (Bearer Plane) Processing: Since the SGW is designed to perform user plane functions for higher-bandwidth systems, the platform on which the SGW resides should be optimized for packet processing.

Deep Packet Inspection (DPI): An SGW requires DPI capability from the platform to support lawful intercept, policy control, and QoS enforcement to manage access to services and available bit rate during times of congestion. DPI can also support functions such as targeted advertising.

Carrier-Grade Reliability: A field-proven, highly available architecture is needed to eliminate data/control PDU loss with no switch-over delay.

Computing Power: An SGW platform needs substantial compute power for the control plane signaling between MME, SGSN, and PGW.

Scalability: An SGW platform should be scalable so that capacity may be increased easily and robust enough to handle high load conditions. In addition, it should also be possible to co-locate functions in a single shelf (i.e., a single chassis may contain both MME and SGW functions).

Quality of Service (QoS): QoS can be included as part of the DPI service control and enforcement functions.

IP Security, Threat Management & Intrusion Detection/Prevention: Being part of an all-IP LTE network node, the SGW requires these security-related functionalities. Because mobile users are expected to hold carriers responsible for security breaches (much more so than users on wireline broadband connections), it is vital for wireless operators to ensure that subscribers are protected from malware reaching their handsets.

Building an ATCA Serving Gateway

There is a growing requirement from NEPs for pre-integrated, system-level products that combine standards-based ATCA hardware, system and element management software, and high availability (HA) middleware. Pre-integration saves developers significant time and engineering costs, thereby
allowing NEPs to get to market much faster than they could have even a few years ago and often in less than 12 months. Purchasing pre-integrated platforms also facilitates a common infrastructure approach which allows NEPs to extend their solutions to include multiple network elements.

In fact, in response to the current economic downturn, the trend now is for NEPs to outsource even more of what they formerly did themselves. It is common today to see requests for pre-integrated application software and higher-layer management tools including complete unified management software and Element Management Systems (EMS). In terms of applications, the requirements range from protocol stacks pre-integrated with HA middleware to complete functional elements such as the SGW application modules shown in Figure 2.

What's more, NEPs and service providers are starting to recognize that Deep Packet Inspection (DPI) may be applied to many functions within SGW nodes to help increase Average Revenue per User (ARPU) for capabilities such as tiered services, congestion management, and security. As a result, DPI is a rapidly growing area for which wireless developers are starting to request significant hardware, software, and integration support. For example, DPI modules that can detect application protocols and apply traffic shaping enable NEPs to integrate sophisticated policy control and enforcement directly into the SGW or PDN Gateway, which is often preferred over delivering such functionality with separate devices. Figure 3 shows a typical ATCA system-level solution for an SGW.

Fault-Tolerant ATCA Platform Architecture

Figure 4 shows the functional connectivity of an ATCA-based SGW, in this case one based on Radisys' pre-integrated ATCA platform. With a standards-based, bladed approach like ATCA it is relatively straightforward for NEPs to scale the solution from small to medium to large, with common platform sizes being 2-slot (2U or 3U), 6-slot (5U or 6U), and 14-slot (12U).

Figure 2. Serving Gateway Interfaces & Protocols

Figure 3. ATCA-based Serving Gateway Solution Stack

A carrier-class ATCA system consists of compute blades, packet processing blades, switches, system controllers, and shelf manager blades. Redundancy is employed with all platform components to avoid any single point of failure, and payload blades support 1+1 (active/standby), N+1 (N active with one standby spare), and N+M (N active with M standby spares)
configurations. HA middleware and applications checkpoint across blades and clusters, while power and configuration management is controlled via shelf managers and Essential Services software.

The system controller is where the HA middleware runs. It hosts a persistent storage database containing all configuration information, along with the HA middleware, unified management interface, HPI services, and Essential Services. The system controller blade maintains the application model per the Application Management Framework (AMF) and runs in active-standby mode, whereby all database information is replicated across the active and standby blades.

The EMS interface to the ATCA system can be an SNMP agent, Command Line Interface (CLI), Web/XML, or NETCONF; at a minimum, SNMP and CLI are present. The EMS/NMS contacts the SNMP agent of the system controller to manage the hardware and software components in the chassis. The middleware redirects hardware management requests coming from the EMS to the HPI implementation, which has connectivity to the shelf manager. As a result, the EMS can manage all hardware components in the chassis.

Figure 4. Serving Gateway Fault-Tolerant Architecture

Compute blades are where the SGW control plane user application(s) run. They contain the control plane protocol (eGTP-C), application and protocol MIBs, middleware, and a platform management interface. The compute blades are configured in a 1+1 redundancy model.

Packet processor blades are where the SGW user plane application(s) and DPI applications run. They contain the user plane protocol (eGTP-U), application and protocol MIBs, middleware, and an Essential Services interface. The packet processor blades are configured with N+M redundancy.

The hub switch and network uplinks are configured for 1+1 redundancy. The Radisys ATCA platform in particular is supported by what is known as Layer 2 High Availability (L2HA) for checkpointing network configuration.
L2HA provides hub switch status checkpointing and failover, bonding drivers, and dual star connectivity to avoid link loss on the base and fabric aspects of the switch.

Figure 5. ATCA HA System Architecture

The shelf manager is an important chassis-related function which manages hardware elements including fans, power supplies, and blades. It manages the blades through an Intelligent Platform Management Interface (IPMI) which runs over the IPM Bus, and all ATCA blades have an IPM controller for performing activities as directed by the shelf manager. If the shelf manager cannot recover from a fault condition, the system controller will treat this as an unrecoverable fault and fail over to the standby.
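
The payload blade redundancy models described in this section (1+1, N+1, and N+M) all follow the same promotion rule: when an active blade fails, a standby spare takes over its slot if one is available. The following is a minimal sketch of that rule, not the platform's actual middleware logic. The `RedundancyGroup` class, its method names, and the blade names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RedundancyGroup:
    """N active blades backed by M standby spares (hypothetical model).

    1+1 is the case N=1, M=1; N+1 is the case M=1.
    """
    active: List[str]
    standby: List[str]
    failed: List[str] = field(default_factory=list)

    def fail(self, blade: str) -> Optional[str]:
        """Record a blade failure and promote a spare when possible.

        Returns the name of the promoted blade, or None if no promotion
        happened (a spare failed, no spare is left, or blade is unknown).
        """
        if blade in self.standby:
            # A failed spare only reduces the protection level.
            self.standby.remove(blade)
            self.failed.append(blade)
            return None
        if blade not in self.active:
            return None
        slot = self.active.index(blade)
        self.failed.append(blade)
        if self.standby:
            promoted = self.standby.pop(0)
            self.active[slot] = promoted  # spare takes over the failed slot
            return promoted
        del self.active[slot]  # no spare left: capacity is reduced
        return None


# Example: packet processor blades in a 3+2 configuration.
group = RedundancyGroup(active=["pp1", "pp2", "pp3"], standby=["pp4", "pp5"])
promoted = group.fail("pp2")
```

In this sketch an N+M group tolerates M failures without losing capacity; any further failure shrinks the active set rather than interrupting the remaining blades.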

Application Failover and Trillium Protocol Integration

HA middleware runs on the system controller blade and monitors applications running on active blades through the middleware agent running on each blade. Failures are detected by exchanging heartbeats between nodes, and upon failure of an active node the standby takes over and becomes the active node.

Checkpointing services in HA middleware are used to share data and parameters between active and standby components. HA middleware provides services to create checkpoints, manage the lifecycle of checkpoints, and establish mechanisms for active components to write the latest state and for standby components to read the latest state.

Protocol stacks such as Radisys' Trillium protocol software product line consist of sophisticated protocol layers together with an operating system abstraction and a relay function for inter-process communication. Since most stacks do not natively comply with recent Service Availability Forum (SAF) specifications on their own, functions are written to provide support for AMF-AIS application programming interfaces (known as an AMF interface engine) and IMM-AIS application programming interfaces (known as a Stack Manager, or SM interface engine).

In the case of Trillium stacks ported to SAF-compliant HA middleware, the AMF interface engine handles lifecycle requests such as instantiation, suspension, and termination as well as high availability states such as active and standby. Rather than propagating a request directly to the stack, the AMF interface engine takes care of other housekeeping functions and informs the patented Trillium DFT/HA (Distributed Fault-Tolerant / High Availability) core component, which handles high availability requests. Note too that the Trillium DFT/HA core consists of elements that manage the HA state of the Trillium layers and message delivery between these layers.
Such functionality resides in the system controller blade because it controls Trillium stacks running on multiple payload blades and also follows the same lifecycle as the middleware in the system controller.

Figure 6. Application Redundancy and Failover Architecture

Figure 7. Trillium Stack Integration into HA Middleware

A Trillium Protocol Specific Function (PSF) is used to update the protocol-specific state information to the standby unit, and a Trillium Load Distribution Function (LDF) takes care of distributing traffic across available resource sets.
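
The heartbeat and checkpoint mechanisms described in this section can be sketched in a few lines. This is an illustration under stated assumptions, not the SAF AIS API and not Trillium code: `CheckpointStore`, `HeartbeatMonitor`, `failover_if_needed`, the checkpoint name `sgw-sessions`, and the blade names are all hypothetical. Time is passed in explicitly so that the example is deterministic.

```python
from typing import Dict, Optional, Tuple


class CheckpointStore:
    """In-memory stand-in for a middleware checkpoint service (hypothetical)."""

    def __init__(self) -> None:
        self._data: Dict[str, dict] = {}

    def write(self, name: str, state: dict) -> None:
        # The active component writes its latest state.
        self._data[name] = dict(state)  # copy so later mutation does not leak

    def read(self, name: str) -> Optional[dict]:
        # The standby component reads the latest state.
        state = self._data.get(name)
        return dict(state) if state is not None else None


class HeartbeatMonitor:
    """Declares a node failed when no heartbeat arrives within timeout_s."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s
        self._last_seen: Dict[str, float] = {}

    def beat(self, node: str, now: float) -> None:
        self._last_seen[node] = now

    def is_alive(self, node: str, now: float) -> bool:
        last = self._last_seen.get(node)
        return last is not None and (now - last) <= self.timeout_s


def failover_if_needed(
    monitor: HeartbeatMonitor,
    store: CheckpointStore,
    active: str,
    standby: str,
    now: float,
) -> Tuple[str, Optional[dict]]:
    """Return the node that should be active and the state it resumes from.

    While the active node is healthy nothing changes and no state is read.
    """
    if monitor.is_alive(active, now):
        return active, None
    return standby, store.read("sgw-sessions")


monitor = HeartbeatMonitor(timeout_s=3.0)
store = CheckpointStore()
monitor.beat("blade-a", now=0.0)
store.write("sgw-sessions", {"bearers": 2})
new_active, resumed_state = failover_if_needed(
    monitor, store, "blade-a", "blade-b", now=10.0
)
```

The design point the sketch tries to capture is that the standby never asks the failed node for anything: it resumes from whatever the active node last checkpointed.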

Load Balancing & Deep Packet Inspection

User data traffic is diverted to the eGTP-U server running in the packet processor blade by the load balancer in the hub switch. The load balancer must extract the packets from GTP tunnels and consistently direct sessions to the same blade, processor, and hardware thread on the packet processor blade(s). It is vitally important to maintain the associations between GTP flows and their serving blade(s).

The GTP-U server de-tunnels the user packets, inspects the user traffic, and then classifies, authenticates, forwards, or blocks it as determined by policy control parameters. Extracted user packets are passed to a DPI engine for further processing and action, such as policy control, intrusion detection and prevention (IPS/IDS), lawful intercept, charging (service-based charging as well as capacity-based charging), bandwidth shaping, etc. All such functions may be implemented within or in conjunction with the SGW itself.

DPI servers can run on compute-centric resources or on dedicated packet processor blades such as Radisys ATCA-PP50s, which deliver up to 10G per blade. Fast path software increases performance dramatically, especially for small packet sizes, and an experienced professional services team can extract optimized performance for nearly any application type.

In addition to the classic 5-tuple lookup, DPI firewalls have application awareness and understand protocols such as HTTP, SMTP, POP3, IMAP, and FTP, and also recognize the actual applications that rely on those protocols (e.g., BitTorrent, Skype, webmail services, etc.). DPI IPS/IDS systems address threats ranging from connection-oriented intrusions and Denial of Service (DoS) attacks to dynamic, content-based threats such as viruses, worms, trojans, spyware, and phishing.

DPI systems are powerful and allow operators to classify traffic by multiple parameters including subscriber, service, application, origin/destination, IP address, and more.
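
The load balancer requirement stated at the start of this section, consistently directing a GTP session to the same blade, can be illustrated by hashing the Tunnel Endpoint Identifier (TEID) carried in the GTPv1-U header. This is a simplified sketch, not the platform's load balancer: it assumes the input is the UDP payload of a GTP-U packet (outer IP and UDP headers already removed), and the function names and blade names are hypothetical. A production design would also need to keep the uplink and downlink TEIDs of one bearer on the same blade, which a plain TEID hash does not guarantee.

```python
import struct
import zlib
from typing import List

GTPU_G_PDU = 0xFF  # GTP-U message type that carries a user packet


def parse_gtpu_teid(packet: bytes) -> int:
    """Return the TEID from the mandatory 8-byte GTPv1-U header.

    Header layout: flags (1 byte), message type (1 byte),
    length (2 bytes), TEID (4 bytes), all in network byte order.
    """
    if len(packet) < 8:
        raise ValueError("truncated GTP-U header")
    flags, msg_type, _length, teid = struct.unpack("!BBHI", packet[:8])
    if flags >> 5 != 1:  # the top three bits hold the GTP version
        raise ValueError("not GTP version 1")
    if msg_type != GTPU_G_PDU:
        raise ValueError("not a G-PDU")
    return teid


def select_blade(teid: int, blades: List[str]) -> str:
    """Map a TEID to a blade deterministically.

    CRC32 is used only as a stable, well-distributed hash (an assumption
    made for this sketch); the same TEID always yields the same blade
    as long as the blade list is unchanged.
    """
    digest = zlib.crc32(struct.pack("!I", teid))
    return blades[digest % len(blades)]


# Example: a minimal G-PDU header (flags 0x30 = version 1, protocol type GTP)
# followed by four bytes of dummy payload.
packet = struct.pack("!BBHI", 0x30, GTPU_G_PDU, 4, 0x1234ABCD) + b"\x00" * 4
blade = select_blade(parse_gtpu_teid(packet), ["pp-1", "pp-2", "pp-3"])
```

The same idea extends to selecting a processor or hardware thread within a blade by applying a second stage of the hash to the remaining bits.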
Traffic classification can also be based on usage and/or subscriber session traffic patterns.

Figure 8. Sample ATCA Load Balancing Architecture

Based on the network traffic classification and policies, DPI platforms allow the service provider to control traffic flows, including blocking, rate shaping, and redirection of packets to different destinations. DPI platforms can also limit the data rate of users, block access, account for and prioritize traffic, etc.

For example, with lawful intercept it is necessary to identify a Voice over IP (VoIP) session in a high-speed packet stream, copy the VoIP packets, and then send them to a lawful intercept point for monitoring. In VoIP networks this process is further complicated by the fact that SIP messages can follow a different path in the network than the associated RTP content. With new communication media such as social networking and instant messaging, there is an increasing need for DPI to recognize specific applications and extract session information (e.g., calling party, called party, duration, etc.) to meet lawful intercept requirements.

Policy management functions supported by DPI systems include charging (whether per-access billing or flat rate), authentication of access (e.g., service access entitlement), QoS prioritization of latency-critical traffic (e.g., prioritization of VoIP over P2P or general web-browsing traffic), and congestion management based on parameters such as service level or other subscriber-specific flags, time of day, etc.
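
The classify-then-act behavior described above can be sketched as a first-match rule table. This is an illustrative model only, not a DPI engine: it assumes the application label (for example "voip" or "p2p") has already been determined by protocol detection, and the `Flow`, `Rule`, and `decide` names, the action strings, and the subscriber identifiers are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Flow:
    subscriber: str
    application: str  # label assumed to come from a DPI engine, e.g. "voip"
    dst_port: int


@dataclass(frozen=True)
class Rule:
    action: str  # e.g. "block", "shape", "redirect", "mirror"
    application: Optional[str] = None  # None matches any application
    subscriber: Optional[str] = None   # None matches any subscriber

    def matches(self, flow: Flow) -> bool:
        return (
            self.application in (None, flow.application)
            and self.subscriber in (None, flow.subscriber)
        )


def decide(flow: Flow, rules: List[Rule], default: str = "forward") -> str:
    """Return the action of the first matching rule, else the default."""
    for rule in rules:
        if rule.matches(flow):
            return rule.action
    return default


# Example policy: copy one subscriber's VoIP traffic to an intercept point
# ("mirror") and rate-shape all peer-to-peer traffic ("shape").
rules = [
    Rule("mirror", application="voip", subscriber="sub-42"),
    Rule("shape", application="p2p"),
]
action = decide(Flow("sub-42", "voip", 5060), rules)
```

Ordering the rules from most to least specific keeps a narrow rule, such as an intercept for a single subscriber, from being shadowed by a broad one.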

Conclusion

ATCA is well-suited to implementing the Serving Gateway node for LTE networks. Through an outsourced platform acquisition strategy, NEPs can attain the key SGW platform functionality required while minimizing time-to-market and development expense. And, by incorporating the latest system technologies such as load balancing and DPI, the SGW represents a fresh opportunity to increase new revenue streams while managing subscriber traffic according to operator-driven priorities.

An experienced wireless and DPI solution provider such as Radisys can help NEPs with such ATCA platform integration and, in the process, free up in-house resources to focus on new application development and unique market differentiation.