OpenStack Swift
SwiftStack Global Cluster Deployment Guide

Table of Contents
  Planning
    Creating Regions
    Regions Connectivity Requirements
      Private Connectivity
        Bandwidth Sizing
      VPN Connectivity
    Proxy Read Affinity
      Region/Zone Based Read Affinity
    Proxy Write Affinity
    Dedicated Replication Network
      Configuring the Network
  Regional Use Cases
    Two Regions, Three Replicas
    Two Regions, Four Replicas
    Two Regions, Three/Four Replicas, DR Only
    Three Regions, Three Replicas
  Migrating to Multiple Regions
  Important Things to Consider
Planning

Global Clusters give SwiftStack users the ability to spread data across multiple, geographically separate regions. We attempt to make this feature as easy to use as possible, while providing the customization necessary for you to best fulfill your regional needs. While adding and configuring the new Global Cluster features is easy, the resulting data transfer challenges must be considered in advance and addressed for a successful deployment.

Creating Regions

New regions for Global Clusters are created via the Cluster Management interface in the SwiftStack controller. By default, a cluster has only one region, the Default Region. Additional regions can be added, and existing regions can be renamed.

From a design perspective, regions add a new layer of control for the placement of objects. The concept of a region was added to Swift specifically to support building global clusters. For the purposes of this document, regions can be thought of as:

- Geographically separate from each other
- Connected to each other over high-latency links
Swift's unique-as-possible data placement algorithm ensures that data is placed across the available regions in the same fashion it is placed across zones, nodes, and drives. Within each region, zones should be thought of as separate failure domains; in many cases, a zone will be an individual rack or a group of racks. After changing regions and zones, you will need to deploy your changes to the cluster, much like any other change within the SwiftStack controller.

Regions Connectivity Requirements

An individual proxy server must be able to reach the storage nodes in all regions of a cluster. Typically one of two methods is used to ensure this is possible:

1. Private Connectivity - site-to-site via MPLS or a private Ethernet circuit
2. VPN Connectivity - a standalone VPN controller through an Internet connection

Note: Customers are responsible for establishing and managing either connectivity option.

Both methods require that routing information, via static routes or a learned routing protocol, be configured on the storage and proxy nodes to support data transfer between regions.

Private Connectivity

Regional deployments leveraging private connectivity such as Multiprotocol Label Switching (MPLS) or private Ethernet will require the addition of routers on each end of the connection. The specific model and implementation of the routers is left to the customer to manage.
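The static-route requirement mentioned above can be sketched as follows. All subnets and router addresses here are hypothetical; substitute the networks and gateways of your own sites:

```shell
# Hypothetical layout: region 1 storage network is 10.1.0.0/16,
# region 2 is 10.2.0.0/16, and each site's router carries the
# inter-region circuit.

# On each region 1 node (10.1.0.1 is the local router):
ip route add 10.2.0.0/16 via 10.1.0.1

# On each region 2 node (10.2.0.1 is the local router):
ip route add 10.1.0.0/16 via 10.2.0.1
```

In practice these routes are usually made persistent in the distribution's network configuration, or learned dynamically from the routers.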
Bandwidth Sizing

Customers will need to determine the bandwidth required across the private circuit. The amount of bandwidth needed depends on a number of factors:

- Amount of cross-region replication that will be taking place; too small a circuit will result in bottlenecks for replication traffic.
- Expected number of operations per second. Depending on the setup, proxies generally need to successfully write one or more copies of an object across regions before returning success. Because the connection is already high latency, these writes will take longer than if they were contained in a single region. Bandwidth must be sized to support the expected number of operations per second; otherwise, end users of the system may experience increased latency.
- Cost. A private circuit increases in cost with the distance between regions.

VPN Connectivity

Rather than a private circuit, a regional deployment may use a Virtual Private Network (VPN)
over the public Internet to connect the private networks together. The VPN allows the proxy servers and storage nodes in different regions to communicate with each other as though they were on a private network together, despite being tunneled through the Internet. Downsides to this approach, compared to a private circuit, include:

- Traffic through the VPN shares the bandwidth supporting the Swift cluster, including replication traffic, which reduces the bandwidth available for incoming read and write operations. As a result, the Internet circuit servicing the region may need to be upsized dramatically.
- Quality of Service cannot be guaranteed over the Internet, so traffic between regions may suffer from varying amounts of latency and jitter.

The specific model and implementation of the VPN and routers is left to the customer to manage. This includes third-party hardware VPN appliances using IPsec, as well as software VPNs such as OpenVPN.
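Whichever connectivity option is chosen, the bandwidth-sizing factors above can be turned into a rough back-of-the-envelope estimate. The workload numbers below are purely illustrative, not a recommendation:

```python
# Rough estimate of sustained cross-region replication bandwidth.
# Hypothetical workload: 10 PUTs/sec, 1 MiB average object size,
# and 1 replica per object shipped across the WAN by replication.

puts_per_sec = 10
avg_object_bytes = 1 * 1024 * 1024
replicas_over_wan = 1

bytes_per_sec = puts_per_sec * avg_object_bytes * replicas_over_wan
megabits_per_sec = bytes_per_sec * 8 / 1_000_000

print(f"Sustained cross-region replication: {megabits_per_sec:.0f} Mb/s")
# → Sustained cross-region replication: 84 Mb/s
```

A real circuit should be sized with headroom above this steady-state figure to absorb bursts and to let replication catch up after an outage.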
Proxy Read Affinity

For Swift object reads (GETs), the proxy servers by default choose at random which storage node receives the request. When some of the storage nodes are in a different region, this can cause GET requests to incur much higher latency. To help mitigate this, SwiftStack offers a regional Read Affinity configuration option.

Region/Zone Based Read Affinity

SwiftStack supports region-based read affinity, which gives preferred regions and zones, when available, higher priority for reads. Once enabled, Swift prioritizes the storage nodes located within the region and zone nearest to where the request comes in.
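SwiftStack normally manages this setting through the controller; underneath, it corresponds to the following proxy-server.conf options (the region and zone numbers here are illustrative):

```
[app:proxy-server]
# Sort candidate storage nodes by affinity instead of shuffling them.
sorting_method = affinity
# Prefer region 1 zone 1 first, then the rest of region 1, then all
# other nodes. Lower values mean higher priority.
read_affinity = r1z1=100, r1=200
```

Nodes not matched by any entry sort last, so remote regions are only contacted when the preferred nodes cannot serve the request.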
Proxy Write Affinity

By default, Swift confirms a majority of object writes (PUTs) before returning success; in a three-replica setup, this means two replicas must be written successfully. With global clusters, this may not be desirable, because one of those writes may travel over a lower-speed, higher-latency link. New settings make proxy write behavior configurable. A typical setup will use write affinity to keep the initial writes within one region, and then rely on Swift replication to later push the data to the other regions. Different use cases, however, will call for different configurations. Proxy write affinity is flexible, but users should remain aware that they are trading data distribution for write throughput. Per the Swift admin guide:

  The write_affinity setting is useful only when you don't typically read objects immediately after writing them. For example, consider a workload of mainly backups: if you have a bunch of machines in NY that periodically write backups to Swift, then odds are that you don't then immediately read those backups in SF. If your workload doesn't look like that, then you probably shouldn't use write_affinity.

Write affinity can be enabled in the same location as read affinity.
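As with read affinity, SwiftStack exposes this through the controller; the underlying proxy-server.conf options look like the following (the region name is illustrative):

```
[app:proxy-server]
# Send the initial object writes to region 1 only; background
# replication later moves copies to the other regions.
write_affinity = r1
# How many nodes in the preferred region to try for the initial writes.
# "2 * replicas" is a common choice, so local handoff nodes can absorb
# failures without forcing a synchronous cross-region write.
write_affinity_node_count = 2 * replicas
```

Note the trade-off stated above: until replication completes, all copies of a newly written object live in the preferred region.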
Dedicated Replication Network

Because replication traffic can cause considerable spikes in overall network traffic, Swift now supports the setup and configuration of a separate network dedicated solely to replication. Use of this feature is completely optional.

Configuring the Network
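When a dedicated replication network is used, each device in the ring carries a separate replication IP and port in addition to its client-facing address. With the stand-alone Swift CLI this is expressed with the `R` separator; SwiftStack normally builds rings for you, and the addresses and device name below are illustrative:

```
# Add a device whose client-facing interface is 10.1.1.5:6000 and whose
# replication interface is 10.99.1.5:6010 (illustrative addresses),
# in region 1, zone 1, with weight 100.
swift-ring-builder object.builder add r1z1-10.1.1.5:6000R10.99.1.5:6010/sda1 100
```

The replication daemons then use the second address, keeping replication bursts off the network that serves client reads and writes.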
Regional Use Cases

Many use cases exist to meet a range of objectives. Below is a short list of common use cases; however, given the flexibility of global cluster deployment, this is by no means an exhaustive list.

Two Regions, Three Replicas

- Asynchronous Offsite
- Real-Time Offsite

Two Regions, Four Replicas

When running two regions, it is generally recommended to have two replicas in each location to ensure availability and durability. If region 1 were temporarily down, or lost in a major disaster, having only one replica in region 2 would risk data loss if the drive holding that remaining replica failed. In other words, in the (unlikely) case of a catastrophic failure of region 1 combined with a drive failure in region 2, data loss would occur. Hence, even though the risk of data loss is very low, running 2x2 replicas protects against this unlikely, but not inconceivable, case.

Two Regions, Three/Four Replicas, DR Only

This setup has no proxy server in the DR zone, so data is redundant but not necessarily highly available.

Three Regions, Three Replicas

- Fetch newest ability
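The 2x2 argument above can be sketched with a toy placement check. The placement mapping and region names are illustrative only, not a SwiftStack API:

```python
# Toy model: how many replicas survive the loss of an entire region,
# given a per-region replica placement.

def surviving_replicas(placement, failed_region):
    """Replicas still readable after an entire region is lost."""
    return sum(n for region, n in placement.items() if region != failed_region)

# With 2x2 (four replicas), losing a whole region still leaves two
# copies, so one further drive failure does not destroy the last replica.
print(surviving_replicas({"r1": 2, "r2": 2}, "r1"))  # → 2

# With a 2+1 split (three replicas), a region failure can leave a
# single copy, exposed to any one additional drive failure.
print(surviving_replicas({"r1": 2, "r2": 1}, "r1"))  # → 1
```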
Migrating to Multiple Regions

A number of strategies can be used to help minimize the time required for data to settle once the decision has been made to spread Swift data across multiple regions, whether for a new or an existing Swift cluster. The best way to migrate to a multi-region cluster depends on many factors, such as the latency between the regions, the amount of data already in the cluster, and the intended use case. When planning a multi-region setup, we recommend contacting SwiftStack directly to discuss your plans. A SwiftStack Swift cluster has functionality built into the controller that allows a managed migration from a local cluster to a global cluster, including gradually increasing the replica count from the default three to four.

Geographic Location for Initial Replication

Because regional data will be replicated across a (relatively) lower-speed link, it is desirable to stand up the new region's hardware relatively close to the existing region's hardware. This allows the initial regional replication to use as high-speed a link as possible. After the initial replication has completed, the servers for the new region can be shut down and shipped to the new regional site. This practice may be easier in theory than in practice, but it is worth considering.

Important Things to Consider

- An on-premises controller located in a single region may be problematic if that region goes down, as the controller won't be reachable to make changes to the region that is still up.
- If the same object, with different contents, is written to multiple proxies across multiple regions, it is very likely the data will be inconsistent for a time, and a client may receive different responses on multiple GETs. This becomes eventually consistent through replication, but that obviously takes time.
- Customers may have to use the fetch-newest ability built into the Swift API to guarantee that the newest version of an object is returned.
- If data comes into your primary region faster than it can be replicated across the WAN to a secondary region, the backlog will continue to grow over time. The only recourse is to disable proxy write affinity and allow incoming data to traverse the slower WAN link for a period of time until replication catches up.
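Swift exposes the fetch-newest behavior as the per-request X-Newest header, which tells the proxy to consult all replicas and return the most recent copy. As a sketch, such a request could be built with Python's standard library; the storage URL and token below are placeholders:

```python
import urllib.request

# Placeholder values; a real request needs your storage URL and a valid
# auth token obtained from your auth system.
storage_url = "https://swift.example.com/v1/AUTH_test/backups/db.tar.gz"
token = "AUTH_tk_example"

# X-Newest asks the proxy to check every replica and return the most
# recent one, at the cost of extra backend requests and latency.
req = urllib.request.Request(
    storage_url,
    headers={"X-Newest": "true", "X-Auth-Token": token},
)

# urllib normalizes header names when storing them.
print(req.get_header("X-newest"))  # → true
```

Because X-Newest touches all replicas, including remote regions, it should be reserved for requests that truly need read-after-write consistency rather than used on every GET.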