rackspace.com/cloud/private
Rackspace Private Cloud v9 Installation

RPC v9.0 (2015-09-10)
Copyright 2015 Rackspace. All rights reserved.

This documentation is intended for Rackspace customers who are interested in installing an OpenStack-powered private cloud according to the recommendations of Rackspace.
Table of Contents

1. Preface
   1.1. About Rackspace Private Cloud Software
   1.2. Rackspace Private Cloud configuration
   1.3. Rackspace Private Cloud support
2. Overview
   2.1. Ansible
   2.2. Linux Containers (LXC)
   2.3. Host layout
   2.4. Host networking
   2.5. OpenStack Networking
   2.6. Installation requirements
   2.7. Installation workflow
3. Deployment host
   3.1. Installing the operating system
   3.2. Configuring the operating system
   3.3. Installing source and dependencies
   3.4. Configuring Secure Shell (SSH) keys
4. Target hosts
   4.1. Installing the operating system
   4.2. Configuring Secure Shell (SSH) keys
   4.3. Configuring the operating system
   4.4. Configuring LVM
   4.5. Configuring the network
      4.5.1. Reference architecture
      4.5.2. Configuring the network on a target host
5. Deployment configuration
   5.1. Prerequisites
   5.2. Configuring target host networking
   5.3. Configuring target hosts
   5.4. Configuring service passwords
   5.5. Configuring the hypervisor (optional)
   5.6. Configuring the Image Service (optional)
   5.7. Configuring the Block Storage service (optional)
      5.7.1. Configuring Block Storage Service for multiple NetApp backends (optional)
   5.8. Configure the Block Storage Service with NFS protocols (optional)
   5.9. Creating Block Storage availability zones (optional)
6. Foundation playbooks
   6.1. Running the foundation playbook
   6.2. Troubleshooting
7. Infrastructure playbooks
   7.1. Running the infrastructure playbook
   7.2. Verifying infrastructure operation
8. OpenStack playbooks
   8.1. Utility Container Overview
   8.2. Running the OpenStack playbook
   8.3. Verifying OpenStack operation
9. Rackspace Private Cloud monitoring
   9.1. Service and response
   9.2. Hardware monitoring
   9.3. Software monitoring
   9.4. CDM monitoring
   9.5. Running monitoring playbooks
10. Operations
   10.1. Adding a compute host
   10.2. Galera cluster maintenance
      10.2.1. Removing nodes
      10.2.2. Starting a cluster
   10.3. Galera cluster recovery
      10.3.1. Single-node failure
      10.3.2. Multi-node failure
      10.3.3. Complete failure
      10.3.4. Restoring from backup after a complete failure
      10.3.5. Rebuilding a container
11. Additional resources
   11.1. Document change history
List of Figures

2.1. Host Layout Overview
2.2. Network components
2.3. Container network architecture
2.4. Bare metal network architecture
2.5. Networking agents containers
2.6. Compute hosts
2.7. Installation workflow
3.1. Installation workflow
4.1. Installation workflow
4.2. Target hosts for infrastructure, networking, and storage services
4.3. Target hosts for Compute service
5.1. Installation workflow
6.1. Installation workflow
7.1. Installation workflow
8.1. Installation workflow
1. Preface

Rackspace Private Cloud Software has been developed by Rackspace as a way to quickly install an OpenStack private cloud, configured as recommended by Rackspace OpenStack specialists.

1.1. About Rackspace Private Cloud Software

Rackspace Private Cloud Software uses Ansible to create an OpenStack cluster on Ubuntu Linux. The installation process provides a familiar approach for Linux system administrators, and the environment can be updated easily without downloading and installing a new ISO.

1.2. Rackspace Private Cloud configuration

Rackspace Private Cloud Software uses Ansible and Linux Containers (LXC) to install and manage OpenStack Icehouse with the following services:

- Identity (keystone)
- Image Service (glance)
- Compute (nova)
- Networking (neutron)
- Block Storage (cinder)
- Orchestration (heat)
- Dashboard (horizon)

RPC also provides the following infrastructure, monitoring, and logging services to support OpenStack:

- Galera with MariaDB
- RabbitMQ
- Memcached
- Rsyslog
- Logstash
- Elasticsearch with Kibana

1.3. Rackspace Private Cloud support

Rackspace offers 365x24x7 support for Rackspace Private Cloud Software. If you are interested in purchasing Escalation Support or Core Support for your cloud, or taking advantage of our training offerings, contact us at <opencloudinfo@rackspace.com>.
You can also visit the RPC community forums. The forums are open to all RPC users and are moderated and maintained by Rackspace personnel and OpenStack specialists: https://community.rackspace.com/products/f/45

For more information about Rackspace Private Cloud, visit the Rackspace Private Cloud pages:

- Software and Reference Architecture
- Support Resources

For any other information about Rackspace Private Cloud Software, see the Rackspace Private Cloud release notes.
2. Overview

Rackspace Private Cloud (RPC) v9 Software uses a combination of Ansible and Linux Containers (LXC) to install and manage OpenStack Icehouse. This chapter discusses the following topics:

- The technology used by Rackspace Private Cloud Software
- The environment and network architecture
- Requirements to install Rackspace Private Cloud Software
- The installation process workflow

2.1. Ansible

RPC v9 Software uses a combination of Ansible and Linux Containers (LXC) to install and manage OpenStack Icehouse. Ansible provides an automation platform to simplify system and application deployment. Ansible manages systems using Secure Shell (SSH) instead of unique protocols that require remote daemons or agents. Ansible uses playbooks written in the YAML language for orchestration. For more information, see Ansible - Intro to Playbooks.

In this guide, Rackspace refers to the host running Ansible playbooks as the deployment host, and to the hosts on which Ansible installs RPC as the target hosts.

A recommended layout for installing RPC involves five target hosts in total: three infrastructure hosts, one compute host, and one logging host. RPC software also supports one or more optional storage hosts. All hosts require at least four 10 Gbps network interfaces. In Rackspace datacenters, hosts can use an additional 1 Gbps network interface for service network access.

More information on setting up target hosts can be found in Section 2.3, "Host layout". For more information on physical, logical, and virtual network interfaces within hosts, see Section 2.4, "Host networking".

2.2. Linux Containers (LXC)

Containers provide operating-system-level virtualization by enhancing the concept of chroot environments, which isolate resources and file systems for a particular group of processes without the overhead and complexity of virtual machines.
Containers access the same kernel, devices, and file systems on the underlying host and provide a thin operational layer built around a set of rules.

The Linux Containers (LXC) project implements operating-system-level virtualization on Linux using kernel namespaces and includes the following features:

- Resource isolation for CPU, memory, block I/O, and network, using cgroups.
- Selective connectivity to physical and virtual network devices on the underlying physical host.
- Support for a variety of backing stores, including LVM.
- Built on a foundation of stable Linux technologies with an active development and support community.

Useful commands:

List containers and summary information such as operational state and network configuration:

# lxc-ls --fancy

Show container details including operational state, resource utilization, and veth pairs:

# lxc-info --name container_name

Start a container:

# lxc-start --name container_name

Attach to a container:

# lxc-attach --name container_name

Stop a container:

# lxc-stop --name container_name

2.3. Host layout

The recommended layout contains a minimum of five hosts (or servers):

- Three infrastructure hosts
- One compute host
- One logging host

To use the optional Block Storage (cinder) service, a sixth host is required. Block Storage hosts require an LVM volume group named cinder-volumes. See Section 2.6, "Installation requirements" and Section 4.4, "Configuring LVM" for more information.

The hosts are called target hosts because Ansible deploys the RPC environment within these hosts. The RPC environment also requires a deployment host from which Ansible orchestrates the deployment process. One of the target hosts can function as the deployment host.
At least one hardware load balancer must be included to manage the traffic among the target hosts.

Infrastructure target hosts contain the following services:

- Infrastructure:
  - Galera
  - RabbitMQ
  - Memcached
  - Logging
- OpenStack:
  - Identity (keystone)
  - Image Service (glance)
  - Compute management (nova)
  - Networking (neutron)
  - Orchestration (heat)
  - Dashboard (horizon)

Compute target hosts contain the following services:

- Compute virtualization
- Logging

Logging target hosts contain the following services:

- Rsyslog
- Logstash
- Elasticsearch with Kibana

(Optional) Storage target hosts contain the following services:

- Block Storage scheduler
- Block Storage volumes
Figure 2.1. Host Layout Overview

2.4. Host networking

The combination of containers and flexible deployment options requires implementation of advanced Linux networking features such as bridges and namespaces.

Bridges provide layer 2 connectivity (similar to switches) among physical, logical, and virtual network interfaces within a host. After a bridge is created, the network interfaces are virtually "plugged in" to it. RPC software uses bridges to connect physical and logical network interfaces on the host to virtual network interfaces within containers on the host.

Namespaces provide logically separate layer 3 environments (similar to routers) within a host. Namespaces use virtual interfaces to connect with other namespaces, including the host namespace. These interfaces, often called veth pairs, are virtually "plugged in" between namespaces similar to patch cables connecting physical devices such as switches and routers.

Each container has a namespace that connects to the host namespace with one or more veth pairs. Unless specified, the system generates random names for veth pairs.

The relationship between physical interfaces, logical interfaces, bridges, and virtual interfaces within containers is shown in Figure 2.2, "Network components".
Figure 2.2. Network components

Target hosts can contain the following network bridges:

- LXC internal (lxcbr0): Mandatory (automatic)

  Provides external (typically internet) connectivity to containers. Automatically created and managed by LXC. Does not directly attach to any physical or logical interfaces on the host because iptables handles connectivity. Attaches to eth0 in each container.

- Container management (br-mgmt): Mandatory

  Provides management of, and communication among, infrastructure and OpenStack services. Manually created; attaches to a physical or logical interface, typically a bond0 VLAN subinterface. Also attaches to eth1 in each container.

- Storage (br-storage): Optional
  Provides segregated access to block storage devices between Compute and Block Storage hosts. Manually created; attaches to a physical or logical interface, typically a bond0 VLAN subinterface. Also attaches to eth2 in each associated container.

- OpenStack Networking tunnel/overlay (br-vxlan): Mandatory

  Provides infrastructure for VXLAN tunnel/overlay networks. Manually created; attaches to a physical or logical interface, typically a bond1 VLAN subinterface. Also attaches to eth10 in each associated container.

- OpenStack Networking provider (br-vlan): Mandatory

  Provides infrastructure for VLAN and flat networks. Manually created; attaches to a physical or logical interface, typically bond1. Also attaches to eth11 in each associated container. Does not contain an IP address because it only handles layer 2 connectivity.

Figure 2.3, "Container network architecture" provides a visual representation of network components for services in containers.
Figure 2.3. Container network architecture

The RPC software installs the Compute service in a bare metal environment rather than within a container. Figure 2.4, "Bare metal network architecture" provides a visual representation of the unique layout of network components on a Compute host.
Figure 2.4. Bare metal network architecture

2.5. OpenStack Networking

OpenStack Networking (neutron) is configured to use a DHCP agent, L3 agent, and Linux Bridge agent within a networking agents container. Figure 2.5, "Networking agents containers" shows the interaction of these agents, network components, and connection to a physical network.
Figure 2.5. Networking agents containers
The Compute service uses the KVM hypervisor. Figure 2.6, "Compute hosts" shows the interaction of instances, the Linux Bridge agent, network components, and connection to a physical network.

Figure 2.6. Compute hosts
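Because the Compute service relies on KVM, hardware virtualization support on a prospective compute host is worth a quick sanity check before deployment. The sketch below uses only standard Linux tools (it is not part of the RPC tooling) and reads the CPU flags exposed in /proc/cpuinfo:

```shell
# Count CPU flags indicating hardware virtualization (vmx = Intel, svm = AMD).
# A count of 0 means KVM would fall back to much slower software emulation.
flags=$(grep -cE 'vmx|svm' /proc/cpuinfo || true)
if [ "$flags" -gt 0 ]; then
    echo "hardware virtualization available"
else
    echo "no hardware virtualization flags found"
fi
```

On hosts where the flags are present but KVM still fails, check that virtualization is enabled in the BIOS/UEFI firmware.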
2.6. Installation requirements

Deployment host:

- Required items:
  - Ubuntu 14.04 LTS (Trusty Tahr) or compatible operating system that meets all other requirements.
  - Secure Shell (SSH) client supporting public key authentication.
  - Synchronized network time (NTP) client.
  - Python 2.7 or later.

Target hosts:

- Required items:
  - Ubuntu Server 14.04 LTS (Trusty Tahr) 64-bit operating system, with Linux kernel version 3.13.0-34-generic or later.
  - SSH server supporting public key authentication.
  - Synchronized NTP client.
- Optional items:
  - For hosts providing Block Storage (cinder) service volumes, a Logical Volume Manager (LVM) volume group named cinder-volumes.
  - An LVM volume group named lxc to store container file systems. If the lxc volume group does not exist, containers are automatically installed in the root file system.

Note: Each container creates a 5 GB logical volume. Plan storage accordingly to support the quantity of containers on each target host.

2.7. Installation workflow

This diagram shows the general workflow associated with RPC installation.
Figure 2.7. Installation workflow
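Since each container consumes a 5 GB logical volume, the minimum size of the lxc volume group on a target host can be estimated from the planned container count. The count below is a hypothetical example, not a recommendation:

```shell
# Each LXC container creates a 5 GB logical volume in the lxc volume group.
containers_per_host=12   # hypothetical container count for one target host
lv_size_gb=5
required_gb=$((containers_per_host * lv_size_gb))
echo "lxc volume group needs at least ${required_gb} GB"
# prints: lxc volume group needs at least 60 GB
```

Leave additional headroom beyond this minimum for snapshots and future containers added through affinity settings.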
3. Deployment host

Figure 3.1. Installation workflow

The RPC software installation process requires one deployment host. The deployment host contains Ansible and orchestrates the RPC installation on the target hosts. One of the target hosts, preferably one of the infrastructure variants, can be used as the deployment host. To use a deployment host as a target host, follow the steps in Chapter 4, "Target hosts" on the deployment host. This guide assumes separate deployment and target hosts.

3.1. Installing the operating system

Install the Ubuntu Server 14.04 LTS (Trusty Tahr) 64-bit operating system on the deployment host with at least one network interface configured to access the Internet or suitable local repositories.

3.2. Configuring the operating system

Install additional software packages and configure NTP.

1. Install additional software packages if they were not installed during operating system installation:

   # apt-get install aptitude build-essential git ntp ntpdate \
     openssh-server python-dev sudo

2. Configure NTP to synchronize with a suitable time source.

3.3. Installing source and dependencies

Install the source and dependencies for the deployment host.

1. Clone the repository into the /opt directory:
   # cd /opt
   # git clone -b TAG https://github.com/openstack/openstack-ansible.git

   Replace TAG with the current stable release tag.

2. Install pip 1.5.6 and dependencies:

   # curl -O https://bootstrap.pypa.io/get-pip.py
   # python get-pip.py \
     --find-links="http://mirror.rackspace.com/rackspaceprivatecloud/python_packages/icehouse" \
     --no-index

3. Install Ansible and dependencies:

   # pip install -r /opt/openstack-ansible/requirements.txt

   Note: The command above installs the correct version of Ansible. Do not install Ansible with the apt package manager. If Ansible was installed with apt, uninstall it before performing this step.

3.4. Configuring Secure Shell (SSH) keys

Ansible uses Secure Shell (SSH) with public key authentication for connectivity between the deployment and target hosts. To reduce user interaction during Ansible operations, key pairs should not include passphrases. However, if a passphrase is required, consider using the ssh-agent and ssh-add commands to temporarily store the passphrase before performing Ansible operations.
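A key pair without a passphrase can be generated as follows. The output path is illustrative only; ssh-keygen defaults to ~/.ssh/id_rsa, which is what root-driven deployments typically use:

```shell
# Generate an RSA key pair with an empty passphrase for unattended Ansible runs.
# The path below is an example; ssh-keygen defaults to ~/.ssh/id_rsa.
ssh-keygen -t rsa -b 4096 -N "" -f /tmp/deploy_key -q

# If a passphrase must be used instead, cache it once per session:
#   eval "$(ssh-agent -s)"
#   ssh-add /tmp/deploy_key
ls /tmp/deploy_key /tmp/deploy_key.pub
```

The .pub file is the public half that is copied to each target host in the next chapter; the private half never leaves the deployment host.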
4. Target hosts

Figure 4.1. Installation workflow

The RPC software installation process requires at least five target hosts that will contain the OpenStack environment and supporting infrastructure. On each target host, perform the following tasks:

- Name the target host.
- Install the operating system.
- Generate and set up security measures.
- Update the operating system and install additional software packages.
- Create LVM volume groups.
- Configure networking devices.

4.1. Installing the operating system

Install the Ubuntu Server 14.04 LTS (Trusty Tahr) 64-bit operating system on the target host with at least one network interface configured to access the Internet or suitable local repositories.

Note: On target hosts without local (console) access, Rackspace recommends adding the Secure Shell (SSH) server packages to the installation.

4.2. Configuring Secure Shell (SSH) keys

Ansible uses Secure Shell (SSH) for connectivity between the deployment and target hosts.

1. Copy the contents of the public key file on the deployment host to the /root/.ssh/authorized_keys file on each target host.

2. Test public key authentication from the deployment host to each target host. SSH should provide a shell without asking for a password.
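Step 1 is usually performed with a pipe over SSH. The sketch below demonstrates the same append-and-permissions logic against a local stand-in directory so that it runs anywhere; the key material and paths are placeholders:

```shell
# On a real target the destination is /root/.ssh/authorized_keys; a local
# stand-in directory is used here so the sketch is self-contained.
pubkey='ssh-rsa AAAAB3NzaC1yc2E...example deploy@deployment-host'  # placeholder
target_ssh=/tmp/example_target_ssh

mkdir -p "$target_ssh"
chmod 700 "$target_ssh"
echo "$pubkey" >> "$target_ssh/authorized_keys"
chmod 600 "$target_ssh/authorized_keys"

# Against a live target host, the equivalent one-liner is:
#   cat /root/.ssh/id_rsa.pub | ssh root@TARGET_HOST \
#       'mkdir -p /root/.ssh && cat >> /root/.ssh/authorized_keys'
grep -c 'example deploy' "$target_ssh/authorized_keys"
```

The restrictive 700/600 permissions matter: sshd refuses public key authentication if the .ssh directory or authorized_keys file is writable by other users.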
4.3. Configuring the operating system

Check the kernel version, install additional software packages, and configure NTP.

1. Check the kernel version. It should be 3.13.0-34-generic or later.

2. Install additional software packages if they were not installed during operating system installation:

   # apt-get install bridge-utils debootstrap ifenslave lsof \
     lvm2 ntp ntpdate openssh-server sudo tcpdump vlan

3. Add the appropriate kernel modules to the /etc/modules file to enable VLAN and bond interfaces:

   # echo 'bonding' >> /etc/modules
   # echo '8021q' >> /etc/modules

4. Configure NTP to synchronize with a suitable time source.

5. Reboot the host to activate the changes.

4.4. Configuring LVM

1. To use the optional Block Storage (cinder) service, create an LVM volume group named cinder-volumes on the Block Storage host. A metadata size of 2048 must be specified during physical volume creation. For example:

   # pvcreate --metadatasize 2048 physical_volume_device_path
   # vgcreate cinder-volumes physical_volume_device_path

2. Optionally, create an LVM volume group named lxc for container file systems. If the lxc volume group does not exist, containers are automatically installed in the file system under /var/lib/lxc by default.

4.5. Configuring the network

Although Ansible automates most deployment operations, networking on target hosts requires manual configuration because it can vary dramatically per environment. For demonstration purposes, these instructions use a reference architecture with example network interface names, networks, and IP addresses. Modify these values as needed for the particular environment.

The reference architecture for target hosts contains the following mandatory components:

- A bond0 interface using two physical interfaces. For redundancy purposes, avoid using more than one port on network interface cards containing multiple ports. The example configuration uses eth0 and eth2. Actual interface names can vary depending on hardware and drivers.
  Configure the bond0 interface with a static IP address on the host management network.
- A bond1 interface using two physical interfaces. For redundancy purposes, avoid using more than one port on network interface cards containing multiple ports. The example configuration uses eth1 and eth3. Actual interface names can vary depending on hardware and drivers.

  Configure the bond1 interface without an IP address.

- A container management network subinterface on the bond0 interface, and a br-mgmt bridge with a static IP address.

- An OpenStack Networking VXLAN subinterface on the bond1 interface, and a br-vxlan bridge with a static IP address.

- An OpenStack Networking VLAN br-vlan bridge on the bond1 interface, without an IP address.

The reference architecture for target hosts can also contain the following optional components:

- A storage network subinterface on the bond0 interface, and a br-storage bridge with a static IP address.

For more information, see OpenStack Ansible Networking.

4.5.1. Reference architecture

After establishing initial host management network connectivity using the bond0 interface, modify the /etc/network/interfaces file as described in the following procedure.

Procedure 4.1. Modifying the network interfaces file

1. Physical interfaces:

   # Physical interface 1
   auto eth0
   iface eth0 inet manual
       bond-master bond0
       bond-primary eth0

   # Physical interface 2
   auto eth1
   iface eth1 inet manual
       bond-master bond1
       bond-primary eth1

   # Physical interface 3
   auto eth2
   iface eth2 inet manual
       bond-master bond0

   # Physical interface 4
   auto eth3
   iface eth3 inet manual
       bond-master bond1

2. Bonding interfaces:

   # Bond interface 0 (physical interfaces 1 and 3)
   auto bond0
   iface bond0 inet static
       bond-slaves none
       bond-mode active-backup
       bond-miimon 100
       bond-downdelay 200
       bond-updelay 200
       address HOST_IP_ADDRESS
       netmask HOST_NETMASK
       gateway HOST_GATEWAY
       dns-nameservers HOST_DNS_SERVERS

   # Bond interface 1 (physical interfaces 2 and 4)
   auto bond1
   iface bond1 inet manual
       bond-slaves none
       bond-mode active-backup
       bond-miimon 100
       bond-downdelay 250
       bond-updelay 250

   If not already complete, replace HOST_IP_ADDRESS, HOST_NETMASK, HOST_GATEWAY, and HOST_DNS_SERVERS with the appropriate configuration for the host management network.

3. Logical (VLAN) interfaces:

   # Container management VLAN interface
   iface bond0.CONTAINER_MGMT_VLAN_ID inet manual
       vlan-raw-device bond0

   # OpenStack Networking VXLAN (tunnel/overlay) VLAN interface
   iface bond1.TUNNEL_VLAN_ID inet manual
       vlan-raw-device bond1

   # Storage network VLAN interface (optional)
   iface bond0.STORAGE_VLAN_ID inet manual
       vlan-raw-device bond0

   Replace *_VLAN_ID with the appropriate configuration for the environment.

4. Bridge devices:

   # Container management bridge
   auto br-mgmt
   iface br-mgmt inet static
       bridge_stp off
       bridge_waitport 0
       bridge_fd 0
       # Bridge port references tagged interface
       bridge_ports bond0.CONTAINER_MGMT_VLAN_ID
       address CONTAINER_MGMT_BRIDGE_IP_ADDRESS
       netmask CONTAINER_MGMT_BRIDGE_NETMASK
       dns-nameservers CONTAINER_MGMT_BRIDGE_DNS_SERVERS

   # OpenStack Networking VXLAN (tunnel/overlay) bridge
   auto br-vxlan
   iface br-vxlan inet static
       bridge_stp off
       bridge_waitport 0
       bridge_fd 0
       # Bridge port references tagged interface
       bridge_ports bond1.TUNNEL_VLAN_ID
       address TUNNEL_BRIDGE_IP_ADDRESS
       netmask TUNNEL_BRIDGE_NETMASK

   # OpenStack Networking VLAN bridge
   auto br-vlan
   iface br-vlan inet manual
       bridge_stp off
       bridge_waitport 0
       bridge_fd 0
       # Bridge port references untagged interface
       bridge_ports bond1

   # Storage bridge (optional)
   auto br-storage
   iface br-storage inet static
       bridge_stp off
       bridge_waitport 0
       bridge_fd 0
       # Bridge port references tagged interface
       bridge_ports bond0.STORAGE_VLAN_ID
       address STORAGE_BRIDGE_IP_ADDRESS
       netmask STORAGE_BRIDGE_NETMASK

   Replace *_VLAN_ID, *_BRIDGE_IP_ADDRESS, *_BRIDGE_NETMASK, and *_BRIDGE_DNS_SERVERS with the appropriate configuration for the environment.

4.5.2. Configuring the network on a target host

This example uses the following parameters to configure networking on a single target host. The sample interface configurations are intended to illustrate the scope and design of the network. When deploying outside of Rackspace data centers, configure these settings according to the data center's network hardware. See Figure 4.2, "Target hosts for infrastructure, networking, and storage services" and Figure 4.3, "Target hosts for Compute service" for a visual representation of these parameters in the architecture.

VLANs:

- Host management: Untagged/Native
- Container management: 10
- Tunnels: 30
- Storage: 20

Networks:

- Host management: 10.240.0.0/22
- Container management: 172.29.236.0/22
- Tunnel: 172.29.240.0/22
- Storage: 172.29.244.0/22
Addresses:

- Host management: 10.240.0.11
- Host management gateway: 10.240.0.1
- DNS servers: 69.20.0.164, 69.20.0.196
- Container management: 172.29.236.11
- Tunnel: 172.29.240.11
- Storage: 172.29.244.11
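All of the example networks use a /22 prefix, which corresponds to the 255.255.252.0 netmask that appears in the interface configuration. The conversion can be checked with plain shell arithmetic:

```shell
# Convert a CIDR prefix length to a dotted-quad netmask (pure shell arithmetic).
prefix=22
mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
printf '%d.%d.%d.%d\n' \
    $(( (mask >> 24) & 255 )) $(( (mask >> 16) & 255 )) \
    $(( (mask >> 8) & 255 ))  $((  mask        & 255 ))
# prints 255.255.252.0
```

A /22 network such as 172.29.236.0/22 spans four /24 blocks (172.29.236.0 through 172.29.239.255), which is why the container, tunnel, and storage networks are spaced four apart in the third octet.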
Figure 4.2. Target hosts for infrastructure, networking, and storage services
Figure 4.3. Target hosts for Compute service

Contents of the /etc/network/interfaces file:

# Physical interface 1
auto eth0
iface eth0 inet manual
    bond-master bond0
    bond-primary eth0

# Physical interface 2
auto eth1
iface eth1 inet manual
    bond-master bond1
    bond-primary eth1

# Physical interface 3
auto eth2
iface eth2 inet manual
    bond-master bond0

# Physical interface 4
auto eth3
iface eth3 inet manual
    bond-master bond1
# Bond interface 0 (physical interfaces 1 and 3)
auto bond0
iface bond0 inet static
    bond-slaves none
    bond-mode active-backup
    bond-miimon 100
    bond-downdelay 200
    bond-updelay 200
    address 10.240.0.11
    netmask 255.255.252.0
    gateway 10.240.0.1
    dns-nameservers 69.20.0.164 69.20.0.196

# Bond interface 1 (physical interfaces 2 and 4)
auto bond1
iface bond1 inet manual
    bond-slaves none
    bond-mode active-backup
    bond-miimon 100
    bond-downdelay 250
    bond-updelay 250

# Container management VLAN interface
iface bond0.10 inet manual
    vlan-raw-device bond0

# OpenStack Networking VXLAN (tunnel/overlay) VLAN interface
iface bond1.30 inet manual
    vlan-raw-device bond1

# Storage network VLAN interface (optional)
iface bond0.20 inet manual
    vlan-raw-device bond0

# Container management bridge
auto br-mgmt
iface br-mgmt inet static
    bridge_stp off
    bridge_waitport 0
    bridge_fd 0
    # Bridge port references tagged interface
    bridge_ports bond0.10
    address 172.29.236.11
    netmask 255.255.252.0
    dns-nameservers 69.20.0.164 69.20.0.196

# OpenStack Networking VXLAN (tunnel/overlay) bridge
auto br-vxlan
iface br-vxlan inet static
    bridge_stp off
    bridge_waitport 0
    bridge_fd 0
    # Bridge port references tagged interface
    bridge_ports bond1.30
    address 172.29.240.11
    netmask 255.255.252.0

# OpenStack Networking VLAN bridge
auto br-vlan
iface br-vlan inet manual
    bridge_stp off
    bridge_waitport 0
    bridge_fd 0
    # Bridge port references untagged interface
    bridge_ports bond1

# Storage bridge (optional)
auto br-storage
iface br-storage inet static
    bridge_stp off
    bridge_waitport 0
    bridge_fd 0
    # Bridge port references tagged interface
    bridge_ports bond0.20
    address 172.29.244.11
    netmask 255.255.252.0

In non-Rackspace data centers, the service network configuration should be commented out of /etc/rpc_deploy/rpc_user_config.yml. The commented-out section should appear as follows:

# Cidr used in the Service network
# snet: 172.29.248.0/22

#- network:
#  group_binds:
#    - glance_api
#    - nova_compute
#    - neutron_linuxbridge_agent
#  type: "raw"
#  container_bridge: "br-snet"
#  container_interface: "eth3"
#  ip_from_q: "snet"
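After the interfaces file is in place and the host has been rebooted (or the interfaces brought up with ifup), the bonds and bridges can be inspected with standard iproute2 and sysfs tools. The interface names below follow the reference architecture; the `|| true` guards keep the sketch harmless on hosts where the interfaces do not exist yet:

```shell
# List bridge devices in brief form (expect br-mgmt, br-vxlan, br-vlan,
# and optionally br-storage on a configured target host).
ip -br link show type bridge || true

# Confirm the management bridge carries its static address
# (172.29.236.11/22 in the reference architecture).
ip -br addr show br-mgmt 2>/dev/null || true

# Inspect the bonding mode and the currently active slave for bond0.
cat /proc/net/bonding/bond0 2>/dev/null || echo "bond0 not configured yet"
```

If br-mgmt is missing its address, check that the 8021q and bonding modules loaded (lsmod) and that the VLAN subinterface named in bridge_ports actually exists.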
5. Deployment configuration

Figure 5.1. Installation workflow

Ansible references a handful of files containing mandatory and optional configuration directives. These files must be modified to define the target environment before running the Ansible playbooks. Perform the following tasks:

- Configure target host networking to define bridge interfaces and networks.
- Configure a list of target hosts on which to install the software.
- Configure virtual and physical network relationships for OpenStack Networking (neutron).
- Configure passwords for all services.
- (Optional) Configure the hypervisor.
- (Optional) Configure Block Storage (cinder) to use NetApp backends.
- (Optional) Create Block Storage availability zones.

5.1. Prerequisites

Copy the contents of the /opt/openstack-ansible/etc/rpc_deploy directory to the /etc/rpc_deploy directory:

# cp -R /opt/openstack-ansible/etc/rpc_deploy /etc

5.2. Configuring target host networking

Modify the /etc/rpc_deploy/rpc_user_config.yml file to configure networking.

1. Configure the IP address ranges associated with each network in the cidr_networks section:
   cidr_networks:
     # Container management network
     container: CONTAINER_MGMT_CIDR
     # Tunnel network
     tunnel: TUNNEL_CIDR
     # Storage network (optional)
     storage: STORAGE_CIDR

   Replace *_CIDR with the appropriate IP address range in CIDR notation. For example, 203.0.113.0/24.

   Note: Use the same IP address ranges as the underlying physical network interfaces or bridges configured in Section 4.5, "Configuring the network". For example, if the container network uses 203.0.113.0/24, CONTAINER_MGMT_CIDR should also use 203.0.113.0/24.

   The default configuration includes the optional storage and service networks. To remove one or both of them, comment out the appropriate network name.

2. Configure the existing IP addresses in the used_ips section:

   used_ips:
     - EXISTING_IP_ADDRESSES

   Replace EXISTING_IP_ADDRESSES with a list of existing IP addresses in the ranges defined in the previous step. This list should include all IP addresses manually configured on target hosts in Section 4.5, "Configuring the network", internal load balancers, the service network bridge, and any other devices, to avoid conflicts during the automatic IP address generation process.

   Note: Add individual IP addresses on separate lines. For example, to prevent use of 203.0.113.101 and 203.0.113.201:

   used_ips:
     - 203.0.113.101
     - 203.0.113.201

   Add a range of IP addresses using a comma. For example, to prevent use of 203.0.113.101 through 203.0.113.201:

   used_ips:
     - 203.0.113.101, 203.0.113.201

3. Configure load balancing in the global_overrides section:
   global_overrides:
     # Internal load balancer VIP address
     internal_lb_vip_address: INTERNAL_LB_VIP_ADDRESS
     # External (DMZ) load balancer VIP address
     external_lb_vip_address: EXTERNAL_LB_VIP_ADDRESS
     # Load balancer hostname
     lb_name: LB_HOSTNAME
     # Container network bridge device
     management_bridge: "MGMT_BRIDGE"
     # Tunnel network bridge device
     tunnel_bridge: "TUNNEL_BRIDGE"

   Replace INTERNAL_LB_VIP_ADDRESS with the internal IP address of the load balancer. Infrastructure and OpenStack services use this IP address for internal communication.

   Replace EXTERNAL_LB_VIP_ADDRESS with the external, public, or DMZ IP address of the load balancer. Users primarily use this IP address for external API and web interface access.

   Replace LB_HOSTNAME with the hostname of the load balancer that resolves to the external, public, or DMZ IP address of the load balancer.

   Replace MGMT_BRIDGE with the container bridge device name, typically br-mgmt.

   Replace TUNNEL_BRIDGE with the tunnel/overlay bridge device name, typically br-vxlan.

4. Configure optional networks in the provider_networks subsection:

   provider_networks:
     - network:
         group_binds:
           - glance_api
           - cinder_api
           - cinder_volume
           - nova_compute
         type: "raw"
         container_bridge: "br-storage"
         container_interface: "eth2"
         ip_from_q: "storage"

   Note: The default configuration includes the optional storage and service networks. To remove one or both of them, comment out the entire associated stanza beginning with the - network: line.

5. Configure the OpenStack Networking tunnel/overlay network in the provider_networks subsection:
   provider_networks:
     - network:
         group_binds:
           - neutron_linuxbridge_agent
         container_bridge: "br-vxlan"
         container_interface: "eth10"
         ip_from_q: "tunnel"
         type: "vxlan"
         range: "TUNNEL_ID_RANGE"
         net_name: "vxlan"

   Replace TUNNEL_ID_RANGE with the tunnel ID range. For example, 1:1000.

6. Configure OpenStack Networking provider networks in the provider_networks subsection:

   provider_networks:
     - network:
         group_binds:
           - neutron_linuxbridge_agent
         container_bridge: "br-vlan"
         container_interface: "eth11"
         type: "flat"
         net_name: "vlan"
     - network:
         group_binds:
           - neutron_linuxbridge_agent
         container_bridge: "br-vlan"
         container_interface: "eth11"
         type: "vlan"
         range: VLAN_ID_RANGE
         net_name: "vlan"

   Replace VLAN_ID_RANGE with the VLAN ID range for each VLAN provider network. For example, 1:1000. Create a similar stanza for each additional provider network.

5.3. Configuring target hosts

Modify the /etc/rpc_deploy/rpc_user_config.yml file to configure the target hosts.

Warning: Do not assign the same IP address to different target hostnames. Unexpected results may occur. Each IP address and hostname must be a matching pair. To use the same host in multiple roles, for example infrastructure and networking, specify the same hostname and IP address in each section.

Use short hostnames rather than fully qualified domain names (FQDNs) to prevent length-limitation issues with LXC and SSH. For example, a suitable short hostname for a compute host might be 123456-Compute001.

1. Configure a list containing at least three infrastructure target hosts in the infra_hosts section:
infra_hosts:
  603975-infra01:
    ip: INFRA01_IP_ADDRESS
  603989-infra02:
    ip: INFRA02_IP_ADDRESS
  627116-infra03:
    ip: INFRA03_IP_ADDRESS
  628771-infra04:
    ...

Replace *_IP_ADDRESS with the IP address of the br-mgmt container management bridge on each infrastructure target host. Use the same netblock as bond0 on the nodes, for example:

infra_hosts:
  603975-infra01:
    ip: 10.240.0.80
  603989-infra02:
    ip: 10.240.0.81
  627116-infra03:
    ip: 10.240.0.184

2. (Optional) Add container affinity to infrastructure hosts. By default, the installation process deploys one instance of each container type per host. Affinity enables deployment of different quantities of each container type per host. For example, to deploy two Memcached containers on an infrastructure host:

infra_hosts:
  603975-infra01:
    affinity:
      memcached_container: 2

Affinity also supports deploying zero instances of a container type per host. For example, to deploy zero Galera containers on an infrastructure host:

infra_hosts:
  603975-infra01:
    affinity:
      galera_container: 0

For more information, see the list of container types that support affinity [32].

3. Configure a list containing at least one network target host in the network_hosts section:

network_hosts:
  602117-network01:
    ip: NETWORK01_IP_ADDRESS
  602534-network02:
    ...

Replace *_IP_ADDRESS with the IP address of the br-mgmt container management bridge on each network target host.

4. Configure a list containing at least one compute target host in the compute_hosts section:
compute_hosts:
  900089-compute001:
    ip: COMPUTE001_IP_ADDRESS
  900090-compute002:
    ...

Replace *_IP_ADDRESS with the IP address of the br-mgmt container management bridge on each compute target host.

5. Configure a list containing at least one logging target host in the log_hosts section:

log_hosts:
  900088-logging01:
    ip: LOGGER1_IP_ADDRESS
  903877-logging02:
    ...

Replace *_IP_ADDRESS with the IP address of the br-mgmt container management bridge on each logging target host.

6. Configure a list containing at least one optional storage host in the storage_hosts section:

storage_hosts:
  100338-storage01:
    ip: STORAGE01_IP_ADDRESS
  100392-storage02:
    ...

Replace *_IP_ADDRESS with the IP address of the br-mgmt container management bridge on each storage target host. Each storage host also requires additional configuration to define the backend driver.

Note
The default configuration includes an optional storage host. To install without storage hosts, comment out the stanza beginning with the storage_hosts: line.

The following container types support affinity in typical deployments:

cinder_api_container
galera_container
glance_container
heat_apis_container
heat_engine_container
horizon_container
memcached_container
nova_api_ec2_container
nova_api_metadata_container
nova_api_os_compute_container
nova_cert_container
nova_conductor_container
nova_scheduler_container
nova_spice_console_container

5.4. Configuring service passwords

Change the default password for all services in the /etc/rpc_deploy/user_variables.yml file. Consider using Ansible Vault to increase security by encrypting this file. Note that the following options configure passwords for the web interfaces:

keystone_auth_admin_password configures the admin tenant password for both the OpenStack API and dashboard access.

kibana_password configures the password for Kibana web interface access.

The openstack-ansible repository provides a script to generate random passwords for each service. For example:

# cd /opt/openstack-ansible/scripts
# pw-token-gen.py --file /etc/rpc_deploy/user_variables.yml

To regenerate existing passwords, add the --regen flag.

5.5. Configuring the hypervisor (optional)

By default, the KVM hypervisor is used. If you are deploying to a host that does not support KVM hardware acceleration extensions, select a suitable hypervisor type such as qemu or lxc. To change the hypervisor type, uncomment and edit the following line in the /etc/rpc_deploy/user_variables.yml file:

# nova_virt_type: kvm

5.6. Configuring the Image Service (optional)

In an all-in-one deployment with a single infrastructure node, the Image Service uses the local file system on the target host to store images. In a Rackspace deployment with three infrastructure nodes, the Image Service must use Cloud Files or NetApp. The following procedure describes how to modify the /etc/rpc_deploy/user_variables.yml file to enable Cloud Files usage.

1. Change the default store to use swift, the underlying architecture of Cloud Files:

glance_default_store: swift
2. Set the appropriate authentication URL:

For US Rackspace cloud accounts:

rackspace_cloud_auth_url: https://identity.api.rackspacecloud.com/v2.0

For UK Rackspace cloud accounts:

rackspace_cloud_auth_url: https://lon.identity.api.rackspacecloud.com/v2.0

3. Set Rackspace cloud account credentials by locating the RAX_CLOUD_TENANT_ID:

a. Log in to mycloud.rackspace.com as the desired user.
b. Locate and click the Account: $USERNAME link in the upper right section of the screen.
c. Copy the Account Number shown.

4. Set the remaining Rackspace cloud account credentials with the RAX_CLOUD_TENANT_ID:

rackspace_cloud_tenant_id: RAX_CLOUD_TENANT_ID
rackspace_cloud_username: RAX_CLOUD_USER_NAME
rackspace_cloud_password: RAX_CLOUD_PASSWORD

5. Change glance_swift_store_endpoint_type from the default internalurl setting to publicurl if needed. Glance services are typically backed by Rackspace Cloud Files within a Rackspace data center. If the OpenStack environment runs outside the data center, adjust the key value:

glance_swift_store_endpoint_type: publicurl

6. Replace RAX_CLOUD_* with the appropriate Rackspace cloud account credential components.

7. Define the store name:

glance_swift_store_container: STORE_NAME

Replace STORE_NAME with the store name in Cloud Files. If the store does not exist, a new store is created.

8. Define the store region:

glance_swift_store_region: STORE_REGION

Replace STORE_REGION with one of the following region codes: DFW, HKG, IAD, LON, ORD, SYD.

Note
UK Rackspace cloud accounts must use the LON region. US Rackspace cloud accounts can use any region except LON.
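Putting steps 1-8 together, a completed Cloud Files configuration in /etc/rpc_deploy/user_variables.yml might resemble the following sketch. The tenant ID, username, password, and store name are hypothetical placeholders, not defaults:

```yaml
# Hypothetical example: Image Service backed by Rackspace Cloud Files.
glance_default_store: swift
rackspace_cloud_auth_url: https://identity.api.rackspacecloud.com/v2.0
rackspace_cloud_tenant_id: "123456"          # account number from mycloud.rackspace.com
rackspace_cloud_username: "exampleuser"      # hypothetical credential
rackspace_cloud_password: "examplepassword"  # hypothetical credential
glance_swift_store_endpoint_type: internalurl
glance_swift_store_container: glance_images  # hypothetical store name
glance_swift_store_region: IAD
```

Because the IAD region is used here, this sketch assumes a US Rackspace cloud account.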
5.7. Configuring the Block Storage service (optional)

By default, the Block Storage service uses the LVM backend. To use a NetApp storage appliance backend, edit the /etc/rpc_deploy/rpc_user_config.yml file and configure each storage node that will use it:

Note
Ensure that the NAS Team enables httpd.admin.access.

1. Add the netapp stanza under the cinder_backends stanza for each storage node:

cinder_backends:
  netapp:

The options in subsequent steps fit under the netapp stanza.

Note
The backend name is arbitrary and becomes a volume type within the Block Storage service.

2. Configure the storage family:

netapp_storage_family: STORAGE_FAMILY

Replace STORAGE_FAMILY with ontap_7mode for Data ONTAP operating in 7-mode or ontap_cluster for Data ONTAP operating as a cluster.

3. Configure the storage protocol:

netapp_storage_protocol: STORAGE_PROTOCOL

Replace STORAGE_PROTOCOL with iscsi for iSCSI or nfs for NFS.

For the NFS protocol, you must also specify the location of the configuration file that lists the shares available to the Block Storage service:

nfs_shares_config: SHARE_CONFIG

Replace SHARE_CONFIG with the location of the share configuration file. For example, /etc/cinder/nfs_shares.

4. Configure the server:

netapp_server_hostname: SERVER_HOSTNAME

Replace SERVER_HOSTNAME with the hostnames of both NetApp controllers.

5. Configure the server API port:

netapp_server_port: PORT_NUMBER

Replace PORT_NUMBER with 80 for HTTP or 443 for HTTPS.
6. Configure the server credentials:

netapp_login: USER_NAME
netapp_password: PASSWORD

Replace USER_NAME and PASSWORD with the appropriate values.

7. Select the NetApp driver:

volume_driver: cinder.volume.drivers.netapp.common.NetAppDriver

8. Configure the volume backend name:

volume_backend_name: BACKEND_NAME

Replace BACKEND_NAME with a suitable value that provides a hint for the Block Storage scheduler. For example, NETAPP_iSCSI.

9. Check that the rpc_user_config.yml configuration is accurate:

storage_hosts:
  xxxxxx-infra01:
    ip: 172.29.236.16
    container_vars:
      cinder_backends:
        limit_container_types: cinder_volume
        netapp:
          netapp_storage_family: ontap_7mode
          netapp_storage_protocol: nfs
          netapp_server_hostname: 111.222.333.444
          netapp_server_port: 80
          netapp_login: rpc_cinder
          netapp_password: password
          volume_driver: cinder.volume.drivers.netapp.common.NetAppDriver
          volume_backend_name: NETAPP_NFS

For netapp_server_hostname, specify the IP address of the Data ONTAP server. Set netapp_storage_protocol to iscsi or nfs depending on the configuration. Set netapp_server_port to 80 for HTTP or 443 for HTTPS.

The cinder-volume.yml playbook automatically installs the nfs-common package across the hosts, transitioning from an LVM to a NetApp backend.

5.7.1. Configuring Block Storage Service for multiple NetApp backends (optional)

The Block Storage service supports multiple backends. To use multiple NetApp backends, edit the /etc/rpc_deploy/rpc_user_config.yml file and configure each storage node to use the designated NetApp backend:

1. For each NetApp backend, add a NetApp stanza name under the cinder_backends stanza for the storage node. For example:

storage_hosts:
  xxxxxx-infra01:
    ip: 172.29.236.16
    container_vars:
      cinder_backends:
        limit_container_types: cinder_volume
        netapp1:
          ...
  xxxxxx-infra02:
    ip: 172.29.236.17
    container_vars:
      cinder_backends:
        limit_container_types: cinder_volume
        netapp2:
          ...

2. For each NetApp backend, set the options as described in the Section 5.7, Configuring the Block Storage service (optional) [35] procedure, steps 2-9.

3. Check that the rpc_user_config.yml configuration is accurate, for example:

storage_hosts:
  xxxxxx-infra01:
    ip: 172.29.236.16
    container_vars:
      cinder_backends:
        limit_container_types: cinder_volume
        netapp1:
          netapp_storage_family: ontap_7mode
          netapp_storage_protocol: nfs
          netapp_server_hostname: 111.222.333.444
          netapp_server_port: 80
          netapp_login: rpc_cinder
          netapp_password: password
          volume_driver: cinder.volume.drivers.netapp.common.NetAppDriver
          volume_backend_name: NETAPP_NFS
  xxxxxx-infra02:
    ip: 172.29.236.17
    container_vars:
      cinder_backends:
        limit_container_types: cinder_volume
        netapp2:
          netapp_storage_family: ontap_7mode
          netapp_storage_protocol: nfs
          netapp_server_hostname: 111.222.333.555
          netapp_server_port: 80
          netapp_login: rpc_cinder
          netapp_password: password
          volume_driver: cinder.volume.drivers.netapp.common.NetAppDriver
          volume_backend_name: NETAPP_NFS

5.8. Configure the Block Storage Service with NFS protocols (optional)

If the NetApp backend is configured to use an NFS storage protocol, edit the /etc/rpc_deploy/rpc_user_config.yml file, and configure the NFS client on each storage node that will use it.
1. Add the nfs_client stanza under the container_vars stanza for each storage node:

container_vars:
  nfs_client:

2. Configure the location of the file that lists the shares available to the Block Storage service. This configuration file must include nfs_shares_config:

nfs_shares_config: SHARE_CONFIG

Replace SHARE_CONFIG with the location of the share configuration file. For example, /etc/cinder/nfs_shares.

3. Configure one or more NFS shares:

shares:
  - { ip: NFS_HOST, share: NFS_SHARE }

Replace NFS_HOST with the IP address or hostname of the NFS server, and NFS_SHARE with the absolute path to an existing and accessible NFS share.

5.9. Creating Block Storage availability zones (optional)

Multiple availability zones can be created to manage Block Storage hosts. Edit the /etc/rpc_deploy/rpc_user_config.yml file to set up availability zones.

1. For each cinder storage host, configure the availability zone under the container_vars stanza:

cinder_storage_availability_zone: CINDERAZ

Replace CINDERAZ with a suitable name. For example, cinderaz_2.

2. If more than one availability zone is created, configure the default availability zone for scheduling volume creation:

cinder_default_availability_zone: CINDERAZ_DEFAULT

Replace CINDERAZ_DEFAULT with a suitable name. For example, cinderaz_1. The default availability zone should be the same for all Cinder storage hosts.

Note
If cinder_default_availability_zone is not defined, the default value of the variable is used.
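Sections 5.7 through 5.9 can all apply to the same storage host. The following sketch combines a NetApp NFS backend, the NFS client configuration, and an availability zone in one storage_hosts entry; the host name, addresses, credentials, and zone name are hypothetical values for illustration only:

```yaml
# Hypothetical example combining Sections 5.7-5.9 for one storage host.
storage_hosts:
  xxxxxx-storage01:
    ip: 172.29.236.16
    container_vars:
      # Section 5.9: availability zone for this host
      cinder_storage_availability_zone: cinderaz_1
      cinder_default_availability_zone: cinderaz_1
      # Section 5.8: NFS client configuration
      nfs_client:
        nfs_shares_config: /etc/cinder/nfs_shares
        shares:
          - { ip: 172.29.244.50, share: /vol/cinder }
      # Section 5.7: NetApp backend using the NFS protocol
      cinder_backends:
        limit_container_types: cinder_volume
        netapp:
          netapp_storage_family: ontap_7mode
          netapp_storage_protocol: nfs
          nfs_shares_config: /etc/cinder/nfs_shares
          netapp_server_hostname: 172.29.244.50
          netapp_server_port: 443
          netapp_login: rpc_cinder
          netapp_password: examplepassword
          volume_driver: cinder.volume.drivers.netapp.common.NetAppDriver
          volume_backend_name: NETAPP_NFS
```

Port 443 here implies HTTPS access to the Data ONTAP API; use 80 for HTTP instead.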
6. Foundation playbooks

Figure 6.1. Installation workflow

The main Ansible foundation playbook prepares the target hosts for infrastructure and OpenStack services and performs the following operations:

Perform deployment host initial setup
Build containers on target hosts
Restart containers on target hosts
Install common components into containers on target hosts

6.1. Running the foundation playbook

1. Change to the /opt/openstack-ansible/rpc_deployment directory.

2. Run the host setup playbook, which runs a series of sub-playbooks:

$ ansible-playbook -e @/etc/rpc_deploy/user_variables.yml \
    playbooks/setup/host-setup.yml

Confirm satisfactory completion with zero items unreachable or failed:

PLAY RECAP ********************************************************************
...
deployment_host : ok=18 changed=11 unreachable=0 failed=0

6.2. Troubleshooting

Q: How do I resolve the following error after running a playbook?

failed: [target_host] => (item=target_host_horizon_container-69099e06) =>
{"err": "lxc-attach: No such file or directory - failed to open
'/proc/12440/ns/mnt'\nlxc-attach: failed to enter the namespace\n",
"failed": true, "item": "target_host_horizon_container-69099e06", "rc": 1}
msg: Failed executing lxc-attach.

A: The lxc-attach command sometimes fails to execute properly. To resolve this issue, run the playbook again.
7. Infrastructure playbooks

Figure 7.1. Installation workflow

The main Ansible infrastructure playbook installs infrastructure services and performs the following operations:

Install Memcached
Install Galera
Install RabbitMQ
Install Rsyslog
Install Elasticsearch
Install Logstash
Install Kibana
Install Elasticsearch command-line utilities
Configure Rsyslog

7.1. Running the infrastructure playbook

1. Change to the /opt/openstack-ansible/rpc_deployment directory.

2. Run the infrastructure setup playbook, which runs a series of sub-playbooks:

$ ansible-playbook -e @/etc/rpc_deploy/user_variables.yml \
    playbooks/infrastructure/infrastructure-setup.yml

Confirm satisfactory completion with zero items unreachable or failed:

PLAY RECAP ********************************************************************
...
deployment_host : ok=27 changed=0 unreachable=0 failed=0

7.2. Verifying infrastructure operation

Verify the database cluster and Kibana web interface operation.

Procedure 7.1. Verifying the database cluster

1. Determine the Galera container name:

$ lxc-ls | grep galera
infra1_galera_container-4ed0d84a

2. Access the Galera container:

$ lxc-attach -n infra1_galera_container-4ed0d84a

3. Run the MariaDB client, show cluster status, and exit the client:

$ mysql -u root -p
MariaDB> show status like 'wsrep_cluster%';
+--------------------------+--------------------------------------+
| Variable_name            | Value                                |
+--------------------------+--------------------------------------+
| wsrep_cluster_conf_id    | 3                                    |
| wsrep_cluster_size       | 3                                    |
| wsrep_cluster_state_uuid | bbe3f0f6-3a88-11e4-bd8f-f7c9e138dd07 |
| wsrep_cluster_status     | Primary                              |
+--------------------------+--------------------------------------+
MariaDB> exit

The wsrep_cluster_size field should indicate the number of nodes in the cluster and the wsrep_cluster_status field should indicate Primary.

Procedure 7.2. Verifying the Kibana web interface

1. With a web browser, access the Kibana web interface using the external load balancer IP address defined by the external_lb_vip_address option in the /etc/rpc_deploy/rpc_user_config.yml file. The Kibana web interface uses HTTPS on port 8443.

2. Authenticate using the username kibana and the password defined by the kibana_password option in the /etc/rpc_deploy/user_variables.yml file.
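The Kibana username is fixed, but the password comes from the deployment configuration described in Section 5.4. A sketch of the relevant /etc/rpc_deploy/user_variables.yml entry, with a hypothetical value:

```yaml
# Hypothetical value; generate real passwords with pw-token-gen.py.
kibana_password: "0ee1b26a9d4bc8b4e0e1e4f1b26a9d4b"
```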
8. OpenStack playbooks

Figure 8.1. Installation workflow

The main Ansible OpenStack playbook installs OpenStack services and performs the following operations:

Install common components
Create a utility container that provides utilities to interact with services in other containers
Install Identity (keystone)
Generate service IDs for all services
Install the Image Service (glance)
Install Orchestration (heat)
Install Compute (nova)
Install Networking (neutron)
Install Block Storage (cinder)
Install Dashboard (horizon)
Reconfigure Rsyslog

8.1. Utility Container Overview

The utility container provides a space where miscellaneous tools and other software can be installed. Tools and objects can be placed in a utility container if they do not require a dedicated container or if it is impractical to create a new container for a single tool or object. Utility containers can also be used when tools cannot be installed directly onto a host. For example, the tempest playbooks are installed in the utility container because tempest testing does not need a container of its own. For another example of using the utility container, see Section 8.3, Verifying OpenStack operation [45].
8.2. Running the OpenStack playbook

1. Change to the /opt/openstack-ansible/rpc_deployment directory.

2. Run the OpenStack setup playbook, which runs a series of sub-playbooks:

$ ansible-playbook -e @/etc/rpc_deploy/user_variables.yml \
    playbooks/openstack/openstack-setup.yml

Note
The openstack-common.yml sub-playbook builds all OpenStack services from source and takes up to 30 minutes to complete. As the playbook progresses, the quantity of containers in the "polling" state will approach zero. If any operations take longer than 30 minutes to complete, the playbook will terminate with an error.

changed: [target_host_glance_container-f2ebdc06]
changed: [target_host_heat_engine_container-36022446]
changed: [target_host_neutron_agents_container-08ec00cd]
changed: [target_host_heat_apis_container-4e170279]
changed: [target_host_keystone_container-c6501516]
changed: [target_host_neutron_server_container-94d370e5]
changed: [target_host_nova_api_metadata_container-600fe8b3]
changed: [target_host_nova_compute_container-7af962fe]
changed: [target_host_cinder_api_container-df5d5929]
changed: [target_host_cinder_volumes_container-ed58e14c]
changed: [target_host_horizon_container-e68b4f66]
<job 802849856578.7262> finished on target_host_heat_engine_container-36022446
<job 802849856578.7739> finished on target_host_keystone_container-c6501516
<job 802849856578.7262> finished on target_host_heat_apis_container-4e170279
<job 802849856578.7359> finished on target_host_cinder_api_container-df5d5929
<job 802849856578.7386> finished on target_host_cinder_volumes_container-ed58e14c
<job 802849856578.7886> finished on target_host_horizon_container-e68b4f66
<job 802849856578.7582> finished on target_host_nova_compute_container-7af962fe
<job 802849856578.7604> finished on target_host_neutron_agents_container-08ec00cd
<job 802849856578.7459> finished on target_host_neutron_server_container-94d370e5
<job 802849856578.7327> finished on target_host_nova_api_metadata_container-600fe8b3
<job 802849856578.7363> finished on target_host_glance_container-f2ebdc06
<job 802849856578.7339> polling, 1675s remaining
<job 802849856578.7338> polling, 1675s remaining
<job 802849856578.7322> polling, 1675s remaining
<job 802849856578.7319> polling, 1675s remaining
Note
Setting up the compute hosts takes up to 30 minutes to complete, particularly in environments with many compute hosts. As the playbook progresses, the quantity of containers in the "polling" state will approach zero. If any operations take longer than 30 minutes to complete, the playbook will terminate with an error.

ok: [target_host_nova_conductor_container-2b495dc4]
ok: [target_host_nova_api_metadata_container-600fe8b3]
ok: [target_host_nova_api_ec2_container-6c928c30]
ok: [target_host_nova_scheduler_container-c3febca2]
ok: [target_host_nova_api_os_compute_container-9fa0472b]
<job 409029926086.9909> finished on target_host_nova_api_os_compute_container-9fa0472b
<job 409029926086.9890> finished on target_host_nova_api_ec2_container-6c928c30
<job 409029926086.9910> finished on target_host_nova_conductor_container-2b495dc4
<job 409029926086.9882> finished on target_host_nova_scheduler_container-c3febca2
<job 409029926086.9898> finished on target_host_nova_api_metadata_container-600fe8b3
<job 409029926086.8330> polling, 1775s remaining

Confirm satisfactory completion with zero items unreachable or failed:

PLAY RECAP **********************************************************************
...
deployment_host : ok=44 changed=11 unreachable=0 failed=0

8.3. Verifying OpenStack operation

Verify basic operation of the OpenStack API and dashboard.

Procedure 8.1. Verifying the API

The utility container provides a CLI environment for additional configuration and testing.

1. Determine the utility container name:

$ lxc-ls | grep utility
infra1_utility_container-161a4084

2. Access the utility container:

$ lxc-attach -n infra1_utility_container-161a4084

3. Source the admin tenant credentials:

$ source openrc

4. Run an OpenStack command that uses one or more APIs. For example:

$ keystone user-list
+----------------------------------+----------+---------+-------+
|                id                |   name   | enabled | email |
+----------------------------------+----------+---------+-------+
| 090c1023d0184a6e8a70e26a5722710d |  admin   |   True  |       |
| 239e04cd3f7d49929c7ead506d118e40 |  cinder  |   True  |       |
| e1543f70e56041679c013612bccfd4ee | cinderv2 |   True  |       |
| bdd2df09640e47888f819057c8e80f04 |   demo   |   True  |       |
| 453dc7932df64cc58e36bf0ac4f64d14 |   ec2    |   True  |       |
| 257da50c5cfb4b7c9ca8334bc096f344 |  glance  |   True  |       |
| 6e0bc047206f4f5585f7b700a8ed6e94 |   heat   |   True  |       |
| 187ee2e32eec4293a3fa243fa21f6dd9 | keystone |   True  |       |
| dddaca4b39194dc4bcefd0bae542c60a | neutron  |   True  |       |
| f1c232f9d53c4adabb54101ccefaefce |   nova   |   True  |       |
| fdfbda23668c4980990708c697384050 |  novav3  |   True  |       |
| 744069c771d84f1891314388c1f23686 |    s3    |   True  |       |
| 4e7fdfda8d14477f902eefc8731a7fdb |  swift   |   True  |       |
+----------------------------------+----------+---------+-------+

Procedure 8.2. Verifying the dashboard

1. With a web browser, access the dashboard using the external load balancer IP address defined by the external_lb_vip_address option in the /etc/rpc_deploy/rpc_user_config.yml file. The dashboard uses HTTPS on port 443.

2. Authenticate using the username admin and the password defined by the keystone_auth_admin_password option in the /etc/rpc_deploy/user_variables.yml file.
9. Rackspace Private Cloud monitoring

The Rackspace Cloud Monitoring Service allows Rackspace Private Cloud (RPC) customers to monitor system performance and safeguard critical data.

9.1. Service and response

When a threshold is reached or functionality fails, the Rackspace Cloud Monitoring Service generates an alert, which creates a ticket in the Rackspace ticketing system. This ticket moves into the RPC support queue. Tickets flagged as monitoring alerts are given highest priority, and response is delivered according to the Service Level Agreement (SLA). Refer to the SLA for detailed information about incident severity levels and corresponding response times.

Specific monitoring alert guidelines can be set for the installation. These details should be arranged by a Rackspace account manager.

9.2. Hardware monitoring

Hardware monitoring is available only for customers whose clouds are hosted within a Rackspace data center. Customers whose clouds are hosted in their own data centers are responsible for monitoring their own hardware.

For clouds hosted within a Rackspace data center, Rackspace will provision monitoring support for the customer. Rackspace Support assists in handling functionality failure, running system health checks, and managing system capacity. The Rackspace Cloud Monitoring Service will notify Support when a host is down or when hardware fails.

9.3. Software monitoring

For software monitoring, polling time is determined by the maas_check_period setting in /etc/rpc_deploy/user_variables.yml, which defaults to 60 seconds. The Rackspace Private Cloud Monitoring Service has two kinds of checks:

Local: These agent.plugin checks are performed against containers. The checks poll the API and gather lists of metrics. These checks generate a critical alert after three consecutive failures. Local checks are performed on the following services:

Compute (nova)
Block Storage (cinder)
Identity (keystone)
Networking (neutron)
Orchestration (heat)
Image Service (glance): The check connects to the glance registry and tests status by calling an arbitrary URL.
Dashboard (horizon): The check verifies that the login page is available and uses the credentials from openrc-maas to log in.
Galera: The check connects to each member of a Galera cluster and verifies that the members are fully synchronized and active.
RabbitMQ: The check connects to each member of a RabbitMQ cluster and gathers statistics from the API.
Memcached: The check connects to a Memcached server.

Global: These remote.http checks poll the load-balanced public endpoints, such as a public nova API. If a service is marked as administratively down, the check will skip it. These checks generate a critical alert after one failure. Global checks are performed on the following services:

Compute (nova)
Block Storage (cinder)
Identity (keystone)
Networking (neutron)
Image Service (glance)
Orchestration (heat)

9.4. CDM monitoring

The maas_cdm playbook configures CDM monitoring for the following services and generates alerts at the specified thresholds:

CPU idle: < 10%
Memory used: > 95%
Disk space used: > 95%

9.5. Running monitoring playbooks

The monitoring playbooks install hardware and software monitoring tools and consist of the following operations:

Install the monitoring CLI package and Rackspace monitoring agent
Install MaaS checks and alarms for services inside containers
Install MaaS checks and alarms for global service monitoring
Install checks and alarms for Dell hardware running omreport (in Rackspace data centers only)

Procedure 9.1. Running the playbooks

1. On the deployment host, change to the /opt/openstack-ansible/rpc_deployment directory.

2. Run the monitoring setup playbook. This installs the CLI package and monitoring agent on all physical hosts and creates a dedicated keystone user for monitoring.

$ ansible-playbook -e @/etc/rpc_deploy/user_variables.yml \
    playbooks/monitoring/raxmon-setup.yml

3. Install the monitoring checks and alarms for containerized services.

$ ansible-playbook -e @/etc/rpc_deploy/user_variables.yml \
    playbooks/monitoring/maas_local.yml

4. Install the monitoring checks and alarms for global service monitoring.

$ ansible-playbook -e @/etc/rpc_deploy/user_variables.yml \
    playbooks/monitoring/maas_remote.yml

5. Install the monitoring checks and alarms for CDM monitoring.

$ ansible-playbook -e @/etc/rpc_deploy/user_variables.yml \
    playbooks/monitoring/maas_cdm.yml
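The software-monitoring polling interval described in Section 9.3 can be tuned before running these playbooks. A sketch of the relevant /etc/rpc_deploy/user_variables.yml entry, showing the documented 60-second default:

```yaml
# Poll interval for MaaS software checks, in seconds (default shown).
maas_check_period: 60
```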
10. Operations

The following operations apply to environments after initial installation.

10.1. Adding a compute host

Use the following procedure to add a compute host to an operational cluster.

1. Configure the host as a target host. See Chapter 4, Target hosts [17] for more information.

2. Edit the /etc/rpc_deploy/rpc_user_config.yml file and add the host to the compute_hosts stanza.

Note
If necessary, also modify the used_ips stanza.

3. Run the add host playbook:

$ ansible-playbook -e @/etc/rpc_deploy/user_variables.yml \
    playbooks/add_host.yml

4. Run the Compute playbook:

$ ansible-playbook -e @/etc/rpc_deploy/user_variables.yml \
    playbooks/openstack/nova-all.yml

5. Run the Networking playbook:

$ ansible-playbook -e @/etc/rpc_deploy/user_variables.yml \
    playbooks/openstack/neutron-all.yml

6. Run the Rsyslog configuration playbook:

$ ansible-playbook -e @/etc/rpc_deploy/user_variables.yml \
    playbooks/infrastructure/rsyslog-config.yml

7. Run the MaaS playbooks:

$ ansible-playbook -e @/etc/rpc_deploy/user_variables.yml \
    playbooks/monitoring/raxmon-all.yml
$ ansible-playbook -e @/etc/rpc_deploy/user_variables.yml \
    playbooks/monitoring/maas_local.yml
$ ansible-playbook -e @/etc/rpc_deploy/user_variables.yml \
    playbooks/monitoring/maas_remote.yml
$ ansible-playbook -e @/etc/rpc_deploy/user_variables.yml \
    playbooks/monitoring/maas_cdm.yml

# If the cloud is using OpenStack Object Storage:
$ ansible-playbook -e @/etc/rpc_deploy/user_variables.yml \
    playbooks/monitoring/swift_maas.yml

10.2. Galera cluster maintenance

Routine maintenance includes gracefully adding or removing nodes from the cluster without impacting operation, and starting a cluster after gracefully shutting down all nodes.

10.2.1. Removing nodes

In the following example, all but one node was shut down gracefully:

$ ansible galera_container -m shell -a "mysql -h localhost \
  -e 'show status like \"%wsrep_cluster_%\";'"
node3_galera_container-3ea2cbd3 | FAILED | rc=1 >>
ERROR 2002 (HY000): Can't connect to local MySQL server
through socket '/var/run/mysqld/mysqld.sock' (2)

node2_galera_container-49a47d25 | FAILED | rc=1 >>
ERROR 2002 (HY000): Can't connect to local MySQL server
through socket '/var/run/mysqld/mysqld.sock' (2)

node4_galera_container-76275635 | success | rc=0 >>
Variable_name             Value
wsrep_cluster_conf_id     7
wsrep_cluster_size        1
wsrep_cluster_state_uuid  338b06b0-2948-11e4-9d06-bef42f6c52f1
wsrep_cluster_status      Primary

Compare this example output with the output from the multi-node failure scenario, where the remaining operational node is non-primary and stops processing SQL requests. Gracefully shutting down the MariaDB service on all but one node allows the remaining operational node to continue processing SQL requests. When gracefully shutting down multiple nodes, perform the actions sequentially to retain operation.

10.2.2. Starting a cluster

Gracefully shutting down all nodes destroys the cluster. Starting or restarting a cluster from zero nodes requires creating a new cluster on one of the nodes.

1. Start the new cluster on the most advanced node.
Run the following command to check the seqno value in the grastate.dat file on all of the nodes:

$ ansible galera_container -m shell -a "cat /var/lib/mysql/grastate.dat"
node2_galera_container-49a47d25 | success | rc=0 >>
# GALERA saved state
version: 2.1
uuid: 338b06b0-2948-11e4-9d06-bef42f6c52f1
seqno: 31
cert_index:
node3_galera_container-3ea2cbd3 | success | rc=0 >>
# GALERA saved state
version: 2.1
uuid: 338b06b0-2948-11e4-9d06-bef42f6c52f1
seqno: 31
cert_index:

node4_galera_container-76275635 | success | rc=0 >>
# GALERA saved state
version: 2.1
uuid: 338b06b0-2948-11e4-9d06-bef42f6c52f1
seqno: 31
cert_index:

In this example, all nodes in the cluster contain the same positive seqno values because they were synchronized just prior to graceful shutdown. If all seqno values are equal, any node can start the new cluster.

$ /etc/init.d/mysql start --wsrep-new-cluster

This command results in a cluster containing a single node. The wsrep_cluster_size value shows the number of nodes in the cluster.

node2_galera_container-49a47d25 | FAILED | rc=1 >>
ERROR 2002 (HY000): Can't connect to local MySQL server
through socket '/var/run/mysqld/mysqld.sock' (111)

node3_galera_container-3ea2cbd3 | FAILED | rc=1 >>
ERROR 2002 (HY000): Can't connect to local MySQL server
through socket '/var/run/mysqld/mysqld.sock' (2)

node4_galera_container-76275635 | success | rc=0 >>
Variable_name             Value
wsrep_cluster_conf_id     1
wsrep_cluster_size        1
wsrep_cluster_state_uuid  338b06b0-2948-11e4-9d06-bef42f6c52f1
wsrep_cluster_status      Primary

2. Restart MariaDB on the other nodes and verify that they rejoin the cluster.

node2_galera_container-49a47d25 | success | rc=0 >>
Variable_name             Value
wsrep_cluster_conf_id     3
wsrep_cluster_size        3
wsrep_cluster_state_uuid  338b06b0-2948-11e4-9d06-bef42f6c52f1
wsrep_cluster_status      Primary

node3_galera_container-3ea2cbd3 | success | rc=0 >>
Variable_name             Value
wsrep_cluster_conf_id     3
wsrep_cluster_size        3
wsrep_cluster_state_uuid  338b06b0-2948-11e4-9d06-bef42f6c52f1
wsrep_cluster_status      Primary

node4_galera_container-76275635 | success | rc=0 >>
Variable_name Value wsrep_cluster_conf_id 3 wsrep_cluster_size 3 wsrep_cluster_state_uuid 338b06b0-2948-11e4-9d06-bef42f6c52f1 wsrep_cluster_status Primary 10.3. Galera cluster recovery 10.3.1. Single-node failure If a single node fails, the other nodes maintain quorum and continue to process SQL requests. 1. Run the following Ansible command to determine the failed node: $ ansible galera_container -m shell -a "mysql -h localhost\ -e 'show status like \"%wsrep_cluster_%\";'" node3_galera_container-3ea2cbd3 FAILED rc=1 >> ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (111) node2_galera_container-49a47d25 success rc=0 >> Variable_name Value wsrep_cluster_conf_id 17 wsrep_cluster_size 3 wsrep_cluster_state_uuid 338b06b0-2948-11e4-9d06-bef42f6c52f1 wsrep_cluster_status Primary node4_galera_container-76275635 success rc=0 >> Variable_name Value wsrep_cluster_conf_id 17 wsrep_cluster_size 3 wsrep_cluster_state_uuid 338b06b0-2948-11e4-9d06-bef42f6c52f1 wsrep_cluster_status Primary In this example, node 3 has failed. 2. Restart MariaDB on the failed node and verify that it rejoins the cluster. 3. If MariaDB fails to start, run the mysqld command and perform further analysis on the output. As a last resort, rebuild the container for the node. 10.3.2. Multi-node failure When all but one node fails, the remaining node cannot achieve quorum and stops processing SQL requests. In this situation, failed nodes that recover cannot join the cluster because it no longer exists. 1. Run the following Ansible command to show the failed nodes: $ ansible galera_container -m shell -a "mysql \ -h localhost -e 'show status like \"%wsrep_cluster_%\";'" 53
node2_galera_container-49a47d25 FAILED rc=1 >> ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (111) node3_galera_container-3ea2cbd3 FAILED rc=1 >> ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (111) node4_galera_container-76275635 success rc=0 >> Variable_name Value wsrep_cluster_conf_id 18446744073709551615 wsrep_cluster_size 1 wsrep_cluster_state_uuid 338b06b0-2948-11e4-9d06-bef42f6c52f1 wsrep_cluster_status non-primary In this example, nodes 2 and 3 have failed. The remaining operational server indicates non-primary because it cannot achieve quorum. 2. Run the following command to rebootstrap the operational node into the cluster. $ mysql -e "SET GLOBAL wsrep_provider_options='pc.bootstrap=yes';" node4_galera_container-76275635 success rc=0 >> Variable_name Value wsrep_cluster_conf_id 15 wsrep_cluster_size 1 wsrep_cluster_state_uuid 338b06b0-2948-11e4-9d06-bef42f6c52f1 wsrep_cluster_status Primary node3_galera_container-3ea2cbd3 FAILED rc=1 >> ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (111) node2_galera_container-49a47d25 FAILED rc=1 >> ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (111) The remaining operational node becomes the primary node and begins processing SQL requests. 3. Restart MariaDB on the failed nodes and verify that they rejoin the cluster. 
$ ansible galera_container -m shell -a "mysql \
  -h localhost -e 'show status like \"%wsrep_cluster_%\";'"
node3_galera_container-3ea2cbd3 success rc=0 >>
Variable_name             Value
wsrep_cluster_conf_id     17
wsrep_cluster_size        3
wsrep_cluster_state_uuid  338b06b0-2948-11e4-9d06-bef42f6c52f1
wsrep_cluster_status      Primary

node2_galera_container-49a47d25 success rc=0 >>
Variable_name             Value
wsrep_cluster_conf_id     17
wsrep_cluster_size        3
wsrep_cluster_state_uuid  338b06b0-2948-11e4-9d06-bef42f6c52f1
wsrep_cluster_status      Primary

node4_galera_container-76275635 success rc=0 >>
Variable_name             Value
wsrep_cluster_conf_id     17
wsrep_cluster_size        3
wsrep_cluster_state_uuid  338b06b0-2948-11e4-9d06-bef42f6c52f1
wsrep_cluster_status      Primary

4. If MariaDB fails to start on any of the failed nodes, run the mysqld command and perform further analysis on the output. As a last resort, rebuild the container for the node.

10.3.3. Complete failure

If all of the nodes in a Galera cluster fail (do not shut down gracefully), then the integrity of the database can no longer be guaranteed and the database should be restored from backup. Run the following command to determine if all nodes in the cluster have failed:

$ ansible galera_container -m shell -a "cat /var/lib/mysql/grastate.dat"
node3_galera_container-3ea2cbd3 success rc=0 >>
# GALERA saved state
version: 2.1
uuid:    338b06b0-2948-11e4-9d06-bef42f6c52f1
seqno:   -1
cert_index:

node2_galera_container-49a47d25 success rc=0 >>
# GALERA saved state
version: 2.1
uuid:    338b06b0-2948-11e4-9d06-bef42f6c52f1
seqno:   -1
cert_index:

node4_galera_container-76275635 success rc=0 >>
# GALERA saved state
version: 2.1
uuid:    338b06b0-2948-11e4-9d06-bef42f6c52f1
seqno:   -1
cert_index:

All the nodes have failed if mysqld is not running on any of the nodes and all of the nodes contain a seqno value of -1.

Note
If any single node has a positive seqno value, then that node can be used to restart the cluster. However, because there is no guarantee that each node has an identical copy of the data, it is not recommended to restart the cluster using the --wsrep-new-cluster command on one node.

10.3.4. Restoring from backup after a complete failure

In the event of a complete failure, restore to the latest backup.
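Before wiping the nodes, the seqno inspection described above can be scripted. The following is a minimal sketch, assuming the grastate.dat contents have been collected to local files; the function name and file paths are illustrative, not part of the RPC tooling:

```shell
# Hypothetical helper: succeeds only when every grastate.dat file passed in
# shows the complete-failure signature (a seqno value of -1).
complete_failure() {
    for f in "$@"; do
        seqno=$(awk '$1 == "seqno:" { print $2 }' "$f")
        # Any node with a positive seqno still holds a usable saved state.
        [ "$seqno" = "-1" ] || return 1
    done
    return 0
}
```

Only when every node shows seqno -1 (and mysqld is confirmed stopped everywhere) does a full restore apply; a single node with a positive seqno is instead a candidate for restarting the cluster.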
1. Use the following commands to stop and destroy the Galera containers:

$ ansible galera_container -m shell -a "lxc-stop -n {{ inventory_hostname }}"
$ ansible galera_container -m shell -a "lxc-destroy -n {{ inventory_hostname }}"

2. Run the host setup playbook to rebuild the Galera containers:

$ ansible-playbook -e @/root/rpc_deploy/user_variables.yml \
  playbooks/setup/host-setup.yml -l node1,node2,node3

3. Run the infrastructure playbook to set up Galera within the containers:

$ ansible-playbook -e @/root/rpc_deploy/user_variables.yml \
  playbooks/infrastructure/infrastructure-setup.yml -l \
  node1_galera_container-########,node2_galera_container-########,node3_galera_container-########

4. Stop the MariaDB service within all Galera containers:

$ ansible galera_container -m service -a "name=mysql state=stopped"

5. Remove the entire contents of the MariaDB directories within all Galera containers:

$ ansible galera_container -m file -a "path=/var/lib/mysql/ state=absent"

6. Determine which backup directory contains the latest MariaDB backup:

$ ansible galera_container -m shell -a \
  "ll /openstack/backup/{{ inventory_hostname }}/holland_backups/rpc_support/newest"

7. Copy the latest backup identified in the previous step to each Galera container:

$ ansible galera_container -m copy -a \
  "src=/openstack/backup/{{ inventory_hostname }}/holland_backups/rpc_support/########_######/backup.tar.gz dest=/etc/mysql/backup.tar.gz"

8. Make a target backup directory called xtrabackup_backupfiles:

$ ansible galera_container -m file -a \
  "path=/etc/mysql/xtrabackup_backupfiles state=directory"

9. Unarchive the contents of backup.tar.gz into the target directory:

$ ansible galera_container -m shell -a \
  "zcat /etc/mysql/backup.tar.gz | tar -xif - -C /etc/mysql/xtrabackup_backupfiles"

10. Prepare the backup files with the xtrabackup utility:

$ ansible galera_container -m shell -a \
  "xtrabackup --prepare --target-dir=/etc/mysql/xtrabackup_backupfiles"

11. Copy the backup files with the innobackupex utility:

$ ansible galera_container -m shell -a \
  "innobackupex --copy-back /etc/mysql/xtrabackup_backupfiles/"

12. Start the MariaDB service on node1:

$ ansible galera_container -m service -a \
  "name=mysql state=started args=--wsrep_cluster_address=gcomm://" \
  -l node1_galera_container-########

13. Join the MariaDB service on the node2 Galera container to the Galera cluster:

$ ansible galera_container -m service -a \
  "name=mysql state=started" -l node2_galera_container-########

14. Join the MariaDB service on the node3 Galera container to the Galera cluster:

$ ansible galera_container -m service -a \
  "name=mysql state=started" -l node3_galera_container-########

10.3.5. Rebuilding a container

Recovering from a failure may require rebuilding one or more containers. Follow these instructions to rebuild a container.

1. Disable the failed node on the load balancer.

Note
Do not rely on the load balancer health checks to disable the node. If the node is not disabled, the load balancer sends SQL requests to the node before the node rejoins the cluster. This can cause data inconsistencies.

2. The following commands destroy the container and remove MariaDB data stored outside of the container. In this example, node 3 failed:

$ lxc-stop -n node3_galera_container-3ea2cbd3
$ lxc-destroy -n node3_galera_container-3ea2cbd3
$ rm -rf /openstack/node3_galera_container-3ea2cbd3/*

3. Run the host setup playbook to rebuild the container specifically on node 3:

$ ansible-playbook -e @/root/rpc_deploy/user_variables.yml \
  playbooks/setup/host-setup.yml -l node3 -l node3_galera_container-3ea2cbd3
Note
The playbook also restarts all other containers on the node.

4. Run the infrastructure playbook to configure the container specifically on node 3:

$ ansible-playbook -e @/root/rpc_deploy/user_variables.yml \
  playbooks/infrastructure/infrastructure-setup.yml \
  -l node3_galera_container-3ea2cbd3

Note
The new container runs a single-node Galera cluster. This is a dangerous state because the environment contains more than one active database with potentially divergent data.

$ ansible galera_container -m shell -a "mysql \
  -h localhost -e 'show status like \"%wsrep_cluster_%\";'"
node3_galera_container-3ea2cbd3 success rc=0 >>
Variable_name             Value
wsrep_cluster_conf_id     1
wsrep_cluster_size        1
wsrep_cluster_state_uuid  da078d01-29e5-11e4-a051-03d896dbdb2d
wsrep_cluster_status      Primary

node2_galera_container-49a47d25 success rc=0 >>
Variable_name             Value
wsrep_cluster_conf_id     4
wsrep_cluster_size        2
wsrep_cluster_state_uuid  338b06b0-2948-11e4-9d06-bef42f6c52f1
wsrep_cluster_status      Primary

node4_galera_container-76275635 success rc=0 >>
Variable_name             Value
wsrep_cluster_conf_id     4
wsrep_cluster_size        2
wsrep_cluster_state_uuid  338b06b0-2948-11e4-9d06-bef42f6c52f1
wsrep_cluster_status      Primary

5. Restart MariaDB in the new container and verify that it rejoins the cluster:

$ ansible galera_container -m shell -a "mysql \
  -h localhost -e 'show status like \"%wsrep_cluster_%\";'"
node2_galera_container-49a47d25 success rc=0 >>
Variable_name             Value
wsrep_cluster_conf_id     5
wsrep_cluster_size        3
wsrep_cluster_state_uuid  338b06b0-2948-11e4-9d06-bef42f6c52f1
wsrep_cluster_status      Primary

node3_galera_container-3ea2cbd3 success rc=0 >>
Variable_name             Value
wsrep_cluster_conf_id     5
wsrep_cluster_size        3
wsrep_cluster_state_uuid  338b06b0-2948-11e4-9d06-bef42f6c52f1
wsrep_cluster_status      Primary

node4_galera_container-76275635 success rc=0 >>
Variable_name             Value
wsrep_cluster_conf_id     5
wsrep_cluster_size        3
wsrep_cluster_state_uuid  338b06b0-2948-11e4-9d06-bef42f6c52f1
wsrep_cluster_status      Primary

6. Enable the failed node on the load balancer.
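The wsrep status checks repeated throughout this chapter can be condensed into a small helper. The following is a minimal sketch; the function name is illustrative (not part of the RPC tooling), and the input is the mysql client output shown in the examples above:

```shell
# Hypothetical helper: pull one value out of
# `mysql -e "show status like '%wsrep_cluster_%';"` output read from stdin.
cluster_field() {
    awk -v key="$1" '$1 == key { print $2 }'
}

# Example usage (on a Galera node): a node is healthy when it reports
# Primary status and the expected cluster size.
#   mysql -h localhost -e "show status like '%wsrep_cluster_%';" \
#       | cluster_field wsrep_cluster_status
```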
11. Additional resources

These additional resources are designed to help you learn more about the Rackspace Private Cloud Software and OpenStack. If you are an advanced user and are comfortable with APIs, the OpenStack API documentation is available in the OpenStack API Documentation library.

OpenStack API Quick Start
Programming OpenStack Compute API
OpenStack Compute Developer
Rackspace Private Cloud Knowledge Center
OpenStack Manuals
OpenStack API Reference
OpenStack - nova Developer Documentation
OpenStack - glance Developer Documentation
OpenStack - keystone Developer Documentation
OpenStack - horizon Developer Documentation
OpenStack - cinder Developer Documentation

11.1. Document change history

This version replaces and obsoletes all previous versions. The following table shows the document revision history:

Revision date       Document version
July 7, 2015        Rackspace Private Cloud v9.0.11 Software release
June 23, 2015       Rackspace Private Cloud v9.0.10 Software release
May 8, 2015         Rackspace Private Cloud v9.0.9 Software release
March 25, 2015      Rackspace Private Cloud v9.0.7 Software release
January 30, 2015    Rackspace Private Cloud v9.0.6 Software release
January 7, 2015     Rackspace Private Cloud v9.0.5 Software release
December 5, 2014    Rackspace Private Cloud v9.0.4 Software release
November 26, 2014   Rackspace Private Cloud v9.0.3 Software release
November 7, 2014    Rackspace Private Cloud v9.0.2 Software release
October 31, 2014    Rackspace Private Cloud v9.0.1 Software release
September 25, 2014  Rackspace Private Cloud v9 Software General Availability release
August 28, 2014     Rackspace Private Cloud v9 Software Limited Availability release