VSM Introduction & Major Features
Yaguang Wang, Dan Ferber
March 2015
Agenda
- Overview
- Architecture & Key Components
- Major Features
Overview
VSM Overview
VSM (Virtual Storage Manager) is an open-source Ceph management tool developed by Intel, announced at the November 2014 OpenStack Summit in Paris. It is designed to make the day-to-day management of Ceph easier for storage administrators.
- Home page: https://01.org/virtual-storage-manager
- Code repository: https://github.com/01org/virtual-storage-manager
- Issue tracking: https://01.org/jira/browse/vsm
- Mailing list: http://vsm-discuss.33411.n7.nabble.com/
VSM Overview
VSM is designed for bundled Ceph storage appliances: it creates a Ceph cluster and then manages and monitors it. So far it runs on CentOS 6.5 with Ceph Firefly and OpenStack Havana/Icehouse.
Allows easier Ceph creation and management:
- Creates the Ceph cluster
- Manages Ceph
- Monitors Ceph
Web-based UI:
- Administrator-friendly interface for management and monitoring
VSM API:
- Interfaces for automation
Typical VSM-Managed Cluster
- VSM controller: a dedicated server or server instance.
- Server nodes: members of the VSM-managed Ceph cluster; a VSM agent runs on every server in the cluster.
- OpenStack admin node (optional): connected to VSM via an SSH connection, which allows VSM to tell OpenStack about Ceph storage pools.
Network configuration:
- Ceph public subnet (10GbE or InfiniBand): carries data traffic between clients and the Ceph cluster servers.
- Administration subnet (GbE): carries administrative communications between the VSM controller and agents, and also administrative communications between Ceph daemons.
- Ceph cluster subnet (10GbE or InfiniBand): carries data traffic between Ceph storage nodes for replication and rebalancing.
[Diagram: client nodes reach the server nodes over RADOS on the Ceph public subnet; the VSM controller and the optional OpenStack admin node sit on the administration subnet; the server nodes host VSM agents, monitors, OSDs, and SSD journals, interconnected by the Ceph cluster subnet.]
Architecture & Key Components
VSM Architecture
[Architecture diagram: the web UI and python-vsmclient call the VSM API; the API exchanges messages over RabbitMQ with the scheduler and the conductor, which dispatch work to a VSM agent on each storage node.]
Key Components: Overview
- python-vsmclient: a client for the VSM API. It consists of a Python API (the vsmclient module) and a command-line script (vsm); each implements 100% of the VSM API (see the sketch below).
- vsm: the major module for Ceph management.
- vsm-dashboard: the web-based management interface for VSM.
- vsm-deploy: the Ceph deployment toolkit provided by VSM.
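As a concrete illustration of API access, here is a minimal sketch in Python. It assumes vsmclient follows the conventions of the OpenStack python-*clients it is modeled on; the module path, constructor arguments, and the pools manager shown here are assumptions, not confirmed API.

    # Hypothetical sketch: assumes vsmclient follows OpenStack
    # python-*client conventions; exact names may differ.
    from vsmclient.v1 import client

    vsm = client.Client(
        username="admin",
        api_key="secret",                  # password, in cinderclient style
        project_id="admin",                # Keystone tenant
        auth_url="http://vsm-controller:5000/v2.0",
    )

    # List storage pools known to VSM (assumed manager/method names).
    for pool in vsm.pools.list():
        print(pool.name)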
Key Components: Relationships
[Diagram: vsm-dashboard and the CLI access the VSM component through vsmclient's REST API; VSM authenticates against OpenStack Keystone via REST, drives Ceph and vsm-deploy over SSH, and optionally integrates with OpenStack Nova and Cinder as third parties over SSH.]
Key Components: Interaction Flow
Participants: the operator's web browser and external clients; the VSM UI portal (on the VSM controller node); the VSM controller node itself (Keystone, MySQL, VSM API, scheduler, conductor, RabbitMQ); the VSM agents and Ceph on the storage/monitor nodes; and, via an SSH channel, the OpenStack Nova controller.
Numbered flow (operator path):
1. The operator's web browser connects to Apache/Django (the VSM dashboard) over HTTPS with username/password.
2. The dashboard authenticates via REST against Keystone, the authentication center (listening on ports 5000 and 35357).
3. The dashboard makes a library call to the VSM web client proxy,
4. which calls the VSM API (exposed on port 8778) via HTTPS REST.
5. The VSM API validates the request's token against Keystone via REST.
6.-7. The API sends RPC messages through RabbitMQ (port 5672) to the VSM scheduler, which schedules and dispatches tasks to the agents and the conductor.
8., 10. RPC messages reach the VSM agents, which issue Ceph command calls on their nodes.
9. The VSM conductor manages all database access over a socket connection to MySQL (port 3306, via SQLAlchemy + the MariaDB client).
11. VSM agents authenticate and register with the controller via HTTPS REST.
External parties follow the same pattern: (a) authenticate over HTTPS with username/password to get a token, then (b) issue HTTPS requests carrying that token.
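The external-party flow (steps a and b above) can be sketched with plain HTTP calls. The Keystone v2.0 token request below matches the Havana/Icehouse-era API the bundle ships with; the VSM resource path is a hypothetical placeholder, since the slide only specifies the port (8778).

    # Sketch of the external-party flow: (a) obtain a token from
    # Keystone, (b) call the VSM API with it in X-Auth-Token.
    import requests

    KEYSTONE = "https://vsm-controller:5000/v2.0"
    VSM_API = "https://vsm-controller:8778"

    # (a) Authenticate with username/password and obtain a token.
    resp = requests.post(
        KEYSTONE + "/tokens",
        json={"auth": {
            "tenantName": "admin",
            "passwordCredentials": {"username": "admin",
                                    "password": "secret"},
        }},
    )
    access = resp.json()["access"]
    token = access["token"]["id"]
    tenant_id = access["token"]["tenant"]["id"]

    # (b) Issue a request with the token; the path is a placeholder.
    clusters = requests.get(
        VSM_API + "/v1/" + tenant_id + "/clusters",
        headers={"X-Auth-Token": token},
    )
    print(clusters.status_code)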
Major Features
Major Features
Cluster Management
- Managing monitors and servers
- Replacing failed disks/servers
- Managing storage pools (replicated, erasure-coded, or cache-tiered)
Configuration Management
- Creates and controls Ceph cluster configuration through pre-defined manifests
- Operator-selectable configuration options are defined and validated by the OEM or system integrator
Cluster Monitoring
- Cluster health and OSD/PG/monitor status on the dashboard
- Capacity utilization for storage groups and pools
Capacity Management
- Grouping storage devices by function or performance
- Monitoring capacity for storage groups and pools
Cooperation
- REST API & CLI
- Pools for OpenStack
Openness
- Apache v2 license
- Written in Python
- Bundle (CentOS 6.5, Ceph Firefly 0.80.x, OpenStack Havana/Icehouse)
What It Does
- Web-based UI: administrator-friendly interface for management and monitoring.
- Configuration management: Storage Groups aggregate similar drives; Zones aggregate drives within failure domains.
- Capacity management: Pools segregate allocation by application; capacity utilization is tracked by storage group and pool.
- Cluster management: manage capacity growth; manage server & disk maintenance.
- Cluster monitoring: capacity and performance; OSD, PG, and monitor state.
- VSM APIs: a management framework for consistent configuration; an operator-friendly interface for management & monitoring; a software interface that supports automation.
[Diagram: the operator reaches the VSM controller (controller node server) over HTTP through the data-center firewall; the controller talks over a socket to the VSM agent on each cluster server (OSDs/monitor/MDS), and over SSH to the OpenStack Nova controller server(s).]
What It Is
VSM Controller Software
- Runs on a dedicated server (or server instance)
- Connects to the Ceph cluster through the VSM agents
- Connects to the OpenStack Nova controller (optional) via SSH
- Never touches clients or client data
VSM Agent Software
- Runs on every server in the Ceph cluster
- Relays server configuration & status information to the VSM controller
Managing Servers and Disks
- Servers can host more than one type of drive; drives with similar performance characteristics are identified by Storage Class.
- Drives with the same Storage Class are grouped together in Storage Groups; each Storage Group is paired with a specific Storage Class.
- VSM monitors Storage Group capacity utilization and warns at a threshold.
- Storage Classes and Storage Groups are defined in the cluster manifest file; drives are identified by Storage Class in the server manifest file.
Example mapping of Storage Classes to Storage Groups:
  Storage class    Storage group
  7200_RPM_HDD     Capacity
  10K_RPM_HDD      Performance
  15K_RPM_HDD      High Performance
Managing Failure Domains
- Servers can be grouped into failure domains; in VSM, failure domains are identified by Zones.
- Zones are placed under each Storage Group, and drives in each zone are placed in their respective storage group.
- Example: six servers are placed in three different zones. VSM creates three zones (Zone 1, Zone 2, Zone 3) under each storage group (Capacity = 7200_RPM_HDD, Performance = 10K_RPM_HDD) and places the drives in their respective storage groups and zones (see the sketch after this list).
- With a single zone, replication is at the server level.
- Zones are defined in the cluster manifest file; zone membership is defined in the server manifest file.
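To make the hierarchy concrete, here is an illustration (made-up server and drive names, not VSM code) of what the example above produces: three zones under each storage group, with replicas placed across zones.

    # Illustration only: the storage-group/zone hierarchy from the
    # example above, with made-up server and drive names.
    cluster = {
        "Capacity": {                  # storage class 7200_RPM_HDD
            "zone1": ["server1:sda", "server2:sda"],
            "zone2": ["server3:sda", "server4:sda"],
            "zone3": ["server5:sda", "server6:sda"],
        },
        "Performance": {               # storage class 10K_RPM_HDD
            "zone1": ["server1:sdb", "server2:sdb"],
            "zone2": ["server3:sdb", "server4:sdb"],
            "zone3": ["server5:sdb", "server6:sdb"],
        },
    }
    # Replicas land in different zones, so a whole zone can fail
    # without data loss; with one zone, replication is server-level.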
VSM Controller: Cluster Manifest
- The VSM controller runs on a dedicated server or server instance.
- The VSM controller uses the cluster manifest file to define the cluster's meta information (storage classes, storage groups, zones, subnets) and its cluster settings (capacity thresholds for warnings, erasure-code and cache-tier settings).
Meta information
- The cluster manifest defines the storage classes, storage groups, and zones to be used in the Ceph cluster, and also defines the subnets for management, public, and cluster traffic.
Cluster settings
- The cluster manifest provides settings to tune cluster behavior, such as capacity thresholds and heartbeat intervals; defaults for erasure-coded pools and cache tiering are also covered.
VSM Controller: Cluster Manifest File
Resides on the VSM controller server. Tells VSM how to organize storage devices, how the network is configured, and other management details.

  [storage_class]
  7200_rpm_sata
  10krpm_sas
  ssd
  ssd_cached_7200rpm_sata
  ssd_cached_10krpm_sas

  [storage_group]
  #format: [storage group name] ["user friendly storage group name"] [storage class]
  high_performance "High_Performance_SSD" ssd
  capacity "Economy_Disk" 7200_rpm_sata
  performance "High_Performance_Disk" 10krpm_sas
  value_performance "High_Performance_Disk_with_ssd_cached_Acceleration" ssd_cached_10krpm_sas
  value_capacity "Capacity_Disk_with_ssd_cached_Acceleration" ssd_cached_7200rpm_sata

  [cluster]
  cluster_a

  [file_system]
  xfs

  [management_addr]
  192.168.123.0/24

  [ceph_public_addr]
  192.168.124.0/24

  [ceph_cluster_addr]
  192.168.125.0/24

  [zone]
  zone0
  zone1

  [settings]
  storage_group_near_full_threshold 65
  storage_group_full_threshold 85
  ceph_near_full_threshold 75
  ceph_full_threshold 90
  pg_count_factor 100
  heartbeat_interval 6
  osd_heartbeat_interval 7
  osd_heartbeat_grace 21

  [ec_profiles]
  #format: [profile-name] [path-to-plugin] [plugin-name] [pg_num value] [json format key/value]
  #the key/value strings should not have spaces
  default_profile /usr/lib64/ceph/erasure-code jerasure 3 {"k":2,"m":1,"technique":"reed_sol_van"}
  #profile_name2 /usr/lib64/ceph/erasure-code jerasure 6 {"k":4,"m":2,"technique":"reed_sol_van"}

  [cache_tier_defaults]
  ct_hit_set_count 1
  ct_hit_set_period_s 3600
  ct_target_max_mem_mb 1000000
  ct_target_dirty_ratio 0.4
  ct_target_full_ratio 0.8
  ct_target_max_objects 1000000
  ct_target_min_flush_age_m 10
  ct_target_min_evict_age_m 20

The sections cover, in order: the storage classes defined, the storage groups, the cluster name, the data-disk file system, the network configuration, the zones, the cluster settings, the erasure-code pool profiles, and the cache-tier default parameters.
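To make the file format concrete, here is a minimal sketch (not VSM's actual parser) of reading a bracket-sectioned manifest like the one above into a dictionary; the file path used is an assumption about the controller's conventional location and may differ per installation.

    # Minimal sketch: parse the bracket-sectioned manifest format
    # above into {section: [tokenized lines]}; not VSM's own parser.
    def parse_manifest(path):
        sections = {}
        current = None
        with open(path) as f:
            for raw in f:
                line = raw.strip()
                if not line or line.startswith("#"):
                    continue                  # skip blanks and comments
                if line.startswith("[") and line.endswith("]"):
                    current = line[1:-1]      # e.g. "storage_group"
                    sections[current] = []
                elif current is not None:
                    sections[current].append(line.split())
        return sections

    manifest = parse_manifest("/etc/manifest/cluster.manifest")
    for name, friendly_name, storage_class in manifest["storage_group"]:
        print(name, "->", storage_class)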
VSM Agent: Server Manifest
- The VSM agent runs on every server managed by VSM.
- The VSM agent uses the server manifest file to identify and authenticate itself with the VSM controller, and to describe the server's configuration.
Discovery and authentication
- To be added to a cluster, the server manifest file must contain the IP address of the VSM controller and a valid authentication key.
- The authentication key is generated on the VSM controller in advance and applied to the server manifest before launching the agent daemon.
Server configuration
- VSM relies on the server manifest to identify and classify data devices and their associated journals; VSM has no knowledge of how storage devices have been partitioned.
- Devices and partitions are specified by path to ensure that paths remain constant in the event of a device removal or failure.
VSM Agent: Server Manifest File
Resides on each server that VSM manages. Defines how storage is configured on the server, identifies the other roles (Ceph daemons) that should run on the server, and authenticates the server to the VSM controller.

  [vsm_controller_ip]
  #10.239.82.168

  [role]
  storage
  monitor

  [auth_key]
  token-tenant

  [7200_rpm_sata]
  #format [sata_device] [journal_device]
  %osd-by-path-1% %journal-by-path-1%
  %osd-by-path-2% %journal-by-path-2%
  %osd-by-path-3% %journal-by-path-3%
  %osd-by-path-4% %journal-by-path-4%

  [10krpm_sas]
  #format [sas_device] [journal_device]
  %osd-by-path-5% %journal-by-path-5%
  %osd-by-path-6% %journal-by-path-6%
  %osd-by-path-7% %journal-by-path-7%
  %osd-by-path-8% %journal-by-path-8%

  [ssd]
  #format [ssd_device] [journal_device]

  [ssd_cached_7200rpm_sata]
  #format [intel_cache_device] [journal_device]

  [ssd_cached_10krpm_sas]
  #format [intel_cache_device] [journal_device]

Notes:
- [vsm_controller_ip] holds the address of the VSM controller.
- [role] includes storage if the server will host OSD daemons, and monitor if it will host monitor daemons.
- [auth_key] is the authentication key provided by the key tool on the VSM controller node.
- [7200_rpm_sata] and [10krpm_sas] each specify paths to four drives and their associated journal drives/partitions; no drives are associated with the remaining storage classes on this server.
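Because drives are referenced by persistent /dev/disk/by-path names, it is easy to check what those names currently resolve to. This is plain Linux inspection, not VSM code:

    # Plain-Linux illustration (not VSM code): show which kernel
    # device each persistent by-path name currently points at.
    import os

    BY_PATH = "/dev/disk/by-path"

    for name in sorted(os.listdir(BY_PATH)):
        link = os.path.join(BY_PATH, name)
        target = os.path.realpath(link)   # e.g. /dev/sdb or /dev/sdb1
        print(link, "->", target)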
VSM Operation Map
Getting Started
- Log In
- Create Cluster
- Navigation
- Dashboard Overview
Managing Capacity
- Storage Group Status
- Manage Pools
- Creating Storage Pools
- RBD Status
Monitoring Cluster Health
- Dashboard Overview
- OSD Status
- Monitor Status
- PG Status
- MDS Status
Managing Servers
- Manage Servers
- Add & Remove Servers
- Add & Remove Monitors
- Stop & Start Servers
Managing Storage Devices
- Manage Devices
- Restart OSDs
- Remove OSDs
- Restore OSDs
Working with OpenStack
- OpenStack Access
- Managing Pools
Managing VSM
- Manage VSM Users
- Manage VSM Configuration
Live Demo
Thank You!